Fugu-MT: arXiv Paper Translations

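This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.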

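The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).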

These are the papers with a publication date of 2024-08-24.

Title	Authors	Abstract	Publication date / Translation date
# Syntax-Guided Procedural Synthesis of Molecules ( http://arxiv.org/abs/2409.05873v1 ) License: see linked page	Michael Sun, Alston Lo, Wenhao Gao, Minghao Guo, Veronika Thost, Jie Chen, Connor Coley, Wojciech Matusik	Designing synthetically accessible molecules and recommending analogs to unsynthesizable molecules are important problems for accelerating molecular discovery. We reconceptualize both problems using ideas from program synthesis. Drawing inspiration from syntax-guided synthesis approaches, we decouple the syntactic skeleton from the semantics of a synthetic tree to create a bilevel framework for reasoning about the combinatorial space of synthesis pathways. Given a molecule we aim to generate analogs for, we iteratively refine its skeletal characteristics via Markov Chain Monte Carlo simulations over the space of syntactic skeletons. Given a black-box oracle to optimize, we formulate a joint design space over syntactic templates and molecular descriptors and introduce evolutionary algorithms that optimize both syntactic and semantic dimensions synergistically. Our key insight is that once the syntactic skeleton is set, we can amortize over the search complexity of deriving the program's semantics by training policies to fully utilize the fixed-horizon Markov Decision Process imposed by the syntactic template. We demonstrate performance advantages of our bilevel framework for synthesizable analog generation and synthesizable molecule design. Notably, our approach offers the user explicit control over the resources required to perform synthesis and biases the design space towards simpler solutions, making it particularly promising for autonomous synthesis platforms.	Translated: 2024-09-15 05:31:27 Published: 2024-08-24
# HSR-KAN: Efficient Hyperspectral Image Super-Resolution via Kolmogorov-Arnold Networks ( http://arxiv.org/abs/2409.06705v1 ) License: see linked page	Baisong Li, Xingwang Wang, Haixiao Xu	Hyperspectral images (HSIs) have great potential in various visual tasks due to their rich spectral information. However, obtaining high-resolution hyperspectral images remains challenging due to limitations of physical imaging. Inspired by Kolmogorov-Arnold Networks (KANs), we propose an efficient HSI super-resolution (HSI-SR) model to fuse a low-resolution HSI (LR-HSI) and a high-resolution multispectral image (HR-MSI), yielding a high-resolution HSI (HR-HSI). To achieve the effective integration of spatial information from HR-MSI, we design a fusion module based on KANs, called KAN-Fusion. Further inspired by the channel attention mechanism, we design a spectral channel attention module called KAN Channel Attention Block (KAN-CAB) for post-fusion feature extraction. As a channel attention module integrated with KANs, KAN-CAB not only enhances the fine-grained adjustment ability of deep networks, enabling networks to accurately simulate details of spectral sequences and spatial textures, but also effectively avoids the Curse of Dimensionality (COD). Extensive experiments show that, compared to current state-of-the-art (SOTA) HSI-SR methods, the proposed HSR-KAN achieves the best performance in terms of both qualitative and quantitative assessments. Our code is available at: https://github.com/Baisonm-Li/HSR-KAN.	Translated: 2024-09-15 05:21:30 Published: 2024-08-24
# Discovering Long-Term Effects on Parameter Efficient Fine-tuning ( http://arxiv.org/abs/2409.06706v1 ) License: see linked page	Gaole Dai, Yiming Tang, Chunkai Fan, Qizhe Zhang, Zhi Zhang, Yulu Gan, Chengqing Zeng, Shanghang Zhang, Tiejun Huang	Pre-trained Artificial Neural Networks (ANNs) exhibit robust pattern recognition capabilities and share extensive similarities with the human brain, specifically Biological Neural Networks (BNNs). We are particularly intrigued by these models' ability to acquire new knowledge through fine-tuning. In this regard, Parameter-efficient Fine-tuning (PEFT) has gained widespread adoption as a substitute for full fine-tuning due to its cost reduction in training and mitigation of over-fitting risks by limiting the number of trainable parameters during adaptation. Since both ANNs and BNNs propagate information layer-by-layer, a common analogy can be drawn: weights in ANNs represent synapses in BNNs, while features (also known as latent variables or logits) in ANNs represent neurotransmitters released by neurons in BNNs. Mainstream PEFT methods aim to adjust feature or parameter values using only a limited number of trainable parameters (usually less than 1% of the total parameters), yet achieve surprisingly good results. Building upon this clue, we delve deeper into exploring the connections between feature adjustment and parameter adjustment, resulting in our proposed method Synapses & Neurons (SAN), which learns scaling matrices for features and propagates their effects towards posterior weight matrices. Our approach draws strong inspiration from well-known neuroscience phenomena, Long-term Potentiation (LTP) and Long-term Depression (LTD), which also reveal the relationship between synapse development and neurotransmitter release levels. We conducted extensive comparisons of PEFT on 26 datasets using attention-based networks as well as convolution-based networks, leading to significant improvements compared to other tuning methods (+8.5% over full fine-tuning, +7% over Visual Prompt Tuning, and +3.2% over LoRA). The code will be released.	Translated: 2024-09-15 05:21:30 Published: 2024-08-24
# Gating Syn-to-Real Knowledge for Pedestrian Crossing Prediction in Safe Driving ( http://arxiv.org/abs/2409.06707v1 ) License: see linked page	Jie Bai, Jianwu Fang, Yisheng Lv, Chen Lv, Jianru Xue, Zhengguo Li	Pedestrian Crossing Prediction (PCP) in driving scenes plays a critical role in ensuring the safe operation of intelligent vehicles. Due to the limited observations of pedestrian crossing behaviors in typical situations, recent studies have begun to leverage synthetic data with flexible variation to boost prediction performance, employing domain adaptation frameworks. However, different kinds of domain knowledge have distinct cross-domain distribution gaps, which necessitates suitable domain knowledge adaptation methods for PCP tasks. In this work, we propose a Gated Syn-to-Real Knowledge transfer approach for PCP (Gated-S2R-PCP), which has two aims: 1) designing suitable domain adaptation methods for different kinds of cross-domain knowledge, and 2) transferring suitable knowledge for specific situations with gated knowledge fusion. Specifically, we design a framework that contains three domain adaptation methods, namely style transfer, distribution approximation, and knowledge distillation, for various kinds of information, such as visual, semantic, depth, and location. A Learnable Gated Unit (LGU) is employed to fuse suitable cross-domain knowledge to boost pedestrian crossing prediction. We construct a new synthetic benchmark, S2R-PCP-3181, with 3181 sequences (489,740 frames), which contains the pedestrian locations, RGB frames, semantic images, and depth images. With the synthetic S2R-PCP-3181, we transfer the knowledge to two challenging real datasets, PIE and JAAD, and obtain PCP performance superior to the state-of-the-art methods.	Translated: 2024-09-15 05:21:30 Published: 2024-08-24
# Ensuring Fairness with Transparent Auditing of Quantitative Bias in AI Systems ( http://arxiv.org/abs/2409.06708v1 ) License: see linked page	Chih-Cheng Rex Yuan, Bow-Yaw Wang	With the rapid advancement of AI, there is a growing trend to integrate AI into decision-making processes. However, AI systems may exhibit biases that lead decision-makers to draw unfair conclusions. Notably, the COMPAS system used in the American justice system to evaluate recidivism was found to favor racial majority groups; specifically, it violates a fairness standard called equalized odds. Various measures have been proposed to assess AI fairness. We present a framework for auditing AI fairness, involving third-party auditors and AI system providers, and we have created a tool to facilitate systematic examination of AI systems. The tool is open-sourced and publicly available. Unlike traditional AI systems, we advocate a transparent white-box and statistics-based approach. It can be utilized by third-party auditors, AI developers, or the general public for reference when judging the fairness criterion of AI systems.	Translated: 2024-09-15 05:21:30 Published: 2024-08-24
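The equalized odds standard named in the abstract above asks that a classifier's true-positive rate and false-positive rate be the same across groups. The following is a minimal sketch of how such a check could be computed. It is not the authors' auditing tool; the function names `rates` and `equalized_odds_gaps` and the toy data are assumptions made for illustration.

```python
def rates(y_true, y_pred):
    """Return (true-positive rate, false-positive rate) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


def equalized_odds_gaps(y_true, y_pred, group):
    """Return (TPR gap, FPR gap): the largest difference between any two groups.

    Both gaps are 0.0 when equalized odds holds exactly on this sample.
    """
    by_group = {}
    for t, p, g in zip(y_true, y_pred, group):
        labels, preds = by_group.setdefault(g, ([], []))
        labels.append(t)
        preds.append(p)
    stats = [rates(labels, preds) for labels, preds in by_group.values()]
    tprs = [s[0] for s in stats]
    fprs = [s[1] for s in stats]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Hypothetical toy data: group "a" is classified perfectly, group "b" is not.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
group = ["a"] * 4 + ["b"] * 4
print(equalized_odds_gaps(y_true, y_pred, group))  # (0.5, 0.5)
```

A real audit would also need confidence intervals on these gaps, since small groups make the rates noisy.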
# Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation ( http://arxiv.org/abs/2409.00071v1 ) License: see linked page	Linda Zeng	Neural Machine Translation (NMT) systems struggle when translating to and from low-resource languages, which lack large-scale data corpora for models to use for training. As manual data curation is expensive and time-consuming, we propose utilizing a generative-adversarial network (GAN) to augment low-resource language data. When training on a very small amount of language data (under 20,000 sentences) in a simulated low-resource setting, our model shows potential at data augmentation, generating monolingual language data with sentences such as "ask me that healthy lunch im cooking up," and "my grandfather work harder than your grandfather before." Our novel data augmentation approach takes the first step in investigating the capability of GANs in low-resource NMT, and our results suggest that there is promise for future extension of GANs to low-resource NMT.	Translated: 2024-09-08 15:21:17 Published: 2024-08-24
# Are LLM-based methods good enough for detecting unfair terms of service? ( http://arxiv.org/abs/2409.00077v1 ) License: see linked page	Mirgita Frasheri, Arian Bakhtiarnia, Lukas Esterle, Aleksandros Iosifidis	Countless terms of service (ToS) are being signed every day by users all over the world while interacting with all kinds of apps and websites. More often than not, these online contracts spanning double-digit pages are signed blindly by users who simply want immediate access to the desired service. What would normally require a consultation with a legal team has now become a mundane activity consisting of a few clicks where users potentially sign away their rights, for instance in terms of their data privacy, to countless online entities/companies. Large language models (LLMs) are good at parsing long text-based documents, and could potentially be adopted to help users when dealing with dubious clauses in ToS and their underlying privacy policies. To investigate the utility of existing models for this task, we first build a dataset consisting of 12 questions applied individually to a set of privacy policies crawled from popular websites. Thereafter, a series of open-source as well as commercial chatbots, such as ChatGPT, are queried over each question, with the answers being compared to a given ground truth. Our results show that some open-source models are able to provide a higher accuracy compared to some commercial models. However, the best performance is recorded from a commercial chatbot (ChatGPT4). Overall, all models perform only slightly better than random at this task. Consequently, their performance needs to be significantly improved before they can be adopted at large for this purpose.	Translated: 2024-09-08 15:21:17 Published: 2024-08-24
# SGP-RI: A Real-Time-Trainable and Decentralized IoT Indoor Localization Model Based on Sparse Gaussian Process with Reduced-Dimensional Inputs ( http://arxiv.org/abs/2409.00078v1 ) License: see linked page	Zhe Tang, Sihao Li, Zichen Huang, Guandong Yang, Kyeong Soo Kim, Jeremy S. Smith	As Internet of Things (IoT) devices are deployed in the field, there is an enormous amount of untapped potential in local computing on those IoT devices. Harnessing this potential for indoor localization, therefore, becomes an exciting research area. Conventionally, the training and deployment of indoor localization models are based on centralized servers with substantial computational resources. This centralized approach faces several challenges, including the database's inability to accommodate the dynamic and unpredictable nature of the indoor electromagnetic environment, the model retraining costs, and the susceptibility of centralized servers to security breaches. To mitigate these challenges, we aim to amalgamate the offline and online phases of traditional indoor localization methods using a real-time-trainable and decentralized IoT indoor localization model based on Sparse Gaussian Process with Reduced-dimensional Inputs (SGP-RI), where the number and dimension of the input data are reduced through reference point and wireless access point filtering, respectively. The experimental results, based on a multi-building and multi-floor static database as well as a single-building and single-floor dynamic database, demonstrate that the proposed SGP-RI model with less than half the training samples as inducing inputs can produce localization performance comparable to that of the standard Gaussian Process model with the whole training samples. The SGP-RI model enables the decentralization of indoor localization, facilitating its deployment to resource-constrained IoT devices, and thereby could provide enhanced security and privacy as well as reduced costs and network dependency. Also, the model's capability of real-time training makes it possible to quickly adapt to the time-varying indoor electromagnetic environment.	Translated: 2024-09-08 15:21:17 Published: 2024-08-24
# Examining Different Research Communities: Authorship Network ( http://arxiv.org/abs/2409.00081v1 ) License: see linked page	Shrabani Ghosh	Google Scholar is one of the top search engines for accessing scholarly research articles across multiple disciplines. Google Scholar's advanced search option allows articles to be extracted based on phrases, publisher names, author names, time period, etc. In this work, we collected Google Scholar data (2000-2021) for two different research domains in computer science: Data Mining and Software Engineering. The scholar database resources are powerful for network analysis, data mining, and identifying links between authors via authorship networks. We examined the co-authorship network for each domain and studied its network structure. Extensive experiments are performed to analyze publication trends and to identify influential authors and affiliated organizations for each domain. The network analysis shows that the networks' features are distinct from one another and that the networks exhibit small communities within the influential authors of a particular domain.	Translated: 2024-09-08 15:21:17 Published: 2024-08-24
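The co-authorship network described in the abstract above can be illustrated with a small sketch: each paper contributes one edge per pair of its authors, and an author's degree is the number of distinct collaborators. This is only an illustration under assumed inputs. The function names and the placeholder author lists are hypothetical, and the paper's actual data pipeline and metrics are not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(papers):
    """Map each unordered author pair to the number of papers they share.

    `papers` is an iterable of author-name lists, one list per paper.
    """
    edges = Counter()
    for authors in papers:
        # sorted(set(...)) gives each pair a canonical order and drops duplicates
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def degrees(edges):
    """Number of distinct co-authors per author (unweighted node degree)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Placeholder author lists, not real bibliographic data.
papers = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
edges = coauthorship_edges(papers)
print(edges[("A", "B")])               # 2
print(degrees(edges).most_common(1))   # [('C', 3)]
```

Degree is the simplest influence measure; centrality scores or community detection would be natural next steps on the same edge list.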
# Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering ( http://arxiv.org/abs/2409.00082v1 ) License: see linked page	Sagar Srinivas Sakhinana, Geethan Sannidhi, Venkataramana Runkana	In the chemical and process industries, Process Flow Diagrams (PFDs) and Piping and Instrumentation Diagrams (P&IDs) are critical for design, construction, and maintenance. Recent advancements in Generative AI, such as Large Multimodal Models (LMMs) like GPT4 (Omni), have shown promise in understanding and interpreting process diagrams for Visual Question Answering (VQA). However, proprietary models pose data privacy risks, and their computational complexity prevents knowledge editing for domain-specific customization on consumer hardware. To overcome these challenges, we propose a secure, on-premises enterprise solution using a hierarchical, multi-agent Retrieval Augmented Generation (RAG) framework for open-domain question answering (ODQA) tasks, offering enhanced data privacy, explainability, and cost-effectiveness. Our novel multi-agent framework employs introspective and specialized sub-agents using open-source, small-scale multimodal models with the ReAct (Reason+Act) prompting technique for PFD and P&ID analysis, integrating multiple information sources to provide accurate and contextually relevant answers. Our approach, supported by iterative self-correction, aims to deliver superior performance in ODQA tasks. We conducted rigorous experimental studies, and the empirical results validated the effectiveness of the proposed approach.	Translated: 2024-09-08 15:21:17 Published: 2024-08-24
# Quantum Teleportation using Quantum Candies ( http://arxiv.org/abs/2408.16016v1 ) License: see linked page	Nikhitha Nunavath, Sandeep Mishra, Anirban Pathak	Quantum Candies, or Qandies, provide us with a lucid way of understanding the concepts of quantum information and quantum science in the language of candies. The critical idea of qandies is to depict quantum science intuitively to the general public, which makes sense, as most of the research in this domain is funded by taxpayers. The qandies model is already used to explain the essential concepts of quantum science and quantum cryptography. However, teleportation and related concepts are yet to be explained. Motivated by this fact, we investigate and extend the idea of Jacobs and Lin-Mor-Shapira to explain teleportation using qandies. Here, we explicitly design the teleportation protocol and perform a circuit model using qandy gates. The protocol succeeds when the correlated qandies are appropriately pre-shared and some local operations are used at both ends. The model we develop can be a valuable tool for science and engineering educators who want to help the general public gain more insight into quantum science and technology.	Translated: 2024-08-30 18:04:21 Published: 2024-08-24
# Differentially Private Publication of Electricity Time Series Data in Smart Grids ( http://arxiv.org/abs/2408.16017v1 ) License: see linked page	Sina Shaham, Gabriel Ghinita, Bhaskar Krishnamachari, Cyrus Shahabi	Smart grids are a valuable data source to study consumer behavior and guide energy policy decisions. In particular, time series of power consumption over geographical areas are essential in deciding the optimal placement of expensive resources (e.g., transformers, storage elements) and their activation schedules. However, publication of such data raises significant privacy issues, as it may reveal sensitive details about personal habits and lifestyles. Differential privacy (DP) is well-suited for sanitization of individual data, but current DP techniques for time series lead to significant loss in utility, due to the existence of temporal correlation between data readings. We introduce STPT (Spatio-Temporal Private Timeseries), a novel method for DP-compliant publication of electricity consumption data that analyzes spatio-temporal attributes and captures both micro and macro patterns by leveraging RNNs. Additionally, it employs a partitioning method for releasing electricity consumption time series based on identified patterns. We demonstrate through extensive experiments, on both real-world and synthetic datasets, that STPT significantly outperforms existing benchmarks, providing a well-balanced trade-off between data utility and user privacy.	Translated: 2024-08-30 18:04:21 Published: 2024-08-24
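To make the utility problem mentioned in the abstract above concrete, the sketch below applies the textbook Laplace mechanism to a whole series, splitting the privacy budget evenly across readings. This is a naive baseline shown for illustration, not STPT, which the abstract describes as RNN-based. The function names and the `sensitivity` bound are assumptions. Note how the noise scale grows linearly with the number of readings, which is why naive releases of long series lose utility.

```python
import random


def noise_scale(n_readings, epsilon, sensitivity):
    """Laplace scale when a total budget `epsilon` covers `n_readings` values.

    Under sequential composition each reading gets epsilon / n_readings,
    so the per-reading scale is sensitivity * n_readings / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity * n_readings / epsilon


def privatize_series(series, epsilon, sensitivity, seed=None):
    """Return a noisy copy of `series` under the naive Laplace baseline.

    `sensitivity` is an assumed bound on how much one individual can change
    a single reading.
    """
    if not series:
        return []
    rng = random.Random(seed)
    scale = noise_scale(len(series), epsilon, sensitivity)
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return [
        x + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for x in series
    ]


# Hypothetical hourly consumption values in kWh.
readings = [1.2, 0.9, 1.5, 2.1]
print(noise_scale(len(readings), epsilon=1.0, sensitivity=1.0))  # 4.0
print(privatize_series(readings, epsilon=1.0, sensitivity=1.0, seed=0))
```

With 24 hourly readings and the same budget the scale would be 24 times the sensitivity, which swamps typical household consumption values.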
# SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description ( http://arxiv.org/abs/2408.13608v1 ) License: see linked page	Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu	Speech-language multi-modal learning presents a significant challenge due to the fine nuanced information inherent in speech styles. Therefore, a large-scale dataset providing elaborate comprehension of speech style is urgently needed to facilitate insightful interplay between speech audio and natural language. However, constructing such datasets presents a major trade-off between large-scale data collection and high-quality annotation. To tackle this challenge, we propose an automatic speech annotation system for expressiveness interpretation that annotates in-the-wild speech clips with expressive and vivid human language descriptions. Initially, speech audio is processed by a series of expert classifiers and captioning models to capture diverse speech characteristics, followed by a fine-tuned LLaMA for customized annotation generation. Unlike previous tag/template-based annotation frameworks with limited information and diversity, our system provides in-depth understanding of speech style through tailored natural language descriptions, thereby enabling accurate and voluminous data generation for large model training. With this system, we create SpeechCraft, a fine-grained bilingual expressive speech dataset. It is distinguished by highly descriptive natural language style prompts, containing approximately 2,000 hours of audio data and encompassing over two million speech clips. Extensive experiments demonstrate that the proposed dataset significantly boosts speech-language task performance in stylized speech synthesis and speech style understanding.	Translated: 2024-08-29 18:22:33 Published: 2024-08-24
# A Concept-Value Network as a Brain Model ( http://arxiv.org/abs/1904.04579v5 ) License: see linked page	Kieran Greer	This paper suggests a statistical framework for describing the relations between the physical and conceptual entities of a brain-like model. Features and concept instances are put into context, where the paper suggests that features may be the electrical wiring, although chemical connections are also possible. With this idea, the actual length of the connection is important, because it is related to firing rates and neuron synchronization, but the signal type is less important. The paper then suggests that concepts are neuron groups that link feature sets, and that concept instances are determined by chemical signals from those groups. Therefore, features become the static horizontal framework of the neural system and concepts are vertically interconnected combinations of these. With regard to functionality, the neuron is then considered to be functional and the more horizontal memory structures can even be glial. This would also suggest that features can be distributed entities and not concentrated in a single area.	Translated: 2024-08-28 20:36:52 Published: 2024-08-24
# Fermion integrals for finite spectral triples ( http://arxiv.org/abs/2403.18428v2 ) License: see linked page	John W. Barrett	Fermion functional integrals are calculated for the Dirac operator of a finite real spectral triple. Complex, real and chiral functional integrals are considered for each KO-dimension where they are non-trivial, and phase ambiguities in the definition are noted.	Translated: 2024-08-28 19:29:21 Published: 2024-08-24
# Physics-Informed Neural Network for Concrete Manufacturing Process Optimization ( http://arxiv.org/abs/2408.14502v1 ) License: see linked page	Sam Varghese, Mr. Rahul Anand, Gaurav Paliwal	Concrete manufacturing projects are among the most common projects for consulting agencies. Because the resultant strength of concrete depends in a highly non-linear way on input materials such as ash, water, cement, and superplastic, it is difficult for machine learning models to capture this relation and perform cost optimizations. This paper highlights how PINNs (Physics Informed Neural Networks) can be useful in this situation. This state-of-the-art model is also compared with traditional models such as Linear Regression, Random Forest, Gradient Boosting, and Deep Neural Network. The results of the research highlight how well PINNs performed even with a reduced dataset, thus resolving one of the biggest issues for ML models, limited data availability. On average, the PINN reduced the loss value by 26.3% even with 40% less data than the Deep Neural Network. In addition to predicting the strength of the concrete given the quantity of raw materials, the paper also highlights the use of a heuristic optimization method, Particle Swarm Optimization (PSO), in predicting the quantity of raw materials required to manufacture concrete of a given strength at the least cost.	Translated: 2024-08-28 18:01:37 Published: 2024-08-24
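The abstract above mentions Particle Swarm Optimization for finding the cheapest raw-material mix that reaches a target strength. The sketch below shows a generic, minimal PSO followed by a toy use. The linear `strength` surrogate, the prices, the bounds, and the penalty weight are all hypothetical stand-ins chosen for illustration. In the paper the strength prediction would come from the trained model, and none of these numbers are taken from it.

```python
import random


def pso(objective, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over box `bounds` with a basic global-best PSO.

    Returns (best_position, best_cost).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                # personal bests
    best_cost = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_cost[i])
    g_pos, g_cost = best_pos[g][:], best_cost[g]  # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # clamp the new position to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            cost = objective(pos[i])
            if cost < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], cost
                if cost < g_cost:
                    g_pos, g_cost = pos[i][:], cost
    return g_pos, g_cost


# Hypothetical toy problem: x = (cement, ash) quantities.
PRICES = (1.0, 0.3)            # assumed unit prices


def strength(x):
    """Assumed linear surrogate for concrete strength, not a real model."""
    return 0.08 * x[0] + 0.02 * x[1]


def penalised_cost(x, target=30.0):
    """Material cost plus a quadratic penalty for missing the target strength."""
    shortfall = max(0.0, target - strength(x))
    return PRICES[0] * x[0] + PRICES[1] * x[1] + 1e3 * shortfall ** 2


mix, cost = pso(penalised_cost, bounds=[(100.0, 500.0), (0.0, 200.0)])
print(mix, cost)
```

The penalty term is one simple way to fold the strength requirement into a single objective; a stricter constraint-handling scheme may be preferable in practice.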
# 関数的正確性はコード言語モデルを評価するのに十分か? 生成コードの多様性を探る Is Functional Correctness Enough to Evaluate Code Language Models? Exploring Diversity of Generated Codes ( http://arxiv.org/abs/2408.14504v1 ) ライセンス: Link先を確認	Heejae Chon, Seonghyeon Lee, Jinyoung Yeo, Dongha Lee,	(参考訳) 言語モデル(LM)は、自然言語の要求からコードを生成する素晴らしい能力を示した。本研究では,LMが生成するコードの多様性を,機能的正確性に加えて,コード生成能力を評価する重要な基準として強調する。その実践的な意味にもかかわらず、生成されたコードの多様性を評価することに焦点を当てた研究が不足しており、コードLMの開発においてその重要性が見落とされている。本稿では,コード間の類似性や機能的正しさに関する様々な指標を用いて,生成コードの多様性を評価するための体系的なアプローチを提案する。具体的には、コード理解と推論における大規模なLMの能力を活用し、人間の判断と最も高い相関性を示すペアワイズコード類似度尺度を導入する。モデルのサイズ,温度,トレーニングアプローチ,プロンプト戦略,入力問題の難しさなど,生成コードの品質に対するさまざまな要因の影響を幅広く検討する。テストパススコアとコード間類似度スコアとの正の相関関係が一貫して観察されたことから、現在のLMは機能的に正しいが多様性の乏しいコードを生成する傾向にあることがわかった。 Language models (LMs) have exhibited impressive abilities in generating code from natural language requirements. In this work, we highlight the diversity of code generated by LMs as a critical criterion for evaluating their code generation capabilities, in addition to functional correctness. Despite its practical implications, there is a lack of studies focused on assessing the diversity of generated code, which overlooks its importance in the development of code LMs. We propose a systematic approach to evaluate the diversity of generated code, utilizing various metrics for inter-code similarity as well as functional correctness. Specifically, we introduce a pairwise code similarity measure that leverages large LMs' capabilities in code understanding and reasoning, demonstrating the highest correlation with human judgment. We extensively investigate the impact of various factors on the quality of generated code, including model sizes, temperatures, training approaches, prompting strategies, and the difficulty of input problems. Our consistent observation of a positive correlation between the test pass score and the inter-code similarity score indicates that current LMs tend to produce functionally correct code with limited diversity.	翻訳日:2024-08-28 18:01:37 公開日:2024-08-24
# 離散再プログラミングのデカップリングによる時空間予測のための事前学習型言語モデル Empowering Pre-Trained Language Models for Spatio-Temporal Forecasting via Decoupling Enhanced Discrete Reprogramming ( http://arxiv.org/abs/2408.14505v1 ) ライセンス: Link先を確認	Hao Wang, Jindong Han, Wei Fan, Hao Liu,	(参考訳) 時空間時系列予測は、輸送最適化、エネルギー管理、気候分析など、様々な実世界の応用において重要な役割を果たす。最近のPLM(Pre-trained Language Models)の進歩は、それらの優れた推論と一般化能力を活用することで、時系列予測タスクのためにこれらのモデルを再プログラミングする努力にインスピレーションを与えている。しかし、既存のアプローチは、複雑な空間的な系列間依存性や本質的な系列内周波数成分の扱いに乏しく、時空間予測性能を制限している。さらに、再プログラミングにおける連続時系列の圧縮部分語彙への線形写像は、PLMの時空間的な意味表現力に制約を与え、潜在的な情報のボトルネックを引き起こす可能性がある。上記の制約を克服するため,時空間予測のための PLM 再プログラミングフレームワークである RePST を提案する。 RePST の重要な洞察は、周波数領域において時空間ダイナミクスを分離し、PLMテキスト空間との整合性を高めることである。具体的には、まず、フーリエ空間で時空間データを分離し、時間的内在信号と空間的拡散信号を得る構造拡散演算子を考案し、このダイナミクスをPLMにとってより理解しやすく予測しやすいものにする。さらに,限られた語彙からの情報ボトルネックを回避するために,拡張された語彙空間から関連する離散テキスト情報を微分可能な形で選択する離散的再プログラミング戦略を提案する。 4つの実世界のデータセットに対する大規模な実験により、提案手法は、特にデータスカースシナリオにおいて、最先端の時空間予測モデルよりも大幅に優れていることが示された。 Spatio-temporal time series forecasting plays a critical role in various real-world applications, such as transportation optimization, energy management, and climate analysis. The recent advancements in Pre-trained Language Models (PLMs) have inspired efforts to reprogram these models for time series forecasting tasks, by leveraging their superior reasoning and generalization capabilities. However, existing approaches fall short in handling complex spatial inter-series dependencies and intrinsic intra-series frequency components, limiting their spatio-temporal forecasting performance. Moreover, the linear mapping of continuous time series to a compressed subset vocabulary in reprogramming constrains the spatio-temporal semantic expressivity of PLMs and may lead to an information bottleneck. To overcome the above limitations, we propose RePST, a tailored PLM reprogramming framework for spatio-temporal forecasting. The key insight of RePST is to decouple the spatio-temporal dynamics in the frequency domain, allowing better alignment with the PLM text space. Specifically, we first decouple spatio-temporal data in Fourier space and devise a structural diffusion operator to obtain temporal intrinsic and spatial diffusion signals, making the dynamics more comprehensible and predictable for PLMs. To avoid the information bottleneck from a limited vocabulary, we further propose a discrete reprogramming strategy that selects relevant discrete textual information from an expanded vocabulary space in a differentiable manner. Extensive experiments on four real-world datasets show that our proposed approach significantly outperforms state-of-the-art spatio-temporal forecasting models, particularly in data-scarce scenarios.	翻訳日:2024-08-28 18:01:37 公開日:2024-08-24
# 蒸留ロングテールデータセット Distilling Long-tailed Datasets ( http://arxiv.org/abs/2408.14506v1 ) ライセンス: Link先を確認	Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan,	(参考訳) データセット蒸留(DD)は、より大規模なデータセットから小さな情報に富んだデータセットを蒸留して、効率的なニューラルネットワークトレーニングを実現することを目的としている。しかし、既存のDDメソッドは、現実世界のシナリオで広く使われている長い尾のデータセットに苦しむ。この予期せぬ結果の背景にある理由を調査した結果、2つの主な原因が判明した。 1) 不均衡なデータに基づいて訓練されたエキスパートネットワークはバイアス勾配を発達させ、同様に不均衡な蒸留データセットを合成する。 DDの一般的な手法であるパラメータマッチングでは、蒸留データセットの学習パラメータと元のデータセットの学習パラメータを整合させる。しかし、長い尾のデータセットの文脈では、バイアスのある専門家が元のデータに存在する不均衡を継承し、蒸留されたデータセットは尾のクラスを不十分に表現する。 2) これらのデータセットを訓練した専門家は, 蒸留監督の誤認や, 品質の悪いソフトラベルの初期化を招いた。これらの課題に対処するため,我々は,Long-tailed Aware Dataset distillation (LAD) という,新しい長鎖データセット蒸留法を提案する。具体的には,偏りのある専門家の軌道と直接一致することを避けるために,ウェイトミスマッチ回避法を提案する。これは、学生と偏りのある専門家の軌跡の間の距離を減らし、尾のクラスバイアスが合成データセットに蒸留されるのを防ぐ。さらに,アダプティブ・デカップリング・マッチング(Adaptive Decoupled Matching)を提案する。この研究は長い尾のデータセット蒸留(LTDD)の分野を開拓し、長い尾のデータセットを蒸留する最初の効果的な取り組みとなった。 Dataset distillation (DD) aims to distill a small, information-rich dataset from a larger one for efficient neural network training. However, existing DD methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) Expert networks trained on imbalanced data develop biased gradients, leading to the synthesis of similarly imbalanced distilled datasets. Parameter matching, a common technique in DD, involves aligning the learning parameters of the distilled dataset with that of the original dataset. However, in the context of long-tailed datasets, matching biased experts leads to inheriting the imbalance present in the original data, causing the distilled dataset to inadequately represent tail classes. 2) The experts trained on such datasets perform suboptimally on tail classes, resulting in misguided distillation supervision and poor-quality soft-label initialization. 
To address these issues, we propose a novel long-tailed dataset distillation method, Long-tailed Aware Dataset distillation (LAD). Specifically, we propose Weight Mismatch Avoidance to avoid directly matching the biased expert trajectories. It reduces the distance between the student and the biased expert trajectories and prevents the tail class bias from being distilled to the synthetic dataset. Moreover, we propose Adaptive Decoupled Matching, which jointly matches the decoupled backbone and classifier to improve the tail class performance and initialize reliable soft labels. This work pioneers the field of long-tailed dataset distillation (LTDD), marking the first effective effort to distill long-tailed datasets.	翻訳日:2024-08-28 18:01:37 公開日:2024-08-24
# GPT-4とスキーママッチングにおけるコスト意識の不確実性低減: Prompt-Matcher フレームワーク Cost-Aware Uncertainty Reduction in Schema Matching with GPT-4: The Prompt-Matcher Framework ( http://arxiv.org/abs/2408.14507v1 ) ライセンス: Link先を確認	Longyu Feng, Huahang Li, Chen Jason Zhang,	(参考訳) スキーママッチングは、与えられた2つのスキーマの要素間の対応を識別するプロセスであり、データベース管理システム、データ統合、データウェアハウスに必須である。現在のスキーママッチングアルゴリズムの固有の不確実性は、一連の候補マッチングの生成につながる。これらの結果を維持するには、確率的クエリを処理できるデータベースやシステムを使う必要がある。これにより、クエリプロセスが複雑になり、関連するストレージコストが増加する。 GPT-4の優れた性能により、不確実性を低減できる可能性を探る。本提案では,GPT-4を用いて,候補の集合を問合せするクラウドワーカーの役割を代替することを目的とする。 GPT-4からより正確な対応確認応答を得るため、我々は、GPT-4のセマンティック・マッチとAbbreviation-matchプロンプトを作成し、2つのベンチマークデータセットであるDeepMDatasets 100% (+0.0) と Fabricated-Datasets 91.8% (+2.2) のリコールレートに対して、最先端の結果を達成する。予算の活用を最適化するため、我々はコスト対応ソリューションを考案した。予算の制約の中で、我々のソリューションは、最小限の時間支出で好ましい結果をもたらす。本稿では,複数の自動スキーママッチングアルゴリズムの統合プロセスにおける不確実性を低減し,複雑なパラメータ化を選択するための新しいフレームワークであるPrompt-Matcherを紹介する。これは、候補スキーマの結果に関連する不確実性を減らし、最も有望なマッチを最適にランク付けするのに役立つ。我々は、GPT-4予算の範囲内での収益を最適化することを目的として、対応選択問題を正式に定義する。 CSPがNP-Hardであることを示し、最小時間支出の近似アルゴリズムを提案する。最終的に、厳密な実験を通してPrompt-Matcherの有効性を実証する。 Schema matching is the process of identifying correspondences between the elements of two given schemata, essential for database management systems, data integration, and data warehousing. The inherent uncertainty of current schema matching algorithms leads to the generation of a set of candidate matches. Storing these results necessitates the use of databases and systems capable of handling probabilistic queries. This complicates the querying process and increases the associated storage costs. Motivated by GPT-4 outstanding performance, we explore its potential to reduce uncertainty. Our proposal is to supplant the role of crowdworkers with GPT-4 for querying the set of candidate matches. 
To get more precise correspondence verification responses from GPT-4, we have crafted Semantic-match and Abbreviation-match prompts, achieving state-of-the-art recall rates on two benchmark datasets: 100% (+0.0) on DeepMDatasets and 91.8% (+2.2) on Fabricated-Datasets. To optimise budget utilisation, we have devised a cost-aware solution. Within the constraints of the budget, our solution delivers favourable outcomes with minimal time expenditure. We introduce a novel framework, Prompt-Matcher, to reduce the uncertainty in the process of integrating multiple automatic schema matching algorithms and selecting complex parameterizations. It assists users in diminishing the uncertainty associated with candidate schema match results and in optimally ranking the most promising matches. We formally define the Correspondence Selection Problem (CSP), aiming to optimise the revenue within the confines of the GPT-4 budget. We demonstrate that CSP is NP-hard and propose an approximation algorithm with minimal time expenditure. Ultimately, we demonstrate the efficacy of Prompt-Matcher through rigorous experiments.	翻訳日:2024-08-28 18:01:37 公開日:2024-08-24
# 科学のための人工知能:簡単で難しい問題 Artificial intelligence for science: The easy and hard problems ( http://arxiv.org/abs/2408.14508v1 ) ライセンス: Link先を確認	Ruairidh M. Battleday, Samuel J. Gershman,	(参考訳) 人工知能の最近の進歩によって、科学的な発見が目覚ましいものとなりました。これらはほとんどすべて、大量のデータにアクセス可能なドメイン科学者とエンジニアのチームによって事前に特定された難しい最適化問題を解決するために、柔軟なアルゴリズムをトレーニングした結果である。非常に有用ではあるが、この種の問題解決は科学の1つの部分、すなわち「簡単な問題」にしか対応しない。科学研究のもう1つの部分は、その問題そのもの、すなわち「ハード問題」を思い浮かび上がっている。難しい問題の解決は、未定義の制約に基づいて連続的な概念修正を必要とするため、科学的な発見のための現在のアルゴリズムの能力を超える。我々は、科学者の認知科学を研究することによって、人間がどのように難しい問題を解くかを理解し、その結果を使って、科学パラダイムを自動推論し更新する新しい計算エージェントを設計することができる。 A suite of impressive scientific discoveries have been driven by recent advances in artificial intelligence. These almost all result from training flexible algorithms to solve difficult optimization problems specified in advance by teams of domain scientists and engineers with access to large amounts of data. Although extremely useful, this kind of problem solving only corresponds to one part of science - the "easy problem." The other part of scientific research is coming up with the problem itself - the "hard problem." Solving the hard problem is beyond the capacities of current algorithms for scientific discovery because it requires continual conceptual revision based on poorly defined constraints. We can make progress on understanding how humans solve the hard problem by studying the cognitive science of scientists, and then use the results to design new computational agents that automatically infer and update their scientific paradigms.	翻訳日:2024-08-28 18:01:37 公開日:2024-08-24
# シンセティック・インターベンション Synthetic Interventions ( http://arxiv.org/abs/2006.07691v7 ) ライセンス: Link先を確認	Anish Agarwal, Devavrat Shah, Dennis Shen,	(参考訳) SC(Synthetic Control)方法論は,パネルデータアプリケーションにおけるポリシー評価のための重要なツールである。研究者は一般的にSCフレームワークを低次元の行列係数モデルで正当化し、潜在的な結果が低次元単位および時間固有の潜在因子によって記述されると仮定する。近年の[Abadie '20]では,SC手法の先駆者の一人が,SCフレームワークを複数の治療法に拡張する方法について疑問を投げかけている。本稿では、このオープンな疑問に対して、私たちが合成介入(SI)と呼ぶ一つの解決法を提供する。 SIフレームワークの基本は低ランクテンソル因子モデルであり、これは治療に対する潜在因子化を含めることで行列因子モデルを拡張する。本モデルでは,標準SCに基づく推定器の一般化を提案する。このアプローチの1つのインスタンス化に対する一貫性を証明し、漸近的に正常な条件を提供する。さらに,本研究では,その予測性能について検討し,これまでに検討されていない関連質問を探索し,抗タバコ法の影響について, [Abadie-Diamond-Hainmueller '10] の標準SCケーススタディを再検討する代表シミュレーションを行った。 The synthetic controls (SC) methodology is a prominent tool for policy evaluation in panel data applications. Researchers commonly justify the SC framework with a low-rank matrix factor model that assumes the potential outcomes are described by low-dimensional unit and time specific latent factors. In the recent work of [Abadie '20], one of the pioneering authors of the SC method posed the question of how the SC framework can be extended to multiple treatments. This article offers one resolution to this open question that we call synthetic interventions (SI). Fundamental to the SI framework is a low-rank tensor factor model, which extends the matrix factor model by including a latent factorization over treatments. Under this model, we propose a generalization of the standard SC-based estimators. We prove the consistency for one instantiation of our approach and provide conditions under which it is asymptotically normal. Moreover, we conduct a representative simulation to study its prediction performance and revisit the canonical SC case study of [Abadie-Diamond-Hainmueller '10] on the impact of anti-tobacco legislations by exploring related questions not previously investigated.	翻訳日:2024-08-28 01:41:09 公開日:2024-08-24
# VPIT:Voxel Pseudo画像を用いたリアルタイム埋め込み単体3D追跡 VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images ( http://arxiv.org/abs/2206.02619v2 ) ライセンス: Link先を確認	Illia Oleksiienko, Paraskevi Nousi, Nikolaos Passalis, Anastasios Tefas, Alexandros Iosifidis,	(参考訳) 本稿では,Voxel Pseudo Image Tracking (VPIT) と呼ばれる,Voxel-based 3D Single Object Tracking (3D SOT) 手法を提案する。 VPITは3D SOTにボクセル擬似画像を使用する最初の方法である。入力点雲は、柱ベースのボキセル化により構成され、結果として得られる擬似画像は、2DライクなSiamese SOT法の入力として使用される。擬似画像はBird's-eye View (BEV)座標で生成されるため、その中のオブジェクトのサイズは一定である。したがって、新しい座標系ではオブジェクトの回転のみが変化し、オブジェクトのスケールは変化しない。そこで我々は,対象物の位置と回転の両方を予測するために,異なる回転する探索領域を単一のターゲット表現と比較するマルチローテーション探索に置き換える。 KITTI追跡データセットの実験は、VPITが最速の3D SOT法であり、競合的な成功と精度の値を維持することを示している。実世界のシナリオにおけるSOT手法の適用は、組み込み機器の計算能力の低下や、推論速度が十分高くなければ特定のデータフレームをスキップせざるを得ない遅延非推奨環境といった制限に満たされる。我々は、リアルタイム評価プロトコルを実装し、他のメソッドが組み込みデバイスでの性能の大部分を失うことを示す一方、VPITはオブジェクトの追跡能力を維持している。 In this paper, we propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT). VPIT is the first method that uses voxel pseudo images for 3D SOT. The input point cloud is structured by pillar-based voxelization, and the resulting pseudo image is used as an input to a 2D-like Siamese SOT method. The pseudo image is created in the Bird's-eye View (BEV) coordinates, and therefore the objects in it have constant size. Thus, only the object rotation can change in the new coordinate system and not the object scale. For this reason, we replace multi-scale search with a multi-rotation search, where differently rotated search regions are compared against a single target representation to predict both position and rotation of the object. Experiments on KITTI Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values. 
Application of a SOT method in a real-world scenario meets with limitations such as lower computational capabilities of embedded devices and a latency-unforgiving environment, where the method is forced to skip certain data frames if the inference speed is not high enough. We implement a real-time evaluation protocol and show that other methods lose most of their performance on embedded devices, while VPIT maintains its ability to track the object.	翻訳日:2024-08-28 01:37:08 公開日:2024-08-24
# 時間と空間におけるロボットのハグ動作の学習とブレンディング Learning and Blending Robot Hugging Behaviors in Time and Space ( http://arxiv.org/abs/2212.01507v2 ) ライセンス: Link先を確認	Michael Drolet, Joseph Campbell, Heni Ben Amor,	(参考訳) 複数の相互作用の重畳を含む複雑な相互作用において、適切なロボット応答を予測できる、模倣学習に基づく物理的人間ロボットインタラクションアルゴリズムを提案する。提案アルゴリズムであるBlending Bayesian Interaction Primitives (B-BIP) により、複雑なハグシナリオにおいて、ハグの動きとタイミングに応じて抱き返し適応する応答性の高い相互作用を実現できる。本アルゴリズムが先行研究の一般化であり、元の定式化が単一インタラクションの特定のケースに帰着することを示し、広範なユーザスタディと実証実験の両方を通じて本手法を評価する。提案アルゴリズムは、既存の最先端手法と比較して、有意に優れた定量的予測誤差と、精度、応答性、タイミングに関してより好意的な参加者応答をもたらす。 We introduce an imitation learning-based physical human-robot interaction algorithm capable of predicting appropriate robot responses in complex interactions involving a superposition of multiple interactions. Our proposed algorithm, Blending Bayesian Interaction Primitives (B-BIP), allows us to achieve responsive interactions in complex hugging scenarios, capable of reciprocating and adapting to a hug's motion and timing. We show that this algorithm is a generalization of prior work, for which the original formulation reduces to the particular case of a single interaction, and evaluate our method through both an extensive user study and empirical experiments. Our algorithm yields significantly better quantitative prediction error and more favorable participant responses with respect to accuracy, responsiveness, and timing, when compared to existing state-of-the-art methods.	翻訳日:2024-08-28 01:37:08 公開日:2024-08-24
# 反射結合によるSGLDの幾何学的エルゴード性 Geometric ergodicity of SGLD via reflection coupling ( http://arxiv.org/abs/2301.06769v2 ) ライセンス: Link先を確認	Lei Li, Jian-Guo Liu, Yuliang Wang,	(参考訳) 非凸設定下での確率勾配ランジュバン動力学(SGLD)アルゴリズムの幾何学的エルゴード性を考察する。反射結合の技法により、目標分布があるコンパクト集合の外側でのみ対数凹であるとき、SGLDのワッサーシュタイン収縮を証明する。 SGLDにおける時間離散化とミニバッチは、反射結合を適用する際にいくつかの困難をもたらすが、これらは条件付き期待値の一連の注意深い評価によって対処される。直接の系として、一定のステップサイズを持つSGLDは不変分布を持ち、その幾何学的エルゴード性を$W_1$距離で得ることができる。非勾配ドリフトへの一般化も含まれる。 We consider the geometric ergodicity of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm under nonconvexity settings. Via the technique of reflection coupling, we prove the Wasserstein contraction of SGLD when the target distribution is log-concave only outside some compact set. The time discretization and the minibatch in SGLD introduce several difficulties when applying the reflection coupling, which are addressed by a series of careful estimates of conditional expectations. As a direct corollary, the SGLD with constant step size has an invariant distribution and we are able to obtain its geometric ergodicity in terms of $W_1$ distance. The generalization to non-gradient drifts is also included.	翻訳日:2024-08-28 01:37:08 公開日:2024-08-24
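For readers unfamiliar with the iteration analysed in the abstract above, here is a minimal sketch of constant-step-size SGLD. It uses the standard update theta <- theta - step * g(theta) + sqrt(2 * step) * xi, where g is a minibatch estimate of the gradient of the potential and xi is standard Gaussian noise. The target here is a deliberately simple one-dimensional Gaussian posterior (an assumption chosen for checkability, log-concave everywhere, so it does not exercise the nonconvex setting the paper treats); the function name, data, and step size are hypothetical.

```python
import math
import random


def sgld_gaussian_mean(data, step=1e-3, batch_size=10, n_steps=5000, seed=0):
    """Constant-step SGLD for the potential U(theta) = sum_i (theta - x_i)^2 / 2.

    With a flat prior and unit-variance Gaussian likelihood, exp(-U) is
    proportional to a Gaussian centred at the sample mean with variance 1/n.
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Minibatch drawn with replacement gives an unbiased gradient estimate.
        batch = [rng.choice(data) for _ in range(batch_size)]
        grad = (n / batch_size) * sum(theta - x for x in batch)
        # Gradient step plus injected Gaussian noise of variance 2 * step.
        theta = theta - step * grad + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]
samples = sgld_gaussian_mean(data)
tail = samples[len(samples) // 2:]           # discard the first half as burn-in
posterior_mean_estimate = sum(tail) / len(tail)
data_mean = sum(data) / len(data)
print(posterior_mean_estimate, data_mean)
```

Because the step size is held constant, the chain settles into an invariant distribution that is close to, but not exactly, the target; the minibatch noise adds extra variance on top of the injected noise.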
# シーケンスレコメンデーションのためのインテリジェントモデル更新戦略 Intelligent Model Update Strategy for Sequential Recommendation ( http://arxiv.org/abs/2302.07335v2 ) ライセンス: Link先を確認	Zheqi Lv, Wenqiao Zhang, Zhengyu Chen, Shengyu Zhang, Kun Kuang,	(参考訳) 現代のオンラインプラットフォームでは、情報の過負荷に対処し、ユーザエンゲージメントを改善するためのレコメンデーションシステムがますます多くなっている。この研究分野には、クラウドとエッジの両方でネットワーク学習を推奨するパラダイムが進化している(すなわち、エッジクラウドのコラボレーション)。最近の研究は、エッジ固有のコンテキスト対応適応を可能にすることで、この分野をさらに推し進めている。しかしながら、クラウドとエッジ間の頻繁なデータ交換は、かなりのパラメータ更新が冗長である可能性があるため、非効率性と通信/計算リソースの浪費につながることが多い、と我々は主張する。そこで本研究では,IntellectReqと略されるIntelligent Edge-Cloudパラメータ要求モデルを提案する。 IntellectReqはエッジで動作するように設計されており、最小の計算と通信オーバーヘッドでパラメータ要求のコスト対効果を評価できる。我々はこれを,配布外データの検出を目的とした新しい学習タスクとして定式化し,微調整適応通信戦略を提案する。さらに,実時間ユーザ動作を正規分布に変換するための統計マッピング手法を用いて,モデルの不確かさの定量化と一般化能力の定量化にマルチサンプル出力を用いる。広範に評価された4つのベンチマークに対する厳密な実証的検証は、エッジクラウド協調型および動的レコメンデーションシステムの効率と一般化性において顕著に改善されていると判断し、我々のアプローチを評価する。 Modern online platforms are increasingly employing recommendation systems to address information overload and improve user engagement. There is an evolving paradigm in this research field that recommendation network learning occurs both on the cloud and on edges with knowledge transfer in between (i.e., edge-cloud collaboration). Recent works push this field further by enabling edge-specific context-aware adaptivity, where model parameters are updated in real-time based on incoming on-edge data. However, we argue that frequent data exchanges between the cloud and edges often lead to inefficiency and waste of communication/computation resources, as considerable parameter updates might be redundant. To investigate this problem, we introduce Intelligent Edge-Cloud Parameter Request Model, abbreviated as IntellectReq. IntellectReq is designed to operate on edge, evaluating the cost-benefit landscape of parameter requests with minimal computation and communication overhead. 
We formulate this as a novel learning task, aimed at the detection of out-of-distribution data, thereby fine-tuning adaptive communication strategies. Further, we employ statistical mapping techniques to convert real-time user behavior into a normal distribution, thereby employing multi-sample outputs to quantify the model's uncertainty and thus its generalization capabilities. Rigorous empirical validation on four widely-adopted benchmarks evaluates our approach, evidencing a marked improvement in the efficiency and generalizability of edge-cloud collaborative and dynamic recommendation systems.	翻訳日:2024-08-28 01:26:59 公開日:2024-08-24
# Leo: Lagrangeの基本的な最適化 Leo: Lagrange Elementary Optimization ( http://arxiv.org/abs/2304.05346v3 ) ライセンス: Link先を確認	Aso M. Aladdin, Tarik A. Rashid,	(参考訳) グローバル最適化問題は、進化的洗練の実践的で効率的な方法を用いて頻繁に解決される。しかし、元の問題がより複雑になると、その有効性と拡張性も向上する。そこで本研究では,ヒト血液のアルブミン投与量を用いたワクチン接種精度の顕著な向上から着想を得た,ラグランジュ基本最適化(Leo)を進化的手法として導入することを目的とする。彼らは、遺伝子交差後の適合関数値を用いてインテリジェントエージェントを開発する。これらの遺伝子は探索と搾取の両方において探索エージェントを誘導する。この論文ではLeoアルゴリズムの主目的と概念のインスピレーションとモチベーションについて述べる。その精度を示すために、提案アルゴリズムは、19の従来のベンチマーク関数やCECC06 2019テスト関数を含む、様々なテスト関数に対して検証される。 19の古典的ベンチマークテスト関数に対するLeoの結果は、DA、PSO、GAに対して別々に評価され、FDO、LPBなどの最近の2つのアルゴリズムも評価に含まれる。さらに、LeoはCECC06 2019の10の関数でDA、WOA、SSA、FDO、PB、FOXアルゴリズムをはっきりとテストしている。累積的な結果は、レオが人口を増やし、世界的最適な方向に進む能力を示している。異なる標準測定は、探検と搾取の両方の段階でレオの安定性を検証し、証明するために用いられる。さらに, 統計的解析は, 提案研究の結果を支持する。最後に、Leoの実用性を実証するために、現実世界における新しい応用を紹介した。 Global optimization problems are frequently solved using the practical and efficient method of evolutionary sophistication. But as the original problem becomes more complex, so does its efficacy and expandability. Thus, the purpose of this research is to introduce the Lagrange Elementary Optimization (Leo) as an evolutionary method, which is self-adaptive inspired by the remarkable accuracy of vaccinations using the albumin quotient of human blood. They develop intelligent agents using their fitness function value after gene crossing. These genes direct the search agents during both exploration and exploitation. The main objective of the Leo algorithm is presented in this paper along with the inspiration and motivation for the concept. To demonstrate its precision, the proposed algorithm is validated against a variety of test functions, including 19 traditional benchmark functions and the CECC06 2019 test functions. The results of Leo for 19 classic benchmark test functions are evaluated against DA, PSO, and GA separately, and then two other recent algorithms such as FDO and LPB are also included in the evaluation. 
In addition, Leo is tested on ten CECC06 2019 functions against the DA, WOA, SSA, FDO, LPB, and FOX algorithms. The cumulative outcomes demonstrate Leo's capacity to increase the starting population and move toward the global optimum. Different standard measurements are used to verify the stability of Leo in both the exploration and exploitation phases. Moreover, statistical analysis supports the findings of the proposed research. Finally, novel real-world applications are introduced to demonstrate the practicality of Leo.	翻訳日:2024-08-28 01:26:59 公開日:2024-08-24
# 時間共有計算資源に関する学習可能性 Learnability with Time-Sharing Computational Resource Concerns ( http://arxiv.org/abs/2305.02217v5 ) ライセンス: Link先を確認	Zhi-Hua Zhou,	(参考訳) 従来の理論的機械学習研究は、一般に、十分に、あるいは無限に供給された計算資源が存在することを明示的または暗黙的に仮定する。しかし、実際には、計算リソースは通常限られており、機械学習のパフォーマンスは、受信したデータの量だけでなく、利用可能な計算資源のもとで処理できるデータ量にも依存する。現在の「intelligent supercomputing」施設の多くは、学習性能要求や学習プロセス状態などの重要な要因を考慮した適応的なスケジューリング戦略を用いずに、一定の量のリソースを機械学習タスクに割り当てる排他的オペレーティングシステムのように機能する。本稿では,機械学習のスループットの概念を導入し,計算資源効率学習(CoRE-Learning)を定義し,学習理論における計算資源の影響を考慮した理論的枠組みを提案する。このフレームワークは、入ってくるデータストリームが圧倒的なサイズで潜在的に無限に続き、受信したすべてのデータを時間内に処理できると仮定するのが現実的でないようなストリーム学習に自然に適用することができる。これはまた、インテリジェントなスーパーコンピュータオペレーティングシステムの設計に対する理論的視点を提供するかもしれない。 Conventional theoretical machine learning studies generally assume explicitly or implicitly that there are enough or even infinitely supplied computational resources. In real practice, however, computational resources are usually limited, and the performance of machine learning depends not only on how much data has been received, but also on how much data can be handled subject to the computational resources available. Note that most current "intelligent supercomputing" facilities work like exclusive operating systems, where a fixed amount of resources is allocated to a machine learning task without adaptive scheduling strategies considering important factors such as the learning performance demands and learning process status. In this article, we introduce the notion of machine learning throughput, define Computational Resource Efficient Learning (CoRE-Learning), and present a theoretical framework that takes into account the influence of computational resources in learning theory. This framework can be naturally applied to stream learning where the incoming data streams can be potentially endless with overwhelming size and it is impractical to assume that all received data can be handled in time. It may also provide a theoretical perspective for the design of intelligent supercomputing operating systems.	翻訳日:2024-08-28 01:26:59 公開日:2024-08-24
# Virtual Quantum Device (VQD): 量子コンピュータの詳細なエミュレーションのためのツール The Virtual Quantum Device (VQD): A tool for detailed emulation of quantum computers ( http://arxiv.org/abs/2306.07342v3 ) ライセンス: Link先を確認	Cica Gustiani, Tyson Jones, Simon C. Benjamin,	(参考訳) 我々はQuEST量子エミュレータに基づくシステムであるVirtual Quantum Device (VQD) プラットフォームを提案する。 VQDを使用することで、エキスパートでないユーザは、特定の量子コンピュータに詳細なエラーモデル、分岐ゲートセット、接続性をエミュレートすることができる。プラットフォームには直感的なインターフェース、強力な視覚化、複雑な量子アルゴリズムやさまざまな量子コンピューティングハードウェアにおけるアイデアの効率的なテストと最適化のための高性能な計算との互換性がある。我々は、トラップされたイオン、窒素空孔中心、中性原子配列、シリコン量子ドットスピン、超伝導デバイスに対応するVQDの5つのファミリーを作成し、探索する。それぞれが、調整されたパラメータのセットを通じて、高度に設定可能である。ツールの有用性を実例で示すとともに,各デバイス固有の属性を強調表示する。多様な量子ハードウェアのユーザフレンドリなカプセル化された記述を提供することで、VQDプラットフォームは研究者に、現実的な環境でアルゴリズムやプロトコルを迅速に探索する機能を提供する。 We present the Virtual Quantum Device (VQD) platform, a system based on the QuEST quantum emulator. Through the use of VQDs, non-expert users can emulate specific quantum computers with detailed error models, bespoke gate sets and connectivities. The platform boasts an intuitive interface, powerful visualisation, and compatibility with high-performance computation for effective testing and optimisation of complex quantum algorithms or ideas across a range of quantum computing hardware. We create and explore five families of VQDs corresponding to trapped ions, nitrogen-vacancy-centres, neutral atom arrays, silicon quantum dot spins, and superconducting devices. Each is highly configurable through a set of tailored parameters. We showcase the key characteristics of each virtual device, providing practical examples of the tool's usefulness and highlighting each device's specific attributes. By offering user-friendly encapsulated descriptions of diverse quantum hardware, the VQD platform offers researchers the ability to rapidly explore algorithms and protocols in a realistic setting; meanwhile hardware experts can create their own VQDs to compare with their experiments.	翻訳日:2024-08-28 01:17:09 公開日:2024-08-24
# UrbanIR: ワンビデオによる大規模都市シーンの逆レンダリング UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video ( http://arxiv.org/abs/2306.09349v3 ) ライセンス: Link先を確認	Zhi-Hao Lin, Bohan Liu, Yi-Ting Chen, Kuan-Sheng Chen, David Forsyth, Jia-Bin Huang, Anand Bhattad, Shenlong Wang,	(参考訳) Urban Scene Inverse Rendering(Urban Scene Inverse Rendering)は,様々な照明条件下でのシーンのリアルかつ自由視点レンダリングを可能にする,新しい逆グラフィックモデルである。形状、アルベド、可視性、および太陽と空の照明は、車載カメラなど、NeRFの密集した視界設定とは異なる広いベースラインの映像から正確に推測される。この文脈では、標準的な手法は、不正確な屋根表現や多数の「フローター」のような、サブパー幾何学と材料推定をしばしば得る。 UrbanIRはこれらの問題に、逆グラフィック推論とレンダリングアーティファクトのエラーを減らす新たな損失で対処する。その技術により、元のシーンで正確に影の体積を推定できる。このモデルの出力は、制御可能な編集をサポートし、夜間シミュレーションのフォトリアリスティックな自由視点レンダリング、シーンへの依存、挿入オブジェクトを可能にし、既存の最先端手法よりも大幅に改善されている。 We present UrbanIR (Urban Scene Inverse Rendering), a new inverse graphics model that enables realistic, free-viewpoint renderings of scenes under various lighting conditions with a single video. It accurately infers shape, albedo, visibility, and sun and sky illumination from wide-baseline videos, such as those from car-mounted cameras, differing from NeRF's dense view settings. In this context, standard methods often yield subpar geometry and material estimates, such as inaccurate roof representations and numerous 'floaters'. UrbanIR addresses these issues with novel losses that reduce errors in inverse graphics inference and rendering artifacts. Its techniques allow for precise shadow volume estimation in the original scene. The model's outputs support controllable editing, enabling photorealistic free-viewpoint renderings of night simulations, relit scenes, and inserted objects, marking a significant improvement over existing state-of-the-art methods.	翻訳日:2024-08-28 01:17:09 公開日:2024-08-24
# ハイパーネットを用いた高速非教師付き深部外乱モデル選択 Fast Unsupervised Deep Outlier Model Selection with Hypernetworks ( http://arxiv.org/abs/2307.10529v3 ) ライセンス: Link先を確認	Xueying Ding, Yue Zhao, Leman Akoglu,	(参考訳) 外乱検出(OD)は、多くのテクニックの豊富な文献で多くの応用を見出す。ディープニューラルネットワークに基づくOD(DOD)は、ディープラーニングの多くの進歩のおかげで、近年注目を集めている。本稿では,教師なしDOD,すなわち実効性ハイパーパラメータ(HP)チューニング/モデル選択によるクリティカル・イット・アンサンディドな課題について考察する。いくつかの先行研究は、ODモデルのHPに対する感受性を報告しているが、HPの長いリストを示す現代のDODモデルにとって、非常に重要なものになっている。我々は,DODモデルのチューニングにHYPERを導入し,(1)監督のない検証(ラベル付き異常の欠如による)と(2)HP/モデル空間の効率的な探索(HP数の増加による)という2つの基本的な課題に対処する。鍵となるアイデアは、HPをメインのDODモデルの最適な重みにマッピングする新しいハイパーネットワーク(HN)を設計し、訓練することである。 HYPERは、多くのDODモデルの重みを動的に生成できる単一のHN(HPの異なるモデルに対応する)に乗じて、大幅なスピードアップを実現している。さらに,従来のODタスクのメタラーニングを利用して,提案したHNを効率的にトレーニングしたプロキシバリデーション関数をラベルでトレーニングする。 35のODタスクに対する大規模な実験により、HYPERは高い効率で8つのベースラインに対して高いパフォーマンスを達成している。 Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior work report the sensitivity of OD models to HPs, it becomes ever so critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. 
In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained with our proposed HN efficiently. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.	翻訳日:2024-08-28 01:07:17 公開日:2024-08-24
# AI生成テキストにおける編集検出のための情報理論アプローチ An Information-Theoretic Approach for Detecting Edits in AI-Generated Text ( http://arxiv.org/abs/2308.12747v2 ) ライセンス: Link先を確認	Idan Kashtan, Alon Kipnis,	(参考訳) 本稿では,ある記事が生成言語モデルで完全に書かれたのか,あるいは異なる著者による編集を含むのか,あるいは人間なのかを判断する手法を提案する。我々のプロセスは、個々の文章や他のテキストの起点に関する複数のテストと、まれな代替品に敏感な手法を用いてこれらのテストを組み合わせることを含みます。興味深いことに、この方法は編集を含むと思われるテキストの断片も識別する。本手法の有効性を実データを用いた広範囲な評価により示すとともに,その成功に影響を及ぼす要因の情報理論解析を行う。特に、テキスト編集の理論的枠組みの下で、文は言語モデルによって主に生成されるという最適性について論じる。我々の分析は、情報理論とデータ科学の共通点における興味深い研究課題をいくつか提起する。 We propose a method to determine whether a given article was written entirely by a generative language model or perhaps contains edits by a different author, possibly a human. Our process involves multiple tests for the origin of individual sentences or other pieces of text and combining these tests using a method that is sensitive to rare alternatives, i.e., non-null effects are few and scattered across the text in unknown locations. Interestingly, this method also identifies pieces of text suspected to contain edits. We demonstrate the effectiveness of the method in detecting edits through extensive evaluations using real data and provide an information-theoretic analysis of the factors affecting its success. In particular, we discuss optimality properties under a theoretical framework for text editing saying that sentences are generated mainly by the language model, except perhaps for a few sentences that might have originated via a different mechanism. Our analysis raises several interesting research questions at the intersection of information theory and data science.	翻訳日:2024-08-28 01:07:17 公開日:2024-08-24
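The abstract above describes combining many per-sentence tests with a method that is sensitive to rare, scattered non-null effects. One well-known statistic with that property is Higher Criticism; the abstract does not name the statistic it uses, so treating it as Higher Criticism here is an assumption made for illustration. The sketch below takes a list of per-sentence p-values (hypothetical inputs; how they are computed from a language model is outside this sketch) and returns the Higher Criticism score.

```python
import math


def higher_criticism(p_values, alpha=0.5):
    """Higher Criticism statistic over the smallest alpha-fraction of p-values.

    HC = max over i <= alpha * n of
         sqrt(n) * (i / n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the sorted p-values.
    """
    n = len(p_values)
    sorted_p = sorted(p_values)
    best = -math.inf
    for i, p in enumerate(sorted_p, start=1):
        if i > alpha * n:
            break
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate values to avoid division by zero
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Evenly spaced p-values mimic a text with no edits (all sentences "null").
null_p = [(i - 0.5) / 100 for i in range(1, 101)]
# Replace five of them with tiny p-values to mimic a few edited sentences.
edited_p = [1e-6] * 5 + null_p[5:]

hc_null = higher_criticism(null_p)
hc_edited = higher_criticism(edited_p)
print(hc_null, hc_edited)
```

A large score suggests that a small number of sentences deviate from the model; the sentences whose p-values fall at or below the maximising index are the natural candidates for the "suspected edits" the abstract mentions.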
# 古典的到着時間のモーダル変形 Moyal deformation of the classical arrival time ( http://arxiv.org/abs/2309.00222v3 ) ライセンス: Link先を確認	Dean Alvin L. Pablico, Eric A. Galapon,	(参考訳) 到着の量子時間(TOA)問題は、粒子の初期状態のみを仮定して測定された到着時間の統計を必要とする。量子論の標準的な枠組みに従って、この問題は古典的到着時刻 $\mathcal{T}_C(q,p)$ の適切な量子像を見つけることに変換される。本稿では、量子力学の位相空間定式化における問題を新たに考察する。得られた量子画像は実数値で時間反転対称関数 $\mathcal{T}_M(q,p)$ の形式的級数$\hbar^2$ であり、古典的到着時刻を主項とする。これはハミルトニアン系とのモヤルブラケット関係から直接得られ、したがって古典的TOAのモヤル変形として解釈される。その性質について検討し、$\mathcal{T}_M(q,p)$ と[Eur で構築されたヒルベルト空間 TOA 作用素の間の同型性を示すことによって、既知の障害物を量子化にバイパスする方法について議論する。 Phys J. Plus \textbf{138}, 153 (2023)] は任意の解析ポテンシャルに対して常に時間-エネルギーの正準交換関係(TECCR)を満たす。次に、自由粒子と準振動子ポテンシャルのTOA問題を例として考察する。 The quantum time of arrival (TOA) problem requires the statistics of measured arrival times given only the initial state of a particle. Following the standard framework of quantum theory, the problem translates into finding an appropriate quantum image of the classical arrival time $\mathcal{T}_C(q,p)$, usually in operator form $\hat{\mathrm{T}}$. In this paper, we consider the problem anew within the phase space formulation of quantum mechanics. The resulting quantum image is a real-valued and time-reversal symmetric function $\mathcal{T}_M(q,p)$ in formal series of $\hbar^2$ with the classical arrival time as the leading term. It is obtained directly from the Moyal bracket relation with the system Hamiltonian and is hence interpreted as a Moyal deformation of the classical TOA. We investigate its properties and discuss how it bypasses the known obstructions to quantization by showing the isomorphism between $\mathcal{T}_M(q,p)$ and the rigged Hilbert space TOA operator constructed in [Eur. Phys. J. Plus \textbf{138}, 153 (2023)] which always satisfy the time-energy canonical commutation relation (TECCR) for arbitrary analytic potentials. We then examine TOA problems for a free particle and a quartic oscillator potential as examples.	翻訳日:2024-08-28 01:07:17 公開日:2024-08-24
# Lyra: 自動定理証明における二重補正のオーケストレーション Lyra: Orchestrating Dual Correction in Automated Theorem Proving ( http://arxiv.org/abs/2309.15806v4 ) ライセンス: Link先を確認	Chuanyang Zheng, Haiming Wang, Enze Xie, Zhengying Liu, Jiankai Sun, Huajian Xin, Jianhao Shen, Zhenguo Li, Yu Li,	(参考訳) 大言語モデル (LLMs) は、公式な定理証明の分野における探索の興味深い道を示す。しかし、その可能性、特に幻覚の緩和と証明エラーメッセージによる改善については、まだ徹底的に研究されていない領域である。 LLMの有効性を高めるために,ツール補正(TC)とコンジェクチュア補正(CC)の2つの異なる補正機構を取り入れた新しいフレームワークであるLyraを導入する。形式的証明の後処理にツール補正を実装するために,事前に定義された証明ツール(例えばSledgehammer)を用いて,不正なツールの置き換えを誘導する。ツール補正は幻覚の緩和に大きく寄与し、それによって証明の全体的な精度が向上する。さらに,証明者と対話し,形式的証明予想を証明者エラーメッセージで洗練するエラーフィードバック機構であるConjecture Correctionを導入する。従来の改良フレームワークと比較して、提案されたConjecture Correctionは命令で生成を洗練させるが、ペア化された(生成、エラー、改善)プロンプトは収集しない。提案手法は, MiniF2F 検証 (48.0% -> 55.3%) とテスト (45.5% -> 51.2%) の両方で最先端 (SOTA) 性能を達成した。また、Lyraによって解決された3つのIMO問題を提示する。ツール補正(幻覚の緩和プロセス)とコンジェクチュア補正(環境との相互作用による副次的な調整)が今後の研究の道筋となると信じている。 Large Language Models (LLMs) present an intriguing avenue for exploration in the field of formal theorem proving. Nevertheless, their full potential, particularly concerning the mitigation of hallucinations and refinement through prover error messages, remains an area that has yet to be thoroughly investigated. To enhance the effectiveness of LLMs in the field, we introduce Lyra, a new framework that employs two distinct correction mechanisms: Tool Correction (TC) and Conjecture Correction (CC). To implement Tool Correction in the post-processing of formal proofs, we leverage prior knowledge to utilize predefined prover tools (e.g., Sledgehammer) for guiding the replacement of incorrect tools. Tool Correction significantly contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof. In addition, we introduce Conjecture Correction, an error feedback mechanism designed to interact with the prover to refine formal proof conjectures with prover error messages. 
Compared to the previous refinement framework, the proposed Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts. Our method has achieved state-of-the-art (SOTA) performance on both miniF2F validation (48.0% -> 55.3%) and test (45.5% -> 51.2%). We also present 3 IMO problems solved by Lyra. We believe Tool Correction (post-process for hallucination mitigation) and Conjecture Correction (subgoal adjustment from interaction with environment) could provide a promising avenue for future research in this field.	翻訳日:2024-08-28 01:07:17 公開日:2024-08-24
# Lemur: 自然言語の調和と言語エージェントのコード Lemur: Harmonizing Natural Language and Code for Language Agents ( http://arxiv.org/abs/2310.06830v2 ) ライセンス: Link先を確認	Yiheng Xu, Hongjin Su, Chen Xing, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu,	(参考訳) 自然言語とコーディング機能の両方に最適化されたオープンアクセス型言語モデルであるLemurとLemur-Chatを導入し、多言語エージェントのバックボーンとして機能する。言語チャットモデルから関数型言語エージェントへの進化は、モデルが人間のインタラクション、推論、計画をマスターするだけでなく、関連する環境の基盤を確保することを要求する。これにより、モデルにおける言語とコーディング機能の調和が求められます。 Lemur と Lemur-Chat はこの必要性に対処するために提案され、両方の領域でバランスの取れた熟練度を示す。コード集約コーパスを用いた厳密な事前学習とテキストとコードデータの微調整により,我々のモデルは,オープンソースモデル間での多種多様なテキストおよびコーディングベンチマークの最先端平均性能を達成する。総合的な実験は、ルムールが既存のオープンソースモデルよりも優れていること、そして人間のコミュニケーション、ツールの使用、完全に観察可能な環境下での相互作用を含む様々なエージェントタスクの能力を示している。自然言語とプログラミング言語の調和により、Lemur-Chatはエージェント能力に関するプロプライエタリなモデルとのギャップを著しく狭め、推論、計画、環境間のシームレスな操作に適した高度なオープンソースエージェントの開発に関する重要な洞察を提供する。 https://github.com/OpenLemur/Lemur We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents. The evolution from language chat models to functional language agents demands that models not only master human interaction, reasoning, and planning but also ensure grounding in the relevant environments. This calls for a harmonious blend of language and coding capabilities in the models. Lemur and Lemur-Chat are proposed to address this necessity, demonstrating balanced proficiencies in both domains, unlike existing open-source models that tend to specialize in either. Through meticulous pre-training using a code-intensive corpus and instruction fine-tuning on text and code data, our models achieve state-of-the-art averaged performance across diverse text and coding benchmarks among open-source models. 
Comprehensive experiments demonstrate Lemur's superiority over existing open-source models and its proficiency across various agent tasks involving human communication, tool usage, and interaction under fully and partially observable environments. The harmonization between natural and programming languages enables Lemur-Chat to significantly narrow the gap with proprietary models on agent abilities, providing key insights into developing advanced open-source agents adept at reasoning, planning, and operating seamlessly across environments. https://github.com/OpenLemur/Lemur	翻訳日:2024-08-28 00:57:20 公開日:2024-08-24
# 時系列分類のためのデータ拡張:広範囲にわたる実証研究と包括的調査 Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey ( http://arxiv.org/abs/2310.10060v5 ) ライセンス: Link先を確認	Zijun Gao, Haibao Liu, Lingbo Li,	(参考訳) データ拡張(DA)は、トレーニングデータセットを拡張し、モデルの堅牢性を高め、多様性を導入し、オーバーフィッティングを減らす能力のために、時系列分類(TSC)において重要なアプローチとなっている。しかし、TSCにおけるDAの現在の状況は、断片化された文献レビュー、曖昧な方法論の分類、不適切な評価基準、そしてアクセス可能でユーザ指向のツールの不足に悩まされている。本研究は, TSC領域内におけるDA手法の総合的な検討を通じて, これらの課題に対処するものである。我々の研究は10年間にわたる広範な文献レビューから始まり, 既存の調査における大きなギャップを明らかにし, 60以上のDA手法を識別するために,100以上の学術論文の詳細な分析を必要とする。この厳格なレビューにより、TSCにおけるDAの特定のニーズに合わせた新しい分類法が開発され、テクニックを変換ベース、パターンベース、生成ベース、分解ベース、自動データ拡張の5つの主要なカテゴリに分類した。この分類法は、研究者がより明確で適切な方法を選択するのを導くことを目的としている。基礎DA手法の包括的評価の欠如に対して,UCR時系列リポジトリ内の全型を表す15の多様なデータセットに対して,20近いDA戦略を検証し,徹底的な実証実験を行った。 ResNet と LSTM アーキテクチャを用いて,精度,メソッドランク,残留分析などの指標を含む多面的評価手法を用いて,ResNet では 84.98 +- 16.41%,LSTM では 82.41 +- 18.71% のベンチマーク精度を得た。例えば、RGWやランダム置換といった手法はモデル性能を大幅に改善する一方、EMDのような手法では効果が低かった。 Data Augmentation (DA) has become a critical approach in Time Series Classification (TSC), primarily for its capacity to expand training datasets, enhance model robustness, introduce diversity, and reduce overfitting. However, the current landscape of DA in TSC is plagued with fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible and user-oriented tools. This study addresses these challenges through a comprehensive examination of DA methodologies within the TSC domain. Our research began with an extensive literature review spanning a decade, revealing significant gaps in existing surveys and necessitating a detailed analysis of over 100 scholarly articles to identify more than 60 distinct DA techniques. 
This rigorous review led to the development of a novel taxonomy tailored to the specific needs of DA in TSC, categorizing techniques into five primary categories: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. This taxonomy is intended to guide researchers in selecting appropriate methods with greater clarity. In response to the lack of comprehensive evaluations of foundational DA techniques, we conducted a thorough empirical study, testing nearly 20 DA strategies across 15 diverse datasets representing all types within the UCR time-series repository. Using ResNet and LSTM architectures, we employed a multifaceted evaluation approach, including metrics such as Accuracy, Method Ranking, and Residual Analysis, resulting in a benchmark accuracy of 84.98 +- 16.41% in ResNet and 82.41 +- 18.71% in LSTM. Our investigation underscored the inconsistent efficacy of DA techniques: for instance, methods like RGWs and Random Permutation significantly improved model performance, whereas others, like EMD, were less effective.	翻訳日:2024-08-28 00:57:20 公開日:2024-08-24
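Random Permutation, one of the augmentations the abstract above reports as effective, can be sketched in a few lines. This is a minimal version under stated assumptions: the function name `random_permutation`, the `n_segments` parameter, and the choice of uniformly random cut points are illustrative and are not taken from the paper's code.

```python
import random


def random_permutation(series, n_segments=4, seed=None):
    """Augment a time series by cutting it into segments and shuffling them.

    Hypothetical sketch: cut points are drawn uniformly at random, and the
    values inside each segment keep their original order.
    """
    n = len(series)
    if n_segments < 1 or n_segments > n:
        raise ValueError("n_segments must be between 1 and len(series)")
    rng = random.Random(seed)
    # n_segments - 1 distinct interior cut points split the series into segments.
    cuts = sorted(rng.sample(range(1, n), n_segments - 1))
    bounds = [0] + cuts + [n]
    segments = [series[a:b] for a, b in zip(bounds, bounds[1:])]
    rng.shuffle(segments)
    return [value for segment in segments for value in segment]


original = list(range(20))
augmented = random_permutation(original, n_segments=4, seed=1)
```

The augmented series has the same length and the same multiset of values as the original; only the order of the segments changes, which is why the class label is usually assumed to be preserved.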
# 未知のトークンによる学習は、より強力な視力学習者を駆り立てる Learning with Unmasked Tokens Drives Stronger Vision Learners ( http://arxiv.org/abs/2310.13593v3 ) ライセンス: Link先を確認	Taekyung Kim, Sanghyuk Chun, Byeongho Heo, Dongyoon Han,	(参考訳) マスク付き画像モデリング(MIM)は,自己指導型学習戦略の先駆けとなる。 Masked Autoencoder (MAE) のようなMIMは、入力トークンをランダムにマスキングして処理し、デコーダが入力にマスクされたトークンを再構成することで、強力な表現を学ぶ。しかし、MIM事前訓練エンコーダは、マスク付きトークンのみを回帰することにのみ焦点をあてているため、限られた注意幅を持つことが多いため、エンコーダのより広範な文脈学習を阻害する可能性がある。この制限に対処するため、トレーニングプロセスに無意味なトークンを明示的に組み込むことによりMIMを改善する。具体的には,デコーダがマスク付きトークンを再構成している間に,アンマスク付きトークンが広いコンテキストを体験できるようにする。このように、符号化されたアンマスクトークンは、広範囲なコンテキスト情報を備えており、マスクされたトークンはMIMの強化されたアンマスクトークンを利用することができる。その結果,ImageNet-1K上でのVT-Bによる84.2%のトップ-1の精度と0.6%の利得を達成して,より差別的な表現を訓練した。この成功は、特異値スペクトルと注意分析によって証明されたように、事前学習の強化によるものである。最後に、下流のセマンティックセグメンテーションときめ細かい視覚的分類タスク、そして多様なロバストな評価指標において、我々のモデルは大きなパフォーマンス向上を達成する。コードはhttps://github.com/naver-ai/lutで入手できる。 Masked image modeling (MIM) has become a leading self-supervised learning strategy. MIMs such as Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens for the encoder to process, with the decoder reconstructing the masked tokens to the input. However, MIM pre-trained encoders often exhibit a limited attention span, attributed to MIM's sole focus on regressing masked tokens only, which may impede the encoder's broader context learning. To tackle the limitation, we improve MIM by explicitly incorporating unmasked tokens into the training process. Specifically, our method enables the encoder to learn from broader context supervision, allowing unmasked tokens to experience broader contexts while the decoder reconstructs masked tokens. Thus, the encoded unmasked tokens are equipped with extensive contextual information, empowering masked tokens to leverage the enhanced unmasked tokens for MIM. As a result, our simple remedy trains more discriminative representations revealed by achieving 84.2% top-1 accuracy with ViT-B on ImageNet-1K with 0.6%p gain. 
We attribute the success to the enhanced pre-training method, as evidenced by the singular value spectrum and attention analyses. Finally, our models achieve significant performance gains on downstream semantic segmentation and fine-grained visual classification tasks, as well as on diverse robust evaluation metrics. Code is available at https://github.com/naver-ai/lut	翻訳日:2024-08-28 00:57:20 公開日:2024-08-24
# 最小限に修正されたマルコフゲームは、あらゆるナッシュ均衡と価値を得る Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value ( http://arxiv.org/abs/2311.00582v5 ) ライセンス: Link先を確認	Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie,	(参考訳) 本稿では,ゲーム修正問題について検討する。このゲーム修正問題では,ゼロサムマルコフゲームの報酬関数を,目標決定的あるいは確率的ポリシープロファイルが独自のマルコフ完全ナッシュ均衡となり,目標範囲内に値を持つように変更コストを最小限に抑える方法として,ゼロサムマルコフゲームの報酬関数を変更する。ゲームの一意平衡としてインストール可能なポリシープロファイルの集合を特徴付け,インストールを成功させるために十分な,必要な条件を確立する。線形制約で凸最適化問題を解き、次にランダムな摂動を行い、ほぼ最適コストで修正計画を得る効率的なアルゴリズムを提案する。アルゴリズムのコードはhttps://github.com/YoungWu559/game-modification で利用可能です。 We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of a game and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm that solves a convex optimization problem with linear constraints and then performs random perturbation to obtain a modification plan with a near-optimal cost. The code for our algorithm is available at https://github.com/YoungWu559/game-modification .	翻訳日:2024-08-28 00:57:20 公開日:2024-08-24
# 知識集中型視覚質問応答におけるGPT-4Vの総合的評価 A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering ( http://arxiv.org/abs/2311.07536v3 ) ライセンス: Link先を確認	Yunxin Li, Longyue Wang, Baotian Hu, Xinyu Chen, Wanqi Zhong, Chenyang Lyu, Wei Wang, Min Zhang,	(参考訳) マルチモーダル大モデル(MLM)の出現は、視覚的理解の分野を著しく進歩させ、視覚的質問応答(VQA)の領域において顕著な能力を提供している。しかし、真の課題は知識集約型VQAタスクの領域にある。これは視覚要素の認識だけでなく、学習した知識の膨大なリポジトリとともに視覚情報の深い理解を必要とする。 MLM、特に新たに導入されたGPT-4VとGeminiの機能を明らかにするために、3つの視点から詳細な評価を行う。 1) 共通知識(Commonsense Knowledge)とは,モデルが視覚的手がかりをいかに理解し,一般知識に結び付けるかを評価すること。 2 細かな世界知識は、画像から特定の知識を推論し、様々な専門分野においてその習熟度を示すためのモデルの技能を検査する。 3) モデルが推論に論理的説明を与える能力を検証し, 解釈可能性の観点からより深い分析を容易にする。さらに、視覚的知識強化トレーニング戦略とマルチモーダル検索強化ジェネレーションアプローチを用いて、MDMの強化を行い、今後の研究方向性の進歩の必要性を浮き彫りにしている。大規模な実験は次のように示している。 a)GPT-4Vは、合成画像を少数ショットとして使用する際の説明生成の強化を示す。 b) GPT-4Vその他のMLMは,世界知識を扱う際に,深刻な幻覚を生じさせる。 c) 視覚的知識により訓練が強化され、技術が性能を向上させる可能性があること。コード:https://github.com/HITsz-TMG/Cognitive-Visual-Language-Mapper The emergence of multimodal large models (MLMs) has significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA). Yet, the true challenge lies in the domain of knowledge-intensive VQA tasks, which necessitate not just recognition of visual elements, but also a deep comprehension of the visual information in conjunction with a vast repository of learned knowledge. 
To uncover such capabilities of MLMs, particularly the newly introduced GPT-4V and Gemini, we provide an in-depth evaluation from three perspectives: 1) Commonsense Knowledge, which assesses how well models can understand visual cues and connect them to general knowledge; 2) Fine-grained World Knowledge, which tests the model's skill in reasoning out specific knowledge from images, showcasing its proficiency across various specialized fields; 3) Comprehensive Knowledge with Decision-making Rationales, which examines the model's capability to provide logical explanations for its inference, facilitating a deeper analysis from the interpretability perspective. Additionally, we utilize a visual knowledge-enhanced training strategy and multimodal retrieval-augmented generation approach to enhance MLMs, highlighting the future need for advancements in this research direction. Extensive experiments indicate that: a) GPT-4V demonstrates enhanced explanation generation when using composite images as few-shots; b) GPT-4V and other MLMs produce severe hallucinations when dealing with world knowledge; c) Visual knowledge-enhanced training and prompting techniques show potential to improve performance. Codes: https://github.com/HITsz-TMG/Cognitive-Visual-Language-Mapper	翻訳日:2024-08-28 00:57:20 公開日:2024-08-24
# minimax: JAX における Autocurricula の効率的なベースライン minimax: Efficient Baselines for Autocurricula in JAX ( http://arxiv.org/abs/2311.12716v3 ) ライセンス: Link先を確認	Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel,	(参考訳) 教師なし環境設計(英語: Unsupervised Environment Design, UED)は、堅牢な意思決定エージェントを訓練し、目に見えない環境にゼロショットで移行するための自動カリキュラム学習の形式である。このようなオートキュリキュラは、RLコミュニティから大きな関心を集めている。しかし、CPUロールアウトとGPUモデルの更新に基づくUED実験は、しばしば数週間のトレーニングを必要とした。この計算要求は、この分野の急速な革新の大きな障害である。この研究は、加速ハードウェア上でのUEDトレーニングのためのminimaxライブラリを導入している。 JAXを使って完全に拡張された環境とオートキュラムアルゴリズムを実装し、minimaxはハードウェアアクセラレーションのためにトレーニングループ全体をコンパイルできる。手続き的に生成された環境でオートキュリキュラを行うための再利用可能な抽象化に加えて、MiniGridに基づくテンソル化グリッドワールドを含む、迅速な実験用のペトリ皿を提供する。これらのコンポーネントによって minimax は強力な UED ベースラインを提供し、これには新たな並列化版が含まれており、同じバッチサイズでトレーニングした場合の以前の実装と比較して、壁時間で 120$\times$ のスピードアップを実現している。 minimaxライブラリはApache 2.0ライセンスで https://github.com/facebookresearch/minimax から入手できる。 Unsupervised environment design (UED) is a form of automatic curriculum learning for training robust decision-making agents to zero-shot transfer into unseen environments. Such autocurricula have received much interest from the RL community. However, UED experiments, based on CPU rollouts and GPU model updates, have often required several weeks of training. This compute requirement is a major obstacle to rapid innovation for the field. This work introduces the minimax library for UED training on accelerated hardware. Using JAX to implement fully-tensorized environments and autocurriculum algorithms, minimax allows the entire training loop to be compiled for hardware acceleration. To provide a petri dish for rapid experimentation, minimax includes a tensorized grid-world based on MiniGrid, in addition to reusable abstractions for conducting autocurricula in procedurally-generated environments. 
With these components, minimax provides strong UED baselines, including new parallelized variants, which achieve over 120$\times$ speedups in wall time compared to previous implementations when training with equal batch sizes. The minimax library is available under the Apache 2.0 license at https://github.com/facebookresearch/minimax.	翻訳日:2024-08-28 00:57:20 公開日:2024-08-24
# 分類から臨床への展望:大規模言語モデルを用いたモバイルおよび行動保健データの分析と分析に向けて From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models ( http://arxiv.org/abs/2311.13063v3 ) ライセンス: Link先を確認	Zachary Englhardt, Chengqian Ma, Margaret E. Morris, Xuhai "Orson" Xu, Chun-Cheng Chang, Lianhui Qin, Daniel McDuff, Xin Liu, Shwetak Patel, Vikram Iyer,	(参考訳) ユビキタスセンサーからの受動的に収集された行動健康データは、患者の日常生活からメンタルヘルスの専門家に洞察を提供するという大きな約束を持っているが、このデータを臨床実践に使用する分析ツールを開発するには、デバイス全体の一般化と、測定された信号と個人のメンタルヘルスの間の弱いあるいはあいまいな相関に関する課題に対処する必要がある。これらの課題に対処するために,我々は,大規模言語モデル(LLM)を活用して,多センサデータから臨床的に有用な知見を合成する,新しいアプローチを採っている。歩数や睡眠などのデータにおける傾向がうつ病や不安などの状態とどのように関係しているかを,LSMを用いて推論する思考促進手法の連鎖を構築した。まず,LLMによる2次うつ病分類を行い,61.1%のアキュラシーを達成した。分類よりも影響があり、価値の高いアプローチは、新たな人間とAIのコラボレーションアプローチであり、臨床の専門家がこれらのツールを対話的にクエリし、臨床意思決定をサポートするためにAIが生成した推論に関するドメインの専門知識とコンテキストを組み合わせる。 GPT-4のようなモデルでは数値データの75%を正確に参照しており、臨床参加者は、この手法を用いて自己追跡データを解釈することへの強い関心を表明している。 Passively collected behavioral health data from ubiquitous sensors holds significant promise to provide mental health professionals insights from patient's daily lives; however, developing analysis tools to use this data in clinical practice requires addressing challenges of generalization across devices and weak or ambiguous correlations between the measured signals and an individual's mental health. To address these challenges, we take a novel approach that leverages large language models (LLMs) to synthesize clinically useful insights from multi-sensor data. We develop chain of thought prompting methods that use LLMs to generate reasoning about how trends in data such as step count and sleep relate to conditions like depression and anxiety. We first demonstrate binary depression classification with LLMs achieving accuracies of 61.1% which exceed the state of the art. 
While it is not robust for clinical use, this leads us to our key finding: even more impactful and valued than classification is a new human-AI collaboration approach in which clinician experts interactively query these tools and combine their domain expertise and context about the patient with AI generated reasoning to support clinical decision-making. We find models like GPT-4 correctly reference numerical data 75% of the time, and clinician participants express strong interest in using this approach to interpret self-tracking data.	翻訳日:2024-08-28 00:57:20 公開日:2024-08-24
# Snapshot Spectral Compressive Imaging における遅延拡散前処理 Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging ( http://arxiv.org/abs/2311.14280v2 ) ライセンス: Link先を確認	Zongliang Wu, Ruiying Lu, Ying Fu, Xin Yuan,	(参考訳) スナップショット圧縮分光画像再構成は, 単発2次元圧縮画像から3次元空間スペクトル像を再構成することを目的としている。既存の最先端の手法は、主に深い展開構造に基づいているが、固有の性能ボトルネックがある:$i$) 過度に劣化した測定を扱う不適切な問題、そして$ii$) 回帰損失に基づく再構成モデルは、ほとんど詳細を持って画像を復元する傾向にある。本稿では,遅延拡散モデル(LDM)と呼ばれる生成モデルを導入し,回帰に基づく深部展開法を強化する前に劣化のないモデルを生成する。さらに, LDMにおける大規模計算コストの課題を克服するために, 深層展開デノイザにおける知識事前生成のための軽量モデルを提案し, それらの先行処理を統合し, 高品質なスペクトル信号の詳細を補償する再構成プロセスを導出する。合成データセットと実世界のデータセットの数値的および視覚的比較は、再構成品質と計算効率の両面で提案手法の優位性を示している。コードはリリースされる。 Snapshot compressive spectral imaging reconstruction aims to reconstruct three-dimensional spatial-spectral images from a single-shot two-dimensional compressed measurement. Existing state-of-the-art methods are mostly based on deep unfolding structures but have intrinsic performance bottlenecks: $i$) the ill-posed problem of dealing with heavily degraded measurement, and $ii$) the regression loss-based reconstruction models being prone to recover images with few details. In this paper, we introduce a generative model, namely the latent diffusion model (LDM), to generate degradation-free prior to enhance the regression-based deep unfolding method. Furthermore, to overcome the large computational cost challenge in LDM, we propose a lightweight model to generate knowledge priors in deep unfolding denoiser, and integrate these priors to guide the reconstruction process for compensating high-quality spectral signal details. Numeric and visual comparisons on synthetic and real-world datasets illustrate the superiority of our proposed method in both reconstruction quality and computational efficiency. Code will be released.	翻訳日:2024-08-28 00:57:20 公開日:2024-08-24
# SPECT画像のマルチモーダル融合によるコントラストグラフクロスビュー学習を用いたパーキンソン病の分類と臨床像 Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features ( http://arxiv.org/abs/2311.14902v4 ) ライセンス: Link先を確認	Jun-En Ding, Chien-Chin Hsu, Feng Liu,	(参考訳) パーキンソン病(PD)は世界中の何百万もの人に影響を与え、運動に影響を与えている。以前の研究では、ディープラーニングをPD予測に利用し、主に医療画像に焦点を当て、データの基盤となる多様体構造を無視した。本研究では,画像特徴と非画像特徴の両方を包含するマルチモーダルアプローチを提案し,PD分類にコントラッシブなクロスビューグラフ融合を利用する。画像と臨床特徴の低次元表現から得られたグラフビューからの埋め込みを統合した,新しいマルチモーダル・コアテンション・モジュールを提案する。これにより、より堅牢で構造化された特徴抽出が実現され、マルチビューデータ分析が改善される。さらに、クロスビュー融合学習を強化するために、簡易なコントラッシブ・ロスベース融合法が考案された。グラフビューによるマルチモーダル手法は, 精度0.91, 受信機動作特性曲線0.93の領域を5倍のクロスバリデーションで達成する。また、単に機械学習ベースの手法と比較して、非画像データに対して優れた予測能力を示す。 Parkinson's Disease (PD) affects millions globally, impacting movement. Prior research utilized deep learning for PD prediction, primarily focusing on medical images, neglecting the data's underlying manifold structure. This work proposes a multimodal approach encompassing both image and non-image features, leveraging contrastive cross-view graph fusion for PD classification. We introduce a novel multimodal co-attention module, integrating embeddings from separate graph views derived from low-dimensional representations of images and clinical features. This enables more robust and structured feature extraction for improved multi-view data analysis. Additionally, a simplified contrastive loss-based fusion method is devised to enhance cross-view fusion learning. Our graph-view multimodal approach achieves an accuracy of 0.91 and an area under the receiver operating characteristic curve (AUC) of 0.93 in five-fold cross-validation. It also demonstrates superior predictive capabilities on non-image data compared to solely machine learning-based methods.	翻訳日:2024-08-28 00:46:25 公開日:2024-08-24
# DIPR: 動的イテレーションによる効率的なポイントクラウド登録 DIPR: Efficient Point Cloud Registration via Dynamic Iteration ( http://arxiv.org/abs/2312.02877v2 ) ライセンス: Link先を確認	Yang Ai, Qiang Bai, Jindong Li, Xi Yang,	(参考訳) ポイントクラウド登録(PCR)は、3Dビジョンにおいて必須のタスクである。既存の手法はますます精度を高めている。しかし、ポイントクラウド登録における重複しないポイントの多数は、多くの計算資源を消費し、登録精度に悪影響を及ぼす。この課題を克服するために、我々は、ダイナミックイテレーションフレームワークであるDIPRを通じて、スペーサー入力ポイントに基づいたオーバーラップポイントに対話的にフォーカスする、新しい効率的なポイントクラウド登録を導入する。我々は,効率的なコースツーファイン処理を実現するために,グローバルおよびローカルな登録段階を設計する。基本整合モジュールの他に,Refined Nodesでは,高密度クラスタリングを用いて重なり合う点の範囲を狭め,計算量を大幅に削減する手法を提案する。そして、SC分類器は、一致した精度に応じて登録プロセスを終了する早期終了機構として機能する。複数のデータセットに対する大規模な実験により,提案手法は,最先端の手法に比べて計算時間とGPUメモリ消費を著しく削減しつつ,優れた登録精度を実現することが示された。 Point cloud registration (PCR) is an essential task in 3D vision. Existing methods achieve increasingly higher accuracy. However, a large proportion of non-overlapping points in point cloud registration consume a lot of computational resources while negatively affecting registration accuracy. To overcome this challenge, we introduce a novel Efficient Point Cloud Registration via Dynamic Iteration framework, DIPR, that makes the neural network interactively focus on overlapping points based on sparser input points. We design global and local registration stages to achieve efficient coarse-to-fine processing. Beyond basic matching modules, we propose the Refined Nodes to narrow down the scope of overlapping points by using adopted density-based clustering to significantly reduce the computation amount. And our SC Classifier serves as an early-exit mechanism to terminate the registration process in time according to matching accuracy. Extensive experiments on multiple datasets show that our proposed approach achieves superior registration accuracy while significantly reducing computational time and GPU memory consumption compared to state-of-the-art methods.	翻訳日:2024-08-28 00:46:25 公開日:2024-08-24
# 継続的な敵防衛 Continual Adversarial Defense ( http://arxiv.org/abs/2312.09481v3 ) ライセンス: Link先を確認	Qian Wang, Yaoyao Liu, Hefei Ling, Yingwei Li, Qihao Liu, Ping Li, Jiazhong Chen, Alan Yuille, Ning Yu,	(参考訳) 視覚的分類器に対する敵の攻撃は、月々急速に進化しているため、可能な限り多くの既知の攻撃に対して、多くの防衛策が提案されている。しかし、防衛システムが動作している環境は動的であり、時間とともに現れる様々なユニークな攻撃を含むため、あらゆる種類の攻撃に一般化する防衛手法を設計することは現実的ではない。動的環境に対するよく整合したアプローチは、敵データをオンラインで継続的に収集し、自らを迅速に改善する防衛システムにある。そこで,我々は,挑戦的脅威モデルに対する実践的な防衛展開を提唱し,(1)壊滅的忘れを伴わない新たな攻撃への継続的な適応,(2)少数ショット適応,(3)メモリ効率の適応,(4)クリーンデータと逆データの両方において高い精度で攻撃列に適応する継続的敵防衛(CAD)フレームワークを初めて提案した。最先端の継続的学習、少数ショット学習、およびアンサンブル学習技術を探求し、統合し、原則を立証する。大規模な実験により, 現代の敵攻撃の複数段階に対するアプローチの有効性が検証され, 多数のベースライン法に対して有意な改善が見られた。特にCADは、前回の攻撃に対して優れた性能を維持しつつ、最小限の予算と低コストの防衛失敗に迅速に適応することができる。我々の研究は、動的および進化的攻撃に対する継続的な防御適応のための、新しいパラダイムに光を当てています。 In response to the rapidly evolving nature of adversarial attacks against visual classifiers on a monthly basis, numerous defenses have been proposed to generalize against as many known attacks as possible. However, designing a defense method that generalizes to all types of attacks is not realistic because the environment in which defense systems operate is dynamic and comprises various unique attacks that emerge as time goes on. A well-matched approach to the dynamic environment lies in a defense system that continuously collects adversarial data online to quickly improve itself. Therefore, we put forward a practical defense deployment against a challenging threat model and propose, for the first time, the Continual Adversarial Defense (CAD) framework that adapts to attack sequences under four principles: (1) continual adaptation to new attacks without catastrophic forgetting, (2) few-shot adaptation, (3) memory-efficient adaptation, and (4) high accuracy on both clean and adversarial data. We explore and integrate cutting-edge continual learning, few-shot learning, and ensemble learning techniques to qualify the principles. 
Extensive experiments validate the effectiveness of our approach against multiple stages of modern adversarial attacks and demonstrate significant improvements over numerous baseline methods. In particular, CAD is capable of quickly adapting with minimal budget and a low cost of defense failure while maintaining good performance against previous attacks. Our research sheds light on a brand-new paradigm for continual defense adaptation against dynamic and evolving attacks.	翻訳日:2024-08-28 00:46:25 公開日:2024-08-24
# アルツハイマー病検出のための分散型プライバシ保存モデル A Distributed Privacy Preserving Model for the Detection of Alzheimer's Disease ( http://arxiv.org/abs/2312.10237v4 ) ライセンス: Link先を確認	Paul K. Mandal,	(参考訳) 急速に進歩する医療技術の時代には、医療データのセグメンテーションは避けられなくなり、分散データでトレーニングできるプライバシー保護機械学習アルゴリズムの開発が必要とされるようになった。特に、健康保険可搬性会計法(HIPAA)が課している厳格なプライバシー規制のために、機密性の高い医療データを統合することは、必ずしも選択肢ではない。本稿では,分散データからトレーニングできるHIPAA準拠のフレームワークについて紹介する。次に、認知症、重度の脳機能障害、特に予防的ケアを伴わない簡単な作業の妨げとなる重度の神経変性疾患であるアルツハイマー病(AD)検出のための多モード垂直連合モデルを提案する。この垂直連合学習(VFL)モデルは、HIPAAが課したプライバシー制約を尊重しながら、さまざまな医療データのソースをまたいだ協調学習を可能にする分散アーキテクチャを提供する。ここで提案されたVFLアーキテクチャは、法的なプライバシー制約を尊重しながら、さまざまな医療データのソースをまたいだ協調学習を可能にする、新しい分散アーキテクチャを提供する。複数のデータモダリティを活用することにより、AD検出の堅牢性と精度を向上させることができる。このモデルは、フェデレーション学習技術の進歩に寄与するだけでなく、医学研究におけるデータセグメンテーションによるハードルを克服する公約も持つ。 In the era of rapidly advancing medical technologies, the segmentation of medical data has become inevitable, necessitating the development of privacy preserving machine learning algorithms that can train on distributed data. Consolidating sensitive medical data is not always an option particularly due to the stringent privacy regulations imposed by the Health Insurance Portability and Accountability Act (HIPAA). In this paper, I introduce a HIPAA compliant framework that can train from distributed data. I then propose a multimodal vertical federated model for Alzheimer's Disease (AD) detection, a serious neurodegenerative condition that can cause dementia, severely impairing brain function and hindering simple tasks, especially without preventative care. This vertical federated learning (VFL) model offers a distributed architecture that enables collaborative learning across diverse sources of medical data while respecting privacy constraints imposed by HIPAA. The VFL architecture proposed herein offers a novel distributed architecture, enabling collaborative learning across diverse sources of medical data while respecting statutory privacy constraints. 
By leveraging multiple modalities of data, the robustness and accuracy of AD detection can be enhanced. This model not only contributes to the advancement of federated learning techniques but also holds promise for overcoming the hurdles posed by data segmentation in medical research.	翻訳日:2024-08-28 00:46:25 公開日:2024-08-24
# ロバスト目標音声抽出のための自己教師付き遠交表現学習 Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction ( http://arxiv.org/abs/2312.10305v3 ) ライセンス: Link先を確認	Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang,	(参考訳) 音声信号は、大域的な音響特性と局所的な意味情報の両方を含むため、本質的に複雑である。しかし、ターゲット音声抽出のタスクでは、話者識別とは無関係な参照音声における大域的・局所的な意味情報の特定の要素は、音声抽出ネットワーク内で話者の混乱を引き起こす可能性がある。この課題を克服するために,自己教師付き不整合表現学習法を提案する。提案手法は、参照音声符号化ネットワークとグローバル情報アンタングルネットワークを用いて、2段階のプロセスでこの問題に対処し、話者識別情報を他の無関係な要因から徐々に切り離す。我々は、音声抽出ネットワークを誘導するために、非絡み合った話者識別情報のみを用いる。さらに、適応変調変換器を導入し、混合信号の音響的表現が話者埋め込みによって乱れないようにする。このコンポーネントは、話者埋め込みを条件情報として含み、音声抽出ネットワークの自然かつ効率的なガイダンスを容易にする。実験により,本手法の有効性を実証し,話者混同の可能性を大幅に低減した。 Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the reference speech, which are irrelevant to speaker identity, can lead to speaker confusion within the speech extraction network. To overcome this challenge, we propose a self-supervised disentangled representation learning method. Our approach tackles this issue through a two-phase process, utilizing a reference speech encoding network and a global information disentanglement network to gradually disentangle the speaker identity information from other irrelevant factors. We exclusively employ the disentangled speaker identity information to guide the speech extraction network. Moreover, we introduce the adaptive modulation Transformer to ensure that the acoustic representation of the mixed signal remains undisturbed by the speaker embeddings. This component incorporates speaker embeddings as conditional information, facilitating natural and efficient guidance for the speech extraction network. Experimental results substantiate the effectiveness of our meticulously crafted approach, showcasing a substantial reduction in the likelihood of speaker confusion.	
翻訳日:2024-08-28 00:46:25 公開日:2024-08-24
# 一般化されたパウリ安定化符号の2次元における位相順序の抽出 Extracting topological orders of generalized Pauli stabilizer codes in two dimensions ( http://arxiv.org/abs/2312.11170v3 ) ライセンス: Link先を確認	Zijian Liang, Yijia Xu, Joseph T. Iosue, Yu-An Chen,	(参考訳) 本稿では,2次元システムにおける一般化されたパウリ安定化符号からトポロジカルデータを抽出するアルゴリズムを提案する。このアルゴリズムは$\mathbb{Z}_d$ quditsに適用される。この能力により、$\mathbb{Z}_d$ トーリック符号とは異なる位相順序を識別できる。これは、$\mathbb{Z}_p$ qudits ($p$は素数)のパウリ安定化符号が$\mathbb{Z}_p$ トーリック符号と自明な安定化符号の有限複写に等しいという確立された定理を超えて、我々の理解を拡張している。このアルゴリズムは、全てのエノンとその弦演算子を決定し、融合規則、トポロジカルスピン、ブレイディング統計の計算を可能にするように設計されている。この方法は、位相的順序の同定をガウス的除去、エルミート正規形式、スミス正規形式のトランケートされたローラン多項式を含む計算問題に変換する。さらに、このアルゴリズムは量子誤り訂正符号を研究するための体系的なアプローチを提供する。例えば、2dハニカムカラーコードから修正された自己双対CSS量子コードや、ダブルセミオントポロジオーダーや6セミオントポロジオーダーを含む非CSS量子コードなどです。 In this paper, we introduce an algorithm for extracting topological data from translation invariant generalized Pauli stabilizer codes in two-dimensional systems, focusing on the analysis of anyon excitations and string operators. The algorithm applies to $\mathbb{Z}_d$ qudits, including instances where $d$ is a nonprime number. This capability allows the identification of topological orders that differ from the $\mathbb{Z}_d$ toric codes. It extends our understanding beyond the established theorem that Pauli stabilizer codes for $\mathbb{Z}_p$ qudits (with $p$ being a prime) are equivalent to finite copies of $\mathbb{Z}_p$ toric codes and trivial stabilizers. The algorithm is designed to determine all anyons and their string operators, enabling the computation of their fusion rules, topological spins, and braiding statistics. The method converts the identification of topological orders into computational tasks, including Gaussian elimination, the Hermite normal form, and the Smith normal form of truncated Laurent polynomials. Furthermore, the algorithm provides a systematic approach for studying quantum error-correcting codes. 
We apply it to various codes, such as self-dual CSS quantum codes modified from the 2d honeycomb color code and non-CSS quantum codes that contain the double semion topological order or the six-semion topological order.	翻訳日:2024-08-28 00:46:25 公開日:2024-08-24
# 虚偽の否定とクラス不均衡に対する時系列コントラスト学習 Time-Series Contrastive Learning against False Negatives and Class Imbalance ( http://arxiv.org/abs/2312.11939v2 ) ライセンス: Link先を確認	Xiyuan Jin, Jing Wang, Lei Liu, Youfang Lin,	(参考訳) 表現学習における模範的な自己指導的アプローチとして、時系列コントラスト学習は現代研究において顕著な進歩を見せている。近年のコントラスト学習戦略は,適切な正と負の作法に重点を置いているが,本研究では理論的分析を行い,偽の負とクラス不均衡という,InfoNCEの損失に基づくフレームワークに固有の根本的な問題を見落としている。そこで本研究では,SimCLRフレームワークに基盤を置き,インスタンス識別タスクに係わるモデルに普遍的に適応する直感的な修正を導入する。インスタンス間の対話的学習を容易にするためにインスタンスグラフを構築することにより、複数インスタンス識別タスクを通じて教師付きコントラスト学習をエミュレートし、偽陰性の有害な影響を緩和する。さらに、グラフ構造と少ないラベル付きデータを活用し、半教師付き整合性分類を行い、マイノリティクラスの代表的能力を高める。提案手法を,4つの実世界の時系列データセット上で最も一般的な時系列比較学習法と比較し,全体的な性能において有意な優位性を示した。 As an exemplary self-supervised approach for representation learning, time-series contrastive learning has exhibited remarkable advancements in contemporary research. While recent contrastive learning strategies have focused on how to construct appropriate positives and negatives, in this study, we conduct theoretical analysis and find they have overlooked the fundamental issues: false negatives and class imbalance inherent in the InfoNCE loss-based framework. Therefore, we introduce a straightforward modification grounded in the SimCLR framework, universally adaptable to models engaged in the instance discrimination task. By constructing instance graphs to facilitate interactive learning among instances, we emulate supervised contrastive learning via the multiple-instances discrimination task, mitigating the harmful impact of false negatives. Moreover, leveraging the graph structure and few-labeled data, we perform semi-supervised consistency classification and enhance the representative ability of minority classes. We compared our method with the most popular time-series contrastive learning methods on four real-world time-series datasets and demonstrated our significant advantages in overall performance.	翻訳日:2024-08-28 00:36:11 公開日:2024-08-24
# AI研究に対するビッグテックの影響再考:アイデアの所属への帰属に関するミーム分析 Big Tech influence over AI research revisited: memetic analysis of attribution of ideas to affiliation ( http://arxiv.org/abs/2312.12881v2 ) ライセンス: Link先を確認	Stanisław Giziński, Paulina Kaczyńska, Hubert Ruczyński, Emilia Wiśnios, Bartosz Pieliński, Przemysław Biecek, Julian Sienkiewicz,	(参考訳) 人工知能(AI)研究におけるビッグテックの支配に関する議論が増えているが、この現象に対する理解はいまだ表面的である。本稿は、AI研究におけるビッグテックの影響範囲と力についての理解を広げ、深めることを目的としている。単なる出版量ではなく、新しいアイデアやミームの伝播における支配性に着目する。既存の研究は、影響という概念を学術論文における所属の割合に単純化することが多く、そのデータは一般にarXivや特定の学術会議のような限られたデータベースから得られている。本稿の主な目的は、そうした影響の具体的なニュアンスを解明し、どのAIのアイデアが主にビッグテックによって推進されているかを明らかにすることである。 AI関連論文のアブストラクトとその引用ネットワークにネットワーク分析とミーム分析を適用することで、この現象についてより深い知見を得ることができる。 OpenAlexとS2ORCの2つのデータベースを利用することで、従来の試みよりもはるかに大きな規模でそのような分析を行うことができる。その結果、ビッグテック所属の論文は一部の分野で不釣り合いに多く引用されているものの、最も引用されている論文はビッグテックと学術機関の両方に所属する論文であることが示唆された。最も伝染性の高いミームに着目すると、特定の所属グループ(ビッグテック、学術機関、混合所属)への帰属は、これら3つのグループにほぼ均等に分布しているように見える。これは、AI研究に対するビッグテックの支配という概念が、議論の中で過度に単純化されていることを示唆している。 There exists a growing discourse around the domination of Big Tech on the landscape of artificial intelligence (AI) research, yet our comprehension of this phenomenon remains cursory. This paper aims to broaden and deepen our understanding of Big Tech's reach and power within AI research. It highlights the dominance not merely in terms of sheer publication volume but rather in the propagation of new ideas or memes. Current studies often oversimplify the concept of influence to the share of affiliations in academic papers, typically sourced from limited databases such as arXiv or specific academic conferences. The main goal of this paper is to unravel the specific nuances of such influence, determining which AI ideas are predominantly driven by Big Tech entities. By employing network and memetic analysis on AI-oriented paper abstracts and their citation network, we are able to gain deeper insight into this phenomenon. 
By utilizing two databases: OpenAlex and S2ORC, we are able to perform such analysis on a much bigger scale than previous attempts. Our findings suggest that while Big Tech-affiliated papers are disproportionately more cited in some areas, the most cited papers are those affiliated with both Big Tech and Academia. Focusing on the most contagious memes, their attribution to specific affiliation groups (Big Tech, Academia, mixed affiliation) seems equally distributed between those three groups. This suggests that the notion of Big Tech domination over AI research is oversimplified in the discourse.	翻訳日:2024-08-28 00:36:11 公開日:2024-08-24
# コンテキストの復活:マルチモーダル知識グラフのリンク予測としてのカメラトラップ種別分類 Reviving the Context: Camera Trap Species Classification as Link Prediction on Multimodal Knowledge Graphs ( http://arxiv.org/abs/2401.00608v5 ) ライセンス: Link先を確認	Vardaan Pahuja, Weidi Luo, Yu Gu, Cheng-Hao Tu, Hong-You Chen, Tanya Berger-Wolf, Charles Stewart, Song Gao, Wei-Lun Chao, Yu Su,	(参考訳) カメラトラップは生物多様性の監視と保全のための動物生態学における重要なツールである。しかし、それらの実践的応用は、新しい場所や目に見えない場所への一般化の欠如のような問題によって制限されている。画像は典型的には様々な形態の文脈と結びついており、様々な形態が存在する可能性がある。本研究では,カメラトラップ画像に関連付けられた構造化コンテキストを利用して,カメラトラップ内の種分類タスクの分布外一般化を促進する。例えば、野生動物の写真は、捕獲された時間と場所の詳細と、動物種に関する構造化された生物学的知識に関連付けられる。既存の研究ではしばしば見過ごされるが、そのようなコンテキストを組み込むことは、データの不足への対処や一般化の強化など、画像理解の改善にいくつかの潜在的な利点をもたらす。しかし、このような異種コンテキストを視覚領域に効果的に組み込むことは難しい問題である。そこで本研究では,種分類をリンク予測として,マルチモーダル知識グラフ(KG)に変換する新しいフレームワークを提案する。このフレームワークは、視覚認識のための多様なマルチモーダルコンテキストのシームレスな統合を可能にする。本フレームワークをiWildCam2020-WILDSおよびSnapshot Mountain Zebraデータセットの分布外種分類に適用し,最先端のアプローチによる競合性能を実現する。さらに,本フレームワークは,外来種を認識するためのサンプル効率を向上させる。 Camera traps are important tools in animal ecology for biodiversity monitoring and conservation. However, their practical application is limited by issues such as poor generalization to new and unseen locations. Images are typically associated with diverse forms of context, which may exist in different modalities. In this work, we exploit the structured context linked to camera trap images to boost out-of-distribution generalization for species classification tasks in camera traps. For instance, a picture of a wild animal could be linked to details about the time and place it was captured, as well as structured biological knowledge about the animal species. While often overlooked by existing studies, incorporating such context offers several potential benefits for better image understanding, such as addressing data scarcity and enhancing generalization. However, effectively incorporating such heterogeneous context into the visual domain is a challenging problem. 
To address this, we propose a novel framework that reformulates species classification as link prediction in a multimodal knowledge graph (KG). This framework enables the seamless integration of diverse multimodal contexts for visual recognition. We apply this framework to out-of-distribution species classification on the iWildCam2020-WILDS and Snapshot Mountain Zebra datasets and achieve performance competitive with state-of-the-art approaches. Furthermore, our framework enhances sample efficiency for recognizing under-represented species.	翻訳日:2024-08-28 00:36:11 公開日:2024-08-24
# クロム二量体Cr$_2$の「パズル」に向けて:ボルン・オッペンハイマー振動回転スペクトルの予測 Towards the "puzzle" of Chromium dimer Cr$_2$: predicting the Born-Oppenheimer rovibrational spectrum ( http://arxiv.org/abs/2401.03259v3 ) ライセンス: Link先を確認	Horacio Olivares-Pilón, Daniel Aguilar-Díaz, Alexander V. Turbiner,	(参考訳) Cr$_2$二量体の実験的に観測された非自明な電子構造のため、そのポテンシャルエネルギー曲線の計算は過去数十年にわたり理論上の難題となってきた。小さな核間距離 $R$ での摂動論と大きな距離 $R$ での多極展開(いずれも漸近的な性質を持つと考えられる)を接続し、Casey-Leopold (1993) が実験データから抽出した数個の Rydberg-Klein-Rees (RKR) 転回点を加えることにより、Cr$_2$ 二量体の基底状態 $X^1\Sigma^+$ に対するポテンシャルエネルギー曲線の解析形が、核間距離 $R$ の全範囲について初めて得られた。これは2点Pad\'e近似の形をとり、29個の実験振動エネルギーに対して3-4桁の精度を与える。得られた基底状態 $X^1\Sigma^+$ のポテンシャル曲線は、エネルギーが $> 10^{-4}$ hartree の振動回転状態を 19694 個支え、角運動量ゼロでの最大振動量子数は $\nu_\text{max}=104$、最大角運動量は $L_\text{max}=312$ である。さらに、エネルギーが $< 10^{-4}$ hartree の弱束縛状態(解離極限に近い)を 218 個支える。 The experimentally-observed non-trivial electronic structure of the Cr$_2$ dimer has made the calculation of its potential energy curve a theoretical challenge over the last decades. By matching the perturbation theory at small internuclear distances $R$ and the multipole expansion at large distances $R$ (supposedly both of asymptotic nature), and by adding a few Rydberg-Klein-Rees (RKR) turning points, extracted from experimental data by Casey-Leopold (1993), the analytic form of the potential energy curve for the ground state $X^1\Sigma^+$ of the Cr$_2$ dimer is found for the first time for the whole range of internuclear distances $R$. This has the form of a two-point Pad\'e approximant and provides an accuracy of 3-4 decimal digits in 29 experimental vibrational energies. The resulting ground state $X^1\Sigma^+$ potential curve supports 19694 rovibrational states with a maximal vibrational number $\nu_\text{max}=104$ at zero angular momentum and with a maximal angular momentum $L_\text{max}=312$ with energies $> 10^{-4}$ hartree, and additionally 218 weakly-bound states (close to the dissociation limit) with energies $< 10^{-4}$ hartree.	翻訳日:2024-08-28 00:36:11 公開日:2024-08-24
# 幾何学的滑らかな運動量を持つランダム化カッツマルツ Randomized Kaczmarz with geometrically smoothed momentum ( http://arxiv.org/abs/2401.09415v3 ) ライセンス: Link先を確認	Seth J. Alderman, Roan W. Luikart, Nicholas F. Marshall,	(参考訳) 本稿では, 線形最小二乗損失関数上の確率勾配勾配の例であるランダム化Kaczmarzアルゴリズムに幾何的に滑らかな運動量を加える効果について検討する。最小二乗損失を定義する行列の特異ベクトル方向の予測誤差に関する結果を証明する。結果の有用性を示す数値的な例をいくつか提示し,いくつかの疑問を呈する。 This paper studies the effect of adding geometrically smoothed momentum to the randomized Kaczmarz algorithm, which is an instance of stochastic gradient descent on a linear least squares loss function. We prove a result about the expected error in the direction of singular vectors of the matrix defining the least squares loss. We present several numerical examples illustrating the utility of our result and pose several questions.	翻訳日:2024-08-28 00:36:11 公開日:2024-08-24
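The abstract above describes randomized Kaczmarz as stochastic gradient descent on a linear least squares loss with an added, geometrically smoothed momentum term. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the momentum is taken to be a geometrically weighted average of past Kaczmarz steps, and the names `kaczmarz_momentum`, `beta`, and `step` are hypothetical.

```python
import numpy as np

def kaczmarz_momentum(A, b, beta=0.3, step=1.0, iters=5000, seed=0):
    """Randomized Kaczmarz with a geometrically averaged momentum term.

    Assumed update (illustrative, not taken from the paper):
        g_k     = (b_i - <a_i, x_k>) / ||a_i||^2 * a_i   # plain Kaczmarz step
        y_{k+1} = beta * y_k + (1 - beta) * g_k          # geometric smoothing
        x_{k+1} = x_k + step * y_{k+1}
    Setting beta = 0 and step = 1 recovers plain randomized Kaczmarz.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(A * A, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()  # sample rows by squared norm
    x = np.zeros(n)
    y = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        g = (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        y = beta * y + (1.0 - beta) * g
        x = x + step * y
    return x

# Usage on a small consistent system A x = b.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
x_hat = kaczmarz_momentum(A, A @ x_true, iters=3000)
print(np.linalg.norm(x_hat - x_true))
```

Rows are sampled with probability proportional to their squared norm, which is the usual choice for randomized Kaczmarz; the paper's exact parametrization of the momentum may differ.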
# SpeechDPR: オープンドメイン音声質問応答のためのエンドツーエンド音声パッセージ検索 SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering ( http://arxiv.org/abs/2401.13463v3 ) ライセンス: Link先を確認	Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee,	(参考訳) 音声質問応答(SQA: Spoken Question Answering)は、与えられた音声パッセージの中から回答区間を見つけることで、機械がユーザの質問に応答するために不可欠である。 SQAは、認識誤りや語彙外(OOV)の問題を避けるために、これまでASRなしで実現されてきた。しかし、オープンドメインSQA(openSQA)という現実的な問題、すなわち回答を含む可能性のあるパッセージを機械がまず音声アーカイブから検索する必要がある問題は、これまで検討されてこなかった。本稿では、openSQA問題の検索コンポーネントとして、知られている限り初のエンドツーエンドフレームワークであるSpeech Dense Passage Retriever(SpeechDPR)を提案する。 SpeechDPRは、教師なしASR(UASR)とテキスト密検索器(TDR)のカスケードモデルから知識を蒸留することにより、文レベルの意味表現を学習する。手動で書き起こした音声データは不要である。初期の実験では、UASRとTDRのカスケードモデルに匹敵する性能を示し、UASRの性能が低い場合には大幅に優れており、この手法が音声認識誤りに対してより頑健であることが確認された。 Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage. SQA has been previously achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of Open-domain SQA (openSQA), in which the machine must additionally first retrieve passages that possibly contain the answer from a spoken archive, was never considered. This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of the openSQA problem. SpeechDPR learns a sentence-level semantic representation by distilling knowledge from the cascading model of unsupervised ASR (UASR) and text dense retriever (TDR). No manually transcribed speech data is needed. Initial experiments showed performance comparable to the cascading model of UASR and TDR, and significantly better when UASR was poor, verifying that this approach is more robust to speech recognition errors.	翻訳日:2024-08-28 00:26:06 公開日:2024-08-24
# 古典的ハードハミルトニアンの基底状態解く多項式時間散逸に基づく量子アルゴリズム A polynomial-time dissipation-based quantum algorithm for solving the ground states of a class of classically hard Hamiltonians ( http://arxiv.org/abs/2401.13946v7 ) ライセンス: Link先を確認	Zhong-Xia Shang, Zi-Han Chen, Chao-Yang Lu, Jian-Wei Pan, Ming-Cheng Chen,	(参考訳) 本研究では、ハミルトン群の基底状態を解決するための量子アルゴリズムを提案する。我々のアルゴリズムに現れた指数的スピードアップのメカニズムは、オープン量子系における散逸に由来する。この散逸を利用するために、中心的なアイデアはベクトル化と正規化により$n$-qubit 密度行列 $\rho$ を 2n$-qubit 純状態 $\|\rho\rangle$ として扱うことである。そうすることによって、リンドブラッドマスター方程式(LME)は、非エルミート的ハミルトニアン$L$を持つシュリンガー方程式となる。したがって、 LME の定常状態 $\rho_{ss}$ は、基底状態 $\|\rho_{ss}\rangle$ と $L^\dag L$ の形で対応する。 LMEのランタイムは、初期状態と基底状態の重複を$\zeta$に依存しない。入力部分に対して、ハミルトニアン$H$が妥当な仮定の下で与えられたとき、多項式時間的古典的手続きを与え、$L$が存在して$H-E_0=L^\dag L$であるかどうかを判断し、解決する。出力部分について、ミッションは基底状態 $\|\rho_{ss}\rangle$ に対する任意の作用素の期待値を推定するものと定義する。我々は、実際に$\|\rho_{ss}\rangle$を作成することの量子硬さに関するいくつかの証拠を与え、これは、我々のアルゴリズムと量子位相推定のような射影に基づく量子アルゴリズムの間の潜在的な複雑さの分離を示す。さらに、我々のアルゴリズムで効率的に解けるハミルトニアンは、$\text{P}\neq \text{BQP}$を仮定する古典的なハードなインスタンスを含むことを示す。その後、他の種類のハミルトニアンへの一般化や、アルゴリズムの「非線形」力学など、アルゴリズムの重要な側面について論じ、分析する。 In this work, we give a quantum algorithm for solving the ground states of a class of Hamiltonians. The mechanism of the exponential speedup that appeared in our algorithm comes from dissipation in open quantum systems. To utilize the dissipation, the central idea is to treat $n$-qubit density matrices $\rho$ as $2n$-qubit pure states $\|\rho\rangle$ by vectorization and normalization. By doing so, the Lindblad master equation (LME) becomes a Schr\"odinger equation with non-Hermitian Hamiltonian $L$. The steady-state $\rho_{ss}$ of the LME, therefore, corresponds to the ground states $\|\rho_{ss}\rangle$ of Hamiltonians with the form $L^\dag L$. The runtime of the LME has no dependence on $\zeta$ the overlap between the initial state and the ground state compared with the Heisenberg scaling $\mathcal{O}(\zeta^{-1})$ in other algorithms. 
For the input part, given a Hamiltonian $H$, under plausible assumptions, we give a polynomial-time classical procedure to decide whether there exists $L$ such that $H-E_0=L^\dag L$ and, if so, to find it. For the output part, we define the task as estimating expectation values of arbitrary operators with respect to the ground state $\|\rho_{ss}\rangle$, which, surprisingly, can be done by an efficient measurement protocol on $\rho_{ss}$ with no need to prepare $\|\rho_{ss}\rangle$. We give several pieces of evidence for the quantum hardness of actually preparing $\|\rho_{ss}\rangle$, which indicates a potential complexity separation between our algorithm and projection-based quantum algorithms such as quantum phase estimation. Further, we show that the Hamiltonians that can be efficiently solved by our algorithm contain classically hard instances assuming $\text{P}\neq \text{BQP}$. Later, we discuss and analyze several important aspects of the algorithm, including generalizations to other types of Hamiltonians and the "non-linear" dynamics in the algorithm.	翻訳日:2024-08-28 00:26:06 公開日:2024-08-24
# Synergy-of-Thoughts:ハイブリッド言語モデルにおける効率的な推論 Synergy-of-Thoughts: Eliciting Efficient Reasoning in Hybrid Language Models ( http://arxiv.org/abs/2402.02563v4 ) ライセンス: Link先を確認	Yu Shang, Yu Li, Fengli Xu, Yong Li,	(参考訳) 大規模言語モデル(LLM)は、広範囲のタスクにおいて顕著な創発能力を示しているが、関連する高価なAPIコストは、実際のアプリケーションを大幅に制限している。チェーン・オブ・シント(CoT)やツリー・オブ・シント(ToT)といったこれまでの作業は、精度の向上に重点を置いていたが、APIコストの急激な増加を見落としている。人間の認知の二重過程理論に触発されて、効率的な推論のために、異なるスケールのハイブリッドLLMの相乗的ポテンシャルを解き放つために、「思考のシネルギー」(SoT)を提案する。デフォルトでは、SoTはより小規模の言語モデルを使用して、System 1の並列直感に類似した、低コストで直感的な思考を生成する。次に、直感的思考を相互評価する信頼度評価器を設計し、相互の対立を決定するための制御可能なしきい値機構を導入する。これらの直感的な思考が矛盾を示す場合、SoTはシステム2の介入をエミュレートするためにスケールアップされた言語モデルの反射的推論を実行し、直観的な思考をオーバーライドし、推論結果を修正します。このフレームワークはモデルに依存しないトレーニングフリーで、様々な既製のLCMで柔軟に実装できる。 6つの代表的な推論タスクの実験では、SoTはAPIコストを38.3%-75.1%削減し、最先端の推論精度とソリューションの多様性を同時に達成している。特に、オープンエンドタスクの平均トークンコストの削減は69.1%に達する。 Large language models (LLMs) have shown impressive emergent abilities in a wide range of tasks, but the associated expensive API cost greatly limits the real application. Previous works like chain-of-thought (CoT) and tree-of-thoughts (ToT) have predominately focused on enhancing accuracy, but overlook the rapidly increasing API cost, which could be particularly problematic for open-ended real-world tasks with huge solution spaces. Motivated by the dual process theory of human cognition, we propose "Synergy of Thoughts"(SoT) to unleash the synergistic potential of hybrid LLMs with different scales for efficient reasoning. By default, SoT uses smaller-scale language models to generate multiple low-cost intuitive thoughts, which resembles the parallel intuitions produced by System 1. We then design a confidence evaluator where the intuitive thoughts are cross-evaluated and introduce a controllable threshold mechanism to decide their mutual conflict. 
If these intuitive thoughts exhibit conflicts, SoT will invoke the reflective reasoning of scaled-up language models to emulate the intervention of System 2, which will override the intuitive thoughts and rectify the reasoning results. This framework is model-agnostic and training-free, which can be flexibly implemented with various off-the-shelf LLMs. Experiments on six representative reasoning tasks show that SoT substantially reduces the API cost by 38.3%-75.1%, and simultaneously achieves state-of-the-art reasoning accuracy and solution diversity. Notably, the average token cost reduction on open-ended tasks reaches up to 69.1%.	翻訳日:2024-08-28 00:26:06 公開日:2024-08-24
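The routing idea in the abstract above (cheap intuitive answers first, escalation to a larger model only when they conflict) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the cross-evaluating confidence evaluator is replaced by a plain agreement ratio, the models are stand-in callables, and `synergy_answer` and `threshold` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

def synergy_answer(
    question: str,
    small_models: Sequence[Callable[[str], str]],
    large_model: Callable[[str], str],
    threshold: float = 0.6,
) -> Tuple[str, str]:
    """Return (answer, route), where route is "system1" or "system2".

    Assumption: agreement among the small models' answers stands in for the
    confidence score; the abstract's evaluator cross-evaluates the thoughts.
    """
    # System 1: several low-cost intuitive answers.
    thoughts = [model(question) for model in small_models]
    answer, votes = Counter(thoughts).most_common(1)[0]
    agreement = votes / len(thoughts)
    if agreement >= threshold:
        return answer, "system1"
    # System 2: the intuitive answers conflict, so the larger model overrides.
    return large_model(question), "system2"

# Usage with stub models (placeholders, not real LLM calls).
small = [lambda q: "4", lambda q: "4", lambda q: "5"]
big = lambda q: "4"
print(synergy_answer("2 + 2 = ?", small, big))
```

Raising `threshold` sends more questions to the larger model, trading cost for accuracy, which mirrors the controllable threshold mechanism the abstract mentions.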
# マルチアームバンドに対するL2正規化ポリシー勾配アルゴリズムの収束性 Convergence of a L2 regularized Policy Gradient Algorithm for the Multi Armed Bandit ( http://arxiv.org/abs/2402.06388v2 ) ライセンス: Link先を確認	Stefana Anita, Gabriel Turinici,	(参考訳) 一方のマルチアームバンド(MAB)と他方のポリシー勾配アプローチは強化学習の最もよく使われるフレームワークであるが、MABで使用されるポリシー勾配アルゴリズムの理論的性質は十分に注目されていない。本研究では,L2$正規化項が 'softmax' パラメトリゼーションと共同で存在する状況に対する,そのような手順の収束について検討する。我々は、適切な技術的仮説の下で収束を証明し、理論的な設定を超えた状況を含む手順を数値的に検証する。実験の結果,初期推定値が解から遠い場合,時間依存正規化手順が標準手法よりも改善できることが示唆された。 Although Multi Armed Bandit (MAB) on one hand and the policy gradient approach on the other hand are among the most used frameworks of Reinforcement Learning, the theoretical properties of the policy gradient algorithm used for MAB have not been given enough attention. We investigate in this work the convergence of such a procedure for the situation when a $L2$ regularization term is present jointly with the 'softmax' parametrization. We prove convergence under appropriate technical hypotheses and test numerically the procedure including situations beyond the theoretical setting. The tests show that a time dependent regularized procedure can improve over the canonical approach especially when the initial guess is far from the solution.	翻訳日:2024-08-28 00:16:18 公開日:2024-08-24
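As an illustration of the setting in the abstract above, here is a minimal sketch of a softmax policy gradient for a multi-armed bandit with an L2 penalty on the parameters. It assumes the objective E[r] - (lam/2)·||theta||² and a REINFORCE-style estimator without baseline; the function name, step size, and Gaussian reward noise are hypothetical choices, not taken from the paper.

```python
import numpy as np

def softmax(theta):
    z = theta - theta.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def l2_policy_gradient_bandit(means, lam=0.01, lr=0.05, steps=20000,
                              noise=0.1, seed=0):
    """Stochastic ascent on E[r] - (lam / 2) * ||theta||^2 with pi = softmax(theta)."""
    rng = np.random.default_rng(seed)
    k = len(means)
    theta = np.zeros(k)
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choice(k, p=pi)                        # pull an arm
        r = means[a] + noise * rng.standard_normal()   # noisy reward
        grad_logp = -pi                                # d log pi(a) / d theta = e_a - pi
        grad_logp[a] += 1.0
        theta += lr * (r * grad_logp - lam * theta)    # reward term minus L2 pull
    return softmax(theta)

pi = l2_policy_gradient_bandit(np.array([0.1, 0.5, 0.9]))
print(pi)  # most probability mass should sit on the last (best) arm
```

The L2 term keeps `theta` bounded, so the policy does not collapse onto a single arm; a time-dependent `lam`, as the abstract suggests, would be a small change to the update line.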
# 表面符号復号のためのプログレッシブ・プロクシミティ・ビット・フリップ Progressive-Proximity Bit-Flipping for Decoding Surface Codes ( http://arxiv.org/abs/2402.15924v2 ) ライセンス: Link先を確認	Michele Pacenti, Mark F. Flanagan, Dimitris Chytas, Bane Vasic,	(参考訳) トリックやサーフェス符号のようなトポロジカル量子符号は、エラーに対する堅牢性や量子ビット間の局所的な相互作用により、ハードウェア実装の優れた候補である。既存のデコーダは、計算複雑性の低い(コードのブロック長が理想的に線形である)、デコード遅延の低い、消費電力の低いといった要件を満たしていないことが多い。本稿では,トリックおよび表面符号に適したビットフリップ(BF)デコーダを提案する。近接ベクトルをビットを反転させるヒューリスティックな計量として導入し、隣接する量子ビットの多重誤差を補正する新しいサブルーチンを開発した。我々のアルゴリズムは2次複雑さの増大があり、最小ウェイト完全マッチングやユニオン探索のような最先端の復号アルゴリズムのように動的メモリの操作を必要としないため、効率よく実装できる。提案した復号器は、2次元トーリック符号に対して7.5%の復号しきい値を示し、2次元対称チャネル上で回転した平面符号に対して7%の復号しきい値を示した。 Topological quantum codes, such as toric and surface codes, are excellent candidates for hardware implementation due to their robustness against errors and their local interactions between qubits. However, decoding these codes efficiently remains a challenge: existing decoders often fall short of meeting requirements such as having low computational complexity (ideally linear in the code's blocklength), low decoding latency, and low power consumption. In this paper we propose a novel bit-flipping (BF) decoder tailored for toric and surface codes. We introduce the proximity vector as a heuristic metric for flipping bits, and we develop a new subroutine for correcting degenerate multiple errors on adjacent qubits. Our algorithm has quadratic complexity growth and it can be efficiently implemented as it does not require operations on dynamic memories, as do state-of-art decoding algorithms such as minimum weight perfect matching or union find. The proposed decoder shows a decoding threshold of 7.5% for the 2D toric code and 7% for the rotated planar code over the binary symmetric channel.	翻訳日:2024-08-28 00:16:18 公開日:2024-08-24
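For context on the abstract above, the following sketch shows a plain syndrome-based bit-flipping decoder, the baseline family the proposed decoder belongs to. It does not implement the proximity vector or the degenerate-error subroutine, whose details the abstract does not give; the ring-shaped parity-check matrix is a toy stand-in, not a toric or surface code.

```python
import numpy as np

def bit_flip_decode(H, syndrome, max_iters=50):
    """Greedy bit flipping: repeatedly flip the bit that touches the most
    unsatisfied checks until the syndrome is cleared or max_iters is reached.

    Returns (correction, success).
    """
    H = np.asarray(H, dtype=np.uint8)
    s = np.asarray(syndrome, dtype=np.uint8).copy()
    correction = np.zeros(H.shape[1], dtype=np.uint8)
    for _ in range(max_iters):
        if not s.any():
            break
        # For each bit, count the unsatisfied checks it participates in.
        counts = H.T.astype(int) @ s.astype(int)
        j = int(np.argmax(counts))
        correction[j] ^= 1
        s ^= H[:, j]  # flipping bit j toggles every check that contains it
    return correction, not s.any()

# Toy example: n bits on a ring, check i compares bit i with bit (i + 1) % n.
n = 6
H = np.zeros((n, n), dtype=np.uint8)
for i in range(n):
    H[i, i] = 1
    H[i, (i + 1) % n] = 1
error = np.zeros(n, dtype=np.uint8)
error[2] = 1
syndrome = (H @ error) % 2
correction, ok = bit_flip_decode(H, syndrome)
print(correction, ok)
```

On real topological codes this greedy rule can stall on degenerate errors, which is the kind of case the abstract's extra subroutine is meant to handle.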
# GCAN:fMRI機能的結合に基づく説明可能な認知低下診断のための生成的反事実注意誘導ネットワーク GCAN: Generative Counterfactual Attention-guided Network for Explainable Cognitive Decline Diagnostics based on fMRI Functional Connectivity ( http://arxiv.org/abs/2403.01758v2 ) ライセンス: Link先を確認	Xiongri Shen, Zhenxi Song, Zhiguo Zhang,	(参考訳) fMRI機能的結合(FC)からの軽度認知障害(MCI)および主観的認知低下(SCD)の診断が普及しているが、ほとんどのFCベースの診断モデルは因果推論を欠いたブラックボックスであり、認知低下に関するFCベースの神経バイオマーカーの知識にはほとんど寄与しない。診断モデルの説明可能性を高めるために、生成的反事実注意誘導ネットワーク(GCAN)を提案する。GCANは反事実推論を導入して認知低下に関連する脳領域を認識し、それらの領域を注意マップとして用いて診断モデルの予測性能を高める。さらに、反事実推論に不可欠な、高度に構造化され脳アトラスに制約されたFCの生成の難しさに対処するため、Atlas-Aware Bidirectional Transformer (AABT) 法を考案した。 AABTは双方向戦略を用いて、脳アトラスの各ネットワークからのトークンをエンコードおよびデコードし、高品質なターゲットラベルFCを生成する。病院で収集したデータセットとADNIデータセットでの実験では、生成された注意マップは、SCDとMCIに関する文献で報告されているFC異常によく似ている。診断性能もベースラインモデルより優れている。コードはhttps://github.com/SXR3015/GCAN で公開されている。 Diagnosis of mild cognitive impairment (MCI) and subjective cognitive decline (SCD) from fMRI functional connectivity (FC) has gained popularity, but most FC-based diagnostic models are black boxes lacking causal reasoning, so they contribute little to the knowledge about FC-based neural biomarkers of cognitive decline. To enhance the explainability of diagnostic models, we propose a generative counterfactual attention-guided network (GCAN), which introduces counterfactual reasoning to recognize cognitive decline-related brain regions and then uses these regions as attention maps to boost the prediction performance of diagnostic models. Furthermore, to tackle the difficulty in the generation of highly-structured and brain-atlas-constrained FC, which is essential in counterfactual reasoning, an Atlas-Aware Bidirectional Transformer (AABT) method is developed. AABT employs a bidirectional strategy to encode and decode the tokens from each network of brain atlas, thereby enhancing the generation of high-quality target label FC. In the experiments of hospital-collected and ADNI datasets, the generated attention maps closely resemble FC abnormalities in the literature on SCD and MCI. The diagnostic performance is also superior to baseline models. 
The code is available at https://github.com/SXR3015/GCAN	翻訳日:2024-08-28 00:16:18 公開日:2024-08-24
# 複雑度問題:純粋相関の存在下での特徴学習のダイナミクス Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations ( http://arxiv.org/abs/2403.03375v3 ) ライセンス: Link先を確認	GuanWen Qiu, Da Kuang, Surbhi Goel,	(参考訳) 既存の研究は、ニューラルネットワークの最適化におけるコア機能よりも、素早い特徴を学習しやすくすることが多いが、それらの相対的単純さの影響は、まだ解明されていない。さらに、主に特徴学習の学習力学よりも、エンドパフォーマンスに焦点を当てている。本稿では,ブール関数解析に基づく理論的枠組みと関連する合成データセットを提案する。この設定により、(中核的な特徴と比較して)相対的な複雑性と(ラベルに関して)相関強度をきめ細かな制御が可能となり、刺激的な相関の下で特徴学習のダイナミクスを研究することができる。その結果,(1) コア特徴の学習速度を低下させ,(2) コア特徴とスプリアス特徴を別々に学習するために,(2) コア特徴とコア特徴の学習フェーズは必ずしも分離可能ではなく,(4) コア特徴が完全に学習された後も,スプリアス特徴を忘れない,という2つの異なるサブネットが形成された。以上の結果から,最終層の再トレーニングの成功を正当化して,突発的相関を除去し,突発的特徴の早期学習を生かした一般的なデバイアスアルゴリズムの限界を識別できることが示唆された。単層ReLUネットワークを用いてXOR特徴を学習する場合の理論的解析により経験的発見を支援する。 Existing research often posits spurious features as easier to learn than core features in neural network optimization, but the impact of their relative simplicity remains under-explored. Moreover, studies mainly focus on end performance rather than the learning dynamics of feature learning. In this paper, we propose a theoretical framework and an associated synthetic dataset grounded in boolean function analysis. This setup allows for fine-grained control over the relative complexity (compared to core features) and correlation strength (with respect to the label) of spurious features to study the dynamics of feature learning under spurious correlations. Our findings uncover several interesting phenomena: (1) stronger spurious correlations or simpler spurious features slow down the learning rate of the core features, (2) two distinct subnetworks are formed to learn core and spurious features separately, (3) learning phases of spurious and core features are not always separable, (4) spurious features are not forgotten even after core features are fully learned. 
We demonstrate that our findings justify the success of retraining the last layer to remove spurious correlations and also identify limitations of popular debiasing algorithms that exploit early learning of spurious features. We support our empirical findings with theoretical analyses for the case of learning XOR features with a one-hidden-layer ReLU network.	翻訳日:2024-08-28 00:16:18 公開日:2024-08-24
# SheetAgent: 大規模言語モデルによるスプレッドシート推論と操作のための汎用エージェント SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models ( http://arxiv.org/abs/2403.03636v2 ) ライセンス: Link先を確認	Yibin Chen, Yifu Yuan, Zeyu Zhang, Yan Zheng, Jinyi Liu, Fei Ni, Jianye Hao,	(参考訳) スプレッドシートの操作は、日常業務の多くに広く存在し、作業効率を大幅に向上させる。大規模言語モデル(LLM)は、最近、自動スプレッドシート操作に試みられているが、推論の課題が存在する複雑で現実的なタスク(例えば、多段階推論と曖昧な要求を含む長期的な操作)では、まだ研究されていない。実世界の要件とのギャップを埋めるため、実生活の課題に起因する推論依存の操作を伴う長期・多カテゴリのタスクを特徴とするベンチマークである$\textbf{SheetRM}$を導入する。上記の課題を緩和するために、LLMの力を利用する新しい自律エージェントである$\textbf{SheetAgent}$を提案する。 SheetAgentは、$\textit{Planner}$、$\textit{Informer}$、$\textit{Retriever}$ の3つの協調モジュールで構成され、反復的なタスク推論とリフレクションにより、人間の介入なしにスプレッドシート上での高度な推論と正確な操作を実現する。広範な実験により、SheetAgentはベースラインに対して複数のベンチマークで20～30%のパスレート改善を達成し、スプレッドシート操作の精度向上と優れたテーブル推論能力を示した。詳細と視覚化は https://sheetagent.github.io で公開されている。 Spreadsheet manipulation is ubiquitous in daily work and significantly improves working efficiency. Large language models (LLMs) have recently been applied to automatic spreadsheet manipulation but have not yet been investigated in complicated and realistic tasks where reasoning challenges exist (e.g., long horizon manipulation with multi-step reasoning and ambiguous requirements). To bridge the gap with the real-world requirements, we introduce $\textbf{SheetRM}$, a benchmark featuring long-horizon and multi-category tasks with reasoning-dependent manipulation caused by real-life challenges. To mitigate the above challenges, we further propose $\textbf{SheetAgent}$, a novel autonomous agent that utilizes the power of LLMs. SheetAgent consists of three collaborative modules: $\textit{Planner}$, $\textit{Informer}$, and $\textit{Retriever}$, achieving both advanced reasoning and accurate manipulation over spreadsheets without human interaction through iterative task reasoning and reflection. 
Extensive experiments demonstrate that SheetAgent delivers 20-30% pass rate improvements on multiple benchmarks over baselines, achieving enhanced precision in spreadsheet manipulation and demonstrating superior table reasoning abilities. More details and visualizations are available at https://sheetagent.github.io.	翻訳日:2024-08-28 00:16:18 公開日:2024-08-24
# MUC:ロバストな3D人体再構築のための非校正カメラの混合 MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction ( http://arxiv.org/abs/2403.05055v3 ) ライセンス: Link先を確認	Yitao Zhu, Sheng Wang, Mengjie Xu, Zixu Zhuang, Zhixin Wang, Kaidong Wang, Han Zhang, Qian Wang,	(参考訳) 複数のカメラは、人物の包括的なマルチビュービデオカバレッジを提供することができる。このマルチビューデータを融合することは、行動分析のようなタスクには不可欠だが、従来はカメラのキャリブレーションという、しばしば複雑な工程を必要とする。さらに、従来の研究は、複数視点での自己遮蔽による課題と、人体形状推定の連続性を見落としてきた。本研究では、校正されていない複数のカメラビューから3次元人体を再構築する手法を提案する。まず、事前学習済みの人体エンコーダを用いて各カメラビューを個別に処理し、予測されたカメラ位置とともに、ビューごとの人体モデルとパラメータを再構築する。ビュー全体でモデルを単に平均化するのではなく、各カメラからの関節距離の推定分布に基づいて、すべての人体関節について個々のビューに重みを割り当てるよう訓練されたニューラルネットワークを開発する。さらに、動的融合のために人体のメッシュ表面に着目し、顔の表情と体形を統一された人体モデルにシームレスに統合する。本手法は、2つの公開データセットにおける人体再構築で優れた性能を示し、SMPLモデルからSMPL-Xモデルへと従来研究を前進させた。この拡張には、より複雑な手のポーズと表情が含まれており、再構築の詳細さと精度が向上している。重要な点として、任意の台数のカメラの柔軟なアドホック展開をサポートし、さまざまなアプリケーションに大きな可能性をもたらす。コードは https://github.com/AbsterZhu/MUC で公開されている。 Multiple cameras can provide comprehensive multi-view video coverage of a person. Fusing this multi-view data is crucial for tasks like behavioral analysis, although it traditionally requires camera calibration, a process that is often complex. Moreover, previous studies have overlooked the challenges posed by self-occlusion under multiple views and the continuity of human body shape estimation. In this study, we introduce a method to reconstruct the 3D human body from multiple uncalibrated camera views. Initially, we utilize a pre-trained human body encoder to process each camera view individually, enabling the reconstruction of human body models and parameters for each view along with predicted camera positions. Rather than merely averaging the models across views, we develop a neural network trained to assign weights to individual views for all human body joints, based on the estimated distribution of joint distances from each camera. 
Additionally, we focus on the mesh surface of the human body for dynamic fusion, allowing for the seamless integration of facial expressions and body shape into a unified human body model. Our method has shown excellent performance in reconstructing the human body on two public datasets, advancing beyond previous work from the SMPL model to the SMPL-X model. This extension incorporates more complex hand poses and facial expressions, enhancing the detail and accuracy of the reconstructions. Crucially, it supports the flexible ad-hoc deployment of any number of cameras, offering significant potential for various applications. Our code is available at https://github.com/AbsterZhu/MUC.	翻訳日:2024-08-28 00:06:22 公開日:2024-08-24
# 永久電流輸送における非エルミアンフェルミ-ディラック分布 Non-Hermitian Fermi-Dirac Distribution in Persistent Current Transport ( http://arxiv.org/abs/2403.09569v2 ) ライセンス: Link先を確認	Pei-Xin Shen, Zhide Lu, Jose L. Lado, Mircea Trif,	(参考訳) 永久電流は外部電源を必要とせずに連続的に循環する。ここでは、これらの理論を非エルミート量子ハミルトニアンの枠組み内での散逸を含むように拡張する。グリーン関数フォーマリズムを用いて、非エルミートフェルミ・ディラック分布を導入し、複素スペクトルのみに依存する永続電流の解析式を導出する。持続電流を支持する2つの散逸モデルに適用する。 i) 位相バイアス型超伝導-常温超電導接合 (ii)磁束で結ばれた正常な環。両系統の持続電流は、現在の感受性でしか識別できない異常点に異常を示さないことを示す。本研究は, 厳密な対角化による検証を行い, 有限温度および相互作用効果を考慮に入れた。我々の定式化は、非エルミート系の量子多体観測可能を平衡で計算するための一般的な枠組みを提供し、非平衡シナリオへの潜在的な拡張を提供する。 Persistent currents circulate continuously without requiring external power sources. Here, we extend their theory to include dissipation within the framework of non-Hermitian quantum Hamiltonians. Using Green's function formalism, we introduce a non-Hermitian Fermi-Dirac distribution and derive an analytical expression for the persistent current that relies solely on the complex spectrum. We apply our formula to two dissipative models supporting persistent currents: (i) a phase-biased superconducting-normal-superconducting junction; (ii) a normal ring threaded by a magnetic flux. We show that the persistent currents in both systems exhibit no anomalies at any emergent exceptional points, whose signatures are only discernible in the current susceptibility. We validate our findings by exact diagonalization and extend them to account for finite temperatures and interaction effects. Our formalism offers a general framework for computing quantum many-body observables of non-Hermitian systems in equilibrium, with potential extensions to non-equilibrium scenarios.	翻訳日:2024-08-28 00:06:22 公開日:2024-08-24
# DSP: 多次元変圧器の動的シーケンス並列性 DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers ( http://arxiv.org/abs/2403.10266v3 ) ライセンス: Link先を確認	Xuanlei Zhao, Shenggan Cheng, Chang Chen, Zangwei Zheng, Ziming Liu, Zheming Yang, Yang You,	(参考訳) 長い列への多次元変換器のスケーリングは、様々な領域で必須である。しかし、大きなメモリ要求とそのようなシーケンスの遅い速度の課題は、シーケンス並列性を必要とする。既存のすべてのアプローチは、単一のシーケンス次元に沿ってシャードに制限された組込みシーケンス並列化のカテゴリに該当するため、かなりの通信オーバーヘッドが生じる。しかし、多次元変圧器の性質は、複数の列次元にまたがる独立計算を伴う。そこで本研究では,動的シーケンス並列性(DSP)を並列性の新たな抽象化として提案する。 DSPは効率的な再シャーディング戦略で計算段階に応じて全列の並列次元を動的に切り替える。 DSPは通信コストの大幅な削減、モジュール間の適応性、最小限の制約による実装の容易性を提供する。実験により、DSPは32.2%から10倍のスループット向上により25%未満の通信量で、最先端の組込みシーケンス並列化法よりも優れていることが示された。 Scaling multi-dimensional transformers to long sequences is indispensable across various domains. However, the challenges of large memory requirements and slow speeds of such sequences necessitate sequence parallelism. All existing approaches fall under the category of embedded sequence parallelism, which are limited to shard along a single sequence dimension, thereby introducing significant communication overhead. However, the nature of multi-dimensional transformers involves independent calculations across multiple sequence dimensions. To this end, we propose Dynamic Sequence Parallelism (DSP) as a novel abstraction of sequence parallelism. DSP dynamically switches the parallel dimension among all sequences according to the computation stage with efficient resharding strategy. DSP offers significant reductions in communication costs, adaptability across modules, and ease of implementation with minimal constraints. Experimental evaluations demonstrate DSP's superiority over state-of-the-art embedded sequence parallelism methods by remarkable throughput improvements ranging from 32.2% to 10x, with less than 25% communication volume.	翻訳日:2024-08-28 00:06:22 公開日:2024-08-24
# EAS-SNN: 繰り返しスパイクニューラルネットワークを用いた事象検出のためのエンドツーエンド適応サンプリングと表現 EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks ( http://arxiv.org/abs/2403.12574v2 ) ライセンス: Link先を確認	Ziming Wang, Ziling Wang, Huaning Li, Lang Qin, Runhao Jiang, De Ma, Huajin Tang,	(参考訳) イベントカメラは、高いダイナミックレンジと時間分解能を持ち、特に動きのぼやけと困難な照明条件のシナリオにおいて、オブジェクト検出に最適である。しかし、ほとんどの既存手法は、高度な検出バックボーンと早期集約関数による時空間表現の最適化を優先しているが、適応的なイベントサンプリングの重要な問題は、ほとんど未適応のままである。スパーススパイク通信を通じてイベント駆動のパラダイムで動作するスパイキングニューラルネットワーク(SNN)は、この課題に対処するための自然なフィットとして現れます。本研究では、スパイキングニューロンの神経力学が理想的な時間事象サンプリング器の動作と密接に一致していることを明らかにする。そこで本研究では,時間記憶を付加した再帰的畳み込みSNNを活用する適応型サンプリングモジュールを提案する。さらに、スパイクベースサンプリングモジュールで発生する電位分布の制御と性能劣化に対処するため、Residual potential Dropout (RPD) と Spike-Aware Training (SAT) を導入する。ニューロモルフィック検出データセットの実証評価により,本手法は既存のスパイク法よりもはるかに少ないパラメータと時間ステップで優れていることが示された。例えば、我々の方法では、Gen1データセットで4.4\% mAPの改善が得られ、パラメータは38\%少なく、3段階しか必要としない。さらに, 適応サンプリング手法の適用性および有効性は, 従来の非スパイキングモデルに対するさらなる検証を通じて示されるように, SNN を超えて拡張される。コードはhttps://github.com/Windere/EAS-SNNで入手できる。 Event cameras, with their high dynamic range and temporal resolution, are ideally suited for object detection, especially under scenarios with motion blur and challenging lighting conditions. However, while most existing approaches prioritize optimizing spatiotemporal representations with advanced detection backbones and early aggregation functions, the crucial issue of adaptive event sampling remains largely unaddressed. Spiking Neural Networks (SNNs), which operate on an event-driven paradigm through sparse spike communication, emerge as a natural fit for addressing this challenge. In this study, we discover that the neural dynamics of spiking neurons align closely with the behavior of an ideal temporal event sampler. 
Motivated by this insight, we propose a novel adaptive sampling module that leverages recurrent convolutional SNNs enhanced with temporal memory, facilitating a fully end-to-end learnable framework for event-based detection. Additionally, we introduce Residual Potential Dropout (RPD) and Spike-Aware Training (SAT) to regulate potential distribution and address performance degradation encountered in spike-based sampling modules. Empirical evaluation on neuromorphic detection datasets demonstrates that our approach outperforms existing state-of-the-art spike-based methods with significantly fewer parameters and time steps. For instance, our method yields a 4.4% mAP improvement on the Gen1 dataset, while requiring 38% fewer parameters and only three time steps. Moreover, the applicability and effectiveness of our adaptive sampling methodology extend beyond SNNs, as demonstrated through further validation on conventional non-spiking models. Code is available at https://github.com/Windere/EAS-SNN.	翻訳日:2024-08-27 23:56:35 公開日:2024-08-24
# スパース符号化アーキテクチャによるモデル反転攻撃に対するロバスト性の改善 Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures ( http://arxiv.org/abs/2403.14772v2 ) ライセンス: Link先を確認	Sayanton V. Dibbo, Adam Breuer, Juston Moore, Michael Teti,	(参考訳) 最近のモデル反転攻撃アルゴリズムでは、ニューラルネットワークのプライベートかつ潜在的に敏感なトレーニングデータを繰り返しクエリすることで、敵が再構築することができる。本研究では,この攻撃に対してより優れたロバスト性を得るために,スパース符号化層を利用した新しいネットワークアーキテクチャを開発する。 30年にわたるコンピュータサイエンス研究は、画像の認識、オブジェクト認識、および敵対的誤分類設定という文脈でスパースコーディングを研究してきたが、私たちの知る限りでは、最先端のプライバシー脆弱性への関連性はまだ研究されていない。本研究は,ネットワークによって符号化された無関係なプライベート情報の量を,分類精度にはほとんど影響しない方法で制御できるため,スパース符号化アーキテクチャがモデル反転攻撃を防御する有利な手段であることを仮定する。具体的には、さまざまな最先端防衛で訓練されたネットワークと比較して、スパースコーディングアーキテクチャは、様々な再構築品質指標(PSNR、SSIM、FID)で1.1～18.3の要因で、最先端のトレーニングデータ再構成を劣化させながら、同等またはそれ以上の分類精度を維持している。このパフォーマンス上のアドバンテージは、CelebAの顔から医療画像、CIFAR-10まで、5つのデータセットにまたがる。我々はクラスタ対応のPyTorchコードベースを提供し、研究を促進し、防衛評価を標準化する。 Recent model inversion attack algorithms permit adversaries to reconstruct a neural network's private and potentially sensitive training data by repeatedly querying the network. In this work, we develop a novel network architecture that leverages sparse-coding layers to obtain superior robustness to this class of attacks. Three decades of computer science research has studied sparse coding in the context of image denoising, object recognition, and adversarial misclassification settings, but to the best of our knowledge, its connection to state-of-the-art privacy vulnerabilities remains unstudied. In this work, we hypothesize that sparse coding architectures suggest an advantageous means to defend against model inversion attacks because they allow us to control the amount of irrelevant private information encoded by a network in a manner that is known to have little effect on classification accuracy. 
Specifically, compared to networks trained with a variety of state-of-the-art defenses, our sparse-coding architectures maintain comparable or higher classification accuracy while degrading state-of-the-art training data reconstructions by factors of 1.1 to 18.3 across a variety of reconstruction quality metrics (PSNR, SSIM, FID). This performance advantage holds across 5 datasets ranging from CelebA faces to medical images and CIFAR-10, and across various state-of-the-art SGD-based and GAN-based inversion attacks, including Plug-&-Play attacks. We provide a cluster-ready PyTorch codebase to promote research and standardize defense evaluations.	翻訳日:2024-08-27 23:56:35 公開日:2024-08-24
# LLM-as-a-Judgeに対する最適化型プロンプトインジェクション攻撃 Optimization-based Prompt Injection Attack to LLM-as-a-Judge ( http://arxiv.org/abs/2403.17710v2 ) ライセンス: Link先を確認	Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong,	(参考訳) LLM-as-a-Judgeは、大きな言語モデル(LLM)を使用して、ある質問に対する候補セットから最適な応答を選択する。 LLM-as-a-Judgeには、LLMを使った検索、AIフィードバックによる強化学習(RLAIF)、ツールの選択など、多くの応用がある。本稿では,LLM-as-a-Judgeに対する最適化に基づくプロンプトインジェクション攻撃であるJudgeDeceiverを提案する。JudgeDeceiverは、他の候補応答が何であれ、LLM-as-a-Judgeが攻撃者の選んだ質問に対して攻撃者の制御する候補応答を選択するように、慎重に作成されたシーケンスをその候補応答に注入する。具体的には、そのようなシーケンスの探索を最適化問題として定式化し、近似解法として勾配に基づく手法を提案する。我々の広範な評価によると、JudgeDeceiverは極めて効果的であり、注入シーケンスを手動で作成する既存のプロンプトインジェクション攻撃や、我々の問題に拡張したジェイルブレイク攻撃よりもはるかに効果的である。また,LLMを用いた検索,RLAIF,ツール選択の3つのケーススタディにおいて,JudgeDeceiverの有効性を示す。さらに, 既知回答検出, パープレキシティ検出, パープレキシティウィンドウ検出などの防御策も検討した。以上の結果から,これらの防御策は不十分であり,新たな防御戦略の開発が急務であることが浮き彫りになった。 LLM-as-a-Judge uses a large language model (LLM) to select the best response from a set of candidates for a given question. LLM-as-a-Judge has many applications such as LLM-powered search, reinforcement learning with AI feedback (RLAIF), and tool selection. In this work, we propose JudgeDeceiver, an optimization-based prompt injection attack to LLM-as-a-Judge. JudgeDeceiver injects a carefully crafted sequence into an attacker-controlled candidate response such that LLM-as-a-Judge selects the candidate response for an attacker-chosen question no matter what other candidate responses are. Specifically, we formulate finding such a sequence as an optimization problem and propose a gradient-based method to approximately solve it. Our extensive evaluation shows that JudgeDeceiver is highly effective, and is much more effective than existing prompt injection attacks that manually craft the injected sequences and jailbreak attacks when extended to our problem. We also show the effectiveness of JudgeDeceiver in three case studies, i.e., LLM-powered search, RLAIF, and tool selection. 
Moreover, we consider defenses including known-answer detection, perplexity detection, and perplexity windowed detection. Our results show these defenses are insufficient, highlighting the urgent need for developing new defense strategies.	翻訳日:2024-08-27 23:56:35 公開日:2024-08-24
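The abstract above names perplexity detection and perplexity windowed detection as defenses that were evaluated and found insufficient. As a rough illustration of the difference between the two, here is a minimal Python sketch. It assumes per-token natural-log probabilities have already been obtained from some language model (not shown); the function names, the window size, and the threshold are hypothetical and are not taken from the paper.

```python
import math


def window_perplexities(token_logprobs, window):
    """Perplexity of every contiguous window of `window` tokens.

    `token_logprobs` holds natural-log probabilities, one per token,
    assumed to come from an external language model.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(token_logprobs)
    if n == 0:
        return []
    if n <= window:
        # Sequence shorter than one window: fall back to whole-sequence perplexity.
        return [math.exp(-sum(token_logprobs) / n)]
    return [
        math.exp(-sum(token_logprobs[i:i + window]) / window)
        for i in range(n - window + 1)
    ]


def sequence_perplexity(token_logprobs):
    """Plain (non-windowed) perplexity of the whole sequence."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def is_flagged(token_logprobs, window, threshold):
    """Flag the response if any window exceeds the (hypothetical) threshold."""
    return any(p > threshold for p in window_perplexities(token_logprobs, window))


if __name__ == "__main__":
    benign = [-1.0] * 20
    # A short low-probability span embedded in otherwise fluent text.
    suspicious = [-1.0] * 10 + [-8.0] * 5 + [-1.0] * 10
    print(is_flagged(benign, window=5, threshold=50.0))
    print(is_flagged(suspicious, window=5, threshold=50.0))
    print(sequence_perplexity(suspicious) > 50.0)
```

The windowed variant can react to a short unusual span that whole-sequence perplexity averages away. The abstract reports that such defenses were still insufficient against JudgeDeceiver, so this sketch should be read as an explanation of the baseline, not as a working mitigation.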
# 勾配偏光アルゴリズムによる3dB限界を超える最適機械的四次スキューズ Optimized mechanical quadrature squeezing beyond the 3-dB limit via a gradient-descent algorithm ( http://arxiv.org/abs/2404.13563v3 ) ライセンス: Link先を確認	Yu-Hong Liu, Jie-Qiao Liao,	(参考訳) メカニカル・クアチュア・スクイーズ状態の調製は、キャビティ・オプティメニクスにおいて重要な意味を持つ。なぜなら、圧縮された状態は基本的な量子力学を理解し、現代の量子技術を利用するために広く応用されているからである。そこで本研究では, 勾配偏光アルゴリズムを用いて, 最適キャビティフィールド駆動パルスを求めることにより, 典型的なキャビティ・オプティメカカル・システムにおいて, メカニカル・クィアリングを生成するための信頼性の高い手法を提案する。熱フォノン占有率100の3dB定常限界を超える機械共振器において, 強い4次スキューズを実現する。さらに、機械的スクイーズを1つの機械的発振期間内に超急速生成することができる。また、生成したメカニカルスクイーズに付随する最適パルス駆動値を求め、メカニカルスクイーズ生成のメカニズムを解析した。本稿では,量子光学および量子情報科学における最適量子制御の適用を促進する。 The preparation of mechanical quadrature-squeezed states holds significant importance in cavity optomechanics because the squeezed states have extensive applications in understanding fundamental quantum mechanics and exploiting modern quantum technology. Here, we propose a reliable scheme for generating mechanical quadrature squeezing in a typical cavity optomechanical system via seeking optimal cavity-field driving pulses using the gradient-descent algorithm. We realize strong quadrature squeezing in a mechanical resonator that exceeds the 3-dB steady-state limit, even with a thermal phonon occupancy of 100. Furthermore, the mechanical squeezing can be ultrarapidly created within one mechanical oscillation period. We also obtain the optimal pulsed drivings associated with the created mechanical squeezings and analyze the mechanism for mechanical squeezing generation. This paper will promote the application of optimal quantum control in quantum optics and quantum information science.	翻訳日:2024-08-27 23:46:51 公開日:2024-08-24
# 能動物体検出のためのパラメータ効率向上のための外部プロンプト特性 External Prompt Features Enhanced Parameter-efficient Fine-tuning for Salient Object Detection ( http://arxiv.org/abs/2404.15008v2 ) ライセンス: Link先を確認	Wen Liang, Peipei Ran, Mengchao Bai, Xiao Liu, P. Bilha Githinji, Wei Zhao, Peiwu Qin,	(参考訳) Salient Object Detection (SOD) は、画像中の最も健全なオブジェクトを見つけ、ピクセルレベルのバイナリマスクを出力することを目的としている。トランスフォーマーに基づく手法は,グローバルなセマンティック理解によって有望な性能を達成する。しかし、これらのモデルは大規模であり、多くの訓練パラメータを必要とする傾向にある。そこで本研究では,SOD用変圧器のポテンシャルをよりよく活用するために,学習パラメータの削減を目的としたパラメータ効率の高い微調整手法を提案する。 ExPert(AdaptedR Tuning)と呼ばれる我々のモデルでは、冷凍トランスエンコーダの層間にアダプタとインジェクタが分散したエンコーダ・デコーダ構造が特徴的である。アダプタモジュールは事前訓練されたバックボーンをSODに適合させ、インジェクタモジュールは外部のプロンプト機能を組み込んで、正常なオブジェクトの認識を高める。総合的な実験により,本手法の優位性を実証した。従来の最先端(SOTA)モデルを5つのSODデータセットに渡すことで、ExPertは80.2Mのトレーニングパラメータを持つECSSDデータセットで0.215の平均絶対誤差(MAE)を達成し、SelfReformerより21%、EGNetより47%向上した。 Salient object detection (SOD) aims at finding the most salient objects in images and outputs pixel-level binary masks. Transformer-based methods achieve promising performance due to their global semantic understanding, crucial for identifying salient objects. However, these models tend to be large and require numerous training parameters. To better harness the potential of transformers for SOD, we propose a novel parameter-efficient fine-tuning method aimed at reducing the number of training parameters while enhancing the salient object detection capability. Our model, termed EXternal Prompt features Enhanced adapteR Tuning (ExPert), features an encoder-decoder structure with adapters and injectors interspersed between the layers of a frozen transformer encoder. The adapter modules adapt the pretrained backbone to SOD while the injector modules incorporate external prompt features to enhance the awareness of salient objects. Comprehensive experiments demonstrate the superiority of our method. 
Surpassing former state-of-the-art (SOTA) models across five SOD datasets, ExPert achieves 0.215 mean absolute error (MAE) in the ECSSD dataset with 80.2M trained parameters, 21% better than SelfReformer and 47% better than EGNet.	翻訳日:2024-08-27 23:46:51 公開日:2024-08-24
# 大規模言語モデルを用いた逆グラフの再合成 Re-Thinking Inverse Graphics With Large Language Models ( http://arxiv.org/abs/2404.15228v2 ) ライセンス: Link先を確認	Peter Kulits, Haiwen Feng, Weiyang Liu, Victoria Abrevaya, Michael J. Black,	(参考訳) 逆グラフィックス - イメージを物理変数に変換するタスクで、レンダリングされると観察されたシーンの再生を可能にする - は、コンピュータビジョンとグラフィックスの基本的な課題である。画像が3Dシーンのオブジェクトの形状、色、材料特性などの構成要素に切り離されるのに成功するには、環境を包括的に理解する必要がある。この複雑さは、ドメインをまたいで一般化する既存の慎重に設計されたアプローチの能力を制限します。大規模言語モデル(LLM)が新しい文脈に一般化するゼロショット能力に着想を得て,そのようなモデルに符号化された広い世界知識を活用して,逆グラフ問題の解法を提案する。そこで本研究では,LLMを中心とした逆グラフフレームワークである逆グラフ大言語モデル(IG-LLM)を提案する。我々は、凍結した事前学習されたビジュアルエンコーダと連続的な数値ヘッドを組み込んで、エンドツーエンドのトレーニングを可能にする。本研究は,画像空間の監督を使わずに,次から次へと予測することで,逆グラフィックスを促進するLLMの可能性を実証するものである。本分析により,LLMの視覚的知識を利用した画像の空間的推論が可能となった。コードとデータはhttps://ig-llm.is.tue.mpg.de/で公開しています。 Inverse graphics -- the task of inverting an image into physical variables that, when rendered, enable reproduction of the observed scene -- is a fundamental challenge in computer vision and graphics. Successfully disentangling an image into its constituent elements, such as the shape, color, and material properties of the objects of the 3D scene that produced it, requires a comprehensive understanding of the environment. This complexity limits the ability of existing carefully engineered approaches to generalize across domains. Inspired by the zero-shot ability of large language models (LLMs) to generalize to novel contexts, we investigate the possibility of leveraging the broad world knowledge encoded in such models to solve inverse-graphics problems. To this end, we propose the Inverse-Graphics Large Language Model (IG-LLM), an inverse-graphics framework centered around an LLM, that autoregressively decodes a visual embedding into a structured, compositional 3D-scene representation. We incorporate a frozen pre-trained visual encoder and a continuous numeric head to enable end-to-end training. 
Through our investigation, we demonstrate the potential of LLMs to facilitate inverse graphics through next-token prediction, without the application of image-space supervision. Our analysis enables new possibilities for precise spatial reasoning about images that exploit the visual knowledge of LLMs. We release our code and data at https://ig-llm.is.tue.mpg.de/ to ensure the reproducibility of our investigation and to facilitate future research.	翻訳日:2024-08-27 23:46:51 公開日:2024-08-24
# 物理インフォームドニューラルネットワークにおける最適時間サンプリング Optimal time sampling in physics-informed neural networks ( http://arxiv.org/abs/2404.18780v2 ) ライセンス: Link先を確認	Gabriel Turinici,	(参考訳) 物理インフォームドニューラルネットワーク(英: Physics-informed Neural Network、PINN)は、科学計算応用における方程式の解法として非常に強力なパラダイムである。手順の重要な部分は、方程式残差の最小化であり、方程式が時間依存である場合には時間サンプリングを伴う。文献では、サンプリングは均一である必要はなく、初期時刻に重みを置くべきだと論じられてきたが、この選択に対する厳密な説明は与えられていなかった。本研究では, いくつかの典型的な例を取り上げ, ニューラルネットワークの収束に関する標準的な仮定の下で, 最適時間サンプリングが(切断)指数分布に従うことを示す。特に、均一な時間サンプリングを使用するのが最適な場合と、そうすべきでない場合について説明する。この結果は、線形方程式、バーガーズ方程式、ローレンツ系に関する数値例で示される。 Physics-informed neural networks (PINNs) are an extremely powerful paradigm used to solve equations encountered in scientific computing applications. An important part of the procedure is the minimization of the equation residual which includes, when the equation is time-dependent, a time sampling. It was argued in the literature that the sampling need not be uniform but should overweight initial time instants, but no rigorous explanation was provided for this choice. In the present work we take some prototypical examples and, under standard hypotheses concerning the neural network convergence, we show that the optimal time sampling follows a (truncated) exponential distribution. In particular we explain when it is best to use uniform time sampling and when one should not. The findings are illustrated with numerical examples on a linear equation, Burgers' equation and the Lorenz system.	翻訳日:2024-08-27 23:46:51 公開日:2024-08-24
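To make the idea of a (truncated) exponential time sampling concrete, the following Python sketch draws collocation times on an interval [0, t_max] by inverting the cumulative distribution function. The function name and the `rate` parameter are hypothetical; the abstract does not say how the rate should be chosen, so this only illustrates the shape of the distribution, with `rate = 0` reducing to uniform sampling.

```python
import math
import random


def sample_truncated_exponential(rate, t_max, n, rng=None):
    """Draw n times in [0, t_max] with density proportional to exp(-rate * t).

    Uses inverse-CDF sampling:
        F(t) = (1 - exp(-rate * t)) / (1 - exp(-rate * t_max))
    so  t = -log(1 - u * (1 - exp(-rate * t_max))) / rate  for u ~ U[0, 1).
    `rate` is an illustrative parameter, not a value from the paper.
    """
    rng = rng or random.Random(0)
    if abs(rate) < 1e-12:
        # Zero rate: the density is flat, i.e. uniform time sampling.
        return [rng.random() * t_max for _ in range(n)]
    scale = 1.0 - math.exp(-rate * t_max)
    return [-math.log(1.0 - rng.random() * scale) / rate for _ in range(n)]


if __name__ == "__main__":
    early_weighted = sample_truncated_exponential(rate=3.0, t_max=1.0, n=10000)
    uniform = sample_truncated_exponential(rate=0.0, t_max=1.0, n=10000)
    # A positive rate pulls the sample mean toward the initial time.
    print(sum(early_weighted) / len(early_weighted))
    print(sum(uniform) / len(uniform))
```

In a PINN training loop, times drawn this way would replace a uniform grid when evaluating the equation residual; a positive rate puts more samples near the initial time, in line with the overweighting of early instants mentioned in the abstract.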
# 単元ブロック最適化スキームと古典的後処理を組み合わせた変分量子固有解法の最適化 Better Optimization of Variational Quantum Eigensolvers by Combining the Unitary Block Optimization Scheme with Classical Post-Processing ( http://arxiv.org/abs/2404.19027v4 ) ライセンス: Link先を確認	Xiaochuan Ding, Bryan K. Clark,	(参考訳) 変分量子固有解法(VQE)は、ハミルトンの古典的に難解な基底状態を見つけるための有望なアプローチである。 Unitary Block Optimization Scheme (UBOS) は最先端のVQE方式であり、ゲートを網羅し、他のゲート環境における各ゲートの最適パラメータを求める。 UBOSは、SGD (Stochastic Gradient Descent) に対する等級によって、基底状態への収束時間を改善する。それにもかかわらず、ショットノイズから生じる非常にノイズの多い期待値に直面して、収束率と最終的な収束エネルギーの両方に苦しむ。ここではUBOSを改良する2つの古典的後処理手法について述べる。ガウス過程回帰(GPR)を用いて、量子コンピュータからの原データを用いて人工的な拡張現実データを生成し、改良されたパラメータを解く際の全体的なエラーを低減する。 DROPR(Double Robust Optimization plus Rejection)を用いることで、非典型的にノイズの多いデータの外部への流出を防止し、特に誤った単一最適化ステップを発生させ、ノイズ測定に対するロバスト性を高める。これらの手法を組み合わせることで、UBOSが3倍の誤差で到達する最終的な相対誤差をさらに削減し、追加の量子測定やサンプリングオーバーヘッドを追加することなく実現できる。この研究は、古典的資源を用いて量子計測結果を後処理する技術を開発することにより、VQEアルゴリズムを著しく改善することを示した。 Variational Quantum Eigensolvers (VQE) are a promising approach for finding the classically intractable ground state of a Hamiltonian. The Unitary Block Optimization Scheme (UBOS) is a state-of-the-art VQE method which works by sweeping over gates and finding optimal parameters for each gate in the environment of other gates. UBOS improves the convergence time to the ground state by an order of magnitude over Stochastic Gradient Descent (SGD). It nonetheless suffers in both rate of convergence and final converged energies in the face of highly noisy expectation values coming from shot noise. Here we develop two classical post-processing techniques which improve UBOS especially when measurements have large noise. Using Gaussian Process Regression (GPR), we generate artificial augmented data using original data from the quantum computer to reduce the overall error when solving for the improved parameters. 
Using Double Robust Optimization plus Rejection (DROPR), we prevent outlying data which are atypically noisy from resulting in a particularly erroneous single optimization step thereby increasing robustness against noisy measurements. Combining these techniques further reduces the final relative error that UBOS reaches by a factor of three without adding additional quantum measurement or sampling overhead. This work further demonstrates that developing techniques which use classical resources to post-process quantum measurement results can significantly improve VQE algorithms.	翻訳日:2024-08-27 23:36:49 公開日:2024-08-24
# 視覚言語概念ボトルネックモデルにおける概念アライメントの改善 Improving Concept Alignment in Vision-Language Concept Bottleneck Models ( http://arxiv.org/abs/2405.01825v2 ) ライセンス: Link先を確認	Nithish Muthuchamy Selvaraj, Xiaobao Guo, Adams Wai-Kin Kong, Alex Kot,	(参考訳) 概念ボトルネックモデル (Concept Bottleneck Models, CBM) は、クラス予測を行う前に、イメージを人間の解釈可能な概念にマッピングする。近年のアプローチでは、大規模言語モデル(LLM)にテキスト概念の生成を促し、視覚言語モデル(VLM)を用いてこれらの概念をCBM訓練に活用することにより、CBM構築を自動化する。しかし、LCMが生成したものよりも、人間の専門家が定義した概念でCBMを構築し、より信頼できるものにすることが望まれている。本研究では, 鳥の細粒化や動物分類などの領域において, 専門家が定義した概念に対するVLM概念スコアの忠実性について, 詳しく検討する。これらの結果から,CLIPのようなVLMは高い分類性能を達成しつつも,概念と対応する視覚入力を正しく関連付けるのに苦慮していることが明らかとなった。このミスアライメントは、結果のモデルを解釈しにくく、信頼性の低いものにする。この問題に対処するために,数個のラベル付き概念サンプルを活用して,真に視覚的な概念を活性化し,CLIPモデルにおける概念アライメントを改善する,新しいコントラシブ・セミスーパーバイザード(CSS)学習法を提案する。 3つのベンチマークデータセットに対する大規模な実験により,提案手法は概念(+29.95)と分類(+3.84)の両方を著しく向上させるが,人間に注釈付けされた概念ラベルのごく一部しか必要としないことが示された。分類性能をさらに向上するために,クラスレベルの介入手順を導入し,クラス間の相違を識別し,それらの概念空間に介入することで誤りを低減した。 Concept Bottleneck Models (CBM) map images to human-interpretable concepts before making class predictions. Recent approaches automate CBM construction by prompting Large Language Models (LLMs) to generate text concepts and employing Vision Language Models (VLMs) to score these concepts for CBM training. However, it is desired to build CBMs with concepts defined by human experts rather than LLM-generated ones to make them more trustworthy. In this work, we closely examine the faithfulness of VLM concept scores for such expert-defined concepts in domains like fine-grained bird species and animal classification. Our investigations reveal that VLMs like CLIP often struggle to correctly associate a concept with the corresponding visual input, despite achieving a high classification performance. This misalignment renders the resulting models difficult to interpret and less reliable. 
To address this issue, we propose a novel Contrastive Semi-Supervised (CSS) learning method that leverages a few labeled concept samples to activate truthful visual concepts and improve concept alignment in the CLIP model. Extensive experiments on three benchmark datasets demonstrate that our method significantly enhances both concept (+29.95) and classification (+3.84) accuracies yet requires only a fraction of human-annotated concept labels. To further improve the classification performance, we introduce a class-level intervention procedure for fine-grained classification problems that identifies the confounding classes and intervenes in their concept space to reduce errors.	翻訳日:2024-08-27 23:36:49 公開日:2024-08-24
# Time Evidence Fusion Network: 長期時系列予測におけるマルチソースビュー Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting ( http://arxiv.org/abs/2405.06419v2 ) ライセンス: Link先を確認	Tianxiang Zhan, Yuanpeng He, Yong Deng, Zhen Li,	(参考訳) 現実的なシナリオでは、特に大規模なデータセットを扱う場合、時系列予測には適時性が求められる。その結果、モデルアーキテクチャの探索は研究において常に注目される話題となっている。これらの性能要求を満たすため,情報融合の観点から新しいバックボーンを提案する。 エビデンス理論に基づくBasic Probability Assignment (BPA) ModuleとTime Evidence Fusion Network (TEFN)の導入により,優れた性能を実現することができる。一方,マルチソース情報融合の視点は,予測精度を効果的に向上させる。 BPAがファジィ理論によって生成されることから、TEFNもかなりの解釈可能性を持つ。実データ実験では、TEFNはPatchTSTに匹敵する低い誤差で部分的に最先端を達成し、Dlinearのような性能モデルを上回る動作効率を実現した。また、TEFNは、ランダムなハイパーパラメータ選択において、高いロバスト性と小さな誤差変動を示す。 TEFNは、単一の側面で究極を達成するモデルではなく、性能、正確性、安定性、解釈可能性のバランスをとるモデルである。 In practical scenarios, time series forecasting necessitates timeliness, especially when dealing with large datasets. Consequently, the exploration of model architectures remains a perennially trending topic in research. To meet these performance demands, we propose a novel backbone from the perspective of information fusion. Introducing the Basic Probability Assignment (BPA) Module and the Time Evidence Fusion Network (TEFN), based on evidence theory, allows us to achieve superior performance. On the other hand, the perspective of multi-source information fusion effectively improves the accuracy of forecasting. Because BPA is generated by fuzzy theory, TEFN also has considerable interpretability. In real data experiments, TEFN partially achieved state-of-the-art results, with low errors comparable to PatchTST and operating efficiency surpassing that of performance models such as Dlinear. Meanwhile, TEFN has high robustness and small error fluctuations under random hyperparameter selection. TEFN is not a model that achieves the ultimate in a single aspect, but a model that balances performance, accuracy, stability, and interpretability.	翻訳日:2024-08-27 23:36:49 公開日:2024-08-24
# プラケットモデル, セルオートマタおよび測定誘起臨界性 Plaquette Models, Cellular Automata, and Measurement-induced Criticality ( http://arxiv.org/abs/2405.08286v2 ) ライセンス: Link先を確認	Hanchen Liu, Xiao Chen,	(参考訳) 本稿では,プラケット項と呼ばれる多スピン相互作用項が確率$1-p$で単一サイトスピン項に置き換えられる,2次元ランダム化プラケットモデルのクラスを提示する。$p$を変化させることにより、基底状態の相転移、あるいは同値な対称性作用素の相転移を観察する。$p$ が変化するにつれて、対称性作用素は広がった状態から空間的に局在した状態へと変化する。これらのモデルは1+1Dランダム化セルオートマトンダイナミクスと等価に理解することができ、2D転移を1+1D動的吸収相転移と解釈することができる。本稿では,3体あるいは5体の相互作用を持つプラケット項に着目し,転移の普遍性クラスについて検討する。具体的には, 5体相互作用を持つモデルが, 1+1D クリフォード力学で観測される測定誘起エンタングルメント相転移, およびランダムなバルクパウリ測定により誘導される2次元クラスター状態の境界エンタングルメント転移と同じ普遍性クラスに属することを示す。この研究は、古典的なスピンモデル、セルオートマトン、ハイブリッドランダム回路における転移の間の接続を確立する。 We present a class of two-dimensional randomized plaquette models, where the multi-spin interaction term, referred to as the plaquette term, is replaced by a single-site spin term with a probability of $1-p$. By varying $p$, we observe a ground state phase transition, or equivalently, a phase transition of the symmetry operator. We find that as we vary $p$, the symmetry operator changes from being extensive to being localized in space. These models can be equivalently understood as 1+1D randomized cellular automaton dynamics, allowing the 2D transition to be interpreted as a 1+1D dynamical absorbing phase transition. In this paper, our primary focus is on the plaquette term with three or five-body interactions, where we explore the universality classes of the transitions. Specifically, for the model with five-body interaction, we demonstrate that it belongs to the same universality class as the measurement-induced entanglement phase transition observed in 1+1D Clifford dynamics, as well as the boundary entanglement transition of the 2D cluster state induced by random bulk Pauli measurements. This work establishes a connection between transitions in classical spin models, cellular automata, and hybrid random circuits.	翻訳日:2024-08-27 23:27:05 公開日:2024-08-24
# パッチ付き視覚プロンプトインジェクタに対する視覚言語モデルの保護 Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors ( http://arxiv.org/abs/2405.10529v2 ) ライセンス: Link先を確認	Jiachen Sun, Changsheng Wang, Jiongxiao Wang, Yiwei Zhang, Chaowei Xiao,	(参考訳) 大規模言語モデルはますます顕著になり、人工知能の次のフロンティアとしてマルチモーダリティへのシフトを示唆している。視覚言語モデル(VLM)はこの進歩の最前線にあり、視覚とテキストのデータを組み合わせて理解と相互作用を強化する革新的な方法を提供している。しかし、この統合は攻撃面を拡大する。パッチベースの敵攻撃は、既存の多くの文献で示されているように、物理的な視覚応用において最も現実的な脅威モデルと考えられている。本稿では,VLMのターゲットコンテンツを生成するために,相手が相手のパッチを利用するようなパッチ付きビジュアルプロンプトインジェクションを提案する。本研究は, 画素単位のランダム化に対して, パッチを施した対向性刺激が感受性を示すことを明らかにした。この知見を活かして、スムージング技術に根ざした防御機構であるSmoothVLMを導入し、特に、パッチされた視覚的プロンプトインジェクタの脅威からVLMを保護するようにした。我々のフレームワークは、2つの主要なVLMにおいて攻撃成功率を0%から5.0%の範囲に格段に低下させ、67.3%から95.0%のコンテキスト回復を実現し、セキュリティとユーザビリティのバランスを示す。 Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attack is considered the most realistic threat model in physical vision applications, as demonstrated in many existing literature. In this paper, we propose to address patched visual prompt injection, where adversaries exploit adversarial patches to generate target content in VLMs. Our investigation reveals that patched adversarial prompts exhibit sensitivity to pixel-wise randomization, a trait that remains robust even against adaptive attacks designed to counteract such defenses. Leveraging this insight, we introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, specifically tailored to protect VLMs from the threat of patched visual prompt injectors. 
Our framework significantly lowers the attack success rate to a range between 0% and 5.0% on two leading VLMs, while achieving around 67.3% to 95.0% context recovery of the benign images, demonstrating a balance between security and usability.	翻訳日:2024-08-27 23:27:05 公開日:2024-08-24
# 衝突モデルにおける可変非マルコフ力学-コヒーレント輸送への応用 Tunable non-Markovian dynamics in a collision model: an application to coherent transport ( http://arxiv.org/abs/2405.10685v2 ) ライセンス: Link先を確認	Simone Rijavec, Giuseppe Di Pietra,	(参考訳) 非マルコビアン性の異なる環境に結合したシステムの情報力学を解析するための衝突モデルを提案する。量子ビットの固定および剛性貯留層に偏極チャネルを適用することにより、非マルコビアン性の度合いを制御する。偏極チャネルの効果を特徴付けるとともに、3つの相互作用する量子ビットの連鎖上の励起のコヒーレント輸送を研究するためにモデルを適用する。システム-環境結合強度と非マルコビアン性の程度がプロセスにどのように影響するかを示す。興味深いことに、マルコフ環境は励起のコヒーレント輸送を強化するために好まれる場合もある。 We propose a collision model to investigate the information dynamics of a system coupled to an environment with varying degrees of non-Markovianity. We control the degree of non-Markovianity by applying a depolarising channel to a fixed and rigid reservoir of qubits. We characterise the effect of the depolarising channel and apply the model to study the coherent transport of an excitation on a chain of three interacting qubits. We show how the system-environment coupling strength and the degree of non-Markovianity affect the process. Interestingly, in some cases a Markovian environment is preferable to enhance the coherent transport of the excitation.	翻訳日:2024-08-27 23:27:05 公開日:2024-08-24
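The collision model above controls the degree of non-Markovianity by applying a depolarising channel to the reservoir qubits. As a small aid to the reader, this Python sketch applies the standard single-qubit depolarising channel, rho -> (1 - p) * rho + p * Tr(rho) * I / 2, to a 2x2 density matrix stored as nested lists. How the channel strength `p` maps onto the degree of non-Markovianity is specific to the paper and is not reproduced here; the function name and the example states are illustrative assumptions.

```python
def depolarise(rho, p):
    """Apply the single-qubit depolarising channel to a 2x2 density matrix.

    rho -> (1 - p) * rho + p * Tr(rho) * I / 2, with 0 <= p <= 1.
    `rho` is a nested list [[a, b], [c, d]] of (possibly complex) numbers.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    trace = rho[0][0] + rho[1][1]
    return [
        [
            (1.0 - p) * rho[i][j] + (p * trace / 2.0 if i == j else 0.0)
            for j in range(2)
        ]
        for i in range(2)
    ]


if __name__ == "__main__":
    # |+><+| has maximal off-diagonal coherence.
    plus_state = [[0.5, 0.5], [0.5, 0.5]]
    for strength in (0.0, 0.2, 1.0):
        print(strength, depolarise(plus_state, strength))
```

The channel preserves the trace and shrinks the off-diagonal coherences by a factor of (1 - p), so p = 1 sends any input to the maximally mixed state.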
# 実用的なマッハ・ツェンダー干渉計を用いた差動位相シフトQKD Differential-phase-shift QKD with practical Mach-Zehnder interferometer ( http://arxiv.org/abs/2405.11760v2 ) ライセンス: Link先を確認	Akihiro Mizutani, Masanori Terashita, Junya Matsubayashi, Shogo Mori, Ibuki Matsukura, Suzuna Tagawa, Kiyoshi Tamaki,	(参考訳) 微分位相シフト(DPS)量子鍵分布は、コヒーレントパルス列と受動測定ユニットで実現できる単純な実装のため、有望なプロトコルである。 DPSプロトコルを実装するためには、ユーザのデバイスの実用上の不完全性を取り入れたセキュリティ証明を確立することが重要であるが、既存のセキュリティ証明は、マッハ・ツェンダー干渉計を用いた測定ユニットに非現実的な仮定を置いている。本稿では、測定ユニットの主要な不完全性を組み込むことにより、DPSプロトコルの実装セキュリティを強化する。具体的には、既存のセキュリティ証明で仮定されていた透過率が正確に$50\%$のビームスプリッタではなく、透過率の範囲が既知である実用的なビームスプリッタを使用できる。数値シミュレーションにより, 透過率が理想値から$\pm0.5\%$変動する場合でも, 鍵レートの劣化は0.57倍にとどまることが示された。この結果は,実用的な測定装置を用いたDPSプロトコルの実現可能性を示すものである。 Differential-phase-shift (DPS) quantum key distribution stands as a promising protocol due to its simple implementation, which can be realized with a train of coherent pulses and a passive measurement unit. To implement the DPS protocol, it is crucial to establish security proofs incorporating practical imperfections in users' devices; however, existing security proofs make unrealistic assumptions on the measurement unit using a Mach-Zehnder interferometer. In this paper, we enhance the implementation security of the DPS protocol by incorporating a major imperfection in the measurement unit. Specifically, our proof enables us to use practical beam splitters with a known range of the transmittance rather than the one with exactly $50\%$, as was assumed in the existing security proofs. Our numerical simulations demonstrate that even with fluctuations of $\pm0.5\%$ in the transmittance from the ideal value, the key rate degrades only by a factor of 0.57. This result highlights the feasibility of the DPS protocol with practical measurement setups.	翻訳日:2024-08-27 23:27:05 publication placeholder
# ディジタル双生児における生産プロセス最適化のためのスパースアテンション駆動品質予測 Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins ( http://arxiv.org/abs/2405.11895v2 ) ライセンス: Link先を確認	Yanlei Yin, Lihua Wang, Dinh Thai Hoang, Wenbo Wang, Dusit Niyato,	(参考訳) プロセス産業では、生産ラインの長期的かつ効率的な最適化には、生産ラインパラメータを微調整するために、運用状態のリアルタイムモニタリングと分析が必要である。しかし、運用論理の複雑さと生産プロセスパラメータの複雑な結合は、プロセス全体の正確な数学的モデルを開発するのを難しくし、効率的な最適化機構の展開を妨げる。これらの困難を鑑みて、我々は、データ駆動方式で運用ロジックを符号化することで、生産ラインのデジタルツインをデプロイすることを提案する。デジタル双生児における機器運用状況と製品品質指標を反映した実世界のデータを反復的にマッピングすることにより、自己注意型時間畳み込みニューラルネットワークに基づく生産プロセスの品質予測モデルを採用する。このモデルは、デジタルツインのデータ駆動状態の進化を可能にする。デジタルツインは、実際の動作条件の情報と品質に敏感な分析結果を集約する役割を担い、仮想現実性進化によるプロセス生産の最適化を容易にする。ディジタルツインを情報フローキャリアとして活用し、キープロセスインジケータから時間的特徴を抽出し、提案したディープニューラルネットワークに基づく生産プロセス品質予測モデルを確立する。本手法は,本手法により,仮想及び実生産ライン間のシームレスな統合を促進できることを示す。この統合により、平均動作状態予測精度が98%以上、製品品質受け入れ率が96%以上となる。 In the process industry, long-term and efficient optimization of production lines requires real-time monitoring and analysis of operational states to fine-tune production line parameters. However, complexity in operational logic and intricate coupling of production process parameters make it difficult to develop an accurate mathematical model for the entire process, thus hindering the deployment of efficient optimization mechanisms. In view of these difficulties, we propose to deploy a digital twin of the production line by encoding its operational logic in a data-driven approach. By iteratively mapping the real-world data reflecting equipment operation status and product quality indicators in the digital twin, we adopt a quality prediction model for production process based on self-attention-enabled temporal convolutional neural networks. This model enables the data-driven state evolution of the digital twin. 
The digital twin takes a role of aggregating the information of actual operating conditions and the results of quality-sensitive analysis, which facilitates the optimization of process production with virtual-reality evolution. Leveraging the digital twin as an information-flow carrier, we extract temporal features from key process indicators and establish a production process quality prediction model based on the proposed deep neural network. Our operation experiments on a specific tobacco shredding line demonstrate that the proposed digital twin-based production process optimization method fosters seamless integration between virtual and real production lines. This integration achieves an average operating status prediction accuracy of over 98% and a product quality acceptance rate of over 96%.	翻訳日:2024-08-27 23:27:05 公開日:2024-08-24
# 非決定論的因果モデル Nondeterministic Causal Models ( http://arxiv.org/abs/2405.14001v2 ) ライセンス: Link先を確認	Sander Beckers,	(参考訳) 非巡回的な決定論的構造方程式モデルを非決定論的な場合に一般化し、それが反事実に対して改良された意味論を与えると主張する。Halpernが(Galles & Pearlの最初の提案に基づいて)開発した標準的な決定論的意味論は、親変数への値の各割り当てに対して子変数への割り当てが一意に存在すると仮定し、さらに、現実世界(モデルのすべての変数への値の割り当て)が各介入に対して一意の反事実世界を定めると仮定する。どちらの仮定も非現実的であるため、本提案では両方を取り除く。これは、構造方程式に多価関数を許すことで行う。加えて、現実世界で成り立っていた方程式の解が、あらゆる反事実世界でも保存されるように意味論を調整する。得られる論理の健全かつ完全な公理化を与え、Halpernによる標準的な論理や、我々の提案により近い最近の提案と比較する。最後に、モデルを確率的な場合に拡張し、因果ベイジアンネットワークにおいても反事実を識別する道が開かれることを示す。 We generalize acyclic deterministic structural equation models to the nondeterministic case and argue that it offers an improved semantics for counterfactuals. The standard, deterministic, semantics developed by Halpern (and based on the initial proposal of Galles & Pearl) assumes that for each assignment of values to parent variables there is a unique assignment to their child variable, and it assumes that the actual world (an assignment of values to all variables of a model) specifies a unique counterfactual world for each intervention. Both assumptions are unrealistic, and therefore we drop both of them in our proposal. We do so by allowing multi-valued functions in the structural equations. In addition, we adjust the semantics so that the solutions to the equations that obtained in the actual world are preserved in any counterfactual world. We provide a sound and complete axiomatization of the resulting logic and compare it to the standard one by Halpern and to more recent proposals that are closer to ours. Finally, we extend our models to the probabilistic case and show that they open up the way to identifying counterfactuals even in Causal Bayesian Networks.	翻訳日:2024-08-27 23:27:05 公開日:2024-08-24
# Retro: 自己教師付き学習による軽量モデルへの効率的な埋め込み蒸留のための教師プロジェクションヘッドの再利用 Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning ( http://arxiv.org/abs/2405.15311v3 ) ライセンス: Link先を確認	Khanh-Binh Nguyen, Chae Jung Park,	(参考訳) 自己教師付き学習(SSL)は、大量のラベルなしデータから効果的な表現を学習できることから注目を集めている。軽量モデルは、コントラストおよび一貫性の制約を用いて、より大規模な自己教師付き事前学習モデルから蒸留できる。しかし、プロジェクションヘッドのサイズが異なるため、生徒モデルが教師の埋め込みを正確に模倣することは難しい。本稿では、教師のプロジェクションヘッドを生徒に再利用する\textsc{Retro}を提案し、実験結果から、すべての軽量モデルにおいて最先端手法に対する大幅な改善を示す。例えば、ResNet-50/101/152を教師としてEfficientNet-B0を訓練した場合、ImageNetでの線形結果はそれぞれ$66.9\%$、$69.3\%$、$69.8\%$に改善され、パラメータ数も大幅に少ない。 Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations with large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. Still, the different sizes of the projection heads make it challenging for students to mimic the teacher's embedding accurately. We propose \textsc{Retro}, which reuses the teacher's projection head for students, and our experimental results demonstrate significant improvements over the state-of-the-art on all lightweight models. For instance, when training EfficientNet-B0 using ResNet-50/101/152 as teachers, our approach improves the linear result on ImageNet to $66.9\%$, $69.3\%$, and $69.8\%$, respectively, with significantly fewer parameters.	翻訳日:2024-08-27 23:27:05 公開日:2024-08-24
# 強化サイテーションバイアスを用いた大規模言語モデルによる人間のクエンテーションパターンの反映 Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias ( http://arxiv.org/abs/2405.15739v3 ) ライセンス: Link先を確認	Andres Algaba, Carmen Mazijn, Vincent Holst, Floriano Tori, Sylvia Wenmackers, Vincent Ginis,	(参考訳) サイテーションの実践は科学的知識の構造を形成するのに不可欠であるが、それらは現代の規範や偏見の影響を受けていることが多い。 LLM(Large Language Models)の出現は、これらのプラクティスに新たなダイナミクスをもたらす。興味深いことに、LLMが推奨する参照の特徴と潜在的なバイアスは、そのパラメトリックな知識に完全に依存しており、検索や検索強化世代に依存していない。本稿では,これらの特徴を,GPT-4の知識遮断日後に公表されたAAAI,NeurIPS,ICML,ICLRのデータセットを用いて解析する。本実験では,これらの論文の中で,匿名化された文中の引用を学術的に参照することを提案する。以上の結果より, 出版年, タイトル長, 著者数, 会場数に比して, 高い引用バイアスが持続することが明らかとなった。 GPT-4と、より有能なモデルであるGPT-4oとClaude 3.5の両方で、論文はトレーニングデータの一部である。さらに, LLMの既存の参照と存在しない参照の特徴との間に大きな一貫性が見られ, モデルが励起パターンを内部化していることが示唆された。引用グラフを解析することにより、推奨される参照が関連する引用コンテキストに埋め込まれていることを示し、引用ネットワークのより深い概念的内部化を示唆する。 LLMは引用生成に役立つが、マシュー効果のような既存のバイアスを増幅し、新しいバイアスを導入し、科学的知識の拡散を引き起こす可能性がある。 Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of Large Language Models (LLMs) introduces a new dynamic to these practices. Interestingly, the characteristics and potential biases of references recommended by LLMs that entirely rely on their parametric knowledge, and not on search or retrieval-augmented generation, remain unexplored. Here, we analyze these characteristics in an experiment using a dataset from AAAI, NeurIPS, ICML, and ICLR, published after GPT-4's knowledge cut-off date. In our experiment, LLMs are tasked with suggesting scholarly references for the anonymized in-text citations within these papers. Our findings reveal a remarkable similarity between human and LLM citation patterns, but with a more pronounced high citation bias, which persists even after controlling for publication year, title length, number of authors, and venue. 
The results hold for both GPT-4, and the more capable models GPT-4o and Claude 3.5 where the papers are part of the training data. Additionally, we observe a large consistency between the characteristics of LLM's existing and non-existent generated references, indicating the model's internalization of citation patterns. By analyzing citation graphs, we show that the references recommended are embedded in the relevant citation context, suggesting an even deeper conceptual internalization of the citation networks. While LLMs can aid in citation generation, they may also amplify existing biases, such as the Matthew effect, and introduce new ones, potentially skewing scientific knowledge dissemination.	翻訳日:2024-08-27 23:27:05 公開日:2024-08-24
# 摂動フォージェリによる逆データ検出 Detecting Adversarial Data via Perturbation Forgery ( http://arxiv.org/abs/2405.16226v2 ) ライセンス: Link先を確認	Qian Wang, Chen Li, Yuchen Luo, Hefei Ling, Ping Li, Jiazhong Chen, Shijuan Huang, Ning Yu,	(参考訳) 敵対的攻撃に対する防御戦略として、敵対的検出は、自然・敵対的データ間の分布の相違とノイズパターンに基づいて、データフローから敵対的データを識別・フィルタリングすることを目的としている。従来の検出手法は勾配に基づく対向攻撃の検出では高い性能を示すが,不均衡および異方性雑音パターンを回避した生成モデルに基づく新たな攻撃は回避される。さらに悪いことに、既存のテクニックは、防衛を展開する前に攻撃データへのアクセスを必要とするか、推論にかなりの時間的コストを要し、防御者が目にしない新たな攻撃を防御するためには実用的ではない。本稿では, 対向雑音分布間の近接関係について検討し, 開放被覆の存在を実証する。このオープンカバーと自然データの分布を区別することで、あらゆる種類の敵攻撃に対して強力な一般化能力を持つ検出器を開発することができる。この知見に基づいて,ノイズ分布の摂動,スパースマスク生成,擬似逆数データ生成を含む摂動フォージェリを提案し,特定のモデルに依存せず,未知の勾配ベース,生成モデルベース,物理的逆数攻撃を検出可能な逆数検出器を訓練する。複数の汎用的および顔的データセットに対して行われた総合的な実験は、幅広い攻撃範囲で、我々の手法の強力な一般化を検証した。 As a defense strategy against adversarial attacks, adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data. Although previous detection methods achieve high performance in detecting gradient-based adversarial attacks, new attacks based on generative models with imbalanced and anisotropic noise patterns evade detection. Even worse, existing techniques either necessitate access to attack data before deploying a defense or incur a significant time cost for inference, rendering them impractical for defending against newly emerging attacks that are unseen by defenders. In this paper, we explore the proximity relationship between adversarial noise distributions and demonstrate the existence of an open covering for them. By learning to distinguish this open covering from the distribution of natural data, we can develop a detector with strong generalization capabilities against all types of adversarial attacks. 
Based on this insight, we heuristically propose Perturbation Forgery, which includes noise distribution perturbation, sparse mask generation, and pseudo-adversarial data production, to train an adversarial detector capable of detecting unseen gradient-based, generative-model-based, and physical adversarial attacks, while remaining agnostic to any specific models. Comprehensive experiments conducted on multiple general and facial datasets, with a wide spectrum of attacks, validate the strong generalization of our method.	翻訳日:2024-08-27 23:27:05 公開日:2024-08-24
# ポリアディック超対称性 Polyadic supersymmetry ( http://arxiv.org/abs/2406.02188v2 ) ライセンス: Link先を確認	Steven Duplij,	(参考訳) 一次元超対称量子力学のトイモデルに(著者が提案した)ポリアディック化の手続きを適用することで、超対称性のポリアディック類似を導入する。スーパーチャージは、以前の研究で定義された$n$項シグマ行列を用いてポリアディックなものに一般化される。このようにして、スーパーチャージとハミルトニアンのポリアディック類似は巡回シフトブロック行列の形をとり、$N$拡張および多重次数付きSQMとは異なる方法で多重縮退した量子状態を記述できる。対応する超対称性を$n$項リー超代数($n$は初期の結合的乗法のアリティ)として構成する過程で、$2\leq m<n$の縮小されたアリティをもつ新たな括弧と、関連する一連の$m$項超代数(二項超代数では不可能である)を見出した。縮小されたアリティ$m$が偶数の場合は(微分作用素として)高階の偶ハミルトニアンの塔が得られ、$m$が奇数の場合は高階の奇スーパーチャージの塔が得られ、対応する代数は奇セクターのみからなる。 We introduce a polyadic analog of supersymmetry by considering the polyadization procedure (proposed by the author) applied to the toy model of one-dimensional supersymmetric quantum mechanics. The supercharges are generalized to polyadic ones using the $n$-ary sigma matrices defined in earlier work. In this way, polyadic analogs of supercharges and Hamiltonians take the cyclic shift block matrix form, and they can describe multidegenerated quantum states in a way that is different from the $N$-extended and multigraded SQM. While constructing the corresponding supersymmetry as an $n$-ary Lie superalgebra ($n$ is the arity of the initial associative multiplication), we have found new brackets with a reduced arity of $2\leq m<n$ and a related series of $m$-ary superalgebras (which is impossible for binary superalgebras). In the case of even reduced arity $m$ we obtain a tower of higher order (as differential operators) even Hamiltonians, while for $m$ odd we get a tower of higher order odd supercharges, and the corresponding algebra consists of the odd sector only.	翻訳日:2024-08-27 23:17:21 公開日:2024-08-24
# 語彙データ分類のためのファジィ畳み込みニューラルネットワーク Fuzzy Convolution Neural Networks for Tabular Data Classification ( http://arxiv.org/abs/2406.03506v4 ) ライセンス: Link先を確認	Arun D. Kulkarni,	(参考訳) 近年、畳み込みニューラルネットワーク(CNN)は、特に画像やテキストの分類タスクにおいて、様々な領域における顕著な性能のために、多くの注目を集めている。しかし、表形式のデータ分類への応用はいまだ未定である。バイオインフォマティクス、ファイナンス、非画像データが一般的である医療など、多くの分野がある。非画像データの分類にCNNを適用することは、依然として非常に困難である。本稿では,従来の機械学習手法と深層学習手法のギャップを埋めることを目的として,表層データ分類におけるCNNの有効性について検討する。本稿では,特徴ベクトル内の局所パターンを捉えるための表データに適した,ファジィ畳み込みニューラルネットワーク(FCNN)を提案する。提案手法では,特徴値をファジィメンバシップにマップする。ファジィメンバシップベクトルは、CNNモデルのトレーニングに使用される画像に変換される。訓練されたCNNモデルは未知の機能ベクトルを分類するために使用される。提案手法を検証するために,6つの複雑なノイズデータセットを生成した。各データセットからランダムに70パーセントのサンプルをトレーニングに使用し、30%をテストに使用しました。データセットはまた、決定木(DT)、サポートベクターマシン(SVM)、ファジィニューラルネットワーク(FNN)、ベイズ分類器、ランダムフォレスト(RF)といった最先端の機械学習アルゴリズムを使用して分類された。実験結果から,提案手法は従来の手法と比較して,有意な表現を表象データから効果的に学習し,競争力や優れた性能を達成できることが示唆された。全体として、提案したFCNNモデルは、表型データ分類タスクの代替として有望であり、構造化データ分析におけるディープラーニングを活用する新たな機会を、新たな期待と潜在的に解放する可能性を示唆している。 Recently, convolution neural networks (CNNs) have attracted a great deal of attention due to their remarkable performance in various domains, particularly in image and text classification tasks. However, their application to tabular data classification remains underexplored. There are many fields such as bioinformatics, finance, medicine where nonimage data are prevalent. Adaption of CNNs to classify nonimage data remains highly challenging. This paper investigates the efficacy of CNNs for tabular data classification, aiming to bridge the gap between traditional machine learning approaches and deep learning techniques. We propose a novel framework fuzzy convolution neural network (FCNN) tailored specifically for tabular data to capture local patterns within feature vectors. In our approach, we map feature values to fuzzy memberships. The fuzzy membership vectors are converted into images that are used to train the CNN model. The trained CNN model is used to classify unknown feature vectors. 
To validate our approach, we generated six complex noisy data sets. We used randomly selected seventy percent samples from each data set for training and thirty percent for testing. The data sets were also classified using the state-of-the-art machine learning algorithms such as the decision tree (DT), support vector machine (SVM), fuzzy neural network (FNN), Bayes classifier, and Random Forest (RF). Experimental results demonstrate that our proposed model can effectively learn meaningful representations from tabular data, achieving competitive or superior performance compared to existing methods. Overall, our finding suggests that the proposed FCNN model holds promise as a viable alternative for tabular data classification tasks, offering a fresh prospective and potentially unlocking new opportunities for leveraging deep learning in structured data analysis.	翻訳日:2024-08-27 23:17:21 公開日:2024-08-24
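The FCNN abstract above says only that feature values are mapped to fuzzy memberships and that the membership vectors are turned into images for a CNN. As a rough illustration of that first step, the following sketch assumes three triangular fuzzy sets (low, medium, high) on min-max scaled features and one image row per feature. The function names, the number of sets, and the layout are all hypothetical choices made here, not details taken from the paper.

```python
def memberships(x, centers=(0.0, 0.5, 1.0), width=0.5):
    """Triangular membership of a scaled value x in [0, 1] for each fuzzy set.

    Assumption: three evenly spaced sets; the paper's actual membership
    functions are not given in the abstract.
    """
    return [max(0.0, 1.0 - abs(x - c) / width) for c in centers]


def to_membership_image(features, lows, highs):
    """Build a 2-D grid (one row per feature, one column per fuzzy set).

    lows/highs are per-feature minimum and maximum values used for scaling;
    values outside the range are clipped to [0, 1].
    """
    rows = []
    for value, lo, hi in zip(features, lows, highs):
        scaled = (value - lo) / (hi - lo)
        scaled = min(1.0, max(0.0, scaled))
        rows.append(memberships(scaled))
    return rows


if __name__ == "__main__":
    # Two features with ranges [0, 10] and [0, 4].
    for row in to_membership_image([2.5, 3.0], [0.0, 0.0], [10.0, 4.0]):
        print(row)
```

A grid like this could be fed to a small convolutional network as a single-channel image; how the paper arranges the memberships spatially is not stated in the abstract.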
# 有限サイズ効果を考慮した量子相対エントロピーの測定 Measuring quantum relative entropy with finite-size effect ( http://arxiv.org/abs/2406.17299v2 ) ライセンス: Link先を確認	Masahito Hayashi,	(参考訳) $\sigma$が既知である場合の相対エントロピー$D(\rho\|\sigma)$の推定を研究する。Cram\'{e}r-Rao型の限界が相対バレントロピー(relative varentropy)に等しいことを示す。提案する推定器は、次元$d$が固定されているときにCram\'{e}r-Rao型の限界を達成する。また、次元$d$が増加する場合には、サンプル複雑性$O(d^2)$を達成する。このサンプル複雑性は、$\sigma$が完全混合状態のときに最適である。さらに、時間計算量は$O(d^6 \polylog d)$である。提案する推定器は、両方の設定のもとで統一的に動作する。 We study the estimation of relative entropy $D(\rho\|\sigma)$ when $\sigma$ is known. We show that the Cram\'{e}r-Rao type bound equals the relative varentropy. Our estimator attains the Cram\'{e}r-Rao type bound when the dimension $d$ is fixed. It also achieves the sample complexity $O(d^2)$ when the dimension $d$ increases. This sample complexity is optimal when $\sigma$ is the completely mixed state. Also, it has time complexity $O(d^6 \polylog d)$. Our proposed estimator unifiedly works under both settings.	翻訳日:2024-08-27 22:57:33 公開日:2024-08-24
# データセンターのための不確実性を考慮した脱炭素化 Uncertainty-Aware Decarbonization for Datacenters ( http://arxiv.org/abs/2407.02390v2 ) ライセンス: Link先を確認	Amy Li, Sihang Liu, Yi Ding,	(参考訳) 本論文は、データセンターの脱炭素化に向けた炭素強度予測の不確実性を定量化する初めての試みである。時間的および空間的という2種類の不確実性を特定・分析し、それらのシステム上の含意を議論する。炭素強度予測の不確実性定量化における時間的ダイナミクスに対処するため、共形予測に基づくフレームワークを導入する。評価の結果、本手法はさまざまな有意水準において、不確実性定量化の目標カバレッジを頑健に達成することが示された。実運用の電力トレースを用いて、時間的負荷シフトと空間的負荷シフトにそれぞれ焦点を当てた2つのケーススタディを行う。その結果、スケジューリングの決定に不確実性を組み込むことで、炭素排出量のそれぞれ5%および14%の増加を防げることが示された。これらの割合は、20MWのデータセンタークラスタにおいて、絶対量で2.1トンおよび10.4トンの炭素排出削減に相当する。 This paper represents the first effort to quantify uncertainty in carbon intensity forecasting for datacenter decarbonization. We identify and analyze two types of uncertainty -- temporal and spatial -- and discuss their system implications. To address the temporal dynamics in quantifying uncertainty for carbon intensity forecasting, we introduce a conformal prediction-based framework. Evaluation results show that our technique robustly achieves target coverages in uncertainty quantification across various significance levels. We conduct two case studies using production power traces, focusing on temporal and spatial load shifting respectively. The results show that incorporating uncertainty into scheduling decisions can prevent a 5% and 14% increase in carbon emissions, respectively. These percentages translate to an absolute reduction of 2.1 and 10.4 tons of carbon emissions in a 20 MW datacenter cluster.	翻訳日:2024-08-27 22:57:33 公開日:2024-08-24
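The abstract above names a conformal prediction-based framework but gives no details. For orientation only, here is a minimal sketch of plain split conformal prediction applied to a point forecast of carbon intensity. It is the textbook baseline, not the paper's method (which targets temporal dynamics), and the function names and the absolute-error score are assumptions made for this illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, there are too few calibration points to
    guarantee the coverage, so the interval must be unbounded.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_interval(point_forecast, calib_forecasts, calib_actuals, alpha):
    """Symmetric interval around a forecast with target coverage 1 - alpha.

    Scores are absolute forecast errors on a held-out calibration set
    (an assumption; other nonconformity scores are possible).
    """
    scores = [abs(a - f) for f, a in zip(calib_forecasts, calib_actuals)]
    q = conformal_quantile(scores, alpha)
    return point_forecast - q, point_forecast + q


if __name__ == "__main__":
    # Hypothetical calibration data in gCO2/kWh.
    forecasts = [100.0, 200.0, 300.0]
    actuals = [110.0, 170.0, 320.0]
    print(prediction_interval(250.0, forecasts, actuals, alpha=0.5))
```

A scheduler could then compare the upper ends of such intervals across time slots or regions instead of the point forecasts alone, which is one way uncertainty can enter a load-shifting decision.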
# マルチ話者・ターゲット話者の統合音声認識システムとしてのWhisperの強化 Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System ( http://arxiv.org/abs/2407.09817v2 ) ライセンス: Link先を確認	Lingwei Meng, Jiawen Kang, Yuejiao Wang, Zengrui Jin, Xixin Wu, Xunying Liu, Helen Meng,	(参考訳) マルチ話者音声認識とターゲット話者音声認識は、いずれも複数話者の状況での書き起こしを伴い、依然として大きな課題である。しかし、既存の手法が両方のタスクに同時に取り組むことはまれである。本研究では、音声基盤モデルであるWhisperを、マルチ話者およびターゲット話者の音声認識タスクに統合的に対応させる先駆的な手法を提案する。具体的には、(i) Whisperを凍結し、そのエンコーダにSidecarセパレータを挿入して、複数話者の混合埋め込みを分離する。(ii) ターゲット話者識別器(Target Talker Identifier)を導入し、手がかりとして3秒の登録音声のみを用いて、ターゲット話者の埋め込みの流れをその場で識別する。(iii) タスク適応性を高めるため、デコーダのソフトプロンプトチューニングを検討する。提案手法は、2話者および3話者のLibriMixとLibriSpeechMixデータセットにおいて両タスクで従来手法を上回り、AishellMix中国語(Mandarin)データセットにおけるマルチ話者ASRでも許容できるゼロショット性能を示す。 Multi-talker speech recognition and target-talker speech recognition, both involve transcription in multi-talker contexts, remain significant challenges. However, existing methods rarely attempt to simultaneously address both tasks. In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recognition tasks. Specifically, (i) we freeze Whisper and plug a Sidecar separator into its encoder to separate mixed embedding for multiple talkers; (ii) a Target Talker Identifier is introduced to identify the embedding flow of the target talker on the fly, requiring only three-second enrollment speech as a cue; (iii) soft prompt tuning for decoder is explored for better task adaptation. Our method outperforms previous methods on two- and three-talker LibriMix and LibriSpeechMix datasets for both tasks, and delivers acceptable zero-shot performance on multi-talker ASR on AishellMix Mandarin dataset.	翻訳日:2024-08-27 22:47:47 公開日:2024-08-24
# サルカスム検出は大規模言語モデルにおけるステップバイステップ推論プロセスか? Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? ( http://arxiv.org/abs/2407.12725v2 ) ライセンス: Link先を確認	Ben Yao, Yazhou Zhang, Qiuchi Li, Jing Qin,	(参考訳) 一連の中間推論ステップを共同作業することで、LLMを逐次的に考えさせるような複雑な問題を解くための大きな言語モデル(LLM)の能力が大幅に向上する。しかしながら、人間の皮肉理解は直感的で全体論的認知過程と見なされ、様々な言語的、文脈的、感情的な手がかりが統合され、必ずしもステップバイステップのやり方に従わないような包括的理解を形成する。本論の妥当性を検証するために,4つのサブメソッド,Viz. chain of contradiction (CoC), Graph of cues (GoC), bagging of cues (BoC), tensor of cues (ToC) を含む新たなプロンプトフレームワーク(SarcasmCue)を導入する。 1) CoC と GoC は GPT-4 や Claude 3.5 といったより高度なモデルで優れた性能を示し,3.5% の改善を実現した。 2)ToCはLLMが小さく評価された場合,F1スコアが最良基準値に対して29.7%向上するなど,他の手法よりも優れていた。 (3)提案したフレームワークは、4つのデータセットでF1スコアの4.2%、2.0%、29.7%、58.2%を継続的に最先端(ToT)にプッシュします。これは提案したフレームワークの有効性と安定性を示している。 Elaborating a series of intermediate reasoning steps significantly improves the ability of large language models (LLMs) to solve complex problems, as such steps would evoke LLMs to think sequentially. However, human sarcasm understanding is often considered an intuitive and holistic cognitive process, in which various linguistic, contextual, and emotional cues are integrated to form a comprehensive understanding, in a way that does not necessarily follow a step-by-step fashion. To verify the validity of this argument, we introduce a new prompting framework (called SarcasmCue) containing four sub-methods, viz. chain of contradiction (CoC), graph of cues (GoC), bagging of cues (BoC) and tensor of cues (ToC), which elicits LLMs to detect human sarcasm by considering sequential and non-sequential prompting methods. Through a comprehensive empirical comparison on four benchmarks, we highlight three key findings: (1) CoC and GoC show superior performance with more advanced models like GPT-4 and Claude 3.5, with an improvement of 3.5%. (2) ToC significantly outperforms other methods when smaller LLMs are evaluated, boosting the F1 score by 29.7% over the best baseline. 
(3) Our proposed framework consistently pushes the state-of-the-art (i.e., ToT) by 4.2%, 2.0%, 29.7%, and 58.2% in F1 scores across four datasets. This demonstrates the effectiveness and stability of the proposed framework.	翻訳日:2024-08-27 22:47:47 公開日:2024-08-24
# PriPL-Tree: 局所差分プライバシー下での任意分布に対する正確な範囲クエリ PriPL-Tree: Accurate Range Query for Arbitrary Distribution under Local Differential Privacy ( http://arxiv.org/abs/2407.13532v2 ) ライセンス: Link先を確認	Leixia Wang, Qingqing Ye, Haibo Hu, Xiaofeng Meng,	(参考訳) 局所差分プライバシー(LDP)のもとでの範囲クエリへの応答は、オンライン分析処理(OLAP)において広く研究されている問題である。既存のLDPソリューションはいずれも、各ドメイン分割内でデータが一様に分布していると仮定しているが、これはデータ分布が多様な現実のシナリオと一致しない可能性があり、不正確な推定をもたらす。この問題に対処するため、階層的な木構造と区分線形(PL)関数を組み合わせて任意の分布に対する範囲クエリに応答する、新しいデータ構造PriPL-Treeを導入する。PriPL-Treeは、少数の線分で基礎となるデータ分布を精密にモデル化し、範囲クエリに対してより正確な結果をもたらす。さらに、新しいデータ認識型の適応グリッドを用いて、これを多次元の場合に拡張する。これらのグリッドは、PriPL-Treeから得られる周辺分布の知見を利用して適応的に分割され、基礎となる分布の密度に適応する。実データと合成データの両方を用いた広範な実験により、任意のデータ分布にわたる範囲クエリへの応答において、最先端のソリューションに対するPriPL-Treeの有効性と優位性を示す。 Answering range queries in the context of Local Differential Privacy (LDP) is a widely studied problem in Online Analytical Processing (OLAP). Existing LDP solutions all assume a uniform data distribution within each domain partition, which may not align with real-world scenarios where data distribution is varied, resulting in inaccurate estimates. To address this problem, we introduce PriPL-Tree, a novel data structure that combines hierarchical tree structures with piecewise linear (PL) functions to answer range queries for arbitrary distributions. PriPL-Tree precisely models the underlying data distribution with a few line segments, leading to more accurate results for range queries. Furthermore, we extend it to multi-dimensional cases with novel data-aware adaptive grids. These grids leverage the insights from marginal distributions obtained through PriPL-Trees to partition the grids adaptively, adapting the density of underlying distributions. Our extensive experiments on both real and synthetic datasets demonstrate the effectiveness and superiority of PriPL-Tree over state-of-the-art solutions in answering range queries across arbitrary data distributions.	翻訳日:2024-08-27 22:47:47 公開日:2024-08-24
# トレーディング・デビル・ファイナル:株式市場によるバックドア攻撃とベイズ最適化 Trading Devil Final: Backdoor attack via Stock market and Bayesian Optimization ( http://arxiv.org/abs/2407.14573v4 ) ライセンス: Link先を確認	Orson Mengara,	(参考訳) 生成人工知能の出現以来、あらゆる企業や研究者が、商業的であろうとなかろうと、独自の生成モデルの開発を急いできた。これらの強力な新ツールのユーザ数を考えると、LLM(大規模言語モデル)が学習した時に何が起こるかを説明するための、本質的に検証可能な方法は今のところありません。例えば,Webから収集した膨大な量のデータに頼って高速かつ効率的な結果を得る自動音声認識システムでは,音響データ中毒に基づくMarketBackFinal 2.0と呼ばれるバックドアアタックが開発され,MarketBackFinal 2.0は主に現代の株式市場モデルに基づいている。 LLMに依存する可能性のある音声ベースのトランスフォーマーの脆弱性を示す。 Since the advent of generative artificial intelligence, every company and researcher has been rushing to develop their own generative models, whether commercial or not. Given the large number of users of these powerful new tools, there is currently no intrinsically verifiable way to explain from the ground up what happens when LLMs (large language models) learn. For example, those based on automatic speech recognition systems, which have to rely on huge and astronomical amounts of data collected from all over the web to produce fast and efficient results, In this article, we develop a backdoor attack called MarketBackFinal 2.0, based on acoustic data poisoning, MarketBackFinal 2.0 is mainly based on modern stock market models. In order to show the possible vulnerabilities of speech-based transformers that may rely on LLMs.	翻訳日:2024-08-27 22:47:47 公開日:2024-08-24
# ディープラーニングによるブラック・ショールズのデルタヘッジの強化 Enhancing Black-Scholes Delta Hedging via Deep Learning ( http://arxiv.org/abs/2407.19367v2 ) ライセンス: Link先を確認	Chunhui Qiao, Xiangwei Wan,	(参考訳) 本稿では、ニューラルネットワークを用いてヘッジ関数とインプライド・ブラック・ショールズ・デルタとの残差を学習する、オプション向けのディープデルタヘッジの枠組みを提案する。このアプローチは、残差がより滑らかであるという性質を活用し、ディープラーニングの性能を高める。10年間の日次S&P 500指数オプションデータを用いた実証分析により、1ステップのヘッジ誤差の平均二乗を損失関数として残差を学習すると、ヘッジ関数を直接学習する場合に比べてヘッジ性能が大幅に、多くの場合100%以上向上することを示す。残差の学習時に入力特徴量を追加すると、コールよりもプットでヘッジ性能が大きく向上し、市場センチメントの重要性は比較的低い。さらに、3年間のデータで残差を学習した場合のヘッジ性能は、10年間のデータで直接学習した場合に匹敵し、本手法が必要とするデータ量が少ないことを示している。 This paper proposes a deep delta hedging framework for options, utilizing neural networks to learn the residuals between the hedging function and the implied Black-Scholes delta. This approach leverages the smoother properties of these residuals, enhancing deep learning performance. Utilizing ten years of daily S&P 500 index option data, our empirical analysis demonstrates that learning the residuals, using the mean squared one-step hedging error as the loss function, significantly improves hedging performance over directly learning the hedging function, often by more than 100%. Adding input features when learning the residuals enhances hedging performance more for puts than calls, with market sentiment being less crucial. Furthermore, learning the residuals with three years of data matches the hedging performance of directly learning with ten years of data, proving that our method demands less data.	翻訳日:2024-08-27 20:50:26 公開日:2024-08-24
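To make the quantities in the abstract above concrete, the sketch below computes the Black-Scholes delta of a European option without dividends and the one-step hedging error that a learned residual would be trained to reduce. The function names and the sign convention of the error are choices made for this illustration; the neural network itself is left out, and the residual appears only as a plain number.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_delta(spot, strike, rate, vol, maturity, is_call=True):
    """Black-Scholes delta, assuming no dividends.

    vol is the (implied) volatility and maturity is in years.
    """
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (
        vol * math.sqrt(maturity)
    )
    call_delta = norm_cdf(d1)
    return call_delta if is_call else call_delta - 1.0


def hedge_ratio(delta_bs, residual):
    """Hedge ratio = Black-Scholes delta plus a learned correction."""
    return delta_bs + residual


def one_step_hedging_error(ratio, d_spot, d_option):
    """Change in option value not offset by holding `ratio` units of the underlying."""
    return d_option - ratio * d_spot


if __name__ == "__main__":
    delta = bs_delta(spot=100.0, strike=100.0, rate=0.0, vol=0.2, maturity=1.0)
    ratio = hedge_ratio(delta, residual=-0.02)  # residual value is made up
    print(delta, one_step_hedging_error(ratio, d_spot=1.5, d_option=0.8))
```

Averaging the squared error over many (d_spot, d_option) pairs gives the mean squared one-step hedging error that the abstract names as the loss function.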
# TVDiag:マルチモーダルデータを用いたタスク指向・ビュー不変の故障診断フレームワーク TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data ( http://arxiv.org/abs/2407.19711v2 ) ライセンス: Link先を確認	Shuaiyu Xie, Jian Wang, Hanbin He, Zhihao Wang, Yuqi Zhao, Neng Zhang, Bing Li,	(参考訳) マイクロサービスベースのシステムは、複雑なインタラクションとスケールの拡大によって、信頼性上の問題に悩まされることが多い。観測可能性技術の急速な成長に伴い、ログやメトリクス、トレースといった多様なモニタリングデータを活用することにより、根本原因のローカライゼーションや障害タイプ識別など、さまざまな障害診断を実現する方法が提案されている。しかし、単一モーダルデータを使用する従来の障害診断手法では、制限された情報のため、すべての障害シナリオをほとんどカバーできない。近年,深層学習に基づくマルチモーダルデータ統合のための故障診断手法が提案されている。しかしながら、これらの手法は、特定のモダリティと異なる診断タスクとの関係を無視して、非差別的にモダリティを結合し、障害診断においてそれらを等しく扱う傾向にある。この監視は、各モダリティが提供するユニークな利点の有効利用を妨げる。この制限に対処するため、我々は、マイクロサービスベースのシステムにおいて、犯人のマイクロサービスインスタンスを特定し、それらの障害タイプ(Net-packets Corruptionなど)を特定するためのマルチモーダルな障害診断フレームワークである、‘textit{TVDiag}’を提案する。 \textit{TVDiag} はタスク指向学習を用いて各モダリティの潜在的な優位性を高め、対照的な学習に基づくクロスモーダルなアソシエーションを確立し、ビュー不変の障害情報を抽出する。さらに、トレーニング中の通常のマイクロサービスインスタンスの可観測性をランダムに不活性化するグラフレベルのデータ拡張戦略を開発し、トレーニングデータの不足を軽減する。実験結果によると、‘textit{TVDiag} はマルチモーダル故障診断における最先端の手法よりも優れており、2つのデータセットで F1スコアが4.08 %以上上昇し、少なくとも55.94 %高いHR@1$精度を達成した。 Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. With the rapid growth of observability techniques, various methods have been proposed to achieve failure diagnosis, including root cause localization and failure type identification, by leveraging diverse monitoring data such as logs, metrics, or traces. However, traditional failure diagnosis methods that use single-modal data can hardly cover all failure scenarios due to the restricted information. Several failure diagnosis methods have been recently proposed to integrate multimodal data based on deep learning. These methods, however, tend to combine modalities indiscriminately and treat them equally in failure diagnosis, ignoring the relationship between specific modalities and different diagnostic tasks. 
This oversight hinders the effective utilization of the unique advantages offered by each modality. To address the limitation, we propose \textit{TVDiag}, a multimodal failure diagnosis framework for locating culprit microservice instances and identifying their failure types (e.g., Net-packets Corruption) in microservice-based systems. \textit{TVDiag} employs task-oriented learning to enhance the potential advantages of each modality and establishes cross-modal associations based on contrastive learning to extract view-invariant failure information. Furthermore, we develop a graph-level data augmentation strategy that randomly inactivates the observability of some normal microservice instances during training to mitigate the shortage of training data. Experimental results show that \textit{TVDiag} outperforms state-of-the-art methods in multimodal failure diagnosis, achieving at least a 55.94\% higher $HR@1$ accuracy and over a 4.08\% increase in F1-score across two datasets.	翻訳日:2024-08-27 20:50:26 公開日:2024-08-24
# 拡散フィードバックがCLIPの視覚改善に役立つ Diffusion Feedback Helps CLIP See Better ( http://arxiv.org/abs/2407.20171v4 ) ライセンス: Link先を確認	Wenxuan Wang, Quan Sun, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang,	(参考訳) ドメインやモダリティをまたいでオープンワールドの表現を抽象化することに優れたContrastive Language-Image Pre-training(CLIP)は、さまざまなビジョンおよびマルチモーダルタスクの基盤となっている。しかし、最近の研究により、CLIPには向き、数量、色、構造などをほとんど区別できないといった深刻な視覚的欠点があることが明らかになっている。これらの視覚的欠点は、CLIP上に構築されたマルチモーダル大規模言語モデル(MLLM)の知覚能力も制限する。主な理由は、CLIPの訓練に用いられる画像とテキストのペアが、テキストの識別性と画像の多様性の不足により、本質的に偏っていることにあると考えられる。本稿では、自己教師付きの拡散プロセスによって視覚的欠点を大きく克服する、CLIPモデル向けの簡単なポストトレーニング手法を提案する。拡散(DIffusion)モデルをCLIPの視覚アシスタント(Visual Assistant)として用いるDIVAを導入する。具体的には、DIVAはテキストから画像への拡散モデルからの生成的フィードバックを活用し、(対応するテキストなしで)画像のみを用いてCLIPの表現を最適化する。DIVAは、きめ細かい視覚能力を評価する難度の高いMMVP-VLMベンチマークにおいてCLIPの性能を大きく(例えば3-7%)向上させ、マルチモーダル理解およびセグメンテーションタスクにおけるMLLMとビジョンモデルの性能も高めることを示す。29の画像分類および検索ベンチマークでの広範な評価により、本フレームワークがCLIPの強力なゼロショット能力を保持することを確認した。コードはhttps://github.com/baaivision/DIVAで公開されている。 Contrastive Language-Image Pre-training (CLIP), which excels at abstracting open-world representations across domains and modalities, has become a foundation for a variety of vision and multimodal tasks. However, recent studies reveal that CLIP has severe visual shortcomings, such as which can hardly distinguish orientation, quantity, color, structure, etc. These visual shortcomings also limit the perception capabilities of multimodal large language models (MLLMs) built on CLIP. The main reason could be that the image-text pairs used to train CLIP are inherently biased, due to the lack of the distinctiveness of the text and the diversity of images. In this work, we present a simple post-training approach for CLIP models, which largely overcomes its visual shortcomings via a self-supervised diffusion process. We introduce DIVA, which uses the DIffusion model as a Visual Assistant for CLIP. Specifically, DIVA leverages generative feedback from text-to-image diffusion models to optimize CLIP representations, with only images (without corresponding text). 
We demonstrate that DIVA improves CLIP's performance on the challenging MMVP-VLM benchmark which assesses fine-grained visual abilities to a large extent (e.g., 3-7%), and enhances the performance of MLLMs and vision models on multimodal understanding and segmentation tasks. Extensive evaluation on 29 image classification and retrieval benchmarks confirms that our framework preserves CLIP's strong zero-shot capabilities. The code is available at https://github.com/baaivision/DIVA.	翻訳日:2024-08-27 20:50:26 公開日:2024-08-24
# Anchored Preference OptimizationとContrastive Revisions: アライメントにおける仕様不足への対処 Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ( http://arxiv.org/abs/2408.06266v2 ) ライセンス: Link先を確認	Karel D'Oosterlinck, Winnie Xu, Chris Develder, Thomas Demeester, Amanpreet Singh, Christopher Potts, Douwe Kiela, Shikib Mehri,	(参考訳) 大規模言語モデル(LLM)は、対照的なアライメント目的関数と選好ペアのデータセットを用いて整列されることが多い。モデル、ペアデータ、目的関数の相互作用によってアライメントは複雑な手続きとなり、ときに期待を下回る結果を生む。我々はこれを調査し、次のことを見出した。(i) 選好データは、基礎となる応答が対照的である場合に、より良い学習信号を与える。(ii) アライメント目的関数は、訓練中のモデルに対する制御をより細かく指定できる場合に、より良い性能をもたらす。これらの知見に基づき、より対照的な選好ペアを生み出すデータ作成手法であるContrastive Learning from AI Revisions(CLAIR)と、制御可能でより安定したアライメント目的関数であるAnchored Preference Optimization(APO)を導入する。比較可能なさまざまなデータセットとアライメント目的関数を用いてLlama-3-8B-Instructを整列し、人間の判断と高い相関をもつMixEval-Hardスコアを測定する。CLAIRの選好はすべてのデータセットの中で最も高い性能をもたらし、APOは制御性の低い目的関数を一貫して上回る。32KのCLAIR選好でAPOを用いて訓練した最良のモデルは、Llama-3-8B-Instructを7.65%改善し、GPT4-turboとの差を45%縮める。コードはhttps://github.com/ContextualAI/CLAIR_and_APOで公開されている。 Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective makes alignment a complicated procedure, sometimes producing subpar results. We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment objectives lead to better performance when they specify more control over the model during training. Based on these insights, we introduce Contrastive Learning from AI Revisions (CLAIR), a data-creation method which leads to more contrastive preference pairs, and Anchored Preference Optimization (APO), a controllable and more stable alignment objective. We align Llama-3-8B-Instruct using various comparable datasets and alignment objectives and measure MixEval-Hard scores, which correlate highly with human judgments. 
The CLAIR preferences lead to the strongest performance out of all datasets, and APO consistently outperforms less controllable objectives. Our best model, trained on 32K CLAIR preferences with APO, improves Llama-3-8B-Instruct by 7.65%, closing the gap with GPT4-turbo by 45%. Our code is available at https://github.com/ContextualAI/CLAIR_and_APO.	翻訳日:2024-08-27 20:30:25 公開日:2024-08-24
# 二次元マニピュレーションのための模倣学習アルゴリズムの比較 A Comparison of Imitation Learning Algorithms for Bimanual Manipulation ( http://arxiv.org/abs/2408.06536v2 ) ライセンス: Link先を確認	Michael Drolet, Simon Stepputtis, Siva Kailas, Ajinkya Jain, Jan Peters, Stefan Schaal, Heni Ben Amor,	(参考訳) ロボット工学における模倣学習アルゴリズムの普及の中で、ハイパーパラメータの感度、トレーニングの容易さ、データ効率、パフォーマンスに関するそれらの特性は、高精度産業にインスパイアされた環境ではよく研究されていない。本研究は,顕著な模倣学習アプローチの限界とメリットを実証し,それらの特性を解析する。我々は,操作対象と環境との複数の接触を含む設定において,過剰に制約された動的システムを含む複雑な双方向操作タスクにおいて,各アルゴリズムを評価する。模倣学習は複雑なタスクを解くのに適しているが、全てのアルゴリズムが環境やハイパーパラメータの摂動、訓練要件、性能、使いやすさを扱うという点で等しいわけではない。本研究では,これらの特徴の実証的影響について,慎重に設計した実験手法と学習環境を用いて検討する。 Paper website: https://bimanual-imitation.github.io/ Amidst the wide popularity of imitation learning algorithms in robotics, their properties regarding hyperparameter sensitivity, ease of training, data efficiency, and performance have not been well-studied in high-precision industry-inspired environments. In this work, we demonstrate the limitations and benefits of prominent imitation learning approaches and analyze their capabilities regarding these properties. We evaluate each algorithm on a complex bimanual manipulation task involving an over-constrained dynamics system in a setting involving multiple contacts between the manipulated object and the environment. While we find that imitation learning is well suited to solve such complex tasks, not all algorithms are equal in terms of handling environmental and hyperparameter perturbations, training requirements, performance, and ease of use. We investigate the empirical influence of these key characteristics by employing a carefully designed experimental procedure and learning environment. Paper website: https://bimanual-imitation.github.io/	翻訳日:2024-08-27 20:30:25 公開日:2024-08-24
# 長時間のアウト・オブ・ディストリビューション検出:タイルへの注意の優先順位付け Long-Tailed Out-of-Distribution Detection: Prioritizing Attention to Tail ( http://arxiv.org/abs/2408.06742v2 ) ライセンス: Link先を確認	Yina He, Lei Peng, Yongcun Zhang, Juanjuan Weng, Zhiming Luo, Shaozi Li,	(参考訳) 現在のアウト・オブ・ディストリビューション(OOD)検出法は、通常はバランスの取れたイン・ディストリビューション(ID)データを仮定する。長い尾のOOD検出に対する以前のアプローチは、しばしばヘッドクラスのセマンティクスを減らしてIDデータのバランスをとる。しかし、この削減はIDデータの分類精度に深刻な影響を及ぼす可能性がある。このタスクの主な課題は、テールクラスの機能の深刻な欠如であり、OODデータとの混同につながります。この問題に対処するために,削減ではなく拡張を用いたPATT法を提案する。我々の主な直感は、von Mises-Fisher(vMF)分布を混合してIDデータと温度スケーリングモジュールをモデル化し、IDデータの信頼性を高めることである。これにより、IDとOODデータの区別を促進しながら、IDクラスのセマンティクスを暗黙的に強化し、無限のコントラスト対を生成することができる。 IDデータの分類性能を損なうことなくOODデータの検出をさらに強化するため,推測フェーズにおける特徴キャリブレーションを提案する。テールクラスを優先し、OODデータの信頼性を低下させる訓練セットから注意重みを抽出することにより、OOD検出能力を向上する。大規模実験により,本手法は様々なベンチマークにおいて最先端の手法よりも優れていることを確認した。 Current out-of-distribution (OOD) detection methods typically assume balanced in-distribution (ID) data, while most real-world data follow a long-tailed distribution. Previous approaches to long-tailed OOD detection often involve balancing the ID data by reducing the semantics of head classes. However, this reduction can severely affect the classification accuracy of ID data. The main challenge of this task lies in the severe lack of features for tail classes, leading to confusion with OOD data. To tackle this issue, we introduce a novel Prioritizing Attention to Tail (PATT) method using augmentation instead of reduction. Our main intuition involves using a mixture of von Mises-Fisher (vMF) distributions to model the ID data and a temperature scaling module to boost the confidence of ID data. This enables us to generate infinite contrastive pairs, implicitly enhancing the semantics of ID classes while promoting differentiation between ID and OOD data. To further strengthen the detection of OOD data without compromising the classification performance of ID data, we propose feature calibration during the inference phase. 
By extracting an attention weight from the training set that prioritizes the tail classes and reduces the confidence in OOD data, we improve the OOD detection capability. Extensive experiments verified that our method outperforms the current state-of-the-art methods on various benchmarks.	翻訳日:2024-08-27 20:30:25 公開日:2024-08-24
# パーキンソン病の重症度評価のためのホームターン角推定 Your Turn: At Home Turning Angle Estimation for Parkinson's Disease Severity Assessment ( http://arxiv.org/abs/2408.08182v2 ) ライセンス: Link先を確認	Qiushuo Cheng, Catherine Morgan, Arindam Sikdar, Alessandro Masullo, Alan Whone, Majid Mirmehdi,	(参考訳) パーキンソン病(PD)の患者は、疾患が進行するにつれて向きを変えるなど、歩行が徐々に悪化することがある。既存の臨床評価ツールでは、診療所内での短い評価に制限されるため、時間ごとのPD症状の変動を捉えることができない。歩行の回転角を連続的かつ受動的に測定することは、歩行特性をPDの疾患進行の敏感な指標として活用するための要素である。本稿では, ビデオから3次元骨格を抽出し, 股関節と膝関節の回転を計算し, 回転角を自動的に定量化する深層学習手法を提案する。我々は、現在最先端の人間のポーズ推定モデルであるFastposeとStrided Transformerを、24人の被験者(PDの12人、健康管理のボランティアの12人)の動画クリップを、自宅のような設定でPDデータセットからトリミングする(Turn-REMAP)。また、人間3.6Mの人間ポーズベンチマークからターンビデオデータセットであるTurn-H3.6Mを3D地上真実でキュレートし、我々の手法をさらに検証する。これまでの歩行研究は、主にクリニックや研究室でスクリプト歩行の結果を評価するが、この研究は、手ぶらりした衣服や照明不足などの複雑さがある自由生活の家庭環境に焦点を当てている。自由生活環境において正確な地上真実データを得るのに難しかったため、専門医の手によるラベル付けに基づいて、最寄りのビン45^\circ$に定量化する。提案手法は,旋回計算精度が41.6%,平均絶対誤差が34.7{\deg},重み付き精度WPrecが68.3%である。これは、一眼レフカメラデータを用いて、自宅のPD患者によるターンの定量化を行う最初の研究である。 People with Parkinson's Disease (PD) often experience progressively worsening gait, including changes in how they turn around, as the disease progresses. Existing clinical rating tools are not capable of capturing hour-by-hour variations of PD symptoms, as they are confined to brief assessments within clinic settings. Measuring gait turning angles continuously and passively is a component step towards using gait characteristics as sensitive indicators of disease progression in PD. This paper presents a deep learning-based approach to automatically quantify turning angles by extracting 3D skeletons from videos and calculating the rotation of hip and knee joints. We utilise state-of-the-art human pose estimation models, Fastpose and Strided Transformer, on a total of 1386 turning video clips from 24 subjects (12 people with PD and 12 healthy control volunteers), trimmed from a PD dataset of unscripted free-living videos in a home-like setting (Turn-REMAP). 
We also curate a turning video dataset, Turn-H3.6M, from the public Human3.6M human pose benchmark with 3D ground truth, to further validate our method. Previous gait research has primarily taken place in clinics or laboratories evaluating scripted gait outcomes, but this work focuses on free-living home settings where complexities exist, such as baggy clothing and poor lighting. Due to difficulties in obtaining accurate ground truth data in a free-living setting, we quantise the angle into the nearest $45^\circ$ bin based on the manual labelling of expert clinicians. Our method achieves a turning calculation accuracy of 41.6%, a Mean Absolute Error (MAE) of $34.7^\circ$, and a weighted precision WPrec of 68.3% for Turn-REMAP. This is the first work to explore the use of single monocular camera data to quantify turns by PD patients in a home setting.	翻訳日:2024-08-27 20:30:25 公開日:2024-08-24
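The binning step and the error metric mentioned in the abstract above (rounding a turning angle to the nearest 45° bin, and the Mean Absolute Error in degrees) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `quantise_angle` and `mean_absolute_error` and the sample angles are hypothetical.

```python
import math


def quantise_angle(angle_deg: float, bin_width: float = 45.0) -> float:
    """Round an angle in degrees to the nearest multiple of bin_width.

    floor(x + 0.5) is used instead of round() so that exact halves
    always round up rather than to the nearest even integer.
    """
    return math.floor(angle_deg / bin_width + 0.5) * bin_width


def mean_absolute_error(predicted, reference):
    """Average absolute difference between two equally long angle lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical raw estimates (degrees) and their quantised values.
estimates = [100.0, 70.0, 22.4]
bins = [quantise_angle(a) for a in estimates]
```

Quantising both the prediction and the clinician label to the same grid makes the two comparable even when the label itself is only known to the nearest bin.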
# LLMのフェローシップ:合成選好最適化データセット生成のためのマルチエージェントワークフロー The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation ( http://arxiv.org/abs/2408.08688v2 ) ライセンス: Link先を確認	Samee Arif, Sualeha Farid, Abdul Hameed Azeemi, Awais Athar, Agha Ali Raza,	(参考訳) 本稿では、マルチエージェントワークフローを用いて生成した合成優先度最適化(PO)データセットについて、データセット生成プロセスにおけるこれらのワークフローの有効性とポテンシャルを評価する。 POデータセット生成には,(1)応答評価,(2)応答生成という2つのモジュールが必要である。応答評価モジュールでは,Lumge Language Models (LLMs) からの応答を評価し,評価する。反応評価モジュールを2段階のプロセスで評価する。ステップ1では,LLMを3つの異なるプロンプト戦略を用いて評価する。ステップ2では, LLM-as-a-Judge, LLMs-as-a-Jury, LLM Debateの性能の比較を行う。それぞれのステップで、人間のアノテーションとLDM間のCohen's Kappaを用いたラスタ間合意を用いる。応答生成モジュールについて、LLM評価器の設定を用いて、LLMフィードバックループの異なる構成を比較する。我々は、勝利率(LLM評価器によって生成フレームワークがベストに選択される回数)を用いて、生成のための最適なマルチエージェント構成を決定する。両方のモジュールで最適な設定を特定した後、GPT、Gemma、Llamaファミリーのモデルを使用して、上記のパイプラインを使用してPOデータセットを生成します。我々は2種類のPOデータセットを生成し、1つは個々のLLMの生成能力を向上し、もう1つはマルチエージェントワークフローを改善する。 GPT4o-as-a-Judgeは,GPTファミリーからの応答を含まない場合,データセット間でより一貫性があることが評価された。さらに、Llamaをジェネレータとし、GemmaをレビュアーとするLLMフィードバックループは、LlamaとGemmaをそれぞれ71.8%、73.8%の勝利率を達成した。 This paper presents synthetic Preference Optimization (PO) datasets generated using multi-agent workflows and evaluates the effectiveness and potential of these workflows in the dataset generation process. PO dataset generation requires two modules: (1) response evaluation, and (2) response generation. In the response evaluation module, the responses from Large Language Models (LLMs) are evaluated and ranked - a task typically carried out by human annotators that we automate using LLMs. We assess the response evaluation module in a 2 step process. In step 1, we assess LLMs as evaluators using three distinct prompting strategies. In step 2, we apply the winning prompting strategy to compare the performance of LLM-as-a-Judge, LLMs-as-a-Jury, and LLM Debate. In each step, we use inter-rater agreement using Cohen's Kappa between human annotators and LLMs. 
For the response generation module, we compare different configurations for the LLM Feedback Loop using the identified LLM evaluator configuration. We use the win rate (the fraction of times a generation framework is selected as the best by an LLM evaluator) to determine the best multi-agent configuration for generation. After identifying the best configurations for both modules, we use models from the GPT, Gemma, and Llama families to generate our PO datasets using the above pipeline. We generate two types of PO datasets, one to improve the generation capabilities of individual LLMs and the other to improve the multi-agent workflow. Our evaluation shows that GPT-4o-as-a-Judge is more consistent across datasets when the candidate responses do not include responses from the GPT family. Additionally, we find that the LLM Feedback Loop, with Llama as the generator and Gemma as the reviewer, achieves a notable 71.8% and 73.8% win rate over single-agent Llama and Gemma, respectively.	翻訳日:2024-08-27 20:30:25 公開日:2024-08-24
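The two metrics named in the abstract above, Cohen's Kappa for inter-rater agreement and the win rate, can be sketched in a few lines. This is an illustrative implementation of the standard two-rater definition, not the paper's evaluation code; the function names and the toy labels are assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's label
    frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def win_rate(winners, framework):
    """Fraction of comparisons in which `framework` was picked as best."""
    if not winners:
        raise ValueError("winners must be non-empty")
    return sum(w == framework for w in winners) / len(winners)


# Toy data: a human annotator and an LLM judge preferring response A or B.
human = ["A", "A", "B", "B"]
judge = ["A", "A", "B", "A"]
kappa = cohens_kappa(human, judge)
```

Kappa corrects raw agreement for chance, which matters here because a judge that always prefers one response can still agree with humans fairly often.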
# Unc-TTP: 文脈内事例選択を改善するLLM不確かさの分類方法 Unc-TTP: A Method for Classifying LLM Uncertainty to Improve In-Context Example Selection ( http://arxiv.org/abs/2408.09172v3 ) ライセンス: Link先を確認	Hsiu-Yuan Huang, Zichen Wu, Yutong Yang, Junzhao Zhang, Yunfang Wu,	(参考訳) 現在、Large Language Models (LLMs) は様々な下流タスクで例外的なパフォーマンスを示している。しかし、ユーザの期待に応えるために、応答が確実に生成されるか、あるいは作られているかを知ることは困難である。 LLMの不確実性を推定することは、その大規模化とホワイトボックスアクセスの欠如により特に困難である。本研究では,ラベル干渉をサンプリングベースアプローチに組み込む際のLCM出力の整合性を評価することによって,LCMの不確かさを分類する新しいUncertainty Tripartite Testing Paradigm(Unc-TTP)を提案する。 Unc-TTP出力に基づいて、インスタンスを特定のカテゴリと不確実なカテゴリに集約する。さらに,LLMの不確かさの詳細な解析を行い,既存のサンプリング法よりもUnc-TTPの方が優れていることを示す。さらに、得られた不確実性情報を利用して、文脈内サンプル選択を誘導し、Unc-TTPが明らかに検索ベースおよびサンプリングベースアプローチより優れていることを示す。本研究は,オープンソース LLM とクローズドソース LLM の両方の不確かさを分類する新たな手法を提案し,この不確実性を利用して LLM の性能を向上させるための実践的アプローチを提案する。 Nowadays, Large Language Models (LLMs) have demonstrated exceptional performance across various downstream tasks. However, it is challenging for users to discern whether the responses are generated with certainty or are fabricated to meet user expectations. Estimating the uncertainty of LLMs is particularly challenging due to their vast scale and the lack of white-box access. In this work, we propose a novel Uncertainty Tripartite Testing Paradigm (Unc-TTP) to classify LLM uncertainty, via evaluating the consistency of LLM outputs when incorporating label interference into the sampling-based approach. Based on Unc-TTP outputs, we aggregate instances into certain and uncertain categories. Further, we conduct a detailed analysis of the uncertainty properties of LLMs and show Unc-TTP's superiority over the existing sampling-based methods. In addition, we leverage the obtained uncertainty information to guide in-context example selection, demonstrating that Unc-TTP obviously outperforms retrieval-based and sampling-based approaches in selecting more informative examples. 
Our work paves a new way to classify the uncertainty of both open- and closed-source LLMs, and introduces a practical approach to exploit this uncertainty to improve LLM performance.	翻訳日:2024-08-27 20:30:25 公開日:2024-08-24
# 小データを用いたソフト拘束型物理インフォームニューラルネットワークによるオシレータODEの解法 Solving Oscillator ODEs via Soft-constrained Physics-informed Neural Network with Small Data ( http://arxiv.org/abs/2408.11077v2 ) ライセンス: Link先を確認	Kai-liang Lu, Yu-meng Su, Zhuo Bi, Cheng Qiu, Wen-jun Zhang,	(参考訳) 本稿では,物理インフォームドニューラルネットワーク(PINN),従来のニューラルネットワーク(NN)および従来の数値離散化法を,文献調査と実験的検証を通じて比較した。我々は,ソフト制約のPINNアプローチに着目し,その数学的枠組みと計算フローを正規DESと部分DDE(ODE/PDE)の解法として定式化した。動作機構とその精度と効率は、典型的な線形および非線形(例えば、プライマー、ファンデルポル、ダッフィング)振動子ODEを解くことによって実験的に検証された。我々は、PINNのDeepXDEベースの実装が、トレーニングにおいて軽量コードであり、効率的なだけでなく、CPU/GPUプラットフォーム間で柔軟なことを実証した。 PINNは、ODEの非線形性が弱い場合、非常に少数の教師なしのトレーニングデータと少数の教師なしのコロケーションポイントが解を予測するのに十分であり、最小限の場合、それぞれ1階または2階のODEに対して1つまたは2つのトレーニングポイント(初期値)しか必要としない。また,コロケーションポイントの活用と物理情報の利用により,PINNはトレーニングセットの時間領域外からデータを外挿する能力を有し,特にノイズの多いデータに対して堅牢であり,一般化能力の強化が期待できる。損失関数項の増加による遅延よりも、データ量の削減とともに得られる利得が、トレーニングを加速する。ソフト制約されたPINNは、全損失関数に正規化項を追加することにより、物理法則(例えばエネルギーの保存)を容易に課すことができ、この物理法則に従うODEに対する解性能を向上させることができる。さらに、PINNは固いODEやPDE、その他のDESにも利用でき、デジタルツインズ時代において好ましい触媒になりつつある。 This paper compared physics-informed neural network (PINN), conventional neural network (NN) and traditional numerical discretization methods on solving differential equations (DEs) through literature investigation and experimental validation. We focused on the soft-constrained PINN approach and formalized its mathematical framework and computational flow for solving Ordinary DEs and Partial DEs (ODEs/PDEs). The working mechanism and its accuracy and efficiency were experimentally verified by solving typical linear and non-linear (e.g., Primer, Van der Pol, Duffing) oscillator ODEs. We demonstrate that the DeepXDE-based implementation of PINN is not only light code and efficient in training, but also flexible across CPU/GPU platforms. 
PINN greatly reduces the need for labeled data: when the nonlinearity of the ODE is weak, a very small amount of supervised training data plus a few unsupervised collocation points are sufficient to predict the solution; in the minimalist case, only one or two training points (with initial values) are needed for first- or second-order ODEs, respectively. We also find that, with the aid of collocation points and the use of physical information, PINN can extrapolate outside the time domain of the training set and is especially robust to noisy data, which gives it enhanced generalization capabilities. Training is accelerated when the gains from the reduced amount of data outweigh the delay caused by the additional loss function terms. The soft-constrained PINN can easily impose a physical law (e.g., conservation of energy) as a constraint by adding a regularization term to the total loss function, thus improving the solution performance for ODEs that obey this physical law. Furthermore, PINN can also be used for stiff ODEs, PDEs, and other types of DEs, and is becoming a favorable catalyst for the era of Digital Twins.	翻訳日:2024-08-27 20:20:40 公開日:2024-08-24
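The soft-constrained loss described in the abstract above can be written out in a generic form. The block below is a sketch of the usual formulation, not the paper's exact equations: the weights $w_d$, $w_r$, $w_p$, the point counts $N_d$, $N_c$, and the term $\mathcal{L}_{\text{phys}}$ are assumed notation, and the Van der Pol residual is given only as one familiar example of an oscillator ODE.

```latex
% Generic soft-constrained PINN loss for an ODE  F(t, u, \dot u, \ddot u) = 0.
% u_\theta : network output; (t_i, u_i) : N_d labeled points; t_j : N_c collocation points.
\mathcal{L}(\theta)
  = \frac{w_d}{N_d} \sum_{i=1}^{N_d} \bigl( u_\theta(t_i) - u_i \bigr)^2
  + \frac{w_r}{N_c} \sum_{j=1}^{N_c}
      F\bigl( t_j, u_\theta(t_j), \dot u_\theta(t_j), \ddot u_\theta(t_j) \bigr)^2
  + w_p \, \mathcal{L}_{\text{phys}}(\theta)

% Example residual: Van der Pol oscillator with damping parameter \mu.
F(t, u, \dot u, \ddot u) = \ddot u - \mu \, (1 - u^2) \, \dot u + u
```

The first term is the supervised data misfit, the second penalizes the ODE residual at unlabeled collocation points, and the optional third term is where a regularizer such as an energy-conservation penalty would enter. "Soft" means the physics is encouraged through the loss rather than enforced exactly.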
# TVG:拡散モデルを用いたトレーニング不要遷移ビデオ生成法 TVG: A Training-free Transition Video Generation Method with Diffusion Models ( http://arxiv.org/abs/2408.13413v1 ) ライセンス: Link先を確認	Rui Zhang, Yaosen Chen, Yuegen Liu, Wei Wang, Xuming Wen, Hongxia Wang,	(参考訳) 遷移ビデオはメディア制作において重要な役割を担い、視覚的物語の流れとコヒーレンスを高める。モーフィングのような伝統的な手法は芸術的な魅力を欠くことが多く、特殊スキルを必要とし、その効果を制限している。拡散モデルに基づくビデオ生成の最近の進歩は、トランジションを作成する新しい可能性を提供するが、フレーム間の関係モデリングの貧弱や突然のコンテンツ変更といった課題に直面している。本稿では,これらの制約に対処するビデオレベルの拡散モデルを用いて,新たなトレーニング不要な遷移ビデオ生成(TVG)手法を提案する。提案手法はガウス過程回帰($\mathcal{GPR}$)を利用して潜在表現をモデル化し,フレーム間のスムーズかつダイナミックな遷移を保証する。さらに、時間的制御と遷移信頼性を高めるために、補間に基づく条件制御と周波数対応双方向融合(FBiF)アーキテクチャを導入する。ベンチマークデータセットとカスタムイメージペアの評価は,高品質なスムーズなトランジションビデオの生成において,我々のアプローチの有効性を示す。コードは https://sobeymil.github.io/tvg.com で提供されている。 Transition videos play a crucial role in media production, enhancing the flow and coherence of visual narratives. Traditional methods like morphing often lack artistic appeal and require specialized skills, limiting their effectiveness. Recent advances in diffusion model-based video generation offer new possibilities for creating transitions but face challenges such as poor inter-frame relationship modeling and abrupt content changes. We propose a novel training-free Transition Video Generation (TVG) approach using video-level diffusion models that addresses these limitations without additional training. Our method leverages Gaussian Process Regression ($\mathcal{GPR}$) to model latent representations, ensuring smooth and dynamic transitions between frames. Additionally, we introduce interpolation-based conditional controls and a Frequency-aware Bidirectional Fusion (FBiF) architecture to enhance temporal control and transition reliability. Evaluations on benchmark datasets and custom image pairs demonstrate the effectiveness of our approach in generating high-quality smooth transition videos. The code is provided at https://sobeymil.github.io/tvg.com.	翻訳日:2024-08-27 19:39:20 公開日:2024-08-24
# 開量子系に対する操作的作業ゆらぎ定理 Operational work fluctuation theorem for open quantum systems ( http://arxiv.org/abs/2408.13417v1 ) ライセンス: Link先を確認	Konstantin Beyer, Walter T. Strunz,	(参考訳) 古典的ジャジンスキーの等式は、熱平衡から駆動される系上で実行される確率的仕事と対応する準定常過程における自由エネルギー差との正確な関係を確立する。この揺らぎ定理は、非平衡過程における外部に応用された仕事の測定を通じて自由エネルギー差を決定できるため、実験的な関係を持つ。量子の場合、ジャジンスキーの等式は、確率的作業の測定手順が劇的に変化した場合のみ成り立つ:それは、初期および最終ハミルトニアンの知識を必要とするいわゆる2点測定(TPM)スキームに置き換えられ、したがって古典的ジャジンスキー方程式が知られている自由エネルギー差の予測力に欠ける。ここでは、駆動プロトコルで決定される外部測定可能な量子ワークに有効である量子ゆらぎ定理を提案する。 TPMの場合とは対照的に、定理は開量子系にも適用され、ハミルトニアン系を知ることなくシナリオを実現できる。我々の揺らぎ定理は不等式の形で成り立つので、真の自由エネルギー差にのみ束縛される。不等式は、プロトコルの開始時と終了時にエネルギーコヒーレンスを消滅する準古典的な場合において飽和する。したがって、明らかに量子的不利がある。 The classical Jarzynski equality establishes an exact relation between the stochastic work performed on a system driven out of thermal equilibrium and the free energy difference in a corresponding quasi-static process. This fluctuation theorem bears experimental relevance, as it enables the determination of the free energy difference through the measurement of externally applied work in a nonequilibrium process. In the quantum case, the Jarzynski equality only holds if the measurement procedure of the stochastic work is drastically changed: it is replaced by a so-called two-point measurement (TPM) scheme that requires the knowledge of the initial and final Hamiltonian and therefore lacks the predictive power for the free energy difference that the classical Jarzynski equation is known for. Here, we propose a quantum fluctuation theorem that is valid for externally measurable quantum work determined during the driving protocol. In contrast to the TPM case, the theorem also applies to open quantum systems and the scenario can be realized without knowing the system Hamiltonian. Our fluctuation theorem comes in the form of an inequality and therefore only yields bounds to the true free energy difference.
The inequality is saturated in the quasiclassical case of vanishing energy coherences at the beginning and at the end of the protocol. Thus, there is a clear quantum disadvantage.	翻訳日:2024-08-27 19:39:20 公開日:2024-08-24
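For reference, the classical relation that the abstract above builds on can be stated explicitly. The block below gives the textbook Jarzynski equality, its Jensen-inequality consequence, and the TPM definition of work. It does not reproduce the paper's new inequality, which the abstract does not state. The symbols $\beta$, $W$, $\Delta F$, $E^{i}_{n}$ and $E^{f}_{m}$ follow common convention and are assumptions about notation.

```latex
% Jarzynski equality: average over repetitions of the driving protocol,
% with the system initially in thermal equilibrium at temperature T.
\bigl\langle e^{-\beta W} \bigr\rangle = e^{-\beta \Delta F},
\qquad \beta = \frac{1}{k_B T}

% Jensen's inequality applied to the convex exponential gives the second-law bound.
\langle W \rangle \ge \Delta F

% Two-point measurement (TPM) scheme: work is the difference between energy
% eigenvalues measured after (final Hamiltonian) and before (initial Hamiltonian).
W = E^{f}_{m} - E^{i}_{n}
```

The TPM definition makes clear why that scheme needs both Hamiltonians: each work value is a difference of their eigenvalues, which is the knowledge the proposed operational theorem avoids.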
# 拡散モデルエキスパートの連鎖による無訓練長ビデオ生成 Training-free Long Video Generation with Chain of Diffusion Model Experts ( http://arxiv.org/abs/2408.13423v1 ) ライセンス: Link先を確認	Wenhao Li, Yichao Cao, Xie Su, Xi Lin, Shan You, Mingkai Zheng, Yi Chen, Chang Xu,	(参考訳) ビデオ生成モデルは、映画製作などの分野で大きな可能性を秘めている。しかし、現在のビデオ拡散モデルでは、高い計算コストが必要であり、ビデオ生成タスクの複雑さのため、最適以下の結果が得られる。本稿では,ビデオ生成をより簡単なサブタスクに分解する,効率的な高品質なビデオ生成フレームワークである ConFiner を提案する。オフザシェルフ拡散モデルの専門家の鎖で高品質なビデオを生成することができ、それぞれが切り離されたサブタスクを担当している。改良期間中に,複数の拡散専門家の能力を単一のサンプリングにマージできるコーディネート・デノイジングを導入する。さらに,ConFiner-Long フレームワークを設計し,ConFiner 上で3つの制約戦略で長いコヒーレントなビデオを生成する。実験の結果、推測コストのわずか10%のコストで、私たちのConFinerは、すべての客観的および主観的メトリクスでLavieやModelscopeのような代表モデルを超えています。そしてConFiner-Longは、600フレームまでの高品質でコヒーレントなビデオを生成することができる。 Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models incur high computational costs and produce suboptimal results due to the high complexity of the video generation task. In this paper, we propose ConFiner, an efficient high-quality video generation framework that decouples video generation into easier subtasks: structure control and spatial-temporal refinement. It can generate high-quality videos with a chain of off-the-shelf diffusion model experts, each expert responsible for a decoupled subtask. During the refinement, we introduce coordinated denoising, which can merge multiple diffusion experts' capabilities into a single sampling. Furthermore, we design the ConFiner-Long framework, which can generate long coherent videos with three constraint strategies on ConFiner. Experimental results indicate that with only 10% of the inference cost, our ConFiner surpasses representative models like Lavie and Modelscope across all objective and subjective metrics. ConFiner-Long can generate high-quality and coherent videos with up to 600 frames.	翻訳日:2024-08-27 19:39:20 公開日:2024-08-24
# ターゲット型差分プライバシによる人道的アプリケーションの実現 Enabling Humanitarian Applications with Targeted Differential Privacy ( http://arxiv.org/abs/2408.13424v1 ) ライセンス: Link先を確認	Nitin Kohli, Joshua Blumenstock,	(参考訳) 低所得国や中所得国における携帯電話の普及は、世界の貧困層や最も脆弱な人口が政府や企業によって観察され、追跡される範囲を劇的に増加させてきた。歴史的に「グリッド外」の個人がデジタルデータを受動的に生成している。これらのデータは、政府の給付を受けているかどうか、消費者ローンの資格があるかどうかなど、それらの個人について、人生を変える決定を下すために使用されている。本稿では,個人データに基づくアルゴリズム決定の実装手法を開発し,データ対象に対して公式なプライバシ保証を提供する。このアプローチは、個人に関する決定を必要とするアプリケーションに差分プライバシを適用し、データ主体に保証されるプライバシのレベルを、意思決定者がきめ細かく制御できるようにする。より強力なプライバシー保証は、一般的にある程度のコストがかかることを示し、実際の2つのアプリケーションからのデータ(トーゴの貧困対策プログラムとナイジェリアの消費者貸付プラットフォーム)を使って、それらのコストを例示している。私たちの経験的な結果は、プライバシと予測精度のトレードオフを定量化し、プライバシの保証が異なることがプログラム全体の効果に与える影響を特徴づけます。より広範に、私たちの結果は、人道的プログラムが責任を持って個人データを使用する方法を示し、データプライバシに関する情報決定を行うためのプログラムデザイナーの装備を向上する。 The proliferation of mobile phones in low- and middle-income countries has suddenly and dramatically increased the extent to which the world's poorest and most vulnerable populations can be observed and tracked by governments and corporations. Millions of historically "off the grid" individuals are now passively generating digital data; these data, in turn, are being used to make life-altering decisions about those individuals -- including whether or not they receive government benefits, and whether they qualify for a consumer loan. This paper develops an approach to implementing algorithmic decisions based on personal data, while also providing formal privacy guarantees to data subjects. The approach adapts differential privacy to applications that require decisions about individuals, and gives decision makers granular control over the level of privacy guaranteed to data subjects. We show that stronger privacy guarantees typically come at some cost, and use data from two real-world applications -- an anti-poverty program in Togo and a consumer lending platform in Nigeria -- to illustrate those costs.
Our empirical results quantify the tradeoff between privacy and predictive accuracy, and characterize how different privacy guarantees impact overall program effectiveness. More broadly, our results demonstrate a way for humanitarian programs to responsibly use personal data, and better equip program designers to make informed decisions about data privacy.	翻訳日:2024-08-27 19:39:20 公開日:2024-08-24
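The privacy and accuracy tradeoff described in the abstract above can be illustrated with the standard Laplace mechanism from the differential privacy literature. This is explicitly not the paper's targeted approach, whose details the abstract does not give. It is a textbook sketch in which the function names, the sensitivity of 1.0, and the epsilon values are assumptions chosen for illustration.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon for epsilon-differential privacy."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_release(true_value: float, sensitivity: float,
                    epsilon: float, rng: random.Random) -> float:
    """Release true_value with Laplace noise calibrated to epsilon."""
    return true_value + laplace_noise(laplace_scale(sensitivity, epsilon), rng)


# Smaller epsilon (stronger privacy) means a larger noise scale.
rng = random.Random(0)
strong = laplace_scale(sensitivity=1.0, epsilon=0.5)   # scale 2.0
weak = laplace_scale(sensitivity=1.0, epsilon=5.0)     # scale 0.2
noisy_score = private_release(10.0, 1.0, 0.5, rng)
```

The cost the abstract refers to shows up directly in the scale: halving epsilon doubles the typical noise added to any quantity that feeds an eligibility decision.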
# 遅延データ拡張のための最適層選択 Optimal Layer Selection for Latent Data Augmentation ( http://arxiv.org/abs/2408.13426v1 ) ライセンス: Link先を確認	Tomoumi Takase, Ryo Karakida,	(参考訳) データ拡張(DA)は一般的に入力データに適用されるが、いくつかの研究では、ニューラルネットワークの隠れ層にDAを適用することによりパフォーマンスが向上する、と報告されている。しかし、従来の研究では、DAが適用される層は慎重に検討されておらず、しばしばランダムに均一に、あるいは特定の層にのみ適用され、仲裁の余地は残されている。そこで本研究では,様々な実験構成,例えばスクラッチからのトレーニング,移動学習,各種データセット設定,異なるモデルにおいて,DAの適用に適したレイヤの傾向について検討した。さらに,DAに適したレイヤを自動的に調整するために,トレーニング中の勾配降下法に基づいて各レイヤに対してDAを実行するように更新する適応層選択法(AdaLASE)を提案する。いくつかの画像分類データセットで得られた実験結果から,提案手法が期待どおりに変化し,総合的な試験精度が向上したことが示唆された。 While data augmentation (DA) is generally applied to input data, several studies have reported that applying DA to hidden layers in neural networks, i.e., feature augmentation, can improve performance. However, in previous studies, the layers to which DA is applied have not been carefully considered, often being applied randomly and uniformly or only to a specific layer, leaving room for arbitrariness. Thus, in this study, we investigated the trends of suitable layers for applying DA in various experimental configurations, e.g., training from scratch, transfer learning, various dataset settings, and different models. In addition, to adjust the suitable layers for DA automatically, we propose the adaptive layer selection (AdaLASE) method, which updates the ratio to perform DA for each layer based on the gradient descent method during training. The experimental results obtained on several image classification datasets indicate that the proposed AdaLASE method altered the ratio as expected and achieved high overall test accuracy.	翻訳日:2024-08-27 19:39:20 公開日:2024-08-24
# ICML 2023ランキングデータの分析: 著者の自身の論文に対する意見は機械学習におけるピアレビューに役立つか? Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning? ( http://arxiv.org/abs/2408.13430v1 ) ライセンス: Link先を確認	Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie J. Su,	(参考訳) 我々は2023年のICML(International Conference on Machine Learning)のレビュープロセスにおいて、複数の論文を投稿した著者に、自身が認識する品質に基づいて自分の論文をランク付けするよう依頼する実験を行った。我々は2,592件の投稿に関する、それぞれ異なる著者からの1,342件のランキングを受け取った。本稿では、著者が提供するランキングをどのように活用して、機械学習会議におけるピアレビュープロセスを改善できるかを実証分析する。著者によるランキングを用いて生のレビュースコアを校正するアイソトニック・メカニズムに注目した。分析の結果,2乗誤差と絶対誤差の両測定値において,ランキングで校正されたスコアが,真の「期待レビュースコア」の推定において生のスコアを上回っていることが判明した。また,シニアエリアチェアによるエリアチェアの推薦の監督を支援する,論文賞の選定を支援する,緊急審査員の募集を指導するなど,アイソトニック・メカニズムと著者によるランキングをピアレビュープロセスで用いるための慎重でリスクの低いアプローチをいくつか提案する。論文は,研究の限界に対処し,今後の研究方向性を提案することで締めくくっている。 We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML) that requested authors with multiple submissions to rank their own papers based on perceived quality. We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions. In this paper, we present an empirical analysis of how author-provided rankings could be leveraged to improve peer review processes at machine learning conferences. We focus on the Isotonic Mechanism, which calibrates raw review scores using author-provided rankings. Our analysis demonstrates that the ranking-calibrated scores outperform raw scores in estimating the ground truth "expected review scores" in both squared and absolute error metrics. Moreover, we propose several cautious, low-risk approaches to using the Isotonic Mechanism and author-provided rankings in peer review processes, including assisting senior area chairs' oversight of area chairs' recommendations, supporting the selection of paper awards, and guiding the recruitment of emergency reviewers.
We conclude the paper by addressing the study's limitations and proposing future research directions.	翻訳日:2024-08-27 19:39:20 公開日:2024-08-24
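The calibration step mentioned in the abstract above can be sketched as an isotonic regression. The sketch assumes the least-squares formulation, in which the calibrated scores are the closest vector to the raw scores that respects the author's ranking, solved with the pool-adjacent-violators algorithm. The function name `isotonic_calibrate` and the toy scores are hypothetical, and this is not the authors' implementation.

```python
def isotonic_calibrate(raw_scores, ranking):
    """Project raw scores onto the set of scores consistent with a ranking.

    raw_scores: list of average review scores, indexed by paper.
    ranking:    paper indices ordered from best to worst by the author.
    Returns the least-squares scores that are non-increasing along ranking.
    """
    n = len(raw_scores)
    if sorted(ranking) != list(range(n)):
        raise ValueError("ranking must be a permutation of paper indices")

    ordered = [raw_scores[i] for i in ranking]

    # Pool adjacent violators: each block stores [sum, count]; merge the
    # last two blocks while the earlier block's mean is below the later one.
    blocks = []
    for value in ordered:
        blocks.append([value, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)

    # Map the fitted values back to the original paper order.
    calibrated = [0.0] * n
    for position, paper in enumerate(ranking):
        calibrated[paper] = fitted[position]
    return calibrated


# Toy example: the author ranks paper 0 above paper 1 above paper 2,
# but reviewers scored paper 2 higher than paper 1.
calibrated = isotonic_calibrate([8.0, 4.0, 6.0], [0, 1, 2])
```

When the raw scores already agree with the ranking, the output equals the input. Otherwise, conflicting neighbours are averaged, which is how the ranking shifts individual scores.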
# 早期停止とエッジリコールによる顔クラスタリング Face Clustering via Early Stopping and Edge Recall ( http://arxiv.org/abs/2408.13431v1 ) ライセンス: Link先を確認	Junjie Liu,	(参考訳) 大規模顔クラスタリングは大きな進歩を遂げており、教師あり学習を伴う大規模顔クラスタリングの学習に多くの努力が注がれている。しかし、複雑なモデル設計と退屈なクラスタリングプロセスは、既存の手法で典型的である。このような制限は、現実世界のアプリケーションでは実現不可能なクラスタリングをもたらす。合理的で効率的なモデル設計とトレーニングを考慮する必要がある。さらに、教師なしの顔クラスタリングアルゴリズムの開発も重要であり、現実世界のアプリケーションではより現実的である。本稿では,これらの問題に対処するために,非教師付き顔クラスタリングアルゴリズムFC-ESと教師付き顔クラスタリングアルゴリズムFC-ESERを提案する。 FC-ESでは, 大規模顔クラスタリングの精度とリコールを同時に保証し, 効率的で効果的な隣り合うエッジ確率と新しい早期停止戦略を提案する。さらに,教師あり学習を活かすため,FC-ESERでは新たなエッジリコール戦略を提案し,FC-ESに接続されていないエッジ接続をさらにリコールする。顔,人物,車両のクラスタリングに関する複数のベンチマーク実験により,提案したFC-ESとFC-ESERは,従来の最先端手法よりも大幅に優れていたことがわかった。私たちのコードは https://github.com/jumptoliujj/FC-ESER で公開されます。 Large-scale face clustering has achieved significant progress, with many efforts dedicated to learning to cluster large-scale faces with supervised learning. However, complex model design and tedious clustering processes are typical in existing methods. Such limitations result in infeasible clustering in real-world applications. Reasonable and efficient model design and training need to be taken into account. Besides, developing unsupervised face clustering algorithms is crucial, as they are more realistic in real-world applications. In this paper, we propose a novel unsupervised face clustering algorithm FC-ES and a novel supervised face clustering algorithm FC-ESER to address these issues. An efficient and effective neighbor-based edge probability and a novel early stopping strategy are proposed in FC-ES, guaranteeing the accuracy and recall of large-scale face clustering simultaneously. Furthermore, to take advantage of supervised learning, a novel edge recall strategy is proposed in FC-ESER to further recall the edge connections that are not connected in FC-ES.
Extensive experiments on multiple benchmarks for face, person, and vehicle clustering show that our proposed FC-ES and FC-ESER significantly outperform previous state-of-the-art methods. Our code will be available at https://github.com/jumptoliujj/FC-ESER.	翻訳日:2024-08-27 19:39:20 公開日:2024-08-24
# マルチヘッド畳み込みエンコーダとクロスアテンションの統合によるSPARQLクエリ変換の改善 Integrating Multi-Head Convolutional Encoders with Cross-Attention for Improved SPARQL Query Translation ( http://arxiv.org/abs/2408.13432v1 ) ライセンス: Link先を確認	Yi-Hui Chen, Eric Jui-Lin Lu, Kwan-Ho Cheng,	(参考訳) KGQAシステム(Knowledge Graph Question Answering)の主なタスクは、ユーザ入力の質問をクエリ構文(SPARQLなど)に変換することである。 TransformerやConvS2Sのようなモダンなエンコーダやデコーダの台頭により、多くの学者がSPARQL世代の研究方向を、ニューラルネットワーク変換(NMT)アーキテクチャやText-to-SPARQLの生成AIフィールドに移行した。 NMTベースのQAシステムでは、知識ベースクエリ構文を言語として扱う。 NMTベースの翻訳モデルを使用して、自然言語の質問をクエリ構文に変換する。学者は、Transformer、ConvS2S、BiLSTMといったクロスアテンションを備えた一般的なアーキテクチャを使用して、クエリ構文の翻訳モデルをトレーニングする。そこで本研究では,n-gram 言語モデルに基づくマルチヘッド Conv エンコーダ (MHC エンコーダ) の提案により, ConvS2S エンコーダの改良とトランスフォーマからのマルチヘッドアテンションの追加を行った。原則は、畳み込みレイヤを使用して、異なる受容フィールドを持つ入力シーケンス内のローカルな隠れた特徴をキャプチャし、複数のヘッドアテンションを使用してそれらの間の依存関係を計算することである。その結果,QALD-9データセットとLC-QuAD-1.0データセットでそれぞれ76.52\%,83.37\%のBLEU-1(BiLingual Evaluation Understudy)を得た。さらに,QALD-9データセットとLC-QuAD-1.0データセットのエンドツーエンドシステム実験では,他のKGQAシステムに対して,マクロF1測定値はそれぞれ52\%,66\%に達した。さらに,実験結果から,優れたエンコーダ・デコーダアーキテクチャとクロスアテンションを持つ計算資源が限られた場合,一般埋め込みのみを用いた大規模事前学習モデルに匹敵する優れた性能を達成できることが示唆された。 The main task of the KGQA system (Knowledge Graph Question Answering) is to convert user input questions into query syntax (such as SPARQL). With the rise of modern popular encoders and decoders like Transformer and ConvS2S, many scholars have shifted the research direction of SPARQL generation to the Neural Machine Translation (NMT) architecture or the generative AI field of Text-to-SPARQL. In NMT-based QA systems, the system treats knowledge base query syntax as a language. It uses NMT-based translation models to translate natural language questions into query syntax. Scholars use popular architectures equipped with cross-attention, such as Transformer, ConvS2S, and BiLSTM, to train translation models for query syntax. 
To achieve better query results, this paper improved the ConvS2S encoder and added multi-head attention from the Transformer, proposing a Multi-Head Conv encoder (MHC encoder) based on the n-gram language model. The principle is to use convolutional layers to capture local hidden features in the input sequence with different receptive fields, using multi-head attention to calculate dependencies between them. Ultimately, we found that the translation model based on the Multi-Head Conv encoder achieved better performance than other encoders, obtaining 76.52% and 83.37% BLEU-1 (BiLingual Evaluation Understudy) on the QALD-9 and LC-QuAD-1.0 datasets, respectively. Additionally, in the end-to-end system experiments on the QALD-9 and LC-QuAD-1.0 datasets, we achieved leading results over other KGQA systems, with Macro F1-measures reaching 52% and 66%, respectively. Moreover, the experimental results show that with limited computational resources, if one possesses an excellent encoder-decoder architecture and cross-attention, experts and scholars can achieve outstanding performance equivalent to large pre-trained models using only general embeddings.	翻訳日:2024-08-27 19:39:20 公開日:2024-08-24
# 視覚言語選好学習による説明可能な概念生成 Explainable Concept Generation through Vision-Language Preference Learning ( http://arxiv.org/abs/2408.13438v1 ) ライセンス: Link先を確認	Aditya Taparia, Som Sagar, Ransalu Senanayake,	(参考訳) 他の説明可能なAI技術とは異なり、機能属性に直接関連しない高レベルの視覚的“概念”をテストするために使用できる。例えば、「ストリップ」の概念は、イメージをシマウマとして分類することが重要である。しかし、概念に基づく説明法では、実践者は複数の候補となる概念イメージを推測し、収集する必要がある。本稿では,この制限に対処するため,画像生成問題として概念セットの作成を行う。しかし, 生成モデルを用いることで意味のある概念が得られないため, 概念のテキスト記述から視覚言語生成モデルを微調整する強化学習に基づく選好最適化アルゴリズムを考案する。一連の実験を通して、手作業で行うのが難しい複雑な抽象概念を記述できる手法の能力を実証した。提案手法の有効性と信頼性に加えて,ニューラルネットワーク解析の診断ツールとしての有用性を示す。 Concept-based explanations have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not directly related to feature attributes. For instance, the concept of "stripes" is important to classify an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and collect multiple candidate concept image sets, which can often be imprecise and labor-intensive. Addressing this limitation, in this paper, we frame concept image set creation as an image generation problem. However, since naively using a generative model does not result in meaningful concepts, we devise a reinforcement learning-based preference optimization algorithm that fine-tunes the vision-language generative model from approximate textual descriptions of concepts. Through a series of experiments, we demonstrate the capability of our method to articulate complex, abstract concepts that are otherwise challenging to craft manually. In addition to showing the efficacy and reliability of our method, we show how our method can be used as a diagnostic tool for analyzing neural networks.	翻訳日:2024-08-27 19:39:20 公開日:2024-08-24
# グラフ畳み込みネットワークを用いた知識を考慮した会話脱線予測 Knowledge-Aware Conversation Derailment Forecasting Using Graph Convolutional Networks ( http://arxiv.org/abs/2408.13440v1 ) ライセンス: Link先を確認	Enas Altarawneh, Ameeta Agrawal, Michael Jenkin, Manos Papagelis,	(参考訳) オンライン会話は特に脱線の影響を受けやすく、不敬なコメントや虐待を含む有害なコミュニケーションパターンの形で現れうる。会話脱線予測は、脱線の兆候を事前に予測し、会話の積極的なモデレーションを可能にする。会話脱線予測の最先端のアプローチは、会話を逐次的にエンコードし、グラフニューラルネットワークを使用して対話ユーザのダイナミクスをモデル化する。しかし、既存のグラフモデルは、文脈の伝播や感情の変化のような複雑な会話の特徴を捉えることができない。常識知識を利用することで、モデルがそのような特徴を捉え、性能を向上させることができる。本稿では,対話文脈情報の知識ベースからコモンセンス文を導出し,グラフニューラルネットワークの分類アーキテクチャを充実させる。我々は,発話のマルチソース情報をカプセルに融合し,これをトランスフォーマーベースの予測器が会話の脱線予測に用いる。我々のモデルは、会話のダイナミクスと文脈の伝播を捉え、CGAおよびCMVベンチマークデータセットにおいて最先端モデルを上回る性能を示す。 Online conversations are particularly susceptible to derailment, which can manifest itself in the form of toxic communication patterns including disrespectful comments and abuse. Forecasting conversation derailment predicts signs of derailment in advance, enabling proactive moderation of conversations. State-of-the-art approaches to conversation derailment forecasting sequentially encode conversations and use graph neural networks to model dialogue user dynamics. However, existing graph models are not able to capture complex conversational characteristics such as context propagation and emotional shifts. The use of common sense knowledge enables a model to capture such characteristics, thus improving performance. Following this approach, here we derive commonsense statements from a knowledge base of dialogue contextual information to enrich a graph neural network classification architecture. We fuse the multi-source information on utterance into capsules, which are used by a transformer-based forecaster to predict conversation derailment. Our model captures conversation dynamics and context propagation, outperforming the state-of-the-art models on the CGA and CMV benchmark datasets.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# 大規模言語モデルにおける次トークン予測の法則 A Law of Next-Token Prediction in Large Language Models ( http://arxiv.org/abs/2408.13442v1 ) ライセンス: Link先を確認	Hangfeng He, Weijie J. Su,	(参考訳) 大規模言語モデル(LLM)は、様々なアプリケーションドメインで広く採用されているが、ブラックボックスの性質は、これらのモデルが入力データを内部でどのように処理して予測を行うかを理解する上で、大きな課題となっている。本稿では,事前学習したLLMの中間層を通じた文脈化トークン埋め込みの学習を支配する,次トークン予測のための正確かつ定量的な法則を提案する。この結果から,最下層から最上層まで,各レイヤが予測精度の向上に等しく寄与していることが判明した。これは,TransformerやRWKV,Mambaといったアーキテクチャ上に構築された,さまざまなオープンソース LLM にまたがって観測される普遍的な現象である。この法則は、モデルスケーリング、事前学習タスク、情報フローなど、LLM開発およびアプリケーションにおけるプラクティスを通知し、ガイドするための新しい視点と洞察を提供する。我々の法則は、内部データ処理機構を精査することで、LLMの設計、訓練、解釈に対するよりきめ細かなアプローチを可能にします。 Large language models (LLMs) have been widely employed across various application domains, yet their black-box nature poses significant challenges to understanding how these models process input data internally to make predictions. In this paper, we introduce a precise and quantitative law that governs the learning of contextualized token embeddings through intermediate layers in pre-trained LLMs for next-token prediction. Our findings reveal that each layer contributes equally to enhancing prediction accuracy, from the lowest to the highest layer -- a universal phenomenon observed across a diverse array of open-source LLMs, built on architectures such as Transformer, RWKV, and Mamba. We demonstrate that this law offers new perspectives and insights to inform and guide practices in LLM development and applications, including model scaling, pre-training tasks, and information flow. Overall, our law enables more fine-grained approaches to the design, training, and interpretation of LLMs through scrutinizing their internal data processing mechanisms.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# 非巡回性制約のない効率的な強化DAG学習 Efficient Reinforced DAG Learning without Acyclicity Constraints ( http://arxiv.org/abs/2408.13448v1 ) ライセンス: Link先を確認	Bao Duong, Hung Le, Thin Nguyen,	(参考訳) 単なる観測データに埋め込まれた因果構造を解明することは、そのような構造から恩恵を受けうる知識が豊富にあることから、科学的に大きな関心事である。近年、強化学習(RL)は、有向非巡回グラフ(DAG)の形で最も確からしい因果的説明を探索する古典的手法を強化するものとして現れている。しかし、DAG空間を効果的に探索することは、多数の候補と複雑な非巡回性の制約のために困難である。本研究では,効率的なDAG生成ポリシを備え,RLの仕組みを活用した新しい因果発見手法であるREACT(REinforced DAG learning without acyclicity ConstrainTs)を提案する。 DAGの新たなパラメトリゼーションにより、非巡回性制約を課すことなく、実数値ベクトルを有効なDAGを表す隣接行列へ1ステップで直接マッピングでき、方策勾配法によって探索空間をより効率的に探索できる。さらに,合成データと実データの両方の多種多様な集合に関する包括的数値評価を行い,最先端のベースラインと比較して,本手法の有効性を確認した。 Unraveling cause-effect structures embedded in mere observational data is of great scientific interest, owing to the wealth of knowledge that can benefit from such structures. Recently, reinforcement learning (RL) has emerged as the enhancement for classical techniques to search for the most probable causal explanation in the form of a directed acyclic graph (DAG). Yet, effectively exploring the DAG space is challenging due to the vast number of candidates and the intricate constraint of acyclicity. In this study, we present REACT (REinforced DAG learning without acyclicity ConstrainTs), a novel causal discovery approach fueled by the RL machinery with an efficient DAG generation policy. Through a novel parametrization of DAGs, which allows for directly mapping a real-valued vector to an adjacency matrix representing a valid DAG in a single step without enforcing any acyclicity constraint, we are able to navigate the search space much more effectively with policy gradient methods. In addition, our comprehensive numerical evaluations on a diverse set of both synthetic and real data confirm the effectiveness of our method compared with state-of-the-art baselines.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# 逆勾配エピソードメモリによる連続RLデータの増大 Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory ( http://arxiv.org/abs/2408.13452v1 ) ライセンス: Link先を確認	Sihao Wu, Xingyu Zhao, Xiaowei Huang,	(参考訳) Reinforcement Learning(RL)トレーニングプロセスにおいて重要な役割を果たす学習のデータ効率は、連続環境を持つ連続RLにおいてさらに重要になる。連続RLでは、学習者は定常的でないシーケンシャルなタスクと対話し、以前の知識を忘れずに新しいタスクを学習する必要がある。しかし、連続RLのためのデータ拡張の実装についてはほとんど研究されていない。本稿では,連続RLにおけるデータ拡張の有効性について検討する。具体的には,(1)既存のデータ拡張手法を要約し,(2)連続RLの新たな拡張方法を含む連続RLのためのベンチマークデータ拡張(Adv-GEM)を提案する。大規模な実験により、ロボット制御タスクにおいて、ランダム振幅スケーリング、ステートスウィッチ、ミックスアップ、逆方向拡張、Adv-GEMなどのデータ拡張が、その平均性能、破滅的な忘れ、前方移動といった面で、既存の連続RLアルゴリズムを改善できることが示されている。すべてのデータ拡張メソッドはプラグインモジュールとして実装され、連続RLメソッドに簡単に統合できる。 Data efficiency of learning, which plays a key role in the Reinforcement Learning (RL) training process, becomes even more important in continual RL with sequential environments. In continual RL, the learner interacts with non-stationary, sequential tasks and is required to learn new tasks without forgetting previous knowledge. However, there is little work on implementing data augmentation for continual RL. In this paper, we investigate the efficacy of data augmentation for continual RL. Specifically, we provide benchmarking data augmentations for continual RL, by (1) summarising existing data augmentation methods and (2) including a new augmentation method for continual RL: Adversarial Augmentation with Gradient Episodic Memory (Adv-GEM). Extensive experiments show that data augmentations, such as random amplitude scaling, state-switch, mixup, adversarial augmentation, and Adv-GEM, can improve existing continual RL algorithms in terms of their average performance, catastrophic forgetting, and forward transfer, on robot control tasks. All data augmentation methods are implemented as plug-in modules for trivial integration into continual RL methods.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# AdaOcc:Adaptive-Resolution Occupancy Prediction AdaOcc: Adaptive-Resolution Occupancy Prediction ( http://arxiv.org/abs/2408.13454v1 ) ライセンス: Link先を確認	Chao Chen, Ruoyu Wang, Yuliang Guo, Cheng Zhao, Xinyu Huang, Chen Feng, Liu Ren,	(参考訳) 複雑な都市シナリオにおける自律運転には、包括的かつ正確な3D知覚が必要である。従来の3D認識手法は物体検出に重点を置いており、環境の詳細を欠く疎らな表現をもたらす。近年のアプローチでは、より包括的なシーン表現のために車両周囲の3次元占有率を推定している。しかし、密度の高い3D占有率予測は計算要求を増大させ、効率と解像度のバランスに挑戦する。高解像度の占有グリッドは精度を提供するが、かなりの計算資源を必要とするが、低解像度のグリッドは効率的だが詳細は欠落している。このジレンマに対処するために,新しい適応分解能・マルチモーダル予測手法であるAdaOccを導入する。提案手法は,対象中心の3次元再構成と全体的占有予測を一つの枠組みに統合し,興味のある領域(ROI)でのみ高度に詳細かつ正確な3次元再構成を行う。これらの高精細な3次元曲面は点雲で表されるので、それらの精度は占有マップの事前定義された格子分解によって制約されない。我々はnuScenesデータセットの総合的な実験を行い、既存の手法よりも大幅に改善されたことを示す。近距離シナリオでは、以前のベースラインを13%以上、ハウスドルフ距離を40%以上上回る。まとめると、AdaOccは多様な運転シナリオで正確な3Dセマンティック占有率予測を提供するための、より汎用的で効果的なフレームワークを提供する。 Autonomous driving in complex urban scenarios requires 3D perception to be both comprehensive and precise. Traditional 3D perception methods focus on object detection, resulting in sparse representations that lack environmental detail. Recent approaches estimate 3D occupancy around vehicles for a more comprehensive scene representation. However, dense 3D occupancy prediction increases computational demands, challenging the balance between efficiency and resolution. High-resolution occupancy grids offer accuracy but demand substantial computational resources, while low-resolution grids are efficient but lack detail. To address this dilemma, we introduce AdaOcc, a novel adaptive-resolution, multi-modal prediction approach. Our method integrates object-centric 3D reconstruction and holistic occupancy prediction within a single framework, performing highly detailed and precise 3D reconstruction only in regions of interest (ROIs). These high-detailed 3D surfaces are represented in point clouds, thus their precision is not constrained by the predefined grid resolution of the occupancy map. 
We conducted comprehensive experiments on the nuScenes dataset, demonstrating significant improvements over existing methods. In close-range scenarios, we surpass previous baselines by over 13% in IOU, and over 40% in Hausdorff distance. In summary, AdaOcc offers a more versatile and effective framework for delivering accurate 3D semantic occupancy prediction across diverse driving scenarios.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# すべてのペニー数を作る: 費用効率の良い推論のための難易度適応型の自己整合性 Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning ( http://arxiv.org/abs/2408.13457v1 ) ライセンス: Link先を確認	Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li,	(参考訳) 連鎖推論に広く用いられている自己整合性(SC: Self-Consistency)は、様々な多段階推論タスクにおいて顕著な利得を示すが、プリセットサイズで複数のサンプリングを行うため、高いコストがかかる。適応自己整合性 (ASC) とアーリーストッピング自己整合性 (ESC) の変種は、一連のプリサンプルの後方分布に基づいて標本数を動的に調整し、性能への影響を最小限に抑えてSCのコストを下げる。しかし、どちらの手法も質問の難しさに関する事前の情報を利用していない。多くの場合、不必要な繰り返しサンプリングが行われ、簡単な質問が1回の試行で正確に答えられるようになり、リソースを無駄にします。この問題に対処するために,前と後の両方の観点からの難易度情報を活用して推論資源を適応的に割り当てることにより,SCのコストをさらに削減するDifficulty-Adaptive Self-Consistency (DSC)を提案する。 DSCの有効性を示すために、6つのベンチマークで算術、常識、記号的推論という3つの一般的な推論タスクのカテゴリについて広範な実験を行った。実験の結果,DSCは高いベースラインのASCとESCをほぼ上回り,性能は同等であった。 Self-consistency (SC), a widely used decoding strategy for chain-of-thought reasoning, shows significant gains across various multi-step reasoning tasks but comes with a high cost due to multiple sampling with the preset size. Its variants, Adaptive self-consistency (ASC) and Early-stopping self-consistency (ESC), dynamically adjust the number of samples based on the posterior distribution of a set of pre-samples, reducing the cost of SC with minimal impact on performance. Both methods, however, do not exploit the prior information about question difficulty. It often results in unnecessary repeated sampling for easy questions that could be accurately answered with just one attempt, wasting resources. To tackle this problem, we propose Difficulty-Adaptive Self-Consistency (DSC), which leverages the difficulty information from both prior and posterior perspectives to adaptively allocate inference resources, further reducing the cost of SC. To demonstrate the effectiveness of DSC, we conduct extensive experiments on three popular categories of reasoning tasks: arithmetic, commonsense and symbolic reasoning on six benchmarks. 
The empirical results show that DSC consistently surpasses the strong baseline ASC and ESC in terms of costs by a significant margin, while attaining comparable performances.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
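The sampling-and-voting idea behind self-consistency can be sketched in a few lines of Python. This is only a minimal illustration of majority voting with a simple early stop, not the DSC, ASC, or ESC procedure from the paper: the names `sample_answer`, `min_samples`, and `stop_share` are hypothetical, and the vote-share threshold is an assumed stopping rule chosen for brevity.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(sample_answer: Callable[[], str],
                     max_samples: int = 10,
                     min_samples: int = 3,
                     stop_share: float = 0.8) -> Tuple[str, int]:
    """Majority-vote over sampled answers, stopping once one answer dominates.

    sample_answer stands in for one sampled reasoning path that returns a
    final answer (hypothetical). The stopping rule is a plain vote-share
    threshold, an assumption made for illustration only.
    """
    votes: Counter = Counter()
    for drawn in range(1, max_samples + 1):
        votes[sample_answer()] += 1
        answer, count = votes.most_common(1)[0]
        # Stop early when the leading answer holds a large enough share.
        if drawn >= min_samples and count / drawn >= stop_share:
            return answer, drawn
    # Budget exhausted: fall back to the plain majority vote.
    return votes.most_common(1)[0][0], max_samples
```

An easy question whose samples agree immediately stops after `min_samples` draws, while a contested one uses more of the budget, which is the cost asymmetry the abstract describes.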
# ウェーブレット対応動的変圧器と拡散モデルによる映像劣化の再考 Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model ( http://arxiv.org/abs/2408.13459v1 ) ライセンス: Link先を確認	Chen Rao, Guangyuan Li, Zehua Lan, Jiakai Sun, Junsheng Luan, Wei Xing, Lei Zhao, Huaizhong Lin, Jianfeng Dong, Dalong Zhang,	(参考訳) 現在のビデオデブロアリング法は、遅延損失が高周波の詳細で保守的であるため、高周波情報の回復に限界がある。拡散モデル(DM)は高頻度細部生成に強力な機能を持つため,ビデオデブロアリングタスクにDMを導入することを検討する。 1) DMはガウスノイズからビデオを生成するために多くの繰り返しステップを必要とするため、多くの計算資源を消費する。 2) DMはビデオのぼやけたアーティファクトによって容易に誤解され,不合理な内容とデブロワードビデオの歪みが生じる。本稿では,この拡散モデルをWADT(Wavelet-Aware Dynamic Transformer)に統合した新しいビデオデブロアリングフレームワークであるVD-Diffを提案する。具体的には、高コンパクトな潜伏空間において拡散モデルを実行し、基底真理分布に適合する高周波情報を含む先行特徴を生成する。拡散モデルにより生成された高周波情報を利用して、映像中の低周波情報を保存・復元するWADTを設計する。我々の提案するVD-Diffは,GoPro,DVD,BSD,Real-World Videoのデータセット上でSOTA法よりも優れていた。 Current video deblurring methods have limitations in recovering high-frequency information since the regression losses are conservative with high-frequency details. Since Diffusion Models (DMs) have strong capabilities in generating high-frequency details, we consider introducing DMs into the video deblurring task. However, we found that directly applying DMs to the video deblurring task has the following problems: (1) DMs require many iteration steps to generate videos from Gaussian noise, which consumes many computational resources. (2) DMs are easily misled by the blurry artifacts in the video, resulting in irrational content and distortion of the deblurred video. To address the above issues, we propose a novel video deblurring framework VD-Diff that integrates the diffusion model into the Wavelet-Aware Dynamic Transformer (WADT). Specifically, we perform the diffusion model in a highly compact latent space to generate prior features containing high-frequency information that conforms to the ground truth distribution. 
We design the WADT to preserve and recover the low-frequency information in the video while utilizing the high-frequency information generated by the diffusion model. Extensive experiments show that our proposed VD-Diff outperforms SOTA methods on GoPro, DVD, BSD, and Real-World Video datasets.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# DOPPLER:プライバシノイズ低減のための低域フィルタ付き微分プライベートオプティマイザ DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction ( http://arxiv.org/abs/2408.13460v1 ) ライセンス: Link先を確認	Xinwei Zhang, Zhiqi Bu, Mingyi Hong, Meisam Razaviyayn,	(参考訳) プライバシーは、現代のディープラーニングシステムやアプリケーションにおける関心の高まりだ。差分プライベート(DP)トレーニングは、トレーニングされた機械学習モデルから収集したトレーニングデータの機密情報の漏洩を防止する。 DP確率勾配降下(DPSGD)とその変種を含むDPオプティマイザは、勾配クリッピングとDPノイズ注入によるトレーニング手順を民営化する。しかし、実際には、DPSGDとその変種を用いて訓練されたDPモデルは、しばしばモデルの性能劣化に悩まされる。このような劣化は、基礎モデル事前学習など、多くの重要なタスクにおけるDP最適化の適用を妨げる。本稿では,DPオプティマイザの設計と解析に新しい信号処理の視点を提供する。 DPノイズの影響を効果的に低減するために,低域フィルタリング(low-pass filtering)と呼ばれる「周波数領域」操作が有効であることを示す。より具体的には、勾配と差分プライバシー(DP)ノイズの「周波数領域」を定義することで、DOPPLERと呼ばれる新しいコンポーネントを開発した。このコンポーネントはDPアルゴリズム用に設計されており、この周波数領域内のDPノイズを抑えながら勾配を効果的に増幅する。その結果、プライバシー保証を維持し、DP保護モデルの品質を高める。実験の結果,低域通過フィルタを用いたDPオプティマイザは,各種モデルやデータセットの試験精度を3%-10%向上させることができた。 DOPPLERはDPトレーニングと非DPトレーニングのギャップを埋めるのに有効である。 Privacy is a growing concern in modern deep-learning systems and applications. Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from the trained machine learning models. DP optimizers, including DP stochastic gradient descent (DPSGD) and its variants, privatize the training procedure by gradient clipping and DP noise injection. However, in practice, DP models trained using DPSGD and its variants often suffer from significant model performance degradation. Such degradation prevents the application of DP optimization in many key tasks, such as foundation model pretraining. In this paper, we provide a novel signal processing perspective to the design and analysis of DP optimizers. We show that a ``frequency domain'' operation called low-pass filtering can be used to effectively reduce the impact of DP noise. More specifically, by defining the ``frequency domain'' for both the gradient and differential privacy (DP) noise, we have developed a new component, called DOPPLER. 
This component is designed for DP algorithms and works by effectively amplifying the gradient while suppressing DP noise within this frequency domain. As a result, it maintains privacy guarantees and enhances the quality of the DP-protected model. Our experiments show that the proposed DP optimizers with a low-pass filter outperform their counterparts without the filter by 3%-10% in test accuracy on various models and datasets. Both theoretical and practical evidence suggest that the DOPPLER is effective in closing the gap between DP and non-DP training.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# ビジョンランゲージ事前学習モデルのロバスト性を証明する:マルチモーダル・アタックアプローチ Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach ( http://arxiv.org/abs/2408.13461v1 ) ライセンス: Link先を確認	Jiwei Guan, Tianyu Ding, Longbing Cao, Lei Pan, Chen Wang, Xi Zheng,	(参考訳) トランスフォーマーを用いた視覚言語事前学習(VLP)は、多数のマルチモーダルタスクにおいて例外的な性能を示した。しかし、これらのモデルの対角的堅牢性は十分には研究されていない。既存のマルチモーダルアタック手法は、視覚的・テキスト的モダリティ間の相互モーダル相互作用、特に横断的アテンション機構の文脈において、ほとんど見過ごされている。本稿では,最近のVLPトランスの対角的脆弱性について検討し,ホワイトボックス設定下での視覚的・テキスト的両モードの対角的摂動を同時に導入する新しいJMTFA(Joint Multimodal Transformer Feature Attack)を設計する。 JMTFAは、注意関係スコアを戦略的に対象とし、各モードにおける重要な特徴を妨害し、摂動を融合させて対向サンプルを生成し、誤ったモデル予測をもたらす。実験結果から,提案手法は既存のベースラインと比較して,視覚言語理解や下流タスクの推論において高い攻撃成功率を達成することが示唆された。特に,本研究の結果から,VLP変圧器の複雑な融合過程にテクスチュアル・モダリティが大きな影響を及ぼすことが明らかとなった。また,本研究では,攻撃時のモデルサイズと敵の強靭性との間には明らかな相関が認められなかった。これらの洞察は、マルチモーダルAIシステムの信頼性デプロイメントにおいて、敵の堅牢性と潜在的な潜在的なリスクの新たな次元を強調している。 Vision-language pretraining (VLP) with transformers has demonstrated exceptional performance across numerous multimodal tasks. However, the adversarial robustness of these models has not been thoroughly investigated. Existing multimodal attack methods have largely overlooked cross-modal interactions between visual and textual modalities, particularly in the context of cross-attention mechanisms. In this paper, we study the adversarial vulnerability of recent VLP transformers and design a novel Joint Multimodal Transformer Feature Attack (JMTFA) that concurrently introduces adversarial perturbations in both visual and textual modalities under white-box settings. JMTFA strategically targets attention relevance scores to disrupt important features within each modality, generating adversarial samples by fusing perturbations and leading to erroneous model predictions. Experimental results indicate that the proposed approach achieves high attack success rates on vision-language understanding and reasoning downstream tasks compared to existing baselines. 
Notably, our findings reveal that the textual modality significantly influences the complex fusion processes within VLP transformers. Moreover, we observe no apparent relationship between model size and adversarial robustness under our proposed attacks. These insights emphasize a new dimension of adversarial robustness and underscore potential risks in the reliable deployment of multimodal AI systems.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# HabitAction:人間の行動認識のためのビデオデータセット HabitAction: A Video Dataset for Human Habitual Behavior Recognition ( http://arxiv.org/abs/2408.13463v1 ) ライセンス: Link先を確認	Hongwu Li, Zhenliang Zhang, Wei Wang,	(参考訳) HAR(Human Action Recognition)は、コンピュータビジョンにおいて非常に重要なタスクである。人間の行動を理解するなど、一連の下流のタスクを実行するのに役立つ。ヒトの行動の複雑さのため、多くの非常に価値のある行動は、HAR、例えばヒトの習慣行動(HHBs)の利用可能なデータセットにはまだ含まれていない。 HHBは、人の性格、習慣、心理的変化を分析する上で重要な役割を担っている。これらの課題を解決するため,本研究では,様々なHHBを実演するための新しいビデオデータセットを構築した。提案したデータセットのこれらの行動は、内部の精神状態やキャラクターの特定の感情を反映することができる。データセットには、300,000フレーム以上と6,899のアクションインスタンスを含む、30の習慣行動カテゴリが含まれている。これらの動作は通常、人間のアクションビデオの小さな部分に現れるため、既存のアクション認識手法ではこれらの局所的な特徴を扱うことは困難である。そこで本研究では,ヒト骨格とRGB外観の両方を用いた2ストリームモデルを提案する。実験の結果,提案手法は既存手法よりも動作認識性能が優れていることがわかった。 Human Action Recognition (HAR) is a very crucial task in computer vision. It helps to carry out a series of downstream tasks, like understanding human behaviors. Due to the complexity of human behaviors, many highly valuable behaviors are not yet encompassed within the available datasets for HAR, e.g., human habitual behaviors (HHBs). HHBs hold significant importance for analyzing a person's personality, habits, and psychological changes. To solve these problems, in this work, we build a novel video dataset to demonstrate various HHBs. These behaviors in the proposed dataset are able to reflect internal mental states and specific emotions of the characters, e.g., crossing arms suggests to shield oneself from perceived threats. The dataset contains 30 categories of habitual behaviors including more than 300,000 frames and 6,899 action instances. Since these behaviors usually appear at small local parts of human action videos, it is difficult for existing action recognition methods to handle these local features. Therefore, we also propose a two-stream model using both human skeletons and RGB appearances. Experimental results demonstrate that our proposed method has much better performance in action recognition than the existing methods on the proposed dataset.	
翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# 反射型大言語モデルによるバイアスの発見 Uncovering Biases with Reflective Large Language Models ( http://arxiv.org/abs/2408.13464v1 ) ライセンス: Link先を確認	Edward Y. Chang,	(参考訳) 人間の営みに固有のバイアスは、機械学習、特にバイアスを含みうる「正解(ground truth)」データに依存する教師あり学習に重大な課題をもたらす。この依存は、統計的な最尤に基づいて一般化するモデルの傾向と相まって、バイアスを伝播・増幅し、社会的問題を悪化させうる。そこで本研究では,複数の大規模言語モデル(LLM)を動的な対話に参加させ,多様な視点を明らかにする反射的手法を提案する。条件付き統計、情報理論、発散メトリクスを活用することで、この新しいアプローチは文脈に依存した言語行動を促進し、バイアスのないアウトプットを促進する。さらに、特定されたバイアスに対処するために、測定可能な進捗追跡と説明可能な修復アクションを可能にする。 Biases inherent in human endeavors pose significant challenges for machine learning, particularly in supervised learning that relies on potentially biased "ground truth" data. This reliance, coupled with models' tendency to generalize based on statistical maximal likelihood, can propagate and amplify biases, exacerbating societal issues. To address this, our study proposes a reflective methodology utilizing multiple Large Language Models (LLMs) engaged in a dynamic dialogue to uncover diverse perspectives. By leveraging conditional statistics, information theory, and divergence metrics, this novel approach fosters context-dependent linguistic behaviors, promoting unbiased outputs. Furthermore, it enables measurable progress tracking and explainable remediation actions to address identified biases.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# LlamaDuo: サービスLLMから小規模ローカルLLMへのシームレス移行のためのLLMOpsパイプライン LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ( http://arxiv.org/abs/2408.13467v1 ) ライセンス: Link先を確認	Chansung Park, Juyong Jiang, Fan Wang, Sayak Paul, Jing Tang, Sunghun Kim,	(参考訳) クラウドベースのプロプライエタリな大規模言語モデル(LLM)の普及は、運用上の依存関係、プライバシの懸念、継続的なインターネット接続の必要性など、大きな課題をもたらしている。本研究では,LLMOpsパイプライン"LlamaDuo"を導入し,サービス指向のLLMから,より小型でローカルに管理可能なモデルへの,知識と能力のシームレスな移行を実現する。このパイプラインは、運用上の障害、厳格なプライバシポリシ、あるいはオフライン要件の存在下でのサービス継続性を保証するために不可欠である。LlamaDuoは、サービスLLMが生成した合成データセットを用いて、小さな言語モデルをサービスLLMに近づけるよう微調整する。微調整されたモデルの性能が期待に届かなかった場合、サービスLLMが作成した類似したデータを追加してさらに微調整を行うことで、性能を向上させる。この反復的なプロセスは、小さなモデルが最終的に特定の下流タスクでサービスLLMの能力と一致または超えることを保証するもので、制約のある環境でAIデプロイメントを管理するための実用的でスケーラブルなソリューションを提供する。各種下流タスクにおけるLlamaDuoの有効性,適応性,手頃性を示すために,先進LLMを用いた大規模実験を行った。パイプラインの実装は https://github.com/deep-diver/llamaduo で公開しています。 The widespread adoption of cloud-based proprietary large language models (LLMs) has introduced significant challenges, including operational dependencies, privacy concerns, and the necessity of continuous internet connectivity. In this work, we introduce an LLMOps pipeline, "LlamaDuo", for the seamless migration of knowledge and abilities from service-oriented LLMs to smaller, locally manageable models. This pipeline is crucial for ensuring service continuity in the presence of operational failures, strict privacy policies, or offline requirements. Our LlamaDuo involves fine-tuning a small language model against the service LLM using a synthetic dataset generated by the latter. If the performance of the fine-tuned model falls short of expectations, it is enhanced by further fine-tuning with additional similar data created by the service LLM. This iterative process guarantees that the smaller model can eventually match or even surpass the service LLM's capabilities in specific downstream tasks, offering a practical and scalable solution for managing AI deployments in constrained environments. 
Extensive experiments with leading edge LLMs are conducted to demonstrate the effectiveness, adaptability, and affordability of LlamaDuo across various downstream tasks. Our pipeline implementation is available at https://github.com/deep-diver/llamaduo.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# アンタングル生成グラフ表現学習 Disentangled Generative Graph Representation Learning ( http://arxiv.org/abs/2408.13471v1 ) ライセンス: Link先を確認	Xinyue Hu, Zhibin Duan, Xinyang Liu, Yuxin Li, Bo Chen, Mingyuan Zhou,	(参考訳) 近年,自己教師付き手法によるグラフ表現の学習において,生成グラフモデルが有望な結果を示している。しかし、既存の生成グラフ表現学習(GRL)のアプローチのほとんどは、学習された表現の絡み合いを無視するランダムマスキングに依存している。この監視は、非破壊性と説明可能性の欠如をもたらす。さらに、学習した表現のアンタングル化は依然として重要な課題であり、GRL研究では十分に研究されていない。これらの知見に基づいて,自己教師型学習フレームワークであるDiGGR(Disentangled Generative Graph Representation Learning)を紹介する。 DiGGRは、潜伏不整合因子を学習し、グラフマスクモデリングをガイドし、学習された表現の非整合性を高め、エンドツーエンドのジョイントラーニングを可能にすることを目的としている。 2つの異なるグラフ学習タスクのための11の公開データセットに対する大規模な実験により、DiGGRは、提案手法の有効性を検証し、従来よりも一貫して多くの自己教師付き手法より優れていることが示された。 Recently, generative graph models have shown promising results in learning graph representations through self-supervised methods. However, most existing generative graph representation learning (GRL) approaches rely on random masking across the entire graph, which overlooks the entanglement of learned representations. This oversight results in non-robustness and a lack of explainability. Furthermore, disentangling the learned representations remains a significant challenge and has not been sufficiently explored in GRL research. Based on these insights, this paper introduces DiGGR (Disentangled Generative Graph Representation Learning), a self-supervised learning framework. DiGGR aims to learn latent disentangled factors and utilizes them to guide graph mask modeling, thereby enhancing the disentanglement of learned representations and enabling end-to-end joint learning. Extensive experiments on 11 public datasets for two different graph learning tasks demonstrate that DiGGR consistently outperforms many previous self-supervised methods, verifying the effectiveness of the proposed approach.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# 対称局所ランダム回路のユニタリ設計 Unitary Designs of Symmetric Local Random Circuits ( http://arxiv.org/abs/2408.13472v1 ) ライセンス: Link先を確認	Yosuke Mitsuhashi, Ryotaro Suzuki, Tomohiro Soejima, Nobuyuki Yoshioka,	(参考訳) 我々は、対称局所ランダム回路によって生成されるユニタリ設計を特徴付ける方法を確立した。具体的には、回路が近似t-設計を形成するための必要十分条件が、一般の対称性と局所性に対して、単純な整数最適化によって与えられることを示した。この結果を用いて、一般の局所性に対する$\mathbb{Z}_2$, U(1), SU(2)対称性の下で、ユニタリ設計の最大次数を明示的に与える。この研究は、対称性と局所性という基本概念の間の関係を、ランダム性の観点から明らかにする。 We have established the method of characterizing the unitary design generated by a symmetric local random circuit. Concretely, we have shown that the necessary and sufficient condition for the circuit forming an approximate t-design is given by simple integer optimization for general symmetry and locality. By using the result, we explicitly give the maximal order of unitary design under the $\mathbb{Z}_2$, U(1), and SU(2) symmetries for general locality. This work reveals the relation between the fundamental notions of symmetry and locality in terms of randomness.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# アンチワーク:RoBERTaをベースとした作業関連ストレス同定とリード要因分析システム Why Antiwork: A RoBERTa-Based System for Work-Related Stress Identification and Leading Factor Analysis ( http://arxiv.org/abs/2408.13473v1 ) ライセンス: Link先を確認	Tao Lu, Muzhe Wu, Xinyi Lu, Siyuan Xu, Shuyu Zhan, Anuj Tambwekar, Emily Mower Provost,	(参考訳) ハーシュ労働環境と労働関連ストレスは、不安、抑うつ、自殺の考えといった精神的な健康問題に寄与することが知られている。そのため、従業員の不幸を検知し、問題の根本原因を見つけることができるソリューションを作成することが最重要である。従来の研究は機械学習を用いてメンタルヘルスの原因を調べてきたが、一般的には一般的なメンタルヘルス分析に焦点を合わせており、説明可能なソリューションや職場固有の設定に焦点を絞っているものはほとんどない。 r/antiworkは、反作業運動のサブレディットです。このサブレディットを職場環境の不満のプロキシとして利用し、アンチワーク感情検出のための新しいデータセットを作成し、その後、アンチワーク感情で単語をハイライトするモデルを訓練する。その後、我々は定性的かつ定量的な分析を行い、反作業運動と同一視する個人のマインドセットに対する重要な洞察と、彼らの作業環境がそれらにどのように影響するかを明らかにした。我々は、従業員の権限や責任を与えない作業環境、求職経験のフラストレーション、不公平な報酬が、反作業感情の主因となっていることを発見し、その結果、従業員の自己自信やモチベーションが欠如している。 Harsh working environments and work-related stress have been known to contribute to mental health problems such as anxiety, depression, and suicidal ideation. As such, it is paramount to create solutions that can both detect employee unhappiness and find the root cause of the problem. While prior works have examined causes of mental health using machine learning, they typically focus on general mental health analysis, with few of them focusing on explainable solutions or looking at the workplace-specific setting. r/antiwork is a subreddit for the antiwork movement, which is the desire to stop working altogether. Using this subreddit as a proxy for work environment dissatisfaction, we create a new dataset for antiwork sentiment detection and subsequently train a model that highlights the words with antiwork sentiments. Following this, we performed a qualitative and quantitative analysis to uncover some of the key insights into the mindset of individuals who identify with the antiwork movement and how their working environments influenced them. 
We find that working environments that do not give employees authority or responsibility, frustrating recruiting experiences, and unfair compensation, are some of the leading causes of the antiwork sentiment, resulting in a lack of self-confidence and motivation among their employees.	翻訳日:2024-08-27 19:29:34 公開日:2024-08-24
# 連続ゲート集合の量子回路におけるランダム性の特徴付け Characterization of Randomness in Quantum Circuits of Continuous Gate Sets ( http://arxiv.org/abs/2408.13475v1 ) ライセンス: Link先を確認	Yosuke Mitsuhashi, Ryotaro Suzuki, Tomohiro Soejima, Nobuyuki Yoshioka,	(参考訳) 関連論文 arXiv:2408.XXXXX では、対称局所ランダム回路によって生成される近似ユニタリ設計の最大次数を特徴付ける方法を確立し、$\mathbb{Z}_2$, U(1), SU(2)対称性の場合にその次数を明示的に示した。ここでは、一般の対称性と具体的な対称性に対する主定理の導出について詳細を述べる。さらに、連結コンパクトユニタリ部分群の有限集合にアクセスできる一般的な枠組みを考える。この枠組みは対称局所ユニタリゲート集合を含む。 In the accompanying paper of arXiv:2408.XXXXX, we have established the method of characterizing the maximal order of approximate unitary designs generated by symmetric local random circuits, and have explicitly specified the order in the cases of $\mathbb{Z}_2$, U(1), and SU(2) symmetries. Here, we provide full details on the derivation of the main theorems for general symmetry and for concrete symmetries. Furthermore, we consider a general framework where we have access to a finite set of connected compact unitary subgroups, which includes symmetric local unitary gate sets.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# 量子機械による創薬支援 : 調査と展望 Quantum-machine-assisted Drug Discovery: Survey and Perspective ( http://arxiv.org/abs/2408.13479v1 ) ライセンス: Link先を確認	Yidong Zhou, Jintai Chen, Weikang Li, Jinglei Cheng, Gopal Karemore, Marinka Zitnik, Frederic Chong, Junyu Liu, Tianfan Fu, Zhiding Liang,	(参考訳) 医薬品の発見と開発は複雑でコストのかかる取り組みであり、新しい薬を市場に出すには10年以上の資金と相当な資金を必要としている。従来のコンピュータ支援ドラッグデザイン(CADD)は、このプロセスの加速に大きな進歩を遂げてきたが、量子コンピューティングの開発は、そのユニークな能力のために潜在的に有益である。本稿では、量子コンピューティングの創薬・開発への統合について論じ、量子技術が医薬品開発サイクルの様々な段階をいかに加速し、促進するかに焦点を当てる。具体的には,分子シミュレーションや薬物-標的相互作用の予測,臨床試験結果の最適化など,薬物発見に関わる課題への量子コンピューティングの適用について検討する。量子コンピューティングの本質的な能力を活用することで、新しい薬を市場に投入する際の時間とコストを削減できるかもしれません。 Drug discovery and development is a highly complex and costly endeavor, typically requiring over a decade and substantial financial investment to bring a new drug to market. Traditional computer-aided drug design (CADD) has made significant progress in accelerating this process, but the development of quantum computing offers potential due to its unique capabilities. This paper discusses the integration of quantum computing into drug discovery and development, focusing on how quantum technologies might accelerate and enhance various stages of the drug development cycle. Specifically, we explore the application of quantum computing in addressing challenges related to drug discovery, such as molecular simulation and the prediction of drug-target interactions, as well as the optimization of clinical trial outcomes. By leveraging the inherent capabilities of quantum computing, we might be able to reduce the time and cost associated with bringing new drugs to market, ultimately benefiting public health.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# MPruner: CKAに基づく相互情報処理によるニューラルネットワークサイズ最適化 MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning ( http://arxiv.org/abs/2408.13482v1 ) ライセンス: Link先を確認	Seungbeom Hu, ChanJun Park, Andrew Ferraiuolo, Sang-Ki Ko, Jinwoo Kim, Haein Song, Jieung Kim,	(参考訳) 実行時のパフォーマンスとメモリ使用量に直接影響するため、ニューラルネットワークの最適なサイズを決定することが重要だ。プルーニング(Pruning)は、ニューラルネットワークのサイズを削減し、精度の保存を数学的に保証する、よく確立されたモデル圧縮技術である。しかし、最近のプルーニングメソッドの多くは、個々のモデルコンポーネントのグローバルなコントリビューションを見落としているため、プルーニングされたモデルが望ましいデータセットとパフォーマンス要件を満たすことを保証するのは難しい。これらの課題に対処するため,ベクトル類似性により相互情報を活用する新しいプルーニングアルゴリズムMPrunerを開発した。 MPrunerはCKA(Centered Kernel Alignment)の類似度測定でレイヤクラスタリングを活用し、ニューラルネットワークのグローバル情報をより正確で効率的なレイヤワイドプルーニングに組み込むことができる。我々はMPrunerを様々なアーキテクチャや構成で評価し、その汎用性を実証し、実践的なガイドラインを提供した。 MPrunerはCNNとトランスフォーマーベースのモデルで最大50%のパラメータとメモリ使用量の削減を実現した。 Determining the optimal size of a neural network is critical, as it directly impacts runtime performance and memory usage. Pruning is a well-established model compression technique that reduces the size of neural networks while mathematically guaranteeing accuracy preservation. However, many recent pruning methods overlook the global contributions of individual model components, making it difficult to ensure that a pruned model meets the desired dataset and performance requirements. To address these challenges, we developed a new pruning algorithm, MPruner, that leverages mutual information through vector similarity. MPruner utilizes layer clustering with the Centered Kernel Alignment (CKA) similarity metric, allowing us to incorporate global information from the neural network for more precise and efficient layer-wise pruning. We evaluated MPruner across various architectures and configurations, demonstrating its versatility and providing practical guidelines. MPruner achieved up to a 50% reduction in parameters and memory usage for CNN and transformer-based models, with minimal to no loss in accuracy.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# IntOPE: 干渉の存在下におけるオフポリシー評価 IntOPE: Off-Policy Evaluation in the Presence of Interference ( http://arxiv.org/abs/2408.13484v1 ) ライセンス: Link先を確認	Yuqi Bai, Ziyu Zhao, Minqin Zhu, Kun Kuang,	(参考訳) オフポリシー評価(OPE: Off-Policy Evaluation)は、ログに記録された文脈付きバンディットフィードバックを用いて仮説的ポリシーの潜在的影響を評価するために用いられ、個人化医療やレコメンデーションシステムなど、オンラインインタラクションが重大なリスクやコストを伴う分野において重要である。伝統的に、OPEの手法はStable Unit Treatment Value Assumption (SUTVA) に依存しており、これは任意の個人に対する報酬が他人の行動に影響されないと仮定している。しかし、この仮定は、個人の報酬が自分の行動だけでなく仲間の行動にも影響される干渉の存在によって、現実のシナリオでは成り立たないことが多い。このことは、現実世界のアプリケーションにおける既存のOPEメソッドの重大な制限を明らかにしている。この制限に対処するため,周辺化された重要度重みを統合することで、個人の行動と隣接するエンティティの影響の両方を考慮するようにIPW(Inverse Probability Weighting, 逆確率重み付け)フレームワークを拡張した、IPW型推定器IntIPWを提案する。 IntIPW法の有効性を実証するために, 合成データと実世界のデータの両方を用いて大規模な実験を行った。 Off-Policy Evaluation (OPE) is employed to assess the potential impact of a hypothetical policy using logged contextual bandit feedback, which is crucial in areas such as personalized medicine and recommender systems, where online interactions are associated with significant risks and costs. Traditionally, OPE methods rely on the Stable Unit Treatment Value Assumption (SUTVA), which assumes that the reward for any given individual is unaffected by the actions of others. However, this assumption often fails in real-world scenarios due to the presence of interference, where an individual's reward is affected not just by their own actions but also by the actions of their peers. This realization reveals significant limitations of existing OPE methods in real-world applications. To address this limitation, we propose IntIPW, an IPW-style estimator that extends the Inverse Probability Weighting (IPW) framework by integrating marginalized importance weights to account for both individual actions and the influence of adjacent entities. Extensive experiments are conducted on both synthetic and real-world data to demonstrate the effectiveness of the proposed IntIPW method.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# ターゲットの任意ライブラリーの分類における量子イルミネーションの利点 Quantum Illumination Advantage for Classification Among an Arbitrary Library of Targets ( http://arxiv.org/abs/2408.13489v1 ) ライセンス: Link先を確認	Ali Cox, Quntao Zhuang, Jeffrey H. Shapiro, Saikat Guha,	(参考訳) 量子照明(QI)は、理想記憶に保持された基準ビームと量子状態が絡み合っている送信機プローブを用いてシーンを照会するタスクであり、その後、記憶された基準と共に目標反転光を最適に検出し、同じ明るさおよびその他の同じ条件の古典的な送信機で達成可能なものを超える精度で、スタンオフ範囲の目標特性を決定する。摂動理論のツールを用いて, 透過率の低い輝度, 高損失, 高熱背景の限界において, 古典的コヒーレント状態照明(CI)に比べて, ガウス状態の絡み合ったQIプローブを用いた場合, 任意のアプリオリ反射目標の識別において, 誤り確率のチャーノフ指数が4倍に向上することを示した。この利点は標的の有無を検出することで知られていたが、任意の対象ライブラリを識別する一般的なタスクでは証明されなかった。その結果,QI と CI の量子チャーノフ指数の低次漸近展開に対する簡易な一般解析式を,受光器の入室後に空間モードソータによって分離された場合の信号輝度,損失,熱雑音,および目標反射光の放射光出射プロファイルの変調膨張係数を用いて導出した。 Quantum illumination (QI) is the task of querying a scene using a transmitter probe whose quantum state is entangled with a reference beam retained in ideal storage, followed by optimally detecting the target-returned light together with the stored reference, to make decisions on characteristics of targets at stand-off range, at precision that exceeds what is achievable with a classical transmitter of the same brightness and otherwise identical conditions. Using tools from perturbation theory, we show that in the limit of low transmitter brightness, high loss, and high thermal background, there is a factor of four improvement in the Chernoff exponent of the error probability in discriminating any number of apriori-known reflective targets when using a Gaussian-state entangled QI probe, over using classical coherent-state illumination (CI). While this advantage was known for detecting the presence or absence of a target, it had not been proven for the generalized task of discriminating between arbitrary target libraries. 
In proving our result, we derive simple general analytic expressions for the lowest-order asymptotic expansions of the quantum Chernoff exponents for QI and CI in terms of the signal brightness, loss, thermal noise, and the modal expansion coefficients of the target-reflected light's radiant exitance profiles when separated by a spatial mode sorter after entering the entrance pupil of the receiver's aperture.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# ESA:セマンティックセグメンテーションのためのアノテーション効率の良いアクティブラーニング ESA: Annotation-Efficient Active Learning for Semantic Segmentation ( http://arxiv.org/abs/2408.13491v1 ) ライセンス: Link先を確認	Jinchao Ge, Zeyu Zhang, Minh Hieu Phan, Bowen Zhang, Akide Liu, Yang Zhao,	(参考訳) アクティブラーニングは、ラベル付けのための最も明快なサンプルを選択することで、アノテーションの効率を高める。セマンティックセグメンテーションのこれまでの方法は、個々のピクセルや小さな領域を中心に、自然画像の豊富なパターンや高度な事前学習モデルのパワーを無視してきた。まず、局所的な構造的手がかりを捉えるために、スーパーピクセルグループ化と組み合わせた、クラス非依存のマスク提案ネットワークを利用する革新的で効率的なアクティブラーニング戦略であるEntity-Superpixel Annotation (ESA)を導入する。さらに,対象領域の各画像内のエンティティのサブセットを選択し,エントロピーの高いスーパーピクセルを優先し,包括的表現を保証する。同時に、限られた数のキーエンティティに焦点を当て、効率を最適化する。本手法は,画像固有の構造を活かしたアノテータフレンドリな設計により,既存の画素ベースの手法よりも優れ,最小限のクエリで優れた結果を得ることができ,特にクリックコストを98%削減し,性能を1.71%向上させることができる。例えば、従来の手法で要求される5000クリックとは対照的に、アノテーションにはたった40クリックしか必要としない。 Active learning enhances annotation efficiency by selecting the most revealing samples for labeling, thereby reducing reliance on extensive human input. Previous methods in semantic segmentation have centered on individual pixels or small areas, neglecting the rich patterns in natural images and the power of advanced pre-trained models. To address these challenges, we propose three key contributions: Firstly, we introduce Entity-Superpixel Annotation (ESA), an innovative and efficient active learning strategy which utilizes a class-agnostic mask proposal network coupled with super-pixel grouping to capture local structural cues. Additionally, our method selects a subset of entities within each image of the target domain, prioritizing superpixels with high entropy to ensure comprehensive representation. Simultaneously, it focuses on a limited number of key entities, thereby optimizing for efficiency. By utilizing an annotator-friendly design that capitalizes on the inherent structure of images, our approach significantly outperforms existing pixel-based methods, achieving superior results with minimal queries, specifically reducing click cost by 98% and enhancing performance by 1.71%. 
For instance, our technique requires a mere 40 clicks for annotation, a stark contrast to the 5000 clicks demanded by conventional methods.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# オンライン連続一般化カテゴリー発見 Online Continuous Generalized Category Discovery ( http://arxiv.org/abs/2408.13492v1 ) ライセンス: Link先を確認	Keon-Hee Park, Hakyung Lee, Kyungwoo Song, Gyeong-Moon Park,	(参考訳) コンピュータビジョンにおけるディープニューラルネットワークの進歩により、人工知能(AI)は現実世界の応用に広く利用されている。しかし、AIは依然として、新しいカテゴリー発見のような高度な人間の能力を模倣する際の限界に直面している。オフライン連続学習を利用した新しいカテゴリー発見手法が提案されているが、実環境におけるデータストリームの連続性は無視されている。本研究では,データがリアルタイムに生成・削除されうるデータストリームの動的性質を考慮した,オンライン連続一般化カテゴリー発見(OCGCD)を紹介する。さらに,エネルギー誘導による発見によってオンラインで新たなカテゴリーを発見し,エネルギーに基づくコントラスト損失によって識別的学習を促進する新しい手法DEAN (Discovery via Energy guidance and feature AugmentatioN) を提案する。さらに、DEANは分散ベースの特徴拡張を通じて、ラベルなしデータを効果的に擬似ラベルする。実験の結果,提案手法はOCGCDシナリオにおいて優れた性能を発揮することが示された。 With the advancement of deep neural networks in computer vision, artificial intelligence (AI) is widely employed in real-world applications. However, AI still faces limitations in mimicking high-level human capabilities, such as novel category discovery, for practical use. While some methods utilizing offline continual learning have been proposed for novel category discovery, they neglect the continuity of data streams in real-world settings. In this work, we introduce Online Continuous Generalized Category Discovery (OCGCD), which considers the dynamic nature of data streams where data can be created and deleted in real time. Additionally, we propose a novel method, DEAN, Discovery via Energy guidance and feature AugmentatioN, which can discover novel categories in an online manner through energy-guided discovery and facilitate discriminative learning via energy-based contrastive loss. Furthermore, DEAN effectively pseudo-labels unlabeled data through variance-based feature augmentation. Experimental results demonstrate that our proposed DEAN achieves outstanding performance in the proposed OCGCD scenario.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# 多目的強化学習における閾値レキソグラフィ Thresholded Lexicographic Ordered Multiobjective Reinforcement Learning ( http://arxiv.org/abs/2408.13493v1 ) ライセンス: Link先を確認	Alperen Tercan, Vinayak S. Prabhu,	(参考訳) 語彙的多目的問題は、多くの現実のシナリオにおいて、目的に対して語彙的重要性の順序を課す。既存の強化学習では、語彙的タスクに直接対処する作業が不足している。ベルマン方程式はそれらに適用できないため、いくつかの提案されたアプローチは、理論的な保証なしにヒューリスティックであるとみなされた。さらに、これらの従来のアプローチの実践的適用性も、目標状態に到達できないなど、さまざまな問題に悩まされている。これらの問題のいくつかは以前にも知られていたが、本研究ではさらなる欠点を調査し、多くの場合、実用的なパフォーマンスを改善するための修正を提案する。また,Lexicographic Projection Optimization (LPO)アルゴリズムを用いた政策最適化手法を提案する。最後に,ベンチマーク問題に対する提案アルゴリズムの実証を行った。 Lexicographic multi-objective problems, which impose a lexicographic importance order over the objectives, arise in many real-life scenarios. Existing Reinforcement Learning work directly addressing lexicographic tasks has been scarce. The few proposed approaches were all noted to be heuristics without theoretical guarantees as the Bellman equation is not applicable to them. Additionally, the practical applicability of these prior approaches also suffers from various issues such as not being able to reach the goal state. While some of these issues have been known before, in this work we investigate further shortcomings, and propose fixes for improving practical performance in many cases. We also present a policy optimization approach using our Lexicographic Projection Optimization (LPO) algorithm that has the potential to address these theoretical and practical concerns. Finally, we demonstrate our proposed algorithms on benchmark problems.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# Bモード超音波画像からのヒップランドマーク検出のためのトポロジカルGCN Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images ( http://arxiv.org/abs/2408.13495v1 ) ライセンス: Link先を確認	Tianxiang Huang, Jing Shi, Ge Jin, Juncheng Li, Jun Wang, Jun Du, Jun Shi,	(参考訳) Bモード超音波を用いたコンピュータ支援診断 (CAD) は, 乳児の発育性股関節形成不全 (DDH) の診断に有効であることが示されている。しかし, 超音波画像におけるスペックルノイズの影響から, ヒップランドマークを正確に検出することは依然として課題である。本研究では,トポロジカルGCN (TGCN) とImproved Conformerを統一的なフレームワークに統合した新しいヒップランドマーク検出モデル (TGCN-ICF) を提案し,検出性能を向上させる。 TGCN-ICFには、ヒートマップを生成するImproved Conformer (ICF) サブネットワークと、ランドマーク検出をさらに洗練するTGCNサブネットワークという2つのサブネットワークが含まれている。このTGCNは、クラスラベルのガイダンスにより検出精度を効果的に向上させることができる。さらに,Mutual Modulation Fusion (MMF) モジュールを開発し,ICF の U-Net ブランチと Transformer ブランチから抽出した特徴を深く交換・融合する。実DDHデータセットにおける実験結果から,提案したTGCN-ICFが比較したすべてのアルゴリズムより優れていることが示された。 The B-mode ultrasound based computer-aided diagnosis (CAD) has demonstrated its effectiveness for diagnosis of Developmental Dysplasia of the Hip (DDH) in infants. However, due to the effect of speckle noise in ultrasound images, it is still a challenging task to accurately detect hip landmarks. In this work, we propose a novel hip landmark detection model by integrating the Topological GCN (TGCN) with an Improved Conformer (TGCN-ICF) into a unified framework to improve detection performance. The TGCN-ICF includes two subnetworks: an Improved Conformer (ICF) subnetwork to generate heatmaps and a TGCN subnetwork to additionally refine landmark detection. This TGCN can effectively improve detection accuracy with the guidance of class labels. Moreover, a Mutual Modulation Fusion (MMF) module is developed for deeply exchanging and fusing the features extracted from the U-Net and Transformer branches in ICF. The experimental results on the real DDH dataset demonstrate that the proposed TGCN-ICF outperforms all the compared algorithms.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# 虹彩周囲画像作成の可能性について On the Feasibility of Creating Iris Periocular Morphed Images ( http://arxiv.org/abs/2408.13496v1 ) ライセンス: Link先を確認	Juan E. Tapia, Sebastian Gonzalez, Daniel Benalcazar, Christoph Busch,	(参考訳) ここ数年、顔認識システム(FRS)の複雑な課題として、顔の変形が示されている。したがって,指紋,虹彩,その他の生体特性の評価は,生体システムを強化するために検討され,評価されなければならない。本研究は、画像レベルで虹彩形態を生成するためのエンドツーエンドのフレームワークを提案し、眼周囲虹彩画像から虹彩形態を生成する。このフレームワークは、ペア対象の選択、セグメンテーション、形態形成、新しい虹彩認識システムなど、さまざまな段階を考慮している。現実的な形態画像を作成するために、ランダムな選択と類似の半径サイズ選択という2つの対象選択法が検討されている。また,脆弱性解析と単一モーフィング検出アルゴリズムについても検討した。その結果,従来の虹彩認識システムと混同できる非常にリアルな画像が得られた。 In the last few years, face morphing has been shown to be a complex challenge for Face Recognition Systems (FRS). Thus, the evaluation of other biometric modalities such as fingerprint, iris, and others must be explored and evaluated to enhance biometric systems. This work proposes an end-to-end framework to produce iris morphs at the image level, creating morphs from Periocular iris images. This framework considers different stages such as pair subject selection, segmentation, morph creation, and a new iris recognition system. In order to create realistic morphed images, two approaches for subject selection are explored: random selection and similar radius size selection. A vulnerability analysis and a Single Morphing Attack Detection algorithm were also explored. The results show that this approach obtained very realistic images that can confuse conventional iris recognition systems.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# 因果強化学習における状態のディスエンタングルメントの再考 Rethinking State Disentanglement in Causal Reinforcement Learning ( http://arxiv.org/abs/2408.13498v1 ) ライセンス: Link先を確認	Haiyao Cao, Zhen Zhang, Panpan Cai, Yuhang Liu, Jinan Zou, Ehsan Abbasnejad, Biwei Huang, Mingming Gong, Anton van den Hengel, Javen Qinfeng Shi,	(参考訳) 雑音に対処する際の強化学習(RL)における重要な課題の1つは、潜在状態を観測から推定することである。因果性は、根底にある状態が識別可能性によって一意に回復できることを保証するための厳密な理論的支援を提供する。その結果、いくつかの既存の研究は、アルゴリズムの設計を支援するために因果的な視点から識別可能性を確立することに重点を置いている。しかしながら、これらの結果はしばしば、特定のRLコンテキストを見落としかねない純粋に因果的な視点から導かれる。我々はこの研究ラインを再考し、RL固有のコンテキストを取り入れることで、潜在状態に対する以前の識別可能性分析における不要な仮定を低減できることを示した。さらに重要なのは、これらの仮定を取り除くことで、アルゴリズム設計は、それらによって制約されていた以前の境界を超えることができることだ。これらの知見を生かして、従来手法の複雑な構造的制約を遷移の保存と報酬の保存という2つの単純な制約に置き換えることで、一般の部分観測マルコフ決定過程(POMDP)に対する新たなアプローチを提案する。この2つの制約により、提案アルゴリズムは、基礎となる力学に忠実な形で状態とノイズを分離することが保証される。広範囲なベンチマーク制御タスクによる実証的な証拠は、状態の信念をノイズから効果的に分離する点で、我々のアプローチが既存の手法よりも優れていることを示す。 One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of algorithms. However, these results are often derived from a purely causal viewpoint, which may overlook the specific RL context. We revisit this research line and find that incorporating RL-specific context can reduce unnecessary assumptions in previous identifiability analyses for latent states. More importantly, removing these assumptions allows algorithm design to go beyond the earlier boundaries constrained by them. Leveraging these insights, we propose a novel approach for general partially observable Markov Decision Processes (POMDPs) by replacing the complicated structural constraints in previous methods with two simple constraints for transition and reward preservation. 
With the two constraints, the proposed algorithm is guaranteed to disentangle state and noise that is faithful to the underlying dynamics. Empirical evidence from extensive benchmark control tasks demonstrates the superiority of our approach over existing counterparts in effectively disentangling state belief from noise.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# R2G: 3Dシーンにおけるグラウンディングのための推論 R2G: Reasoning to Ground in 3D Scenes ( http://arxiv.org/abs/2408.13499v1 ) ライセンス: Link先を確認	Yixuan Li, Zan Wang, Wei Liang,	(参考訳) 本稿では,3次元シーン内の対象物体を推論によってグラウンディングするニューラルシンボリックモデルであるReasoning to Ground (R2G)を提案する。従来の作業とは対照的に、R2Gは意味論的概念に基づくシーングラフで3Dシーンを明示的にモデル化し、オブジェクトエンティティ間での注意伝達を反復的にシミュレートすることで、最も高い確率を持つターゲットオブジェクトをグラウンディングするプロセスを解釈可能にする。具体的には、事前に定義された意味語彙を用いて、グラフノード内に複数のオブジェクト特性を埋め込み、エッジ内にエンティティ間の空間的関係を埋め込む。注意伝達を導くために、参照発話を分析して同じ意味空間内の推論命令に変換する、学習ベースまたはプロンプトベースの手法を採用している。各推論ラウンドにおいて、R2Gは(1)命令と埋め込みエンティティプロパティの類似性を現在の注意分布とマージするか、(2)命令と埋め込み空間関係の類似性に基づいてシーングラフ上で注意を移す。 Sr3D/Nr3Dベンチマークの実験により、R2Gは解釈可能性を改善しつつ以前の作業と同等の結果を達成し、3D言語グラウンディングの新たな道を切り開くことが示された。 We propose Reasoning to Ground (R2G), a neural symbolic model that grounds the target objects within 3D scenes in a reasoning manner. In contrast to prior works, R2G explicitly models the 3D scene with a semantic concept-based scene graph; recurrently simulates the attention transferring across object entities; thus makes the process of grounding the target objects with the highest probability interpretable. Specifically, we respectively embed multiple object properties within the graph nodes and spatial relations among entities within the edges, utilizing a predefined semantic vocabulary. To guide attention transferring, we employ learning or prompting-based methods to analyze the referential utterance and convert it into reasoning instructions within the same semantic space. In each reasoning round, R2G either (1) merges current attention distribution with the similarity between the instruction and embedded entity properties or (2) shifts the attention across the scene graph based on the similarity between the instruction and embedded spatial relations. The experiments on Sr3D/Nr3D benchmarks show that R2G achieves a comparable result with the prior works while maintaining improved interpretability, breaking a new path for 3D language grounding.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# 伝統中国医学の新型コロナウイルス関連文献における固有表現認識への大規模言語モデルの活用:比較研究 Utilizing Large Language Models for Named Entity Recognition in Traditional Chinese Medicine against COVID-19 Literature: Comparative Study ( http://arxiv.org/abs/2408.13501v1 ) ライセンス: Link先を確認	Xu Tong, Nina Smirnova, Sharmila Upadhyaya, Ran Yu, Jack H. Culbert, Chao Sun, Wolfgang Otto, Philipp Mayr,	(参考訳) 目的: 新型コロナウイルスに対するTCMの文献において、さまざまなエンティティタイプやドメインをカバーするドメイン固有のNERタスクについて、ChatGPTや他の最先端のLLMのパフォーマンスを探索し、比較する。方法: 新型コロナウイルスに対するTCMに関する389件の論文のデータセットを作成し, そのうち48件に3つのドメインに属する6種類のエンティティを手動で注釈付けして, LLMのNER性能を評価するための正解データとした。次に,ChatGPT (GPT-3.5, GPT-4) と4つの最先端BERTベースのQAモデル (RoBERTa, MiniLM, PubMedBERT, SciBERT) を用いて,特定のタスクを事前にトレーニングすることなく,6種類のエンティティに対するNERタスクを実行した。ドメインファインチューニングモデル (GSAP-NER) も包括的な比較のために適用した。結果: LLMの総合的な性能は, 正確な一致とファジィマッチにおいて有意に異なっていた。ファジィマッチでは、ChatGPTは6タスク中5タスクでBERTベースのQAモデルを上回ったが、正確なマッチでは、BERTベースのQAモデルが6タスク中5タスクでChatGPTを上回ったものの、F-1の差はより小さかった。 GPT-4はファジィマッチにおいて他のモデルに対して有意な優位性を示し、特にTCM式と中国の特許医薬品(TFD)および成分(IG)のエンティティタイプで顕著であった。 GPT-4は、エンティティタイプであるハーブ、ターゲット、研究方法においてBERTベースのモデルよりも優れていたが、F-1のスコアはいずれも0.5を超えなかった。 GSAP-NERは、RMにおいてF-1でGPT-4をわずかに上回った。 ChatGPTは、特にファジィマッチにおいて、精度よりもかなり高いリコールを達成した。結論: LLMのNERパフォーマンスはエンティティタイプに大きく依存しており、そのパフォーマンスはアプリケーションのシナリオによって異なる。高いリコールが好まれるシナリオでは、ChatGPTがよい選択になりうる。しかし、厳密なシナリオでの知識獲得については、ChatGPTもBERTベースのQAモデルも、プロの実践者のための既製のツールではない。 Objective: To explore and compare the performance of ChatGPT and other state-of-the-art LLMs on domain-specific NER tasks covering different entity types and domains in TCM against COVID-19 literature. Methods: We established a dataset of 389 articles on TCM against COVID-19, and manually annotated 48 of them with 6 types of entities belonging to 3 domains as the ground truth, against which the NER performance of LLMs can be assessed. We then performed NER tasks for the 6 entity types using ChatGPT (GPT-3.5 and GPT-4) and 4 state-of-the-art BERT-based question-answering (QA) models (RoBERTa, MiniLM, PubMedBERT and SciBERT) without prior training on the specific task. 
A domain fine-tuned model (GSAP-NER) was also applied for a comprehensive comparison. Results: The overall performance of LLMs varied significantly in exact match and fuzzy match. In the fuzzy match, ChatGPT surpassed BERT-based QA models in 5 out of 6 tasks, while in exact match, BERT-based QA models outperformed ChatGPT in 5 out of 6 tasks but with a smaller F-1 difference. GPT-4 showed a significant advantage over other models in fuzzy match, especially on the entity types of TCM formula and Chinese patent drug (TFD) and ingredient (IG). Although GPT-4 outperformed BERT-based models on the entity types of herb, target, and research method, none of the F-1 scores exceeded 0.5. GSAP-NER outperformed GPT-4 in terms of F-1 by a slight margin on RM. ChatGPT achieved considerably higher recalls than precisions, particularly in the fuzzy match. Conclusions: The NER performance of LLMs is highly dependent on the entity type, and their performance varies across application scenarios. ChatGPT could be a good choice for scenarios where high recall is favored. However, for knowledge acquisition in rigorous scenarios, neither ChatGPT nor BERT-based QA models are off-the-shelf tools for professional practitioners.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# タタミプリンター:タタミパズルのための物理的ZKP Tatami Printer: Physical ZKPs for Tatami Puzzles ( http://arxiv.org/abs/2408.13507v1 ) ライセンス: Link先を確認	Suthee Ruangwises,	(参考訳) 畳パズル(たたみパズル、英: Tatami puzzles)は、他の制約を満たしつつ、四つの領域がコーナーポイントを共有しないように矩形格子を長方形領域に分割することを目的とする、鉛筆と紙の論理パズルである。本稿では,タタミパズルの解の検証を助ける、タタミプリンタと呼ばれる物理カードベースのプロトコルを開発する。また、タタミプリンタを用いて、タタミバリとスクエアジャムという2つのパズルの物理的ゼロ知識証明プロトコルを構築する。これらのプロトコルにより、証明者は、解そのものを明かすことなく、パズルの解が存在することを検証者に示すことができる。 Tatami puzzles are pencil-and-paper logic puzzles with an objective to partition a rectangular grid into rectangular regions such that no four regions share a corner point, as well as satisfying other constraints. In this paper, we develop a physical card-based protocol called Tatami printer that can help verify solutions of Tatami puzzles. We also use the Tatami printer to construct physical zero-knowledge proof protocols for two such puzzles: Tatamibari and Square Jam. These protocols enable a prover to show a verifier the existence of the puzzles' solutions without revealing them.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# G3DST: シーンとスタイルにわたるニューラルラジアンス場を用いた3次元移動の一般化 G3DST: Generalizing 3D Style Transfer with Neural Radiance Fields across Scenes and Styles ( http://arxiv.org/abs/2408.13508v1 ) ライセンス: Link先を確認	Adil Meric, Umut Kocasari, Matthias Nießner, Barbara Roessle,	(参考訳) NeRF(Neural Radiance Fields)は、高精細でフォトリアリスティックなシーンを作るための強力なツールとして登場した。既存のNeRFベースの3Dスタイル転送手法では、シングルまたは複数スタイルのシーンごとの最適化が必要であり、3Dスタイル転送の適用性と効率が制限される。本研究では, シーンごとの最適化やスタイルごとの最適化を必要とせずに, NeRF からスタイリングされた新しいビューをレンダリングすることで, 既存の手法の限界を克服する。この目的のために、一般化可能なNeRFモデルを利用して3次元のスタイル伝達を容易にし、様々な場面で1つの学習モデルを使用することを可能にした。ハイパーネットワークを一般化可能なNeRFに組み込むことで,スタイリングされた新規ビューをオンザフライで生成することが可能になる。さらに,複数のビューにまたがる一貫性を維持するために,新しいフローベース多視点整合性損失を導入する。シーン固有の暗黙的モデルを必要としない高品質で多視点整合性のあるスタイリング画像を生成する上で,これらの手法を様々なシーンや芸術的スタイルで評価し,その性能を示す。以上の結果から,本手法はシーンごとの手法に匹敵する良質な視覚的品質を実現するだけでなく,効率性や適用性も著しく向上し,3Dスタイル転送の分野における顕著な進歩を示すことが示唆された。 Neural Radiance Fields (NeRF) have emerged as a powerful tool for creating highly detailed and photorealistic scenes. Existing methods for NeRF-based 3D style transfer need extensive per-scene optimization for single or multiple styles, limiting the applicability and efficiency of 3D style transfer. In this work, we overcome the limitations of existing methods by rendering stylized novel views from a NeRF without the need for per-scene or per-style optimization. To this end, we take advantage of a generalizable NeRF model to facilitate style transfer in 3D, thereby enabling the use of a single learned model across various scenes. By incorporating a hypernetwork into a generalizable NeRF, our approach enables on-the-fly generation of stylized novel views. Moreover, we introduce a novel flow-based multi-view consistency loss to preserve consistency across multiple views. We evaluate our method across various scenes and artistic styles and show its performance in generating high-quality and multi-view consistent stylized images without the need for a scene-specific implicit model. 
Our findings demonstrate that this approach not only achieves a good visual quality comparable to that of per-scene methods but also significantly enhances efficiency and applicability, marking a notable advancement in the field of 3D style transfer.	翻訳日:2024-08-27 19:19:21 公開日:2024-08-24
# DualAnoDiff:Few-Shot異常画像生成のためのDual-Interrelated Diffusion Model DualAnoDiff: Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation ( http://arxiv.org/abs/2408.13509v1 ) ライセンス: Link先を確認	Ying Jin, Jinlong Peng, Qingdong He, Teng Hu, Hao Chen, Jiafu Wu, Wenbing Zhu, Mingmin Chi, Jun Liu, Yabiao Wang, Chengjie Wang,	(参考訳) 製造業における異常検査の性能は異常データの不足によって制約される。この課題を克服するために、研究者は異常データセットを増大させるために異常生成アプローチを採用し始めた。しかし、既存の異常生成法は、生成した異常の多様性が限られており、この異常を元の画像とシームレスに融合させるのに苦労している。本稿では,これらの課題を新たな視点から克服し,全体像と対応する異常部分を同時に生成する。本稿では,新しい拡散型少数ショット画像生成モデルであるDualAnoDiffを提案する。このモデルでは,2つの相互関連拡散モデルを用いて多種多様な現実的な画像を生成することができ,一方が画像全体を生成するのに使われ,他方が異常部分を生成する。さらに,背景情報や形状情報を抽出することで,画像生成時の歪みやぼやけを緩和する。集約的な実験は,本提案モデルが現実主義と多様性の両方の観点から,最先端の手法よりも優れていることを示す。本手法は, 異常検出, 異常局所化, 異常分類タスクなど, 下流異常検出タスクの性能を大幅に向上させる。 The performance of anomaly inspection in industrial manufacturing is constrained by the scarcity of anomaly data. To overcome this challenge, researchers have started employing anomaly generation approaches to augment the anomaly dataset. However, existing anomaly generation methods suffer from limited diversity in the generated anomalies and struggle to achieve a seamless blending of this anomaly with the original image. In this paper, we overcome these challenges from a new perspective, simultaneously generating a pair of the overall image and the corresponding anomaly part. We propose DualAnoDiff, a novel diffusion-based few-shot anomaly image generation model, which can generate diverse and realistic anomaly images by using a dual-interrelated diffusion model, where one of them is employed to generate the whole image while the other one generates the anomaly part. Moreover, we extract background and shape information to mitigate the distortion and blurriness phenomenon in few-shot image generation. Extensive experiments demonstrate the superiority of our proposed model over state-of-the-art methods in terms of both realism and diversity. 
Overall, our approach significantly improves the performance of downstream anomaly detection tasks, including anomaly detection, anomaly localization, and anomaly classification tasks.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# 六方晶窒化ホウ素の結合スピン対からの量子放出 Quantum Emission from Coupled Spin Pairs in Hexagonal Boron Nitride ( http://arxiv.org/abs/2408.13515v1 ) ライセンス: Link先を確認	Song Li, Anton Pershin, Adam Gali,	(参考訳) ワイドバンドギャップ材料における光学的にアドレス可能な欠陥量子ビットは、室温量子情報処理の有望な候補である。 2次元(2D)六方晶窒化ホウ素(hBN)は、量子メモリを備えた明るい量子エミッタをホストする大きな可能性を持つ魅力的な固体プラットフォームであり、2次元材料のポテンシャルを利用して欠陥量子ビットのスケーラブルな作製を実現できる。室温の明るい欠陥量子ビットは近年hBNで報告されているが、その微視的起源、光学遷移の性質、および光学的に検出された磁気共鳴(ODMR)は解明されていない。ここでは、第一原理(ab initio)計算を用いて、量子エミッタの光安定性とスペクトル拡散をhBN内のドナー・アクセプター対(DAP)に結びつける。 DAPは、ドナー側のパートナーに依存して、非ゼロ磁場下で S = 1/2 基底状態を持つ欠陥対のアクセプター側について ODMR 信号を示しうることがわかった。ドナー・アクセプター対モデルとその遷移機構は、量子アプリケーションのためのhBNにおける欠陥量子ビットの同定と性能最適化のためのレシピを提供する。 Optically addressable defect qubits in wide band gap materials are favorable candidates for room temperature quantum information processing. Two-dimensional (2D) hexagonal boron nitride (hBN) is an attractive solid-state platform with great potential for hosting bright quantum emitters with quantum memories, leveraging the potential of 2D materials for scalable preparation of defect qubits. Although room-temperature bright defect qubits have recently been reported in hBN, their microscopic origin, the nature of the optical transition, and the optically detected magnetic resonance (ODMR) have remained elusive. Here we connect the photostability and spectral diffusion of quantum emitters to donor-acceptor pairs (DAP) in hBN by means of ab initio calculations. We find that DAPs can exhibit an ODMR signal for the acceptor counterpart of the defect pair with S = 1/2 ground state at non-zero magnetic fields depending on the donor partner. The donor-acceptor pair model and its transition mechanisms provide a recipe towards defect qubit identification and performance optimization in hBN for quantum applications.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# AnoPLe: 正常サンプルのみを用いた双方向プロンプト学習によるFew-Shot異常検出 AnoPLe: Few-Shot Anomaly Detection via Bi-directional Prompt Learning with Only Normal Samples ( http://arxiv.org/abs/2408.13516v1 ) ライセンス: Link先を確認	Yujin Lee, Seoyoon Jang, Hyunsoo Yoon,	(参考訳) FAD (Few-shot Anomaly Detection) は、トレーニングサンプルの入手が限られ、異常サンプルが欠如していることが多いため、重大な課題となっている。従来のアプローチでは、検出を改善するためにアノテーションや真の異常サンプルに頼ることが多かったが、このようなテキストや視覚的な手がかりは必ずしもアクセスできない。そこで本稿では,異常に関する事前知識なしに異常検出を行うためのマルチモーダル・プロンプト学習手法であるAnoPLeを紹介する。 AnoPLeは異常をシミュレートし、テキストと視覚のプロンプトを双方向に結合して2つのモダリティ間の深い相互作用を促進する。さらに,学習可能なマルチビュー信号を持つ軽量デコーダを統合し,局所的意味理解を高めるためにマルチスケール画像で訓練する。パフォーマンスをさらに向上するため、グローバルとローカルのセマンティクスを整合させ、画像レベルでの異常の理解を深める。実験の結果、AnoPLe は MVTec-AD と VisA でそれぞれ 94.1% と 86.2% の Image AUROC を記録し、真の異常に晒されていないにもかかわらず、SoTA との差は約 1% にとどまった。コードはhttps://github.com/YoojLee/AnoPLe で入手できる。 Few-shot Anomaly Detection (FAD) poses significant challenges due to the limited availability of training samples and the frequent absence of abnormal samples. Previous approaches often rely on annotations or true abnormal samples to improve detection, but such textual or visual cues are not always accessible. To address this, we introduce AnoPLe, a multi-modal prompt learning method designed for anomaly detection without prior knowledge of anomalies. AnoPLe simulates anomalies and employs bidirectional coupling of textual and visual prompts to facilitate deep interaction between the two modalities. Additionally, we integrate a lightweight decoder with a learnable multi-view signal, trained on multi-scale images to enhance local semantic comprehension. To further improve performance, we align global and local semantics, enriching the image-level understanding of anomalies. The experimental results demonstrate that AnoPLe achieves strong FAD performance, recording 94.1% and 86.2% Image AUROC on MVTec-AD and VisA respectively, with only around a 1% gap compared to the SoTA, despite not being exposed to true anomalies. Code is available at https://github.com/YoojLee/AnoPLe.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# 強化学習によるスケーラブルな類似性を考慮したテストスイートの最小化 Scalable Similarity-Aware Test Suite Minimization with Reinforcement Learning ( http://arxiv.org/abs/2408.13517v1 ) ライセンス: Link先を確認	Sijia Gu, Ali Mesbah,	(参考訳) Multi-Criteria Test Suite Minimization (MCTSM)問題は、コードカバレッジや障害検出機能といった適切な基準でガイドされた冗長なテストケースを削除することで、テストスイートを洗練することを目的としている。しかし、現在の手法は、その実用性を制限するNPハードの性質により、高い障害検出能力を失うか、スケーラビリティの課題に直面している。 Integer Linear Program (ILP) に記述カバレッジや障害検出能力などの従来の基準とテストカバレッジの類似性を統合したTripRLを提案する。 TripRLは2部グラフ表現とその埋め込みを利用して、簡潔なILP定式化を行い、ILPと効果的な強化学習(RL)トレーニングを組み合わせる。この組み合わせにより、大規模テストスイートの最小化がよりスケーラブルになり、テストの有効性が向上する。実験により, TripRL のランタイムは MCTSM 問題の大きさと線形にスケールすることを示した。特に、既存のアプローチが妥当な時間枠でソリューションを提供できない大規模なテストスイートでは、我々の技術は47分以内でソリューションを継続的に提供します。 TripRLによって生成されたテストスイートの削減は、未知の障害を検出する可能性が高くながら、元のステートメントカバレッジと障害検出能力も維持している。 The Multi-Criteria Test Suite Minimization (MCTSM) problem aims to refine test suites by removing redundant test cases, guided by adequacy criteria such as code coverage or fault detection capability. However, current techniques either exhibit a high loss of fault detection ability or face scalability challenges due to the NP-hard nature of the problem, which limits their practical utility. We propose TripRL, a novel technique that integrates traditional criteria such as statement coverage and fault detection ability with test coverage similarity into an Integer Linear Program (ILP), to produce a diverse reduced test suite with high test effectiveness. TripRL leverages bipartite graph representation and its embedding for concise ILP formulation and combines ILP with effective reinforcement learning (RL) training. This combination renders large-scale test suite minimization more scalable and enhances test effectiveness. Our empirical evaluations demonstrate that TripRL's runtime scales linearly with the magnitude of the MCTSM problem. 
Notably, for large test suites where existing approaches fail to provide solutions within a reasonable time frame, our technique consistently delivers solutions in less than 47 minutes. The reduced test suites produced by TripRL also maintain the original statement coverage and fault detection ability while having a higher potential to detect unknown faults.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# トークンレベル報酬関数推定による選択的選好最適化 Selective Preference Optimization via Token-Level Reward Function Estimation ( http://arxiv.org/abs/2408.13518v1 ) ライセンス: Link先を確認	Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou,	(参考訳) 大規模言語モデルのアライメントにおける最近の進歩は、トークンレベルの監督を利用して、きめ細かい選好最適化を行う。しかし、既存のトークンレベルのアライメント手法は、利用可能なすべてのトークンを最適化するためノイズが多く非効率であるか、複雑で高価なキートークン選択戦略で選択的なトレーニングを実行するかのいずれかである。本研究では,効率的なキートークン選択を中心に据えた新しい選択的アライメント戦略である選択的選好最適化(Selective Preference Optimization, SePO)を提案する。 SePOは直接選好最適化(DPO)に基づく最初のトークン選択法を提案し、ターゲットデータ上でトークンレベルの報酬関数を推定するためにオラクルモデルを訓練する。この方法は、応答レベルのアノテーションを持つ任意の既存アライメントデータセットに適用でき、小規模のオラクルモデルとトレーニングデータによるコスト効率の高いトークン選択を可能にする。次に、推定された報酬関数を使用してターゲットデータセット内のすべてのトークンをスコアリングし、キートークンのみを選択して、参照モデル不要のコントラスト目的関数でターゲットポリシーモデルを監督する。 3つの公開評価ベンチマークでの広範な実験により、SePOはターゲットデータセット上の30%のキートークンを最適化するだけで、競合するベースライン手法を大幅に上回ることが示された。弱から強への一般化(weak-to-strong generalization)へのSePOの適用は、弱いオラクルモデルが最大16.8倍のパラメータを持つ強いポリシーモデルを効果的に監督できることを示している。 SePOはまた、分布外データからキートークンを効果的に選択し、強力なポリシーモデルを強化し、過剰最適化問題を緩和する。 Recent advancements in large language model alignment leverage token-level supervisions to perform fine-grained preference optimization. However, existing token-level alignment methods either optimize on all available tokens, which can be noisy and inefficient, or perform selective training with complex and expensive key token selection strategies. In this work, we propose Selective Preference Optimization (SePO), a novel selective alignment strategy that centers on efficient key token selection. SePO proposes the first token selection method based on Direct Preference Optimization (DPO), which trains an oracle model to estimate a token-level reward function on the target data. This method applies to any existing alignment datasets with response-level annotations and enables cost-efficient token selection with small-scale oracle models and training data. 
The estimated reward function is then utilized to score all tokens within the target dataset, where only the key tokens are selected to supervise the target policy model with a reference model-free contrastive objective function. Extensive experiments on three public evaluation benchmarks show that SePO significantly outperforms competitive baseline methods by only optimizing 30% key tokens on the target dataset. SePO applications on weak-to-strong generalization show that weak oracle models effectively supervise strong policy models with up to 16.8x more parameters. SePO also effectively selects key tokens from out-of-distribution data to enhance strong policy models and alleviate the over-optimization problem.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# WebXRとA-Frameを用いたオープンでクロスプラットフォームなWebベースメタバース An Open, Cross-Platform, Web-Based Metaverse Using WebXR and A-Frame ( http://arxiv.org/abs/2408.13520v1 ) ライセンス: Link先を確認	Giuseppe Macario,	(参考訳) メタバースはここ数年、文献や産業界で大きな注目を集めてきたが、オープンでクロスプラットフォームなアーキテクチャが欠如しているため、相互に通信できない多くの異なるメタバースが生まれている。本研究では,Webとエクステンデッドリアリティ(XR)デバイスの両方からアクセス可能な、オープンかつ相互運用可能なメタバースを見据え,A-FrameおよびNetworked-Aframeフレームワークを用いて空間Webアプリを開発するための,WebXRベースのクロスプラットフォームアーキテクチャを提案する。プロトタイプを実装・評価し、この技術スタックが様々なプラットフォームやデバイスにまたがる没入的な体験を可能にすることを裏付けた。没入型環境の使いやすさに対する肯定的なフィードバックは、提案したアプローチをさらに裏付け、魅力的でインタラクティブな仮想空間の実現における有効性を強調する。相互運用性と包摂性の原則に従うことで、本アーキテクチャは、地理的および技術的な境界を越えるオープンなプラットフォームとしてのWorld Wide WebというTim Berners-Lee氏のビジョンに沿うものとなっている。 The metaverse has received much attention in the literature and industry in the last few years, but the lack of an open and cross-platform architecture has led to many distinct metaverses that cannot communicate with each other. This work proposes a WebXR-based cross-platform architecture for developing spatial web apps using the A-Frame and Networked-Aframe frameworks with a view to an open and interoperable metaverse, accessible from both the web and extended reality devices. A prototype was implemented and evaluated, supporting the capability of the technology stack to enable immersive experiences across different platforms and devices. Positive feedback on ease of use of the immersive environment further corroborates the proposed approach, underscoring its effectiveness in facilitating engaging and interactive virtual spaces. By adhering to principles of interoperability and inclusivity, it lives up to Tim Berners-Lee's vision of the World Wide Web as an open platform that transcends geographical and technical boundaries.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# HRGraph:情報伝達に基づく求人勧告による人事データ知識グラフのLLM活用 HRGraph: Leveraging LLMs for HR Data Knowledge Graphs with Information Propagation-based Job Recommendation ( http://arxiv.org/abs/2408.13521v1 ) ライセンス: Link先を確認	Azmine Toushik Wasi,	(参考訳) セマンティック・ネットワークとして機能する知識グラフ(KG)は、知識の進化に容易に適応できる、統一され、コンテキスト化され、構造化された表現を提供することによって、異なるドメインにおける複雑な相互接続データを管理するのに非常に効果的である。複雑な人事(HR)データを処理するKGは、採用、仕事のマッチング、学習ギャップの識別、従業員の維持強化など、さまざまな人事機能に役立ちます。その可能性にもかかわらず、実用的な人事知識グラフを実装するための限られた努力がなされている。本研究では,大規模言語モデルを用いた文書から人事知識グラフを効果的に開発するためのフレームワークを提案することにより,このギャップを解消する。結果として得られるKGは、ジョブマッチング、従業員スキルギャップの特定など、さまざまなダウンストリームタスクに使用することができる。この研究では、HR KGが正確な仕事のマッチングに役立ち、雇用主と従業員の両方に利点をもたらす事例を紹介します。 KGs と Graph Neural Nets の情報伝達実験による実証的証拠とケーススタディは、仕事や従業員の推薦や仕事領域の分類といったタスクにおける KGs の有効性を裏付けるものである。コードとデータは、https://github.com/azminewasi/HRGraphで入手できる。 Knowledge Graphs (KGs) serving as semantic networks, prove highly effective in managing complex interconnected data in different domains, by offering a unified, contextualized, and structured representation with flexibility that allows for easy adaptation to evolving knowledge. Processing complex Human Resources (HR) data, KGs can help in different HR functions like recruitment, job matching, identifying learning gaps, and enhancing employee retention. Despite their potential, limited efforts have been made to implement practical HR knowledge graphs. This study addresses this gap by presenting a framework for effectively developing HR knowledge graphs from documents using Large Language Models. The resulting KG can be used for a variety of downstream tasks, including job matching, identifying employee skill gaps, and many more. In this work, we showcase instances where HR KGs prove instrumental in precise job matching, yielding advantages for both employers and employees. 
Empirical evidence from experiments with information propagation in KGs and Graph Neural Nets, along with case studies underscores the effectiveness of KGs in tasks such as job and employee recommendations and job area classification. Code and data are available at : https://github.com/azminewasi/HRGraph	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# 故障検出のためのエンコーダのみアーキテクチャを用いた直交潜在空間の因子分解学習 : アラーム管理の観点から Learning a Factorized Orthogonal Latent Space using Encoder-only Architecture for Fault Detection; An Alarm management perspective ( http://arxiv.org/abs/2408.13526v1 ) ライセンス: Link先を確認	Vahid MohammadZadeh Eivaghi, Mahdi Aliyari Shoorehdeli,	(参考訳) 産業用故障検出システムにおける誤警報や迷惑アラームは、しばしば不確実性によって引き起こされ、通常のプロセス変数の変動が誤って故障として識別される。本稿では, 検出遅延を生じさせることなく、プロセス変数の確率的成分と決定論的成分を効果的に分離する, エンコーダに基づく新しい残差設計を提案する。提案モデルは2つの異なるエンコーダを用いて、潜在空間を2つの直交空間に分解する: 1つは決定論的部分、もう1つは確率的部分である。所望の空間の識別可能性を確保するため、トレーニング中に制約を適用する。決定論的空間は決定性を保証するために滑らかであるよう制約され、確率的空間は標準ガウスノイズに類似するよう要求される。さらに、非相関化項(decorrelation term)により、学習された表現の独立性を強制する。このアプローチの有効性は、数値例とテネシー・イーストマン・プロセスへの適用を通して示され、堅牢な故障検出の可能性を強調している。決定ロジックを決定論的要因のみに集中させることで、提案モデルは、誤警報と検出漏れをほぼゼロに抑えつつ予測品質を大幅に向上させ、産業環境における運用上の安全性と健全性の向上への道を開く。 False and nuisance alarms in industrial fault detection systems are often triggered by uncertainty, causing normal process variable fluctuations to be erroneously identified as faults. This paper introduces a novel encoder-based residual design that effectively decouples the stochastic and deterministic components of process variables without imposing detection delay. The proposed model employs two distinct encoders to factorize the latent space into two orthogonal spaces: one for the deterministic part and the other for the stochastic part. To ensure the identifiability of the desired spaces, constraints are applied during training. The deterministic space is constrained to be smooth to guarantee determinism, while the stochastic space is required to resemble standard Gaussian noise. Additionally, a decorrelation term enforces the independence of the learned representations. The efficacy of this approach is demonstrated through numerical examples and its application to the Tennessee Eastman process, highlighting its potential for robust fault detection. 
By focusing decision logic solely on deterministic factors, the proposed model significantly enhances prediction quality while achieving nearly zero false alarms and missed detections, paving the way for improved operational safety and integrity in industrial environments.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# 一般化空間変調の最尤検出のためのグローバー適応探索 Grover Adaptive Search for Maximum Likelihood Detection of Generalized Spatial Modulation ( http://arxiv.org/abs/2408.13531v1 ) ライセンス: Link先を確認	Kein Yukiyoshi, Taku Mikuriya, Hyeon Seok Rou, Giuseppe Thadeu Freitas de Abreu, Naoki Ishikawa,	(参考訳) 本稿では、一般化空間変調(GSM)信号の最尤検出(MLD)のための量子支援ソリューションを提案する。具体的には、GSMのMLDをまず新しい多項式最適化問題として定式化し、次いで量子アルゴリズムであるグローバー適応探索を適用する。提案手法のクエリ複雑性に関する性能を数値解析により評価して古典的手法と比較し, フォールトトレラント量子計算の下では, データシンボル数とコンステレーションサイズが比較的大きい場合に提案手法が古典解を上回ることを示す。 We propose a quantum-assisted solution for the maximum likelihood detection (MLD) of generalized spatial modulation (GSM) signals. Specifically, the MLD of GSM is first formulated as a novel polynomial optimization problem, followed by the application of a quantum algorithm, namely, the Grover adaptive search. The performance in terms of query complexity of the proposed method is evaluated and compared to the classical alternative via a numerical analysis, which reveals that under fault-tolerant quantum computation, the proposed method outperforms the classical solution if the number of data symbols and the constellation size are relatively large.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# 有効弾性特性のリアルタイム予測と高速逆設計を備えたオーセチックメタマテリアルのFFTベースサロゲートモデリング FFT-based surrogate modeling of auxetic metamaterials with real-time prediction of effective elastic properties and swift inverse design ( http://arxiv.org/abs/2408.13532v1 ) ライセンス: Link先を確認	Hooman Danesh, Daniele Di Lorenzo, Francisco Chinesta, Stefanie Reese, Tim Brepols,	(参考訳) 負のポアソン比で知られるオーセチック構造は、その構造形状と母材の材料特性に強く影響される有効弾性特性を示す。オーセチック単位セルの周期的均質化はこれらの特性の調査に利用できるが、計算コストが高く、設計空間の探索や逆解析を制限する。本稿では,異なる形状の直交ボイドを持つオーセチック単位セルの有効弾性特性をリアルタイムに予測するためのサロゲートモデルを開発する。単位セルは、長方形、ダイヤモンド形、楕円形、ピーナッツ形の4種類の直交ボイドを持ち、それぞれ固有のボイド径で特徴づけられる。生成したサロゲートモデルは、幾何パラメータと母材の弾性特性を入力として受け取り、有効弾性定数をリアルタイムで予測する。この迅速な評価により、所望の有効応答をもたらす最適な設計パラメータを得るための実用的な逆解析フレームワークが実現する。高速フーリエ変換(FFT)に基づく均質化手法を採用することで、有限要素法(FEM)に典型的に伴う周期メッシュの生成や境界条件に関する懸念を回避し、サロゲートモデル開発用のデータを効率的に生成する。生成したサロゲートモデルの性能を,訓練/テスト分割,パラメトリックスタディ,逆問題を通じて厳密に検証する。最後に,有効接線剛性のリアルタイム予測と、最適な幾何パラメータを決定する逆解析を行うグラフィカルユーザインタフェース(GUI)を開発する。 Auxetic structures, known for their negative Poisson's ratio, exhibit effective elastic properties heavily influenced by their underlying structural geometry and base material properties. While periodic homogenization of auxetic unit cells can be used to investigate these properties, it is computationally expensive and limits design space exploration and inverse analysis. In this paper, surrogate models are developed for the real-time prediction of the effective elastic properties of auxetic unit cells with orthogonal voids of different shapes. The unit cells feature orthogonal voids in four distinct shapes, including rectangular, diamond, oval, and peanut-shaped voids, each characterized by specific void diameters. The generated surrogate models accept geometric parameters and the elastic properties of the base material as inputs to predict the effective elastic constants in real-time. This rapid evaluation enables a practical inverse analysis framework for obtaining the optimal design parameters that yield the desired effective response. 
The fast Fourier transform (FFT)-based homogenization approach is adopted to efficiently generate data for developing the surrogate models, bypassing concerns about periodic mesh generation and boundary conditions typically associated with the finite element method (FEM). The performance of the generated surrogate models is rigorously examined through a train/test split methodology, a parametric study, and an inverse problem. Finally, a graphical user interface (GUI) is developed, offering real-time prediction of the effective tangent stiffness and performing inverse analysis to determine optimal geometric parameters.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# パンドラの箱かアラジンのランプか: 大規模言語モデルにおける RAG ノイズの役割を明らかにする包括的分析 Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models ( http://arxiv.org/abs/2408.13533v1 ) ライセンス: Link先を確認	Jinyang Wu, Feihu Che, Chuyuan Zhang, Jianhua Tao, Shuai Zhang, Pengpeng Shao,	(参考訳) 大規模言語モデル(LLM)における幻覚に対処するための重要な手法として,検索拡張生成(RAG)が登場している。最近の研究はRAGモデルを複雑なノイズのあるシナリオにまで拡張しているが、これらの探索は限られたノイズタイプにとどまり、ノイズはLLMにとって本質的に有害であると前提していることが多く、現実の検索環境から乖離し、実用性を制限する可能性がある。本稿では,言語学的観点から7つの異なるノイズタイプを定義し,複数のデータセットと推論タスクを含む包括的な評価フレームワークであるNoise RAG Benchmark(NoiserBench)を確立する。多様なアーキテクチャと規模を持つ8つの代表的なLLMの実証評価により,これらのノイズは,LLMに有益なノイズ(有益ノイズ)とLLMに有害なノイズ(有害ノイズ)の2つの実用的なグループにさらに分類できることを明らかにした。有害ノイズは一般的に性能を損なうが、有益ノイズはモデル能力のいくつかの側面と全体的な性能を高める可能性がある。我々の分析は、より堅牢で適応可能なRAGソリューションを開発し、多様な検索シナリオにまたがって幻覚を緩和するための洞察を提供する。 Retrieval-Augmented Generation (RAG) has emerged as a crucial method for addressing hallucinations in large language models (LLMs). While recent research has extended RAG models to complex noisy scenarios, these explorations often confine themselves to limited noise types and presuppose that noise is inherently detrimental to LLMs, potentially deviating from real-world retrieval environments and restricting practical applicability. In this paper, we define seven distinct noise types from a linguistic perspective and establish a Noise RAG Benchmark (NoiserBench), a comprehensive evaluation framework encompassing multiple datasets and reasoning tasks. Through empirical evaluation of eight representative LLMs with diverse architectures and scales, we reveal that these noises can be further categorized into two practical groups: noise that is beneficial to LLMs (aka beneficial noise) and noise that is harmful to LLMs (aka harmful noise). While harmful noise generally impairs performance, beneficial noise may enhance several aspects of model capabilities and overall performance. 
Our analysis offers insights for developing more robust, adaptable RAG solutions and mitigating hallucinations across diverse retrieval scenarios.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# メニューの文化的適応 : きめ細かいアプローチ Cultural Adaptation of Menus: A Fine-Grained Approach ( http://arxiv.org/abs/2408.13534v1 ) ライセンス: Link先を確認	Zhonghe Zhang, Xiaoyu He, Vivek Iyer, Alexandra Birch,	(参考訳) 文化固有項目(CSI: Culture-Specific Items)の機械翻訳は、大きな課題である。 CSI翻訳に関する最近の研究は、LLM(Large Language Models)を用いて異なる言語や文化に適応させることで一定の成功を示しているが、それぞれの手法の利点と落とし穴を調べるためには、より深い分析が必要である。本稿では,中国語・英語のメニューコーパスとして最大規模であり、CSI/非CSIラベルが付与され、きめ細かいテストセットを備えた ChineseMenuCSI データセットを紹介する。我々は、よりニュアンスのある分析のためにCSIの比喩性(figurativeness)を3つのレベルで定義し、ほとんどのカテゴリにおいてGPTベースのプロンプトを上回る新しい自動CSI識別手法を開発した。重要な点として、我々は人間の翻訳理論をLLM駆動の翻訳プロセスに初めて統合し、COMETスコアを最大7ポイント向上させ、翻訳精度を大幅に改善した。 Machine Translation of Culture-Specific Items (CSIs) poses significant challenges. Recent work on CSI translation has shown some success using Large Language Models (LLMs) to adapt to different languages and cultures; however, a deeper analysis is needed to examine the benefits and pitfalls of each method. In this paper, we introduce the ChineseMenuCSI dataset, the largest for Chinese-English menu corpora, annotated with CSI vs Non-CSI labels and a fine-grained test set. We define three levels of CSI figurativeness for a more nuanced analysis and develop a novel methodology for automatic CSI identification, which outperforms GPT-based prompts in most categories. Importantly, we are the first to integrate human translation theories into LLM-driven translation processes, significantly improving translation accuracy, with COMET scores increasing by up to 7 points.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# 少数データからの学習:限られたデータセットにおける小児手首病変認識へのきめ細かいアプローチ Learning from the few: Fine-grained approach to pediatric wrist pathology recognition on a limited dataset ( http://arxiv.org/abs/2408.13542v1 ) ライセンス: Link先を確認	Ammar Ahmed, Ali Shariq Imran, Zenun Kastrati, Sher Muhammad Daudpota, Mohib Ullah, Waheed Noord,	(参考訳) 手首の病変、特に小児や青年に多い骨折は、重大な診断上の課題である。 X線画像は依然として一般的な診断ツールであるが、誤読率の増加は、特に多くの外科医や医師が専門的な訓練を受けていないことを考えると、より正確な分析の必要性を浮き彫りにしている。近年の深層畳み込みニューラルネットワークの進歩は、外傷X線における病変検出の自動化に有望である。しかしながら、X線における小児手首病変間の微妙な差異を区別することは依然として困難である。従来の手作業によるアノテーションは効果的だが、手間と費用がかかり、専門知識を必要とする。本稿では,手動による介入なしにX線における識別的領域を自動的に同定することを目的とした,きめ細かなアプローチで小児手首病変認識の課題に取り組む。我々は、アブレーション解析とLIONの統合により、きめ細かいアーキテクチャを洗練する。説明可能なAI技術であるGrad-CAMを活用して、これらの領域を強調表示する。実世界の医学研究の制約を反映した限られたデータを用いても,本手法は,拡張テストセットとオリジナルの(難しい)テストセットの両方において,最先端の画像認識モデルを一貫して上回る。提案する改良アーキテクチャは,ベースライン法に比べて精度を1.06%および1.25%向上させ,それぞれ86%および84%の精度を達成した。さらに, 本手法は97%という最も高い骨折感度を示し, 手首病変認識を向上させる可能性を示している。実装コードは https://github.com/ammarlodhi255/fine-grained-approach-to-wrist-pathology-recognition にある。 Wrist pathologies, particularly fractures common among children and adolescents, present a critical diagnostic challenge. While X-ray imaging remains a prevalent diagnostic tool, the increasing misinterpretation rates highlight the need for more accurate analysis, especially considering the lack of specialized training among many surgeons and physicians. Recent advancements in deep convolutional neural networks offer promise in automating pathology detection in trauma X-rays. However, distinguishing subtle variations between pediatric wrist pathologies in X-rays remains challenging. Traditional manual annotation, though effective, is laborious, costly, and requires specialized expertise. In this paper, we address the challenge of pediatric wrist pathology recognition with a fine-grained approach, aimed at automatically identifying discriminative regions in X-rays without manual intervention. 
We refine our fine-grained architecture through ablation analysis and the integration of LION. Leveraging Grad-CAM, an explainable AI technique, we highlight these regions. Despite using limited data, reflective of real-world medical study constraints, our method consistently outperforms state-of-the-art image recognition models on both augmented and original (challenging) test sets. Our proposed refined architecture achieves an increase in accuracy of 1.06% and 1.25% compared to the baseline method, resulting in accuracies of 86% and 84%, respectively. Moreover, our approach demonstrates the highest fracture sensitivity of 97%, highlighting its potential to enhance wrist pathology recognition. The implementation code can be found at https://github.com/ammarlodhi255/fine-grained-approach-to-wrist-pathology-recognition	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# IQA-EVAL:人間とモデルの対話型質問応答の自動評価 IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering ( http://arxiv.org/abs/2408.13545v1 ) ライセンス: Link先を確認	Ruosen Li, Barry Wang, Ruochen Li, Xinya Du,	(参考訳) 質問応答(QA)のための大規模言語モデル(LLM)を評価する際、従来の手法は通常、与えられた質問と文脈に基づいてモデルが生成した即時の応答を直接評価することに重点を置いている。人間がAIアシスタントに情報探索の助けを求めるという一般的なユースケースでは、こうした非対話的評価は人間とモデルの会話の動的な性質を考慮しておらず、対話を考慮した評価では、正確なQAモデルが人間に好まれることが示されている(Lee et al., 2023)。 HCI(Human-Computer Interaction)の最近の研究では、人間の評価者が対話と評価を行っているが、スケールさせるには費用と時間がかかりすぎることが多い。本研究では,対話型質問応答(Interactive Question Answering)評価のための自動評価フレームワークIQA-EVALを導入する。具体的には, LLMに基づく評価エージェント(LEA)を導入する。LEAは, (1) 人間の振る舞いをシミュレートしてIQAモデルとのインタラクションを生成し, (2) 生成したインタラクションを自動的に評価できる。さらに,実際の人間評価者の集団をより良くシミュレートするために,LEAにペルソナを割り当てることを提案する。 (1) GPT-4 (あるいは Claude) をバックボーンモデルとした評価フレームワークは, IQAタスクにおいて人間による評価と高い相関を達成し, (2) 群衆をより良く表現するためにLEAにペルソナを割り当てることで, 相関がさらに大幅に向上することを示す。最後に、我々の自動評価指標を用いて、複雑で曖昧な質問応答タスクからの1000を超える質問で、最近の代表的な5つのLLMを評価する。これは人間が評価した場合、5千ドルという相当なコストがかかる規模である。 To evaluate Large Language Models (LLMs) for question answering (QA), traditional methods typically focus on directly assessing the immediate responses generated by the models based on the given question and context. In the common use case of humans seeking AI assistant's help in finding information, these non-interactive evaluations do not account for the dynamic nature of human-model conversations, and interaction-aware evaluations have shown that accurate QA models are preferred by humans (Lee et al., 2023). Recent works in human-computer interaction (HCI) have employed human evaluators to conduct interactions and evaluations, but they are often prohibitively expensive and time-consuming to scale. In this work, we introduce an automatic evaluation framework IQA-EVAL to Interactive Question Answering Evaluation. More specifically, we introduce LLM-based Evaluation Agent (LEA) that can: (1) simulate human behaviors to generate interactions with IQA models; (2) automatically evaluate the generated interactions. 
Moreover, we propose assigning personas to LEAs to better simulate groups of real human evaluators. We show that: (1) our evaluation framework with GPT-4 (or Claude) as the backbone model achieves a high correlation with human evaluations on the IQA task; (2) assigning personas to LEA to better represent the crowd further significantly improves correlations. Finally, we use our automatic metric to evaluate five recent representative LLMs with over 1000 questions from complex and ambiguous question answering tasks, which comes with a substantial cost of $5k if evaluated by humans.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# 二重ダイナミクスを有する車両ネットワークのためのSynesthesia of Machines (SoM)強化ISACプリコーディング Synesthesia of Machines (SoM)-Enhanced ISAC Precoding for Vehicular Networks with Double Dynamics ( http://arxiv.org/abs/2408.13546v1 ) ライセンス: Link先を確認	Zonghui Yang, Shijian Gao, Xiang Cheng, Liuqing Yang,	(参考訳) 統合センシング・通信(ISAC)技術は車両ネットワークにおいて重要な役割を担っている。しかし、この文脈における通信チャネルは時変特性を示し、潜在的なターゲットも高速に移動しうるため、二重ダイナミクスが生じる。これらは、リアルタイムISACプリコーディング設計における重要な課題であるが、まだ十分に検討されていない。最適化に基づくプリコーディング法は広く研究されているが、計算が複雑であり、二重ダイナミクスの状況ではめったに利用できない完全な事前情報に大きく依存している。本稿では,基地局が位置情報やチャネル情報などの様々なモダリティを活用して二重ダイナミクスに適応し,深層強化学習フレームワークを通じて環境情報を効果的に利用してISAC性能の限界を押し広げる,Synesthesia of Machines(SoM)強化プリコーディングパラダイムを提案する。さらに、複雑な状態・行動空間でのトレーニングを高速化するために、パラメータ共有型のactor-criticアーキテクチャを設計する。広範な実験的検証により、提案手法が既存手法に対して多面的な優位性を持つことが示された。 Integrated sensing and communication (ISAC) technology plays a crucial role in vehicular networks. However, the communication channel within this context exhibits time-varying characteristics, and potential targets may move rapidly, resulting in double dynamics. These present significant challenges for real-time ISAC precoding design that have not been thoroughly explored. While optimization-based precoding methods have been extensively studied, they are computationally complex and heavily rely on perfect prior information that is rarely available in situations with double dynamics. In this paper, we propose a synesthesia of machine (SoM)-enhanced precoding paradigm, where the base station leverages various modalities such as positioning and channel information to adapt to double dynamics, and effectively utilizes environmental information to stretch ISAC performance boundaries through a deep reinforcement learning framework. Additionally, a parameter-shared actor-critic architecture is tailored to expedite training in complex state and action spaces. Extensive experimental validation has demonstrated the multifaceted superiority of our method over existing approaches.	翻訳日:2024-08-27 19:09:24 公開日:2024-08-24
# トランスモン量子ビットの制御振幅における系統的誤差に対するロバスト最適制御 Robust optimal control for a systematic error in the control amplitude of transmon qubits ( http://arxiv.org/abs/2408.13554v1 ) ライセンス: Link先を確認	Max Cykiert, Eran Ginossar,	(参考訳) NISQ(Noisy Intermediate-Scale Quantum)コンピューティングの時代においても誤り訂正回路においても、物理量子ビットのコヒーレンス時間と高忠実度ゲートは量子コンピュータが機能するために不可欠である。本稿では,最適化により設計したパルスを用いることで、トランスモン量子ビットの制御振幅誤差に起因する忠実度の低下を打ち消せることを理論的,実験的に実証する。我々は、ロバスト最適制御により得られる制御ランドスケープを解析し、それが誤差範囲に依存すること、すなわち、解が準最適解の引き込み領域に捕らわれうることを見出した。異なる誤差値に対してロバストな制御を求め、有限の緩和率によるインコヒーレントな忠実度低下機構と比較する。得られた制御はIBMQの量子ビット上でテストされ、$\sim 10\%$という大きな誤差に対する耐性を示した。 In the era of Noisy Intermediate-Scale Quantum computing as well as in error correcting circuits, physical qubits coherence time and high fidelity gates are essential to the functioning of quantum computers. In this paper, we demonstrate theoretically and experimentally, that pulses designed by optimization can be used to counteract the loss of fidelity due to a control amplitude error of the transmon qubit. We analyze the control landscape obtained by robust optimal control and find it to depend on the error range, namely the solutions can get trapped in the basin of attraction of sub-optimal solutions. Robust controls are found for different error values and are compared to an incoherent loss of fidelity mechanism due to a finite relaxation rate. The controls are tested on the IBMQ's qubit and found to demonstrate resilience against significant $\sim 10\%$ errors.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# サプライチェーンリスクマネジメントにおける因果機械学習 What if? Causal Machine Learning in Supply Chain Risk Management ( http://arxiv.org/abs/2408.13556v1 ) ライセンス: Link先を確認	Mateusz Wyrembek, George Baryannis, Alexandra Brintrup,	(参考訳) サプライチェーン管理において機械学習モデルを開発するための最終目標は、最適な介入を行うことである。しかし、ほとんどの機械学習モデルは因果関係を推論するのではなくデータの相関関係を識別するため、より良い結果を体系的に計画することは困難である。本稿では,サプライチェーンのリスク介入モデル開発における因果機械学習の利用を提案・評価し,海事工学分野におけるサプライチェーンリスク管理のケーススタディでその利用を実証する。我々の知見は、因果機械学習が、異なるサプライチェーン介入の下で達成できる変化を特定することで意思決定プロセスを強化し、「what-if」シナリオ計画を可能にすることを示している。そこで我々は、リスクの予測とリスク最小化のための介入計画に向けた機械学習のさまざまな開発経路を提案し、サプライチェーン研究者が因果機械学習を探求するための重要なステップを概説する。 The penultimate goal for developing machine learning models in supply chain management is to make optimal interventions. However, most machine learning models identify correlations in data rather than inferring causation, making it difficult to systematically plan for better outcomes. In this article, we propose and evaluate the use of causal machine learning for developing supply chain risk intervention models, and demonstrate its use with a case study in supply chain risk management in the maritime engineering sector. Our findings highlight that causal machine learning enhances decision-making processes by identifying changes that can be achieved under different supply chain interventions, allowing "what-if" scenario planning. We therefore propose different machine learning developmental pathways for predicting risk, and planning for interventions to minimise risk and outline key steps for supply chain researchers to explore causal machine learning.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# 異常検出のための変分オートエンコーダの比較検討 Variational Autoencoder for Anomaly Detection: A Comparative Study ( http://arxiv.org/abs/2408.13561v1 ) ライセンス: Link先を確認	Huy Hoang Nguyen, Cuong Nhat Nguyen, Xuan Tung Dao, Quoc Trung Duong, Dzung Pham Thi Kim, Minh-Tan Pham,	(参考訳) 本論文は,異常検出に用いられる現代的な変分オートエンコーダ(VAE)アーキテクチャの比較分析を行い、このタスクにおける性能と挙動の特性を明らかにすることを目的とする。検討対象のアーキテクチャ構成は、オリジナルのVAEベースライン、ガウス確率場を事前分布とするVAE(VAE-GRF)、およびビジョントランスフォーマーを組み込んだVAE(ViT-VAE)である。その結果,ViT-VAEは様々なシナリオで優れた性能を示す一方、VAE-GRFは最適な性能に達するためにより複雑なハイパーパラメータチューニングを必要とする可能性があることがわかった。さらに、広く使われているMVTecデータセットから得られる結果への過度な依存を緩和するために、最近公開されたMiADデータセットをベンチマークに活用する。この意図的な採用は、MVTec専用に調整されたドメイン固有モデルの影響を軽減することで結果の競争力を高め、より堅牢な評価フレームワークに寄与することを目的としている。コードは https://github.com/endtheme123/VAE-compare.git で入手できる。 This paper aims to conduct a comparative analysis of contemporary Variational Autoencoder (VAE) architectures employed in anomaly detection, elucidating their performance and behavioral characteristics within this specific task. The architectural configurations under consideration encompass the original VAE baseline, the VAE with a Gaussian Random Field prior (VAE-GRF), and the VAE incorporating a vision transformer (ViT-VAE). The findings reveal that ViT-VAE exhibits exemplary performance across various scenarios, whereas VAE-GRF may necessitate more intricate hyperparameter tuning to attain its optimal performance state. Additionally, to mitigate the propensity for over-reliance on results derived from the widely used MVTec dataset, this paper leverages the recently-public MiAD dataset for benchmarking. This deliberate inclusion seeks to enhance result competitiveness by alleviating the impact of domain-specific models tailored exclusively for MVTec, thereby contributing to a more robust evaluation framework. Code is available at https://github.com/endtheme123/VAE-compare.git.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# マルチエージェント強化学習におけるマルチタスク一般化のためのハイブリッドトレーニング Hybrid Training for Enhanced Multi-task Generalization in Multi-agent Reinforcement Learning ( http://arxiv.org/abs/2408.13567v1 ) ライセンス: Link先を確認	Mingliang Zhang, Sichang Su, Chengyang He, Guillaume Sartoretti,	(参考訳) マルチエージェント強化学習(MARL)では,多様なエージェントや目的に対するマルチタスク一般化の達成が大きな課題となっている。既存のオンラインMARLアルゴリズムは主にシングルタスクの性能に重点を置いているが、マルチタスク一般化能力の欠如は、通常、大きな計算の無駄と現実での適用性の制限をもたらす。一方、既存のオフラインマルチタスクMARLアプローチはデータ品質に大きく依存しており、しばしば未知のタスクでの性能が低下する。本稿では,マルチタスク一般化と学習効率の両立を図るために,オンライン学習とオフライン学習を統合した新しいハイブリッドMARLフレームワークであるHyGen(Hybrid Training for Enhanced Multi-Task Generalization)を紹介する。具体的には、オフラインのマルチタスクデータセットから潜在的な汎用スキルを抽出する。次に、中央集権的訓練・分散実行(CTDE)パラダイムの下で、最適なスキルを選択する方策を訓練する。この段階では、オフラインデータとオンラインインタラクションの両方を統合するリプレイバッファを使用する。我々は、本フレームワークが汎用スキルを効果的に抽出・洗練し、未知のタスクに対して優れた一般化をもたらすことを実証的に示した。 StarCraftマルチエージェントチャレンジにおける比較分析により、HyGenは既存のオンラインのみ、オフラインのみの幅広い手法を上回ることが示された。 In multi-agent reinforcement learning (MARL), achieving multi-task generalization to diverse agents and objectives presents significant challenges. Existing online MARL algorithms primarily focus on single-task performance, but their lack of multi-task generalization capabilities typically results in substantial computational waste and limited real-life applicability. Meanwhile, existing offline multi-task MARL approaches are heavily dependent on data quality, often resulting in poor performance on unseen tasks. In this paper, we introduce HyGen, a novel hybrid MARL framework, Hybrid Training for Enhanced Multi-Task Generalization, which integrates online and offline learning to ensure both multi-task generalization and training efficiency. Specifically, our framework extracts potential general skills from offline multi-task datasets. We then train policies to select the optimal skills under the centralized training and decentralized execution paradigm (CTDE). During this stage, we utilize a replay buffer that integrates both offline data and online interactions. 
We empirically demonstrate that our framework effectively extracts and refines general skills, yielding impressive generalization to unseen tasks. Comparative analyses on the StarCraft multi-agent challenge show that HyGen outperforms a wide range of existing solely online and offline methods.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# 集団強結合のための量子化埋め込みアプローチ -- 第一原理(ab initio)計算とマクロQEDをポラリトニクスの単純モデルに接続する Quantized Embedding Approaches for Collective Strong Coupling -- Connecting ab initio and macroscopic QED to Simple Models in Polaritonics ( http://arxiv.org/abs/2408.13570v1 ) ライセンス: Link先を確認	Frieder Lindel, Dominik Lentrodt, Stefan Yoshi Buhmann, Christian Schäfer,	(参考訳) 化学とエネルギーの移動を制御するために、集合的な光-物質相互作用が使われてきたが、明示的なシミュレーションの計算コストの急激な増加により、ab initio法と大きな多体量子光学系を組み合わせた利用しやすいアプローチは欠落している。本稿では,多体量子光学系に対する利用しやすいab initio量子埋め込みの概念を導入し,分子構造に対するab initio量子化学の厳密さを維持しつつ,分子多体系の集合結合をマクロなQEDの精神で効果的に扱うことを可能にする。我々のアプローチはポラリトン場の量子ゆらぎを完全に含むが、動的平均場理論のような複雑な埋め込みアプローチよりもずっと単純で直感的である。本稿では、Tavis-Cummingsモデルとの比較により、基礎となる仮定を説明する。量子化埋め込み法の直感的な適用とその明確な限界は、現実的な分子アンサンブルにおける集合的効果を記述するために、ab initioポラリトン化学の分野に実践的な枠組みを提供する。 Collective light-matter interactions have been used to control chemistry and energy transfer, yet accessible approaches that combine ab initio methodology with large many-body quantum optical systems are missing due to the fast increase in computational cost for explicit simulations. We introduce an accessible ab initio quantum embedding concept for many-body quantum optical systems that allows treating the collective coupling of molecular many-body systems effectively in the spirit of macroscopic QED while keeping the rigor of ab initio quantum chemistry for the molecular structure. Our approach fully includes the quantum fluctuations of the polaritonic field and yet remains much simpler and more intuitive than complex embedding approaches such as dynamical mean-field theory. We illustrate the underlying assumptions by comparison to the Tavis--Cummings model. The intuitive application of the quantized embedding approach and its transparent limitations offer a practical framework for the field of ab initio polaritonic chemistry to describe collective effects in realistic molecular ensembles.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
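For orientation, the Tavis-Cummings model that the abstract above uses as its point of comparison can be written in its standard textbook form. This is a sketch of the textbook model only, not the paper's embedding scheme; the symbols $\omega_c$, $\omega_0$, $g$ and $N$ are the usual textbook notation and do not come from the abstract. With $\hbar = 1$ and the rotating-wave approximation:

$$
H_{\mathrm{TC}} = \omega_c\, a^\dagger a + \frac{\omega_0}{2} \sum_{i=1}^{N} \sigma_z^{(i)} + g \sum_{i=1}^{N} \left( a\, \sigma_+^{(i)} + a^\dagger \sigma_-^{(i)} \right)
$$

At resonance ($\omega_c = \omega_0$) the single-excitation sector splits into two polariton branches separated by the collective Rabi splitting $2g\sqrt{N}$, which is the sense in which the coupling is "collective".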
# PointDGMamba: 一般化状態空間モデルによるポイントクラウド分類のドメイン一般化 PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model ( http://arxiv.org/abs/2408.13574v1 ) ライセンス: Link先を確認	Hao Yang, Qianyu Zhou, Haijia Sun, Xiangtai Li, Fengqi Liu, Xuequan Lu, Lizhuang Ma, Shuicheng Yan,	(参考訳) ドメイン一般化(DG)は、最近、ポイントクラウド分類(PCC)モデルの、目に見えない領域への一般化性を改善するために研究されている。しかし、畳み込みニューラルネットワークや視覚変換器を使用するため、受容野や二次的な複雑さに悩まされることが多い。本稿では、DG PCCにおける状態空間モデル(SSM)の一般化可能性について研究し、DG PCCに直接SSMを適用することは、いくつかの課題に直面することを発見した。さらに、ドメインに依存しない特徴学習とデータスキャンにおける設計の欠如は、3Dシーケンスデータに予期せぬドメイン固有情報をもたらすだろう。そこで本研究では,未知の領域に対する強い一般化性に優れ,大域的受容場と効率的な線形複雑性の利点を有する新しいフレームワークであるPointDGMambaを提案する。 PointDGMambaは、3つの革新的なコンポーネントで構成されている。Masked Sequence Denoising (MSD)、Sequence-wise Cross- Domain Feature Aggregation (SCFA)、Dual-level Domain Scanning (DDS)である。特にMSDは、ポイントクラウドシーケンスのノイズポイントトークンを選択的にマスクアウトし、SCFAはクロスドメインだが同クラスのポイントクラウド機能を導入し、モデルにより一般化された特徴の抽出方法を学ぶように促している。 DDSには、機能間の情報交換を容易にするドメイン内スキャンとクロスドメインスキャンが含まれる。さらに,マルチドメイン一般化のための新しい,より挑戦的なベンチマークPointDG-3to1を提案する。大規模実験により提案したPointDGMambaの有効性と性能を実証した。 Domain Generalization (DG) has been recently explored to improve the generalizability of point cloud classification (PCC) models toward unseen domains. However, they often suffer from limited receptive fields or quadratic complexity due to the use of convolution neural networks or vision Transformers. In this paper, we present the first work that studies the generalizability of state space models (SSMs) in DG PCC and find that directly applying SSMs into DG PCC will encounter several challenges: the inherent topology of the point cloud tends to be disrupted and leads to noise accumulation during the serialization stage. Besides, the lack of designs in domain-agnostic feature learning and data scanning will introduce unanticipated domain-specific information into the 3D sequence data. 
To this end, we propose a novel framework, PointDGMamba, that excels in strong generalizability toward unseen domains and has the advantages of global receptive fields and efficient linear complexity. PointDGMamba consists of three innovative components: Masked Sequence Denoising (MSD), Sequence-wise Cross-domain Feature Aggregation (SCFA), and Dual-level Domain Scanning (DDS). In particular, MSD selectively masks out the noised point tokens of the point cloud sequences, SCFA introduces cross-domain but same-class point cloud features to encourage the model to learn how to extract more generalized features. DDS includes intra-domain scanning and cross-domain scanning to facilitate information exchange between features. In addition, we propose a new and more challenging benchmark PointDG-3to1 for multi-domain generalization. Extensive experiments demonstrate the effectiveness and state-of-the-art performance of our presented PointDGMamba.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# Visual Foundation Modelsは長期的ポイントトラッキングを実現することができるか? Can Visual Foundation Models Achieve Long-term Point Tracking? ( http://arxiv.org/abs/2408.13575v1 ) ライセンス: Link先を確認	Görkay Aydemir, Weidi Xie, Fatma Güney,	(参考訳) 大規模ビジョンファウンデーションモデルは、様々なタスクで顕著な成功を示し、その堅牢な一般化能力を強調している。2視点間の対応付け能力は検討されているが, 複雑な環境における長期的な対応付けの有効性は未解明のままである。これを解決するために,視覚基盤モデルの幾何学的認識を点追跡の文脈で評価する。 (i) 一切の訓練を行わないゼロショット設定、(ii) 低容量の層によるプロービング、(iii) Low Rank Adaptation(LoRA)によるファインチューニング。その結果, Stable DiffusionとDINOv2の特徴は, ゼロショット設定において優れた幾何対応能力を示すことが示唆された。さらに、DINOv2は適応設定における教師付きモデルに匹敵する性能を実現し、対応学習の強力な初期化の可能性を実証している。 Large-scale vision foundation models have demonstrated remarkable success across various tasks, underscoring their robust generalization capabilities. While their proficiency in two-view correspondence has been explored, their effectiveness in long-term correspondence within complex environments remains unexplored. To address this, we evaluate the geometric awareness of visual foundation models in the context of point tracking: (i) in zero-shot settings, without any training; (ii) by probing with low-capacity layers; (iii) by fine-tuning with Low Rank Adaptation (LoRA). Our findings indicate that features from Stable Diffusion and DINOv2 exhibit superior geometric correspondence abilities in zero-shot settings. Furthermore, DINOv2 achieves performance comparable to supervised models in adaptation settings, demonstrating its potential as a strong initialization for correspondence learning.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# フロケ系のStředa公式:チェザロ総和による位相不変量と量子化された異常 The Středa Formula for Floquet Systems: Topological Invariants and Quantized Anomalies from Cesàro Summation ( http://arxiv.org/abs/2408.13576v1 ) ライセンス: Link先を確認	Lucila Peralta Gavensky, Gonzalo Usaj, Nathan Goldman,	(参考訳) 本研究は,2次元フロケ系のトポロジカル不変量を扱いやすい応答関数を用いて表す一般的な理論的枠組みを導入する。周期駆動系のSambe表現に基づき,静的系のStředa公式に着想を得て,磁気摂動に対する非有界なフロケ状態密度の流れを評価する。このFloquet-Středa応答は、そのままでは数学的に不定義であるが、チェザロ総和法を用いて正則化される。このアプローチの鍵となる結果として、関連するすべてのフロケ巻き付き数を、フロケ・ブロッホ・ハミルトニアンの単純なバンド特性と関連付ける。これらの一般的な関係は、フロケ系のトポロジカルな特徴付けが、駆動系のストロボスコピックな時間発展から完全に導出できることを示している。重要なことに、Floquet-Středa応答に対する物理的に区別可能な2つの寄与、すなわち系のエッジとバルクの間の電荷の量子化された流れと、系と駆動場の間のエネルギーの「異常な」量子化された流れを同定し、異常なエッジ状態の物理的起源に関する新たな知見を提供する。副産物として,本理論は,フロケ巻き付き数とフロケ・ブロッホバンドの軌道磁化の間の一般的な関係, 不均一な試料においてフロケ・トポロジーにアクセス可能にするフロケ巻き付き数の局所マーカー, 設計された熱浴の存在下で密度測定からこれらのフロケ巻き付き数を取り出す実験プロトコル, およびフロケ状態密度の磁気応答によるこれらのトポロジカル不変量の一般的な表現を与え, 相互作用するフロケ系のトポロジカルな特徴付けへの道を開く。 This work introduces a general theoretical framework, which expresses the topological invariants of two-dimensional Floquet systems in terms of tractable response functions: Building on the Sambe representation of periodically-driven systems, and inspired by the Středa formula for static systems, we evaluate the flow of the unbounded Floquet density of states in response to a magnetic perturbation. This Floquet-Středa response, which is a priori mathematically ill-defined, is regularized by means of a Cesàro summation method. As a key outcome of this approach, we relate all relevant Floquet winding numbers to simple band properties of the Floquet-Bloch Hamiltonian. These general relations indicate how the topological characterization of Floquet systems can be entirely deduced from the stroboscopic time-evolution of the driven system. Importantly, we identify two physically distinguishable contributions to the Floquet-Středa response: a quantized flow of charge between the edge and the bulk of the system, and an 'anomalous' quantized flow of energy between the system and the driving field, which provides new insight on the physical origin of the anomalous edge states. As byproducts, our theory provides: a general relation between Floquet winding numbers and the orbital magnetization of Floquet-Bloch bands; a local marker for Floquet winding numbers, which allows access to Floquet topology in inhomogeneous samples; an experimental protocol to extract these Floquet winding numbers from density-measurements in the presence of an engineered bath; as well as general expressions for these topological invariants in terms of the magnetic response of the Floquet density of states, opening a route for the topological characterization of interacting Floquet systems.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# 外部磁場中における量子荷電粒子の複雑性 Complexity of Quantum Charged Particle in External Magnetic Field ( http://arxiv.org/abs/2408.13577v1 ) ライセンス: Link先を確認	M. Radomirov,	(参考訳) 本稿では,外部磁場中における量子荷電粒子の回路複雑性について検討する。 Nielsenのアプローチを用いて、時間、温度、サイクロトロン周波数の関数としてサーモフィールド二重状態の複雑性を決定する。様々なパラメータ値における複雑性とその振動の振幅を解析し、これらの結果が調和振動子の場合の極限としては導出できないことを明らかにする。最後に、複雑性の変化率を計算し、それがLloyd限界に従うことを示す。 In this paper, we investigate the circuit complexity of a quantum charged particle in an external magnetic field. Utilizing the Nielsen approach, we determine the complexity of thermofield double states as functions of time, temperature, and cyclotron frequency. We analyze both the complexity and the amplitude of its oscillations across various parameter values, and reveal that these results cannot be derived as a limit of the harmonic oscillator case. Finally, we calculate the rate of complexity and show that it obeys the Lloyd bound.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# CSS-Segment: LSVOS Challenge VOS Trackの第2位 CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track ( http://arxiv.org/abs/2408.13582v1 ) ライセンス: Link先を確認	Jinming Chai, Qin Ma, Junpei Zhang, Licheng Jiao, Fang Liu,	(参考訳) ビデオオブジェクトのセグメンテーションは、ビデオ編集や自動運転など、多くの下流アプリケーションの基礎となる難しいタスクである。本稿では,ECCV 2024の第6回LSVOS Challenge VOS Trackにおける,ビデオオブジェクトセグメンテーションのためのチーム「yuanjie」のソリューションを簡単に紹介する。提案したCSS-Segmentは、複雑なオブジェクトの動きや長期にわたって対象が映るビデオにおいて、より優れたパフォーマンスが期待できる。本稿では,ビデオオブジェクトセグメンテーションにおけるCSS-Segmentの有効性を検証した。最終的に,本手法はJ&Fスコア80.84を達成し,ECCV 2024の第6回LSVOS Challenge VOS Trackで2位にランクインした。 Video object segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving. In this technical report, we briefly introduce the solution of our team "yuanjie" for video object segmentation in the 6-th LSVOS Challenge VOS Track at ECCV 2024. We believe that our proposed CSS-Segment will perform better in videos of complex object motion and long-term presentation. In this report, we successfully validated the effectiveness of the CSS-Segment in video object segmentation. Finally, our method achieved a J&F score of 80.84 in and test phases, and ultimately ranked 2nd in the 6-th LSVOS Challenge VOS Track at ECCV 2024.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# FLEURS-ASL:多言語マルチタスク評価におけるアメリカの手話を含む FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation ( http://arxiv.org/abs/2408.13585v1 ) ライセンス: Link先を確認	Garrett Tanzer,	(参考訳) 手話翻訳は歴史的に主流の機械翻訳研究の周辺であった。フィールドの収束を支援するため,FLORES(テキスト用)とFLEURS(音声用)のマルチウェイ並列ベンチマークの拡張であるFLEURS-ASLを導入し,最初の手話(ビデオ用)であるAmerican Sign Languageを5Certified Deaf Interpretersで翻訳した。 FLEURS-ASLは、ASLと200言語間の様々なタスク(主に文と談話レベルの翻訳)をテキストとして、あるいは102言語を音声として評価するために使用することができる。タイムスタンプトークンと過去のテキストトークンを34秒のコンテキストウィンドウに組み込んで,YouTube-ASLのランダムなビデオクリップに基づいてトレーニングした統合モデリング手法を用いて,ASLから英語テキストへのタスクのベースラインを提供する。このモデルは、多数の新しいタスクをサポートしながら、フレーズレベルのベースラインのパフォーマンスを満たしたり、超えたりします。また、FLEURS-ASLを用いて、マルチモーダルフロンティアモデルがASLを事実上理解していないことを示し、標準評価スイートに手話を含めることの重要性を強調した。 Sign language translation has historically been peripheral to mainstream machine translation research. In order to help converge the fields, we introduce FLEURS-ASL, an extension of the multiway parallel benchmarks FLORES (for text) and FLEURS (for speech) to support their first sign language (as video), American Sign Language, translated by 5 Certified Deaf Interpreters. FLEURS-ASL can be used to evaluate a variety of tasks -- primarily sentence- and discourse-level translation -- between ASL and 200 other languages as text, or 102 languages as speech. We provide baselines for tasks from ASL to English text using a unified modeling approach that incorporates timestamp tokens and previous text tokens in a 34-second context window, trained on random video clips from YouTube-ASL. This model meets or exceeds the performance of phrase-level baselines while supporting a multitude of new tasks. We also use FLEURS-ASL to show that multimodal frontier models have virtually no understanding of ASL, underscoring the importance of including sign languages in standard evaluation suites.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# LLMサンプリングにおける多様性とリスクのバランス:オープンエンドテキスト生成のための方法とパラメータの選択方法 Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation ( http://arxiv.org/abs/2408.13586v1 ) ライセンス: Link先を確認	Yuxuan Zhou, Margret Keuper, Mario Fritz,	(参考訳) サンプリングベースの復号化戦略は大規模言語モデル(LLM)に広く採用されており、温度調整とテールトランケーション(トップ-k、トップ-pサンプリングなど)による多様性と品質のバランスを目標としている。接頭辞によって次トークン候補のダイナミックレンジが大きく異なることを踏まえ、近年の研究では,LLMの予測分布の尾を適応的に切り詰める手法が提案されている。オープンエンドテキスト生成タスクにおいて,これらの手法により改善された結果が報告されているが,その結果は選び抜かれたトランケーションパラメータや例文に大きく依存している。本稿では,全文の文脈を保存した収集済みのプレフィックスツリーに基づいて,各デコードステップにおける多様性とリスクのトレードオフを考慮し,トランケーションサンプリング手法の本質的な能力を推定する体系的手法を提案する。本研究は,既存のトランケーションサンプリング手法の総合的な比較と,ユーザ向けのガイドラインとしての推奨パラメータを提供する。 Sampling-based decoding strategies have been widely adopted for Large Language Models (LLMs) in numerous applications, which target a balance between diversity and quality via temperature tuning and tail truncation (e.g., top-k and top-p sampling). Considering the high dynamic range of the candidate next-token given different prefixes, recent studies propose to adaptively truncate the tail of LLM's predicted distribution. Although improved results have been reported with these methods on open-ended text generation tasks, the results are highly dependent on the curated truncation parameters and exemplar text. In this paper, we propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step, based on our collected prefix tree which preserves the context of a full sentence. Our work provides a comprehensive comparison between existing truncation sampling methods, as well as their recommended parameters as a guideline for users.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
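The two tail-truncation rules named in the abstract above (top-k and top-p) can be illustrated with a minimal sketch. This shows only the generic truncation step, not the paper's estimation method; the function names and the toy distribution `next_token_probs` are hypothetical, chosen for illustration.

```python
def top_k_truncate(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_truncate(probs, p):
    """Keep the smallest set of most probable tokens whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


# Toy next-token distribution over five token ids (hypothetical values).
next_token_probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]
kept_k = top_k_truncate(next_token_probs, 2)    # keeps token ids 0 and 1
kept_p = top_p_truncate(next_token_probs, 0.8)  # keeps token ids 0, 1 and 2
```

A smaller `k` or `p` removes more of the tail, which lowers the risk of sampling an implausible token at the cost of diversity; that is the trade-off the abstract refers to.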
# クレーター検出と月着陸ナビゲーションのための説明可能な畳み込みネットワーク Explainable Convolutional Networks for Crater Detection and Lunar Landing Navigation ( http://arxiv.org/abs/2408.13587v1 ) ライセンス: Link先を確認	Jianing Song, Nabil Aouf, Duarte Rondao, Christophe Honvault, Luis Mansilla,	(参考訳) 月面着陸は近年、月探査において大きな関心を集めており、自律的な月面着陸航法はこの課題の基盤である。 AIは自律的でインテリジェントな宇宙ミッションにおいて重要な役割を果たすことが期待されているが、人間の専門家はAIソリューションの信頼性に疑問を呈している。そこで本論文では,インテリジェントな月面着陸に対して透明で理解可能な予測を提供することを目的に,視覚に基づく月面着陸のための説明可能なAI(XAI)を検討する。特徴抽出構造としてアテンションベースのDarknet53を提案する。クレーター検出とナビゲーションのタスクには、それぞれアテンションベースのYOLOv3とアテンションベースのDarknet53-LSTMを提示する。実験の結果,提案したネットワークは月面着陸時の相対的なクレーター検出と姿勢推定において競争力のある性能を示した。ネットワークの説明可能性は、モデル構築時にアテンション機構を導入することで実現される。さらに,PCCを用いて提案したネットワークの説明可能性を定量的に評価し,その結果からネットワーク内の様々な畳み込み層の機能を示す。 Lunar landing has drawn great interest in lunar exploration in recent years, and autonomous lunar landing navigation is fundamental to this task. AI is expected to play a critical role in autonomous and intelligent space missions, yet human experts question the reliability of AI solutions. Thus, explainable AI (XAI) for vision-based lunar landing is studied in this paper, aiming at providing transparent and understandable predictions for intelligent lunar landing. Attention-based Darknet53 is proposed as the feature extraction structure. For crater detection and navigation tasks, attention-based YOLOv3 and attention-Darknet53-LSTM are presented respectively. The experimental results show that the offered networks provide competitive performance on relative crater detection and pose estimation during the lunar landing. The explainability of the provided networks is achieved by introducing an attention mechanism into the network during model building. Moreover, the PCC is utilised to quantitatively evaluate the explainability of the proposed networks, with the findings showing the functions of various convolutional layers in the network.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# 長期記憶・非線形実現ボラティリティモデルによるバリュー・アット・リスクの損失に基づくベイズ逐次予測 Loss-based Bayesian Sequential Prediction of Value at Risk with a Long-Memory and Non-linear Realized Volatility Model ( http://arxiv.org/abs/2408.13588v1 ) ライセンス: Link先を確認	Rangika Peiris, Minh-Ngoc Tran, Chao Wang, Richard Gerlach,	(参考訳) バリュー・アット・リスク(VaR)を直接予測するために,長期記憶かつ非線形な実現ボラティリティモデルのクラスを提案する。 RNN-HARと呼ばれるこのモデルは、実現測度の長期記憶を効率的に捉えることで知られるヘテロジニアス自己回帰(HAR)モデルを、非線形ダイナミクスを扱うリカレントニューラルネットワーク(Recurrent Neural Network, RNN)を統合することで拡張する。 RNN-HARのモデル推定と逐次予測には,逐次モンテカルロを用いた損失に基づく一般化ベイズ推定を用いる。実証分析は、2000年から2022年までの31の市場指数の日次終値と実現測度を用いて行う。提案モデルの1期先VaR予測性能を、基本的なHARモデルおよびその拡張と比較する。その結果、提案したRNN-HARモデルは、本研究で検討した他のすべてのモデルを一貫して上回ることが示された。 A long memory and non-linear realized volatility model class is proposed for direct Value at Risk (VaR) forecasting. This model, referred to as RNN-HAR, extends the heterogeneous autoregressive (HAR) model, a framework known for efficiently capturing long memory in realized measures, by integrating a Recurrent Neural Network (RNN) to handle non-linear dynamics. Loss-based generalized Bayesian inference with Sequential Monte Carlo is employed for model estimation and sequential prediction in RNN-HAR. The empirical analysis is conducted using daily closing prices and realized measures from 2000 to 2022 across 31 market indices. The proposed model's one-step-ahead VaR forecasting performance is compared against a basic HAR model and its extensions. The results demonstrate that the proposed RNN-HAR model consistently outperforms all other models considered in the study.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
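To make the HAR baseline mentioned above concrete, here is a minimal sketch of the standard HAR regressors (daily value, 5-day average, 22-day average of a realized measure). It covers only the textbook linear HAR form, not the RNN-HAR extension or the Bayesian estimation in the paper; the function names, the toy series `rv`, and the coefficient values are hypothetical.

```python
def har_features(rv, t, week=5, month=22):
    """Return the (daily, weekly, monthly) HAR regressors available at day t."""
    if t + 1 < month:
        raise ValueError("need at least `month` observations up to day t")
    daily = rv[t]
    weekly = sum(rv[t - week + 1 : t + 1]) / week      # mean of the last 5 days
    monthly = sum(rv[t - month + 1 : t + 1]) / month   # mean of the last 22 days
    return daily, weekly, monthly


def har_forecast(rv, t, beta):
    """One-step-ahead linear forecast: b0 + bd*daily + bw*weekly + bm*monthly."""
    b0, bd, bw, bm = beta
    daily, weekly, monthly = har_features(rv, t)
    return b0 + bd * daily + bw * weekly + bm * monthly


# Toy realized-measure series 1.0 .. 30.0 and illustrative (not estimated) weights.
rv = [float(i) for i in range(1, 31)]
features = har_features(rv, t=29)
forecast = har_forecast(rv, 29, beta=(0.0, 0.5, 0.25, 0.25))
```

The three averaging horizons are what lets a simple linear regression mimic long memory; in practice the coefficients would be estimated from data rather than fixed by hand as they are here.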
# 分裂共鳴を持つシリコンマイクロリング共振器における双光子スペクトル波動関数のエンジニアリング Engineering biphoton spectral wavefunction in a silicon micro-ring resonator with split resonances ( http://arxiv.org/abs/2408.13590v1 ) ライセンス: Link先を確認	Liao Ye, Haoran Ma, Xiaoqing Guo, Fanjie Ruan, Yuehai Wang, Jianyi Yang,	(参考訳) 周波数-時間は、フォトニックな高次元の絡み合いに適した自由度であり、単一モードデバイスとの互換性や分散の影響を受けにくいといった利点がある。光子電界の周波数時間振幅の工学的制御は、2次光学非線形性を持つプラットフォーム上で実証されている。 3次の光学非線形性しか持たない集積フォトニックプラットフォームでは、工学的に構築された状態の生成は未解明のままである。ここでは,周波数領域における分離可能な状態と制御可能な絡み合った状態の両方を,後処理なしで生成できるシリコンオン絶縁体(SOI)プラットフォーム上のキャビティ強化光子対源を実証する。共振の組み合わせの異なる選択とオンチップ光場微分を用いることで、状態の結合スペクトル強度(JSI)に影響を与える2つの関数を独立に制御できる。半解析モデルを用いて、共振分割とポンプ微分の存在下での双光子スペクトルの波動関数をシミュレートし、そのパラメータは共振器の測定された線形応答から適合性に基づくパラメータ抽出によって完全に決定することができる。分離可能な状態に対する測定されたスペクトル純度は、$95.5\pm 1.2\%$であり、一方、絡み合った状態に対する測定されたJSIは、2次元の周波数空間における2-または4-ピーク関数を示す。実験とシミュレーションは、パルス時間モード符号化や長距離量子鍵分布を用いた量子情報処理などの応用を約束するシリコンデバイスにおける周波数領域波動関数を操作する能力を示す。 Frequency-time is a degree of freedom suitable for photonic high-dimensional entanglement, with advantages such as compatibility with single-mode devices and insensitivity to dispersion. The engineering control of the frequency-time amplitude of a photon's electric field has been demonstrated on platforms with second-order optical nonlinearity. For integrated photonic platforms with only third-order optical nonlinearity, the engineered generation of the state remains unexplored. Here, we demonstrate a cavity-enhanced photon-pair source on the silicon-on-insulator (SOI) platform that can generate both separable states and controllable entangled states in the frequency domain without post-manipulation. By choosing different resonance combinations and employing on-chip optical field differentiation, we achieve independent control over two functions that affect the joint spectral intensity (JSI) of the state. 
A semi-analytical model is derived to simulate the biphoton spectral wavefunction in the presence of resonance splitting and pump differentiation, and its parameters can be fully determined through fitting-based parameter extraction from the resonator's measured linear response. The measured spectral purity for the separable state is $95.5\pm 1.2\%$, while the measured JSIs for the entangled states show two- or four-peaked functions in two-dimensional frequency space. The experiments and simulations demonstrate the capacity to manipulate the frequency-domain wavefunction in a silicon-based device, which is promising for applications like quantum information processing using pulsed temporal-mode encoding or long-distance quantum key distribution.	翻訳日:2024-08-27 18:59:33 公開日:2024-08-24
# ランダム特徴量を用いた最適カーネル分位点学習 Optimal Kernel Quantile Learning with Random Features ( http://arxiv.org/abs/2408.13591v1 ) ライセンス: Link先を確認	Caixing Wang, Xingdong Feng,	(参考訳) ランダム特徴量(RF)アプローチは、スケーラブルなカーネル法のための確立された効率的なツールであるが、既存の文献は主にランダム特徴量を用いたカーネルリッジ回帰(KRR-RF)に焦点を当てており、これは裾の重いノイズを伴う不均一なデータの扱いに限界がある。本稿では、ランダム特徴量を用いたカーネル分位点回帰(KQR-RF)の一般化性能を研究し、改良された誤差分解を導入してKQR-RFとKRR-RFの新たな関係を確立することで、KQR-RFにおけるチェック損失の非平滑性に対処する。本研究は,RF数に関する緩やかな条件の下でKQR-RFの容量依存の学習率を確立し、それが対数因子を除いてミニマックス最適であることを示す。重要なことに、データ依存のサンプリング戦略を利用した理論的結果は、目標の分位点関数が仮定したカーネル空間と正確には一致しないagnostic設定にも拡張できる。仮定を少し修正することで、リプシッツ連続な損失の場合にも容量依存の誤差解析を適用でき、機械学習コミュニティにおける幅広い応用が可能になる。理論的な結果を検証するため,シミュレーション実験と実データへの応用を行った。 The random feature (RF) approach is a well-established and efficient tool for scalable kernel methods, but existing literature has primarily focused on kernel ridge regression with random features (KRR-RF), which has limitations in handling heterogeneous data with heavy-tailed noises. This paper presents a generalization study of kernel quantile regression with random features (KQR-RF), which accounts for the non-smoothness of the check loss in KQR-RF by introducing a refined error decomposition and establishing a novel connection between KQR-RF and KRR-RF. Our study establishes the capacity-dependent learning rates for KQR-RF under mild conditions on the number of RFs, which are minimax optimal up to some logarithmic factors. Importantly, our theoretical results, utilizing a data-dependent sampling strategy, can be extended to cover the agnostic setting where the target quantile function may not precisely align with the assumed kernel space. By slightly modifying our assumptions, the capacity-dependent error analysis can also be applied to cases with Lipschitz continuous losses, enabling broader applications in the machine learning community. To validate our theoretical findings, simulated experiments and a real data application are conducted.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
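The check (pinball) loss that the abstract above calls non-smooth can be sketched in a few lines. This is only an illustration of the loss and of why it targets quantiles; it does not implement random features or the paper's estimator, and the sample `y` and helper names are hypothetical.

```python
def check_loss(residual, tau):
    """Pinball (check) loss: tau * r if r >= 0, else (tau - 1) * r."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def mean_check_loss(y, prediction, tau):
    """Average check loss of a constant prediction over the sample y."""
    return sum(check_loss(v - prediction, tau) for v in y) / len(y)


# Toy sample with one heavy-tailed outlier (hypothetical values).
y = [1.0, 2.0, 3.0, 4.0, 100.0]
candidates = sorted(y)

# The constant minimizing the check loss at level tau is a tau-quantile of y.
best_median = min(candidates, key=lambda c: mean_check_loss(y, c, 0.5))
best_upper = min(candidates, key=lambda c: mean_check_loss(y, c, 0.9))
```

At `tau = 0.5` the minimizer is the median, which the outlier barely moves, whereas a squared loss would be pulled toward the mean; the kink of the loss at zero residual is the non-smoothness the abstract mentions.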
# 大規模言語モデルを用いたソフトウェア脆弱性の自動パッチ Automated Software Vulnerability Patching using Large Language Models ( http://arxiv.org/abs/2408.13597v1 ) ライセンス: Link先を確認	Yu Nong, Haoran Yang, Long Cheng, Hongxin Hu, Haipeng Cai,	(参考訳) サイバーセキュリティ防衛において、タイムリーで効果的な脆弱性パッチは必須であり、様々なアプローチが提案されているが、現実の脆弱性に対する有効かつ正確なパッチの生成には依然として苦労している。本稿では,事前訓練済みの大規模言語モデル(LLM)のパワーとメリットを活用し,テスト入力やエクスプロイトの証拠を使わず,モデルの訓練やファインチューニングも行わずに,自動脆弱性パッチを可能にする。高品質なパッチ生成に不可欠な脆弱なコードの挙動についてLLMに効果的に推論させるために,LLMに対する適応的プロンプトを導入し,その方法論をLLMベースの自動パッチシステムLLMPATCHとして実装する。ゼロデイ脆弱性を含む現実世界の脆弱なコードに対するLLMPATCHの評価は、既存のプロンプト法と最先端の非LLMベースの技術の両方よりも優れた性能を示している(最良のベースライン性能に対してF1でそれぞれ98.9%と65.4%)。 LLMPATCHはまた、11のゼロデイ脆弱性のうち7つのパッチに成功し、その中には比較した4つのベースラインのいずれもパッチできなかった2つが含まれる。 Timely and effective vulnerability patching is essential for cybersecurity defense, for which various approaches have been proposed yet still struggle to generate valid and correct patches for real-world vulnerabilities. In this paper, we leverage the power and merits of pre-trained large language models (LLMs) to enable automated vulnerability patching using no test input/exploit evidence and without model training/fine-tuning. To elicit LLMs to effectively reason about vulnerable code behaviors, which is essential for quality patch generation, we introduce adaptive prompting on LLMs and instantiate the methodology as LLMPATCH, an automated LLM-based patching system. Our evaluation of LLMPATCH on real-world vulnerable code including zero-day vulnerabilities demonstrates its superior performance to both existing prompting methods and state-of-the-art non-LLM-based techniques (by 98.9% and 65.4% in F1 over the best baseline performance). LLMPATCH has also successfully patched 7 out of 11 zero-day vulnerabilities, including 2 that none of the four baselines compared were able to patch.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# リンドブラッド方程式に対するフルおよびローランク指数オイラー積分器 Full- and low-rank exponential Euler integrators for the Lindblad equation ( http://arxiv.org/abs/2408.13601v1 ) ライセンス: Link先を確認	Hao Chen, Alfio Borzì, Denis Janković, Jean-Gabriel Hartmann, Paul-Antoine Hervieux,	(参考訳) リンドブラッド方程式(Lindblad equation)は、状態が密度行列で表される開放量子系の動的発展をモデル化するために広く用いられる量子マスター方程式である。これらの解行列は半正定値性とトレース保存性によって特徴づけられ、物理的に意味のある数値シミュレーションではこれらが保証されなければならない。本稿では, 正値性とトレースを無条件に保存してリンドブラッド方程式を近似する, 新しいフルランクおよびローランクの指数オイラー積分器を開発する。2つのクラスの指数積分法に対して鋭い誤差評価を与える理論的結果を示す。また、現在の最先端の能力を超える提案手法の有効性を示す数値実験の結果について述べる。 The Lindblad equation is a widely used quantum master equation to model the dynamical evolution of open quantum systems whose states are described by density matrices. These solution matrices are characterized by semi-positiveness and trace preserving properties, which must be guaranteed in any physically meaningful numerical simulation. In this paper, novel full- and low-rank exponential Euler integrators are developed for approximating the Lindblad equation that preserve positivity and trace unconditionally. Theoretical results are presented that provide sharp error estimates for the two classes of exponential integration methods. Results of numerical experiments are discussed that illustrate the effectiveness of the proposed schemes, beyond present state-of-the-art capabilities.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# 量子チャネルのない無条件鍵分布 Unconditionally secure key distribution without quantum channel ( http://arxiv.org/abs/2408.13602v1 ) ライセンス: Link先を確認	Hua-Lei Yin,	(参考訳) 鍵分布は暗号において基本的な役割を果たす。現在、量子スキームは、無条件でセキュアな鍵分布を達成するための唯一の既知の方法である。この手法は, デバイス非依存およびツインフィールド構成において, それぞれ508km, 1002kmの距離で実証されている。しかしながら、量子鍵分布は、ユーザ間の量子チャネルの使用を必要とするため、送信距離の問題や多くのサイドチャネル攻撃に直面している。量子リピータと量子コンステレーションを使用したとしても、グローバル量子ネットワークの確立と移動量子通信の促進にまつわる多大な費用と技術的ハードルのために、大規模な量子暗号の商業化は達成できないままである。ここでは、証明可能な量子一方向関数を発見し、確率鍵分布と呼ばれる無条件のセキュリティを持つ別の鍵分布スキームを提案する。 2人の正当なユーザー間で量子信号を交換するための量子チャネルは存在しない。非局所的絡み合った状態は、同等の仮想プロトコルで生成、識別、測定することができ、秘密鍵の抽出に使用できる。この発見は、無条件でセキュアな暗号を実現するためのパラダイムシフトであり、グローバルなスケールでの応用を促進することを期待する。 Key distribution plays a fundamental role in cryptography. Currently, the quantum scheme stands as the only known method for achieving unconditionally secure key distribution. This method has been demonstrated over distances of 508 and 1002 kilometers in the measurement-device-independent and twin-field configurations, respectively. However, quantum key distribution faces transmission distance issues and numerous side channel attacks since the basic physical picture requires the use of quantum channels between users. Even when quantum repeater and quantum constellation are used, commercializing quantum cryptography on a large scale remains unattainable due to the considerable expense and significant technical hurdles associated with establishing a global quantum network and facilitating mobile quantum communication. Here, by discovering the provable quantum one-way function, we propose another key distribution scheme with unconditional security, named probability key distribution, that promises users between any two distances to generate a fixed and high secret key rate. There are no quantum channels for exchanging quantum signals between two legitimate users. Non-local entangled states can be generated, identified and measured in the equivalent virtual protocol and can be used to extract secret keys. 
We anticipate that this discovery presents a paradigm shift in achieving unconditionally secure cryptography, thereby facilitating its widespread application on a global scale.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# フォワードアニーリングに支援されたリバース量子アニーリング Reverse quantum annealing assisted by forward annealing ( http://arxiv.org/abs/2408.13603v1 ) ライセンス: Link先を確認	Manpreet Singh Jattana,	(参考訳) 量子アニーラは通常、フォワードアニーリングを用いてヒューリスティックな解を生成する。リバースアニーリングはより良い解を生成する可能性があるが、適切な初期状態を必要とする。このような状態を見つける方法は一般に未知であるか問題への依存性が非常に高く、成功は限定的であり、リバースアニーリングの適用範囲を厳しく制限する。本研究では, フォワードアニーリングから得られる低品質な解をリバースアニーリングに与えることにより, 全体の解の質と量を改善する一般的な方法を用いる。 D-Wave量子アニーラを用いたグラフ彩色問題の実験的実証により, この手法は, $459$個のランダムなErdős-Rényiグラフの$57\%$について、フォワードアニーリングから得られた無効な解を, 支援付きリバースアニーリングの後に得られる少なくとも1つの有効な解に変換できることを示す。提案手法は, ランダムな初期状態を大きく上回り, 平均してより多くの一意な解を得るとともに, リバースアニーリングの適用範囲を広げる。得られる有効解の平均数は問題サイズとともに指数関数的に減少するが,グラフ彩色問題に対するスケーリング解析により,本手法がリバースアニーリングを用いて従来のフォワードアニーリングの計算可能範囲を効果的に拡張することを示す。 Quantum annealers conventionally use forward annealing to generate heuristic solutions. Reverse annealing can potentially generate better solutions but necessitates an appropriate initial state. Ways to find such states are generally unknown or highly problem dependent, offer limited success, and severely restrict the scope of reverse annealing. We use a general method that improves the overall solution quality and quantity by feeding reverse annealing with low-quality solutions obtained from forward annealing. An experimental demonstration of solving the graph coloring problem using the D-Wave quantum annealers shows that our method is able to convert invalid solutions obtained from forward annealing to at least one valid solution obtained after assisted reverse annealing for $57\%$ of $459$ random Erdős-Rényi graphs. Our method significantly outperforms random initial states, obtains more unique solutions on average, and widens the applicability of reverse annealing. Although the average number of valid solutions obtained drops exponentially with the problem size, a scaling analysis for the graph coloring problem shows that our method effectively extends the computational reach of conventional forward annealing using reverse annealing.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
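The abstract above distinguishes "invalid" from "valid" graph-coloring solutions. A minimal sketch of that check follows: a coloring is valid when no edge joins two vertices of the same color. It illustrates only the validity test, not the annealing procedure; the graph, the colorings, and the function names are hypothetical.

```python
def count_conflicts(edges, coloring):
    """Number of edges whose two endpoints received the same color."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])


def is_valid_coloring(edges, coloring):
    """A coloring is valid when it has no conflicting edge."""
    return count_conflicts(edges, coloring) == 0


# Toy graph: a triangle (0, 1, 2) with one pendant vertex 3 attached to 2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]

# A low-quality candidate, e.g. what a forward anneal might return.
invalid = {0: 0, 1: 1, 2: 1, 3: 0}   # vertices 1 and 2 share color 1
# A repaired candidate that differs from it in a single vertex.
valid = {0: 0, 1: 1, 2: 2, 3: 0}

conflicts_before = count_conflicts(edges, invalid)
conflicts_after = count_conflicts(edges, valid)
```

In this toy case the valid coloring is one vertex away from the invalid one, which gives some intuition for why a near-miss solution can be a useful starting state for a local refinement step such as reverse annealing.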
# GNN:データ発見のためのグラフニューラルネットワークと大規模言語モデル GNN: Graph Neural Network and Large Language Model Based for Data Discovery ( http://arxiv.org/abs/2408.13609v1 ) ライセンス: Link先を確認	Thomas Hoang,	(参考訳) 我々のアルゴリズム GNN: Graph Neural Network and Large Language Model Based for Data Discovery は、効用関数を事前に定義する必要性や属性ランキングのための人手による入力という課題を克服する点で、PLOD(Predictive Learning Optimal Data Discovery)とBOD(Blindly Optimal Data Discovery)の利点を継承しており、時間のかかるループ処理を防ぐのに役立つ。これらの先行研究に加えて、GNNはグラフニューラルネットワークと大規模言語モデルの利点を活用し、PLODやMODでは理解できないテキスト型の値を理解することで、結果を予測するタスクをより信頼性の高いものにする。 GNNは、数値だけでなくテキスト値にも基づいてテキスト型の値とユーザの好みを理解するという点でPLODの拡張と見なすことができ、データサイエンスと分析の目的に有望である。 Our algorithm GNN: Graph Neural Network and Large Language Model Based for Data Discovery inherits the benefits of PLOD (Predictive Learning Optimal Data Discovery) and BOD (Blindly Optimal Data Discovery) in terms of overcoming the challenges of having to predefine a utility function and requiring human input for attribute ranking, which helps prevent a time-consuming loop process. In addition to these previous works, our algorithm GNN leverages the advantages of graph neural networks and large language models to understand text type values that cannot be understood by PLOD and MOD, thus making the task of predicting outcomes more reliable. GNN could be seen as an extension of PLOD in terms of understanding the text type value and the user's preferences based on not only numerical values but also text values, which is promising for data science and analytics purposes.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# 一般化ワンウェイ関数とその応用 Generalized one-way function and its application ( http://arxiv.org/abs/2408.13613v1 ) ライセンス: Link先を確認	Hua-Lei Yin,	(参考訳) 一方向関数は古典暗号の基本であり、その存在は計算複雑性理論における長年の問題のままである。近年、証明可能な量子一方向関数が同定され、無制限の計算資源でもその一方向性を維持している。ここでは、証明可能な量子ワンウェイ関数の量子ビットを仮想的に測定し、対応する測定結果を同一確率でランダムに割り当てることにより、関数の数学的定義を拡張して一般化されたワンウェイ関数を構築する。注目すべきは、この一般化されたワンウェイ関数を用いて、古典的なデータ処理のみをベースとした、無条件でセキュアな鍵配布プロトコルを開発し、セキュアな暗号化と署名に利用できることである。我々の研究は、量子システムの特徴付けにおける情報の重要性と密度行列の物理的重要性を強調している。確率論とランダム性は、無制限の計算能力を持つ敵に対抗する効果的なツールであることを示す。 One-way functions are fundamental to classical cryptography and their existence remains a longstanding problem in computational complexity theory. Recently, a provable quantum one-way function has been identified, which maintains its one-wayness even with unlimited computational resources. Here, we extend the mathematical definition of functions to construct a generalized one-way function by virtually measuring the qubit of the provable quantum one-way function and randomly assigning the corresponding measurement outcomes with identical probability. Remarkably, using this generalized one-way function, we have developed an unconditionally secure key distribution protocol based solely on classical data processing, which can then be utilized for secure encryption and signature. Our work highlights the importance of information in characterizing quantum systems and the physical significance of the density matrix. We demonstrate that probability theory and randomness are effective tools for countering adversaries with unlimited computational capabilities.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# あなたが測定するバイアス:話者検証研究におけるバイアス評価の方法論的落とし穴 As Biased as You Measure: Methodological Pitfalls of Bias Evaluations in Speaker Verification Research ( http://arxiv.org/abs/2408.13614v1 ) ライセンス: Link先を確認	Wiebke Hutiri, Tanvina Patel, Aaron Yi Ding, Odette Scharenborg,	(参考訳) 話者認証システムにおけるバイアスの検出と緩和は重要であり、データセット、処理の選択、アルゴリズムはパフォーマンスの違いをもたらす可能性がある。これまでの研究では、偏見を評価するために、グループ間でのパフォーマンスの違いを測定してきた。しかし、研究全体での結果を比較すると、矛盾する結論を導き、この分野の進歩を妨げることが判明する。本稿では,測定がバイアス評価の結果に与える影響について検討する。偏差評価は, 性能測定基準, 比あるいは差分に基づく偏差尺度の選択, メタ尺度への偏差測定の集約によって, 強い影響を受けていることを実証的に示す。特に,基準値の値が小さい場合や,大きさの異なる基準値を比較する必要がある場合などについて検討した。 Detecting and mitigating bias in speaker verification systems is important, as datasets, processing choices and algorithms can lead to performance differences that systematically favour some groups of people while disadvantaging others. Prior studies have thus measured performance differences across groups to evaluate bias. However, when comparing results across studies, it becomes apparent that they draw contradictory conclusions, hindering progress in this area. In this paper we investigate how measurement impacts the outcomes of bias evaluations. We show empirically that bias evaluations are strongly influenced by base metrics that measure performance, by the choice of ratio or difference-based bias measure, and by the aggregation of bias measures into meta-measures. Based on our findings, we recommend the use of ratio-based bias measures, in particular when the values of base metrics are small, or when base metrics with different orders of magnitude need to be compared.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
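The contrast between ratio-based and difference-based bias measures described in the abstract above can be illustrated with a small sketch. The function names and the choice of a single reference value are assumptions made for illustration; the paper's exact definitions (for example, which group or aggregate serves as the reference) may differ.

```python
def difference_bias(group_metric: float, reference_metric: float) -> float:
    """Difference-based bias: how far a group's base metric is from the reference."""
    return group_metric - reference_metric


def ratio_bias(group_metric: float, reference_metric: float) -> float:
    """Ratio-based bias: a group's base metric relative to the reference."""
    if reference_metric == 0:
        raise ValueError("reference_metric must be non-zero for a ratio measure")
    return group_metric / reference_metric


# Hypothetical error rates (not taken from the paper).
# Small base metrics: the difference looks negligible, the ratio does not.
small = (0.02, 0.01)
# Larger base metrics with the same absolute gap.
large = (0.11, 0.10)

for group, reference in (small, large):
    print(
        f"group={group:.2f} ref={reference:.2f} "
        f"difference={difference_bias(group, reference):.3f} "
        f"ratio={ratio_bias(group, reference):.2f}"
    )
```

Both pairs have the same absolute gap of 0.01, but the ratio separates a doubling of the error rate from a 10% increase, which is consistent with the abstract's recommendation for small base metrics.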
# 行列積状態の異常エネルギースペクトル Unusual energy spectra of matrix product states ( http://arxiv.org/abs/2408.13616v1 ) ライセンス: Link先を確認	J. Maxwell Silvester, Giuseppe Carleo, Steven R. White,	(参考訳) 強相関量子系の基底状態のシミュレーションにおいて、ハミルトニアンの正確な固有状態(状態のエネルギースペクトル)への近似解の分解は、シミュレーションの性能の重要な側面を決定する。例えば、想像時間進化に基づくアプローチでは、スペクトルはエネルギーと指数関数的に崩壊し、急速に収束する。ここでは、密度行列再正規化群で得られたような近似行列積状態基底状態のエネルギースペクトルを考察する。これらの状態の精度が高いにもかかわらず、スペクトルへの寄与は驚くほど高いエネルギーにほぼ一定であり、結合次元の増大により振幅が減少するが、高エネルギーの尾の程度は減少する。圧縮波動関数の一般的な特徴であると思われる異常スペクトルはサンプリング法に強い影響を与え、大きな揺らぎをもたらす。例えば、サンプリングを用いてエネルギー分散を推定すると、予想よりもはるかに低くなる。最も極端なサンプルの境界は、分散推定を非常にノイズが少なくするが、強いバイアスをもたらす。しかし, この偏り分散推定器は, 基底状態エネルギーを外挿する場合の分散に対する優れたサロゲートであり, 精度と計算コストの両面で競合する外挿法よりも優れていることがわかった。 In the simulation of ground states of strongly-correlated quantum systems, the decomposition of an approximate solution into the exact eigenstates of the Hamiltonian -- the energy spectrum of the state -- determines crucial aspects of the simulation's performance. For example, in approaches based on imaginary-time evolution, the spectrum falls off exponentially with the energy, ensuring rapid convergence. Here we consider the energy spectra of approximate matrix product state ground states, such as those obtained with the density matrix renormalization group. Despite the high accuracy of these states, contributions to the spectra are roughly constant out to surprisingly high energy, with an increase in bond dimension reducing the amplitude but not the extent of these high-energy tails. The unusual spectra, which appear to be a general feature of compressed wavefunctions, have a strong effect on sampling-based methods, yielding large fluctuations. For example, estimating the energy variance using sampling performs much more poorly than one might expect. Bounding the most extreme samples makes the variance estimate much less noisy but introduces a strong bias. 
However, we find that this biased variance estimator is an excellent surrogate for the variance when extrapolating the ground-state energy, and this approach outperforms competing extrapolation methods in both accuracy and computational cost.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# STAResNet: MaxwellのPDEを解決する時空代数のネットワーク STAResNet: a Network in Spacetime Algebra to solve Maxwell's PDEs ( http://arxiv.org/abs/2408.13619v1 ) ライセンス: Link先を確認	Alberto Pepe, Sven Buchholz, Joan Lasenby,	(参考訳) 時空代数(STA)におけるResNetアーキテクチャであるSTAResNetを導入し、マクスウェルの偏微分方程式(PDE)を解く。近年,Geometric Algebra (GA) のネットワークは,真の幾何学的機械学習の基盤として実証されている。 \cite{brandstetter2022clifford} において、GAネットワークは偏微分方程式 (PDE) を解くために初めて使われ、実数値ネットワークよりも精度が向上した。本研究では, GA と STA で同一の ResNet アーキテクチャとデータセットを用いて, Maxwell の PDE を解き, GA ネットワークの精度に適切な代数の選択が与える影響について議論する。 STAResNetの研究は、クリフォードネットワークにおける正確な幾何学的埋め込みが、地上の真理と推定された場の平均二乗誤差(MSE)を、訓練可能なパラメータの6倍少ない標準のクリフォード・レスネットよりも最大2.6倍低い値で与えることを示す。 STAResNetは、シナリオに関わらず、MSEが一貫して低く、高い相関を示す。テストされたシナリオは、データセットのサンプリング期間、目に見えるか見えない構成の障害の存在、ResNetアーキテクチャのチャネル数、ロールアウトステップの数、フィールドが2Dまたは3D空間にあるかどうかである。これはクリフォードネットワークにおける正しい代数の選択が、よりコンパクトで正確で記述的でより良い一般化パイプラインにとって重要な要素であることを示す。 We introduce STAResNet, a ResNet architecture in Spacetime Algebra (STA) to solve Maxwell's partial differential equations (PDEs). Recently, networks in Geometric Algebra (GA) have been demonstrated to be an asset for truly geometric machine learning. In \cite{brandstetter2022clifford}, GA networks have been employed for the first time to solve partial differential equations (PDEs), demonstrating an increased accuracy over real-valued networks. In this work we solve Maxwell's PDEs both in GA and STA employing the same ResNet architecture and dataset, to discuss the impact that the choice of the right algebra has on the accuracy of GA networks. Our study on STAResNet shows how the correct geometric embedding in Clifford Networks gives a mean square error (MSE), between ground truth and estimated fields, up to 2.6 times lower than that obtained with a standard Clifford ResNet with 6 times fewer trainable parameters. STAResNet demonstrates consistently lower MSE and higher correlation regardless of scenario. 
The scenarios tested are: sampling period of the dataset; presence of obstacles with either seen or unseen configurations; the number of channels in the ResNet architecture; the number of rollout steps; whether the field is in 2D or 3D space. This demonstrates how choosing the right algebra in Clifford networks is a crucial factor for more compact, accurate, descriptive and better generalising pipelines.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# 半導体電子顕微鏡解析における多面ロバストと相乗的アプローチの予備的検討:大言語および多モードモデルによる視覚変換器の統合 Preliminary Investigations of a Multi-Faceted Robust and Synergistic Approach in Semiconductor Electron Micrograph Analysis: Integrating Vision Transformers with Large Language and Multimodal Models ( http://arxiv.org/abs/2408.13621v1 ) ライセンス: Link先を確認	Sakhinana Sagar Srinivas, Geethan Sannidhi, Sreeja Gangasani, Chidaksh Ravuru, Venkataramana Runkana,	(参考訳) 半導体や量子材料などの領域では、電子マイクログラフを用いた材料の特徴付けが不可欠である。従来の分類法は、これらのマイクログラフの複雑な構造が原因である。本稿では, GPT-4(言語のみ)のような大規模言語モデル(LLM)におけるゼロショットプロンプトの生成能力, GPT-4(V)イジョンのような大規模マルチモーダルモデル(LMM)における少数ショット学習の予測能力を活用し, 画像ベースでの知識と正確なナノマテリアルカテゴリー予測のための言語知識を融合する革新的なアーキテクチャを提案する。この包括的なアプローチは、半導体製造におけるナノマテリアルの自動識別タスク、ブレンディング性能、効率、解釈可能性のための堅牢なソリューションを提供することを目的としている。本手法は従来の手法を超越し, 精密なナノ材料識別と高スループットスクリーニングを実現する。 Characterizing materials using electron micrographs is crucial in areas such as semiconductors and quantum materials. Traditional classification methods falter due to the intricate structures of these micrographs. This study introduces an innovative architecture that leverages the generative capabilities of zero-shot prompting in Large Language Models (LLMs) such as GPT-4 (language only), the predictive ability of few-shot (in-context) learning in Large Multimodal Models (LMMs) such as GPT-4(V)ision, and fuses knowledge across image-based and linguistic insights for accurate nanomaterial category prediction. This comprehensive approach aims to provide a robust solution for the automated nanomaterial identification task in semiconductor manufacturing, blending performance, efficiency, and interpretability. Our method surpasses conventional approaches, offering precise nanomaterial identification and facilitating high-throughput screening.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# エンタープライズ時空間予測アプリケーションの改善:低リソース環境でのマルチモーダル時系列分析のための言語モデルのインストラクションチューニングとデータマイニング Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings ( http://arxiv.org/abs/2408.13622v1 ) ライセンス: Link先を確認	Sagar Srinivas Sakhinana, Geethan Sannidhi, Chidaksh Ravuru, Venkataramana Runkana,	(参考訳) 時空間予測は輸送、物流、サプライチェーン管理において重要である。しかし、現在の手法は大規模で複雑なデータセットに苦しむ。本稿では,従来の予測手法の強みと,時系列トレンド解析のための小言語モデルの命令チューニングを融合した動的マルチモーダル手法を提案する。このアプローチでは、パフォーマンスとレイテンシのトレードオフのバランスを保ちながら、低リソース設定でAIソリューションをスケールアップするために、専門家(MoE)アーキテクチャとパラメータ効率のよい微調整(PEFT)手法を併用する。さらに,本手法では,従来の予測手法の制約を緩和しつつ,時間領域のモデリング手法を用いて,時系列データと時系列データ間の依存関係を効率的に扱えるように,類似の入力時系列に関する過去の経験を活用している。我々のアプローチは意思決定を改善するために予測の不確実性をモデル化する。我々のフレームワークは、推論速度とデータプライバシ/セキュリティを維持しながら、計算とメモリの要求を低減したオンプレミスのカスタマイズを可能にする。様々な実世界のデータセットに対する大規模な実験により、我々のフレームワークは堅牢で正確な予測を提供し、既存の手法よりもはるかに優れています。 Spatio-temporal forecasting is crucial in transportation, logistics, and supply chain management. However, current methods struggle with large, complex datasets. We propose a dynamic, multi-modal approach that integrates the strengths of traditional forecasting methods and instruction tuning of small language models for time series trend analysis. This approach utilizes a mixture of experts (MoE) architecture with parameter-efficient fine-tuning (PEFT) methods, tailored for consumer hardware to scale up AI solutions in low resource settings while balancing performance and latency tradeoffs. Additionally, our approach leverages related past experiences for similar input time series to efficiently handle both intra-series and inter-series dependencies of non-stationary data with a time-then-space modeling approach, using grouped-query attention, while mitigating the limitations of traditional forecasting techniques in handling distributional shifts. Our approach models predictive uncertainty to improve decision-making. 
Our framework enables on-premises customization with reduced computational and memory demands, while maintaining inference speed and data privacy/security. Extensive experiments on various real-world datasets demonstrate that our framework provides robust and accurate forecasts, significantly outperforming existing methods.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# Prompt-Softbox-Prompt:画像編集のための自由テキスト埋め込み制御 Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing ( http://arxiv.org/abs/2408.13623v1 ) ライセンス: Link先を確認	Yitong Yang, Yinglin Wang, Jing Wang, Tian Zhang,	(参考訳) テキスト駆動拡散モデルは画像編集において顕著な成功を収めてきたが、これらのモデルにおいて重要な要素であるテキスト埋め込みは十分に研究されていない。テキスト埋め込みの絡み合いと不透明さは、正確な画像編集を実現する上で重要な課題である。本稿では,安定拡散XLにおけるテキスト埋め込みの包括的かつ詳細な解析を行い,三つの重要な知見を提供する。まず、‘aug_embedding’はテキストの完全なセマンティックコンテンツをキャプチャするが、最終的な画像生成へのコントリビューションは比較的小さい。第二に 'BOS' と 'Padding_embedding' には意味情報がない。最後に、"EOS"はすべての単語の意味情報を保持し、最もスタイルのよい特徴を含んでいる。それぞれの単語の埋め込みは、互いに干渉することなく、ユニークな役割を果たす。そこで本研究では,PSP(Prompt-Softbox-Prompt)と呼ばれる自由テキスト埋め込み制御手法を用いて,制御可能な画像編集手法を提案する。 PSPは、クロスアテンション層にテキスト埋め込みを挿入または追加し、Softboxを使用してセマンティックインジェクションの特定の領域を定義し制御することで、正確な画像編集を可能にする。この技術は、画像の他の領域を保存しながら、斜めの追加と置換を可能にする。さらに、PSPは単にテキスト埋め込みを置き換えることでスタイル転送を実現することができる。広範囲な実験結果から,PSPはオブジェクト置換,オブジェクト付加,スタイル移動といったタスクにおいて重要な結果をもたらすことが示された。 Text-driven diffusion models have achieved remarkable success in image editing, but a crucial component in these models-text embeddings-has not been fully explored. The entanglement and opacity of text embeddings present significant challenges to achieving precise image editing. In this paper, we provide a comprehensive and in-depth analysis of text embeddings in Stable Diffusion XL, offering three key insights. First, while the 'aug_embedding' captures the full semantic content of the text, its contribution to the final image generation is relatively minor. Second, 'BOS' and 'Padding_embedding' do not contain any semantic information. Lastly, the 'EOS' holds the semantic information of all words and contains the most style features. Each word embedding plays a unique role without interfering with one another. Based on these insights, we propose a novel approach for controllable image editing using a free-text embedding control method called PSP (Prompt-Softbox-Prompt). 
PSP enables precise image editing by inserting or adding text embeddings within the cross-attention layers and using Softbox to define and control the specific area for semantic injection. This technique allows for object additions and replacements while preserving other areas of the image. Additionally, PSP can achieve style transfer by simply replacing text embeddings. Extensive experimental results show that PSP achieves significant results in tasks such as object replacement, object addition, and style transfer.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# 下流知識ベンチマークに不要なデータセット:応答分散はドメイン固有のQAの精度と逆相関する No Dataset Needed for Downstream Knowledge Benchmarking: Response Dispersion Inversely Correlates with Accuracy on Domain-specific QA ( http://arxiv.org/abs/2408.13624v1 ) ライセンス: Link先を確認	Robert L Simione II,	(参考訳) 本研究は、特定のトピック領域におけるLLMの知識を比較する際に、QAデータセットの作成やLLM応答のグレーディング(チャットボット)の必要性を回避することを目的としている。これは、LLMの内部動作へのアクセスを必要とせずに、完全にエンドユーザー中心の方法で行われ、同じプロンプトに対して異なる世代を生成するために、無作為なシードが与えられる。論文は、あるトピックドメインに対して、そのトピックドメインについて同じ意見質問を繰り返しLLMに尋ねることにより、LLMの「応答分散」を定義する。すなわち、応答分散は LLM の応答の埋め込み行列における分散の95%を説明するのに必要な特異値のカウントである。その結果、応答分散は関連するQA評価(平均スピアマンランク相関が-.59よりも強い)の精度と逆相関していることがわかった。ユースケース分析により、同一トピック領域上の2つの異なるLLMを比較する場合、その応答分散を比較することは、そのQAの精度を74%から89%に比較するのに適切な代替であり、QAの精度ではなく、レスポンス分散を用いて保存された労力と引き換えに、エンドユーザーに許容されるある程度の精度差耐性に依存する範囲であることが示された。 1つはOpenAIのAPIからのもので、もう1つは新しい埋め込みであり、名前付き参照文類似性埋め込みはローカルに計算でき、応答分散を計算するのにほぼ同様に機能する。また、本研究では、もともとトリビアゲーム用に開発されたIRC-Wiki Triviaデータセットと呼ばれる既存のデータセットが再利用され、キュレーションされ、IRC-WikiTriviaQAと呼ばれるキュレーションが実施されている。 This research seeks to obviate the need for creating QA datasets and grading (chatbot) LLM responses when comparing LLMs' knowledge in specific topic domains. This is done in an entirely end-user centric way without need for access to any inner workings of the LLM, so long as it can be prompted and given a random seed to create different generations to the same prompt. The paper does this by, for a given topic domain, defining the "response dispersion" of an LLM by repeatedly asking an LLM the same opinion question about that topic domain. Namely, the response dispersion is the count of singular values needed to explain 95% of the variance in the embedding matrix of the LLM's responses. It is found that the response dispersion is inversely correlated with accuracy on relevant QA evaluations (average spearman rank correlation stronger than -.59). 
A use-case analysis shows that when comparing two different LLMs on the same topic domain, comparing their response dispersion is a suitable replacement for comparing their QA accuracy between 74% and 89% of the time, the range depending on certain reasonable accuracy-difference tolerances that may be acceptable to an end-user in exchange for the labor saved by using response dispersion instead of QA accuracy for comparison. Two response embeddings are studied for creating the embedding matrix in this study: one is from OpenAI's APIs and the other is a novel embedding, here named reference sentence similarity embeddings, that can be computed locally and performs very nearly as well in calculating response dispersion. Also in this research, a pre-existing dataset called the IRC-Wiki Trivia dataset, originally developed for trivia games, has been repurposed and curated, and the curated version, called IRC-WikiTriviaQA, is made available for the purpose of this research.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
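The abstract above defines response dispersion as the number of singular values needed to explain 95% of the variance in the matrix of response embeddings. A minimal sketch of that count follows, under two assumptions that the abstract does not confirm: the embedding matrix is column-centered before the SVD, and "variance explained" is measured by the squared singular values. The function name `response_dispersion` is illustrative, not the author's code.

```python
import numpy as np


def response_dispersion(embeddings, threshold: float = 0.95) -> int:
    """Count the singular values needed to explain `threshold` of the variance.

    `embeddings` has one row per LLM response and one column per embedding dimension.
    """
    matrix = np.asarray(embeddings, dtype=float)
    # Assumption: center each column so squared singular values track variance.
    matrix = matrix - matrix.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    variance = singular_values ** 2
    total = variance.sum()
    if total == 0:
        # All responses have identical embeddings: no dispersion at all.
        return 0
    cumulative = np.cumsum(variance) / total
    # Index of the first component at which the cumulative share reaches the threshold.
    index = int(np.searchsorted(cumulative, threshold))
    return min(index + 1, len(variance))


# Toy data (not real embeddings): responses varying along one direction only,
# versus responses spread equally over four orthogonal directions.
concentrated = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
spread = np.vstack([np.eye(4), -np.eye(4)])

print(response_dispersion(concentrated))
print(response_dispersion(spread))
```

Under the abstract's finding, a model whose answers to the same opinion question vary along fewer directions (lower dispersion) would be expected to score higher on QA for that topic.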
# 医学フェデレーション学習におけるケースベース解釈の可能性 Towards Case-based Interpretability for Medical Federated Learning ( http://arxiv.org/abs/2408.13626v1 ) ライセンス: Link先を確認	Laura Latorre, Liliana Petrychenko, Regina Beets-Tan, Taisiya Kopytova, Wilson Silva,	(参考訳) 医療連携学習環境におけるケースベースの説明を生成するための深層生成モデルについて検討する。ケースベースの解釈可能性を通じてAIモデル決定を説明することは、信頼を高め、臨床実践におけるAIの広範な採用を可能にするために最重要である。しかし、医療AIトレーニングパラダイムは、データ保護規則に従うために、フェデレートされた学習設定に移行している。フェデレートされたシナリオでは、過去のデータは現在のユーザにはアクセスできない。したがって、我々は、プライバシーを保護し、意思決定を説明するための合成例を生成するために、深い生成モデルを使用する。概念実証は胸水診断に焦点をあて,Chest X-rayデータを用いた。 We explore deep generative models to generate case-based explanations in a medical federated learning setting. Explaining AI model decisions through case-based interpretability is paramount to increasing trust and allowing widespread adoption of AI in clinical practice. However, medical AI training paradigms are shifting towards federated learning settings in order to comply with data protection regulations. In a federated scenario, past data is inaccessible to the current user. Thus, we use a deep generative model to generate synthetic examples that protect privacy and explain decisions. Our proof-of-concept focuses on pleural effusion diagnosis and uses publicly available Chest X-ray data.	翻訳日:2024-08-27 18:49:22 公開日:2024-08-24
# 最近のイベントカメラのイノベーション: サーベイ Recent Event Camera Innovations: A Survey ( http://arxiv.org/abs/2408.13627v1 ) ライセンス: Link先を確認	Bharatesh Chakravarthi, Aayush Atul Verma, Kostas Daniilidis, Cornelia Fermuller,	(参考訳) 人間の視覚システムにインスパイアされたイベントベースのビジョンは、低レイテンシ、高ダイナミックレンジ、消費電力の削減といったトランスフォーメーション機能を提供する。本稿では、イベントカメラに関する総合的な調査を行い、その進化を経時的に追跡する。イベントカメラの基本原則を導入し、それらを従来のフレームカメラと比較し、その特徴と運用上の違いを強調します。この調査は、主要な製造業者による様々なイベントカメラモデル、重要な技術マイルストーン、そして影響力のある研究貢献をカバーしている。さまざまな領域にわたる多様なアプリケーション領域を探索し、研究の進展に不可欠な実世界と合成データセットについて論じている。また,テストおよび開発におけるイベントカメラシミュレータの役割についても論じる。この調査は、イベントカメラの現在の状況を強化し、この急速に発展する分野におけるさらなるイノベーションを促すことを目的としている。研究コミュニティをサポートするために、過去と将来の研究論文を分類し、貴重なリソースを統合する。 Event-based vision, inspired by the human visual system, offers transformative capabilities such as low latency, high dynamic range, and reduced power consumption. This paper presents a comprehensive survey of event cameras, tracing their evolution over time. It introduces the fundamental principles of event cameras, compares them with traditional frame cameras, and highlights their unique characteristics and operational differences. The survey covers various event camera models from leading manufacturers, key technological milestones, and influential research contributions. It explores diverse application areas across different domains and discusses essential real-world and synthetic datasets for research advancement. Additionally, the role of event camera simulators in testing and development is discussed. This survey aims to consolidate the current state of event cameras and inspire further innovation in this rapidly evolving field. To support the research community, a \href{https://github.com/chakravarthi589/Event-based-Vision_Resources}{GitHub} page categorizes past and future research articles and consolidates valuable resources.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# マルチトレーメントマーケティングキャンペーンにおける昇降モデリングの強化:スコアランキングと校正手法の活用 Enhancing Uplift Modeling in Multi-Treatment Marketing Campaigns: Leveraging Score Ranking and Calibration Techniques ( http://arxiv.org/abs/2408.13628v1 ) ライセンス: Link先を確認	Yoon Tae Park, Ting Xu, Mohamed Anany,	(参考訳) 昇降モデリングは、特定のマーケティングキャンペーンに対してポジティブに反応する可能性のある個人を選択することで、マーケティング戦略の最適化に不可欠である。この重要性は、多様な治療が利用可能であり、最も影響を与える可能性のある治療に顧客を割り当てたいという、マルチ処理マーケティングキャンペーンにおいてエスカレートします。 Causalmlのような便利なフレームワークを使ったアプローチは存在するが、マルチユースケースにおけるアップリフトモデリングの効果を高める余地はある。本稿では, マーケティングキャンペーン全体のパフォーマンス向上のために, スコアランキングとキャリブレーション技術を活用して, マルチトリートキャンペーンにおける新たなモデリング手法を提案する。本稿では,Meta Learnerフレームワーク(S,T,X)を含む既存のアップリフトモデルとその実環境シナリオにおけるアプリケーションについてレビューする。さらに、多処理研究からの洞察を掘り下げて、この分野の複雑さと潜在的な進歩を強調します。提案手法はメタラーナー校正と評価ランクに基づくオファー選択戦略を取り入れたものである。実世界のデータセットによる大規模な実験の結果は、我々のアプローチの実用的メリットと優れた性能を示している。本研究は, マーケティング分析における予測モデリングを推進し, キャンペーン戦略の最適化を目指す実践者に対して, スコアランキングとキャリブレーション技術を統合する上で重要な役割を担っている。 Uplift modeling is essential for optimizing marketing strategies by selecting individuals likely to respond positively to specific marketing campaigns. This importance escalates in multi-treatment marketing campaigns, where diverse treatment is available and we may want to assign the customers to treatment that can make the most impact. While there are existing approaches with convenient frameworks like Causalml, there are potential spaces to enhance the effect of uplift modeling in multi treatment cases. This paper introduces a novel approach to uplift modeling in multi-treatment campaigns, leveraging score ranking and calibration techniques to improve overall performance of the marketing campaign. We review existing uplift models, including Meta Learner frameworks (S, T, X), and their application in real-world scenarios. Additionally, we delve into insights from multi-treatment studies to highlight the complexities and potential advancements in the field. Our methodology incorporates Meta-Learner calibration and a scoring rank-based offer selection strategy. 
Extensive experimental results with real-world datasets demonstrate the practical benefits and superior performance of our approach. The findings underscore the critical role of integrating score ranking and calibration techniques in refining the performance and reliability of uplift predictions, thereby advancing predictive modeling in marketing analytics and providing actionable insights for practitioners seeking to optimize their campaign strategies.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 鳥の時間的一貫した3次元再構成 Temporally-consistent 3D Reconstruction of Birds ( http://arxiv.org/abs/2408.13629v1 ) ライセンス: Link先を確認	Johannes Hägerlind, Jonas Hentati-Sundberg, Bastian Wandt,	(参考訳) 本稿では,最近環境科学者が環境変化に有用なバイオインジケーターとして注目するようになった海鳥の3次元再構築について述べる。このような3D情報は、例えば、動き、形状、外観の変化を追跡することによって、鳥の行動と生理的形状を分析するのに有用である。コンピュータの視界から見ると、鳥は高速でしばしば非剛体的な動きのために特に困難である。本研究では,特定の種類の海鳥の単眼映像から3Dのポーズと形状を復元する手法を提案する。提案手法は, 検出, 追跡, セグメンテーション, 時間的に一貫した3次元再構成からなる。さらに,現像型3次元鳥ポーズ推定器を時間領域に拡張する時間的損失を提案する。さらに, 鳥に固有のキーポイントラベルを付けた小さなテストセットを含む, さまざまな動きや相互作用を含む, 9羽の鳥を同時に捕獲した映像を, 10000 フレームのリアルタイムデータセットとして提供した。時間的最適化を用いて、データセットの挑戦的なシーケンスに対して最先端のパフォーマンスを実現する。 This paper deals with 3D reconstruction of seabirds, which recently came into the focus of environmental scientists as valuable bio-indicators for environmental change. Such 3D information is beneficial for analyzing the bird's behavior and physiological shape, for example by tracking motion, shape, and appearance changes. From a computer vision perspective birds are especially challenging due to their rapid and oftentimes non-rigid motions. We propose an approach to reconstruct the 3D pose and shape from monocular videos of a specific breed of seabird - the common murre. Our approach comprises a full pipeline of detection, tracking, segmentation, and temporally consistent 3D reconstruction. Additionally, we propose a temporal loss that extends current single-image 3D bird pose estimators to the temporal domain. Moreover, we provide a real-world dataset of 10000 frames of video observations that on average capture nine birds simultaneously, comprising a large variety of motions and interactions, including a smaller test set with bird-specific keypoint labels. Using our temporal optimization, we achieve state-of-the-art performance for the challenging sequences in our dataset.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# DeepVoting: テーラー埋め込みによる投票ルールの学習 DeepVoting: Learning Voting Rules with Tailored Embeddings ( http://arxiv.org/abs/2408.13630v1 ) ライセンス: Link先を確認	Leonardo Matone, Ben Abramowitz, Nicholas Mattei, Avinash Balakrishnan,	(参考訳) 複数のエージェントの選好を集団決定に集約することは、情報検索、強化学習、レコメンデーターシステムを含むコンピュータ科学の領域における多くの重要な問題において共通のステップである。社会選択論(Social Choice Theory)が示すように、特定の性質(公理)を持つルールを集約するアルゴリズムを設計する問題は困難であり、場合によっては証明不可能である。手動でアルゴリズムを設計する代わりに、データから集約ルール、特に投票ルールを学ぶことができる。しかし、この分野における以前の研究は、非常に大きなモデルが必要であったり、好みの表現、すなわち埋め込みの選択によって制限されたりしてきた。我々は,投票規則を設計する際の問題を,一組の候補に対して分布を出力する投票規則の確率的バージョンを学習する問題に再考した。具体的には、ニューラルネットワークを用いて、文献から確率論的社会的選択関数を学習する。社会的選択文学から派生した選好プロファイルの埋め込みにより、既存の投票規則をより効率的に学習し、学習目的に合わせて組み込んだ場合、他の作業よりも多くの有権者にスケールできることを示す。さらに, 埋め込みを用いて学習したルールを微調整して, 公理特性を改良した新しい投票ルールを作成できることを示す。すなわち,従来の投票ルールでは,No Show Paradoxの確率的バージョンに対処するために,小さな修正しか必要としないことを示す。 Aggregating the preferences of multiple agents into a collective decision is a common step in many important problems across areas of computer science including information retrieval, reinforcement learning, and recommender systems. As Social Choice Theory has shown, the problem of designing algorithms for aggregation rules with specific properties (axioms) can be difficult, or provably impossible in some cases. Instead of designing algorithms by hand, one can learn aggregation rules, particularly voting rules, from data. However, the prior work in this area has required extremely large models, or been limited by the choice of preference representation, i.e., embedding. We recast the problem of designing a good voting rule into one of learning probabilistic versions of voting rules that output distributions over a set of candidates. Specifically, we use neural networks to learn probabilistic social choice functions from the literature. 
We show that embeddings of preference profiles derived from the social choice literature allow us to learn existing voting rules more efficiently and scale to larger populations of voters more easily than other work if the embedding is tailored to the learning objective. Moreover, we show that rules learned using embeddings can be tweaked to create novel voting rules with improved axiomatic properties. Namely, we show that existing voting rules require only minor modification to combat a probabilistic version of the No Show Paradox.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 古代のデジタル化:KHAMISデータセット作成による東シリア文字の文字認識 Ancient but Digitized: Developing Handwritten Optical Character Recognition for East Syriac Script Through Creating KHAMIS Dataset ( http://arxiv.org/abs/2408.13631v1 ) ライセンス: Link先を確認	Ameer Majeed, Hossein Hassani,	(参考訳) 多くの言語には、民俗物語や歴史物語、現代の文書や手紙など、膨大な量の手書きのテキストがある。これらのテキストのデジタル化は、日々のタスク、文化研究、歴史研究など、様々な応用がある。シリア語は古代の、絶滅危惧種で、低資源の言語であり、それに必要な注意を引いていない。本稿では,この絶滅危惧言語のためのより多くのデジタルサービスを構築するための出発点として,手書きシリア語テキストに基づく光学文字認識(OCR)モデルの開発を目的とした研究プロジェクトについて報告する。データセットKHAMIS(東シリアの詩人であるKhamis bar Qardaheに触発された)は、東シリアの文字で書かれた文章からなる。我々は手書きデータに基づいてテッセラクト-OCRエンジンの事前訓練されたシリアクモデルを微調整した。データは、KHAMISを作成するために言語で読み書きできるボランティアから収集された。 KHAMISは現在、31人の大学生と1人の教授から集められた624件のシリア人による手書きの文章で構成されている。その結果、手書きのOCRモデルは、トレーニングセットと評価セットの両方で、それぞれ1.097-1.610%と8.963-10.490%の文字エラー率を達成でき、テストセットでの評価では18.89-19.71%と単語エラー率62.83-65.42%の文字エラー率を達成できた。 Many languages have vast amounts of handwritten texts, such as ancient scripts about folktale stories and historical narratives or contemporary documents and letters. Digitization of those texts has various applications, such as daily tasks, cultural studies, and historical research. Syriac is an ancient, endangered, and low-resourced language that has not received the attention it requires and deserves. This paper reports on a research project aimed at developing an optical character recognition (OCR) model based on handwritten Syriac texts as a starting point to build more digital services for this endangered language. A dataset was created, KHAMIS (inspired by the East Syriac poet, Khamis bar Qardahe), which consists of handwritten sentences in the East Syriac script. We used it to fine-tune the Tesseract-OCR engine's pretrained Syriac model on handwritten data. The data was collected from volunteers capable of reading and writing in the language to create KHAMIS. 
KHAMIS currently consists of 624 handwritten Syriac sentences collected from 31 university students and one professor; it will be partially available online, with the whole dataset to be made available in the near future for development and research purposes. As a result, the handwritten OCR model achieved a character error rate of 1.097-1.610% on the training set and 8.963-10.490% on the evaluation set, and a character error rate of 18.89-19.71% and a word error rate of 62.83-65.42% on the test set, which is twice as good as the default Syriac model of Tesseract.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
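The character error rate (CER) and word error rate (WER) reported above are conventionally computed as an edit distance divided by the reference length. The sketch below shows that common definition; it is an assumption that the paper's evaluation follows exactly this form, since the abstract does not describe the scoring tool, and the function names are illustrative.

```python
def edit_distance(reference, hypothesis) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            current.append(
                min(
                    previous[j] + 1,        # deletion
                    current[j - 1] + 1,     # insertion
                    previous[j - 1] + (ref_item != hyp_item),  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level edits divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edits divided by the number of reference words."""
    reference_words = reference.split()
    if not reference_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(reference_words, hypothesis.split()) / len(reference_words)


# Hypothetical OCR output with one wrong character in one of four words.
truth = "abcd efgh ijkl mnop"
ocr = "abcd efgx ijkl mnop"
print(f"CER = {character_error_rate(truth, ocr):.3f}")
print(f"WER = {word_error_rate(truth, ocr):.3f}")
```

A single wrong character makes a whole word count as an error, which is one reason a WER can be several times larger than the CER on the same test set.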
# FungiTastic: 画像分類のためのマルチモーダルデータセットとベンチマーク FungiTastic: A multi-modal dataset and benchmark for image categorization ( http://arxiv.org/abs/2408.13632v1 ) ライセンス: Link先を確認	Lukas Picek, Klara Janouskova, Milan Sulc, Jiri Matas,	(参考訳) 我々は、20年間にわたって継続的に収集されたデータに基づいて、新しい非常に挑戦的なベンチマークとデータセット、FungiTasticを導入しました。データセットは専門家によってラベル付けされ、キュレートされた菌類記録に由来する。約350kのマルチモーダル観測で、5kの細粒度のカテゴリから650万枚以上の写真と様々な関連情報、例えば、取得メタデータ、衛星画像、身体部分のセグメンテーションを含む。 FungiTasticは、前例のないラベルの信頼性という、部分的にDNA配列の真実を持つテストセットを含む唯一のベンチマークである。ベンチマークはサポートするように設計されています (i)標準クローズセット分類 (ii)オープンセット分類 (三)マルチモーダル分類 (4)少人数の学習。 (v)ドメインシフトなど。ほとんどすべてのユースケースに適したベースラインメソッドを提供します。我々はHuggingFace上で多数の事前トレーニング済みモデルを提供し、モデルトレーニングのためのフレームワークを提供します。データセットの機能とベースラインを記述する包括的なドキュメントは、https://bohemianvra.github.io/FungiTastic/とhttps://www.kaggle.com/datasets/picekl/fungitasticで公開されている。 We introduce a new, highly challenging benchmark and a dataset -- FungiTastic -- based on data continuously collected over a twenty-year span. The dataset originates in fungal records labeled and curated by experts. It consists of about 350k multi-modal observations that include more than 650k photographs from 5k fine-grained categories and diverse accompanying information, e.g., acquisition metadata, satellite images, and body part segmentation. FungiTastic is the only benchmark that includes a test set with partially DNA-sequenced ground truth of unprecedented label reliability. The benchmark is designed to support (i) standard close-set classification, (ii) open-set classification, (iii) multi-modal classification, (iv) few-shot learning, (v) domain shift, and many more. We provide baseline methods tailored for almost all the use-cases. We provide a multitude of ready-to-use pre-trained models on HuggingFace and a framework for model training. A comprehensive documentation describing the dataset features and the baselines are available at https://bohemianvra.github.io/FungiTastic/ and https://www.kaggle.com/datasets/picekl/fungitastic.	
翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 安全な統合センシング・通信のための送信者行動 Transmitter Actions for Secure Integrated Sensing and Communication ( http://arxiv.org/abs/2408.13635v1 ) ライセンス: Link先を確認	Truman Welling, Onur Günlü, Aylin Yener,	(参考訳) この研究は、セキュアな統合センシング通信(ISAC)システムを、行動依存のチャネル状態と、反射によって得られるチャネル出力フィードバックを備えたワイヤタップチャネルとしてモデル化する。送信されたメッセージは、共通のメッセージとセキュアなメッセージに分割され、どちらも正当な受信側で確実に回収されなければならないが、セキュアなメッセージは盗聴者から秘密にしておく必要がある。ビームフォーミングベクトル設計のような送信機動作は、各チャネル使用時の対応する状態に影響を与える。アクションシーケンスは送信メッセージとチャネル出力の両方のフィードバックに依存するようにモデル化される。完全チャネル出力フィードバックのために、シークレット歪み領域は、送信機動作を伴う物理的に劣化し、逆物理的に劣化したセキュアISACチャネルに対して提供される。メッセージ全体が秘密にされなければならない対応するレート領域も提供される。結果は二項例の秘密歪曲領域を特徴付けることによって説明される。 This work models a secure integrated sensing and communication (ISAC) system as a wiretap channel with action-dependent channel states and channel output feedback, e.g., obtained through reflections. The transmitted message is split into a common and a secure message, both of which must be reliably recovered at the legitimate receiver, while the secure message needs to be kept secret from the eavesdropper. The transmitter actions, such as beamforming vector design, affect the corresponding state at each channel use. The action sequence is modeled to depend on both the transmitted message and channel output feedback. For perfect channel output feedback, the secrecy-distortion regions are provided for physically-degraded and reversely-physically-degraded secure ISAC channels with transmitter actions. The corresponding rate regions when the entire message should be kept secret are also provided. The results are illustrated through characterizing the secrecy-distortion region of a binary example.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 時間的選挙: 厚生、耐戦略性、比例性 Temporal Elections: Welfare, Strategyproofness, and Proportionality ( http://arxiv.org/abs/2408.13637v1 ) ライセンス: Link先を確認	Edith Elkind, Tzeh Yuan Neoh, Nicholas Teh,	(参考訳) 各ラウンドで1つの選択肢が選ばれる逐次的意思決定のモデルを検討する。功利主義的厚生(Util)と平等主義的厚生(Egal)という2つの目的に着目し、対応する最大化問題の計算複雑性と、耐戦略性および比例性との両立可能性を考察する。Utilの最大化は容易だが、Egalに対応する決定問題は制限された場合でもNP完全であることが分かる。Egalに関するこの困難性の結果を、パラメータ化複雑性の解析と近似アルゴリズムで補完する。さらに、Utilの結果を出力するメカニズムは耐戦略的である一方、Egalの結果を計算するすべての決定的メカニズムは、耐戦略性の非常に弱い変種である非自明な操作可能性(non-obvious manipulability, NOM)を満たさないことを示す。ただし、各時点でエージェントが空でない承認集合を持つ場合、タイを辞書式順序で解消しつつEgalを最大化する結果を選べばNOMを満たすことを示す。比例性については、比例的(PROP)な結果は効率的に計算できるが、PROPを保証しながらUtilを最大化する結果を見つけることはNP困難であることを証明する。また、UtilとEgalに関する比例性の代償(price of proportionality)の上界と下界を導出する。 We investigate a model of sequential decision-making where a single alternative is chosen at each round. We focus on two objectives, utilitarian welfare (Util) and egalitarian welfare (Egal), and consider the computational complexity of the associated maximization problems, as well as their compatibility with strategyproofness and proportionality. We observe that maximizing Util is easy, but the corresponding decision problem for Egal is NP-complete even in restricted cases. We complement this hardness result for Egal with parameterized complexity analysis and an approximation algorithm. Additionally, we show that, while a mechanism that outputs a Util outcome is strategyproof, all deterministic mechanisms for computing Egal outcomes fail a very weak variant of strategyproofness, called non-obvious manipulability (NOM). However, we show that when agents have non-empty approval sets at each timestep, choosing an Egal-maximizing outcome while breaking ties lexicographically satisfies NOM. Regarding proportionality, we prove that a proportional (PROP) outcome can be computed efficiently, but finding an outcome that maximizes Util while guaranteeing PROP is NP-hard. We also derive upper and lower bounds on the price of proportionality with respect to Util and Egal.	
翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 医用画像セグメンテーションのためのサイズ認識型クロスシェイプスクリブル監視 Size Aware Cross-shape Scribble Supervision for Medical Image Segmentation ( http://arxiv.org/abs/2408.13639v1 ) ライセンス: Link先を確認	Jing Yuan, Tania Stathaki,	(参考訳) 弱教師あり学習の一般的な形式であるスクリブル監視は、手描きの曲線で画素に注釈を付けるもので、手動ラベリングのコストを低減する。この技術は、医用画像セグメンテーションにおいてネットワーク学習を高速化するために広く用いられている。しかし、スクリブル監視には、サンプル間のアノテーションの一貫性や、包括的な正解情報の入手可能性に限界がある。さらに、特に医用画像では、さまざまなスケールの対象に対応することが難しい。本稿では、これらの課題を克服するために3つの新しい手法、すなわち 1) クロスシェイプ(十字形)スクリブルアノテーション法、2) 十字形に基づく擬似マスク法、3) サイズ認識型マルチブランチ法を提案する。パラメータと構造の設計を詳細に検討した。実験の結果、提案手法は複数のポリープデータセットでmDiceスコアを大幅に改善することが示された。特に、これらの手法を組み合わせると、医用画像セグメンテーション向けに設計された最先端のスクリブル監視手法を上回る性能が得られる。 Scribble supervision, a common form of weakly supervised learning, involves annotating pixels using hand-drawn curve lines, which helps reduce the cost of manual labelling. This technique has been widely used in medical image segmentation tasks to fasten network training. However, scribble supervision has limitations in terms of annotation consistency across samples and the availability of comprehensive groundtruth information. Additionally, it often grapples with the challenge of accommodating varying scale targets, particularly in the context of medical images. In this paper, we propose three novel methods to overcome these challenges, namely, 1) the cross-shape scribble annotation method; 2) the pseudo mask method based on cross shapes; and 3) the size-aware multi-branch method. The parameter and structure design are investigated in depth. Experimental results show that the proposed methods have achieved significant improvement in mDice scores across multiple polyp datasets. Notably, the combination of these methods outperforms the performance of state-of-the-art scribble supervision methods designed for medical image segmentation.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 完全受動的状態と受動的状態を自由状態とする資源理論 Resource theories with completely passive states and passive states as free states ( http://arxiv.org/abs/2408.13641v1 ) ライセンス: Link先を確認	Gianluca Francica,	(参考訳) 量子システムから抽出可能な作業は、いくつかの資源理論に関連付けられるリソースである。完全受動的状態と受動的状態を自由状態として考えることにより、最大作業が単調な資源理論を定式化し、温度の定義が資源理論においてどのように重要な役割を果たすかを示す。 Work extractable from quantum system is a resource that can be related to some resource theory. By considering completely passive states and passive states as free states, we formulate resource theories where the maximum work extractable is a monotone, showing how the definition of a temperature plays a pivotal role in the resource theories.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 階層変換器を用いた半監督映像における時間分割・コンカレント異常動作の局在化 Temporal Divide-and-Conquer Anomaly Actions Localization in Semi-Supervised Videos with Hierarchical Transformer ( http://arxiv.org/abs/2408.13643v1 ) ライセンス: Link先を確認	Nada Osman, Marwan Torki,	(参考訳) 異常な行動の検出と位置決めは、セキュリティと高度な監視システムにおいて重要な役割を果たす。しかし、膨大な数の監視ビデオにより、タスクの利用可能なデータのほとんどは、既知のビデオクラスとラベル付けまたは半ラベル付けされているが、異常な出来事の所在は不明である。本研究では,半教師付きビデオにおける異常なローカライゼーションを対象とする。この課題に対処する主な方向は,セグメントレベルのマルチインスタンス学習と擬似ラベルの生成に焦点をあてる一方で,ビデオ内の時間的関係を学習して,異常な事象を特定することで解決する,有望かつ未完成な方向性を探究することを目的とする。そこで本稿では,時間軸に沿った分割・縮小戦略を用いて,異常映像における観察行動の意義を評価するための階層型トランスフォーマーモデルを提案する。本手法は, 親映像を階層的に複数の時間的児童事例に区分し, 親映像の異常の分類における子ノードの影響を計測する。 UCF-crimeとShanghaiTechという2つのよく知られた異常検出データセット上でモデルを評価した結果、ビデオ内で観察された動作を解釈し、異常検出をローカライズする能力が証明された。提案手法は,より最近の擬似ラベル方式のアプローチと比較して,有望な性能を保ちつつ,セグメントレベルのマルチインスタンス学習アプローチに依存した先行研究よりも優れていた。 Anomaly action detection and localization play an essential role in security and advanced surveillance systems. However, due to the tremendous amount of surveillance videos, most of the available data for the task is unlabeled or semi-labeled with the video class known, but the location of the anomaly event is unknown. In this work, we target anomaly localization in semi-supervised videos. While the mainstream direction in addressing this task is focused on segment-level multi-instance learning and the generation of pseudo labels, we aim to explore a promising yet unfulfilled direction to solve the problem by learning the temporal relations within videos in order to locate anomaly events. To this end, we propose a hierarchical transformer model designed to evaluate the significance of observed actions in anomalous videos with a divide-and-conquer strategy along the temporal axis. Our approach segments a parent video hierarchically into multiple temporal children instances and measures the influence of the children nodes in classifying the abnormality of the parent video. 
Evaluating our model on two well-known anomaly detection datasets, UCF-crime and ShanghaiTech, proves its ability to interpret the observed actions within videos and localize the anomalous ones. Our proposed approach outperforms previous works relying on segment-level multiple-instance learning approaches while reaching a promising performance compared to the more recent pseudo-labeling-based approaches.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 環境音分類のための事前学習モデルにおけるオーディオフィルタの効果の検討 Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification ( http://arxiv.org/abs/2408.13644v1 ) ライセンス: Link先を確認	Aditya Dawn, Wazib Ansar,	(参考訳) 環境音の分類は音声認識の重要な問題であり、時間や周波数に関して環境音が十分に構造化されていないため、音声認識よりも複雑である。研究者たちは、様々なCNNモデルを使用して、ログメルスペクトル、ガンマトンスペクトル係数、メル周波数スペクトル係数などの様々なオーディオ特徴から、過去数年間にわたってオーディオファイルから生成された音声特徴を学習してきた。本稿では,2レベル分類手法を提案する。レベル1分類器は音声信号をより広いクラスに分類し,レベル2分類器はレベル1分類器の出力に基づいて,音声が属する実際のクラスを見つける責任を負う。また,本論文では,Audio Cropの新たな手法を導入し,ほとんどの症例で最高のアキュラシーを呈するオーディオフィルタの効果を示した。実験にはESC-50データセットを使用し、レベル1分類の場合は78.75%、レベル2分類では98.04%の最大精度を得た。 Environmental Sound Classification is an important problem of sound recognition and is more complicated than speech recognition problems as environmental sounds are not well structured with respect to time and frequency. Researchers have used various CNN models to learn audio features from different audio features like log mel spectrograms, gammatone spectral coefficients, mel-frequency spectral coefficients, generated from the audio files, over the past years. In this paper, we propose a new methodology : Two-Level Classification; the Level 1 Classifier will be responsible to classify the audio signal into a broader class and the Level 2 Classifiers will be responsible to find the actual class to which the audio belongs, based on the output of the Level 1 Classifier. We have also shown the effects of different audio filters, among which a new method of Audio Crop is introduced in this paper, which gave the highest accuracies in most of the cases. We have used the ESC-50 dataset for our experiment and obtained a maximum accuracy of 78.75% in case of Level 1 Classification and 98.04% in case of Level 2 Classifications.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 歩行者検出のための平均高さ支援後処理 Mean Height Aided Post-Processing for Pedestrian Detection ( http://arxiv.org/abs/2408.13646v1 ) ライセンス: Link先を確認	Jing Yuan, Tania Stathaki, Guangyu Ren,	(参考訳) 歩行者検出器の設計は、このタスク固有の特徴をほとんど考慮せず、通常は一般物体検出の共通戦略に従っている。これらの特徴の可能性を探るため、歩行者データセットにおける遠近効果を例にとり、後処理のための平均高さ支援抑制を提案する。本手法は、歩行者が含まれる可能性の低い位置(レベル)にある予測や、平均と比べて異常な高さを持つ予測を棄却する。これを実現するために、存在スコア生成器と平均高さ生成器を提案する。さまざまなデータセットと検出器で包括的な実験を行い、ハイパーパラメータの選択について詳細に議論する。提案手法は実装が容易で、プラグアンドプレイで利用できる。結果は、提案手法を既存のさまざまな歩行者検出器とデータセットに適用すると、検出精度が大幅に向上することを示している。平均高さ支援抑制を特定の検出器と組み合わせると、CaltechおよびCitypersonsデータセットで最先端の歩行者検出器を上回る。 The design of pedestrian detectors seldom considers the unique characteristics of this task and usually follows the common strategies for general object detection. To explore the potential of these characteristics, we take the perspective effect in pedestrian datasets as an example and propose the mean height aided suppression for post-processing. This method rejects predictions that fall at levels with a low possibility of containing any pedestrians or that have an abnormal height compared to the average. To achieve this, the existence score and mean height generators are proposed. Comprehensive experiments on various datasets and detectors are performed; the choice of hyper-parameters is discussed in depth. The proposed method is easy to implement and is plug-and-play. Results show that the proposed methods significantly improve detection accuracy when applied to different existing pedestrian detectors and datasets. The combination of mean height aided suppression with particular detectors outperforms state-of-the-art pedestrian detectors on Caltech and Citypersons datasets.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 特徴シフトが性能に与える影響を理解するための説明モデルモニタリング Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance ( http://arxiv.org/abs/2408.13648v1 ) ライセンス: Link先を確認	Thomas Decker, Alexander Koebler, Michael Lebacher, Ingo Thon, Volker Tresp, Florian Buettner,	(参考訳) 機械学習モデルのモニタリングとメンテナンスは、この分野における最近の進歩を現実世界のアプリケーションに翻訳する上で、最も重要な課題である。しかし、現在のモニタリング手法には、特定のモデルの性能が実際に低下した理由に関する質問に答える実用的な洞察を提供する能力がない。本研究では,入力特性の解釈に推定された性能変化を寄与させることにより,特徴変化下でのブラックボックスモデルの振る舞いを説明する新しい手法を提案する。提案手法は,最適輸送と共有値の概念を,説明的性能推定(Explanatory Performance Estimation, XPE)として組み合わせたものである。基礎となる仮定を解析し、画像、オーディオ、表データなどの様々なデータモダリティにまたがる異なるデータセットに対するいくつかのベースラインに対するアプローチの優位性を実証する。また, モデル劣化の潜在的な根本原因を明らかにし, 行動可能な対策を導くことによって, 予測モデルモニタリングを可能にする。 Monitoring and maintaining machine learning models are among the most critical challenges in translating recent advances in the field into real-world applications. However, current monitoring methods lack the capability of provide actionable insights answering the question of why the performance of a particular model really degraded. In this work, we propose a novel approach to explain the behavior of a black-box model under feature shifts by attributing an estimated performance change to interpretable input characteristics. We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation (XPE). We analyze the underlying assumptions and demonstrate the superiority of our approach over several baselines on different data sets across various data modalities such as images, audio, and tabular data. We also indicate how the generated results can lead to valuable insights, enabling explanatory model monitoring by revealing potential root causes for model deterioration and guiding toward actionable countermeasures.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# ポアソン周辺分布を持つ木構造マルコフ確率場 Tree-structured Markov random fields with Poisson marginal distributions ( http://arxiv.org/abs/2408.13649v1 ) ライセンス: Link先を確認	Benjamin Côté, Hélène Cossette, Etienne Marceau,	(参考訳) 離散計数確率変数のベクトルに対する、木構造マルコフ確率場の新たな族を導入する。この族の特徴として、マルコフ確率場の周辺分布はすべて同じ平均を持つポアソン分布であり、組み込まれた依存関係の強さや構造の影響を受けない。この重要な性質はマルコフ確率場では珍しく、応用上きわめて便利である。この新しい族に固有の性質により、簡単なサンプリング手順と、計数確率変数ベクトルの同時確率質量関数および同時確率母関数の解析的表現が得られ、高次元のベクトルにも良好にスケールする計算手法が可能になる。本研究では、提案する族のマルコフ確率場を構成する確率変数の和の分布を調べ、期待配分(expected allocations)を通じて個々の確率変数がその和にどう寄与するかを解析し、確率順序を確立してその挙動を幅広く理解する。 A new family of tree-structured Markov random fields for a vector of discrete counting random variables is introduced. According to the characteristics of the family, the marginal distributions of the Markov random fields are all Poisson with the same mean, and are untied from the strength or structure of their built-in dependence. This key feature is uncommon for Markov random fields and most convenient for applications purposes. The specific properties of this new family confer a straightforward sampling procedure and analytic expressions for the joint probability mass function and the joint probability generating function of the vector of counting random variables, thus granting computational methods that scale well to vectors of high dimension. We study the distribution of the sum of random variables constituting a Markov random field from the proposed family, analyze a random variable's individual contribution to that sum through expected allocations, and establish stochastic orderings to assess a wide understanding of their behavior.	翻訳日:2024-08-27 18:39:37 公開日:2024-08-24
# 対立するナラティブ: 多言語偽情報キャンペーンにおけるニュースフレーミングの計算分析 Narratives at Conflict: Computational Analysis of News Framing in Multilingual Disinformation Campaigns ( http://arxiv.org/abs/2408.13651v1 ) ライセンス: Link先を確認	Antonina Sinelnik, Dirk Hovy,	(参考訳) あらゆる報道は、物語の特定の側面を強調または除外することで、特定の解釈に有利になるように問題をフレーミングする。偽情報においてフレーミングは広く使われているにもかかわらず、フレーミングの特性と検出手法は、英語圏以外ではほとんど研究されていない。我々は、同じ問題に対する多言語のフレーミングが体系的にどう異なるかを調べる。ロシアが支援する8年間の偽情報キャンペーンを対象に、15か国を標的とする4言語、約8千件のニュース記事を用いる。偽情報キャンペーンは、読者の対象言語に応じて、一貫して意図的に特定のフレーミングを用いていることが分かった。さらに、ロシア語の記事が、報道対象の地域に応じて特定のフレームを一貫して強調していることも明らかになった。自動フレーム分析の代表的な2つのモデルは性能が低く、互いの不一致も大きいことが分かり、さらなる研究の必要性が浮き彫りになった。 Any report frames issues to favor a particular interpretation by highlighting or excluding certain aspects of a story. Despite the widespread use of framing in disinformation, framing properties and detection methods remain underexplored outside the English-speaking world. We explore how multilingual framing of the same issue differs systematically. We use eight years of Russia-backed disinformation campaigns, spanning 8k news articles in 4 languages targeting 15 countries. We find that disinformation campaigns consistently and intentionally favor specific framing, depending on the target language of the audience. We further discover how Russian-language articles consistently highlight selected frames depending on the region of the media coverage. We find that the two most prominent models for automatic frame analysis underperform and show high disagreement, highlighting the need for further research.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# LiDARを用いた3次元障害物検出のロバスト性評価と自律運転システムへの影響 Evaluating the Robustness of LiDAR-based 3D Obstacles Detection and Its Impacts on Autonomous Driving Systems ( http://arxiv.org/abs/2408.13653v1 ) ライセンス: Link先を確認	Tri Minh Triet Pham, Bo Yang, Jinqiu Yang,	(参考訳) 自律運転システム(ADS)は、ディープニューラルネットワークを使用して時間に敏感な決定を行うために、複数のセンサーからのリアルタイム入力を必要とする。これにより、これらの決定の正しさは、エラーが重大な損失を引き起こす可能性があるため、ADSの採用に不可欠である。 LiDARのようなセンサーは環境変化や不正確な内蔵に敏感であり、フレーム間で変動する可能性がある。 ADSのテストには広範な作業があったが、現在のADSがLiDARポイントクラウドデータの非常に微妙な変更に対して堅牢であるかどうかは不明だ。本研究では,LiDARセンサの組込み不正確さがLiDAR-3D障害物検出モデルに与える影響について検討し,障害物検出(ロバスト性)や拡張軌道予測(障害物検出の頑健さがADSに与える影響)について考察する。我々は、LiDARデータに微妙な摂動を適用し、LiDAR-3D障害物検出の堅牢性を評価し、軌道予測モジュールとADSへの影響を評価するフレームワークSORBETを提案する。業界グレードのLevel 4 ADS(BaiduのApollo)を含む5種類のLiDAR-3D障害物検出モデルのロバスト性を評価するためにSORBETを適用した。さらに,障害物検出結果の変化が軌道予測に悪影響を及ぼすかを検討した。本評価では,LiDAR-3D障害物検出モデルの微妙な摂動に対する堅牢性テストの重要性を強調した。点雲データの微妙な変化(すなわち2点の除去)でさえ、検出性能の非自明な低下をもたらす可能性がある。さらに、このような負の影響は、他のモジュールにさらに伝播し、ADSの安全性を脅かす。 Autonomous driving systems (ADSs) require real-time input from multiple sensors to make time-sensitive decisions using deep neural networks. This makes the correctness of these decisions crucial to ADSs' adoption as errors can cause significant loss. Sensors such as LiDAR are sensitive to environmental changes and built-in inaccuracies and may fluctuate between frames. While there has been extensive work to test ADSs, it remains unclear whether current ADSs are robust against very subtle changes in LiDAR point cloud data. In this work, we study the impact of the built-in inaccuracies in LiDAR sensors on LiDAR-3D obstacle detection models to provide insight into how they can impact obstacle detection (i.e., robustness) and by extension trajectory prediction (i.e., how the robustness of obstacle detection would impact ADSs). We propose a framework SORBET, that applies subtle perturbations to LiDAR data, evaluates the robustness of LiDAR-3D obstacle detection, and assesses the impacts on the trajectory prediction module and ADSs. 
We applied SORBET to evaluate the robustness of five classic LiDAR-3D obstacle detection models, including one from an industry-grade Level 4 ADS (Baidu's Apollo). Furthermore, we studied how changes in the obstacle detection results would negatively impact trajectory prediction in a cascading fashion. Our evaluation highlights the importance of testing the robustness of LiDAR-3D obstacle detection models against subtle perturbations. We find that even very subtle changes in point cloud data (i.e., removing two points) may introduce a non-trivial decrease in the detection performance. Furthermore, such a negative impact will further propagate to other modules, and endanger the safety of ADSs.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# 記号型ワーキングメモリは複雑なルール適用のための言語モデルを強化する Symbolic Working Memory Enhances Language Models for Complex Rule Application ( http://arxiv.org/abs/2408.13654v1 ) ライセンス: Link先を確認	Siyuan Wang, Zhongyu Wei, Yejin Choi, Xiang Ren,	(参考訳) 大規模言語モデル(LLM)は顕著な推論性能を示しているが、一連のルール適用ステップを含む多段階の演繹的推論、特にルールが順序どおりに提示されない場合に苦戦する。予備分析の結果、LLMは単一ステップのルール適用には優れているが、ルールグラウンディングの難しさにより、多ステップのシナリオでは性能が著しく低下することが分かった。ルールグラウンディングでは、複数の入力ルール、事実、推論された事実の中から、各ステップで適用可能なルールとそれを支持する事実を特定する必要がある。そこで本研究では、LLMを外部ワーキングメモリで拡張することを提案し、ルール適用のためのニューロシンボリックフレームワークを導入する。メモリは、事実とルールを自然言語と記号形式の両方で格納し、正確な追跡を可能にする。このメモリを利用して、我々のフレームワークは記号的ルールグラウンディングとLLMベースのルール実装を反復的に実行する。前者は、記号的なルールと事実の述語および変数を照合し、各ステップで適用可能なルールをグラウンディングする。実験により、ルール適用における我々のフレームワークの有効性と、さまざまなステップ数と設定にわたる堅牢性が示された。コードとデータは https://github.com/SiyuanWangw/RuleApplication で公開されている。 Large Language Models (LLMs) have shown remarkable reasoning performance but struggle with multi-step deductive reasoning involving a series of rule application steps, especially when rules are presented non-sequentially. Our preliminary analysis shows that while LLMs excel in single-step rule application, their performance drops significantly in multi-step scenarios due to the challenge in rule grounding. It requires anchoring the applicable rule and supporting facts at each step, amidst multiple input rules, facts, and inferred facts. To address this, we propose augmenting LLMs with external working memory and introduce a neurosymbolic framework for rule application. The memory stores facts and rules in both natural language and symbolic forms, enabling precise tracking. Utilizing this memory, our framework iteratively performs symbolic rule grounding and LLM-based rule implementation. The former matches predicates and variables of symbolic rules and facts to ground applicable rules at each step. Experiments indicate our framework's effectiveness in rule application and its robustness across various steps and settings. Code and data are available at https://github.com/SiyuanWangw/RuleApplication.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# ローカライズ・アンド・スティッチ:スパースタスク算術による効率的なモデルマージ Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic ( http://arxiv.org/abs/2408.13656v1 ) ライセンス: Link先を確認	Yifei He, Yuzheng Hu, Yong Lin, Tong Zhang, Han Zhao,	(参考訳) モデルマージは、複数の微調整されたモデルの強みを、それぞれの特殊能力を保持する統一モデルに結合する効果的な戦略を提供する。既存の手法はモデルをグローバルにマージし、すべてのモデルパラメータにわたって算術演算を実行する。しかし、このようなグローバルなマージは、しばしばタスク間の干渉を引き起こし、マージされたモデルの性能を低下させる。本稿では、モデルを局所的にマージする新しいアプローチであるLocalize-and-Stitchを紹介する。我々のアルゴリズムは2つのステップで動作する。 i) ローカライズ: 下流タスクに不可欠なスキルを含む、微調整モデル内のごく小さな(総パラメータの1%)局所領域を特定する。 ii) スティッチング: これらの必須領域のみを事前訓練モデルに再統合し、タスク間の相乗効果を得る。提案手法は、微調整性能に寄与するスパース領域を効果的に特定し、局所化された領域を微調整モデル(タスク)のコンパクトかつ解釈可能な表現として扱えることを示す。実験では、本手法をさまざまなビジョンおよび言語ベンチマークで評価し、異なるデータ利用可能性のシナリオにおいて既存のモデルマージ手法を上回ることを示す。強力な実験性能に加えて、本アルゴリズムはモデル圧縮を促進し、事前学習した知識を保持し、最小限のストレージと計算オーバーヘッドで、複数の微調整モデルからの柔軟かつ継続的なスキル合成を可能にする。コードは https://github.com/yifei-he/Localize-and-Stitch で公開されている。 Model merging offers an effective strategy to combine the strengths of multiple finetuned models into a unified model that preserves the specialized capabilities of each. Existing methods merge models in a global manner, performing arithmetic operations across all model parameters. However, such global merging often leads to task interference, degrading the performance of the merged model. In this work, we introduce Localize-and-Stitch, a novel approach that merges models in a localized way. Our algorithm works in two steps: i) Localization: identify tiny (1% of the total parameters) localized regions in the finetuned models containing essential skills for the downstream tasks, and ii) Stitching: reintegrate only these essential regions back into the pretrained model for task synergy. We demonstrate that our approach effectively locates sparse regions responsible for finetuned performance, and the localized regions could be treated as compact and interpretable representations of the finetuned models (tasks). 
Empirically, we evaluate our method on various vision and language benchmarks, showing that it outperforms existing model merging methods under different data availability scenarios. Beyond strong empirical performance, our algorithm also facilitates model compression and preserves pretrained knowledge, enabling flexible and continual skill composition from multiple finetuned models with minimal storage and computational overhead. Our code is available at https://github.com/yifei-he/Localize-and-Stitch.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# ディープラーニングモデルを用いたビームラインステアリング Beamline Steering Using Deep Learning Models ( http://arxiv.org/abs/2408.13657v1 ) ライセンス: Link先を確認	Dexter Allen, Isaac Kante, Dorian Bohler,	(参考訳) ビームステアリングは、コリメータの回転軸に関して、粒子加速器の電子ビームがX線ターゲットに入射する角度と位置の校正を伴う。ビームステアリングは光源にとって不可欠な作業である。Linac To Undulatorは、加速器を使用するたびに条件が変わり、磁石の再校正が必要になるため、ステアリングと照準が非常に難しい。しかも、ビームラインを使用するたびに、現在のステアリング手法は角度と位置の校正で問題に直面する。人間のオペレータは、この作業にかなりの時間とリソースを費やしている。我々は、ハイパーパラメータ、入力、出力の異なる複数のフィードフォワードニューラルネットワークを開発し、その性能を比較した。具体的には、33の入力と13の出力を持つ小さなモデルが、73の入力と50の出力を持つ大きなモデルを上回った。大きなモデルの性能が劣る理由として、次の説明を提案する。第一に、訓練時間と計算能力の不足により、モデルが成熟する能力が制限された。より多くの時間があれば、我々のモデルはSVDを上回るだろう。第二に、モデルの入力サイズが大きくなるとノイズも増加する。この場合、入力が多いほど、LINAC加速器上のより長い区間に対応する。特化度が低く、より多くの予測を行おうとする大きなモデルは、本質的にSVDよりも性能が悪くなる。 Beam steering involves the calibration of the angle and position at which a particle accelerator's electron beam is incident upon the x-ray target with respect to the rotation axis of the collimator. Beam Steering is an essential task for light sources. The Linac To Undulator is very difficult to steer and aim due to the changes of each use of the accelerator there must be re-calibration of magnets. However with each use of the Beamline its current method of steering runs into issues when faced with calibrating angles and positions. Human operators spend a substantial amount of time and resources on the task. We developed multiple different feed-forward-neural networks with varying hyper-parameters, inputs, and outputs, seeking to compare their performance. Specifically, our smaller models with 33 inputs and 13 outputs outperformed the larger models with 73 inputs and 50 outputs. We propose the following explanations for this lack of performance in larger models. First, a lack of training time and computational power limited the ability of our models to mature. Given more time, our models would outperform SVD. Second, when the input size of the model increases the noise increases as well. 
In this case more inputs corresponded to a greater length upon the LINAC accelerator. Less specific and larger models that seek to make more predictions will inherently perform worse than SVD.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# Reactzyme: 酵素反応予測のためのベンチマーク Reactzyme: A Benchmark for Enzyme-Reaction Prediction ( http://arxiv.org/abs/2408.13659v1 ) ライセンス: Link先を確認	Chenqing Hua, Bozitao Zhong, Sitao Luan, Liang Hong, Guy Wolf, Doina Precup, Shuangjia Zheng,	(参考訳) 酵素は、その特異的な触媒反応によって、生命のあらゆる面において必要であり、多様な生物学的プロセスと適応を可能にしている。酵素機能の予測は、生物学的経路を理解し、薬物開発を誘導し、生産物を生産し、進化研究を促進するために不可欠である。そこで本研究では,酵素の触媒的反応に基づくアノテート手法を提案する。この方法は、特定の反応に関する詳細な洞察を与え、新しく発見された反応に適応し、タンパク質ファミリーや専門家由来の反応クラスによる伝統的な分類から分岐する。私たちは、酵素反応データセットの分析に機械学習アルゴリズムを使用し、酵素の機能に関するより洗練されたビューを提供します。評価では,2024年1月8日までにSwissProtデータベースとRheaデータベースから得られた,これまでで最大の酵素反応データセットを活用している。本研究は,酵素反応予測を検索問題として捉え,酵素の触媒活性を比例してランク付けすることを目的とする。我々のモデルでは、新規反応のためのタンパク質をリクルートし、新規タンパク質の反応を予測することで、酵素の発見と機能アノテーションを促進することができる。 Enzymes, with their specific catalyzed reactions, are necessary for all aspects of life, enabling diverse biological processes and adaptations. Predicting enzyme functions is essential for understanding biological pathways, guiding drug development, enhancing bioproduct yields, and facilitating evolutionary studies. Addressing the inherent complexities, we introduce a new approach to annotating enzymes based on their catalyzed reactions. This method provides detailed insights into specific reactions and is adaptable to newly discovered reactions, diverging from traditional classifications by protein family or expert-derived reaction classes. We employ machine learning algorithms to analyze enzyme reaction datasets, delivering a much more refined view on the functionality of enzymes. Our evaluation leverages the largest enzyme-reaction dataset to date, derived from the SwissProt and Rhea databases with entries up to January 8, 2024. We frame the enzyme-reaction prediction as a retrieval problem, aiming to rank enzymes by their catalytic ability for specific reactions. With our model, we can recruit proteins for novel reactions and predict reactions in novel proteins, facilitating enzyme discovery and function annotation.	
翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# 基礎的大言語モデルを用いた多モード電子マイクログラフ表現学習のための階層型ネットワーク融合 Hierarchical Network Fusion for Multi-Modal Electron Micrograph Representation Learning with Foundational Large Language Models ( http://arxiv.org/abs/2408.13661v1 ) ライセンス: Link先を確認	Sakhinana Sagar Srinivas, Geethan Sannidhi, Venkataramana Runkana,	(参考訳) 電子マイクログラフによる材料評価は、半導体や量子材料といった分野において重要な課題である。マイクログラフの複雑な階層構造は、しばしば伝統的な分類法に挑戦する。本研究では,電子マイクログラフ解析のための革新的なバックボーンアーキテクチャを提案する。マイクログラフをパッチシーケンスにトークン化し,さらに視覚グラフとして表現することで,マイクログラフのマルチモーダル表現を作成する。 HNF(Hierarchical Network Fusion)は,マルチモーダル表現間の情報交換と,異なるパッチ解決における知識統合を容易にする多層ネットワーク構造アーキテクチャである。さらに,大規模言語モデル(LLM)を利用して,ナノマテリアルの詳細な技術記述を補助情報として生成し,下流作業を支援する。我々は,ナノマテリアルのカテゴリを予測するために,クロスドメイン表現(画像ベースと言語情報の両方)間の知識融合のためのクロスモーダルアテンション機構を利用する。この多面的アプローチは、ナノマテリアル識別のためのより包括的で正確なマイクログラフの表現と分類を約束する。我々のフレームワークは従来の手法よりも優れており、分散シフトによる課題を克服し、高スループットのスクリーニングを容易にする。 Characterizing materials with electron micrographs is a crucial task in fields such as semiconductors and quantum materials. The complex hierarchical structure of micrographs often poses challenges for traditional classification methods. In this study, we propose an innovative backbone architecture for analyzing electron micrographs. We create multi-modal representations of the micrographs by tokenizing them into patch sequences and, additionally, representing them as vision graphs, commonly referred to as patch attributed graphs. We introduce the Hierarchical Network Fusion (HNF), a multi-layered network structure architecture that facilitates information exchange between the multi-modal representations and knowledge integration across different patch resolutions. Furthermore, we leverage large language models (LLMs) to generate detailed technical descriptions of nanomaterials as auxiliary information to assist in the downstream task. We utilize a cross-modal attention mechanism for knowledge fusion across cross-domain representations(both image-based and linguistic insights) to predict the nanomaterial category. 
This multi-faceted approach promises a more comprehensive and accurate representation and classification of micrographs for nanomaterial identification. Our framework outperforms traditional methods, overcoming challenges posed by distributional shifts, and facilitating high-throughput screening.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# 分数化状態におけるスピノンと磁場の関係 Relationship between spinons and magnetic fields in a fractionalized state ( http://arxiv.org/abs/2408.13665v1 ) ライセンス: Link先を確認	Yu Zhang, Hengdi Zhao, Tristan R. Cao, Rahul Nandkishore, Gang Cao,	(参考訳) 4d電子三量体格子Ba4Nb1-xRu3+xO12は、量子スピン液体(QSL)と隣接する重フェルミオン奇金属(HFSM)の両方を基盤とする普遍的な重スピンフェルミ面を特徴としており、Nb含有量に依存している。本稿では,最大14Tまでの磁場印加により,150mK以下での両相の熱容量の符号温度直線性が著しく低下し,最大5000%の急激な熱容量増加が生じるのに対し,交流磁化率と電気抵抗は,同じミリケルビン温度範囲で14Tまでの応答がほとんどないことを示す。さらに、磁場は熱伝導率を容易に抑制し、より強く4K以下の温度を最大40%まで下げる。これらの複雑な熱現象は、強力な単純化原理を示している: 磁場の応用はスピノンの反復性を著しく弱め、最終的に温度を下げて破壊し、驚くべき熱容量の上昇を特徴とする前例のない量子状態となり、ミリケルビン温度と強い磁場の最もあり得ない状況においてエントロピーが引き起こされる。可能な説明を提示し、議論する。 The 4d-electron trimer lattice Ba4Nb1-xRu3+xO12 is believed to feature a universal heavy spinon Fermi surface that underpins both a quantum spin liquid (QSL) and an adjacent heavy-fermion strange metal (HFSM), depending on Nb content; the itinerant spinons as heat carriers render the charge-insulating QSL a much better thermal conductor than the HFSM [1]. Here we report that application of a magnetic field up to 14 T surprisingly breaks the signature temperature-linearity of the heat capacity of both phases below 150 mK, inducing a rapid rise in the heat capacity by as much as 5000%, whereas the AC magnetic susceptibility and the electrical resistivity show little response up to 14 T in the same milli-Kelvin temperature range. Furthermore, the magnetic field readily suppresses the thermal conductivity, and more strongly with decreasing temperature below 4 K by up to 40%. All these complex thermal phenomena indicate a powerful simplifying principle: Application of a magnetic field adversely weakens the itineracy of spinons and eventually destroys it with decreasing temperature, leading to an unprecedented quantum state featuring the astonishing rise in the heat capacity, thus entropy in the most unlikely circumstances of milli-Kelvin temperatures and strong magnetic fields. We present and discuss possible explanations.	
翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# データ対応ビジネスプロセスの発見とシミュレーション Discovery and Simulation of Data-Aware Business Processes ( http://arxiv.org/abs/2408.13666v1 ) ライセンス: Link先を確認	Orlenys López-Pintado, Serhii Murashko, Marlon Dumas,	(参考訳) シミュレーションは、ビジネス・プロセスの変化が量的パフォーマンスに与える影響を予測する一般的な手法である。ビジネスプロセスシミュレーション(Business Process Simulation, BPS)は、シミュレーションパラメータが豊富なプロセスモデルである。 BPSモデルの典型的なパラメータ空間に対処するため、イベントログからBPSモデルを自動的に検出するいくつかの手法が提案されている。事実上、これらのアプローチはビジネスプロセスのデータ観点を無視します。しかし、ビジネスプロセスによって操作されるデータ属性は、どのアクティビティが実行されるか、何回、いつ実行されるかを決定することが多い。本稿では,データ認識型BPSモデリング手法と,イベントログからデータ認識型BPSモデルを検出する手法を導入することで,このギャップに対処する。 BPSモデリングアプローチは、3種類のデータ属性(グローバル、ケースレベル、イベントレベル)、決定論的および確率的属性更新ルールとデータ認識分岐条件をサポートする。実験により,提案手法は,各データ属性とその関連更新ルールのタイプを正確に検出し,得られたBPSモデルが,データ非認識のBPSモデルに対してプロセス実行制御フローをより密に再現することを示す。 Simulation is a common approach to predict the effect of business process changes on quantitative performance. The starting point of Business Process Simulation (BPS) is a process model enriched with simulation parameters. To cope with the typically large parameter spaces of BPS models, several methods have been proposed to automatically discover BPS models from event logs. Virtually all these approaches neglect the data perspective of business processes. Yet, the data attributes manipulated by a business process often determine which activities are performed, how many times, and when. This paper addresses this gap by introducing a data-aware BPS modeling approach and a method to discover data-aware BPS models from event logs. The BPS modeling approach supports three types of data attributes (global, case-level, and event-level) as well as deterministic and stochastic attribute update rules and data-aware branching conditions. An empirical evaluation shows that the proposed method accurately discovers the type of each data attribute and its associated update rules, and that the resulting BPS models more closely replicate the process execution control flow relative to data-unaware BPS models.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# Outlier Detection Bias Busted: データ中心の要因によるアルゴリズムバイアスの原因の理解 Outlier Detection Bias Busted: Understanding Sources of Algorithmic Bias through Data-centric Factors ( http://arxiv.org/abs/2408.13667v1 ) ライセンス: Link先を確認	Xueying Ding, Rui Xi, Leman Akoglu,	(参考訳) MLの目覚ましい成功に伴い、現実の環境に展開される現代的な手法の公平性への関心が高まっている。しかし、公平性の研究は主に教師ありMLに焦点を当てており、金融やセキュリティなど多くの応用を持つ教師なしの外れ値検出(OD)はほとんど注目されていない。公平性を強化したODアルゴリズムを提案した研究はいくつかあるが、不公平性を生む根本的なメカニズムや原因には踏み込んでいない。教師ありMLの文献においても、不公平性がアルゴリズム的バイアス(すなわち設計上の選択)のみに起因するのか、それとも学習データに符号化されたバイアスに起因するのかについて議論がある。このギャップを埋めるために、本研究は、さまざまなデータ中心の要因の下で検出モデルを監査することにより、ODにおける不公平性の原因を明らかにすることを目的とする。サンプルサイズの格差、過少表現、特徴測定ノイズ、グループメンバーシップの難読化といった既知のバイアスを入力データに注入したところ、調査対象のODアルゴリズムはいずれも公平性の落とし穴を示し、影響を受けやすいデータバイアスの種類はアルゴリズムごとに異なることがわかった。本研究で最も注目すべき点は、ODアルゴリズムのバイアスが単なるデータバイアスの問題ではないことを示したことである。重要な気づきは、バイアス注入によって生じるデータ特性が、疎性、基準率、分散、多峰性に関する自然なグループ差としても生じうるという点である。自然に生じたものであれバイアスによるものであれ、そのようなデータ特性は、特定のアルゴリズム設計上の選択と相互作用することで不公平性を引き起こしうる。 The astonishing successes of ML have raised growing concern for the fairness of modern methods when deployed in real world settings. However, studies on fairness have mostly focused on supervised ML, while unsupervised outlier detection (OD), with numerous applications in finance, security, etc., has attracted little attention. While a few studies proposed fairness-enhanced OD algorithms, they remain agnostic to the underlying driving mechanisms or sources of unfairness. Even within the supervised ML literature, there exists debate on whether unfairness stems solely from algorithmic biases (i.e. design choices) or from the biases encoded in the data on which they are trained. To close this gap, this work aims to shed light on the possible sources of unfairness in OD by auditing detection models under different data-centric factors. 
By injecting various known biases into the input data -- as pertain to sample size disparity, under-representation, feature measurement noise, and group membership obfuscation -- we find that the OD algorithms under the study all exhibit fairness pitfalls, although differing in which types of data bias they are more susceptible to. Most notable of our study is to demonstrate that OD algorithm bias is not merely a data bias problem. A key realization is that the data properties that emerge from bias injection could as well be organic -- as pertain to natural group differences w.r.t. sparsity, base rate, variance, and multi-modality. Either natural or biased, such data properties can give rise to unfairness as they interact with certain algorithmic design choices.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# GenCA: 現実的で駆動可能なコーデックアバターのためのテキスト条件付き生成モデル GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars ( http://arxiv.org/abs/2408.13674v1 ) ライセンス: Link先を確認	Keqiang Sun, Amin Jourabloo, Riddhish Bhalodia, Moustafa Meshry, Yu Rong, Zhengyu Yang, Thu Nguyen-Phuoc, Christian Haene, Jiu Xu, Sam Johnson, Hongsheng Li, Sofien Bouaziz,	(参考訳) フォトリアリスティックで制御可能な3Dアバターは、仮想現実・複合現実(VR/MR)、テレプレゼンス、ゲーム、映画制作など、様々な用途に欠かせない。従来のアバター作成手法は、アバターごとに時間のかかるスキャンと再構成を必要とすることが多く、スケーラビリティが制限される。さらに、これらの手法には、新しいアイデンティティをサンプリングしたり、既存のものを修正したりする柔軟性がない。一方、生成モデルはデータから強力な事前分布を学習することで、従来の再構成手法に代わる有望な選択肢となり、データの取得と処理の両方にかかる時間的制約を緩和する。さらに、生成手法は、編集やスタイル化など、再構成にとどまらない下流アプリケーションを可能にする。それでも、生成的な3Dアバターの研究はまだ初期段階にあり、現在の手法には、静的なアバターしか作成できない、フォトリアリズムに欠ける、顔の細部が不完全である、駆動性(drivability)が限られるといった制限がある。そこで本研究では、髪、目、口内などのより完全な細部を備え、多様なアイデンティティを持つフォトリアリスティックな顔アバターを生成でき、強力なノンパラメトリック潜在表情空間を通じて駆動できるテキスト条件付き生成モデルを提案する。具体的には、潜在拡散モデルの生成・編集能力と、アバターの表情駆動のための強力な事前モデルを統合する。我々のモデルは、分布外のものも含め、高忠実度のアバターを生成・制御できる。また、アバター編集やシングルショットのアバター再構成など、下流アプリケーションへの可能性も示す。 Photo-realistic and controllable 3D avatars are crucial for various applications such as virtual and mixed reality (VR/MR), telepresence, gaming, and film production. Traditional methods for avatar creation often involve time-consuming scanning and reconstruction processes for each avatar, which limits their scalability. Furthermore, these methods do not offer the flexibility to sample new identities or modify existing ones. On the other hand, by learning a strong prior from data, generative models provide a promising alternative to traditional reconstruction methods, easing the time constraints for both data capture and processing. Additionally, generative methods enable downstream applications beyond reconstruction, such as editing and stylization. 
Nonetheless, the research on generative 3D avatars is still in its infancy, and therefore current methods still have limitations such as creating static avatars, lacking photo-realism, having incomplete facial details, or having limited drivability. To address this, we propose a text-conditioned generative model that can generate photo-realistic facial avatars of diverse identities, with more complete details like hair, eyes and mouth interior, and which can be driven through a powerful non-parametric latent expression space. Specifically, we integrate the generative and editing capabilities of latent diffusion models with a strong prior model for avatar expression driving. Our model can generate and control high-fidelity avatars, even those out-of-distribution. We also highlight its potential for downstream applications, including avatar editing and single-shot avatar reconstruction.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# SSL音声モデルにおけるマンダリンと英語の超分節的特徴の層別解析 A layer-wise analysis of Mandarin and English suprasegmentals in SSL speech models ( http://arxiv.org/abs/2408.13678v1 ) ライセンス: Link先を確認	Antón de la Fuente, Dan Jurafsky,	(参考訳) 本研究は、自己教師あり音声モデルが、マンダリンの語彙的声調、英語の語彙的強勢、英語の句アクセントといった超分節的カテゴリーをどのように表現するかを問う。一連のプロービングタスクを通じて、英語とマンダリンの12層単言語モデルを層ごとに比較する。結果は次のことを示唆する。 1) 英語とマンダリンのwav2vec 2.0モデルは、抽象的な超分節的カテゴリーの文脈表現を学習し、その表現はネットワークの中央3分の1で最も強くなる。 2) モデルは学習データの言語に存在する特徴をより良く表現し、この差は局所的な音響表現ではなく、Transformerブロックで得られる豊かな文脈によって生じる。 3) ファインチューニングしたwav2vec 2.0は、事前学習モデルと比べて後半の層の性能が向上し、その向上は主に声調や強勢といった語彙的に対立する特徴で見られる。 4) HuBERTとWavLMはwav2vec 2.0と類似した表現を学習し、主に後半の層の性能が異なる。以上の結果は、モデルが超分節的特徴をどのように表現するかについての従来の理解を拡張し、これらの表現の言語特異性と文脈的性質に関する新たな知見を与える。 This study asks how self-supervised speech models represent suprasegmental categories like Mandarin lexical tone, English lexical stress, and English phrasal accents. Through a series of probing tasks, we make layer-wise comparisons of English and Mandarin 12-layer monolingual models. Our findings suggest that 1) English and Mandarin wav2vec 2.0 models learn contextual representations of abstract suprasegmental categories which are strongest in the middle third of the network. 2) Models are better at representing features that exist in the language of their training data, and this difference is driven by enriched context in transformer blocks, not local acoustic representation. 3) Fine-tuned wav2vec 2.0 improves performance in later layers compared to pre-trained models mainly for lexically contrastive features like tone and stress. 4) HuBERT and WavLM learn similar representations to wav2vec 2.0, differing mainly in later layer performance. Our results extend previous understanding of how models represent suprasegmentals and offer new insights into the language-specificity and contextual nature of these representations.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# Segment Any Mesh: Segment Anything 2の3Dへのリフティングによるゼロショットメッシュ部分セグメンテーション Segment Any Mesh: Zero-shot Mesh Part Segmentation via Lifting Segment Anything 2 to 3D ( http://arxiv.org/abs/2408.13679v1 ) ライセンス: Link先を確認	George Tang, William Zhao, Logan Ford, David Benhaim, Paul Zhang,	(参考訳) 形状解析ベース、学習ベース、および既存のゼロショット手法の限界を克服する、メッシュ部分セグメンテーションの新しいゼロショット手法Segment Any Mesh(SAMesh)を提案する。 SAMeshはマルチモーダルレンダリングと2D-to-3Dリフティングという2つのフェーズで動作する。第1フェーズでは、メッシュのマルチビューレンダリングをSegment Anything 2(SAM2)で個別に処理し、2Dマスクを生成する。次に、マルチビューレンダリング間で同じメッシュ部分を指すマスクを対応付けることで、これらのマスクをメッシュ部分セグメンテーションへとリフトする。法線と形状直径スカラーのマルチモーダル特徴レンダリングにSAM2を適用すると、テクスチャなしのメッシュレンダリングのみを使う場合よりも良い結果が得られることがわかった。 SAM2の上に手法を構築することで、2Dセグメンテーションの将来の改善をそのまま継承できる。提案手法を、頑健で十分に評価された形状解析手法であるShape Diameter Function(ShapeDiam)と比較し、性能が同等以上であることを示す。現在のベンチマークはオブジェクトの多様性が限られているため、生成メッシュのデータセットをキュレーションして公開し、それを用いた人手評価によって、ShapeDiamよりも汎化性能が高いことを示す。コードとデータセットはhttps://github.com/gtangg12/sameshで公開している。 We propose Segment Any Mesh (SAMesh), a novel zero-shot method for mesh part segmentation that overcomes the limitations of shape analysis-based, learning-based, and current zero-shot approaches. SAMesh operates in two phases: multimodal rendering and 2D-to-3D lifting. In the first phase, multiview renders of the mesh are individually processed through Segment Anything 2 (SAM2) to generate 2D masks. These masks are then lifted into a mesh part segmentation by associating masks that refer to the same mesh part across the multiview renders. We find that applying SAM2 to multimodal feature renders of normals and shape diameter scalars achieves better results than using only untextured renders of meshes. By building our method on top of SAM2, we seamlessly inherit any future improvements made to 2D segmentation. We compare our method with a robust, well-evaluated shape analysis method, Shape Diameter Function (ShapeDiam), and show our method is comparable to or exceeds its performance. 
Since current benchmarks contain limited object diversity, we also curate and release a dataset of generated meshes and use it to demonstrate our method's improved generalization over ShapeDiam via human evaluation. We release the code and dataset at https://github.com/gtangg12/samesh	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# フェデレート学習における等価クライアント選択のための部分モジュラ最大化手法 Submodular Maximization Approaches for Equitable Client Selection in Federated Learning ( http://arxiv.org/abs/2408.13683v1 ) ライセンス: Link先を確認	Andrés Catalino Castillo Jiménez, Ege C. Kaya, Lintao Ye, Abolfazl Hashemi,	(参考訳) 従来のフェデレートラーニングフレームワークでは、トレーニングのためのクライアント選択は、通常、イテレーション毎にクライアントのサブセットをランダムにサンプリングする。しかし、このランダムな選択は、しばしばクライアント間で異なるパフォーマンスをもたらし、公正性、特に医療や金融の機械学習タスクなど、公平な結果が不可欠であるアプリケーションにおいて、関心を喚起する。この格差は通常、パフォーマンス中心のクライアントサンプリング技術の出現によってより顕著になる。本稿では,ランダムクライアント選択の限界に対処するために,SUBTRUNCとUNIONFLという2つの新しい手法を提案する。どちらのアプローチも、よりバランスの取れたモデルを達成するために、部分モジュラ函数の最大化を利用する。施設の位置問題を修正することにより、ランダムな選択に伴う公平さの懸念を軽減することを目指している。 SUBTRUNCは、クライアント損失情報を利用してソリューションを多様化し、UNIONFLは、最終モデルのより公平なパフォーマンスを保証するために、過去のクライアント選択データに依存する。さらに、これらのアルゴリズムは、合理的な仮定の下で収束に関する堅牢な理論的保証を伴っている。これらの手法の有効性は、不均一なシナリオにわたる広範囲な評価を通じて実証され、クライアントの異性度測定値によって測定された公正性の顕著な改善が示された。 In a conventional Federated Learning framework, client selection for training typically involves the random sampling of a subset of clients in each iteration. However, this random selection often leads to disparate performance among clients, raising concerns regarding fairness, particularly in applications where equitable outcomes are crucial, such as in medical or financial machine learning tasks. This disparity typically becomes more pronounced with the advent of performance-centric client sampling techniques. This paper introduces two novel methods, namely SUBTRUNC and UNIONFL, designed to address the limitations of random client selection. Both approaches utilize submodular function maximization to achieve more balanced models. By modifying the facility location problem, they aim to mitigate the fairness concerns associated with random selection. SUBTRUNC leverages client loss information to diversify solutions, while UNIONFL relies on historical client selection data to ensure a more equitable performance of the final model. 
Moreover, these algorithms are accompanied by robust theoretical guarantees regarding convergence under reasonable assumptions. The efficacy of these methods is demonstrated through extensive evaluations across heterogeneous scenarios, revealing significant improvements in fairness as measured by a client dissimilarity metric.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
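The abstract above frames client selection as submodular maximization built on a modified facility location problem. As a rough illustration only, the following Python sketch runs the textbook greedy algorithm on the plain (unmodified) facility location objective. It is not SUBTRUNC or UNIONFL; the similarity matrix, the function names, and the budget are all hypothetical, and the similarities are assumed to be non-negative.

```python
from typing import List, Sequence


def facility_location_value(similarity: Sequence[Sequence[float]],
                            selected: Sequence[int]) -> float:
    """f(S) = sum over clients i of max_{j in S} similarity[i][j]; f(empty) = 0."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in similarity)


def greedy_select(similarity: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Greedy maximization: repeatedly add the client with the largest marginal gain."""
    n = len(similarity)
    selected: List[int] = []
    for _ in range(min(budget, n)):
        current = facility_location_value(similarity, selected)
        best_gain, best_j = float("-inf"), -1
        for j in range(n):
            if j in selected:
                continue
            gain = facility_location_value(similarity, selected + [j]) - current
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
    return selected


if __name__ == "__main__":
    # Hypothetical similarities: clients 0 and 1 resemble each other, as do 2 and 3.
    sim = [
        [1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.2, 0.1],
        [0.2, 0.2, 1.0, 0.7],
        [0.1, 0.1, 0.7, 1.0],
    ]
    print(greedy_select(sim, budget=2))
```

With a budget of two, the greedy choice takes one client from each similar pair, which is the kind of representative coverage that a facility location objective rewards.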
# パーソナライズされた学習の計算モデルを用いた代替的な訓練介入の評価 Evaluating Alternative Training Interventions Using Personalized Computational Models of Learning ( http://arxiv.org/abs/2408.13684v1 ) ライセンス: Link先を確認	Christopher James MacLellan, Kimberly Stowers, Lisa Brady,	(参考訳) どの訓練介入が最良の学習成果をもたらすかを見極めるために異なる介入を評価することは、インストラクショナルデザイナーが直面する主要な課題の1つである。通常、デザイナーは各介入の評価にA/B実験を用いるが、そうした研究の実施には費用と時間がかかる。この問題に対処するため、学習の計算モデルが、分数チューター内の代替的介入について因果的に推論するデザイナーをどのように支援できるかを検討する。モデルを特定の個人に合わせて自動的に調整する手法を提案し、パーソナライズされたモデルが汎用モデルよりも生徒の行動をより良く予測することを示す。次に、分数チューターの異なるバージョンにおける2人の生徒(成績上位と成績下位)のパフォーマンスと学習について、反実仮想的な予測を生成するシミュレーションを行う。我々の手法は、人間を対象とした過去の知見と一致する予測に加え、将来の人間を対象とした実験で評価しうる検証可能な予測を与える。 Evaluating different training interventions to determine which produce the best learning outcomes is one of the main challenges faced by instructional designers. Typically, these designers use A/B experiments to evaluate each intervention; however, it is costly and time-consuming to run such studies. To address this issue, we explore how computational models of learning might support designers in reasoning causally about alternative interventions within a fractions tutor. We present an approach for automatically tuning models to specific individuals and show that personalized models make better predictions of students' behavior than generic ones. Next, we conduct simulations to generate counterfactual predictions of performance and learning for two students (high- and low-performing) in different versions of the fractions tutor. Our approach makes predictions that align with previous human findings, as well as testable predictions that might be evaluated with future human experiments.	翻訳日:2024-08-27 18:29:37 公開日:2024-08-24
# 自動運転システムのシミュレーションによるシナリオベーステストのための知覚誘導ファジング Perception-Guided Fuzzing for Simulated Scenario-Based Testing of Autonomous Driving Systems ( http://arxiv.org/abs/2408.13686v1 ) ライセンス: Link先を確認	Tri Minh Triet Pham, Bo Yang, Jinqiu Yang,	(参考訳) 自動運転システム(ADS)は大きな進歩を遂げ、公道でのテストや、さらには商用化に向けた試験運用が始まっている。ADSは複雑でテストが難しい。複数のセンサーから入力データを受け取り、複数のディープニューラルネットワークモデルとコードロジックを組み合わせて意思決定を行うためである。 ADSの誤動作は人命の損失を含む重大な事故につながりうるため、その安全性は最も重要である。本研究では、マルチモジュールADSに対してシステムレベルのテストを行うSimsVを提案する。 SimsVはADSの知覚の失敗を対象とし、知覚の失敗がシステム全体に与える影響もあわせて評価する。 SimsVは高忠実度シミュレータを利用し、事前定義された変異演算子を継続的に適用することで、テスト入力とオラクルを生成する。さらに、SimsVはさまざまなメトリクスを用いてテストプロセスを誘導する。我々は、広く使われているオープンソースの運転プラットフォームシミュレータを用いて、商用グレードのレベル4 ADS(Apollo)をテストするSimsVのプロトタイプを実装した。評価の結果、SimsVはApolloの知覚における弱点を発見できることがわかった。さらに、そうした弱点を突くことで、SimsVは衝突を含むApolloの深刻な問題を発見することを示す。 Autonomous Driving Systems (ADS) have made huge progress and started on-road testing or even commercialization trials. ADS are complex and difficult to test: they receive input data from multiple sensors and make decisions using a combination of multiple deep neural network models and code logic. The safety of ADS is of utmost importance as their misbehavior can result in costly catastrophes, including the loss of human life. In this work, we propose SimsV, which performs system-level testing on multi-module ADS. SimsV targets perception failures of ADS and further assesses the impact of perception failure on the system as a whole. SimsV leverages a high-fidelity simulator for test input and oracle generation by continuously applying predefined mutation operators. In addition, SimsV leverages various metrics to guide the testing process. We implemented a prototype SimsV for testing a commercial-grade Level 4 ADS (i.e., Apollo) using a popular open-source driving platform simulator. Our evaluation shows that SimsV is capable of finding weaknesses in the perception of Apollo. Furthermore, we show that by exploiting such weaknesses, SimsV finds severe problems in Apollo, including collisions.	翻訳日:2024-08-27 18:19:53 公開日:2024-08-24
# 表面符号閾値以下での量子誤差補正 Quantum error correction below the surface code threshold ( http://arxiv.org/abs/2408.13687v1 ) ライセンス: Link先を確認	Rajeev Acharya, Laleh Aghababaie-Beni, Igor Aleiner, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Nikita Astrakhantsev, Juan Atalaya, Ryan Babbush, Dave Bacon, Brian Ballard, Joseph C. Bardin, Johannes Bausch, Andreas Bengtsson, Alexander Bilmes, Sam Blackwell, Sergio Boixo, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, David A. Browne, Brett Buchea, Bob B. Buckley, David A. Buell, Tim Burger, Brian Burkett, Nicholas Bushnell, Anthony Cabrera, Juan Campero, Hung-Shen Chang, Yu Chen, Zijun Chen, Ben Chiaro, Desmond Chik, Charina Chou, Jahan Claes, Agnetta Y. Cleland, Josh Cogan, Roberto Collins, Paul Conner, William Courtney, Alexander L. Crook, Ben Curtin, Sayan Das, Alex Davies, Laura De Lorenzo, Dripto M. Debroy, Sean Demura, Michel Devoret, Agustin Di Paolo, Paul Donohoe, Ilya Drozdov, Andrew Dunsworth, Clint Earle, Thomas Edlich, Alec Eickbusch, Aviv Moshe Elbag, Mahmoud Elzouka, Catherine Erickson, Lara Faoro, Edward Farhi, Vinicius S. Ferreira, Leslie Flores Burgos, Ebrahim Forati, Austin G. Fowler, Brooks Foxen, Suhas Ganjam, Gonzalo Garcia, Robert Gasca, Élie Genois, William Giang, Craig Gidney, Dar Gilboa, Raja Gosula, Alejandro Grajales Dau, Dietrich Graumann, Alex Greene, Jonathan A. Gross, Steve Habegger, John Hall, Michael C. Hamilton, Monica Hansen, Matthew P. Harrigan, Sean D. Harrington, Francisco J. H. Heras, Stephen Heslin, Paula Heu, Oscar Higgott, Gordon Hill, Jeremy Hilton, George Holland, Sabrina Hong, Hsin-Yuan Huang, Ashley Huff, William J. Huggins, Lev B. Ioffe, Sergei V. Isakov, Justin Iveland, Evan Jeffrey, Zhang Jiang, Cody Jones, Stephen Jordan, Chaitali Joshi, Pavol Juhas, Dvir Kafri, Hui Kang, Amir H. Karamlou, Kostyantyn Kechedzhi, Julian Kelly, Trupti Khaire, Tanuj Khattar, Mostafa Khezri, Seon Kim, Paul V. Klimov, Andrey R. 
Klots, Bryce Kobrin, Pushmeet Kohli, Alexander N. Korotkov, Fedor Kostritsa, Robin Kothari, Borislav Kozlovskii, John Mark Kreikebaum, Vladislav D. Kurilovich, Nathan Lacroix, David Landhuis, Tiano Lange-Dei, Brandon W. Langley, Pavel Laptev, Kim-Ming Lau, Loïck Le Guevel, Justin Ledford, Kenny Lee, Yuri D. Lensky, Shannon Leon, Brian J. Lester, Wing Yan Li, Yin Li, Alexander T. Lill, Wayne Liu, William P. Livingston, Aditya Locharla, Erik Lucero, Daniel Lundahl, Aaron Lunt, Sid Madhuk, Fionn D. Malone, Ashley Maloney, Salvatore Mandrá, Leigh S. Martin, Steven Martin, Orion Martin, Cameron Maxfield, Jarrod R. McClean, Matt McEwen, Seneca Meeks, Anthony Megrant, Xiao Mi, Kevin C. Miao, Amanda Mieszala, Reza Molavi, Sebastian Molina, Shirin Montazeri, Alexis Morvan, Ramis Movassagh, Wojciech Mruczkiewicz, Ofer Naaman, Matthew Neeley, Charles Neill, Ani Nersisyan, Hartmut Neven, Michael Newman, Jiun How Ng, Anthony Nguyen, Murray Nguyen, Chia-Hung Ni, Thomas E. O'Brien, William D. Oliver, Alex Opremcak, Kristoffer Ottosson, Andre Petukhov, Alex Pizzuto, John Platt, Rebecca Potter, Orion Pritchard, Leonid P. Pryadko, Chris Quintana, Ganesh Ramachandran, Matthew J. Reagor, David M. Rhodes, Gabrielle Roberts, Eliott Rosenberg, Emma Rosenfeld, Pedram Roushan, Nicholas C. Rubin, Negar Saei, Daniel Sank, Kannan Sankaragomathi, Kevin J. Satzinger, Henry F. Schurkus, Christopher Schuster, Andrew W. Senior, Michael J. Shearn, Aaron Shorter, Noah Shutty, Vladimir Shvarts, Shraddha Singh, Volodymyr Sivak, Jindra Skruzny, Spencer Small, Vadim Smelyanskiy, W. Clarke Smith, Rolando D. Somma, Sofia Springer, George Sterling, Doug Strain, Jordan Suchard, Aaron Szasz, Alex Sztein, Douglas Thor, Alfredo Torres, M. Mert Torunbalci, Abeer Vaishnav, Justin Vargas, Sergey Vdovichev, Guifre Vidal, Benjamin Villalonga, Catherine Vollgraff Heidweiller, Steven Waltman, Shannon X. Wang, Brayden Ware, Kate Weber, Theodore White, Kristi Wong, Bryan W. K. Woo, Cheng Xing, Z. 
Jamie Yao, Ping Yeh, Bicheng Ying, Juhwan Yoo, Noureldin Yosri, Grayson Young, Adam Zalcman, Yaxing Zhang, Ningfeng Zhu, Nicholas Zobrist,	(参考訳) 量子誤り訂正は、複数の物理量子ビットを1つの論理量子ビットにまとめることで実用的な量子コンピューティングへの道を開くものであり、量子ビットを追加するほど論理誤り率は指数関数的に抑制される。しかし、この指数関数的な抑制は、物理誤り率が臨界しきい値を下回る場合にのみ生じる。本研究では、このしきい値以下で動作する2つの表面符号メモリ、すなわち距離7の符号と、リアルタイムデコーダと統合された距離5の符号を示す。大きい方の量子メモリの論理誤り率は、符号距離を2増やすごとに$\Lambda$ = 2.14 $\pm$ 0.02倍抑制され、101量子ビットの距離7符号では誤り訂正1サイクルあたりの誤り率が0.143% $\pm$ 0.003%に達する。この論理メモリは損益分岐点(break-even)も超えており、最良の物理量子ビットの寿命を2.4 $\pm$ 0.3倍上回る。リアルタイムで復号する場合にもしきい値以下の性能を維持し、距離5で最大100万サイクルにわたり平均デコーダ遅延63 $\mu$sを、サイクル時間1.1 $\mu$sで達成した。誤り訂正性能の限界を探るため、距離29までの反復符号を実行したところ、論理性能は約1時間に1回、すなわち3 $\times$ 10$^9$サイクルに1回発生する稀な相関誤り事象によって制限されることがわかった。以上の結果は、スケールさせれば大規模なフォールトトレラント量子アルゴリズムの動作要件を満たしうるデバイス性能を示している。 Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit, where the logical error rate is suppressed exponentially as more qubits are added. However, this exponential suppression only occurs if the physical error rate is below a critical threshold. In this work, we present two surface code memories operating below this threshold: a distance-7 code and a distance-5 code integrated with a real-time decoder. The logical error rate of our larger quantum memory is suppressed by a factor of $\Lambda$ = 2.14 $\pm$ 0.02 when increasing the code distance by two, culminating in a 101-qubit distance-7 code with 0.143% $\pm$ 0.003% error per cycle of error correction. This logical memory is also beyond break-even, exceeding its best physical qubit's lifetime by a factor of 2.4 $\pm$ 0.3. We maintain below-threshold performance when decoding in real time, achieving an average decoder latency of 63 $\mu$s at distance-5 up to a million cycles, with a cycle time of 1.1 $\mu$s. 
To probe the limits of our error-correction performance, we run repetition codes up to distance-29 and find that logical performance is limited by rare correlated error events occurring approximately once every hour, or 3 $\times$ 10$^9$ cycles. Our results present device performance that, if scaled, could realize the operational requirements of large scale fault-tolerant quantum algorithms.	翻訳日:2024-08-27 18:19:53 公開日:2024-08-24
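The figures quoted in the abstract above can be cross-checked with a little arithmetic. The Python sketch below does two things: it converts 3 × 10^9 cycles at a 1.1 µs cycle time into wall-clock time, and it extrapolates the per-cycle logical error to larger code distances. The extrapolation assumes that the suppression factor Λ = 2.14 stays constant for every increase of the code distance by two, which is an assumption for illustration and not a result reported in the abstract. The function names are hypothetical.

```python
def seconds_between_events(cycles: float, cycle_time_s: float) -> float:
    """Wall-clock time covered by a given number of error-correction cycles."""
    return cycles * cycle_time_s


def extrapolate_error_per_cycle(error_at_d: float, lam: float, steps: int) -> float:
    """Error per cycle after `steps` increases of the code distance by two,
    assuming (hypothetically) a constant suppression factor `lam` per step."""
    return error_at_d / (lam ** steps)


if __name__ == "__main__":
    # 3e9 cycles at 1.1 microseconds per cycle, as quoted in the abstract.
    t = seconds_between_events(3e9, 1.1e-6)
    print(f"{t:.0f} s, about {t / 60:.0f} minutes")

    # Starting from 0.143% per cycle at distance 7, with Lambda = 2.14.
    for steps, distance in enumerate((7, 9, 11)):
        err = extrapolate_error_per_cycle(0.143, 2.14, steps)
        print(f"distance {distance}: {err:.4f}% per cycle (extrapolated for d > 7)")
```

The first calculation gives roughly 3300 seconds, or about 55 minutes, which agrees with the abstract's "approximately once every hour".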
# クラッタにおけるマルチセンサ融合と追跡のための分散勾配ベース変分推論 Decentralised Gradient-based Variational Inference for Multi-sensor Fusion and Tracking in Clutter ( http://arxiv.org/abs/2408.13689v1 ) ライセンス: Link先を確認	Qing Li, Runze Gan, Simon Godsill,	(参考訳) 本稿では、接続性が時間変化する分散マルチセンサネットワークの下で、クラッタ内の複数物体を追跡するタスクを検討する。提案手法は、集中型の変分多物体トラッカーと同じ目的関数で設計されており、局所処理と近傍センサとの通信のみで、性能面で最適な分散融合を実現する。鍵となる革新は、局所的に最大化されたエビデンス下界(ELBO)を分散的に構築することであり、これにより通信に必要な情報量が大幅に削減される。提案する分散型自然勾配降下変分多物体トラッカーは、勾配追跡戦略と、従来の勾配の方向を最急方向に調整する自然勾配によって強化されており、急速な収束を示す。実験結果から、提案手法は追跡精度において集中型融合と経験的に同等であり、同程度のコストの準最適な融合手法を上回り、コンセンサスベースの変分多物体トラッカーよりもはるかに低い通信オーバヘッドを達成することが確認された。 This paper investigates the task of tracking multiple objects in clutter under a distributed multi-sensor network with time-varying connectivity. Designed with the same objective as the centralised variational multi-object tracker, the proposed method achieves optimal decentralised fusion in performance with local processing and communication with only neighboring sensors. A key innovation is the decentralised construction of a locally maximised evidence lower bound, which greatly reduces the information required for communication. Our decentralised natural gradient descent variational multi-object tracker, enhanced with the gradient tracking strategy and natural gradients that adjust the direction of traditional gradients to the steepest, shows rapid convergence. Our results verify that the proposed method is empirically equivalent to the centralised fusion in tracking accuracy, surpasses suboptimal fusion techniques with comparable costs, and achieves much lower communication overhead than the consensus-based variational multi-object tracker.	翻訳日:2024-08-27 18:19:53 公開日:2024-08-24
# モデルミスマッチ下における不確実性に基づくアクティブラーニングの理解 Understanding Uncertainty-based Active Learning Under Model Mismatch ( http://arxiv.org/abs/2408.13690v1 ) ライセンス: Link先を確認	Amir Hossein Rahmati, Mingzhou Fan, Ruida Zhou, Nathan M. Urban, Byung-Jun Yoon, Xiaoning Qian,	(参考訳) 不確実性に基づくアクティブラーニング(UAL)は、訓練データ点をランダムに取得する代わりに、予測の不確実性に基づいてラベルなしプールから選んだ重要なサンプルのラベルを問い合わせることで、モデル訓練のラベル付けコストの最小化を目指す。 UALの有効性は、モデル容量と、採用する不確実性ベースの獲得関数に大きく依存する。本研究は、機械学習モデルの容量がUALの有効性にどのように影響しうるかを理解することに焦点を当てる。理論解析、包括的なシミュレーション、実証研究を通じて、機械学習モデルクラスの容量が低く、基礎となる真のモデルをカバーできない場合、UALの性能がランダムサンプリングよりも悪くなりうることを示す。このような状況では、予測性能の推定を直接の目的とする獲得関数を採用することが、UALの性能向上に有益である可能性がある。 Instead of randomly acquiring training data points, Uncertainty-based Active Learning (UAL) operates by querying the label(s) of pivotal samples from an unlabeled pool selected based on the prediction uncertainty, thereby aiming at minimizing the labeling cost for model training. The efficacy of UAL critically depends on the model capacity as well as the adopted uncertainty-based acquisition function. Within the context of this study, our analytical focus is directed toward comprehending how the capacity of the machine learning model may affect UAL efficacy. Through theoretical analysis, comprehensive simulations, and empirical studies, we conclusively demonstrate that UAL can lead to worse performance in comparison with random sampling when the machine learning model class has low capacity and is unable to cover the underlying ground truth. In such situations, adopting acquisition functions that directly target estimating the prediction performance may be beneficial for improving the performance of UAL.	翻訳日:2024-08-27 18:19:53 公開日:2024-08-24
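To make the querying step in the abstract above concrete, here is a minimal Python sketch of an uncertainty-based acquisition function. It uses predictive entropy, which is one common choice; the abstract does not say which acquisition functions the paper studies, so the choice of entropy, the function names, and the example probabilities are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k unlabeled samples whose predictions have the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical class probabilities predicted for four unlabeled samples.
    pool = [[0.99, 0.01], [0.5, 0.5], [0.8, 0.2], [0.6, 0.4]]
    print(select_most_uncertain(pool, k=2))
```

The selection depends entirely on the model's own predicted probabilities. If the model class cannot represent the ground truth, those probabilities may be systematically off, which is the model-mismatch setting the abstract examines.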
# 遅延微分方程式におけるフォワードと逆問題の解法のためのディープニューラルネットワークフレームワーク A Deep Neural Network Framework for Solving Forward and Inverse Problems in Delay Differential Equations ( http://arxiv.org/abs/2408.09202v2 ) ライセンス: Link先を確認	Housen Wang, Yuxing Chen, Sirong Cao, Xiaoli Wang, Qiang Liu,	(参考訳) 本稿では,ディープニューラルネットワーク(DNN)に基づく遅延微分方程式(DDE)の統合フレームワークを提案する。このフレームワークは、遅延微分方程式をニューラルネットワークに埋め込んで、初期条件、制御方程式、既知のデータの観点からDDEの多様な要件を満たすことができる。 NDDEは、損失関数を最小化する自動微分および最適化アルゴリズムによりネットワークパラメータを調整し、従来の数値法に典型的な格子依存や多項式補間を伴わない遅延微分方程式の数値解を得る。逆問題に対処する際、NDDEフレームワークは観測データを利用して単一の遅延パラメータや複数の遅延パラメータを正確に推定することができる。複数の数値実験の結果、NDDEは前方および逆問題の両方において高い精度を示し、その有効性と、遅れた微分方程式問題に対処する有望な可能性を証明している。 We propose a unified framework for delay differential equations (DDEs) based on deep neural networks (DNNs) - the neural delay differential equations (NDDEs), aimed at solving the forward and inverse problems of delay differential equations. This framework could embed delay differential equations into neural networks to accommodate the diverse requirements of DDEs in terms of initial conditions, control equations, and known data. NDDEs adjust the network parameters through automatic differentiation and optimization algorithms to minimize the loss function, thereby obtaining numerical solutions to the delay differential equations without the grid dependence and polynomial interpolation typical of traditional numerical methods. In addressing inverse problems, the NDDE framework can utilize observational data to perform precise estimation of single or multiple delay parameters, which is very important in practical mathematical modeling. The results of multiple numerical experiments have shown that NDDEs demonstrate high precision in both forward and inverse problems, proving their effectiveness and promising potential in dealing with delayed differential equation issues.	翻訳日:2024-08-27 12:52:18 公開日:2024-08-24
# 軍事用ニューロシンボリックAI Neuro-Symbolic AI for Military Applications ( http://arxiv.org/abs/2408.09224v2 ) ライセンス: Link先を確認	Desta Haileselassie Hagos, Danda B. Rawat,	(参考訳) 人工知能(AI)は防衛システムの能力向上、戦略的意思決定の革新、将来の軍事作戦の展望形成に重要な役割を果たしている。 Neuro-Symbolic AIは、ニューラルネットワークとシンボリック推論の強みを活用して強化する、新たなアプローチである。これらのシステムは、従来のAIシステムよりも影響があり、柔軟である可能性があり、軍事用途に適している。本稿では、軍事的文脈におけるその潜在的応用に光を当てることを目的として、ニューロ・シンボリックAIの多様な次元と能力について包括的に検討する。意思決定の改善、複雑なインテリジェンス分析の自動化、自律システム強化の能力について検討する。さらに、軍事的文脈での応用に加えて、様々な領域における複雑なタスクを解く可能性についても検討する。この調査を通じて、軍事および民間の応用において、ニューロ・シンボリックAIの開発と展開に不可欠な倫理的、戦略的、技術的考慮事項に対処する。この研究は、ニューロ・シンボリックAIがもたらす幅広い可能性の包括的調査である。 Artificial Intelligence (AI) plays a significant role in enhancing the capabilities of defense systems, revolutionizing strategic decision-making, and shaping the future landscape of military operations. Neuro-Symbolic AI is an emerging approach that leverages and augments the strengths of neural networks and symbolic reasoning. These systems have the potential to be more impactful and flexible than traditional AI systems, making them well-suited for military applications. This paper comprehensively explores the diverse dimensions and capabilities of Neuro-Symbolic AI, aiming to shed light on its potential applications in military contexts. We investigate its capacity to improve decision-making, automate complex intelligence analysis, and strengthen autonomous systems. We further explore its potential to solve complex tasks in various domains, in addition to its applications in military contexts. Through this exploration, we address ethical, strategic, and technical considerations crucial to the development and deployment of Neuro-Symbolic AI in military and civilian applications. Contributing to the growing body of research, this study represents a comprehensive exploration of the extensive possibilities offered by Neuro-Symbolic AI.	翻訳日:2024-08-27 12:52:18 公開日:2024-08-24
# サブサンプリング機構におけるグループプライバシの校正ノイズ Calibrating Noise for Group Privacy in Subsampled Mechanisms ( http://arxiv.org/abs/2408.09943v2 ) ライセンス: Link先を確認	Yangfan Jiang, Xinjian Luo, Yin Yang, Xiaokui Xiao,	(参考訳) グループサイズmと機密データセットDとが与えられた場合、グループプライバシ(GP)は、基本データがDであるか、またはmレコードによってDと異なる隣接データセットD'であるかを高い信頼で推測できないことを保証して、Dに関する情報を公開する。 GPは個人のプライバシーを保護するために確立された差分プライバシー(DP)の概念を一般化する。 DPと比較して、GPは最大m人までの集団のセンシティブアグリゲーション情報(例えばヨットクラブの会員の平均年収)を保護することができる。研究論文や将来性のある応用において長年存在するにもかかわらず、GPは後から考えるものとして扱われることが多く、ほとんどのアプローチはまずDP機構を開発し、次に汎用変換を用いてGPに適応し、DP溶液をブラックボックスとして扱う。本稿で指摘されているように、この手法は、ディープラーニングモデルの訓練のための古典的なDP-SGD法において、基礎となるDPソリューションがサブサンプリングを含む場合、準最適である。この場合、DP-to-GP変換はその解析において過度に悲観的であり、GPの下での公開結果の実用性は低くなる。そこで本研究では,サブサンプリングGP機構に対する厳密なプライバシ会計を提供する新しい分析フレームワークを提案する。提案手法は,ブラックボックスDP機構をGPに変換する代わりに,サブサンプリング機構の固有ランダム性を慎重に分析し,利用することにより,GPに対するプライバシ損失を大幅に改善する。提案手法は, サブサンプリングを用いた多種多様な基礎機構に適用できる。実データを用いた大規模な実験により,ニューラルネットワークの深層学習を含むいくつかの実践的な設定において,ベースライン変換-ブラックボックス-DPアプローチと比較して,GP機構が1桁以上のノイズ低減を実現することが示された。 Given a group size m and a sensitive dataset D, group privacy (GP) releases information about D with the guarantee that the adversary cannot infer with high confidence whether the underlying data is D or a neighboring dataset D' that differs from D by m records. GP generalizes the well-established notion of differential privacy (DP) for protecting individuals' privacy; in particular, when m=1, GP reduces to DP. Compared to DP, GP is capable of protecting the sensitive aggregate information of a group of up to m individuals, e.g., the average annual income among members of a yacht club. Despite its longstanding presence in the research literature and its promising applications, GP is often treated as an afterthought, with most approaches first developing a DP mechanism and then using a generic conversion to adapt it for GP, treating the DP solution as a black box. 
As we point out in the paper, this methodology is suboptimal when the underlying DP solution involves subsampling, e.g., in the classic DP-SGD method for training deep learning models. In this case, the DP-to-GP conversion is overly pessimistic in its analysis, leading to low utility in the published results under GP. Motivated by this, we propose a novel analysis framework that provides tight privacy accounting for subsampled GP mechanisms. Instead of converting a black-box DP mechanism to GP, our solution carefully analyzes and utilizes the inherent randomness in subsampled mechanisms, leading to a substantially improved bound on the privacy loss with respect to GP. The proposed solution applies to a wide variety of foundational mechanisms with subsampling. Extensive experiments with real datasets demonstrate that compared to the baseline convert-from-blackbox-DP approach, our GP mechanisms achieve noise reductions of over an order of magnitude in several practical settings, including deep neural network training.	翻訳日:2024-08-27 12:52:18 公開日:2024-08-24
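For context on the baseline that the abstract above calls overly pessimistic, the Python sketch below applies the generic black-box DP-to-GP conversion. It relies on the standard group-privacy property of differential privacy: an (ε, δ)-DP mechanism satisfies (mε, δ(e^{mε} − 1)/(e^{ε} − 1)) for datasets that differ in m records. This is the textbook conversion only and not the paper's tighter accounting for subsampled mechanisms; the function name and the example parameters are hypothetical.

```python
import math
from typing import Tuple


def dp_to_gp_blackbox(epsilon: float, delta: float, m: int) -> Tuple[float, float]:
    """Generic group-privacy bound implied by an (epsilon, delta)-DP guarantee
    for groups of size m, treating the mechanism as a black box."""
    if m < 1:
        raise ValueError("group size m must be at least 1")
    gp_epsilon = m * epsilon
    if epsilon == 0.0:
        # Limit of the geometric sum 1 + e^eps + ... + e^{(m-1) eps} as eps -> 0.
        gp_delta = m * delta
    else:
        gp_delta = delta * math.expm1(m * epsilon) / math.expm1(epsilon)
    return gp_epsilon, gp_delta


if __name__ == "__main__":
    # Hypothetical DP guarantee, converted for increasing group sizes.
    for m in (1, 2, 4, 8):
        eps_m, delta_m = dp_to_gp_blackbox(epsilon=1.0, delta=1e-5, m=m)
        print(f"m={m}: epsilon={eps_m:.1f}, delta={delta_m:.3e}")
```

For m = 1 the bound returns the original (ε, δ), matching the abstract's remark that GP reduces to DP in that case. For larger m, ε grows linearly and δ grows roughly like e^{(m−1)ε}, which suggests why the noise needed under the black-box conversion increases quickly with the group size.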
# SAM 2によるビデオオブジェクトセグメンテーション: LSVOS Challenge VOS Trackの4位ソリューション Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track ( http://arxiv.org/abs/2408.10125v2 ) ライセンス: Link先を確認	Feiyu Pan, Hao Fang, Runmin Cong, Wei Zhang, Xiankai Lu,	(参考訳) Video Object Segmentation (VOS)タスクは、第1フレームのオブジェクトマスクのみが与えられた状態で、ビデオシーケンス全体を通して特定のオブジェクトインスタンスをセグメンテーションすることを目的とする。最近、画像とビデオにおけるプロンプト可能な視覚的セグメンテーションの解決に向けた基盤モデルとして、Segment Anything Model 2(SAM 2)が提案された。 SAM 2は、ユーザとのインタラクションを通じてモデルとデータを改善するデータエンジンを構築し、これまでで最大のビデオセグメンテーションデータセットを収集している。 SAM 2はストリーミングメモリを備えたシンプルなTransformerアーキテクチャで、リアルタイムのビデオ処理を実現し、このデータで学習することで幅広いタスクで高い性能を示す。本研究では、より難易度の高いVOSデータセットであるMOSEとLVOSでSAM 2のゼロショット性能を評価する。訓練セットでのファインチューニングなしで、SAM 2はテストセットで75.79 J&Fを達成し、第6回LSVOS Challenge VOS Trackで4位となった。 The Video Object Segmentation (VOS) task aims to segment a particular object instance throughout the entire video sequence given only the object mask of the first frame. Recently, Segment Anything Model 2 (SAM 2) was proposed as a foundation model towards solving promptable visual segmentation in images and videos. SAM 2 builds a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. SAM 2 is a simple transformer architecture with streaming memory for real-time video processing and, trained on this data, it provides strong performance across a wide range of tasks. In this work, we evaluate the zero-shot performance of SAM 2 on the more challenging VOS datasets MOSE and LVOS. Without fine-tuning on the training set, SAM 2 achieved 75.79 J&F on the test set and ranked 4th in the 6th LSVOS Challenge VOS Track.	翻訳日:2024-08-27 12:52:18 公開日:2024-08-24
# UNINEXT-Cutie: LSVOS Challenge RVOS Trackの1位ソリューション UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track ( http://arxiv.org/abs/2408.10129v2 ) ライセンス: Link先を確認	Hao Fang, Feiyu Pan, Xiankai Lu, Wei Zhang, Runmin Cong,	(参考訳) 参照ビデオオブジェクトセグメンテーション(RVOS)は、自然言語表現に基づいてビデオ内の対象オブジェクトをセグメンテーションする。今年、LSVOS Challenge RVOS TrackはオリジナルのYouTube-RVOSベンチマークをMeViSに置き換えた。 MeViSは、静的属性の代わりに動きの記述によって動画内のターゲットオブジェクトを参照することに重点を置いており、RVOSタスクにより大きな課題をもたらす。この作業では、主要なRVOSとVOSモデルの強みを統合して、RVOSのためのシンプルで効果的なパイプラインを構築します。まず、最先端のRVOSモデルを微調整し、言語記述と相関するマスクシーケンスを得る。第二に、信頼性が高く高品質なキーフレームに基づいて、VOSモデルを活用し、マスク結果の品質と時間的一貫性を向上させる。最後に、半教師付き学習を用いてRVOSモデルの性能をさらに向上する。我々のソリューションは MeViS テストセットで62.57 J&F を達成し,第6回 LSVOS Challenge RVOS Track で1位となった。 Referring video object segmentation (RVOS) relies on natural language expressions to segment target objects in video. This year, the LSVOS Challenge RVOS Track replaced the original YouTube-RVOS benchmark with MeViS. MeViS focuses on referring to the target object in a video through its motion descriptions instead of static attributes, posing a greater challenge to the RVOS task. In this work, we integrate the strengths of leading RVOS and VOS models to build a simple and effective pipeline for RVOS. First, we fine-tune a state-of-the-art RVOS model to obtain mask sequences that are correlated with language descriptions. Second, based on reliable and high-quality key frames, we leverage a VOS model to enhance the quality and temporal consistency of the mask results. Finally, we further improve the performance of the RVOS model using semi-supervised learning. Our solution achieved 62.57 J&F on the MeViS test set and ranked 1st in the 6th LSVOS Challenge RVOS Track.	翻訳日:2024-08-27 12:52:18 公開日:2024-08-24
# IDEA:帰納, 演繹, アブダクションによる言語エージェントのルール学習能力の向上 IDEA:Enhancing the Rule Learning Ability of Language Agents through Induction, Deduction, and Abduction ( http://arxiv.org/abs/2408.10455v2 ) ライセンス: Link先を確認	Kaiyu He, Zhiyu Chen,	(参考訳) 大規模言語モデル (LLM) は演繹的推論や帰納的推論において徹底的に評価されているが、アブダクション(仮説的推論)の習熟度や対話型環境における全体論的ルール学習はいまだ十分に研究されていない。本研究では、インタラクティブな設定でLLMのルール学習能力を評価するために特別に設計された新しいベンチマークRULEARNを紹介する。 RULEARNでは、エージェントが環境と対話して観察を集め、パターンを識別し、これらの洞察を使って問題を解決する。本ベンチマークにおけるLLMエージェントのルール学習能力をさらに向上するため, Induction(帰納), Deduction(演繹), Abduction(アブダクション)のプロセスを統合したIDEAエージェントを提案する。 IDEAエージェントは、構造化された推論シーケンスを活用することでこのアプローチを洗練し、アブダクションによって仮説を生成し、演繹によってそれらをテストし、帰納からのフィードバックに基づいてそれらを改良する。このシーケンスにより、エージェントは人間のような推論プロセスを模倣して規則を動的に確立し、適用することができる。 5つの代表的なLLMを評価した結果,これらのモデルが妥当な初期仮説を生成できる一方で,環境内における戦略的相互作用,効果的なフィードバックの取り込み,仮説の適応的洗練に苦慮していることが示唆された。 IDEAエージェントはRULEARNベンチマークで大幅なパフォーマンス向上を示し、現実世界のシナリオで人間のようなルール学習が可能なエージェントを開発する上で貴重な洞察を提供する。コードとデータを公開します。 While large language models (LLMs) have been thoroughly evaluated for deductive and inductive reasoning, their proficiency in abductive reasoning and holistic rule learning in interactive environments remains less explored. This work introduces RULEARN, a novel benchmark specifically designed to assess the rule-learning ability of LLMs in interactive settings. In RULEARN, agents interact with the environment to gather observations and discern patterns, using these insights to solve problems. To further enhance the rule-learning capabilities of LLM agents within this benchmark, we propose IDEA agent, which integrates Induction, Deduction, and Abduction processes. IDEA agent refines this approach by leveraging a structured reasoning sequence: generating hypotheses through abduction, testing them via deduction, and refining them based on feedback from induction. This sequence enables agents to dynamically establish and apply rules, mimicking human-like reasoning processes. 
Our evaluation of five representative LLMs indicates that while these models can generate plausible initial hypotheses, they often struggle with strategic interaction within the environment, effective incorporation of feedback, and adaptive refinement of their hypotheses. IDEA agent demonstrates significantly improved performance on the RULEARN benchmark, offering valuable insights for the development of agents capable of human-like rule-learning in real-world scenarios. We will release our code and data.	翻訳日:2024-08-27 12:52:18 公開日:2024-08-24
# LBC:アウトオフ変数一般化のための言語ベース分類器 LBC: Language-Based-Classifier for Out-Of-Variable Generalization ( http://arxiv.org/abs/2408.10923v3 ) ライセンス: Link先を確認	Kangjun Noh, Baekryun Seong, Hoyoon Byun, Youngjun Choi, Sungjin Song, Kyungwoo Song,	(参考訳) 大規模言語モデル(LLM)は、応答生成のような自然言語処理タスクにおいて大きな成功を収めている。しかし、XGBoostのような従来の機械学習モデル(TML)と比べてパフォーマンスが劣っているため、表形式のデータでの使用は制限されている。 LLMの事前学習された知識は、追加のトレーニングなしにテストに現れる新しい変数を解釈することを可能にする。そこで本研究では,LBC(Language-Based-Classifier)を提案する。 LBCは3つの主要な方法論戦略を採用している。 1) モデルの理解に合うようにデータを調整するためのカテゴリの変更。 2)データ表現をモデルに拡張する高度な順序と指標 3)ロジットスコアを推論中にクラスにマッピングし,モデル予測を生成する。これらの戦略は、LBCの事前訓練された知識と組み合わせて、OOVタスクを効果的に処理するモデルの能力を強調している。我々は,LBCの優位性を実証的,理論的に検証した。 LBC は OOV タスクに LLM ベースのモデルを適用する最初の研究である。ソースコードはhttps://github.com/sksmssh/LBCforOOVGenにある。 Large Language Models (LLMs) have great success in natural language processing tasks such as response generation. However, their use in tabular data has been limited due to their inferior performance compared to traditional machine learning models (TMLs) such as XGBoost. We find that the pre-trained knowledge of LLMs enables them to interpret new variables that appear in a test without additional training, a capability central to the concept of Out-of-Variable (OOV). From the findings, we propose a Language-Based-Classifier (LBC), a classifier that maximizes the benefits of LLMs to outperform TMLs on OOV tasks. LBC employs three key methodological strategies: 1) Categorical changes to adjust data to better fit the model's understanding, 2) Advanced order and indicator to enhance data representation to the model, and 3) Using verbalizer to map logit scores to classes during inference to generate model predictions. These strategies, combined with the pre-trained knowledge of LBC, emphasize the model's ability to effectively handle OOV tasks. We empirically and theoretically validate the superiority of LBC. LBC is the first study to apply an LLM-based model to OOV tasks. 
The source code is at https://github.com/sksmssh/LBCforOOVGen	翻訳日:2024-08-27 12:42:21 公開日:2024-08-24
# SarcasmBench: 皮肉理解における大規模言語モデルの評価に向けて SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding ( http://arxiv.org/abs/2408.11319v2 ) ライセンス: Link先を確認	Yazhou Zhang, Chunwang Zou, Zheng Lian, Prayag Tiwari, Jing Qin,	(参考訳) 大規模言語モデル (LLM) の時代には,「システムI」のタスク,すなわち感情分析やテキスト分類のような高速,無意識,直感的なタスクは解決されたと主張されている。しかし、皮肉は微妙な言語現象として、真の感情と意図を伝えるために誇張法や比喩のような修辞的な技法をしばしば用い、感情分析よりも高いレベルの抽象性を伴う。 LLMの成功に関する議論が、皮肉の理解を考えると、完全には成り立たないのではないかという懸念が高まっている。この問題に対処するために、我々は11のSOTA LLMと8のSOTA事前訓練言語モデル(PLM)を選択し、異なるプロンプトアプローチ、すなわちゼロショットインプット/アウトプット(IO)プロンプト、少数ショットIOプロンプト、思考連鎖(CoT)プロンプトを通じて6つの広く使用されているベンチマークデータセットに対して包括的な評価を行う。結果は3つの重要な知見を示している。 (1)現在のLLMは6つの皮肉ベンチマークにおいて、教師付きPLMに基づく皮肉検出ベースラインを下回っている。このことは、LLMの人間の皮肉に対する理解を改善するために依然として大きな努力が必要であることを示唆している。 (2) GPT-4 は様々なプロンプト法で他の LLM を一貫して大幅に上回り、平均 14.0% の改善を示した。Claude 3とChatGPTはGPT-4に次ぐ性能を示した。 (3)少数ショットIOプロンプトは、ゼロショットIOと少数ショットCoTの2つの方法より優れている。その理由は、全体論的、直感的で非合理的な認知過程である皮肉の検出は段階的な論理的推論に従わないと主張されており、CoTは数学的推論タスクにおける有効性に比べて、皮肉の理解には効果が低いからである。 In the era of large language models (LLMs), "System I" tasks (fast, unconscious, and intuitive tasks such as sentiment analysis and text classification) have been argued to be successfully solved. However, sarcasm, as a subtle linguistic phenomenon, often employs rhetorical devices like hyperbole and figuration to convey true sentiments and intentions, involving a higher level of abstraction than sentiment analysis. There is growing concern that the argument about LLMs' success may not be fully tenable when considering sarcasm understanding. To address this question, we select eleven SOTA LLMs and eight SOTA pre-trained language models (PLMs) and present comprehensive evaluations on six widely used benchmark datasets through different prompting approaches, i.e., zero-shot input/output (IO) prompting, few-shot IO prompting, and chain of thought (CoT) prompting. 
Our results highlight three key findings: (1) Current LLMs underperform supervised PLM-based sarcasm detection baselines across six sarcasm benchmarks. This suggests that significant efforts are still required to improve LLMs' understanding of human sarcasm. (2) GPT-4 consistently and significantly outperforms other LLMs across various prompting methods, with an average improvement of 14.0%. Claude 3 and ChatGPT demonstrate the next best performance after GPT-4. (3) The few-shot IO prompting method outperforms the other two methods: zero-shot IO and few-shot CoT. The reason is that sarcasm detection, being a holistic, intuitive, and non-rational cognitive process, is argued not to adhere to step-by-step logical reasoning, making CoT less effective in understanding sarcasm compared to its effectiveness in mathematical reasoning tasks.	翻訳日:2024-08-27 12:42:21 公開日:2024-08-24
# 昇降モデルによる予算制約下での費用対効果インセンティブレコメンデーションのエンド・ツー・エンド化 End-to-End Cost-Effective Incentive Recommendation under Budget Constraint with Uplift Modeling ( http://arxiv.org/abs/2408.11623v2 ) ライセンス: Link先を確認	Zexu Sun, Hao Yang, Dugang Liu, Yunpeng Weng, Xing Tang, Xiuqiang He,	(参考訳) 現代のオンラインプラットフォームでは、インセンティブはユーザーエンゲージメントを高め、プラットフォーム収益を増加させる重要な要素である。近年では、個々の顧客にインセンティブを割り当てる戦略的アプローチとして、アップリフトモデリングが導入されている。特に現実世界のアプリケーションでは、オンラインプラットフォームは特定の予算制約で顧客にインセンティブを与えるだけである。この問題は、マルチチョイス・クナプサック問題として再定義できる。この最適化は、投資のリターンを最大化するために、各顧客に対して最適なインセンティブを選択することを目的としている。この分野での最近の研究は、しばしば2段階のアプローチを用いて予算配分問題に取り組む。因果推論手法は,顧客の期待する応答曲線がインセンティブが増大するにつれて単調でスムーズであるべきという,オンラインマーケティングにおけるドメイン知識を無視することが多い。 2) 2段階間の最適性差は, 限られた予算制約下での昇降予測のためのインセンティブ推奨情報の喪失により, 下位最適割当性能が低下する。これらの課題に対処するため,予算制約下での新たなコスト・エフェクティブ・インセンティブ・レコメンデーション(E3IR)モデルを提案する。具体的には、アップリフト予測モジュールと微分可能なアロケーションモジュールの2つのモジュールから構成される。昇降予測モジュールでは、隣接処理とマーケティング領域の制約(モノトニックとスムーズ)との漸進的な改善を捉えるために予測ヘッドを構築する。整数線形プログラミング(ILP)をアロケーションモジュール内の微分可能な層入力として組み込む。さらに、我々は、公開および実際の製品データセットに関する広範な実験を行い、既存の2段階のアプローチと比較して、E3IRがアロケーション性能を改善することを実証した。 In modern online platforms, incentives are essential factors that enhance user engagement and increase platform revenue. Over recent years, uplift modeling has been introduced as a strategic approach to assign incentives to individual customers. Especially in many real-world applications, online platforms can only incentivize customers with specific budget constraints. This problem can be reformulated as the multi-choice knapsack problem. This optimization aims to select the optimal incentive for each customer to maximize the return on investment. Recent works in this field frequently tackle the budget allocation problem using a two-stage approach. However, this solution is confronted with the following challenges: (1) The causal inference methods often ignore the domain knowledge in online marketing, where the expected response curve of a customer should be monotonic and smooth as the incentive increases. 
(2) An optimality gap between the two stages results in inferior sub-optimal allocation performance due to the loss of the incentive recommendation information for the uplift prediction under the limited budget constraint. To address these challenges, we propose a novel End-to-End Cost-Effective Incentive Recommendation (E3IR) model under budget constraints. Specifically, our methods consist of two modules, i.e., the uplift prediction module and the differentiable allocation module. In the uplift prediction module, we construct prediction heads to capture the incremental improvement between adjacent treatments with the marketing domain constraints (i.e., monotonic and smooth). We incorporate integer linear programming (ILP) as a differentiable layer input in the allocation module. Furthermore, we conduct extensive experiments on public and real product datasets, demonstrating that our E3IR improves allocation performance compared to existing two-stage approaches.	翻訳日:2024-08-27 12:42:21 公開日:2024-08-24
# 金眼 : ハバナ症候群の理論 Golden Eye: The Theory of Havana Syndrome ( http://arxiv.org/abs/2408.12041v2 ) ライセンス: Link先を確認	Adam Dorian Wong,	(参考訳) 2016年頃から、米国外交官は海外勤務中に異常な負傷を報告した。職員は吐き気、めまい、見当識障害などの症状に悩まされた。これらの症状の総称は「ハバナ症候群(Havana Syndrome)」と名付けられた。このホワイトペーパーは、これらの症状の潜在的な起源に関して競合する仮説を分析する。ホワイトペーパーは2024年6月18日に公開が承認された。この白書で示される見解は著者の見解であり、ダコタ州立大学、N.H.陸軍州兵、米陸軍、国防省、あるいはアメリカ合衆国政府の公式方針や立場を反映していない。 Beginning around 2016, US diplomats reported unusual injuries while serving abroad. Personnel suffered from symptoms such as nausea, vertigo, and disorientation. The collective set of ailments was dubbed "Havana Syndrome". This whitepaper delves into an analysis of competing hypotheses with respect to potential origins of these symptoms. Whitepaper cleared for release on 18 JUN 2024. The views expressed by this whitepaper are those of the author and do not reflect the official policy or position of Dakota State University, the N.H. Army National Guard, the U.S. Army, the Department of Defense, or the U.S. Government.	翻訳日:2024-08-27 12:42:21 公開日:2024-08-24
# 散逸と相互作用-非エルミート皮膚効果 Dissipation and Interaction-Controlled Non-Hermitian Skin Effects ( http://arxiv.org/abs/2408.12451v2 ) ライセンス: Link先を確認	Yang Li, Zhao-Fan Cai, Tao Liu, Franco Nori,	(参考訳) 非エルミート皮膚効果 (NHSE) は近年, 単一粒子レベルで広く研究されている。多体相互作用が支配的になると、新しい非エルミート的な物理現象が出現する。本研究では,散逸と相互作用によって制御されるNHSEについて理論的に検討する。 1DジグザグBose-Hubbard格子は磁気フラックス,スタガードオンサイト単一粒子損失,および均一なオンサイト2粒子損失を考慮に入れた。 2粒子の損失が小さい場合、磁気フラックスとスタガード単一粒子の損失の相互作用により、2体有界固有状態(すなわち2体有界固有状態)は、すべて同じ境界で局在する。一方, 強い二粒子損失では, ドバイロンの局在方向が予想外に逆転する。これは、粒子対の仮想二階ホッピング過程と3階ホッピング過程、磁束、強い二粒子損失、多体相互作用に寄与する効果的な強い非相互ホッピングによるものである。さらに、2粒子ゲインは、NHSEとその相互作用によって制御されるドーバロンの反転を動的に観察するために利用することができる、同じ皮膚局在のドーバロンを誘導することができる。本研究は,多体系における新しい非エルミート現象を探求するための新たな道を開くものである。 Non-Hermitian skin effects (NHSEs) have recently been investigated extensively at the single-particle level. When many-body interactions become dominant, novel non-Hermitian physical phenomena can emerge. In this work, we theoretically study NHSEs controlled by dissipation and interaction. We consider a 1D zigzag Bose-Hubbard lattice, subject to magnetic flux, staggered onsite single-particle loss, and uniform onsite two-particle loss. When the two-particle loss is small, two-body bound eigenstates (i.e., doublons) are all localized at the same boundary due to the interplay of the magnetic flux and staggered single-particle loss. While, for strong two-particle loss, the localization direction of doublons is unexpectedly reversed. This is attributed to the effective strong nonreciprocal hopping of doublons contributing from the virtual second-order and third-order hopping processes of particle pairs in combination with the magnetic flux, the strong two-particle loss, and the many-body interaction. Moreover, a two-particle gain can induce the same skin-localization of doublons, which can be utilized to dynamically observe the NHSE and its reversal of doublons controlled by interactions. 
Our results open up a new avenue for exploring novel non-Hermitian phenomena in many-body systems.	翻訳日:2024-08-27 12:42:21 公開日:2024-08-24

# シンタクス誘導による分子の手続き的合成

Syntax-Guided Procedural Synthesis of Molecules ( http://arxiv.org/abs/2409.05873v1 )

ライセンス: Link先を確認

Michael Sun, Alston Lo, Wenhao Gao, Minghao Guo, Veronika Thost, Jie Chen, Connor Coley, Wojciech Matusik,

(参考訳) 合成可能な分子を設計し、合成不可能な分子に類似することを推奨することは、分子発見を加速させる重要な問題である。プログラム合成のアイデアを用いて,両問題を再認識する。シンタクティックスケルトンを合成木の意味論から切り離し、合成経路の組合せ空間を推論するための二段階の枠組みを構築する。アナログを創り出す分子が与えられたら、マルコフ・チェイン・モンテカルロシミュレーションを通じて、シンタクティック骨格の空間上の骨格特性を反復的に洗練する。ブラックボックスのオラクルが最適化されると、私たちは統語的テンプレートと分子記述子の上に共同設計空間を定式化し、統語的次元と意味論的次元の両方を相乗的に最適化する進化的アルゴリズムを導入します。我々の重要な洞察は、構文的スケルトンが設定されると、構文的テンプレートによって課される固定地平面マルコフ決定プロセスを完全に活用するトレーニングポリシーにより、プログラムの意味論を導出する検索複雑さを記憶することができるということである。合成可能なアナログ生成および合成可能な分子設計のための両レベルフレームワークの性能上の利点を示す。特に,本手法は, ユーザに対して, 合成に必要なリソースを明示的に制御し, 設計空間をよりシンプルなソリューションに偏り, 自律的な合成プラットフォームに特に有望である。

Designing synthetically accessible molecules and recommending analogs to unsynthesizable molecules are important problems for accelerating molecular discovery. We reconceptualize both problems using ideas from program synthesis. Drawing inspiration from syntax-guided synthesis approaches, we decouple the syntactic skeleton from the semantics of a synthetic tree to create a bilevel framework for reasoning about the combinatorial space of synthesis pathways. Given a molecule we aim to generate analogs for, we iteratively refine its skeletal characteristics via Markov Chain Monte Carlo simulations over the space of syntactic skeletons. Given a black-box oracle to optimize, we formulate a joint design space over syntactic templates and molecular descriptors and introduce evolutionary algorithms that optimize both syntactic and semantic dimensions synergistically. Our key insight is that once the syntactic skeleton is set, we can amortize over the search complexity of deriving the program's semantics by training policies to fully utilize the fixed horizon Markov Decision Process imposed by the syntactic template. We demonstrate performance advantages of our bilevel framework for synthesizable analog generation and synthesizable molecule design. Notably, our approach offers the user explicit control over the resources required to perform synthesis and biases the design space towards simpler solutions, making it particularly promising for autonomous synthesis platforms.

翻訳日:2024-09-15 05:31:27 公開日:2024-08-24

# HSR-KAN:Kolmogorov-Arnold Networksによる高効率ハイパースペクトル画像超解像

HSR-KAN: Efficient Hyperspectral Image Super-Resolution via Kolmogorov-Arnold Networks ( http://arxiv.org/abs/2409.06705v1 )

ライセンス: Link先を確認

Baisong Li, Xingwang Wang, Haixiao Xu,

(参考訳) ハイパースペクトル画像(HSI)は、豊富なスペクトル情報のために様々な視覚的タスクにおいて大きな可能性を秘めている。しかし、物理的な撮像の限界のため、高分解能ハイパースペクトル画像の取得は依然として困難である。 Kolmogorov-Arnold Networks (KANs) に触発され,低分解能HSI (LR-HSI) と高分解能マルチスペクトル像 (HR-MSI) を融合し高分解能HSI (HR-HSI) を得る効率的なHSI超解像 (HSI-SR) モデルを提案する。 HR-MSIからの空間情報の効果的な統合を実現するため,KAN-Fusionと呼ばれるKANをベースとした融合モジュールを設計する。チャネルアテンション機構にさらにインスパイアされた我々は、融合後の特徴抽出のためのKANチャネルアテンションブロック(KAN-CAB)と呼ばれるスペクトルチャネルアテンションモジュールを設計する。 KANと統合されたチャネルアテンションモジュールとして、KAN-CABはディープネットワークのきめ細かい調整能力を高め、スペクトルシーケンスや空間テクスチャの詳細を正確にシミュレートできるようにするだけでなく、次元の呪い(COD)を効果的に回避する。広範な実験により、提案するHSR-KANは, 現在の最先端(SOTA)のHSI-SR法と比較して, 定性評価と定量的評価の両面で最高の性能を達成することが示された。私たちのコードは、https://github.com/Baisonm-Li/HSR-KAN で利用可能です。

Hyperspectral images (HSIs) have great potential in various visual tasks due to their rich spectral information. However, obtaining high-resolution hyperspectral images remains challenging due to limitations of physical imaging. Inspired by Kolmogorov-Arnold Networks (KANs), we propose an efficient HSI super-resolution (HSI-SR) model to fuse a low-resolution HSI (LR-HSI) and a high-resolution multispectral image (HR-MSI), yielding a high-resolution HSI (HR-HSI). To achieve the effective integration of spatial information from the HR-MSI, we design a fusion module based on KANs, called KAN-Fusion. Further inspired by the channel attention mechanism, we design a spectral channel attention module called KAN Channel Attention Block (KAN-CAB) for post-fusion feature extraction. As a channel attention module integrated with KANs, KAN-CAB not only enhances the fine-grained adjustment ability of deep networks, enabling networks to accurately simulate details of spectral sequences and spatial textures, but also effectively avoids the Curse of Dimensionality (COD). Extensive experiments show that, compared to current state-of-the-art (SOTA) HSI-SR methods, the proposed HSR-KAN achieves the best performance in terms of both qualitative and quantitative assessments. Our code is available at: https://github.com/Baisonm-Li/HSR-KAN.

翻訳日:2024-09-15 05:21:30 公開日:2024-08-24

# パラメータ効率の良い微調整における長期的影響の解明

Discovering Long-Term Effects on Parameter Efficient Fine-tuning ( http://arxiv.org/abs/2409.06706v1 )

ライセンス: Link先を確認

Gaole Dai, Yiming Tang, Chunkai Fan, Qizhe Zhang, Zhi Zhang, Yulu Gan, Chengqing Zeng, Shanghang Zhang, Tiejun Huang,

(参考訳) 事前訓練されたニューラルネットワーク(ANN)は、堅牢なパターン認識能力を示し、人間の脳、特にバイオニューラルネットワーク(BNN)と広範囲に類似している。我々はこれらのモデルが微調整によって新しい知識を得る能力に特に興味をそそられる。この点において,パラメータ効率のよいファインチューニング(PEFT)は,適応時のトレーニング可能なパラメータの数を制限することにより,トレーニングコストの削減と過適合リスクの軽減により,フルファインチューニングの代替として広く採用されている。 ANNの重みはBNNのシナプスを表し、ANNの機能(潜伏変数またはロジットとも呼ばれる)はBNNのニューロンによって放出される神経伝達物質を表す。主流PEFT法は、限られた数のトレーニング可能なパラメータ(通常は全パラメータの1%未満)で特徴値やパラメータ値を調整することを目的としているが、驚くほど良い結果が得られる。この手がかりに基づいて,特徴量調整とパラメータ調整の関連性を探究し,特徴量行列のスケーリングを学習し,後部重量行列に対するそれらの効果を伝播する手法であるSynapses & Neurons (SAN)を提案する。我々のアプローチは、よく知られた神経科学現象であるLTP(Long-term Potentiation)とLTD(Long-term Depression)から強いインスピレーションを受け、シナプス発生と神経伝達物質放出レベルとの関係を明らかにする。我々は、注意に基づくネットワークと畳み込みに基づくネットワークを用いて26のデータセットに対してPEFTを広範囲に比較し、他のチューニング手法(+8.5%、+7%、Visual Prompt Tuning、+3.2%)と比較して大幅に改善した。コードはリリースされます。

Pre-trained Artificial Neural Networks (ANNs) exhibit robust pattern recognition capabilities and share extensive similarities with the human brain, specifically Biological Neural Networks (BNNs). We are particularly intrigued by these models' ability to acquire new knowledge through fine-tuning. In this regard, Parameter-efficient Fine-tuning (PEFT) has gained widespread adoption as a substitute for full fine-tuning because it reduces training cost and mitigates over-fitting risks by limiting the number of trainable parameters during adaptation. Since both ANNs and BNNs propagate information layer-by-layer, a common analogy can be drawn: weights in ANNs represent synapses in BNNs, while features (also known as latent variables or logits) in ANNs represent neurotransmitters released by neurons in BNNs. Mainstream PEFT methods aim to adjust feature or parameter values using only a limited number of trainable parameters (usually less than 1% of the total parameters), yet achieve surprisingly good results. Building upon this clue, we delve deeper into the connections between feature adjustment and parameter adjustment, resulting in our proposed method, Synapses & Neurons (SAN), which learns scaling matrices for features and propagates their effects towards posterior weight matrices. Our approach draws strong inspiration from two well-known neuroscience phenomena, Long-term Potentiation (LTP) and Long-term Depression (LTD), which also reveal the relationship between synapse development and neurotransmitter release levels. We conducted extensive comparisons of PEFT methods on 26 datasets using attention-based networks as well as convolution-based networks, leading to significant improvements compared to other tuning methods (+8.5% over full fine-tuning, +7% over Visual Prompt Tuning, and +3.2% over LoRA). The code will be released.

翻訳日:2024-09-15 05:21:30 公開日:2024-08-24
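
The link between feature adjustment and parameter adjustment mentioned in the SAN abstract above can be illustrated with a small identity: scaling a feature vector element-wise before a linear layer gives the same output as scaling the columns of that layer's weight matrix. The sketch below only checks this identity on toy numbers; the function names and values are hypothetical, and the paper's actual SAN formulation may differ.

```python
# Toy check (assumed setup, not the authors' code): W @ (s * h) == (W * s) @ h,
# where s scales features element-wise and "W * s" scales the columns of W.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def scale_features(h, s):
    """Element-wise scaling of the feature vector h by s."""
    return [hi * si for hi, si in zip(h, s)]

def fold_scale_into_weights(W, s):
    """Scale column j of W by s[j], i.e. compute W @ diag(s)."""
    return [[w * sj for w, sj in zip(row, s)] for row in W]

W = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]   # hypothetical next-layer weights
h = [2.0, -1.0, 0.5]                      # hypothetical features
s = [1.5, 0.5, 2.0]                       # hypothetical learned scales

y_from_scaled_features = matvec(W, scale_features(h, s))
y_from_folded_weights = matvec(fold_scale_into_weights(W, s), h)
print(y_from_scaled_features, y_from_folded_weights)
```

Both computations give the same vector, which is why a learned feature scale can be read as an implicit adjustment of the following weight matrix.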

# 安全運転における歩行者交叉予測のための総合現実知識の学習

Gating Syn-to-Real Knowledge for Pedestrian Crossing Prediction in Safe Driving ( http://arxiv.org/abs/2409.06707v1 )

ライセンス: Link先を確認

Jie Bai, Jianwu Fang, Yisheng Lv, Chen Lv, Jianru Xue, Zhengguo Li,

(参考訳) 運転シーンにおける歩行者横断予測(PCP)は、インテリジェントな車両の安全な運転を保証する上で重要な役割を担っている。典型的な状況下での歩行者の横断行動の観察が限られているため、近年では予測性能を高めるために柔軟な変動を伴う合成データの利用が始められ、ドメイン適応フレームワークが採用されている。しかし、異なるドメイン知識は異なるドメイン間分布ギャップを持ち、PCPタスクに適したドメイン知識適応方法を必要とする。本研究では,PCPのためのGated Syn-to-Real Knowledge Transfer手法(Gated-S2R-PCP)を提案する。この手法には2つの目的がある。 1)異なる種類のクロスドメイン知識に適したドメイン適応方法の設計、及び 2) 特定の状況に適切な知識をゲート付き知識融合で伝達すること。具体的には, 視覚, 意味, 深度, 位置などの様々な情報に対する, スタイル伝達, 分布近似, 知識蒸留を含む3つのドメイン適応手法を含むフレームワークを設計する。学習可能なゲートユニット(LGU)は、歩行者横断予測を促進するために適切なクロスドメイン知識を融合するために使用される。歩行者の位置,RGBフレーム,セマンティックイメージ,深度画像を含む3181シーケンス(489,740フレーム)の合成ベンチマークS2R-PCP-3181を構築した。合成S2R-PCP-3181により、PIEとJAADの2つの挑戦的な実データセットに知識を伝達し、最先端の手法より優れたPCP性能を得る。

Pedestrian Crossing Prediction (PCP) in driving scenes plays a critical role in ensuring the safe operation of intelligent vehicles. Due to the limited observations of pedestrian crossing behaviors in typical situations, recent studies have begun to leverage synthetic data with flexible variation to boost prediction performance, employing domain adaptation frameworks. However, different kinds of domain knowledge have distinct cross-domain distribution gaps, which calls for domain adaptation methods suited to each kind of knowledge in PCP tasks. In this work, we propose a Gated Syn-to-Real Knowledge transfer approach for PCP (Gated-S2R-PCP), which has two aims: 1) designing suitable domain adaptation methods for different kinds of cross-domain knowledge, and 2) transferring suitable knowledge for specific situations with gated knowledge fusion. Specifically, we design a framework that contains three domain adaptation methods (style transfer, distribution approximation, and knowledge distillation) for various kinds of information, such as visual, semantic, depth, and location. A Learnable Gated Unit (LGU) is employed to fuse suitable cross-domain knowledge to boost pedestrian crossing prediction. We construct a new synthetic benchmark, S2R-PCP-3181, with 3181 sequences (489,740 frames) containing pedestrian locations, RGB frames, semantic images, and depth images. With the synthetic S2R-PCP-3181, we transfer the knowledge to two challenging real datasets, PIE and JAAD, and obtain PCP performance superior to state-of-the-art methods.

翻訳日:2024-09-15 05:21:30 公開日:2024-08-24

# AIシステムにおける定量的バイアスの透過的監査による公正性の確保

Ensuring Fairness with Transparent Auditing of Quantitative Bias in AI Systems ( http://arxiv.org/abs/2409.06708v1 )

ライセンス: Link先を確認

Chih-Cheng Rex Yuan, Bow-Yaw Wang,

(参考訳) AIの急速な進歩により、AIを意思決定プロセスに統合する傾向が高まっている。しかし、AIシステムは意思決定者が不公平な結論を導くバイアスを示すかもしれない。特に、アメリカの司法制度で再犯を評価するために使用されるCompASシステムは、人種的多数派を好んでいることが判明した。 AIの公正性を評価するための様々な手段が提案されている。我々は、サードパーティの監査官やAIシステムプロバイダを含むAIフェアネスを監査するためのフレームワークを提案し、AIシステムの体系的な検査を容易にするツールを作成しました。このツールはオープンソースで公開されています。従来のAIシステムとは異なり、私たちは透明なホワイトボックスと統計ベースのアプローチを提唱します。これは、サードパーティの監査官、AI開発者、あるいは一般大衆がAIシステムの公正性基準を判断する際に利用することができる。

With the rapid advancement of AI, there is a growing trend to integrate AI into decision-making processes. However, AI systems may exhibit biases that lead decision-makers to draw unfair conclusions. Notably, the COMPAS system used in the American justice system to evaluate recidivism was found to favor racial majority groups; specifically, it violates a fairness standard called equalized odds. Various measures have been proposed to assess AI fairness. We present a framework for auditing AI fairness, involving third-party auditors and AI system providers, and we have created a tool to facilitate systematic examination of AI systems. The tool is open-sourced and publicly available. Unlike traditional AI systems, we advocate a transparent white-box and statistics-based approach. It can be utilized by third-party auditors, AI developers, or the general public for reference when judging the fairness criterion of AI systems.

翻訳日:2024-09-15 05:21:30 公開日:2024-08-24
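
The abstract above names equalized odds as the fairness standard that COMPAS was found to violate. As a minimal sketch of what a statistics-based audit of that criterion could compute, the code below measures the largest between-group gaps in true positive rate and false positive rate; equalized odds holds when both gaps are zero. This is not the authors' tool: the function names and the toy labels are hypothetical.

```python
# Sketch of an equalized-odds check (assumed helper names, toy data).

def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

def equalized_odds_gaps(y_true, y_pred, groups):
    """Return (max TPR gap, max FPR gap) across the groups present."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        labels, preds = by_group.setdefault(g, ([], []))
        labels.append(t)
        preds.append(p)
    stats = [rates(labels, preds) for labels, preds in by_group.values()]
    tprs = [s[0] for s in stats]
    fprs = [s[1] for s in stats]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)

# Hypothetical audit data: two groups with identical true labels.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

tpr_gap, fpr_gap = equalized_odds_gaps(y_true, y_pred, groups)
print(tpr_gap, fpr_gap)
```

In practice an auditor would compare such gaps against a tolerance rather than require exact equality, since finite samples rarely give identical rates.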

# 機械翻訳における低リソース言語データ拡張のための生成逆ネットワーク

Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation ( http://arxiv.org/abs/2409.00071v1 )

ライセンス: Link先を確認

Linda Zeng,

(参考訳) ニューラルネットワーク翻訳(NMT)システムは、トレーニングに使用するモデルのための大規模データコーパスが欠如している低リソース言語への翻訳に苦労する。手動データキュレーションは高価で時間を要するため,低リソース言語データの拡張にGAN(Generative-Adversarial Network)を活用することを提案する。シミュレーションされた低リソース環境で、非常に少量の言語データ(20,000文以下)をトレーニングする場合、我々のモデルは、データ拡張の可能性を示し、"料理中の健康な昼食を教えてくれ"や"祖父は以前よりも一生懸命働く"といった文でモノリンガル言語データを生成する。我々の新しいデータ拡張アプローチは、低リソースNMTにおけるGANの能力を調べるための第一歩であり、低リソースNMTへのGANの将来の拡張が期待できることを示す。

Neural Machine Translation (NMT) systems struggle when translating to and from low-resource languages, which lack large-scale data corpora for models to use for training. As manual data curation is expensive and time-consuming, we propose utilizing a generative-adversarial network (GAN) to augment low-resource language data. When training on a very small amount of language data (under 20,000 sentences) in a simulated low-resource setting, our model shows potential at data augmentation, generating monolingual language data with sentences such as "ask me that healthy lunch im cooking up," and "my grandfather work harder than your grandfather before." Our novel data augmentation approach takes the first step in investigating the capability of GANs in low-resource NMT, and our results suggest that there is promise for future extension of GANs to low-resource NMT.

翻訳日:2024-09-08 15:21:17 公開日:2024-08-24

# LLMベースの手法は不公平なサービス条件を検出するのに十分か?

Are LLM-based methods good enough for detecting unfair terms of service? ( http://arxiv.org/abs/2409.00077v1 )

ライセンス: Link先を確認

Mirgita Frasheri, Arian Bakhtiarnia, Lukas Esterle, Aleksandros Iosifidis,

(参考訳) 数え切れないほどのサービス規約(ToS)は、世界中のユーザーが毎日、あらゆる種類のアプリやWebサイトと対話しながら署名している。多くの場合、この2桁のページにまたがるオンライン契約は、単に希望のサービスに即座にアクセスしたいというユーザーによって盲目的に署名される。通常、法務チームとの相談を必要とするものは、ユーザーがデータプライバシーの観点から、無数のオンラインエンティティやパートナーに登録する、いくつかのクリックからなる日常的な活動になっている。大きな言語モデル(LLM)は、長いテキストベースのドキュメントのパースに長けており、ToSの疑わしい条項とその基盤となるプライバシーポリシーを扱う際に、ユーザを支援するために採用される可能性がある。このタスクのために既存のモデルの有用性を調べるために、まず、人気のあるウェブサイトからクロールされたプライバシーポリシーの集合に対して、個別に適用された12の質問からなるデータセットを構築した。その後、ChatGPTのような一連のオープンソースおよび商用チャットボットが各質問に対して質問され、回答は与えられた根拠の真実と比較される。これらの結果から,オープンソースモデルによっては,商用モデルと比較して精度が高いことが示唆された。しかし、最高のパフォーマンスは商用チャットボット(ChatGPT4)から記録される。全体として、全てのモデルは、このタスクにおいてランダムよりもわずかにパフォーマンスが良いだけである。そのため、この目的のために広く採用される前に、パフォーマンスを著しく改善する必要がある。

Countless terms of service (ToS) are signed every day by users all over the world while interacting with all kinds of apps and websites. More often than not, these online contracts spanning double-digit pages are signed blindly by users who simply want immediate access to the desired service. What would normally require a consultation with a legal team has now become a mundane activity consisting of a few clicks, where users potentially sign away their rights, for instance in terms of their data privacy, to countless online entities and companies. Large language models (LLMs) are good at parsing long text-based documents and could potentially be adopted to help users deal with dubious clauses in ToS and their underlying privacy policies. To investigate the utility of existing models for this task, we first build a dataset consisting of 12 questions applied individually to a set of privacy policies crawled from popular websites. Thereafter, a series of open-source as well as commercial chatbots, such as ChatGPT, are queried with each question, and the answers are compared to a given ground truth. Our results show that some open-source models are able to provide a higher accuracy compared to some commercial models. However, the best performance is recorded from a commercial chatbot (ChatGPT4). Overall, all models perform only slightly better than random at this task. Consequently, their performance needs to be significantly improved before they can be adopted at large for this purpose.

翻訳日:2024-09-08 15:21:17 公開日:2024-08-24

# SGP-RI: 還元次元入力によるスパースガウス過程に基づくリアルタイムトレーサブル・分散IoT室内局在モデル

SGP-RI: A Real-Time-Trainable and Decentralized IoT Indoor Localization Model Based on Sparse Gaussian Process with Reduced-Dimensional Inputs ( http://arxiv.org/abs/2409.00078v1 )

ライセンス: Link先を確認

Zhe Tang, Sihao Li, Zichen Huang, Guandong Yang, Kyeong Soo Kim, Jeremy S. Smith,

(参考訳) IoT(Internet of Things, モノのインターネット)デバイスは現場に配備されており、それらのIoTデバイス上でのローカルコンピューティングには、膨大な量の未使用のポテンシャルがある。そのため、この可能性を屋内のローカライゼーションに活用することは、エキサイティングな研究分野となる。従来、屋内ローカライゼーションモデルのトレーニングと展開は、かなりの計算資源を持つ集中型サーバに基づいている。この集中型アプローチは、屋内電磁環境の動的で予測不可能な性質にデータベースが対応できないこと、モデル再トレーニングコスト、集中型サーバのセキュリティ侵害に対する脆弱性など、いくつかの課題に直面している。これらの課題を軽減するために,SGP-RI(Sparse Gaussian Process with Reduced-dimensional Inputs)に基づくリアルタイム学習可能な分散型IoT屋内ローカライゼーションモデルを用いて,従来の屋内ローカライゼーション手法のオフラインとオンラインのフェーズを統合することを目指す。入力データの数と次元は、それぞれ基準点と無線アクセスポイントのフィルタリングによって削減される。マルチビルディングおよびマルチフロアの静的データベースおよびシングルビルディングおよびシングルフロアの動的データベースに基づく実験結果は、トレーニングサンプルの半分未満を誘導入力として用いるSGP-RIモデルが、トレーニングサンプル全体を用いる標準ガウスプロセスモデルに匹敵するローカライゼーション性能を得ることができることを示す。 SGP-RIモデルは、屋内のローカライゼーションの分散化を可能にし、リソース制限されたIoTデバイスへのデプロイメントを容易にし、それによってセキュリティとプライバシの向上、コストとネットワーク依存性の低減を実現しうる。また、実時間トレーニングの能力により、時間変化のある屋内電磁環境に迅速に適応することができる。

As Internet of Things (IoT) devices are deployed in the field, there is an enormous amount of untapped potential in local computing on those IoT devices. Harnessing this potential for indoor localization, therefore, becomes an exciting research area. Conventionally, the training and deployment of indoor localization models are based on centralized servers with substantial computational resources. This centralized approach faces several challenges, including the database's inability to accommodate the dynamic and unpredictable nature of the indoor electromagnetic environment, the model retraining costs, and the susceptibility of centralized servers to security breaches. To mitigate these challenges, we aim to amalgamate the offline and online phases of traditional indoor localization methods using a real-time-trainable and decentralized IoT indoor localization model based on Sparse Gaussian Process with Reduced-dimensional Inputs (SGP-RI), where the number and dimension of the input data are reduced through reference point and wireless access point filtering, respectively. The experimental results, based on a multi-building and multi-floor static database as well as a single-building and single-floor dynamic database, demonstrate that the proposed SGP-RI model, using less than half of the training samples as inducing inputs, can produce localization performance comparable to that of the standard Gaussian Process model using all of the training samples. The SGP-RI model enables the decentralization of indoor localization, facilitating its deployment to resource-constrained IoT devices, and thereby could provide enhanced security and privacy as well as reduced costs and network dependency. Also, the model's capability of real-time training makes it possible to quickly adapt to the time-varying indoor electromagnetic environment.

翻訳日:2024-09-08 15:21:17 公開日:2024-08-24

# 異なる研究コミュニティを理解する:オーサリングネットワーク

Examining Different Research Communities: Authorship Network ( http://arxiv.org/abs/2409.00081v1 )

ライセンス: Link先を確認

Shrabani Ghosh,

(参考訳) Google Scholarは、学術文献の分野にまたがる研究論文にアクセスするためのトップ検索エンジンの1つだ。 Googleの学者による事前検索オプションでは、フレーズ、出版社名、著者名、期間などに基づいて記事の抽出を行うことができる。本研究では,コンピュータ科学の2つの異なる研究領域であるデータマイニングとソフトウェア工学について,Google Scholarデータ(2000-2021)を収集した。研究者データベースリソースは、ネットワーク分析、データマイニング、著者ネットワークを介して著者間のリンクを特定するために強力である。各ドメインの共著者シップネットワークを調査し,そのネットワーク構造について検討した。出版物の動向を分析し、各分野の影響力のある著作者や関連団体を特定するための大規模な実験が実施されている。ネットワーク分析により、ネットワークの特徴は互いに異なることが示され、特定のドメインの影響力のある著者の中に小さなコミュニティが存在している。

Google Scholar is one of the top search engines for accessing scholarly research articles across multiple disciplines. Google Scholar's advanced search option makes it possible to extract articles based on phrases, publisher names, author names, time periods, etc. In this work, we collected Google Scholar data (2000-2021) for two different research domains in computer science: Data Mining and Software Engineering. The scholar database resources are powerful for network analysis, data mining, and identifying links between authors via authorship networks. We examined the co-authorship network for each domain and studied its network structure. Extensive experiments are performed to analyze publication trends and identify influential authors and affiliated organizations for each domain. The network analysis shows that the networks' features are distinct from one another and that the networks exhibit small communities among the influential authors of a particular domain.

翻訳日:2024-09-08 15:21:17 公開日:2024-08-24
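
The co-authorship analysis described in the abstract above can be illustrated with a minimal sketch. It assumes each paper is given as a plain list of author names; the function names `coauthorship_network` and `degree` are hypothetical and are not taken from the paper, which does not describe its implementation.

```python
from collections import Counter
from itertools import combinations


def coauthorship_network(papers):
    """Build a weighted co-authorship graph from a list of author lists.

    Returns the set of authors and a Counter mapping each unordered
    author pair (stored as a sorted tuple) to the number of shared papers.
    """
    authors = set()
    edges = Counter()
    for author_list in papers:
        unique = sorted(set(author_list))  # ignore duplicate names on one paper
        authors.update(unique)
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return authors, edges


def degree(authors, edges):
    """Number of distinct co-authors per author (unweighted degree)."""
    deg = {a: 0 for a in authors}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg
```

Authors with a high degree are one simple proxy for "influential" authors; the paper itself may use other network measures.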

# 複雑なプロセス・エンジニアリング・スキームのヒューマン・レベル理解に向けて--オープン・ドメイン質問応答のための教育的・イントロスペクティブ・マルチエージェント・フレームワーク

Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering ( http://arxiv.org/abs/2409.00082v1 )

ライセンス: Link先を確認

Sagar Srinivas Sakhinana, Geethan Sannidhi, Venkataramana Runkana,

(参考訳) 化学・プロセス産業では、プロセス・フロー・ダイアグラム(PFD)とパイプ・アンド・インスツルメンテーション・ダイアグラム(P&ID)が設計、建設、保守に不可欠である。 GPT4(Omni)のようなLMM(Large Multimodal Models)のようなジェネレーティブAIの最近の進歩は、ビジュアル質問回答(VQA)のプロセス図の理解と解釈において有望であることを示している。しかし、プロプライエタリなモデルはデータプライバシのリスクを生じさせ、その計算複雑性は、消費者ハードウェアにおけるドメイン固有のカスタマイズのための知識編集を妨げる。これらの課題を克服するために、オープンドメイン質問応答(ODQA)タスクのための階層的・マルチエージェント検索拡張生成(RAG)フレームワークを用いて、セキュアでオンプレミスなエンタープライズソリューションを提案し、データプライバシ、説明可能性、費用対効果を提供する。我々の新しいマルチエージェントフレームワークは、PFDとP&ID分析のためのReAct(Reason+Act)プロンプト技術を用いたオープンソースの小型マルチモーダルモデルを用いて、イントロスペクティブで専門的なサブエージェントを採用し、複数の情報ソースを統合し、正確で文脈的に関係のある回答を提供する。反復的自己補正によって支援された我々のアプローチは,ODQAタスクにおいて優れたパフォーマンスを実現することを目的としている。厳密な実験を行い,提案手法の有効性を実証した。

In the chemical and process industries, Process Flow Diagrams (PFDs) and Piping and Instrumentation Diagrams (P&IDs) are critical for design, construction, and maintenance. Recent advancements in Generative AI, such as Large Multimodal Models (LMMs) like GPT4 (Omni), have shown promise in understanding and interpreting process diagrams for Visual Question Answering (VQA). However, proprietary models pose data privacy risks, and their computational complexity prevents knowledge editing for domain-specific customization on consumer hardware. To overcome these challenges, we propose a secure, on-premises enterprise solution using a hierarchical, multi-agent Retrieval Augmented Generation (RAG) framework for open-domain question answering (ODQA) tasks, offering enhanced data privacy, explainability, and cost-effectiveness. Our novel multi-agent framework employs introspective and specialized sub-agents using open-source, small-scale multimodal models with the ReAct (Reason+Act) prompting technique for PFD and P&ID analysis, integrating multiple information sources to provide accurate and contextually relevant answers. Our approach, supported by iterative self-correction, aims to deliver superior performance in ODQA tasks. We conducted rigorous experimental studies, and the empirical results validated the effectiveness of the proposed approach.

翻訳日:2024-09-08 15:21:17 公開日:2024-08-24

# 量子キャンディーを用いた量子テレポーテーション

Quantum Teleportation using Quantum Candies ( http://arxiv.org/abs/2408.16016v1 )

ライセンス: Link先を確認

Nikhitha Nunavath, Sandeep Mishra, Anirban Pathak,

(参考訳) 量子キャンディー(Quantum Candies)またはカンディー(Qandies)は、キャンディーの言語における量子情報と量子科学の概念を理解するための、巧妙な方法を提供する。カンディーの批判的な考えは、量子科学を一般大衆に直感的に描写することであり、この領域の研究の大部分は納税者によって資金提供されているので理にかなっている。カンディーズモデルは既に量子科学と量子暗号の基本概念を説明するために使われている。しかし、テレポーテーションと関連する概念はまだ説明されていない。この事実に触発されて、我々はヤコブズとリン=モー=シャピラのアイデアを調査・拡張し、カンディーを用いたテレポーテーションを説明する。ここでは、テレポーテーションプロトコルを明示的に設計し、カンディーゲートを用いた回路モデルを実行する。このプロトコルは、相関したカンディーが適切に事前共有され、両方の端でいくつかのローカル操作を使用するときに成功する。私たちが開発しているモデルは、一般大衆が量子科学とテクノロジーに関する洞察を得るのを助けたいと願う、科学と工学の教育者にとって貴重なツールとなり得る。

Quantum Candies, or Qandies, provide us with a lucid way of understanding the concepts of quantum information and quantum science in the language of candies. The central idea of qandies is to depict quantum science intuitively for the general public, which makes sense, as most of the research in this domain is funded by taxpayers. The qandies model is already used to explain the essential concepts of quantum science and quantum cryptography. However, teleportation and related concepts are yet to be explained. Motivated by this fact, we investigate and extend the idea of Jacobs and Lin-Mor-Shapira to explain teleportation using qandies. Here, we explicitly design the teleportation protocol and build a circuit model of it using qandy gates. The protocol succeeds when the correlated qandies are appropriately pre-shared and some local operations are used at both ends. The model we develop can be a valuable tool for science and engineering educators who want to help the general public gain more insight into quantum science and technology.

翻訳日:2024-08-30 18:04:21 公開日:2024-08-24

# スマートグリッドにおける電力時系列データの差分公開

Differentially Private Publication of Electricity Time Series Data in Smart Grids ( http://arxiv.org/abs/2408.16017v1 )

ライセンス: Link先を確認

Sina Shaham, Gabriel Ghinita, Bhaskar Krishnamachari, Cyrus Shahabi,

(参考訳) スマートグリッドは、消費者の行動を研究し、エネルギー政策決定を導くための貴重なデータソースである。特に、地理的領域における電力消費の時系列は、高価な資源(例えば、トランスフォーマー、ストレージ要素)の最適配置とその活性化スケジュールを決定するのに不可欠である。しかし、そのようなデータの公開は、個人の習慣やライフスタイルに関する繊細な詳細を明らかにする可能性があるため、重要なプライバシー問題を引き起こす。差分プライバシー(DP)は、個々のデータの衛生化に適しているが、現在の時系列のDP技術は、データ読取の間に時間的相関が存在するため、実用性が著しく低下している。本稿では、時空間特性を分析し、RNNを利用してマイクロパターンとマクロパターンをキャプチャするDP準拠の電力消費データを公開するための新しい手法である STPT(Spatio-Temporal Private Timeseries)を紹介する。また、特定パターンに基づいて電力消費時系列を解放する分割方式も採用している。実世界のデータセットと合成データセットの両方で広範な実験を行い、STPTは既存のベンチマークを著しく上回り、データユーティリティとユーザのプライバシのバランスのとれたトレードオフを提供します。

Smart grids are a valuable data source to study consumer behavior and guide energy policy decisions. In particular, time-series of power consumption over geographical areas are essential in deciding the optimal placement of expensive resources (e.g., transformers, storage elements) and their activation schedules. However, publication of such data raises significant privacy issues, as it may reveal sensitive details about personal habits and lifestyles. Differential privacy (DP) is well-suited for sanitization of individual data, but current DP techniques for time series lead to significant loss in utility, due to the existence of temporal correlation between data readings. We introduce STPT (Spatio-Temporal Private Timeseries), a novel method for DP-compliant publication of electricity consumption data that analyzes spatio-temporal attributes and captures both micro and macro patterns by leveraging RNNs. Additionally, it employs a partitioning method for releasing electricity consumption time series based on identified patterns. We demonstrate through extensive experiments, on both real-world and synthetic datasets, that STPT significantly outperforms existing benchmarks, providing a well-balanced trade-off between data utility and user privacy.

翻訳日:2024-08-30 18:04:21 公開日:2024-08-24
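
To make the utility problem mentioned in the abstract above concrete, here is a minimal sketch of a naive baseline: releasing an aggregated consumption series with the standard Laplace mechanism, splitting the privacy budget evenly across time steps. This is not STPT; the function names, the `clip` bound on one household's contribution per time step, and the even budget split are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_series(readings, epsilon, clip, rng=None):
    """Naive epsilon-DP release of an aggregated time series.

    readings: per-time-step sums over an area, where each household's
              contribution to one sum is assumed to be clipped to `clip`.
    epsilon:  total privacy budget, split evenly over all time steps
              (sequential composition).
    """
    if not readings:
        return []
    rng = rng or random.Random()
    per_step_epsilon = epsilon / len(readings)
    scale = clip / per_step_epsilon  # sensitivity / epsilon for each step
    return [x + laplace_noise(scale, rng) for x in readings]
```

Because the per-step budget shrinks as the series gets longer, the noise scale grows linearly with the number of readings, which is one way to see why naive time-series release loses utility.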

# SpeechCraft: 自然言語記述によるきめ細かい表現型音声データセット

SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description ( http://arxiv.org/abs/2408.13608v1 )

ライセンス: Link先を確認

Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu,

(参考訳) 発話スタイルに固有の微妙なニュアンス情報により,多モーダル学習は重要な課題となる。そのため,音声と自然言語の理解を深めるためには,音声スタイルの精巧な理解を提供する大規模データセットが緊急に必要である。しかし、そのようなデータセットの構築は、大規模なデータ収集と高品質なアノテーションの間に大きなトレードオフをもたらす。この課題に対処するため、我々は、表現力と鮮明な人間の言語記述で、単語中の音声クリップに注釈を付ける、表現力の解釈のための自動音声アノテーションシステムを提案する。音声音声は、最初は一連の専門家分類器とキャプションモデルによって処理され、多様な音声特性をキャプチャし、その後、カスタマイズされたアノテーション生成のための微調整されたLLaMAが続く。情報量や多様性が制限された従来のタグ/テンプレットベースのアノテーションフレームワークとは違って,提案システムは,自然言語記述の調整による音声スタイルの深い理解を提供し,大規模なモデルトレーニングのための正確で高機能なデータ生成を可能にする。このシステムにより、細粒度のバイリンガル表現型音声データセットであるSpeechCraftを作成する。約2000時間の音声データを含み、200万以上の音声クリップを含む、高度に記述的な自然言語スタイルのプロンプトによって区別されている。大規模な実験により,提案したデータセットは,スタイリスト音声合成と音声スタイル理解において,言語タスクのパフォーマンスを著しく向上させることが示された。

Speech-language multi-modal learning presents a significant challenge due to the finely nuanced information inherent in speech styles. Therefore, a large-scale dataset providing elaborate comprehension of speech style is urgently needed to facilitate insightful interplay between speech audio and natural language. However, constructing such datasets presents a major trade-off between large-scale data collection and high-quality annotation. To tackle this challenge, we propose an automatic speech annotation system for expressiveness interpretation that annotates in-the-wild speech clips with expressive and vivid human language descriptions. Initially, speech audio is processed by a series of expert classifiers and captioning models to capture diverse speech characteristics, followed by a fine-tuned LLaMA for customized annotation generation. Unlike previous tag/template-based annotation frameworks with limited information and diversity, our system provides an in-depth understanding of speech style through tailored natural language descriptions, thereby enabling accurate and voluminous data generation for large model training. With this system, we create SpeechCraft, a fine-grained bilingual expressive speech dataset. It is distinguished by highly descriptive natural language style prompts, containing approximately 2,000 hours of audio data and encompassing over two million speech clips. Extensive experiments demonstrate that the proposed dataset significantly boosts speech-language task performance in stylized speech synthesis and speech style understanding.

翻訳日:2024-08-29 18:22:33 公開日:2024-08-24

# 脳モデルとしての概念価値ネットワーク

A Concept-Value Network as a Brain Model ( http://arxiv.org/abs/1904.04579v5 )

ライセンス: Link先を確認

Kieran Greer,

(参考訳) 本稿では,脳様モデルの物理的実体と概念的実体の関係を記述するための統計的枠組みを提案する。特徴と概念のインスタンスはコンテキストに置かれ、化学接続も可能であるが、この論文は特徴が電気配線である可能性を示唆している。この考え方では、実際の接続長は、発射速度とニューロン同期と関係があるため重要であるが、信号タイプはそれほど重要ではない。この論文は、概念が特徴集合と概念インスタンスをリンクするニューロン群であり、それらのグループからの化学信号によって決定されることを示唆している。したがって、特徴はニューラルネットワークの静的水平フレームワークとなり、概念はこれらを垂直に相互に結合する。機能に関して、ニューロンは機能的と考えられ、より水平な記憶構造はグリアとなる。これはまた、機能が分散エンティティであり、単一の領域に集中していないことを示唆する。

This paper suggests a statistical framework for describing the relations between the physical and conceptual entities of a brain-like model. Features and concept instances are put into context, where the paper suggests that features may be the electrical wiring, although chemical connections are also possible. With this idea, the actual length of the connection is important, because it is related to firing rates and neuron synchronization, but the signal type is less important. The paper then suggests that concepts are neuron groups that link feature sets, and that concept instances are determined by chemical signals from those groups. Therefore, features become the static horizontal framework of the neural system and concepts are vertically interconnected combinations of these. With regard to functionality, the neuron is then considered to be functional and the more horizontal memory structures can even be glial. This would also suggest that features can be distributed entities and not concentrated in a single area.

翻訳日:2024-08-28 20:36:52 公開日:2024-08-24

# 有限スペクトル三重項に対するフェルミオン積分

Fermion integrals for finite spectral triples ( http://arxiv.org/abs/2403.18428v2 )

ライセンス: Link先を確認

John W. Barrett,

(参考訳) フェルミオン函数積分は、有限実スペクトル三重項のディラック作用素に対して計算される。複素、実およびキラルな汎函数積分は、それらが非自明な各KO次元に対して考慮され、定義における位相あいまいさが注目される。

Fermion functional integrals are calculated for the Dirac operator of a finite real spectral triple. Complex, real and chiral functional integrals are considered for each KO-dimension where they are non-trivial, and phase ambiguities in the definition are noted.

翻訳日:2024-08-28 19:29:21 公開日:2024-08-24

# コンクリート製造プロセス最適化のための物理インフォームニューラルネットワーク

Physics-Informed Neural Network for Concrete Manufacturing Process Optimization ( http://arxiv.org/abs/2408.14502v1 )

ライセンス: Link先を確認

Sam Varghese, Mr. Rahul Anand, Gaurav Paliwal,

(参考訳) コンクリート製造プロジェクトは、コンサルティング機関にとって最も一般的なプロジェクトの一つである。灰, 水, セメント, 超塑性などの入力材料の非線形依存性が高く, コンクリートの強度が高いことから, 機械学習モデルでは, この関係をうまく把握し, コスト最適化を行うのが困難になる。本稿では、PINN(Physics Informed Neural Networks)が与えられた状況でどのように役立つかを明らかにする。この最先端モデルは、線形回帰、ランダムフォレスト、グラディエントブースティング、ディープニューラルネットワークといった従来のモデルと比較される。調査の結果は、データセットが減ったとしてもPINNがいかにうまく機能したかを強調し、MLモデルの限られたデータ可用性に関する最大の課題の1つを解決した。 PINNは平均して、Deep Neural Networkに比べて40%少ないデータであっても、損失値を26.3%削減した。また, 材料量の予測に加えて, 粒子群最適化(PSO)などのヒューリスティック最適化手法を用いて, 与えられた強度のコンクリートを最小コストで製造するために必要な材料量の予測を行った。

Concrete manufacturing projects are among the most common projects for consulting agencies. Because the resultant strength of concrete depends in a highly non-linear way on input materials like ash, water, cement, superplastic, etc., it is difficult for machine learning models to capture this relation and perform cost optimization. This paper highlights how PINNs (Physics Informed Neural Networks) can be useful in this situation. This state-of-the-art model is also compared with traditional models like Linear Regression, Random Forest, Gradient Boosting, and Deep Neural Network. The results highlight how well PINNs performed even with a reduced dataset, thus addressing one of the biggest issues for ML models, limited data availability. On average, the PINN reduced the loss value by 26.3% even with 40% less data compared to the Deep Neural Network. In addition to predicting the strength of the concrete given the quantity of raw materials, the paper also highlights the use of heuristic optimization methods like Particle Swarm Optimization (PSO) in predicting the quantity of raw materials required to manufacture concrete of a given strength at the least cost.

翻訳日:2024-08-28 18:01:37 公開日:2024-08-24
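
The abstract above mentions Particle Swarm Optimization (PSO) for finding low-cost material quantities. The following is a minimal, generic PSO sketch, not the authors' implementation: the function name `pso_minimize`, the inertia and acceleration coefficients, and the toy quadratic cost are all assumptions chosen for illustration. In the paper's setting, the cost function would combine material cost with a strength constraint predicted by a model.

```python
import random


def pso_minimize(cost, bounds, n_particles=30, n_iters=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimize `cost` over a box given by `bounds` = [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # best position seen by each particle
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


def toy_cost(x):
    """Hypothetical stand-in for a material-cost objective."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2
```

Calling `pso_minimize(toy_cost, [(-5, 5), (-5, 5)])` returns a position close to the minimizer `(3, -1)` of the toy objective.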

# 関数的正確性はコード言語モデルを評価するのに十分か? 生成コードの多様性を探る

Is Functional Correctness Enough to Evaluate Code Language Models? Exploring Diversity of Generated Codes ( http://arxiv.org/abs/2408.14504v1 )

ライセンス: Link先を確認

Heejae Chon, Seonghyeon Lee, Jinyoung Yeo, Dongha Lee,

(参考訳) 言語モデル(LM)は、自然言語の要求からコードを生成する素晴らしい能力を示した。本研究では,LMが生成するコードの多様性を,機能的正確性に加えて,コード生成能力を評価する重要な基準として強調する。その実践的な意味にもかかわらず、生成されたコードの多様性を評価することに焦点を当てた研究が不足しており、コードLMの開発においてその重要性を見落としている。本稿では,コード間の類似性や機能的正しさを指標として,生成コードの多様性を評価するための体系的なアプローチを提案する。具体的には、コード理解と推論において、大規模なLMの能力を活用し、人間の判断と最も高い相関性を示すペアワイズコード類似度尺度を導入する。モデルのサイズ,温度,トレーニングアプローチ,戦略の推進,入力問題の難しさなど,生成コードの品質に対するさまざまな要因の影響を幅広く検討する。テストパススコアとコード間類似度スコアとの正の相関関係について一貫した観察を行ったところ、現在のLMは機能的に正しいコードを生成する傾向にあることがわかった。

Language models (LMs) have exhibited impressive abilities in generating codes from natural language requirements. In this work, we highlight the diversity of code generated by LMs as a critical criterion for evaluating their code generation capabilities, in addition to functional correctness. Despite its practical implications, there is a lack of studies focused on assessing the diversity of generated code, which overlooks its importance in the development of code LMs. We propose a systematic approach to evaluate the diversity of generated code, utilizing various metrics for inter-code similarity as well as functional correctness. Specifically, we introduce a pairwise code similarity measure that leverages large LMs' capabilities in code understanding and reasoning, demonstrating the highest correlation with human judgment. We extensively investigate the impact of various factors on the quality of generated code, including model sizes, temperatures, training approaches, prompting strategies, and the difficulty of input problems. Our consistent observation of a positive correlation between the test pass score and the inter-code similarity score indicates that current LMs tend to produce functionally correct code with limited diversity.

翻訳日:2024-08-28 18:01:37 公開日:2024-08-24

# 離散再プログラミングのデカップリングによる時空間予測のための事前学習型言語モデル

Empowering Pre-Trained Language Models for Spatio-Temporal Forecasting via Decoupling Enhanced Discrete Reprogramming ( http://arxiv.org/abs/2408.14505v1 )

ライセンス: Link先を確認

Hao Wang, Jindong Han, Wei Fan, Hao Liu,

(参考訳) 時空間時系列予測は、輸送最適化、エネルギー管理、気候分析など、様々な実世界の応用において重要な役割を果たす。最近のPLM(Pre-trained Language Models)の進歩は、それらの優れた推論と一般化能力を活用することで、時系列予測タスクのためにこれらのモデルを再プログラミングする努力にインスピレーションを与えている。しかし、既存のアプローチは、複雑な空間的相互関係や本質的な系列内周波数成分の扱いに乏しく、時空間予測性能を制限している。さらに、連続時系列の圧縮部分語彙への線形写像は、PLMの時空間的表現性に制約を与え、潜在的な情報のボトルネックを引き起こす可能性がある。上記の制約を克服するため,時空間予測のための PLM プログラムフレームワークである RePST を提案する。 RePST の重要な洞察は、周波数領域における時空間の時空間ダイナミクスを分離し、PLMテキスト空間との整合性を高めることである。具体的には、まず、フーリエ空間で時空間データを分離し、時間的内在的および空間的拡散信号を得る構造拡散演算子を考案し、このダイナミクスをより理解し、予測可能とした。さらに,限られた語彙からの情報ボトルネックを回避するために,拡張された語彙空間から関連する離散テキスト情報を選択する離散的再プログラミング戦略を提案する。 4つの実世界のデータセットに対する大規模な実験により、提案手法は、特にデータスカースシナリオにおいて、最先端の時空間予測モデルよりも大幅に優れていることが示された。

Spatio-temporal time series forecasting plays a critical role in various real-world applications, such as transportation optimization, energy management, and climate analysis. The recent advancements in Pre-trained Language Models (PLMs) have inspired efforts to reprogram these models for time series forecasting tasks, by leveraging their superior reasoning and generalization capabilities. However, existing approaches fall short in handling complex spatial inter-series dependencies and intrinsic intra-series frequency components, limiting their spatio-temporal forecasting performance. Moreover, the linear mapping of continuous time series to a compressed subset vocabulary in reprogramming constrains the spatio-temporal semantic expressivity of PLMs and may lead to a potential information bottleneck. To overcome the above limitations, we propose RePST, a tailored PLM reprogramming framework for spatio-temporal forecasting. The key insight of RePST is to decouple the spatio-temporal dynamics in the frequency domain, allowing better alignment with the PLM text space. Specifically, we first decouple spatio-temporal data in Fourier space and devise a structural diffusion operator to obtain temporal intrinsic and spatial diffusion signals, making the dynamics more comprehensible and predictable for PLMs. To avoid an information bottleneck from a limited vocabulary, we further propose a discrete reprogramming strategy that selects relevant discrete textual information from an expanded vocabulary space in a differentiable manner. Extensive experiments on four real-world datasets show that our proposed approach significantly outperforms state-of-the-art spatio-temporal forecasting models, particularly in data-scarce scenarios.

翻訳日:2024-08-28 18:01:37 公開日:2024-08-24

# 蒸留ロングテールデータセット

Distilling Long-tailed Datasets ( http://arxiv.org/abs/2408.14506v1 )

ライセンス: Link先を確認

Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan,

(参考訳) データセット蒸留(DD)は、より大規模なデータセットから小さな情報に富んだデータセットを蒸留して、効率的なニューラルネットワークトレーニングを実現することを目的としている。しかし、既存のDDメソッドは、現実世界のシナリオで広く使われている長い尾のデータセットに苦しむ。この予期せぬ結果の背景にある理由を調査した結果、2つの主な原因が判明した。 1) 不均衡なデータに基づいて訓練されたエキスパートネットワークはバイアス勾配を発達させ、同様に不均衡な蒸留データセットを合成する。 DDの一般的な手法であるパラメータマッチングでは、蒸留データセットの学習パラメータと元のデータセットの学習パラメータを整合させる。しかし、長い尾のデータセットの文脈では、バイアスのある専門家が元のデータに存在する不均衡を継承し、蒸留されたデータセットは尾のクラスを不十分に表現する。 2) これらのデータセットを訓練した専門家は, 蒸留監督の誤認や, 品質の悪いソフトラベルの初期化を招いた。これらの課題に対処するため,我々は,Long-tailed Aware Dataset distillation (LAD) という,新しい長鎖データセット蒸留法を提案する。具体的には,偏りのある専門家の軌道と直接一致することを避けるために,ウェイトミスマッチ回避法を提案する。これは、学生と偏りのある専門家の軌跡の間の距離を減らし、尾のクラスバイアスが合成データセットに蒸留されるのを防ぐ。さらに,アダプティブ・デカップリング・マッチング(Adaptive Decoupled Matching)を提案する。この研究は長い尾のデータセット蒸留(LTDD)の分野を開拓し、長い尾のデータセットを蒸留する最初の効果的な取り組みとなった。

Dataset distillation (DD) aims to distill a small, information-rich dataset from a larger one for efficient neural network training. However, existing DD methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) Expert networks trained on imbalanced data develop biased gradients, leading to the synthesis of similarly imbalanced distilled datasets. Parameter matching, a common technique in DD, involves aligning the learning parameters of the distilled dataset with those of the original dataset. However, in the context of long-tailed datasets, matching biased experts leads to inheriting the imbalance present in the original data, causing the distilled dataset to inadequately represent tail classes. 2) The experts trained on such datasets perform suboptimally on tail classes, resulting in misguided distillation supervision and poor-quality soft-label initialization. To address these issues, we propose a novel long-tailed dataset distillation method, Long-tailed Aware Dataset distillation (LAD). Specifically, we propose Weight Mismatch Avoidance to avoid directly matching the biased expert trajectories. It reduces the distance between the student and the biased expert trajectories and prevents the tail class bias from being distilled to the synthetic dataset. Moreover, we propose Adaptive Decoupled Matching, which jointly matches the decoupled backbone and classifier to improve the tail class performance and initialize reliable soft labels. This work pioneers the field of long-tailed dataset distillation (LTDD), marking the first effective effort to distill long-tailed datasets.

翻訳日:2024-08-28 18:01:37 公開日:2024-08-24

# GPT-4とスキーママッチングにおけるコスト意識の不確実性低減: Prompt-Matcher フレームワーク

Cost-Aware Uncertainty Reduction in Schema Matching with GPT-4: The Prompt-Matcher Framework ( http://arxiv.org/abs/2408.14507v1 )

ライセンス: Link先を確認

Longyu Feng, Huahang Li, Chen Jason Zhang,

(参考訳) スキーママッチングは、与えられた2つのスキーマの要素間の対応を識別するプロセスであり、データベース管理システム、データ統合、データウェアハウスに必須である。現在のスキーママッチングアルゴリズムの固有の不確実性は、一連の候補マッチングの生成につながる。これらの結果を維持するには、確率的クエリを処理できるデータベースやシステムを使う必要がある。これにより、クエリプロセスが複雑になり、関連するストレージコストが増加する。 GPT-4の優れた性能により、不確実性を低減できる可能性を探る。本提案では,GPT-4を用いて,候補の集合を問合せするクラウドワーカーの役割を代替することを目的とする。 GPT-4からより正確な対応確認応答を得るため、我々は、GPT-4のセマンティック・マッチとAbbreviation-matchプロンプトを作成し、2つのベンチマークデータセットであるDeepMDatasets 100% (+0.0) と Fabricated-Datasets 91.8% (+2.2) のリコールレートに対して、最先端の結果を達成する。予算の活用を最適化するため、我々はコスト対応ソリューションを考案した。予算の制約の中で、我々のソリューションは、最小限の時間支出で好ましい結果をもたらす。本稿では,複数の自動スキーママッチングアルゴリズムの統合プロセスにおける不確実性を低減し,複雑なパラメータ化を選択するための新しいフレームワークであるPrompt-Matcherを紹介する。これは、候補スキーマの結果に関連する不確実性を減らし、最も有望なマッチを最適にランク付けするのに役立つ。我々は、GPT-4予算の範囲内での収益を最適化することを目的として、対応選択問題を正式に定義する。 CSPがNP-Hardであることを示し、最小時間支出の近似アルゴリズムを提案する。最終的に、厳密な実験を通してPrompt-Matcherの有効性を実証する。

Schema matching is the process of identifying correspondences between the elements of two given schemata, essential for database management systems, data integration, and data warehousing. The inherent uncertainty of current schema matching algorithms leads to the generation of a set of candidate matches. Storing these results necessitates the use of databases and systems capable of handling probabilistic queries. This complicates the querying process and increases the associated storage costs. Motivated by GPT-4's outstanding performance, we explore its potential to reduce uncertainty. Our proposal is to supplant the role of crowdworkers with GPT-4 for querying the set of candidate matches. To get more precise correspondence verification responses from GPT-4, we have crafted Semantic-match and Abbreviation-match prompts, achieving state-of-the-art recall rates on two benchmark datasets: 100% (+0.0) on DeepMDatasets and 91.8% (+2.2) on Fabricated-Datasets. To optimise budget utilisation, we have devised a cost-aware solution. Within the constraints of the budget, our solution delivers favourable outcomes with minimal time expenditure. We introduce a novel framework, Prompt-Matcher, to reduce the uncertainty in the process of integrating multiple automatic schema matching algorithms and selecting complex parameterizations. It assists users in diminishing the uncertainty associated with candidate schema match results and in optimally ranking the most promising matches. We formally define the Correspondence Selection Problem (CSP), aiming to optimise the revenue within the confines of the GPT-4 budget. We demonstrate that CSP is NP-Hard and propose an approximation algorithm with minimal time expenditure. Ultimately, we demonstrate the efficacy of Prompt-Matcher through rigorous experiments.

翻訳日:2024-08-28 18:01:37 公開日:2024-08-24

# 科学のための人工知能:簡単で難しい問題

Artificial intelligence for science: The easy and hard problems ( http://arxiv.org/abs/2408.14508v1 )

ライセンス: Link先を確認

Ruairidh M. Battleday, Samuel J. Gershman,

(参考訳) 人工知能の最近の進歩によって、科学的な発見が目覚ましいものとなりました。これらはほとんどすべて、大量のデータにアクセス可能なドメイン科学者とエンジニアのチームによって事前に特定された難しい最適化問題を解決するために、柔軟なアルゴリズムをトレーニングした結果である。非常に有用ではあるが、この種の問題解決は科学の1つの部分、すなわち「簡単な問題」にしか対応しない。科学研究のもう1つの部分は、その問題そのもの、すなわち「ハード問題」を思い浮かび上がっている。難しい問題の解決は、未定義の制約に基づいて連続的な概念修正を必要とするため、科学的な発見のための現在のアルゴリズムの能力を超える。我々は、科学者の認知科学を研究することによって、人間がどのように難しい問題を解くかを理解し、その結果を使って、科学パラダイムを自動推論し更新する新しい計算エージェントを設計することができる。

A suite of impressive scientific discoveries have been driven by recent advances in artificial intelligence. These almost all result from training flexible algorithms to solve difficult optimization problems specified in advance by teams of domain scientists and engineers with access to large amounts of data. Although extremely useful, this kind of problem solving only corresponds to one part of science - the "easy problem." The other part of scientific research is coming up with the problem itself - the "hard problem." Solving the hard problem is beyond the capacities of current algorithms for scientific discovery because it requires continual conceptual revision based on poorly defined constraints. We can make progress on understanding how humans solve the hard problem by studying the cognitive science of scientists, and then use the results to design new computational agents that automatically infer and update their scientific paradigms.

翻訳日:2024-08-28 18:01:37 公開日:2024-08-24

# シンセティック・インターベンション

Synthetic Interventions ( http://arxiv.org/abs/2006.07691v7 )

ライセンス: Link先を確認

Anish Agarwal, Devavrat Shah, Dennis Shen,

(参考訳) SC(Synthetic Control)方法論は,パネルデータアプリケーションにおけるポリシー評価のための重要なツールである。研究者は一般的にSCフレームワークを低次元の行列係数モデルで正当化し、潜在的な結果が低次元単位および時間固有の潜在因子によって記述されると仮定する。近年の[Abadie '20]では,SC手法の先駆者の一人が,SCフレームワークを複数の治療法に拡張する方法について疑問を投げかけている。本稿では、このオープンな疑問に対して、私たちが合成介入(SI)と呼ぶ一つの解決法を提供する。 SIフレームワークの基本は低ランクテンソル因子モデルであり、これは治療に対する潜在因子化を含めることで行列因子モデルを拡張する。本モデルでは,標準SCに基づく推定器の一般化を提案する。このアプローチの1つのインスタンス化に対する一貫性を証明し、漸近的に正常な条件を提供する。さらに,本研究では,その予測性能について検討し,これまでに検討されていない関連質問を探索し,抗タバコ法の影響について, [Abadie-Diamond-Hainmueller '10] の標準SCケーススタディを再検討する代表シミュレーションを行った。

The synthetic controls (SC) methodology is a prominent tool for policy evaluation in panel data applications. Researchers commonly justify the SC framework with a low-rank matrix factor model that assumes the potential outcomes are described by low-dimensional unit and time specific latent factors. In the recent work of [Abadie '20], one of the pioneering authors of the SC method posed the question of how the SC framework can be extended to multiple treatments. This article offers one resolution to this open question that we call synthetic interventions (SI). Fundamental to the SI framework is a low-rank tensor factor model, which extends the matrix factor model by including a latent factorization over treatments. Under this model, we propose a generalization of the standard SC-based estimators. We prove the consistency for one instantiation of our approach and provide conditions under which it is asymptotically normal. Moreover, we conduct a representative simulation to study its prediction performance and revisit the canonical SC case study of [Abadie-Diamond-Hainmueller '10] on the impact of anti-tobacco legislations by exploring related questions not previously investigated.

翻訳日:2024-08-28 01:41:09 公開日:2024-08-24

# VPIT:Voxel Pseudo画像を用いたリアルタイム埋め込み単体3D追跡

VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images ( http://arxiv.org/abs/2206.02619v2 )

ライセンス: Link先を確認

Illia Oleksiienko, Paraskevi Nousi, Nikolaos Passalis, Anastasios Tefas, Alexandros Iosifidis,

(参考訳) 本稿では,Voxel Pseudo Image Tracking (VPIT) と呼ばれる,Voxel-based 3D Single Object Tracking (3D SOT) 手法を提案する。 VPITは3D SOTにボクセル擬似画像を使用する最初の方法である。入力点雲は、柱ベースのボキセル化により構成され、結果として得られる擬似画像は、2DライクなSiamese SOT法の入力として使用される。擬似画像はBird's-eye View (BEV)座標で生成されるため、その中のオブジェクトのサイズは一定である。したがって、新しい座標系ではオブジェクトの回転のみが変化し、オブジェクトのスケールは変化しない。そこで我々は,対象物の位置と回転の両方を予測するために,異なる回転する探索領域を単一のターゲット表現と比較するマルチローテーション探索に置き換える。 KITTI追跡データセットの実験は、VPITが最速の3D SOT法であり、競合的な成功と精度の値を維持することを示している。実世界のシナリオにおけるSOT手法の適用は、組み込み機器の計算能力の低下や、推論速度が十分高くなければ特定のデータフレームをスキップせざるを得ない遅延非推奨環境といった制限に満たされる。我々は、リアルタイム評価プロトコルを実装し、他のメソッドが組み込みデバイスでの性能の大部分を失うことを示す一方、VPITはオブジェクトの追跡能力を維持している。

In this paper, we propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT). VPIT is the first method that uses voxel pseudo images for 3D SOT. The input point cloud is structured by pillar-based voxelization, and the resulting pseudo image is used as an input to a 2D-like Siamese SOT method. The pseudo image is created in the Bird's-eye View (BEV) coordinates, and therefore the objects in it have constant size. Thus, only the object rotation can change in the new coordinate system and not the object scale. For this reason, we replace multi-scale search with a multi-rotation search, where differently rotated search regions are compared against a single target representation to predict both position and rotation of the object. Experiments on the KITTI Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values. Applying an SOT method in a real-world scenario runs into limitations such as the lower computational capabilities of embedded devices and a latency-unforgiving environment, where the method is forced to skip certain data frames if the inference speed is not high enough. We implement a real-time evaluation protocol and show that other methods lose most of their performance on embedded devices, while VPIT maintains its ability to track the object.

翻訳日:2024-08-28 01:37:08 公開日:2024-08-24
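
The idea of turning a point cloud into a Bird's-eye View pseudo image, as described in the VPIT abstract above, can be sketched as follows. This is a simplified illustration, not the VPIT pipeline: it stores a raw point count per pillar, whereas a real tracker would typically compute learned per-pillar features. The function name `bev_pseudo_image` and its parameters are hypothetical.

```python
import math


def bev_pseudo_image(points, x_range, y_range, cell):
    """Bin (x, y, z) points into a top-down grid of per-pillar point counts.

    x_range, y_range: (min, max) extent of the grid in metres.
    cell:             side length of one square pillar in metres.
    Returns a list of rows, indexed as image[iy][ix].
    """
    nx = math.ceil((x_range[1] - x_range[0]) / cell)
    ny = math.ceil((y_range[1] - y_range[0]) / cell)
    image = [[0] * nx for _ in range(ny)]
    for x, y, _z in points:
        ix = math.floor((x - x_range[0]) / cell)
        iy = math.floor((y - y_range[0]) / cell)
        if 0 <= ix < nx and 0 <= iy < ny:  # drop points outside the grid
            image[iy][ix] += 1
    return image
```

Because each pillar covers a fixed metric area, an object occupies the same number of cells regardless of its distance from the sensor, which is why the abstract replaces multi-scale search with multi-rotation search.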

# 時間と空間におけるロボットのハグ動作の学習とブレンディング

Learning and Blending Robot Hugging Behaviors in Time and Space ( http://arxiv.org/abs/2212.01507v2 )

ライセンス: Link先を確認

Michael Drolet, Joseph Campbell, Heni Ben Amor,

(参考訳) 複数の相互作用の重畳を含む複雑な相互作用において、適切なロボット応答を予測できる模倣学習に基づく物理ロボットインタラクションアルゴリズムを提案する。提案アルゴリズムであるBlending Bayesian Interaction Primitives (B-BIP) により, 複雑なハグシナリオにおいて応答性のある相互作用を実現できる。本稿では,本アルゴリズムが先行研究の一般化であり,元の定式化が単一インタラクションの特定のケースに還元されることを示す。提案アルゴリズムは,既存の最先端手法と比較して,精度,応答性,タイミングに関して,定量的な予測誤差と,より良好な参加者応答が得られる。

We introduce an imitation learning-based physical human-robot interaction algorithm capable of predicting appropriate robot responses in complex interactions involving a superposition of multiple interactions. Our proposed algorithm, Blending Bayesian Interaction Primitives (B-BIP), allows us to achieve responsive interactions in complex hugging scenarios, capable of reciprocating and adapting to a hug's motion and timing. We show that this algorithm is a generalization of prior work, for which the original formulation reduces to the particular case of a single interaction, and evaluate our method through both an extensive user study and empirical experiments. Our algorithm yields significantly better quantitative prediction error and more favorable participant responses with respect to accuracy, responsiveness, and timing, when compared to existing state-of-the-art methods.

Translated: 2024-08-28 01:37:08 Published: 2024-08-24

# Geometric ergodicity of SGLD via reflection coupling ( http://arxiv.org/abs/2301.06769v2 )

License: see the linked page

Lei Li, Jian-Guo Liu, Yuliang Wang,

We consider the geometric ergodicity of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm in nonconvex settings. Via the technique of reflection coupling, we prove the Wasserstein contraction of SGLD when the target distribution is log-concave only outside some compact set. The time discretization and the minibatch in SGLD introduce several difficulties when applying the reflection coupling, which are addressed by a series of careful estimates of conditional expectations. As a direct corollary, SGLD with constant step size has an invariant distribution, and we are able to obtain its geometric ergodicity in terms of the $W_1$ distance. The generalization to non-gradient drifts is also included.

Translated: 2024-08-28 01:37:08 Published: 2024-08-24
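To make the SGLD entry above concrete, here is a minimal sketch of a constant-step-size SGLD iteration, $x_{k+1} = x_k - \eta\,\widehat{\nabla U}(x_k) + \sqrt{2\eta}\,\xi_k$, where the gradient is estimated on a random minibatch. The one-dimensional double-well potential (nonconvex near the origin, convex far from it), the toy data, and every function name are assumptions chosen for illustration; none of them come from the paper.

```python
import math
import random


def sgld_step(x, data, batch_size, step, rng):
    """One SGLD update: minibatch gradient step plus Gaussian noise."""
    batch = rng.sample(data, batch_size)
    # Assumed toy per-sample potential: u_i(x) = (x^2 - 1)^2 / 4 + 0.1 * d_i * x,
    # whose gradient is x^3 - x + 0.1 * d_i.
    grad = sum(x**3 - x + 0.1 * d for d in batch) / batch_size
    noise = rng.gauss(0.0, 1.0)
    return x - step * grad + math.sqrt(2.0 * step) * noise


def run_sgld(x0, data, n_steps=2000, batch_size=8, step=0.01, seed=0):
    """Run SGLD with a constant step size and return the final iterate."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = sgld_step(x, data, batch_size, step, rng)
    return x


# Toy data with zero mean, so the full-batch potential is a symmetric double well.
toy_data = [0.5 if i % 2 == 0 else -0.5 for i in range(64)]
final_x = run_sgld(3.0, toy_data)
```

The step size is kept small on purpose: the cubic gradient of this toy potential is not globally Lipschitz, so a large constant step could diverge.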

# Intelligent Model Update Strategy for Sequential Recommendation ( http://arxiv.org/abs/2302.07335v2 )

License: see the linked page

Zheqi Lv, Wenqiao Zhang, Zhengyu Chen, Shengyu Zhang, Kun Kuang,

Modern online platforms are increasingly employing recommendation systems to address information overload and improve user engagement. An evolving paradigm in this research field is that recommendation network learning occurs both on the cloud and on edges, with knowledge transfer in between (i.e., edge-cloud collaboration). Recent works push this field further by enabling edge-specific context-aware adaptivity, where model parameters are updated in real time based on incoming on-edge data. However, we argue that frequent data exchanges between the cloud and edges often lead to inefficiency and waste of communication/computation resources, as a considerable share of parameter updates might be redundant. To investigate this problem, we introduce the Intelligent Edge-Cloud Parameter Request Model, abbreviated as IntellectReq. IntellectReq is designed to operate on the edge, evaluating the cost-benefit landscape of parameter requests with minimal computation and communication overhead. We formulate this as a novel learning task, aimed at the detection of out-of-distribution data, thereby fine-tuning adaptive communication strategies. Further, we employ statistical mapping techniques to convert real-time user behavior into a normal distribution, thereby employing multi-sample outputs to quantify the model's uncertainty and thus its generalization capabilities. Rigorous empirical validation on four widely-adopted benchmarks evaluates our approach, evidencing a marked improvement in the efficiency and generalizability of edge-cloud collaborative and dynamic recommendation systems.

Translated: 2024-08-28 01:26:59 Published: 2024-08-24

# Leo: Lagrange Elementary Optimization ( http://arxiv.org/abs/2304.05346v3 )

License: see the linked page

Aso M. Aladdin, Tarik A. Rashid,

Global optimization problems are frequently solved using the practical and efficient method of evolutionary sophistication. But as the original problem becomes more complex, so does its efficacy and expandability. Thus, the purpose of this research is to introduce Lagrange Elementary Optimization (Leo), a self-adaptive evolutionary method inspired by the remarkable accuracy of vaccinations using the albumin quotient of human blood. Intelligent agents are developed using their fitness function values after gene crossing. These genes direct the search agents during both exploration and exploitation. The main objective of the Leo algorithm is presented in this paper, along with the inspiration and motivation for the concept. To demonstrate its precision, the proposed algorithm is validated against a variety of test functions, including 19 traditional benchmark functions and the CECC06 2019 test functions. The results of Leo on the 19 classic benchmark test functions are evaluated against DA, PSO, and GA separately, and two other recent algorithms, FDO and LPB, are then also included in the evaluation. In addition, Leo is tested on ten CECC06 2019 functions against the DA, WOA, SSA, FDO, LPB, and FOX algorithms. The cumulative outcomes demonstrate Leo's capacity to increase the starting population and move toward the global optimum. Different standard measurements are used to verify and prove the stability of Leo in both the exploration and exploitation phases. Moreover, statistical analysis supports the findings of the proposed research. Finally, novel applications in the real world are introduced to demonstrate the practicality of Leo.

Translated: 2024-08-28 01:26:59 Published: 2024-08-24

# Learnability with Time-Sharing Computational Resource Concerns ( http://arxiv.org/abs/2305.02217v5 )

License: see the linked page

Zhi-Hua Zhou,

Conventional theoretical machine learning studies generally assume, explicitly or implicitly, that computational resources are sufficient or even infinitely supplied. In real practice, however, computational resources are usually limited, and the performance of machine learning depends not only on how much data has been received, but also on how much data can be handled subject to the computational resources available. Note that most current "intelligent supercomputing" facilities work like exclusive operating systems, where a fixed amount of resources is allocated to a machine learning task without adaptive scheduling strategies that consider important factors such as the learning performance demands and the learning process status. In this article, we introduce the notion of machine learning throughput, define Computational Resource Efficient Learning (CoRE-Learning), and present a theoretical framework that takes into account the influence of computational resources in learning theory. This framework can be naturally applied to stream learning, where the incoming data streams can be potentially endless with overwhelming size and it is impractical to assume that all received data can be handled in time. It may also provide a theoretical perspective for the design of intelligent supercomputing operating systems.

Translated: 2024-08-28 01:26:59 Published: 2024-08-24

# The Virtual Quantum Device (VQD): A tool for detailed emulation of quantum computers ( http://arxiv.org/abs/2306.07342v3 )

License: see the linked page

Cica Gustiani, Tyson Jones, Simon C. Benjamin,

We present the Virtual Quantum Device (VQD) platform, a system based on the QuEST quantum emulator. Through the use of VQDs, non-expert users can emulate specific quantum computers with detailed error models, bespoke gate sets and connectivities. The platform boasts an intuitive interface, powerful visualisation, and compatibility with high-performance computation for effective testing and optimisation of complex quantum algorithms or ideas across a range of quantum computing hardware. We create and explore five families of VQDs corresponding to trapped ions, nitrogen-vacancy centres, neutral atom arrays, silicon quantum dot spins, and superconducting devices. Each is highly configurable through a set of tailored parameters. We showcase the key characteristics of each virtual device, providing practical examples of the tool's usefulness and highlighting each device's specific attributes. By offering user-friendly encapsulated descriptions of diverse quantum hardware, the VQD platform offers researchers the ability to rapidly explore algorithms and protocols in a realistic setting; meanwhile, hardware experts can create their own VQDs to compare with their experiments.

Translated: 2024-08-28 01:17:09 Published: 2024-08-24

# UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video ( http://arxiv.org/abs/2306.09349v3 )

License: see the linked page

Zhi-Hao Lin, Bohan Liu, Yi-Ting Chen, Kuan-Sheng Chen, David Forsyth, Jia-Bin Huang, Anand Bhattad, Shenlong Wang,

We present UrbanIR (Urban Scene Inverse Rendering), a new inverse graphics model that enables realistic, free-viewpoint renderings of scenes under various lighting conditions from a single video. It accurately infers shape, albedo, visibility, and sun and sky illumination from wide-baseline videos, such as those from car-mounted cameras, which differ from NeRF's dense-view settings. In this context, standard methods often yield subpar geometry and material estimates, such as inaccurate roof representations and numerous 'floaters'. UrbanIR addresses these issues with novel losses that reduce errors in inverse graphics inference and rendering artifacts. Its techniques allow for precise shadow volume estimation in the original scene. The model's outputs support controllable editing, enabling photorealistic free-viewpoint renderings of night simulations, relit scenes, and inserted objects, marking a significant improvement over existing state-of-the-art methods.

Translated: 2024-08-28 01:17:09 Published: 2024-08-24

# Fast Unsupervised Deep Outlier Model Selection with Hypernetworks ( http://arxiv.org/abs/2307.10529v3 )

License: see the linked page

Xueying Ding, Yue Zhao, Leman Akoglu,

Outlier detection (OD) finds many applications and has a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior works report the sensitivity of OD models to HPs, the issue becomes all the more critical for modern DOD models, which exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained efficiently with our proposed HN. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.

Translated: 2024-08-28 01:07:17 Published: 2024-08-24

# An Information-Theoretic Approach for Detecting Edits in AI-Generated Text ( http://arxiv.org/abs/2308.12747v2 )

License: see the linked page

Idan Kashtan, Alon Kipnis,

We propose a method to determine whether a given article was written entirely by a generative language model or perhaps contains edits by a different author, possibly a human. Our process involves multiple tests for the origin of individual sentences or other pieces of text and combining these tests using a method that is sensitive to rare alternatives, i.e., non-null effects are few and scattered across the text in unknown locations. Interestingly, this method also identifies pieces of text suspected to contain edits. We demonstrate the effectiveness of the method in detecting edits through extensive evaluations using real data and provide an information-theoretic analysis of the factors affecting its success. In particular, we discuss optimality properties under a theoretical framework for text editing in which sentences are generated mainly by the language model, except perhaps for a few sentences that might have originated via a different mechanism. Our analysis raises several interesting research questions at the intersection of information theory and data science.

Translated: 2024-08-28 01:07:17 Published: 2024-08-24
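The abstract above combines many per-sentence tests with a method that is sensitive to rare alternatives, but it does not name the combiner. One standard statistic for that setting is Higher Criticism (Donoho and Jin), which scans the sorted p-values for an excess of small ones. The sketch below assumes Higher Criticism as the combiner and assumes that a p-value per sentence is already available; both are illustrative assumptions, not a statement of the paper's procedure.

```python
import math


def higher_criticism(p_values, gamma=0.5):
    """Higher Criticism statistic over the smallest gamma-fraction of p-values."""
    n = len(p_values)
    ordered = sorted(p_values)
    limit = max(1, int(gamma * n))
    best = -math.inf
    for i, p in enumerate(ordered[:limit], start=1):
        # Clamp to avoid division by zero for p-values of exactly 0 or 1.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Hypothetical per-sentence p-values: evenly spread under the null,
# then the same list with five sentences replaced by very small p-values.
null_ps = [(i + 0.5) / 100 for i in range(100)]
edited_ps = [1e-6] * 5 + null_ps[5:]

hc_null = higher_criticism(null_ps)
hc_edited = higher_criticism(edited_ps)
```

A large statistic suggests that a few sentences deviate from the model-generated null, and the sentences with the smallest p-values are the natural candidates for suspected edits.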

# Moyal deformation of the classical arrival time ( http://arxiv.org/abs/2309.00222v3 )

License: see the linked page

Dean Alvin L. Pablico, Eric A. Galapon,

The quantum time of arrival (TOA) problem requires the statistics of measured arrival times given only the initial state of a particle. Following the standard framework of quantum theory, the problem translates into finding an appropriate quantum image of the classical arrival time $\mathcal{T}_C(q,p)$, usually in operator form $\hat{\mathrm{T}}$. In this paper, we consider the problem anew within the phase space formulation of quantum mechanics. The resulting quantum image is a real-valued and time-reversal symmetric function $\mathcal{T}_M(q,p)$, given as a formal series in $\hbar^2$ with the classical arrival time as the leading term. It is obtained directly from the Moyal bracket relation with the system Hamiltonian and is hence interpreted as a Moyal deformation of the classical TOA. We investigate its properties and discuss how it bypasses the known obstructions to quantization by showing the isomorphism between $\mathcal{T}_M(q,p)$ and the rigged Hilbert space TOA operator constructed in [Eur. Phys. J. Plus 138, 153 (2023)], which always satisfies the time-energy canonical commutation relation (TECCR) for arbitrary analytic potentials. We then examine TOA problems for a free particle and a quartic oscillator potential as examples.

Translated: 2024-08-28 01:07:17 Published: 2024-08-24
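As a reading aid for the entry above, the structure the abstract describes can be written out explicitly. The Moyal bracket below is the textbook definition; the coefficient symbols $\mathcal{T}_n$ and the sign convention of the conjugacy relation are assumptions made here for illustration and may differ from the paper's notation.

```latex
\begin{aligned}
\{f, g\}_M &= \frac{2}{\hbar}\, f \,\sin\!\left[\frac{\hbar}{2}\left(
  \overleftarrow{\partial_q}\,\overrightarrow{\partial_p}
  - \overleftarrow{\partial_p}\,\overrightarrow{\partial_q}\right)\right] g
  = \{f, g\}_{\mathrm{PB}} + O(\hbar^2), \\
\mathcal{T}_M(q,p) &= \mathcal{T}_C(q,p) + \sum_{n \ge 1} \hbar^{2n}\, \mathcal{T}_n(q,p), \\
\{H, \mathcal{T}_M\}_M &= 1 \quad \text{(up to sign convention)}.
\end{aligned}
```

At order $\hbar^0$ the last relation reduces to the Poisson-bracket conjugacy between the Hamiltonian and the classical arrival time, which is why $\mathcal{T}_C$ appears as the leading term of the series.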

# Lyra: Orchestrating Dual Correction in Automated Theorem Proving ( http://arxiv.org/abs/2309.15806v4 )

License: see the linked page

Chuanyang Zheng, Haiming Wang, Enze Xie, Zhengying Liu, Jiankai Sun, Huajian Xin, Jianhao Shen, Zhenguo Li, Yu Li,

Large Language Models (LLMs) present an intriguing avenue for exploration in the field of formal theorem proving. Nevertheless, their full potential, particularly concerning the mitigation of hallucinations and refinement through prover error messages, remains an area that has yet to be thoroughly investigated. To enhance the effectiveness of LLMs in the field, we introduce Lyra, a new framework that employs two distinct correction mechanisms: Tool Correction (TC) and Conjecture Correction (CC). To implement Tool Correction in the post-processing of formal proofs, we leverage prior knowledge to utilize predefined prover tools (e.g., Sledgehammer) for guiding the replacement of incorrect tools. Tool Correction significantly contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof. In addition, we introduce Conjecture Correction, an error feedback mechanism designed to interact with the prover to refine formal proof conjectures with prover error messages. Compared to the previous refinement framework, the proposed Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts. Our method has achieved state-of-the-art (SOTA) performance on both miniF2F validation (48.0% -> 55.3%) and test (45.5% -> 51.2%). We also present 3 IMO problems solved by Lyra. We believe Tool Correction (post-processing for hallucination mitigation) and Conjecture Correction (subgoal adjustment from interaction with the environment) could provide a promising avenue for future research in this field.

Translated: 2024-08-28 01:07:17 Published: 2024-08-24
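The Tool Correction idea in the entry above (swap a failing prover tool for a predefined one such as Sledgehammer) can be sketched as a small retry loop. This is a simplified illustration, not Lyra's implementation: the proof is modelled as a list of step strings, whole steps are replaced rather than individual tactic calls, and `first_failure`, `CANDIDATE_TOOLS`, and the toy checker are hypothetical stand-ins for a real prover interface.

```python
# Illustrative candidate list; only Sledgehammer is named in the abstract.
CANDIDATE_TOOLS = ["sledgehammer", "by auto", "by simp"]


def tool_correction(steps, first_failure, candidates=CANDIDATE_TOOLS, max_rounds=10):
    """Replace failing steps with candidate tools until the proof checks.

    first_failure(steps) must return the index of the first failing step,
    or None if every step is accepted. Returns the repaired steps or None.
    """
    steps = list(steps)
    for _ in range(max_rounds):
        bad = first_failure(steps)
        if bad is None:
            return steps
        for tool in candidates:
            trial = steps[:bad] + [tool] + steps[bad + 1:]
            nxt = first_failure(trial)
            # Accept the swap if the failure disappears or moves later.
            if nxt is None or nxt > bad:
                steps = trial
                break
        else:
            return None  # no candidate repaired this step
    return steps if first_failure(steps) is None else None


def toy_first_failure(steps):
    """Toy checker (assumption): a step fails if it uses a made-up tactic."""
    for i, step in enumerate(steps):
        if "hallucinated" in step:
            return i
    return None


draft = ["have h: x", "by hallucinated_tactic", "qed"]
repaired = tool_correction(draft, toy_first_failure)
```

In a real setting the checker would call the proof assistant and could also return its error message, which is the signal the abstract's Conjecture Correction feeds back to the model.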

# Lemur: Harmonizing Natural Language and Code for Language Agents ( http://arxiv.org/abs/2310.06830v2 )

License: see the linked page

Yiheng Xu, Hongjin Su, Chen Xing, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu,

We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents. The evolution from language chat models to functional language agents demands that models not only master human interaction, reasoning, and planning but also ensure grounding in the relevant environments. This calls for a harmonious blend of language and coding capabilities in the models. Lemur and Lemur-Chat are proposed to address this necessity, demonstrating balanced proficiencies in both domains, unlike existing open-source models that tend to specialize in one or the other. Through meticulous pre-training using a code-intensive corpus and instruction fine-tuning on text and code data, our models achieve state-of-the-art averaged performance across diverse text and coding benchmarks among open-source models. Comprehensive experiments demonstrate Lemur's superiority over existing open-source models and its proficiency across various agent tasks involving human communication, tool usage, and interaction under fully and partially observable environments. The harmonization between natural and programming languages enables Lemur-Chat to significantly narrow the gap with proprietary models on agent abilities, providing key insights into developing advanced open-source agents adept at reasoning, planning, and operating seamlessly across environments. https://github.com/OpenLemur/Lemur

Translated: 2024-08-28 00:57:20 Published: 2024-08-24

# Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey ( http://arxiv.org/abs/2310.10060v5 )

License: see the linked page

Zijun Gao, Haibao Liu, Lingbo Li,

Data Augmentation (DA) has become a critical approach in Time Series Classification (TSC), primarily for its capacity to expand training datasets, enhance model robustness, introduce diversity, and reduce overfitting. However, the current landscape of DA in TSC is plagued with fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible and user-oriented tools. This study addresses these challenges through a comprehensive examination of DA methodologies within the TSC domain. Our research began with an extensive literature review spanning a decade, revealing significant gaps in existing surveys and necessitating a detailed analysis of over 100 scholarly articles to identify more than 60 distinct DA techniques. This rigorous review led to the development of a novel taxonomy tailored to the specific needs of DA in TSC, categorizing techniques into five primary categories: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. This taxonomy is intended to guide researchers in selecting appropriate methods with greater clarity. In response to the lack of comprehensive evaluations of foundational DA techniques, we conducted a thorough empirical study, testing nearly 20 DA strategies across 15 diverse datasets representing all types within the UCR time-series repository. Using ResNet and LSTM architectures, we employed a multifaceted evaluation approach, including metrics such as Accuracy, Method Ranking, and Residual Analysis, resulting in a benchmark accuracy of 84.98 ± 16.41% with ResNet and 82.41 ± 18.71% with LSTM. Our investigation underscored the inconsistent efficacies of DA techniques: for instance, methods like RGWs and Random Permutation significantly improved model performance, whereas others, like EMD, were less effective.

Translated: 2024-08-28 00:57:20 Published: 2024-08-24
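For readers unfamiliar with the kind of transformation the survey above evaluates, the sketch below shows two simple augmentations for a univariate series. Random Permutation is named in the abstract; jittering (adding Gaussian noise) is a common transformation-based technique included here as an assumption. The function names, segment count, and noise level are illustrative and are not the study's settings.

```python
import random


def jitter(series, sigma=0.03, rng=None):
    """Add independent Gaussian noise to every time step."""
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, sigma) for x in series]


def random_permutation(series, n_segments=4, rng=None):
    """Split the series into contiguous segments and shuffle their order."""
    rng = rng or random.Random()
    n = len(series)
    # Segment boundaries at roughly equal spacing.
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    segments = [series[bounds[i]:bounds[i + 1]] for i in range(n_segments)]
    rng.shuffle(segments)
    return [x for segment in segments for x in segment]


series = [float(t) for t in range(12)]
noisy = jitter(series, rng=random.Random(0))
permuted = random_permutation(series, n_segments=4, rng=random.Random(0))
```

Both functions keep the series length and the class label unchanged, which is what lets the augmented copies be added directly to a training set.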

# Learning with Unmasked Tokens Drives Stronger Vision Learners ( http://arxiv.org/abs/2310.13593v3 )

License: see the linked page

Taekyung Kim, Sanghyuk Chun, Byeongho Heo, Dongyoon Han,

Masked image modeling (MIM) has become a leading self-supervised learning strategy. MIMs such as Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens for the encoder to process, with the decoder reconstructing the masked tokens to the input. However, MIM pre-trained encoders often exhibit a limited attention span, attributed to MIM's sole focus on regressing masked tokens, which may impede the encoder's broader context learning. To tackle this limitation, we improve MIM by explicitly incorporating unmasked tokens into the training process. Specifically, our method enables the encoder to learn from broader context supervision, allowing unmasked tokens to experience broader contexts while the decoder reconstructs masked tokens. Thus, the encoded unmasked tokens are equipped with extensive contextual information, empowering masked tokens to leverage the enhanced unmasked tokens for MIM. As a result, our simple remedy trains more discriminative representations, as shown by achieving 84.2% top-1 accuracy with ViT-B on ImageNet-1K, a 0.6%p gain. We attribute the success to the enhanced pre-training method, as evidenced by the singular value spectrum and attention analyses. Finally, our models achieve significant performance gains on downstream semantic segmentation and fine-grained visual classification tasks, as well as on diverse robust evaluation metrics. Code is available at https://github.com/naver-ai/lut

Translated: 2024-08-28 00:57:20 Published: 2024-08-24
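The random masking step that MAE-style MIM starts from, as described in the entry above, can be sketched as follows. This shows only the baseline split of patch tokens into visible (unmasked) and masked sets, not the paper's added supervision on unmasked tokens. The function name, the 75% ratio, and the 196-token example (a 14 x 14 patch grid) are assumptions for illustration.

```python
import random


def random_mask(num_tokens, mask_ratio=0.75, rng=None):
    """Randomly split token indices into (visible, masked) index lists."""
    rng = rng or random.Random()
    indices = list(range(num_tokens))
    rng.shuffle(indices)
    n_masked = int(num_tokens * mask_ratio)
    masked = sorted(indices[:n_masked])
    visible = sorted(indices[n_masked:])
    return visible, masked


# Example: 196 patch tokens, of which three quarters are hidden from the encoder.
visible, masked = random_mask(196, mask_ratio=0.75, rng=random.Random(0))
```

In this baseline the encoder sees only `visible` and the decoder is trained to reconstruct the patches at `masked`; the entry above argues for also giving the visible tokens a training signal of their own.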

# Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value ( http://arxiv.org/abs/2311.00582v5 )

License: see the linked page

Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie,

We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of a game and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm that solves a convex optimization problem with linear constraints and then performs random perturbation to obtain a modification plan with a near-optimal cost. The code for our algorithm is available at https://github.com/YoungWu559/game-modification .

Translated: 2024-08-28 00:57:20 Published: 2024-08-24
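To illustrate what "installing" a unique equilibrium means in the simplest case, consider a single-state zero-sum matrix game and a deterministic target profile. If the target entry is a strict saddle point (strictly better for the row maximizer than any other row in its column, and strictly better for the column minimizer than any other column in its row), then it is the unique Nash equilibrium, because equilibria of zero-sum games are interchangeable. The sketch below checks that condition only. It is not the paper's algorithm, which handles Markov games, stochastic profiles, and cost minimization; the function name, the `margin` parameter, and the example matrices are assumptions.

```python
def is_strict_saddle(payoff, i_star, j_star, margin=0.0):
    """Check whether (i_star, j_star) is a strict saddle point of a matrix game.

    payoff[i][j] is the reward to the row player (maximizer); the column
    player (minimizer) receives its negative.
    """
    value = payoff[i_star][j_star]
    # The row player must strictly lose by deviating to any other row.
    row_ok = all(payoff[i][j_star] < value - margin
                 for i in range(len(payoff)) if i != i_star)
    # The column player must strictly lose by deviating to any other column.
    col_ok = all(payoff[i_star][j] > value + margin
                 for j in range(len(payoff[0])) if j != j_star)
    return row_ok and col_ok


# Matching pennies has no pure equilibrium, so no entry is a saddle point.
original = [[1, -1], [-1, 1]]
# A hypothetical modified reward table in which (0, 0) is a strict saddle with value 1.
modified = [[1, 3], [0, 2]]

target_installed = is_strict_saddle(modified, 0, 0)
```

A positive `margin` asks for the target to beat every deviation by a fixed gap, which makes the check robust to small perturbations of the rewards.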

# A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering ( http://arxiv.org/abs/2311.07536v3 )

License: see the linked page

Yunxin Li, Longyue Wang, Baotian Hu, Xinyu Chen, Wanqi Zhong, Chenyang Lyu, Wei Wang, Min Zhang,

The emergence of multimodal large models (MLMs) has significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA). Yet, the true challenge lies in the domain of knowledge-intensive VQA tasks, which necessitate not just recognition of visual elements, but also a deep comprehension of the visual information in conjunction with a vast repository of learned knowledge. To uncover such capabilities of MLMs, particularly the newly introduced GPT-4V and Gemini, we provide an in-depth evaluation from three perspectives: 1) Commonsense Knowledge, which assesses how well models can understand visual cues and connect them to general knowledge; 2) Fine-grained World Knowledge, which tests the model's skill in reasoning out specific knowledge from images, showcasing its proficiency across various specialized fields; 3) Comprehensive Knowledge with Decision-making Rationales, which examines the model's capability to provide logical explanations for its inference, facilitating a deeper analysis from the interpretability perspective. Additionally, we utilize a visual knowledge-enhanced training strategy and a multimodal retrieval-augmented generation approach to enhance MLMs, highlighting the future need for advancements in this research direction. Extensive experiments indicate that: a) GPT-4V demonstrates enhanced explanation generation when using composite images as few-shots; b) GPT-4V and other MLMs produce severe hallucinations when dealing with world knowledge; c) visual knowledge-enhanced training and prompting techniques have the potential to improve performance. Codes: https://github.com/HITsz-TMG/Cognitive-Visual-Language-Mapper

翻訳日:2024-08-28 00:57:20 公開日:2024-08-24

# minimax: JAX における Autocurricula の効率的なベースライン

minimax: Efficient Baselines for Autocurricula in JAX ( http://arxiv.org/abs/2311.12716v3 )

ライセンス: Link先を確認

Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel,

(参考訳) 教師なし環境設計(英語: Unsupervised Environment Design, UED)は、堅牢な意思決定エージェントを訓練し、目に見えない環境にゼロショットで移行するための自動カリキュラム学習の形式である。このようなオートカリキュラムは、RLコミュニティから大きな関心を集めている。しかし、CPUロールアウトとGPUモデルの更新に基づくUED実験は、しばしば数週間のトレーニングを必要としてきた。この計算要求は、この分野の急速な革新の大きな障害である。この研究は、アクセラレータハードウェア上でのUEDトレーニングのためのminimaxライブラリを導入する。 JAXを使って完全にテンソル化された環境とオートカリキュラムアルゴリズムを実装することで、minimaxはトレーニングループ全体をハードウェアアクセラレーション向けにコンパイルできる。迅速な実験用のペトリ皿として、minimaxは手続き的に生成された環境でオートカリキュラムを行うための再利用可能な抽象化に加えて、MiniGridに基づくテンソル化グリッドワールドを含む。これらのコンポーネントによって minimax は、新たな並列化版を含む強力な UED ベースラインを提供し、同じバッチサイズでトレーニングした場合、以前の実装と比較して壁時計時間で 120$\times$ を超えるスピードアップを実現している。 minimaxライブラリはApache 2.0ライセンスで https://github.com/facebookresearch/minimax から入手できる。

Unsupervised environment design (UED) is a form of automatic curriculum learning for training robust decision-making agents to zero-shot transfer into unseen environments. Such autocurricula have received much interest from the RL community. However, UED experiments, based on CPU rollouts and GPU model updates, have often required several weeks of training. This compute requirement is a major obstacle to rapid innovation for the field. This work introduces the minimax library for UED training on accelerated hardware. Using JAX to implement fully-tensorized environments and autocurriculum algorithms, minimax allows the entire training loop to be compiled for hardware acceleration. To provide a petri dish for rapid experimentation, minimax includes a tensorized grid-world based on MiniGrid, in addition to reusable abstractions for conducting autocurricula in procedurally-generated environments. With these components, minimax provides strong UED baselines, including new parallelized variants, which achieve over 120$\times$ speedups in wall time compared to previous implementations when training with equal batch sizes. The minimax library is available under the Apache 2.0 license at https://github.com/facebookresearch/minimax.

翻訳日:2024-08-28 00:57:20 公開日:2024-08-24

# 分類から臨床への展望:大規模言語モデルを用いたモバイルおよび行動保健データの分析と分析に向けて

From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models ( http://arxiv.org/abs/2311.13063v3 )

ライセンス: Link先を確認

Zachary Englhardt, Chengqian Ma, Margaret E. Morris, Xuhai "Orson" Xu, Chun-Cheng Chang, Lianhui Qin, Daniel McDuff, Xin Liu, Shwetak Patel, Vikram Iyer,

(参考訳) ユビキタスセンサーから受動的に収集された行動健康データは、患者の日常生活に関する洞察をメンタルヘルスの専門家に提供するという大きな可能性を持っているが、このデータを臨床実践に使用する分析ツールを開発するには、デバイス間の一般化と、測定された信号と個人のメンタルヘルスの間の弱いあるいはあいまいな相関に関する課題に対処する必要がある。これらの課題に対処するために,我々は,大規模言語モデル(LLM)を活用して,多センサデータから臨床的に有用な知見を合成する,新しいアプローチを採っている。歩数や睡眠などのデータにおける傾向がうつ病や不安などの状態とどのように関係しているかについて,LLMを用いて推論を生成する思考の連鎖(chain of thought)プロンプト手法を構築した。まず,LLMによる二値うつ病分類を行い,最先端を上回る61.1%の精度を達成した。臨床利用に耐えるほど頑健ではないが,このことは我々の重要な発見につながる。すなわち,分類よりもさらに影響があり価値の高いアプローチは、新たな人間とAIのコラボレーションアプローチであり、臨床の専門家がこれらのツールを対話的にクエリし、自身のドメインの専門知識と患者に関するコンテキストをAIが生成した推論と組み合わせて臨床意思決定を支援する。 GPT-4のようなモデルは75%の割合で数値データを正確に参照しており、臨床参加者は、この手法を用いて自己追跡データを解釈することへの強い関心を表明している。

Passively collected behavioral health data from ubiquitous sensors holds significant promise to provide mental health professionals with insights into patients' daily lives; however, developing analysis tools to use this data in clinical practice requires addressing challenges of generalization across devices and weak or ambiguous correlations between the measured signals and an individual's mental health. To address these challenges, we take a novel approach that leverages large language models (LLMs) to synthesize clinically useful insights from multi-sensor data. We develop chain-of-thought prompting methods that use LLMs to generate reasoning about how trends in data such as step count and sleep relate to conditions like depression and anxiety. We first demonstrate binary depression classification with LLMs, achieving accuracies of 61.1%, which exceed the state of the art. While it is not robust for clinical use, this leads us to our key finding: even more impactful and valued than classification is a new human-AI collaboration approach in which clinician experts interactively query these tools and combine their domain expertise and context about the patient with AI-generated reasoning to support clinical decision-making. We find models like GPT-4 correctly reference numerical data 75% of the time, and clinician participants express strong interest in using this approach to interpret self-tracking data.
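
As a rough illustration of the prompting idea described above, the sketch below formats daily sensor summaries into a step-by-step reasoning prompt. The function name `build_cot_prompt`, the field names, the sample values, and the prompt wording are all hypothetical; the abstract does not give the authors' actual prompts.

```python
def build_cot_prompt(days):
    """Format daily sensor summaries into a step-by-step reasoning prompt.

    `days` is a list of dicts with hypothetical keys 'date', 'steps', 'sleep_hours'.
    """
    lines = [
        f"{d['date']}: steps={d['steps']}, sleep_hours={d['sleep_hours']}"
        for d in days
    ]
    return (
        "You are assisting a clinician. Daily self-tracking data:\n"
        + "\n".join(lines)
        + "\n\nThink step by step: describe trends in step count and sleep, "
        "explain how they may relate to depression or anxiety, "
        "and quote the exact numbers you rely on."
    )


# Made-up sample data, for illustration only.
example = build_cot_prompt([
    {"date": "2024-01-01", "steps": 8200, "sleep_hours": 7.5},
    {"date": "2024-01-02", "steps": 2100, "sleep_hours": 4.0},
])
print(example)
```

Asking the model to quote the numbers it relies on makes it easier to check afterwards whether its references to the data are correct.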

翻訳日:2024-08-28 00:57:20 公開日:2024-08-24

# スナップショット分光圧縮イメージングのための潜在拡散事前分布で強化した深層展開

Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging ( http://arxiv.org/abs/2311.14280v2 )

ライセンス: Link先を確認

Zongliang Wu, Ruiying Lu, Ying Fu, Xin Yuan,

(参考訳) スナップショット圧縮分光画像再構成は, 単発の2次元圧縮測定から3次元の空間スペクトル画像を再構成することを目的としている。既存の最先端の手法は、主に深層展開構造に基づいているが、固有の性能ボトルネックがある:$i$) 大きく劣化した測定値を扱う不良設定問題、そして$ii$) 回帰損失に基づく再構成モデルが細部の乏しい画像を復元しがちであること。本稿では,潜在拡散モデル(LDM)と呼ばれる生成モデルを導入し,劣化のない事前分布(prior)を生成して回帰に基づく深層展開法を強化する。さらに, LDMの大きな計算コストを克服するために, 深層展開デノイザ内で知識事前分布を生成する軽量モデルを提案し, これらの事前分布を統合して再構成プロセスを導き, 高品質なスペクトル信号の細部を補う。合成データセットと実世界のデータセットの数値的および視覚的比較は、再構成品質と計算効率の両面で提案手法の優位性を示している。コードはリリースされる。

Snapshot compressive spectral imaging reconstruction aims to reconstruct three-dimensional spatial-spectral images from a single-shot two-dimensional compressed measurement. Existing state-of-the-art methods are mostly based on deep unfolding structures but have intrinsic performance bottlenecks: $i$) the ill-posed problem of dealing with heavily degraded measurement, and $ii$) the regression loss-based reconstruction models being prone to recover images with few details. In this paper, we introduce a generative model, namely the latent diffusion model (LDM), to generate a degradation-free prior that enhances the regression-based deep unfolding method. Furthermore, to overcome the large computational cost challenge in LDM, we propose a lightweight model to generate knowledge priors in the deep unfolding denoiser, and integrate these priors to guide the reconstruction process for compensating high-quality spectral signal details. Numerical and visual comparisons on synthetic and real-world datasets illustrate the superiority of our proposed method in both reconstruction quality and computational efficiency. Code will be released.

翻訳日:2024-08-28 00:57:20 公開日:2024-08-24

# SPECT画像のマルチモーダル融合によるコントラストグラフクロスビュー学習を用いたパーキンソン病の分類と臨床像

Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features ( http://arxiv.org/abs/2311.14902v4 )

ライセンス: Link先を確認

Jun-En Ding, Chien-Chin Hsu, Feng Liu,

(参考訳) パーキンソン病(PD)は世界中の何百万もの人に影響を与え、運動に影響を与えている。以前の研究では、ディープラーニングをPD予測に利用し、主に医療画像に焦点を当て、データの基盤となる多様体構造を無視した。本研究では,画像特徴と非画像特徴の両方を包含するマルチモーダルアプローチを提案し,PD分類にコントラッシブなクロスビューグラフ融合を利用する。画像と臨床特徴の低次元表現から得られたグラフビューからの埋め込みを統合した,新しいマルチモーダル・コアテンション・モジュールを提案する。これにより、より堅牢で構造化された特徴抽出が実現され、マルチビューデータ分析が改善される。さらに、クロスビュー融合学習を強化するために、簡易なコントラッシブ・ロスベース融合法が考案された。グラフビューによるマルチモーダル手法は, 精度0.91, 受信機動作特性曲線0.93の領域を5倍のクロスバリデーションで達成する。また、単に機械学習ベースの手法と比較して、非画像データに対して優れた予測能力を示す。

Parkinson's Disease (PD) affects millions globally, impacting movement. Prior research utilized deep learning for PD prediction, primarily focusing on medical images, neglecting the data's underlying manifold structure. This work proposes a multimodal approach encompassing both image and non-image features, leveraging contrastive cross-view graph fusion for PD classification. We introduce a novel multimodal co-attention module, integrating embeddings from separate graph views derived from low-dimensional representations of images and clinical features. This enables more robust and structured feature extraction for improved multi-view data analysis. Additionally, a simplified contrastive loss-based fusion method is devised to enhance cross-view fusion learning. Our graph-view multimodal approach achieves an accuracy of 0.91 and an area under the receiver operating characteristic curve (AUC) of 0.93 in five-fold cross-validation. It also demonstrates superior predictive capabilities on non-image data compared to solely machine learning-based methods.

翻訳日:2024-08-28 00:46:25 公開日:2024-08-24

# DIPR: 動的イテレーションによる効率的なポイントクラウド登録

DIPR: Efficient Point Cloud Registration via Dynamic Iteration ( http://arxiv.org/abs/2312.02877v2 )

ライセンス: Link先を確認

Yang Ai, Qiang Bai, Jindong Li, Xi Yang,

(参考訳) ポイントクラウド登録(PCR)は、3Dビジョンにおいて必須のタスクである。既存の手法はますます精度を高めている。しかし、ポイントクラウド登録における重複しないポイントの多数は、多くの計算資源を消費し、登録精度に悪影響を及ぼす。この課題を克服するために、我々は、ダイナミックイテレーションフレームワークであるDIPRを通じて、スペーサー入力ポイントに基づいたオーバーラップポイントに対話的にフォーカスする、新しい効率的なポイントクラウド登録を導入する。我々は,効率的なコースツーファイン処理を実現するために,グローバルおよびローカルな登録段階を設計する。基本整合モジュールの他に,Refined Nodesでは,高密度クラスタリングを用いて重なり合う点の範囲を狭め,計算量を大幅に削減する手法を提案する。そして、SC分類器は、一致した精度に応じて登録プロセスを終了する早期終了機構として機能する。複数のデータセットに対する大規模な実験により,提案手法は,最先端の手法に比べて計算時間とGPUメモリ消費を著しく削減しつつ,優れた登録精度を実現することが示された。

Point cloud registration (PCR) is an essential task in 3D vision. Existing methods achieve increasingly higher accuracy. However, a large proportion of non-overlapping points in point cloud registration consume a lot of computational resources while negatively affecting registration accuracy. To overcome this challenge, we introduce a novel Efficient Point Cloud Registration via Dynamic Iteration framework, DIPR, that makes the neural network interactively focus on overlapping points based on sparser input points. We design global and local registration stages to achieve efficient coarse-to-fine processing. Beyond basic matching modules, we propose the Refined Nodes to narrow down the scope of overlapping points by using adopted density-based clustering to significantly reduce the computation amount. Our SC Classifier serves as an early-exit mechanism to terminate the registration process in time according to matching accuracy. Extensive experiments on multiple datasets show that our proposed approach achieves superior registration accuracy while significantly reducing computational time and GPU memory consumption compared to state-of-the-art methods.

翻訳日:2024-08-28 00:46:25 公開日:2024-08-24

# 継続的な敵防衛

Continual Adversarial Defense ( http://arxiv.org/abs/2312.09481v3 )

ライセンス: Link先を確認

Qian Wang, Yaoyao Liu, Hefei Ling, Yingwei Li, Qihao Liu, Ping Li, Jiazhong Chen, Alan Yuille, Ning Yu,

(参考訳) 視覚的分類器に対する敵の攻撃は、月々急速に進化しているため、可能な限り多くの既知の攻撃に対して、多くの防衛策が提案されている。しかし、防衛システムが動作している環境は動的であり、時間とともに現れる様々なユニークな攻撃を含むため、あらゆる種類の攻撃に一般化する防衛手法を設計することは現実的ではない。動的環境に対するよく整合したアプローチは、敵データをオンラインで継続的に収集し、自らを迅速に改善する防衛システムにある。そこで,我々は,挑戦的脅威モデルに対する実践的な防衛展開を提唱し,(1)壊滅的忘れを伴わない新たな攻撃への継続的な適応,(2)少数ショット適応,(3)メモリ効率の適応,(4)クリーンデータと逆データの両方において高い精度で攻撃列に適応する継続的敵防衛(CAD)フレームワークを初めて提案した。最先端の継続的学習、少数ショット学習、およびアンサンブル学習技術を探求し、統合し、原則を立証する。大規模な実験により, 現代の敵攻撃の複数段階に対するアプローチの有効性が検証され, 多数のベースライン法に対して有意な改善が見られた。特にCADは、前回の攻撃に対して優れた性能を維持しつつ、最小限の予算と低コストの防衛失敗に迅速に適応することができる。我々の研究は、動的および進化的攻撃に対する継続的な防御適応のための、新しいパラダイムに光を当てています。

In response to the rapidly evolving nature of adversarial attacks against visual classifiers on a monthly basis, numerous defenses have been proposed to generalize against as many known attacks as possible. However, designing a defense method that generalizes to all types of attacks is not realistic because the environment in which defense systems operate is dynamic and comprises various unique attacks that emerge as time goes on. A well-matched approach to the dynamic environment lies in a defense system that continuously collects adversarial data online to quickly improve itself. Therefore, we put forward a practical defense deployment against a challenging threat model and propose, for the first time, the Continual Adversarial Defense (CAD) framework that adapts to attack sequences under four principles: (1) continual adaptation to new attacks without catastrophic forgetting, (2) few-shot adaptation, (3) memory-efficient adaptation, and (4) high accuracy on both clean and adversarial data. We explore and integrate cutting-edge continual learning, few-shot learning, and ensemble learning techniques to qualify the principles. Extensive experiments validate the effectiveness of our approach against multiple stages of modern adversarial attacks and demonstrate significant improvements over numerous baseline methods. In particular, CAD is capable of quickly adapting with minimal budget and a low cost of defense failure while maintaining good performance against previous attacks. Our research sheds light on a brand-new paradigm for continual defense adaptation against dynamic and evolving attacks.

翻訳日:2024-08-28 00:46:25 公開日:2024-08-24

# アルツハイマー病検出のための分散型プライバシ保存モデル

A Distributed Privacy Preserving Model for the Detection of Alzheimer's Disease ( http://arxiv.org/abs/2312.10237v4 )

ライセンス: Link先を確認

Paul K. Mandal,

(参考訳) 急速に進歩する医療技術の時代には、医療データのセグメンテーションは避けられなくなり、分散データでトレーニングできるプライバシー保護機械学習アルゴリズムの開発が必要とされるようになった。特に、健康保険可搬性会計法(HIPAA)が課している厳格なプライバシー規制のために、機密性の高い医療データを統合することは、必ずしも選択肢ではない。本稿では,分散データからトレーニングできるHIPAA準拠のフレームワークについて紹介する。次に、認知症、重度の脳機能障害、特に予防的ケアを伴わない簡単な作業の妨げとなる重度の神経変性疾患であるアルツハイマー病(AD)検出のための多モード垂直連合モデルを提案する。この垂直連合学習(VFL)モデルは、HIPAAが課したプライバシー制約を尊重しながら、さまざまな医療データのソースをまたいだ協調学習を可能にする分散アーキテクチャを提供する。ここで提案されたVFLアーキテクチャは、法的なプライバシー制約を尊重しながら、さまざまな医療データのソースをまたいだ協調学習を可能にする、新しい分散アーキテクチャを提供する。複数のデータモダリティを活用することにより、AD検出の堅牢性と精度を向上させることができる。このモデルは、フェデレーション学習技術の進歩に寄与するだけでなく、医学研究におけるデータセグメンテーションによるハードルを克服する公約も持つ。

In the era of rapidly advancing medical technologies, the segmentation of medical data has become inevitable, necessitating the development of privacy preserving machine learning algorithms that can train on distributed data. Consolidating sensitive medical data is not always an option particularly due to the stringent privacy regulations imposed by the Health Insurance Portability and Accountability Act (HIPAA). In this paper, I introduce a HIPAA compliant framework that can train from distributed data. I then propose a multimodal vertical federated model for Alzheimer's Disease (AD) detection, a serious neurodegenerative condition that can cause dementia, severely impairing brain function and hindering simple tasks, especially without preventative care. This vertical federated learning (VFL) model offers a distributed architecture that enables collaborative learning across diverse sources of medical data while respecting privacy constraints imposed by HIPAA. The VFL architecture proposed herein offers a novel distributed architecture, enabling collaborative learning across diverse sources of medical data while respecting statutory privacy constraints. By leveraging multiple modalities of data, the robustness and accuracy of AD detection can be enhanced. This model not only contributes to the advancement of federated learning techniques but also holds promise for overcoming the hurdles posed by data segmentation in medical research.
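
The forward pass of a vertical federated model can be sketched as follows: each party keeps its own feature columns for the same patient and shares only a learned embedding with a coordinator. This is a minimal sketch under stated assumptions, not the author's architecture: the function names, weights, and feature values are hypothetical, training (gradient exchange) is omitted, and nothing here by itself establishes HIPAA compliance.

```python
def local_embedding(features, weights):
    """Each party maps its private features to a small embedding (one linear layer)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def server_predict(embeddings, head):
    """The coordinator only sees embeddings, never the raw features."""
    joint = [v for emb in embeddings for v in emb]  # concatenate party embeddings
    score = sum(h * v for h, v in zip(head, joint))
    return 1 if score > 0 else 0


# Hypothetical: two parties hold different modalities for the same patient.
imaging_features = [0.2, 0.9, 0.4]           # held by party A
clinical_features = [1.0, 0.0]               # held by party B
w_a = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]    # party A's private weights
w_b = [[2.0, -1.0]]                          # party B's private weights

emb_a = local_embedding(imaging_features, w_a)
emb_b = local_embedding(clinical_features, w_b)
prediction = server_predict([emb_a, emb_b], head=[1.0, -1.0, 1.0])
```

The point of the split is that `imaging_features` and `clinical_features` never appear in the coordinator's inputs; only `emb_a` and `emb_b` cross the party boundary.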

翻訳日:2024-08-28 00:46:25 公開日:2024-08-24

# ロバスト目標音声抽出のための自己教師付き分離表現学習

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction ( http://arxiv.org/abs/2312.10305v3 )

ライセンス: Link先を確認

Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang,

(参考訳) 音声信号は、大域的な音響特性と局所的な意味情報の両方を含むため、本質的に複雑である。しかし、目標音声抽出のタスクでは、参照音声に含まれる大域的・局所的な意味情報のうち話者の同一性とは無関係な要素が、音声抽出ネットワーク内で話者の混同を引き起こす可能性がある。この課題を克服するために,自己教師付き分離表現学習法を提案する。提案手法は、参照音声符号化ネットワークとグローバル情報分離ネットワークを用いて、2段階のプロセスでこの問題に対処し、話者識別情報を他の無関係な要因から徐々に切り離す。我々は、音声抽出ネットワークを誘導するために、分離された話者識別情報のみを用いる。さらに、適応変調Transformerを導入し、混合信号の音響的表現が話者埋め込みによって乱れないようにする。このコンポーネントは、話者埋め込みを条件情報として取り込み、音声抽出ネットワークの自然かつ効率的なガイダンスを容易にする。実験により,本手法の有効性を実証し,話者混同の可能性を大幅に低減した。

Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the reference speech, which are irrelevant to speaker identity, can lead to speaker confusion within the speech extraction network. To overcome this challenge, we propose a self-supervised disentangled representation learning method. Our approach tackles this issue through a two-phase process, utilizing a reference speech encoding network and a global information disentanglement network to gradually disentangle the speaker identity information from other irrelevant factors. We exclusively employ the disentangled speaker identity information to guide the speech extraction network. Moreover, we introduce the adaptive modulation Transformer to ensure that the acoustic representation of the mixed signal remains undisturbed by the speaker embeddings. This component incorporates speaker embeddings as conditional information, facilitating natural and efficient guidance for the speech extraction network. Experimental results substantiate the effectiveness of our meticulously crafted approach, showcasing a substantial reduction in the likelihood of speaker confusion.

翻訳日:2024-08-28 00:46:25 公開日:2024-08-24

# 一般化されたパウリ安定化符号の2次元における位相順序の抽出

Extracting topological orders of generalized Pauli stabilizer codes in two dimensions ( http://arxiv.org/abs/2312.11170v3 )

ライセンス: Link先を確認

Zijian Liang, Yijia Xu, Joseph T. Iosue, Yu-An Chen,

(参考訳) 本稿では,2次元システムにおける並進不変な一般化パウリ安定化符号からトポロジカルデータを抽出するアルゴリズムを提案し,エニオン励起と弦演算子の解析に焦点を当てる。このアルゴリズムは、$d$ が素数でない場合を含む $\mathbb{Z}_d$ quditsに適用される。この能力により、$\mathbb{Z}_d$ トーリック符号とは異なるトポロジカル秩序を識別できる。これは、$\mathbb{Z}_p$ qudits ($p$は素数)のパウリ安定化符号が$\mathbb{Z}_p$ トーリック符号の有限個のコピーと自明な安定化符号に等価であるという確立された定理を超えて、我々の理解を拡張している。このアルゴリズムは、全てのエニオンとその弦演算子を決定し、融合規則、トポロジカルスピン、ブレイディング統計の計算を可能にするように設計されている。この方法は、トポロジカル秩序の同定を、ガウスの消去法や、打ち切られたローラン多項式のエルミート標準形およびスミス標準形を含む計算問題に変換する。さらに、このアルゴリズムは量子誤り訂正符号を研究するための体系的なアプローチを提供する。我々はこれを、2次元ハニカムカラーコードを修正した自己双対CSS量子符号や、ダブルセミオン型トポロジカル秩序や6セミオン型トポロジカル秩序を含む非CSS量子符号など、様々な符号に適用する。

In this paper, we introduce an algorithm for extracting topological data from translation invariant generalized Pauli stabilizer codes in two-dimensional systems, focusing on the analysis of anyon excitations and string operators. The algorithm applies to $\mathbb{Z}_d$ qudits, including instances where $d$ is a nonprime number. This capability allows the identification of topological orders that differ from the $\mathbb{Z}_d$ toric codes. It extends our understanding beyond the established theorem that Pauli stabilizer codes for $\mathbb{Z}_p$ qudits (with $p$ being a prime) are equivalent to finite copies of $\mathbb{Z}_p$ toric codes and trivial stabilizers. The algorithm is designed to determine all anyons and their string operators, enabling the computation of their fusion rules, topological spins, and braiding statistics. The method converts the identification of topological orders into computational tasks, including Gaussian elimination, the Hermite normal form, and the Smith normal form of truncated Laurent polynomials. Furthermore, the algorithm provides a systematic approach for studying quantum error-correcting codes. We apply it to various codes, such as self-dual CSS quantum codes modified from the 2d honeycomb color code and non-CSS quantum codes that contain the double semion topological order or the six-semion topological order.
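
For orientation, the sketch below tabulates the standard anyon data of the $\mathbb{Z}_d$ toric code, the reference theory against which the extracted fusion rules, topological spins, and braiding statistics are compared. It is not the paper's extraction algorithm. Anyons are labeled by a charge-flux pair `(e, m)`, phases are returned as fractions of a full turn (so `1/2` means a phase of $-1$), the function names are hypothetical, and the sign convention is one common choice.

```python
from fractions import Fraction


def topological_spin(a, d):
    """Spin phase (fraction of a full turn) of anyon a = (e, m): exp(2*pi*i*e*m/d)."""
    e, m = a
    return Fraction(e * m, d) % 1


def braiding_phase(a, b, d):
    """Mutual braiding phase (fraction of a full turn) between anyons a and b."""
    return Fraction(a[0] * b[1] + a[1] * b[0], d) % 1


def fuse(a, b, d):
    """Fusion adds charges and fluxes modulo d."""
    return ((a[0] + b[0]) % d, (a[1] + b[1]) % d)


# d = 2: the charge e = (1, 0) and flux m = (0, 1) fuse into the fermion (1, 1).
fermion = fuse((1, 0), (0, 1), 2)
```

For `d = 2`, `topological_spin(fermion, 2)` is `1/2` (a fermion), and `braiding_phase((1, 0), (0, 1), 2)` is `1/2`, the familiar $-1$ sign from braiding a charge around a flux.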

翻訳日:2024-08-28 00:46:25 公開日:2024-08-24

# 虚偽の否定とクラス不均衡に対する時系列コントラスト学習

Time-Series Contrastive Learning against False Negatives and Class Imbalance ( http://arxiv.org/abs/2312.11939v2 )

ライセンス: Link先を確認

Xiyuan Jin, Jing Wang, Lei Liu, Youfang Lin,

(参考訳) 表現学習における模範的な自己指導的アプローチとして、時系列コントラスト学習は現代研究において顕著な進歩を見せている。近年のコントラスト学習戦略は,適切な正と負の作法に重点を置いているが,本研究では理論的分析を行い,偽の負とクラス不均衡という,InfoNCEの損失に基づくフレームワークに固有の根本的な問題を見落としている。そこで本研究では,SimCLRフレームワークに基盤を置き,インスタンス識別タスクに係わるモデルに普遍的に適応する直感的な修正を導入する。インスタンス間の対話的学習を容易にするためにインスタンスグラフを構築することにより、複数インスタンス識別タスクを通じて教師付きコントラスト学習をエミュレートし、偽陰性の有害な影響を緩和する。さらに、グラフ構造と少ないラベル付きデータを活用し、半教師付き整合性分類を行い、マイノリティクラスの代表的能力を高める。提案手法を,4つの実世界の時系列データセット上で最も一般的な時系列比較学習法と比較し,全体的な性能において有意な優位性を示した。

As an exemplary self-supervised approach for representation learning, time-series contrastive learning has exhibited remarkable advancements in contemporary research. While recent contrastive learning strategies have focused on how to construct appropriate positives and negatives, in this study, we conduct theoretical analysis and find they have overlooked the fundamental issues: false negatives and class imbalance inherent in the InfoNCE loss-based framework. Therefore, we introduce a straightforward modification grounded in the SimCLR framework, universally adaptable to models engaged in the instance discrimination task. By constructing instance graphs to facilitate interactive learning among instances, we emulate supervised contrastive learning via the multiple-instances discrimination task, mitigating the harmful impact of false negatives. Moreover, leveraging the graph structure and few-labeled data, we perform semi-supervised consistency classification and enhance the representative ability of minority classes. We compared our method with the most popular time-series contrastive learning methods on four real-world time-series datasets and demonstrated our significant advantages in overall performance.
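
To make the false-negative issue concrete, here is a minimal sketch of the InfoNCE loss for a single anchor. The similarity values are made up, and `info_nce` is an illustrative helper rather than the authors' code. A "negative" that actually shares the anchor's class tends to be highly similar to it, so pushing it away inflates the loss.

```python
import math


def info_nce(pos_sim, neg_sims, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax of the positive among all candidates."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


# Same positive in both cases; the second set contains one highly similar "negative".
clean = info_nce(0.9, [0.1, 0.0, -0.2])
with_false_negative = info_nce(0.9, [0.85, 0.0, -0.2])
print(clean < with_false_negative)
```

When all candidates are equally similar, the loss equals the log of the number of candidates, which is a quick sanity check for any implementation.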

翻訳日:2024-08-28 00:36:11 公開日:2024-08-24

# AI研究に対するビッグテックの影響再考:アイデアの所属への帰属に関するミーム分析

Big Tech influence over AI research revisited: memetic analysis of attribution of ideas to affiliation ( http://arxiv.org/abs/2312.12881v2 )

ライセンス: Link先を確認

Stanisław Giziński, Paulina Kaczyńska, Hubert Ruczyński, Emilia Wiśnios, Bartosz Pieliński, Przemysław Biecek, Julian Sienkiewicz,

(参考訳) 人工知能(AI)研究のランドスケープでは、ビッグテックの支配に関する議論が増えているが、この現象に対する我々の理解はいまだ表面的である。本稿は、AI研究におけるビッグテックのリーチとパワーの理解を広げ、深化することを目的としている。本稿は、単なる出版量ではなく、新しいアイデアやミームの伝播における支配性に注目する。現在の研究は、影響の概念を、arXivや特定の学術会議のような限られたデータベースから得られる学術論文における所属の割合へと単純化しがちである。本稿の主な目的は、その影響の特定のニュアンスを解明し、どのAIアイデアが主にビッグテックのエンティティによって駆動されているかを決定することである。 AI指向の論文のアブストラクトとその引用ネットワークにネットワーク分析とミーム分析を適用することで、この現象に関するより深い知見を得ることができる。 OpenAlexとS2ORCの2つのデータベースを利用することで、従来の試みよりもはるかに大きなスケールでそのような分析を行うことができる。以上の結果から,ビッグテック関連論文は一部の領域では不釣り合いに多く引用されているものの,最も引用されている論文はビッグテックとアカデミアの両方に所属する論文であることが示唆された。最も伝染的なミームに着目すると、それらの特定の所属グループ(ビッグテック、アカデミア、混合所属)への帰属は、これら3つのグループに等しく分布しているように見える。これは、AI研究に対するビッグテックの支配という概念が、議論の中で過度に単純化されていることを示唆している。

There exists a growing discourse around the domination of Big Tech on the landscape of artificial intelligence (AI) research, yet our comprehension of this phenomenon remains cursory. This paper aims to broaden and deepen our understanding of Big Tech's reach and power within AI research. It highlights the dominance not merely in terms of sheer publication volume but rather in the propagation of new ideas or memes. Current studies often oversimplify the concept of influence to the share of affiliations in academic papers, typically sourced from limited databases such as arXiv or specific academic conferences. The main goal of this paper is to unravel the specific nuances of such influence, determining which AI ideas are predominantly driven by Big Tech entities. By employing network and memetic analysis on AI-oriented paper abstracts and their citation network, we are able to grasp a deeper insight into this phenomenon. By utilizing two databases: OpenAlex and S2ORC, we are able to perform such analysis on a much bigger scale than previous attempts. Our findings suggest that while Big Tech-affiliated papers are disproportionately more cited in some areas, the most cited papers are those affiliated with both Big Tech and Academia. Focusing on the most contagious memes, their attribution to specific affiliation groups (Big Tech, Academia, mixed affiliation) seems equally distributed between those three groups. This suggests that the notion of Big Tech domination over AI research is oversimplified in the discourse.

翻訳日:2024-08-28 00:36:11 公開日:2024-08-24

# コンテキストの復活:マルチモーダル知識グラフのリンク予測としてのカメラトラップ種別分類

Reviving the Context: Camera Trap Species Classification as Link Prediction on Multimodal Knowledge Graphs ( http://arxiv.org/abs/2401.00608v5 )

ライセンス: Link先を確認

Vardaan Pahuja, Weidi Luo, Yu Gu, Cheng-Hao Tu, Hong-You Chen, Tanya Berger-Wolf, Charles Stewart, Song Gao, Wei-Lun Chao, Yu Su,

(参考訳) カメラトラップは生物多様性の監視と保全のための動物生態学における重要なツールである。しかし、それらの実践的応用は、新しい未知の場所への一般化が乏しいといった問題によって制限されている。画像は典型的には様々な形態のコンテキストと結びついており、それらは異なるモダリティで存在しうる。本研究では,カメラトラップ画像に関連付けられた構造化コンテキストを利用して,カメラトラップにおける種分類タスクの分布外一般化を促進する。例えば、野生動物の写真は、撮影された時間と場所の詳細や、動物種に関する構造化された生物学的知識に関連付けられる。既存の研究ではしばしば見過ごされるが、そのようなコンテキストを組み込むことは、データの不足への対処や一般化の強化など、画像理解の改善にいくつかの潜在的な利点をもたらす。しかし、このような異種コンテキストを視覚領域に効果的に組み込むことは難しい問題である。そこで本研究では,種分類をマルチモーダル知識グラフ(KG)上のリンク予測として定式化する新しいフレームワークを提案する。このフレームワークは、視覚認識のための多様なマルチモーダルコンテキストのシームレスな統合を可能にする。本フレームワークをiWildCam2020-WILDSおよびSnapshot Mountain Zebraデータセットの分布外種分類に適用し,最先端のアプローチに匹敵する性能を実現する。さらに,本フレームワークは,データの少ない種を認識するためのサンプル効率を向上させる。

Camera traps are important tools in animal ecology for biodiversity monitoring and conservation. However, their practical application is limited by issues such as poor generalization to new and unseen locations. Images are typically associated with diverse forms of context, which may exist in different modalities. In this work, we exploit the structured context linked to camera trap images to boost out-of-distribution generalization for species classification tasks in camera traps. For instance, a picture of a wild animal could be linked to details about the time and place it was captured, as well as structured biological knowledge about the animal species. While often overlooked by existing studies, incorporating such context offers several potential benefits for better image understanding, such as addressing data scarcity and enhancing generalization. However, effectively incorporating such heterogeneous context into the visual domain is a challenging problem. To address this, we propose a novel framework that transforms species classification as link prediction in a multimodal knowledge graph (KG). This framework enables the seamless integration of diverse multimodal contexts for visual recognition. We apply this framework for out-of-distribution species classification on the iWildCam2020-WILDS and Snapshot Mountain Zebra datasets and achieve competitive performance with state-of-the-art approaches. Furthermore, our framework enhances sample efficiency for recognizing under-represented species.

翻訳日:2024-08-28 00:36:11 公開日:2024-08-24

# クロム二量体Cr$_2$の「パズル」に向けて:ボルン・オッペンハイマー近似における回転振動スペクトルの予測

Towards the "puzzle" of Chromium dimer Cr$_2$: predicting the Born-Oppenheimer rovibrational spectrum ( http://arxiv.org/abs/2401.03259v3 )

ライセンス: Link先を確認

Horacio Olivares-Pilón, Daniel Aguilar-Díaz, Alexander V. Turbiner,

(参考訳) Cr$_2$二量体の実験的に観測された非自明な電子構造のため、そのポテンシャルエネルギー曲線の計算は過去数十年にわたり理論上の難題となってきた。小さな核間距離 $R$ での摂動論と大きな距離 $R$ での多重極展開(いずれも漸近的な性質を持つと想定される)を接続し、Casey-Leopold (1993) の実験データから抽出された数個のRydberg-Klein-Rees (RKR) 転回点を加えることにより、Cr$_2$二量体の基底状態 $X^1\Sigma^+$ に対するポテンシャルエネルギー曲線の解析形が、核間距離 $R$ の全範囲にわたって初めて得られた。これは2点Padé近似の形をとり、29個の実験的振動エネルギーに対して3-4桁の精度を与える。得られた基底状態 $X^1\Sigma^+$ のポテンシャル曲線は、エネルギー $> 10^{-4}$ hartree の回転振動状態を 19694 個(角運動量ゼロでの最大振動量子数 $\nu_\text{max}=104$、最大角運動量 $L_\text{max}=312$)支え、さらに解離極限に近いエネルギー $< 10^{-4}$ hartree の弱束縛状態を 218 個支える。

The experimentally-observed non-trivial electronic structure of the Cr$_2$ dimer has made the calculation of its potential energy curve a theoretical challenge in the last decades. By matching the perturbation theory at small internuclear distances $R$ and the multipole expansion at large distances $R$ (supposedly both of asymptotic nature), and by adding a few Rydberg-Klein-Rees (RKR) turning points, extracted from experimental data by Casey-Leopold (1993), the analytic form of the potential energy curve for the ground state $X^1\Sigma^+$ of the Cr$_2$ dimer is found for the first time for the whole range of internuclear distances $R$. This has the form of a two-point Padé approximant and provides an accuracy of 3-4 decimal digits in 29 experimental vibrational energies. The resulting ground state $X^1\Sigma^+$ potential curve supports 19694 rovibrational states with a maximal vibrational number $\nu_\text{max}=104$ at zero angular momentum and with a maximal angular momentum $L_\text{max}=312$ with energies $> 10^{-4}$ hartree, and additionally 218 weakly-bound states (close to the dissociation limit) with energies $< 10^{-4}$ hartree.

翻訳日:2024-08-28 00:36:11 公開日:2024-08-24

# 幾何学的滑らかな運動量を持つランダム化カッツマルツ

Randomized Kaczmarz with geometrically smoothed momentum ( http://arxiv.org/abs/2401.09415v3 )

ライセンス: Link先を確認

Seth J. Alderman, Roan W. Luikart, Nicholas F. Marshall,

(参考訳) 本稿では, 線形最小二乗損失関数上の確率勾配勾配の例であるランダム化Kaczmarzアルゴリズムに幾何的に滑らかな運動量を加える効果について検討する。最小二乗損失を定義する行列の特異ベクトル方向の予測誤差に関する結果を証明する。結果の有用性を示す数値的な例をいくつか提示し,いくつかの疑問を呈する。

This paper studies the effect of adding geometrically smoothed momentum to the randomized Kaczmarz algorithm, which is an instance of stochastic gradient descent on a linear least squares loss function. We prove a result about the expected error in the direction of singular vectors of the matrix defining the least squares loss. We present several numerical examples illustrating the utility of our result and pose several questions.
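
A minimal sketch of the setting: the function below runs randomized Kaczmarz, sampling rows with probability proportional to their squared norms, and optionally replaces each step by a geometrically weighted average of past update directions. The abstract does not spell out the exact momentum scheme, so the smoothing form, the parameter name `beta`, and the test system are assumptions made for illustration. With `beta=0.0` the update reduces to standard randomized Kaczmarz.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, beta=0.0, seed=0):
    """Approximately solve A x = b by random row projections.

    beta in [0, 1) mixes each new projection direction into a running,
    geometrically weighted average y (assumed form of the momentum).
    """
    rng = random.Random(seed)
    row_norms = [sum(v * v for v in row) for row in A]
    x = [0.0] * len(A[0])
    y = [0.0] * len(A[0])
    for _ in range(iters):
        # Sample a row with probability proportional to its squared norm.
        i = rng.choices(range(len(A)), weights=row_norms)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        scale = residual / row_norms[i]
        # Geometric smoothing of the Kaczmarz direction scale * A[i].
        y = [beta * yj + (1.0 - beta) * scale * a for yj, a in zip(y, A[i])]
        x = [xj + yj for xj, yj in zip(x, y)]
    return x


# Consistent 2x2 system whose exact solution is x = [1, -1].
solution = randomized_kaczmarz([[2.0, 1.0], [1.0, 3.0]], [1.0, -2.0])
print(solution)
```

Each plain Kaczmarz step projects the iterate onto the hyperplane of one equation, which is the same as a stochastic gradient step on the least squares loss with a row-dependent step size.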

翻訳日:2024-08-28 00:36:11 公開日:2024-08-24

# SpeechDPR: オープンドメイン音声質問応答のためのエンドツーエンド音声パッセージ検索

SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering ( http://arxiv.org/abs/2401.13463v3 )

ライセンス: Link先を確認

Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee,

(参考訳) 音声質問応答(Spoken Question Answering, SQA)は、与えられた音声パッセージの中から回答区間を見つけることで機械がユーザの質問に応答するために不可欠である。 SQAは、認識エラーや語彙外(OOV)の問題を避けるために、これまでASRなしで実現されてきた。しかし,機械がまず音声アーカイブから回答を含む可能性のあるパッセージを検索する必要があるという,オープンドメインSQA(openSQA)の現実的な問題は,これまで考慮されてこなかった。本稿では,openSQA問題の検索コンポーネントとして,知られている限り最初のエンドツーエンドフレームワークであるSpeech Dense Passage Retriever(SpeechDPR)を提案する。 SpeechDPRは、教師なしASR (UASR) とテキスト密検索器 (TDR) のカスケードモデルから知識を蒸留することにより、文レベルの意味表現を学習する。人手で書き起こした音声データは不要である。初期実験では、UASRとTDRのカスケードモデルに匹敵する性能を示し、UASRの性能が低い場合には有意に優れており、この手法が音声認識エラーに対してより頑健であることが確認された。

Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage. SQA has been previously achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of Open-domain SQA (openSQA), in which the machine must additionally first retrieve passages that possibly contain the answer from a spoken archive, was never considered. This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of the openSQA problem. SpeechDPR learns a sentence-level semantic representation by distilling knowledge from the cascading model of unsupervised ASR (UASR) and text dense retriever (TDR). No manually transcribed speech data is needed. Initial experiments showed performance comparable to the cascading model of UASR and TDR, and significantly better when UASR was poor, verifying this approach is more robust to speech recognition errors.
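
The retrieval step of a dense retriever can be sketched as ranking passages by the inner product between a question embedding and precomputed passage embeddings. The vectors and the helper name `rank_passages` below are hypothetical; in an end-to-end spoken system the embeddings would presumably come from speech encoders, which are not modeled here.

```python
def rank_passages(question_vec, passage_vecs, top_k=2):
    """Return indices of the top_k passages by inner product with the question embedding."""
    scores = [sum(q * p for q, p in zip(question_vec, vec)) for vec in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Made-up sentence-level embeddings for one question and three passages.
question = [0.9, 0.1, 0.0]
passages = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
top = rank_passages(question, passages)
print(top)
```

Because the passage embeddings do not depend on the question, they can be computed once for the whole archive and reused for every query.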

翻訳日:2024-08-28 00:26:06 公開日:2024-08-24

# 古典的に困難なハミルトニアンのクラスの基底状態を解く多項式時間の散逸に基づく量子アルゴリズム

A polynomial-time dissipation-based quantum algorithm for solving the ground states of a class of classically hard Hamiltonians ( http://arxiv.org/abs/2401.13946v7 )

ライセンス: Link先を確認

Zhong-Xia Shang, Zi-Han Chen, Chao-Yang Lu, Jian-Wei Pan, Ming-Cheng Chen,

(参考訳) 本研究では、あるクラスのハミルトニアンの基底状態を求める量子アルゴリズムを提案する。我々のアルゴリズムに現れる指数的スピードアップのメカニズムは、開放量子系における散逸に由来する。この散逸を利用するための中心的なアイデアは、ベクトル化と正規化により $n$-qubit 密度行列 $\rho$ を $2n$-qubit 純粋状態 $|\rho\rangle$ として扱うことである。そうすることによって、リンドブラッドマスター方程式(LME)は、非エルミートなハミルトニアン $L$ を持つシュレーディンガー方程式となる。したがって、 LME の定常状態 $\rho_{ss}$ は、$L^\dag L$ の形をしたハミルトニアンの基底状態 $|\rho_{ss}\rangle$ に対応する。 LMEのランタイムは、他のアルゴリズムにおけるハイゼンベルクスケーリング $\mathcal{O}(\zeta^{-1})$ とは対照的に、初期状態と基底状態の重なり $\zeta$ に依存しない。入力部分については、ハミルトニアン $H$ が与えられたとき、妥当な仮定の下で、$H-E_0=L^\dag L$ を満たす $L$ が存在するかどうかを判定し求解する多項式時間の古典的手続きを与える。出力部分については、基底状態 $|\rho_{ss}\rangle$ に対する任意の作用素の期待値を推定することを課題と定義する。これは、$|\rho_{ss}\rangle$ を準備することなく、$\rho_{ss}$ に対する効率的な測定プロトコルによって実行できる。我々は、実際に$|\rho_{ss}\rangle$を準備することの量子的困難さに関するいくつかの証拠を与え、これは、我々のアルゴリズムと量子位相推定のような射影に基づく量子アルゴリズムの間の潜在的な複雑性の分離を示唆する。さらに、我々のアルゴリズムで効率的に解けるハミルトニアンは、$\text{P}\neq \text{BQP}$を仮定すると古典的に困難なインスタンスを含むことを示す。その後、他の種類のハミルトニアンへの一般化や、アルゴリズムにおける「非線形」力学など、アルゴリズムの重要な側面について論じ、分析する。

In this work, we give a quantum algorithm for solving the ground states of a class of Hamiltonians. The mechanism of the exponential speedup that appeared in our algorithm comes from dissipation in open quantum systems. To utilize the dissipation, the central idea is to treat $n$-qubit density matrices $\rho$ as $2n$-qubit pure states $|\rho\rangle$ by vectorization and normalization. By doing so, the Lindblad master equation (LME) becomes a Schrödinger equation with non-Hermitian Hamiltonian $L$. The steady-state $\rho_{ss}$ of the LME, therefore, corresponds to the ground states $|\rho_{ss}\rangle$ of Hamiltonians with the form $L^\dag L$. The runtime of the LME has no dependence on $\zeta$, the overlap between the initial state and the ground state, compared with the Heisenberg scaling $\mathcal{O}(\zeta^{-1})$ in other algorithms. For the input part, given a Hamiltonian $H$, under plausible assumptions, we give a polynomial-time classical procedure to judge and solve whether there exists $L$ such that $H-E_0=L^\dag L$. For the output part, we define the mission as estimating expectation values of arbitrary operators with respect to the ground state $|\rho_{ss}\rangle$, which can be done surprisingly by an efficient measurement protocol on $\rho_{ss}$ with no need to prepare $|\rho_{ss}\rangle$. We give several pieces of evidence on the quantum hardness of really preparing $|\rho_{ss}\rangle$, which indicates a potential complexity separation between our algorithm and those projection-based quantum algorithms such as quantum phase estimation. Further, we show that the Hamiltonians that can be efficiently solved by our algorithms contain classically hard instances assuming $\text{P}\neq \text{BQP}$. Later, we discuss and analyze several important aspects of the algorithm including generalizing to other types of Hamiltonians and the "non-linear" dynamics in the algorithm.

翻訳日:2024-08-28 00:26:06 公開日:2024-08-24

# Synergy-of-Thoughts:ハイブリッド言語モデルにおける効率的な推論

Synergy-of-Thoughts: Eliciting Efficient Reasoning in Hybrid Language Models ( http://arxiv.org/abs/2402.02563v4 )

ライセンス: Link先を確認

Yu Shang, Yu Li, Fengli Xu, Yong Li,

(参考訳) 大規模言語モデル(LLM)は、広範囲のタスクにおいて顕著な創発能力を示しているが、関連する高価なAPIコストは、実際のアプリケーションを大幅に制限している。チェーン・オブ・ソート(CoT)やツリー・オブ・ソート(ToT)といったこれまでの研究は、主に精度の向上に重点を置いていたが、APIコストの急激な増加を見落としており、これは巨大な解空間を持つオープンエンドな実世界タスクでは特に問題となりうる。人間の認知の二重過程理論に触発されて、効率的な推論のために、異なるスケールのハイブリッドLLMの相乗的ポテンシャルを解き放つ「思考のシナジー」(Synergy of Thoughts, SoT)を提案する。デフォルトでは、SoTはより小規模の言語モデルを使用して、System 1の並列直感に類似した、低コストで直感的な思考を複数生成する。次に、直感的思考を相互評価する信頼度評価器を設計し、相互の対立を判定するための制御可能なしきい値機構を導入する。これらの直感的な思考が矛盾を示す場合、SoTはSystem 2の介入をエミュレートするためにスケールアップされた言語モデルの反射的推論を呼び出し、直感的な思考をオーバーライドして推論結果を修正する。このフレームワークはモデルに依存せず、トレーニング不要で、様々な既製のLLMで柔軟に実装できる。6つの代表的な推論タスクの実験では、SoTはAPIコストを38.3%-75.1%削減し、同時に最先端の推論精度とソリューションの多様性を達成している。特に、オープンエンドタスクの平均トークンコストの削減は最大69.1%に達する。

Large language models (LLMs) have shown impressive emergent abilities in a wide range of tasks, but the associated expensive API cost greatly limits the real application. Previous works like chain-of-thought (CoT) and tree-of-thoughts (ToT) have predominantly focused on enhancing accuracy, but overlook the rapidly increasing API cost, which could be particularly problematic for open-ended real-world tasks with huge solution spaces. Motivated by the dual process theory of human cognition, we propose "Synergy of Thoughts" (SoT) to unleash the synergistic potential of hybrid LLMs with different scales for efficient reasoning. By default, SoT uses smaller-scale language models to generate multiple low-cost intuitive thoughts, which resembles the parallel intuitions produced by System 1. We then design a confidence evaluator where the intuitive thoughts are cross-evaluated and introduce a controllable threshold mechanism to decide their mutual conflict. If these intuitive thoughts exhibit conflicts, SoT will invoke the reflective reasoning of scaled-up language models to emulate the intervention of System 2, which will override the intuitive thoughts and rectify the reasoning results. This framework is model-agnostic and training-free, and can be flexibly implemented with various off-the-shelf LLMs. Experiments on six representative reasoning tasks show that SoT substantially reduces the API cost by 38.3%-75.1%, and simultaneously achieves state-of-the-art reasoning accuracy and solution diversity. Notably, the average token cost reduction on open-ended tasks reaches up to 69.1%.
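The escalation logic described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `synergy_of_thoughts`, `small_model`, `large_model`, `num_thoughts` and `threshold` are hypothetical, and the confidence score used here (the share of samples that agree with the most common answer) is a simple stand-in assumption for the cross-evaluating confidence evaluator in the paper.

```python
from collections import Counter
from typing import Callable, List


def synergy_of_thoughts(
    question: str,
    small_model: Callable[[str], str],   # hypothetical cheap model call
    large_model: Callable[[str], str],   # hypothetical expensive model call
    num_thoughts: int = 5,
    threshold: float = 0.6,
) -> str:
    """Keep the cheap answer when the samples agree, otherwise escalate."""
    # "System 1": several low-cost intuitive answers from the small model.
    thoughts: List[str] = [small_model(question) for _ in range(num_thoughts)]

    # Stand-in confidence (an assumption): share of samples backing the top answer.
    answer, votes = Counter(thoughts).most_common(1)[0]
    confidence = votes / num_thoughts

    if confidence >= threshold:
        return answer

    # "System 2": the samples conflict, so the larger model overrides them.
    return large_model(question)


if __name__ == "__main__":
    # Toy stand-ins for real model calls.
    print(synergy_of_thoughts("2 + 2 = ?", lambda q: "4", lambda q: "4 (checked)"))
```

Raising `threshold` sends more questions to the large model and so trades cost for accuracy. That is the kind of control the abstract attributes to its threshold mechanism.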

翻訳日:2024-08-28 00:26:06 公開日:2024-08-24

# マルチアームバンディットに対するL2正規化ポリシー勾配アルゴリズムの収束性

Convergence of a L2 regularized Policy Gradient Algorithm for the Multi Armed Bandit ( http://arxiv.org/abs/2402.06388v2 )

ライセンス: Link先を確認

Stefana Anita, Gabriel Turinici,

(参考訳) マルチアームバンディット(MAB)とポリシー勾配アプローチはいずれも強化学習で最もよく使われるフレームワークであるが、MABに用いられるポリシー勾配アルゴリズムの理論的性質は十分に注目されてこなかった。本研究では、$L2$ 正規化項が 'softmax' パラメトリゼーションと併用される状況における、そのような手順の収束について検討する。我々は、適切な技術的仮説の下で収束を証明し、理論的な設定を超えた状況を含めて手順を数値的に検証する。実験の結果、特に初期推定値が解から遠い場合、時間依存の正規化手順が標準手法よりも改善できることが示された。

Although Multi Armed Bandit (MAB) on one hand and the policy gradient approach on the other hand are among the most used frameworks of Reinforcement Learning, the theoretical properties of the policy gradient algorithm used for MAB have not been given enough attention. We investigate in this work the convergence of such a procedure for the situation when a $L2$ regularization term is present jointly with the 'softmax' parametrization. We prove convergence under appropriate technical hypotheses and test numerically the procedure including situations beyond the theoretical setting. The tests show that a time dependent regularized procedure can improve over the canonical approach especially when the initial guess is far from the solution.
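To make the setting concrete, here is a minimal Python sketch of gradient ascent on an $L2$-regularized objective with a softmax policy. It is only an illustration under stated assumptions. It uses the exact expected gradient of $J(\theta)=\sum_a \pi_a(\theta)\,\mu_a-\tfrac{\gamma}{2}\lVert\theta\rVert^2$, not sampled rewards, and it keeps $\gamma$ constant, whereas the abstract also considers a time-dependent regularization. The function names, the mean rewards `mean_rewards`, the step size `lr` and the values below are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple


def softmax(theta: List[float]) -> List[float]:
    """Numerically stable softmax over arm preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_policy_gradient(
    mean_rewards: List[float],
    gamma: float = 0.01,   # L2 regularization strength (assumed constant here)
    lr: float = 0.5,       # step size (assumption)
    steps: int = 2000,
) -> Tuple[List[float], List[float]]:
    """Gradient ascent on J(theta) = sum_a pi_a * mu_a - (gamma / 2) * ||theta||^2."""
    theta = [0.0] * len(mean_rewards)
    for _ in range(steps):
        pi = softmax(theta)
        # Expected reward under the current policy.
        baseline = sum(p * r for p, r in zip(pi, mean_rewards))
        # dJ/dtheta_a = pi_a * (mu_a - baseline) - gamma * theta_a
        grad = [p * (r - baseline) - gamma * t
                for p, r, t in zip(pi, mean_rewards, theta)]
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta, softmax(theta)


if __name__ == "__main__":
    theta, pi = regularized_policy_gradient([0.2, 0.5, 0.8])
    print("preferences:", theta)
    print("policy:", pi)
```

The regularization term keeps the preferences `theta` bounded, so the policy concentrates on the best arm without becoming fully deterministic.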

翻訳日:2024-08-28 00:16:18 公開日:2024-08-24

# 表面符号復号のためのプログレッシブ・プロクシミティ・ビット・フリップ

Progressive-Proximity Bit-Flipping for Decoding Surface Codes ( http://arxiv.org/abs/2402.15924v2 )

ライセンス: Link先を確認

Michele Pacenti, Mark F. Flanagan, Dimitris Chytas, Bane Vasic,

(参考訳) トーリック符号や表面符号のようなトポロジカル量子符号は、エラーに対する堅牢性と量子ビット間の局所的な相互作用により、ハードウェア実装の優れた候補である。しかし、これらの符号を効率的に復号することは依然として課題であり、既存のデコーダは、低い計算複雑性(理想的には符号のブロック長に対して線形)、低い復号遅延、低い消費電力といった要件を満たしていないことが多い。本稿では、トーリック符号および表面符号に適した新しいビットフリップ(BF)デコーダを提案する。ビットを反転させるためのヒューリスティックな指標として近接ベクトルを導入し、隣接する量子ビット上の縮退した多重誤りを訂正する新しいサブルーチンを開発する。我々のアルゴリズムは2次の複雑性増加を持ち、最小重み完全マッチングやユニオンファインドのような最先端の復号アルゴリズムとは異なり動的メモリ上の操作を必要としないため、効率よく実装できる。提案するデコーダは、二元対称通信路上で、2次元トーリック符号に対して7.5%、回転平面符号に対して7%の復号しきい値を示す。

Topological quantum codes, such as toric and surface codes, are excellent candidates for hardware implementation due to their robustness against errors and their local interactions between qubits. However, decoding these codes efficiently remains a challenge: existing decoders often fall short of meeting requirements such as having low computational complexity (ideally linear in the code's blocklength), low decoding latency, and low power consumption. In this paper we propose a novel bit-flipping (BF) decoder tailored for toric and surface codes. We introduce the proximity vector as a heuristic metric for flipping bits, and we develop a new subroutine for correcting degenerate multiple errors on adjacent qubits. Our algorithm has quadratic complexity growth and it can be efficiently implemented as it does not require operations on dynamic memories, as do state-of-the-art decoding algorithms such as minimum weight perfect matching or union find. The proposed decoder shows a decoding threshold of 7.5% for the 2D toric code and 7% for the rotated planar code over the binary symmetric channel.
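For readers unfamiliar with bit-flipping decoding, the following Python sketch shows the plain classical version of the idea on a small parity-check code. It is not the proximity-vector decoder proposed in the paper and does not handle degenerate errors. The parity-check matrix `H` (a [7,4] Hamming code), the scoring rule (unsatisfied minus satisfied checks per bit) and the function names are assumptions chosen for illustration.

```python
from typing import List

# Parity-check matrix of the [7,4] Hamming code (an illustrative choice).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(H: List[List[int]], word: List[int]) -> List[int]:
    """One bit per check: 1 if the parity check is violated."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def bit_flip_decode(H: List[List[int]], word: List[int], max_iters: int = 10) -> List[int]:
    """Repeatedly flip the bit that looks most suspicious until all checks pass."""
    word = list(word)
    for _ in range(max_iters):
        s = syndrome(H, word)
        if not any(s):
            break  # all checks satisfied
        # Score each bit: +1 per violated check it touches, -1 per satisfied one.
        scores = []
        for j in range(len(word)):
            score = 0
            for i, row in enumerate(H):
                if row[j]:
                    score += 1 if s[i] else -1
            scores.append(score)
        word[scores.index(max(scores))] ^= 1
    return word


if __name__ == "__main__":
    received = [0, 0, 0, 0, 1, 0, 0]  # all-zero codeword with one flipped bit
    print(bit_flip_decode(H, received))
```

The decoder only reads the syndrome and a fixed matrix, so it needs no dynamically allocated data structures. The abstract names this property as an implementation advantage of bit-flipping approaches.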

翻訳日:2024-08-28 00:16:18 公開日:2024-08-24

# GCAN:fMRI機能接続性に基づく説明可能な認知劣化診断のための生成的非現実的注意誘導ネットワーク

GCAN: Generative Counterfactual Attention-guided Network for Explainable Cognitive Decline Diagnostics based on fMRI Functional Connectivity ( http://arxiv.org/abs/2403.01758v2 )

ライセンス: Link先を確認

Xiongri Shen, Zhenxi Song, Zhiguo Zhang,

(参考訳) fMRI機能的接続(FC)による軽度認知障害(MCI)と主観的認知低下(SCD)の診断が普及しているが、ほとんどのFCベースの診断モデルは因果推論を欠いたブラックボックスであり、認知低下に関するFCベースの神経バイオマーカーの知識にはほとんど寄与しない。診断モデルの説明可能性を高めるために、我々は生成的反実仮想注意誘導ネットワーク(GCAN)を提案する。GCANは反実仮想推論を導入して認知低下に関連する脳領域を認識し、これらの領域を注意マップとして用いて診断モデルの予測性能を向上させる。さらに、反実仮想推論に不可欠な、高度に構造化され脳アトラスに制約されたFCの生成の難しさに対処するため、Atlas-Aware Bidirectional Transformer (AABT) 法を考案した。AABTは双方向戦略を用いて、脳アトラスの各ネットワークからトークンをエンコードおよびデコードし、高品質なターゲットラベルFCを生成する。病院で収集したデータセットとADNIデータセットの実験では、生成された注意マップは、SCDとMCIに関する文献におけるFC異常とよく一致している。診断性能もベースラインモデルより優れている。コードはhttps://github.com/SXR3015/GCANで公開されている。

Diagnosis of mild cognitive impairment (MCI) and subjective cognitive decline (SCD) from fMRI functional connectivity (FC) has gained popularity, but most FC-based diagnostic models are black boxes lacking causal reasoning so they contribute little to the knowledge about FC-based neural biomarkers of cognitive decline. To enhance the explainability of diagnostic models, we propose a generative counterfactual attention-guided network (GCAN), which introduces counterfactual reasoning to recognize cognitive decline-related brain regions and then uses these regions as attention maps to boost the prediction performance of diagnostic models. Furthermore, to tackle the difficulty in the generation of highly-structured and brain-atlas-constrained FC, which is essential in counterfactual reasoning, an Atlas-Aware Bidirectional Transformer (AABT) method is developed. AABT employs a bidirectional strategy to encode and decode the tokens from each network of brain atlas, thereby enhancing the generation of high-quality target label FC. In the experiments of hospital-collected and ADNI datasets, the generated attention maps closely resemble FC abnormalities in the literature on SCD and MCI. The diagnostic performance is also superior to baseline models. The code is available at https://github.com/SXR3015/GCAN

翻訳日:2024-08-28 00:16:18 公開日:2024-08-24

# 複雑さが重要:スプリアス相関の存在下での特徴学習のダイナミクス

Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations ( http://arxiv.org/abs/2403.03375v3 )

ライセンス: Link先を確認

GuanWen Qiu, Da Kuang, Surbhi Goel,

(参考訳) 既存の研究は、ニューラルネットワークの最適化において、スプリアス特徴はコア特徴よりも学習しやすいと仮定することが多いが、それらの相対的な単純さの影響は、まだ十分に解明されていない。さらに、既存研究は主に特徴学習の学習ダイナミクスよりも、最終的な性能に焦点を当てている。本稿では、ブール関数解析に基づく理論的枠組みと関連する合成データセットを提案する。この設定により、スプリアス特徴の(コア特徴と比較した)相対的な複雑さと(ラベルに関する)相関強度をきめ細かく制御でき、スプリアス相関の下での特徴学習のダイナミクスを研究することができる。その結果、(1) より強いスプリアス相関やより単純なスプリアス特徴は、コア特徴の学習速度を低下させる、(2) コア特徴とスプリアス特徴を別々に学習する2つの異なるサブネットワークが形成される、(3) スプリアス特徴とコア特徴の学習フェーズは必ずしも分離可能ではない、(4) コア特徴が完全に学習された後も、スプリアス特徴は忘れられない、といういくつかの興味深い現象が明らかになった。我々の知見は、スプリアス相関を除去するために最終層を再訓練する手法の成功を正当化し、またスプリアス特徴の早期学習を利用する一般的なデバイアスアルゴリズムの限界を明らかにする。1隠れ層ReLUネットワークを用いてXOR特徴を学習する場合の理論的解析により、経験的発見を裏付ける。

Existing research often posits spurious features as easier to learn than core features in neural network optimization, but the impact of their relative simplicity remains under-explored. Moreover, studies mainly focus on end performance rather than the learning dynamics of feature learning. In this paper, we propose a theoretical framework and an associated synthetic dataset grounded in boolean function analysis. This setup allows for fine-grained control over the relative complexity (compared to core features) and correlation strength (with respect to the label) of spurious features to study the dynamics of feature learning under spurious correlations. Our findings uncover several interesting phenomena: (1) stronger spurious correlations or simpler spurious features slow down the learning rate of the core features, (2) two distinct subnetworks are formed to learn core and spurious features separately, (3) learning phases of spurious and core features are not always separable, (4) spurious features are not forgotten even after core features are fully learned. We demonstrate that our findings justify the success of retraining the last layer to remove spurious correlation and also identify limitations of popular debiasing algorithms that exploit early learning of spurious features. We support our empirical findings with theoretical analyses for the case of learning XOR features with a one-hidden-layer ReLU network.

翻訳日:2024-08-28 00:16:18 公開日:2024-08-24

# SheetAgent: 大規模言語モデルによるスプレッドシート推論と操作のための汎用エージェント

SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models ( http://arxiv.org/abs/2403.03636v2 )

ライセンス: Link先を確認

Yibin Chen, Yifu Yuan, Zeyu Zhang, Yan Zheng, Jinyi Liu, Fei Ni, Jianye Hao,

(参考訳) スプレッドシートの操作は、ほとんどの日常的な作業に広く存在し、作業効率を大幅に向上させる。大規模言語モデル(LLM)は、最近、自動スプレッドシート操作のために試みられているが、推論の課題が存在する複雑で現実的なタスク(例えば、多段階推論と曖昧な要求を含む長い地平線操作)では、まだ研究されていない。実世界の要件とのギャップを埋めるため、実生活の課題に起因する推論依存操作を伴う長期・多カテゴリタスクを特徴とするベンチマークである$\textbf{SheetRM}$を導入する。上記の課題を緩和するために、LLMの力を利用する新しい自律エージェントである$\textbf{SheetAgent}$を提案する。SheetAgentは、$\textit{Planner}$、$\textit{Informer}$、$\textit{Retriever}$の3つの協調モジュールで構成され、反復的なタスク推論とリフレクションを通じて、人間の介入なしにスプレッドシート上での高度な推論と正確な操作の両方を実現する。広範な実験により、SheetAgentは、複数のベンチマークでベースラインに対して20～30%のパスレート改善を実現し、スプレッドシート操作の精度向上と優れたテーブル推論能力を示している。詳細と視覚化はhttps://sheetagent.github.ioで公開されている。

Spreadsheet manipulation is common in most daily work and significantly improves working efficiency. Large language models (LLMs) have recently been applied to automatic spreadsheet manipulation but have not yet been investigated in complicated and realistic tasks where reasoning challenges exist (e.g., long horizon manipulation with multi-step reasoning and ambiguous requirements). To bridge the gap with the real-world requirements, we introduce $\textbf{SheetRM}$, a benchmark featuring long-horizon and multi-category tasks with reasoning-dependent manipulation caused by real-life challenges. To mitigate the above challenges, we further propose $\textbf{SheetAgent}$, a novel autonomous agent that utilizes the power of LLMs. SheetAgent consists of three collaborative modules: $\textit{Planner}$, $\textit{Informer}$, and $\textit{Retriever}$, achieving both advanced reasoning and accurate manipulation over spreadsheets without human interaction through iterative task reasoning and reflection. Extensive experiments demonstrate that SheetAgent delivers 20-30% pass rate improvements on multiple benchmarks over baselines, achieving enhanced precision in spreadsheet manipulation and demonstrating superior table reasoning abilities. More details and visualizations are available at https://sheetagent.github.io.

翻訳日:2024-08-28 00:16:18 公開日:2024-08-24

# MUC:ロバストな3D人体再構築のための非校正カメラの混合

MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction ( http://arxiv.org/abs/2403.05055v3 )

ライセンス: Link先を確認

Yitao Zhu, Sheng Wang, Mengjie Xu, Zixu Zhuang, Zhixin Wang, Kaidong Wang, Han Zhang, Qian Wang,

(参考訳) 複数のカメラは、人物の包括的なマルチビュービデオカバレッジを提供することができる。このマルチビューデータを融合することは、行動分析のようなタスクには不可欠だが、伝統的にカメラのキャリブレーションを必要とし、このプロセスはしばしば複雑である。さらに、従来の研究は、複数視点での自己遮蔽による課題と、人体形状推定の連続性を見落としている。本研究では、キャリブレーションされていない複数のカメラビューから3次元人体を再構築する手法を提案する。まず、トレーニング済みの人体エンコーダを用いて各カメラビューを個別に処理し、予測されたカメラ位置とともに、各ビューの人体モデルとパラメータの再構成を可能にする。ビュー全体にわたってモデルを単に平均化するのではなく、各カメラからの関節距離の推定分布に基づいて、すべての人体関節について個々のビューに重みを割り当てるように訓練されたニューラルネットワークを開発する。さらに、動的融合のために人体のメッシュ面に着目し、顔の表情と体形をシームレスに統合して、統一された人体モデルを構築する。本手法は、2つの公開データセット上での人体再構築において優れた性能を示し、従来研究のSMPLモデルからSMPL-Xモデルへと前進した。この拡張には、より複雑な手のポーズと表情が含まれており、再構築の詳細と精度が向上している。重要なのは、任意の台数のカメラの柔軟なアドホック展開をサポートし、さまざまなアプリケーションに大きな可能性を秘めていることだ。私たちのコードはhttps://github.com/AbsterZhu/MUCで公開されています。

Multiple cameras can provide comprehensive multi-view video coverage of a person. Fusing this multi-view data is crucial for tasks like behavioral analysis, although it traditionally requires camera calibration, a process that is often complex. Moreover, previous studies have overlooked the challenges posed by self-occlusion under multiple views and the continuity of human body shape estimation. In this study, we introduce a method to reconstruct the 3D human body from multiple uncalibrated camera views. Initially, we utilize a pre-trained human body encoder to process each camera view individually, enabling the reconstruction of human body models and parameters for each view along with predicted camera positions. Rather than merely averaging the models across views, we develop a neural network trained to assign weights to individual views for all human body joints, based on the estimated distribution of joint distances from each camera. Additionally, we focus on the mesh surface of the human body for dynamic fusion, allowing for the seamless integration of facial expressions and body shape into a unified human body model. Our method has shown excellent performance in reconstructing the human body on two public datasets, advancing beyond previous work from the SMPL model to the SMPL-X model. This extension incorporates more complex hand poses and facial expressions, enhancing the detail and accuracy of the reconstructions. Crucially, it supports the flexible ad-hoc deployment of any number of cameras, offering significant potential for various applications. Our code is available at https://github.com/AbsterZhu/MUC.

翻訳日:2024-08-28 00:06:22 公開日:2024-08-24

# 永久電流輸送における非エルミートフェルミ-ディラック分布

Non-Hermitian Fermi-Dirac Distribution in Persistent Current Transport ( http://arxiv.org/abs/2403.09569v2 )

ライセンス: Link先を確認

Pei-Xin Shen, Zhide Lu, Jose L. Lado, Mircea Trif,

(参考訳) 永久電流は外部電源を必要とせずに連続的に循環する。ここでは、その理論を非エルミート量子ハミルトニアンの枠組み内で散逸を含むように拡張する。グリーン関数形式を用いて、非エルミートフェルミ・ディラック分布を導入し、複素スペクトルのみに依存する永久電流の解析式を導出する。この式を、永久電流を支持する2つの散逸モデルに適用する。(i) 位相バイアスされた超伝導-常伝導-超伝導接合、(ii) 磁束が貫く常伝導リング。両系の永久電流は、出現する例外点のいずれにおいても異常を示さず、例外点の兆候は電流感受率においてのみ識別できることを示す。厳密対角化によってこの結果を検証し、有限温度および相互作用効果を考慮するよう拡張する。我々の定式化は、平衡にある非エルミート系の量子多体オブザーバブルを計算するための一般的な枠組みを提供し、非平衡シナリオへの拡張の可能性を持つ。

Persistent currents circulate continuously without requiring external power sources. Here, we extend their theory to include dissipation within the framework of non-Hermitian quantum Hamiltonians. Using Green's function formalism, we introduce a non-Hermitian Fermi-Dirac distribution and derive an analytical expression for the persistent current that relies solely on the complex spectrum. We apply our formula to two dissipative models supporting persistent currents: (i) a phase-biased superconducting-normal-superconducting junction; (ii) a normal ring threaded by a magnetic flux. We show that the persistent currents in both systems exhibit no anomalies at any emergent exceptional points, whose signatures are only discernible in the current susceptibility. We validate our findings by exact diagonalization and extend them to account for finite temperatures and interaction effects. Our formalism offers a general framework for computing quantum many-body observables of non-Hermitian systems in equilibrium, with potential extensions to non-equilibrium scenarios.

翻訳日:2024-08-28 00:06:22 公開日:2024-08-24

# DSP: 多次元変圧器の動的シーケンス並列性

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers ( http://arxiv.org/abs/2403.10266v3 )

ライセンス: Link先を確認

Xuanlei Zhao, Shenggan Cheng, Chang Chen, Zangwei Zheng, Ziming Liu, Zheming Yang, Yang You,

(参考訳) 長い列への多次元変換器のスケーリングは、様々な領域で必須である。しかし、大きなメモリ要求とそのようなシーケンスの遅い速度の課題は、シーケンス並列性を必要とする。既存のすべてのアプローチは、単一のシーケンス次元に沿ってシャードに制限された組込みシーケンス並列化のカテゴリに該当するため、かなりの通信オーバーヘッドが生じる。しかし、多次元変圧器の性質は、複数の列次元にまたがる独立計算を伴う。そこで本研究では,動的シーケンス並列性(DSP)を並列性の新たな抽象化として提案する。 DSPは効率的な再シャーディング戦略で計算段階に応じて全列の並列次元を動的に切り替える。 DSPは通信コストの大幅な削減、モジュール間の適応性、最小限の制約による実装の容易性を提供する。実験により、DSPは32.2%から10倍のスループット向上により25%未満の通信量で、最先端の組込みシーケンス並列化法よりも優れていることが示された。

Scaling multi-dimensional transformers to long sequences is indispensable across various domains. However, the challenges of large memory requirements and slow speeds of such sequences necessitate sequence parallelism. All existing approaches fall under the category of embedded sequence parallelism, which is limited to sharding along a single sequence dimension, thereby introducing significant communication overhead. However, the nature of multi-dimensional transformers involves independent calculations across multiple sequence dimensions. To this end, we propose Dynamic Sequence Parallelism (DSP) as a novel abstraction of sequence parallelism. DSP dynamically switches the parallel dimension among all sequences according to the computation stage with an efficient resharding strategy. DSP offers significant reductions in communication costs, adaptability across modules, and ease of implementation with minimal constraints. Experimental evaluations demonstrate DSP's superiority over state-of-the-art embedded sequence parallelism methods by remarkable throughput improvements ranging from 32.2% to 10x, with less than 25% communication volume.
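The idea of switching the sharded dimension between computation stages can be sketched in plain Python. This is only a single-process simulation under stated assumptions. "Devices" are list entries, the tensor is a 2-D list whose two axes stand for two sequence dimensions, and the helpers `shard` and `reshard_rows_to_cols` are hypothetical names, not part of any DSP implementation or communication library.

```python
from typing import List

Matrix = List[List[int]]


def shard(matrix: Matrix, num_devices: int, dim: int) -> List[Matrix]:
    """Split a 2-D list into equal chunks along dim (0 = rows, 1 = columns)."""
    if dim == 0:
        step = len(matrix) // num_devices
        return [matrix[d * step:(d + 1) * step] for d in range(num_devices)]
    step = len(matrix[0]) // num_devices
    return [[row[d * step:(d + 1) * step] for row in matrix]
            for d in range(num_devices)]


def reshard_rows_to_cols(shards: List[Matrix]) -> List[Matrix]:
    """Simulated all-to-all exchange: row-sharded data becomes column-sharded."""
    n = len(shards)
    step = len(shards[0][0]) // n
    out = []
    for dst in range(n):
        block = []
        for src in range(n):
            # Each source device sends the column tile owned by `dst`.
            for row in shards[src]:
                block.append(row[dst * step:(dst + 1) * step])
        out.append(block)
    return out


if __name__ == "__main__":
    full = [[r * 4 + c for c in range(4)] for r in range(4)]
    by_rows = shard(full, num_devices=2, dim=0)   # stage 1 computes along columns
    by_cols = reshard_rows_to_cols(by_rows)       # stage 2 computes along rows
    print(by_cols)
```

While data is sharded along one axis, each device holds complete slices along the other axis, so a computation that runs along that other axis needs no communication. Only the switch between stages requires an exchange.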

翻訳日:2024-08-28 00:06:22 公開日:2024-08-24

# EAS-SNN: 繰り返しスパイクニューラルネットワークを用いた事象検出のためのエンドツーエンド適応サンプリングと表現

EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks ( http://arxiv.org/abs/2403.12574v2 )

ライセンス: Link先を確認

Ziming Wang, Ziling Wang, Huaning Li, Lang Qin, Runhao Jiang, De Ma, Huajin Tang,

(参考訳) イベントカメラは、高いダイナミックレンジと時間分解能を持ち、特に動きのぼやけと困難な照明条件のシナリオにおいて、オブジェクト検出に最適である。しかし、ほとんどの既存手法は、高度な検出バックボーンと早期集約関数による時空間表現の最適化を優先しており、適応的なイベントサンプリングという重要な問題は、ほとんど未解決のままである。スパースなスパイク通信を通じてイベント駆動のパラダイムで動作するスパイキングニューラルネットワーク(SNN)は、この課題に対処するための自然な選択肢である。本研究では、スパイキングニューロンの神経ダイナミクスが理想的な時間的イベントサンプラーの動作と密接に一致していることを明らかにする。この知見に基づき、時間記憶を付加した再帰的畳み込みSNNを活用する新しい適応型サンプリングモジュールを提案し、イベントベース検出のための完全にエンドツーエンドで学習可能なフレームワークを実現する。さらに、スパイクベースのサンプリングモジュールで生じる電位分布の制御と性能劣化に対処するため、Residual Potential Dropout (RPD) と Spike-Aware Training (SAT) を導入する。ニューロモルフィック検出データセットの実証評価により、本手法は既存の最先端スパイクベース手法を、はるかに少ないパラメータと時間ステップで上回ることが示された。例えば、我々の方法では、Gen1データセットで4.4% mAPの改善が得られ、パラメータは38%少なく、3つの時間ステップしか必要としない。さらに、適応サンプリング手法の適用性および有効性は、従来の非スパイキングモデルに対するさらなる検証を通じて示されるように、SNN を超えて拡張される。コードはhttps://github.com/Windere/EAS-SNNで入手できる。

Event cameras, with their high dynamic range and temporal resolution, are ideally suited for object detection, especially under scenarios with motion blur and challenging lighting conditions. However, while most existing approaches prioritize optimizing spatiotemporal representations with advanced detection backbones and early aggregation functions, the crucial issue of adaptive event sampling remains largely unaddressed. Spiking Neural Networks (SNNs), which operate on an event-driven paradigm through sparse spike communication, emerge as a natural fit for addressing this challenge. In this study, we discover that the neural dynamics of spiking neurons align closely with the behavior of an ideal temporal event sampler. Motivated by this insight, we propose a novel adaptive sampling module that leverages recurrent convolutional SNNs enhanced with temporal memory, facilitating a fully end-to-end learnable framework for event-based detection. Additionally, we introduce Residual Potential Dropout (RPD) and Spike-Aware Training (SAT) to regulate potential distribution and address performance degradation encountered in spike-based sampling modules. Empirical evaluation on neuromorphic detection datasets demonstrates that our approach outperforms existing state-of-the-art spike-based methods with significantly fewer parameters and time steps. For instance, our method yields a 4.4% mAP improvement on the Gen1 dataset, while requiring 38% fewer parameters and only three time steps. Moreover, the applicability and effectiveness of our adaptive sampling methodology extend beyond SNNs, as demonstrated through further validation on conventional non-spiking models. Code is available at https://github.com/Windere/EAS-SNN.

翻訳日:2024-08-27 23:56:35 公開日:2024-08-24

# スパース符号化アーキテクチャによるモデル反転攻撃に対するロバスト性の改善

Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures ( http://arxiv.org/abs/2403.14772v2 )

ライセンス: Link先を確認

Sayanton V. Dibbo, Adam Breuer, Juston Moore, Michael Teti,

(参考訳) 最近のモデル反転攻撃アルゴリズムでは、敵がネットワークに繰り返しクエリすることで、ニューラルネットワークのプライベートかつ潜在的に機密性の高いトレーニングデータを再構築することができる。本研究では、この種の攻撃に対してより優れたロバスト性を得るために、スパース符号化層を利用した新しいネットワークアーキテクチャを開発する。30年にわたるコンピュータサイエンス研究は、画像のノイズ除去、オブジェクト認識、および敵対的誤分類という文脈でスパース符号化を研究してきたが、私たちの知る限りでは、最先端のプライバシー脆弱性との関連はまだ研究されていない。本研究では、ネットワークによって符号化される無関係なプライベート情報の量を、分類精度にはほとんど影響しない方法で制御できるため、スパース符号化アーキテクチャがモデル反転攻撃を防御する有利な手段になるという仮説を立てる。具体的には、さまざまな最先端の防御で訓練されたネットワークと比較して、スパース符号化アーキテクチャは、同等またはそれ以上の分類精度を維持しながら、様々な再構築品質指標(PSNR、SSIM、FID)において最先端のトレーニングデータ再構成を1.1～18.3倍劣化させる。この性能上の優位性は、CelebAの顔から医療画像、CIFAR-10まで5つのデータセットにまたがり、Plug-&-Play攻撃を含む様々な最先端のSGDベースおよびGANベースの反転攻撃に対して成り立つ。我々はクラスタ対応のPyTorchコードベースを提供し、研究を促進し、防御評価を標準化する。

Recent model inversion attack algorithms permit adversaries to reconstruct a neural network's private and potentially sensitive training data by repeatedly querying the network. In this work, we develop a novel network architecture that leverages sparse-coding layers to obtain superior robustness to this class of attacks. Three decades of computer science research has studied sparse coding in the context of image denoising, object recognition, and adversarial misclassification settings, but to the best of our knowledge, its connection to state-of-the-art privacy vulnerabilities remains unstudied. In this work, we hypothesize that sparse coding architectures suggest an advantageous means to defend against model inversion attacks because they allow us to control the amount of irrelevant private information encoded by a network in a manner that is known to have little effect on classification accuracy. Specifically, compared to networks trained with a variety of state-of-the-art defenses, our sparse-coding architectures maintain comparable or higher classification accuracy while degrading state-of-the-art training data reconstructions by factors of 1.1 to 18.3 across a variety of reconstruction quality metrics (PSNR, SSIM, FID). This performance advantage holds across 5 datasets ranging from CelebA faces to medical images and CIFAR-10, and across various state-of-the-art SGD-based and GAN-based inversion attacks, including Plug-&-Play attacks. We provide a cluster-ready PyTorch codebase to promote research and standardize defense evaluations.

翻訳日:2024-08-27 23:56:35 公開日:2024-08-24

# LLM-as-a-Judgeに対する最適化型プロンプトインジェクション攻撃

Optimization-based Prompt Injection Attack to LLM-as-a-Judge ( http://arxiv.org/abs/2403.17710v2 )

ライセンス: Link先を確認

Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong,

(参考訳) LLM-as-a-Judgeは、大規模言語モデル(LLM)を使用して、ある質問に対する候補セットから最適な応答を選択する。LLM-as-a-Judgeには、LLMを使った検索、AIフィードバックによる強化学習(RLAIF)、ツールの選択など、多くの応用がある。本稿では、LLM-as-a-Judgeに対する最適化ベースのプロンプトインジェクション攻撃であるJudgeDeceiverを提案する。JudgeDeceiverは、攻撃者が制御する候補応答に慎重に作成されたシーケンスを注入し、他の候補応答が何であれ、LLM-as-a-Judgeが攻撃者の選んだ質問に対してその候補応答を選択するようにする。具体的には、そのようなシーケンスの探索を最適化問題として定式化し、それを近似的に解く勾配ベースの手法を提案する。我々の広範な評価によると、JudgeDeceiverは極めて効果的であり、注入シーケンスを手動で作成する既存のプロンプトインジェクション攻撃や、我々の問題に拡張したジェイルブレイク攻撃よりもはるかに効果的である。また、LLMを用いた検索、RLAIF、ツール選択の3つのケーススタディにおいて、JudgeDeceiverの有効性を示す。さらに、既知回答検出、パープレキシティ検出、パープレキシティウィンドウ検出などの防御策も検討した。以上の結果から、これらの防御は不十分であり、新たな防御戦略開発の緊急の必要性が浮き彫りになった。

LLM-as-a-Judge uses a large language model (LLM) to select the best response from a set of candidates for a given question. LLM-as-a-Judge has many applications such as LLM-powered search, reinforcement learning with AI feedback (RLAIF), and tool selection. In this work, we propose JudgeDeceiver, an optimization-based prompt injection attack to LLM-as-a-Judge. JudgeDeceiver injects a carefully crafted sequence into an attacker-controlled candidate response such that LLM-as-a-Judge selects the candidate response for an attacker-chosen question no matter what other candidate responses are. Specifically, we formulate finding such a sequence as an optimization problem and propose a gradient-based method to approximately solve it. Our extensive evaluation shows that JudgeDeceiver is highly effective, and is much more effective than existing prompt injection attacks that manually craft the injected sequences and jailbreak attacks when extended to our problem. We also show the effectiveness of JudgeDeceiver in three case studies, i.e., LLM-powered search, RLAIF, and tool selection. Moreover, we consider defenses including known-answer detection, perplexity detection, and perplexity windowed detection. Our results show these defenses are insufficient, highlighting the urgent need for developing new defense strategies.

翻訳日:2024-08-27 23:56:35 公開日:2024-08-24

# 勾配降下アルゴリズムによる3dB限界を超える最適化された機械的直交位相スクイージング

Optimized mechanical quadrature squeezing beyond the 3-dB limit via a gradient-descent algorithm ( http://arxiv.org/abs/2404.13563v3 )

ライセンス: Link先を確認

Yu-Hong Liu, Jie-Qiao Liao,

(参考訳) 機械的直交位相スクイーズド状態の生成は、共振器オプトメカニクスにおいて重要な意味を持つ。なぜなら、スクイーズド状態は基本的な量子力学の理解と現代の量子技術の活用に広く応用されているからである。そこで本研究では、勾配降下アルゴリズムを用いて最適な共振器場駆動パルスを求めることにより、典型的な共振器オプトメカニカル系において機械的直交位相スクイージングを生成するための信頼性の高い手法を提案する。熱フォノン占有数が100であっても、3dBの定常状態限界を超える強い直交位相スクイージングを機械共振器において実現する。さらに、機械的スクイージングは1つの機械的振動周期内に超高速で生成できる。また、生成された機械的スクイージングに対応する最適なパルス駆動を求め、機械的スクイージング生成のメカニズムを解析する。本論文は、量子光学および量子情報科学における最適量子制御の応用を促進する。

The preparation of mechanical quadrature-squeezed states holds significant importance in cavity optomechanics because the squeezed states have extensive applications in understanding fundamental quantum mechanics and exploiting modern quantum technology. Here, we propose a reliable scheme for generating mechanical quadrature squeezing in a typical cavity optomechanical system via seeking optimal cavity-field driving pulses using the gradient-descent algorithm. We realize strong quadrature squeezing in a mechanical resonator that exceeds the 3-dB steady-state limit, even with a thermal phonon occupancy of 100. Furthermore, the mechanical squeezing can be ultrarapidly created within one mechanical oscillation period. We also obtain the optimal pulsed drivings associated with the created mechanical squeezings and analyze the mechanism for mechanical squeezing generation. This paper will promote the application of optimal quantum control in quantum optics and quantum information science.

翻訳日:2024-08-27 23:46:51 公開日:2024-08-24

# 顕著物体検出のための外部プロンプト特徴で強化したパラメータ効率の良い微調整

External Prompt Features Enhanced Parameter-efficient Fine-tuning for Salient Object Detection ( http://arxiv.org/abs/2404.15008v2 )

ライセンス: Link先を確認

Wen Liang, Peipei Ran, Mengchao Bai, Xiao Liu, P. Bilha Githinji, Wei Zhao, Peiwu Qin,

(参考訳) Salient Object Detection (SOD) は、画像中の最も顕著なオブジェクトを見つけ、ピクセルレベルのバイナリマスクを出力することを目的としている。トランスフォーマーに基づく手法は、顕著なオブジェクトの識別に不可欠なグローバルなセマンティック理解によって有望な性能を達成する。しかし、これらのモデルは大規模であり、多くの訓練パラメータを必要とする傾向にある。そこで本研究では、SOD用トランスフォーマーのポテンシャルをよりよく活用するために、顕著物体検出能力を高めつつ学習パラメータの削減を目的としたパラメータ効率の高い微調整手法を提案する。EXternal Prompt features Enhanced adapteR Tuning (ExPert) と呼ばれる我々のモデルは、凍結したトランスフォーマーエンコーダの層間にアダプタとインジェクタを配置したエンコーダ・デコーダ構造を特徴とする。アダプタモジュールは事前訓練されたバックボーンをSODに適合させ、インジェクタモジュールは外部のプロンプト特徴を組み込んで、顕著なオブジェクトの認識を高める。総合的な実験により、本手法の優位性を実証した。5つのSODデータセットで従来の最先端(SOTA)モデルを上回り、ExPertは80.2Mのトレーニングパラメータを持ち、ECSSDデータセットで0.215の平均絶対誤差(MAE)を達成し、SelfReformerより21%、EGNetより47%優れている。

Salient object detection (SOD) aims at finding the most salient objects in images and outputs pixel-level binary masks. Transformer-based methods achieve promising performance due to their global semantic understanding, crucial for identifying salient objects. However, these models tend to be large and require numerous training parameters. To better harness the potential of transformers for SOD, we propose a novel parameter-efficient fine-tuning method aimed at reducing the number of training parameters while enhancing the salient object detection capability. Our model, termed EXternal Prompt features Enhanced adapteR Tuning (ExPert), features an encoder-decoder structure with adapters and injectors interspersed between the layers of a frozen transformer encoder. The adapter modules adapt the pretrained backbone to SOD while the injector modules incorporate external prompt features to enhance the awareness of salient objects. Comprehensive experiments demonstrate the superiority of our method. Surpassing former state-of-the-art (SOTA) models across five SOD datasets, ExPert achieves 0.215 mean absolute error (MAE) in the ECSSD dataset with 80.2M trained parameters, 21% better than SelfReformer and 47% better than EGNet.

翻訳日:2024-08-27 23:46:51 公開日:2024-08-24

# 大規模言語モデルによる逆グラフィックスの再考

Re-Thinking Inverse Graphics With Large Language Models ( http://arxiv.org/abs/2404.15228v2 )

ライセンス: Link先を確認

Peter Kulits, Haiwen Feng, Weiyang Liu, Victoria Abrevaya, Michael J. Black,

(参考訳) 逆グラフィックス(画像を物理変数に変換し、レンダリングすると観察されたシーンを再現できるようにするタスク)は、コンピュータビジョンとグラフィックスの基本的な課題である。画像を、それを生成した3Dシーンのオブジェクトの形状、色、材料特性などの構成要素にうまく分解するには、環境を包括的に理解する必要がある。この複雑さは、慎重に設計された既存のアプローチがドメインをまたいで一般化する能力を制限する。大規模言語モデル(LLM)が新しい文脈に一般化するゼロショット能力に着想を得て、そのようなモデルに符号化された広い世界知識を活用して逆グラフィックス問題を解く可能性を調査する。そこで本研究では、LLMを中心とした逆グラフィックスフレームワークであるInverse-Graphics Large Language Model (IG-LLM) を提案する。IG-LLMは、視覚的埋め込みを構造化された合成的な3Dシーン表現へと自己回帰的にデコードする。我々は、凍結した事前学習済みのビジュアルエンコーダと連続的な数値ヘッドを組み込んで、エンドツーエンドのトレーニングを可能にする。本研究を通じて、画像空間の監督を使わずに、次トークン予測によって逆グラフィックスを促進するLLMの可能性を実証する。本分析は、LLMの視覚的知識を利用した画像に関する正確な空間的推論の新たな可能性を開く。調査の再現性を確保し今後の研究を促進するため、コードとデータをhttps://ig-llm.is.tue.mpg.de/で公開している。

Inverse graphics -- the task of inverting an image into physical variables that, when rendered, enable reproduction of the observed scene -- is a fundamental challenge in computer vision and graphics. Successfully disentangling an image into its constituent elements, such as the shape, color, and material properties of the objects of the 3D scene that produced it, requires a comprehensive understanding of the environment. This complexity limits the ability of existing carefully engineered approaches to generalize across domains. Inspired by the zero-shot ability of large language models (LLMs) to generalize to novel contexts, we investigate the possibility of leveraging the broad world knowledge encoded in such models to solve inverse-graphics problems. To this end, we propose the Inverse-Graphics Large Language Model (IG-LLM), an inverse-graphics framework centered around an LLM, that autoregressively decodes a visual embedding into a structured, compositional 3D-scene representation. We incorporate a frozen pre-trained visual encoder and a continuous numeric head to enable end-to-end training. Through our investigation, we demonstrate the potential of LLMs to facilitate inverse graphics through next-token prediction, without the application of image-space supervision. Our analysis enables new possibilities for precise spatial reasoning about images that exploit the visual knowledge of LLMs. We release our code and data at https://ig-llm.is.tue.mpg.de/ to ensure the reproducibility of our investigation and to facilitate future research.

翻訳日:2024-08-27 23:46:51 公開日:2024-08-24

# 物理インフォームドニューラルネットワークにおける最適時間サンプリング

Optimal time sampling in physics-informed neural networks ( http://arxiv.org/abs/2404.18780v2 )

ライセンス: Link先を確認

Gabriel Turinici,

(参考訳) 物理インフォームドニューラルネットワーク(英: Physics-informed Neural Network、PINN)は、科学計算応用における方程式の解法として非常に強力なパラダイムである。手順の重要な部分は、方程式が時間依存であるとき、時間サンプリングを含む方程式残差の最小化である。文献では、サンプリングは均一である必要はないが、初期時間は過重であるべきだと論じられたが、この選択には厳密な説明は提供されなかった。本研究では, ニューラルネットワーク収束に関する標準的な仮説として, 最適時間サンプリングが指数分布に追従することを示す。特に、均一な時間サンプリングを使用するのが最適な時期と、そうすべきでない時期について説明する。この結果は、線形方程式、バーガーズ方程式、ローレンツ系に関する数値的な例で示される。

Physics-informed neural networks (PINNs) are an extremely powerful paradigm used to solve equations encountered in scientific computing applications. An important part of the procedure is the minimization of the equation residual which includes, when the equation is time-dependent, a time sampling. It was argued in the literature that the sampling need not be uniform but should overweight initial time instants, but no rigorous explanation was provided for this choice. In the present work we take some prototypical examples and, under standard hypotheses concerning the neural network convergence, we show that the optimal time sampling follows a (truncated) exponential distribution. In particular we explain when it is best to use uniform time sampling and when one should not. The findings are illustrated with numerical examples on a linear equation, Burgers' equation and the Lorenz system.
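
To make the sampling idea in this abstract concrete, here is a minimal sketch of drawing collocation times from a truncated exponential distribution on $[0, T]$ by inverting its CDF. The function names and the `rate` parameter are illustrative assumptions, not taken from the paper; how the optimal rate should be chosen is exactly what the paper derives and is not reproduced here.

```python
import math
import random


def sample_truncated_exponential(rate, t_max, rng=random):
    """Draw one time in [0, t_max] with density proportional to exp(-rate * t).

    `rate` is a hypothetical tuning parameter: rate > 0 overweights early
    times, rate < 0 overweights late times, rate == 0 gives uniform sampling.
    """
    u = rng.random()
    if abs(rate) < 1e-12:
        return u * t_max  # limiting case: uniform sampling
    # Inverse of the CDF F(t) = (1 - exp(-rate*t)) / (1 - exp(-rate*t_max)).
    return -math.log(1.0 - u * (1.0 - math.exp(-rate * t_max))) / rate


def sample_collocation_times(n, rate, t_max, seed=0):
    """Return n collocation times for evaluating a time-dependent residual."""
    rng = random.Random(seed)
    return [sample_truncated_exponential(rate, t_max, rng) for _ in range(n)]
```

With a positive rate the sample mean falls below `t_max / 2`, which is the "overweight initial time instants" behaviour the abstract refers to.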

翻訳日:2024-08-27 23:46:51 公開日:2024-08-24

# 単元ブロック最適化スキームと古典的後処理を組み合わせた変分量子固有解法の最適化

Better Optimization of Variational Quantum Eigensolvers by Combining the Unitary Block Optimization Scheme with Classical Post-Processing ( http://arxiv.org/abs/2404.19027v4 )

ライセンス: Link先を確認

Xiaochuan Ding, Bryan K. Clark,

(参考訳) 変分量子固有解法(VQE)は、ハミルトンの古典的に難解な基底状態を見つけるための有望なアプローチである。 Unitary Block Optimization Scheme (UBOS) は最先端のVQE方式であり、ゲートを網羅し、他のゲート環境における各ゲートの最適パラメータを求める。 UBOSは、SGD (Stochastic Gradient Descent) に対する等級によって、基底状態への収束時間を改善する。それにもかかわらず、ショットノイズから生じる非常にノイズの多い期待値に直面して、収束率と最終的な収束エネルギーの両方に苦しむ。ここではUBOSを改良する2つの古典的後処理手法について述べる。ガウス過程回帰(GPR)を用いて、量子コンピュータからの原データを用いて人工的な拡張現実データを生成し、改良されたパラメータを解く際の全体的なエラーを低減する。 DROPR(Double Robust Optimization plus Rejection)を用いることで、非典型的にノイズの多いデータの外部への流出を防止し、特に誤った単一最適化ステップを発生させ、ノイズ測定に対するロバスト性を高める。これらの手法を組み合わせることで、UBOSが3倍の誤差で到達する最終的な相対誤差をさらに削減し、追加の量子測定やサンプリングオーバーヘッドを追加することなく実現できる。この研究は、古典的資源を用いて量子計測結果を後処理する技術を開発することにより、VQEアルゴリズムを著しく改善することを示した。

Variational Quantum Eigensolvers (VQE) are a promising approach for finding the classically intractable ground state of a Hamiltonian. The Unitary Block Optimization Scheme (UBOS) is a state-of-the-art VQE method which works by sweeping over gates and finding optimal parameters for each gate in the environment of other gates. UBOS improves the convergence time to the ground state by an order of magnitude over Stochastic Gradient Descent (SGD). It nonetheless suffers in both rate of convergence and final converged energies in the face of highly noisy expectation values coming from shot noise. Here we develop two classical post-processing techniques which improve UBOS especially when measurements have large noise. Using Gaussian Process Regression (GPR), we generate artificial augmented data using original data from the quantum computer to reduce the overall error when solving for the improved parameters. Using Double Robust Optimization plus Rejection (DROPR), we prevent outlying data which are atypically noisy from resulting in a particularly erroneous single optimization step thereby increasing robustness against noisy measurements. Combining these techniques further reduces the final relative error that UBOS reaches by a factor of three without adding additional quantum measurement or sampling overhead. This work further demonstrates that developing techniques which use classical resources to post-process quantum measurement results can significantly improve VQE algorithms.

翻訳日:2024-08-27 23:36:49 公開日:2024-08-24

# 視覚言語概念ボトルネックモデルにおける概念アライメントの改善

Improving Concept Alignment in Vision-Language Concept Bottleneck Models ( http://arxiv.org/abs/2405.01825v2 )

ライセンス: Link先を確認

Nithish Muthuchamy Selvaraj, Xiaobao Guo, Adams Wai-Kin Kong, Alex Kot,

(参考訳) 概念ボトルネックモデル (Concept Bottleneck Models, CBM) は、クラス予測を行う前に、イメージを人間の解釈可能な概念にマッピングする。近年のアプローチでは、大規模言語モデル(LLM)にテキスト概念の生成を促し、視覚言語モデル(VLM)を用いてこれらの概念をスコアリングしてCBM訓練に活用することにより、CBM構築を自動化する。しかし、LLMが生成したものよりも、人間の専門家が定義した概念でCBMを構築し、より信頼できるものにすることが望まれている。本研究では、鳥類の細粒度な種分類や動物分類などの領域において、専門家が定義した概念に対するVLM概念スコアの忠実性について詳しく検討する。その結果、CLIPのようなVLMは高い分類性能を達成しつつも、概念と対応する視覚入力を正しく関連付けるのに苦慮することが多いことが明らかとなった。このミスアライメントは、結果のモデルを解釈しにくく、信頼性の低いものにする。この問題に対処するために、少数のラベル付き概念サンプルを活用して正しい視覚的概念を活性化し、CLIPモデルにおける概念アライメントを改善する、新しい対照的半教師あり(Contrastive Semi-Supervised, CSS)学習法を提案する。3つのベンチマークデータセットに対する大規模な実験により、提案手法は概念精度(+29.95)と分類精度(+3.84)の両方を著しく向上させる一方で、人間が注釈付けした概念ラベルのごく一部しか必要としないことが示された。分類性能をさらに向上するために、細粒度分類問題向けのクラスレベルの介入手順を導入し、混同されやすいクラスを特定して、それらの概念空間に介入することで誤りを低減する。

Concept Bottleneck Models (CBM) map images to human-interpretable concepts before making class predictions. Recent approaches automate CBM construction by prompting Large Language Models (LLMs) to generate text concepts and employing Vision Language Models (VLMs) to score these concepts for CBM training. However, it is desired to build CBMs with concepts defined by human experts rather than LLM-generated ones to make them more trustworthy. In this work, we closely examine the faithfulness of VLM concept scores for such expert-defined concepts in domains like fine-grained bird species and animal classification. Our investigations reveal that VLMs like CLIP often struggle to correctly associate a concept with the corresponding visual input, despite achieving a high classification performance. This misalignment renders the resulting models difficult to interpret and less reliable. To address this issue, we propose a novel Contrastive Semi-Supervised (CSS) learning method that leverages a few labeled concept samples to activate truthful visual concepts and improve concept alignment in the CLIP model. Extensive experiments on three benchmark datasets demonstrate that our method significantly enhances both concept (+29.95) and classification (+3.84) accuracies yet requires only a fraction of human-annotated concept labels. To further improve the classification performance, we introduce a class-level intervention procedure for fine-grained classification problems that identifies the confounding classes and intervenes in their concept space to reduce errors.

翻訳日:2024-08-27 23:36:49 公開日:2024-08-24

# Time Evidence Fusion Network: 長期連続予測におけるマルチソースビュー

Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting ( http://arxiv.org/abs/2405.06419v2 )

ライセンス: Link先を確認

Tianxiang Zhan, Yuanpeng He, Yong Deng, Zhen Li,

(参考訳) 現実的なシナリオでは、特に大規模なデータセットを扱う場合、時系列予測には適時性が求められる。その結果、モデルアーキテクチャの探索は研究において常に注目される話題となっている。これらの性能要求を満たすため、情報融合の観点から新しいバックボーンを提案する。エビデンス理論に基づくBasic Probability Assignment (BPA) モジュールとTime Evidence Fusion Network (TEFN) の導入により、優れた性能を実現することができる。さらに、マルチソース情報融合の視点は、予測精度を効果的に向上させる。BPAがファジィ理論によって生成されることから、TEFNもかなりの解釈可能性を持つ。実データ実験では、TEFNはPatchTSTに匹敵する低い誤差で最先端を部分的に達成し、DLinearのような性能重視のモデルを上回る動作効率を実現した。また、TEFNは、ランダムなハイパーパラメータ選択において、高いロバスト性と小さな誤差変動を示す。TEFNは、単一の側面で究極を達成するモデルではなく、性能、正確性、安定性、解釈可能性のバランスをとるモデルである。

In practical scenarios, time series forecasting necessitates timeliness, especially when dealing with large datasets. Consequently, the exploration of model architectures remains a perennially trending topic in research. To meet these performance demands, we propose a novel backbone from the perspective of information fusion. Introducing the Basic Probability Assignment (BPA) Module and the Time Evidence Fusion Network (TEFN), based on evidence theory, allows us to achieve superior performance. In addition, the perspective of multi-source information fusion effectively improves the accuracy of forecasting. Because BPA is generated by fuzzy theory, TEFN also has considerable interpretability. In real data experiments, TEFN partially achieved state-of-the-art results, with low errors comparable to PatchTST and operating efficiency surpassing that of performance-oriented models such as DLinear. Meanwhile, TEFN shows high robustness and small error fluctuations under random hyperparameter selection. TEFN is not a model that achieves the ultimate in any single aspect, but a model that balances performance, accuracy, stability, and interpretability.

翻訳日:2024-08-27 23:36:49 公開日:2024-08-24

# プラケットモデル、セルオートマトン、測定誘起臨界性

Plaquette Models, Cellular Automata, and Measurement-induced Criticality ( http://arxiv.org/abs/2405.08286v2 )

ライセンス: Link先を確認

Hanchen Liu, Xiao Chen,

(参考訳) 本稿では、プラケット項と呼ばれる多スピン相互作用項が確率$1-p$で単一サイトスピン項に置き換えられる、2次元ランダム化プラケットモデルのクラスを提示する。$p$を変化させることで、基底状態の相転移、あるいは等価的に対称性作用素の相転移を観察する。$p$が変化するにつれて、対称性作用素は広がった状態から空間的に局在した状態へと変化する。これらのモデルは1+1Dランダム化セルオートマトンダイナミクスと等価に理解することができ、2D遷移を1+1D動的吸収相転移と解釈することができる。本稿では、3体あるいは5体の相互作用を持つプラケット項に主に着目し、遷移の普遍性クラスについて検討する。具体的には、5体相互作用を持つモデルが、1+1Dクリフォード力学で観測される測定誘起エンタングルメント相転移、およびランダムなバルクパウリ測定により誘導される2次元クラスター状態の境界エンタングルメント遷移と同じ普遍性クラスに属することを示す。この研究は、古典的なスピンモデル、セルオートマトン、ハイブリッドランダム回路における遷移の間の接続を確立する。

We present a class of two-dimensional randomized plaquette models, where the multi-spin interaction term, referred to as the plaquette term, is replaced by a single-site spin term with a probability of $1-p$. By varying $p$, we observe a ground state phase transition, or equivalently, a phase transition of the symmetry operator. We find that as we vary $p$, the symmetry operator changes from being extensive to being localized in space. These models can be equivalently understood as 1+1D randomized cellular automaton dynamics, allowing the 2D transition to be interpreted as a 1+1D dynamical absorbing phase transition. In this paper, our primary focus is on the plaquette term with three or five-body interactions, where we explore the universality classes of the transitions. Specifically, for the model with five-body interaction, we demonstrate that it belongs to the same universality class as the measurement-induced entanglement phase transition observed in 1+1D Clifford dynamics, as well as the boundary entanglement transition of the 2D cluster state induced by random bulk Pauli measurements. This work establishes a connection between transitions in classical spin models, cellular automata, and hybrid random circuits.

翻訳日:2024-08-27 23:27:05 公開日:2024-08-24

# パッチ付き視覚プロンプトインジェクタに対する視覚言語モデルの保護

Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors ( http://arxiv.org/abs/2405.10529v2 )

ライセンス: Link先を確認

Jiachen Sun, Changsheng Wang, Jiongxiao Wang, Yiwei Zhang, Chaowei Xiao,

(参考訳) 大規模言語モデルはますます顕著になり、人工知能の次のフロンティアとしてマルチモーダリティへのシフトを示唆している。視覚言語モデル(VLM)はこの進歩の最前線にあり、視覚とテキストのデータを組み合わせて理解と相互作用を強化する革新的な方法を提供している。しかし、この統合は攻撃面を拡大する。パッチベースの敵攻撃は、既存の多くの文献で示されているように、物理的な視覚応用において最も現実的な脅威モデルと考えられている。本稿では,VLMのターゲットコンテンツを生成するために,相手が相手のパッチを利用するようなパッチ付きビジュアルプロンプトインジェクションを提案する。本研究は, 画素単位のランダム化に対して, パッチを施した対向性刺激が感受性を示すことを明らかにした。この知見を活かして、スムージング技術に根ざした防御機構であるSmoothVLMを導入し、特に、パッチされた視覚的プロンプトインジェクタの脅威からVLMを保護するようにした。我々のフレームワークは、2つの主要なVLMにおいて攻撃成功率を0%から5.0%の範囲に格段に低下させ、67.3%から95.0%のコンテキスト回復を実現し、セキュリティとユーザビリティのバランスを示す。

Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attacks are considered the most realistic threat model in physical vision applications, as demonstrated in much of the existing literature. In this paper, we propose to address patched visual prompt injection, where adversaries exploit adversarial patches to generate target content in VLMs. Our investigation reveals that patched adversarial prompts exhibit sensitivity to pixel-wise randomization, a trait that remains robust even against adaptive attacks designed to counteract such defenses. Leveraging this insight, we introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, specifically tailored to protect VLMs from the threat of patched visual prompt injectors. Our framework significantly lowers the attack success rate to a range between 0% and 5.0% on two leading VLMs, while achieving around 67.3% to 95.0% context recovery of the benign images, demonstrating a balance between security and usability.
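
The abstract's observation that adversarial patches are sensitive to pixel-wise randomization suggests a generic smoothing defense: query the model on several randomly perturbed copies of the image and aggregate the answers. The sketch below illustrates that general idea only. The perturbation rule, the majority vote, and all names (`randomize_pixels`, `smoothed_predict`, `model`) are assumptions made for illustration and are not claimed to match SmoothVLM's actual procedure.

```python
import random
from collections import Counter


def randomize_pixels(image, fraction, rng):
    """Return a copy of `image` (rows of ints in 0..255) in which each pixel
    is independently resampled uniformly with probability `fraction`."""
    out = [row[:] for row in image]
    for r in range(len(out)):
        for c in range(len(out[r])):
            if rng.random() < fraction:
                out[r][c] = rng.randrange(256)
    return out


def smoothed_predict(model, image, n_copies=11, fraction=0.3, seed=0):
    """Majority vote of a hypothetical `model` callable over perturbed copies."""
    rng = random.Random(seed)
    votes = Counter(
        model(randomize_pixels(image, fraction, rng)) for _ in range(n_copies)
    )
    return votes.most_common(1)[0][0]
```

Here `model` stands in for any callable that maps an image to a hashable output, such as a response label. A real VLM returns free text, so the aggregation step would need a task-specific notion of agreement.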

翻訳日:2024-08-27 23:27:05 公開日:2024-08-24

# 衝突モデルにおける可変非マルコフ力学-コヒーレント輸送への応用

Tunable non-Markovian dynamics in a collision model: an application to coherent transport ( http://arxiv.org/abs/2405.10685v2 )

ライセンス: Link先を確認

Simone Rijavec, Giuseppe Di Pietra,

(参考訳) 非マルコビアン性の異なる環境に結合したシステムの情報力学を解析するための衝突モデルを提案する。量子ビットの固定および剛性貯留層に偏極チャネルを適用することにより、非マルコビアン性の度合いを制御する。偏極チャネルの効果を特徴付けるとともに、3つの相互作用する量子ビットの連鎖上の励起のコヒーレント輸送を研究するためにモデルを適用する。システム-環境結合強度と非マルコビアン性の程度がプロセスにどのように影響するかを示す。興味深いことに、マルコフ環境は励起のコヒーレント輸送を強化するために好まれる場合もある。

We propose a collision model to investigate the information dynamics of a system coupled to an environment with varying degrees of non-Markovianity. We control the degree of non-Markovianity by applying a depolarising channel to a fixed and rigid reservoir of qubits. We characterise the effect of the depolarising channel and apply the model to study the coherent transport of an excitation on a chain of three interacting qubits. We show how the system-environment coupling strength and the degree of non-Markovianity affect the process. Interestingly, in some cases a Markovian environment is preferable to enhance the coherent transport of the excitation.
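
For readers unfamiliar with the channel named in this abstract, the following minimal sketch applies a single-qubit depolarising channel in the common parametrisation $\rho \mapsto (1-p)\rho + p\,I/2$ and measures the resulting purity. Whether the paper uses this exact parametrisation, and how the channel is scheduled across the reservoir qubits, are not stated in the abstract; the function names are illustrative.

```python
def depolarise(rho, p):
    """Apply rho -> (1 - p) * rho + p * I/2 to a 2x2 density matrix.

    `rho` is a nested list of floats or complex numbers; p = 0 leaves the
    state untouched and p = 1 replaces it with the maximally mixed state.
    """
    half_identity = [[0.5, 0.0], [0.0, 0.5]]
    return [
        [(1 - p) * rho[i][j] + p * half_identity[i][j] for j in range(2)]
        for i in range(2)
    ]


def purity(rho):
    """Return Tr(rho^2), which is 1 for pure states and 1/2 when fully mixed."""
    total = sum(rho[i][k] * rho[k][i] for i in range(2) for k in range(2))
    return total.real
```

Increasing `p` erases more of the information a reservoir qubit carries about earlier collisions, which is one way to read the abstract's statement that the channel tunes the degree of non-Markovianity.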

翻訳日:2024-08-27 23:27:05 公開日:2024-08-24

# 実用的なマッハ・ツェンダー干渉計を用いた差動位相シフトQKD

Differential-phase-shift QKD with practical Mach-Zehnder interferometer ( http://arxiv.org/abs/2405.11760v2 )

ライセンス: Link先を確認

Akihiro Mizutani, Masanori Terashita, Junya Matsubayashi, Shogo Mori, Ibuki Matsukura, Suzuna Tagawa, Kiyoshi Tamaki,

(参考訳) 差動位相シフト(DPS)量子鍵配送は、単純な実装のため有望なプロトコルであり、コヒーレントパルス列と受動測定ユニットで実現可能である。DPSプロトコルを実装するためには、ユーザのデバイスにおける実用上の不完全性を取り入れたセキュリティ証明を確立することが重要であるが、既存のセキュリティ証明は、マッハ・ツェンダー干渉計を用いる測定ユニットに対して非現実的な仮定を置いている。本稿では、測定ユニットにおける主要な不完全性を組み込むことにより、DPSプロトコルの実装セキュリティを強化する。具体的には、既存のセキュリティ証明で想定されていた透過率が正確に$50\%$のビームスプリッタではなく、透過率の範囲が既知である実用的なビームスプリッタを使用することが可能になる。数値シミュレーションにより、理想値からの透過率の変動が$\pm0.5\%$である場合でも、鍵レートは0.57倍に低下するにとどまることが示された。この結果は、実用的な測定系を用いたDPSプロトコルの実現可能性を示すものである。

Differential-phase-shift (DPS) quantum key distribution stands as a promising protocol due to its simple implementation, which can be realized with a train of coherent pulses and a passive measurement unit. To implement the DPS protocol, it is crucial to establish security proofs incorporating practical imperfections in users' devices; however, existing security proofs make unrealistic assumptions on the measurement unit using a Mach-Zehnder interferometer. In this paper, we enhance the implementation security of the DPS protocol by incorporating a major imperfection in the measurement unit. Specifically, our proof enables us to use practical beam splitters with a known range of transmittance rather than one with exactly $50\%$ transmittance, as was assumed in the existing security proofs. Our numerical simulations demonstrate that even with fluctuations of $\pm0.5\%$ in the transmittance from the ideal value, the key rate degrades only by a factor of 0.57. This result highlights the feasibility of the DPS protocol with practical measurement setups.

翻訳日:2024-08-27 23:27:05 公開日:2024-08-24

# ディジタル双生児における生産プロセス最適化のためのスパースアテンション駆動品質予測

Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins ( http://arxiv.org/abs/2405.11895v2 )

ライセンス: Link先を確認

Yanlei Yin, Lihua Wang, Dinh Thai Hoang, Wenbo Wang, Dusit Niyato,

(参考訳) プロセス産業では、生産ラインの長期的かつ効率的な最適化には、生産ラインパラメータを微調整するために、運用状態のリアルタイムモニタリングと分析が必要である。しかし、運用論理の複雑さと生産プロセスパラメータの複雑な結合は、プロセス全体の正確な数学的モデルを開発するのを難しくし、効率的な最適化機構の展開を妨げる。これらの困難を鑑みて、我々は、データ駆動方式で運用ロジックを符号化することで、生産ラインのデジタルツインをデプロイすることを提案する。デジタル双生児における機器運用状況と製品品質指標を反映した実世界のデータを反復的にマッピングすることにより、自己注意型時間畳み込みニューラルネットワークに基づく生産プロセスの品質予測モデルを採用する。このモデルは、デジタルツインのデータ駆動状態の進化を可能にする。デジタルツインは、実際の動作条件の情報と品質に敏感な分析結果を集約する役割を担い、仮想現実性進化によるプロセス生産の最適化を容易にする。ディジタルツインを情報フローキャリアとして活用し、キープロセスインジケータから時間的特徴を抽出し、提案したディープニューラルネットワークに基づく生産プロセス品質予測モデルを確立する。本手法は,本手法により,仮想及び実生産ライン間のシームレスな統合を促進できることを示す。この統合により、平均動作状態予測精度が98%以上、製品品質受け入れ率が96%以上となる。

In the process industry, long-term and efficient optimization of production lines requires real-time monitoring and analysis of operational states to fine-tune production line parameters. However, complexity in operational logic and intricate coupling of production process parameters make it difficult to develop an accurate mathematical model for the entire process, thus hindering the deployment of efficient optimization mechanisms. In view of these difficulties, we propose to deploy a digital twin of the production line by encoding its operational logic in a data-driven approach. By iteratively mapping the real-world data reflecting equipment operation status and product quality indicators in the digital twin, we adopt a quality prediction model for production process based on self-attention-enabled temporal convolutional neural networks. This model enables the data-driven state evolution of the digital twin. The digital twin takes a role of aggregating the information of actual operating conditions and the results of quality-sensitive analysis, which facilitates the optimization of process production with virtual-reality evolution. Leveraging the digital twin as an information-flow carrier, we extract temporal features from key process indicators and establish a production process quality prediction model based on the proposed deep neural network. Our operation experiments on a specific tobacco shredding line demonstrate that the proposed digital twin-based production process optimization method fosters seamless integration between virtual and real production lines. This integration achieves an average operating status prediction accuracy of over 98% and a product quality acceptance rate of over 96%.

翻訳日:2024-08-27 23:27:05 公開日:2024-08-24

# 非決定論的因果モデル

Nondeterministic Causal Models ( http://arxiv.org/abs/2405.14001v2 )

ライセンス: Link先を確認

Sander Beckers,

(参考訳) 非巡回決定論的構造方程式モデルを非決定論的ケースに一般化し、反事実に対して改良された意味論を提供すると主張する。ハルパーンによって開発された標準的な決定論的意味論(およびギャレス・アンド・パールの最初の提案に基づく)は、親変数への値の割り当てにはそれぞれの子変数に固有の代入が存在すると仮定し、実際の世界(モデルのすべての変数に対する値の代入)がそれぞれの介入に対してユニークな逆実世界を特定すると仮定する。どちらの仮定も非現実的であり、それゆえ、我々は両方の仮定を我々の提案に落としている。構造方程式における多値関数を許容する。さらに, 実世界で得られた方程式の解が, あらゆる反現実の世界に保存されるようにセマンティクスを調整した。我々は、結果の論理の健全かつ完全な公理化を提供し、ハルパーンによる標準的な論理と、我々のより近いより最近の提案と比較する。最後に、我々のモデルを確率的ケースに拡張し、カウサルベイズネットワークにおいても、カウンターファクトの特定方法を公開することを示す。

We generalize acyclic deterministic structural equation models to the nondeterministic case and argue that it offers an improved semantics for counterfactuals. The standard, deterministic, semantics developed by Halpern (and based on the initial proposal of Galles & Pearl) assumes that for each assignment of values to parent variables there is a unique assignment to their child variable, and it assumes that the actual world (an assignment of values to all variables of a model) specifies a unique counterfactual world for each intervention. Both assumptions are unrealistic, and therefore we drop both of them in our proposal. We do so by allowing multi-valued functions in the structural equations. In addition, we adjust the semantics so that the solutions to the equations that obtained in the actual world are preserved in any counterfactual world. We provide a sound and complete axiomatization of the resulting logic and compare it to the standard one by Halpern and to more recent proposals that are closer to ours. Finally, we extend our models to the probabilistic case and show that they open up the way to identifying counterfactuals even in Causal Bayesian Networks.

翻訳日:2024-08-27 23:27:05 公開日:2024-08-24

# Retro: 教師の射影ヘッドの再利用による、自己教師あり学習を用いた軽量モデルへの効率的な埋め込み蒸留

Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning ( http://arxiv.org/abs/2405.15311v3 )

ライセンス: Link先を確認

Khanh-Binh Nguyen, Chae Jung Park,

(参考訳) 自己教師付き学習(SSL)は、大量のラベルのないデータで効果的な表現を学習する能力により注目を集めている。軽量モデルは、コントラストと一貫性の制約を用いて、より大規模な自己教師付き事前訓練モデルから蒸留することができる。しかし、プロジェクションヘッドのサイズの違いは、生徒が教師の埋め込みを正確に模倣することを困難にしている。本稿では、教師のプロジェクションヘッドを生徒に再利用する\textsc{Retro}を提案する。実験結果は、すべての軽量モデルにおいて最先端手法を大幅に上回る改善を示している。例えば、ResNet-50/101/152を教師として使用したEfficientNet-B0のトレーニングでは、ImageNetの線形評価結果がそれぞれ$66.9\%$、$69.3\%$、$69.8\%$に改善され、パラメータも大幅に少ない。

Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations with large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. Still, the different sizes of the projection heads make it challenging for students to mimic the teacher's embedding accurately. We propose \textsc{Retro}, which reuses the teacher's projection head for students, and our experimental results demonstrate significant improvements over the state-of-the-art on all lightweight models. For instance, when training EfficientNet-B0 using ResNet-50/101/152 as teachers, our approach improves the linear result on ImageNet to $66.9\%$, $69.3\%$, and $69.8\%$, respectively, with significantly fewer parameters.

翻訳日:2024-08-27 23:27:05 公開日:2024-08-24

# 大規模言語モデルは高被引用バイアスを強めつつ人間の引用パターンを反映する

Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias ( http://arxiv.org/abs/2405.15739v3 )

ライセンス: Link先を確認

Andres Algaba, Carmen Mazijn, Vincent Holst, Floriano Tori, Sylvia Wenmackers, Vincent Ginis,

(参考訳) サイテーションの実践は科学的知識の構造を形成するのに不可欠であるが、それらは現代の規範や偏見の影響を受けていることが多い。 LLM(Large Language Models)の出現は、これらのプラクティスに新たなダイナミクスをもたらす。興味深いことに、LLMが推奨する参照の特徴と潜在的なバイアスは、そのパラメトリックな知識に完全に依存しており、検索や検索強化世代に依存していない。本稿では,これらの特徴を,GPT-4の知識遮断日後に公表されたAAAI,NeurIPS,ICML,ICLRのデータセットを用いて解析する。本実験では,これらの論文の中で,匿名化された文中の引用を学術的に参照することを提案する。以上の結果より, 出版年, タイトル長, 著者数, 会場数に比して, 高い引用バイアスが持続することが明らかとなった。 GPT-4と、より有能なモデルであるGPT-4oとClaude 3.5の両方で、論文はトレーニングデータの一部である。さらに, LLMの既存の参照と存在しない参照の特徴との間に大きな一貫性が見られ, モデルが励起パターンを内部化していることが示唆された。引用グラフを解析することにより、推奨される参照が関連する引用コンテキストに埋め込まれていることを示し、引用ネットワークのより深い概念的内部化を示唆する。 LLMは引用生成に役立つが、マシュー効果のような既存のバイアスを増幅し、新しいバイアスを導入し、科学的知識の拡散を引き起こす可能性がある。

Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of Large Language Models (LLMs) introduces a new dynamic to these practices. Interestingly, the characteristics and potential biases of references recommended by LLMs that entirely rely on their parametric knowledge, and not on search or retrieval-augmented generation, remain unexplored. Here, we analyze these characteristics in an experiment using a dataset from AAAI, NeurIPS, ICML, and ICLR, published after GPT-4's knowledge cut-off date. In our experiment, LLMs are tasked with suggesting scholarly references for the anonymized in-text citations within these papers. Our findings reveal a remarkable similarity between human and LLM citation patterns, but with a more pronounced high citation bias, which persists even after controlling for publication year, title length, number of authors, and venue. The results hold for both GPT-4, and the more capable models GPT-4o and Claude 3.5 where the papers are part of the training data. Additionally, we observe a large consistency between the characteristics of LLM's existing and non-existent generated references, indicating the model's internalization of citation patterns. By analyzing citation graphs, we show that the references recommended are embedded in the relevant citation context, suggesting an even deeper conceptual internalization of the citation networks. While LLMs can aid in citation generation, they may also amplify existing biases, such as the Matthew effect, and introduce new ones, potentially skewing scientific knowledge dissemination.

翻訳日:2024-08-27 23:27:05 公開日:2024-08-24

# 摂動フォージェリによる逆データ検出

Detecting Adversarial Data via Perturbation Forgery ( http://arxiv.org/abs/2405.16226v2 )

ライセンス: Link先を確認

Qian Wang, Chen Li, Yuchen Luo, Hefei Ling, Ping Li, Jiazhong Chen, Shijuan Huang, Ning Yu,

(参考訳) 敵対的攻撃に対する防御戦略として、敵対的検出は、自然・敵対的データ間の分布の相違とノイズパターンに基づいて、データフローから敵対的データを識別・フィルタリングすることを目的としている。従来の検出手法は勾配に基づく対向攻撃の検出では高い性能を示すが,不均衡および異方性雑音パターンを回避した生成モデルに基づく新たな攻撃は回避される。さらに悪いことに、既存のテクニックは、防衛を展開する前に攻撃データへのアクセスを必要とするか、推論にかなりの時間的コストを要し、防御者が目にしない新たな攻撃を防御するためには実用的ではない。本稿では, 対向雑音分布間の近接関係について検討し, 開放被覆の存在を実証する。このオープンカバーと自然データの分布を区別することで、あらゆる種類の敵攻撃に対して強力な一般化能力を持つ検出器を開発することができる。この知見に基づいて,ノイズ分布の摂動,スパースマスク生成,擬似逆数データ生成を含む摂動フォージェリを提案し,特定のモデルに依存せず,未知の勾配ベース,生成モデルベース,物理的逆数攻撃を検出可能な逆数検出器を訓練する。複数の汎用的および顔的データセットに対して行われた総合的な実験は、幅広い攻撃範囲で、我々の手法の強力な一般化を検証した。

As a defense strategy against adversarial attacks, adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data. Although previous detection methods achieve high performance in detecting gradient-based adversarial attacks, new attacks based on generative models with imbalanced and anisotropic noise patterns evade detection. Even worse, existing techniques either necessitate access to attack data before deploying a defense or incur a significant time cost for inference, rendering them impractical for defending against newly emerging attacks that are unseen by defenders. In this paper, we explore the proximity relationship between adversarial noise distributions and demonstrate the existence of an open covering for them. By learning to distinguish this open covering from the distribution of natural data, we can develop a detector with strong generalization capabilities against all types of adversarial attacks. Based on this insight, we heuristically propose Perturbation Forgery, which includes noise distribution perturbation, sparse mask generation, and pseudo-adversarial data production, to train an adversarial detector capable of detecting unseen gradient-based, generative-model-based, and physical adversarial attacks, while remaining agnostic to any specific models. Comprehensive experiments conducted on multiple general and facial datasets, with a wide spectrum of attacks, validate the strong generalization of our method.

翻訳日:2024-08-27 23:27:05 公開日:2024-08-24

# ポリアディック超対称性

Polyadic supersymmetry ( http://arxiv.org/abs/2406.02188v2 )

ライセンス: Link先を確認

Steven Duplij,

(参考訳) 一次元超対称量子力学のトイモデルに、著者が提案したポリアディック化の手続きを適用することで、超対称性のポリアディック類似を導入する。スーパーチャージは、以前の研究で定義された$n$-ary sigma行列を用いてポリアディックなものへと一般化される。このように、スーパーチャージとハミルトニアンのポリアディック類似は巡回シフトブロック行列の形をとり、$N$拡張および多重次数付きSQMとは異なる方法で多重縮退した量子状態を記述することができる。対応する超対称性を$n$-ary Lie superalgebra($n$は初期の結合的乗法のアリティ)として構成する過程で、アリティを$2\leq m<n$に減らした新たな括弧と、関連する一連の$m$-ary superalgebra(二項超代数では不可能である)を発見した。さらに、減らしたアリティ$m$が偶数の場合は高次(微分作用素として)の偶ハミルトニアンの塔が得られ、$m$が奇数の場合は高次の奇スーパーチャージの塔が得られ、対応する代数は奇セクターのみからなる。

We introduce a polyadic analog of supersymmetry by considering the polyadization procedure (proposed by the author) applied to the toy model of one-dimensional supersymmetric quantum mechanics. The supercharges are generalized to polyadic ones using the $n$-ary sigma matrices defined in earlier work. In this way, polyadic analogs of supercharges and Hamiltonians take the cyclic shift block matrix form, and they can describe multidegenerated quantum states in a way that is different from the $N$-extended and multigraded SQM. While constructing the corresponding supersymmetry as an $n$-ary Lie superalgebra ($n$ is the arity of the initial associative multiplication), we have found new brackets with a reduced arity of $2\leq m<n$ and a related series of $m$-ary superalgebras (which is impossible for binary superalgebras). In the case of even reduced arity $m$ we obtain a tower of higher order (as differential operators) even Hamiltonians, while for $m$ odd we get a tower of higher order odd supercharges, and the corresponding algebra consists of the odd sector only.

翻訳日:2024-08-27 23:17:21 公開日:2024-08-24

# 表形式データ分類のためのファジィ畳み込みニューラルネットワーク

Fuzzy Convolution Neural Networks for Tabular Data Classification ( http://arxiv.org/abs/2406.03506v4 )

ライセンス: Link先を確認

Arun D. Kulkarni,

(参考訳) 近年、畳み込みニューラルネットワーク(CNN)は、特に画像やテキストの分類タスクにおいて、様々な領域における顕著な性能のために、多くの注目を集めている。しかし、表形式のデータ分類への応用はいまだ未定である。バイオインフォマティクス、ファイナンス、非画像データが一般的である医療など、多くの分野がある。非画像データの分類にCNNを適用することは、依然として非常に困難である。本稿では,従来の機械学習手法と深層学習手法のギャップを埋めることを目的として,表層データ分類におけるCNNの有効性について検討する。本稿では,特徴ベクトル内の局所パターンを捉えるための表データに適した,ファジィ畳み込みニューラルネットワーク(FCNN)を提案する。提案手法では,特徴値をファジィメンバシップにマップする。ファジィメンバシップベクトルは、CNNモデルのトレーニングに使用される画像に変換される。訓練されたCNNモデルは未知の機能ベクトルを分類するために使用される。提案手法を検証するために,6つの複雑なノイズデータセットを生成した。各データセットからランダムに70パーセントのサンプルをトレーニングに使用し、30%をテストに使用しました。データセットはまた、決定木(DT)、サポートベクターマシン(SVM)、ファジィニューラルネットワーク(FNN)、ベイズ分類器、ランダムフォレスト(RF)といった最先端の機械学習アルゴリズムを使用して分類された。実験結果から,提案手法は従来の手法と比較して,有意な表現を表象データから効果的に学習し,競争力や優れた性能を達成できることが示唆された。全体として、提案したFCNNモデルは、表型データ分類タスクの代替として有望であり、構造化データ分析におけるディープラーニングを活用する新たな機会を、新たな期待と潜在的に解放する可能性を示唆している。

Recently, convolution neural networks (CNNs) have attracted a great deal of attention due to their remarkable performance in various domains, particularly in image and text classification tasks. However, their application to tabular data classification remains underexplored. There are many fields, such as bioinformatics, finance, and medicine, where non-image data are prevalent. Adapting CNNs to classify non-image data remains highly challenging. This paper investigates the efficacy of CNNs for tabular data classification, aiming to bridge the gap between traditional machine learning approaches and deep learning techniques. We propose a novel framework, the fuzzy convolution neural network (FCNN), tailored specifically for tabular data to capture local patterns within feature vectors. In our approach, we map feature values to fuzzy memberships. The fuzzy membership vectors are converted into images that are used to train the CNN model. The trained CNN model is used to classify unknown feature vectors. To validate our approach, we generated six complex noisy data sets. We used a randomly selected seventy percent of the samples from each data set for training and thirty percent for testing. The data sets were also classified using state-of-the-art machine learning algorithms such as the decision tree (DT), support vector machine (SVM), fuzzy neural network (FNN), Bayes classifier, and Random Forest (RF). Experimental results demonstrate that our proposed model can effectively learn meaningful representations from tabular data, achieving competitive or superior performance compared to existing methods. Overall, our findings suggest that the proposed FCNN model holds promise as a viable alternative for tabular data classification tasks, offering a fresh perspective and potentially unlocking new opportunities for leveraging deep learning in structured data analysis.
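
The preprocessing step described in this abstract (feature values mapped to fuzzy memberships, then arranged as an image) can be sketched as follows. The abstract does not specify the membership functions or the image layout, so the three triangular sets ("low", "medium", "high") and the one-row-per-feature grid below are assumptions chosen purely for illustration.

```python
def triangular(x, left, peak, right):
    """Triangular membership function that is 1 at `peak` and 0 outside (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(value, lo, hi):
    """Map a raw feature value in [lo, hi] to (low, medium, high) memberships."""
    x = (value - lo) / (hi - lo)
    x = min(1.0, max(0.0, x))  # clamp values outside the expected range
    return (
        triangular(x, -0.5, 0.0, 0.5),
        triangular(x, 0.0, 0.5, 1.0),
        triangular(x, 0.5, 1.0, 1.5),
    )


def feature_vector_to_grid(features, ranges):
    """Stack per-feature memberships into a 2-D grid (one row per feature),
    a hypothetical stand-in for the image fed to the CNN."""
    return [list(fuzzify(v, lo, hi)) for v, (lo, hi) in zip(features, ranges)]
```

With this choice the three memberships of any in-range value sum to one, so each grid row is a soft one-hot encoding of the feature's level.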

翻訳日:2024-08-27 23:17:21 公開日:2024-08-24

# 有限サイズ効果による量子相対エントロピーの測定

Measuring quantum relative entropy with finite-size effect ( http://arxiv.org/abs/2406.17299v2 )

ライセンス: Link先を確認

Masahito Hayashi,

(参考訳) $\sigma$が既知の場合の相対エントロピー$D(\rho\|\sigma)$の推定について検討する。我々は、Cramér-Rao型の限界が相対バレントロピーに等しいことを示す。我々の推定器は、次元$d$が固定されている場合にCramér-Rao型の限界を達成する。また、次元$d$が増加する場合には、サンプル複雑性$O(d^2)$を達成する。このサンプル複雑性は、$\sigma$が完全混合状態であるときに最適である。また、時間複雑性は$O(d^6 \polylog d)$である。提案する推定器は両方の設定で統一的に動作する。

We study the estimation of the relative entropy $D(\rho\|\sigma)$ when $\sigma$ is known. We show that the Cramér-Rao type bound equals the relative varentropy. Our estimator attains the Cramér-Rao type bound when the dimension $d$ is fixed. It also achieves the sample complexity $O(d^2)$ when the dimension $d$ increases. This sample complexity is optimal when $\sigma$ is the completely mixed state. Also, it has time complexity $O(d^6 \polylog d)$. Our proposed estimator works in a unified manner under both settings.
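
For orientation, the two quantities named above are usually defined as follows. The abstract does not spell them out, so this block assumes the standard definitions, and the symbol $V$ for the relative varentropy is our label.

```latex
D(\rho\|\sigma) = \operatorname{Tr}\,\rho\,(\log\rho - \log\sigma),
\qquad
V(\rho\|\sigma) = \operatorname{Tr}\,\rho\,(\log\rho - \log\sigma)^2 - D(\rho\|\sigma)^2 .
```

Read this way, the relative varentropy is the variance of the log-likelihood-ratio operator $\log\rho - \log\sigma$ under $\rho$, and the relative entropy is its mean.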

翻訳日:2024-08-27 22:57:33 公開日:2024-08-24

# データセンターの不確実性を考慮した脱炭

Uncertainty-Aware Decarbonization for Datacenters ( http://arxiv.org/abs/2407.02390v2 )

ライセンス: Link先を確認

Amy Li, Sihang Liu, Yi Ding,

(参考訳) 本論文は, データセンター脱炭のための炭素強度予測の不確かさを定量化するための最初の試みである。我々は、時間的および空間的な2つの不確実性を特定し、分析し、システム含意について議論する。炭素強度予測の不確かさの定量化における時間的ダイナミクスに対処するために,共形予測に基づく枠組みを導入する。評価結果から, 本手法は, 種々の意義レベルにわたる不確実性定量化において, 対象範囲を頑健に達成できることが示唆された。生産電力トレースを用いた2つのケーススタディを行い,時間的および空間的負荷シフトに着目した。その結果, スケジュール決定に不確実性を導入することで, それぞれ5%と14%の二酸化炭素排出量の増加を防止できることがわかった。これらの割合は20MWのデータセンターで2.1トンと10.4トンの炭素排出量を絶対的に減少させる。

This paper represents the first effort to quantify uncertainty in carbon intensity forecasting for datacenter decarbonization. We identify and analyze two types of uncertainty -- temporal and spatial -- and discuss their system implications. To address the temporal dynamics in quantifying uncertainty for carbon intensity forecasting, we introduce a conformal prediction-based framework. Evaluation results show that our technique robustly achieves target coverages in uncertainty quantification across various significance levels. We conduct two case studies using production power traces, focusing on temporal and spatial load shifting respectively. The results show that incorporating uncertainty into scheduling decisions can prevent a 5% and 14% increase in carbon emissions, respectively. These percentages translate to an absolute reduction of 2.1 and 10.4 tons of carbon emissions in a 20 MW datacenter cluster.
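
As a rough illustration of conformal prediction, the sketch below builds a prediction interval around a point forecast using absolute residuals on a held-out calibration set. This is the basic split-conformal variant, which assumes exchangeable data. The paper's framework addresses temporal dynamics and may differ substantially, and the function and variable names are hypothetical.

```python
import numpy as np

def conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Split-conformal interval with target coverage 1 - alpha.

    cal_true, cal_pred: observed and forecast values on a calibration set
    new_pred:           point forecast for the new time step
    """
    scores = np.sort(np.abs(np.asarray(cal_true, dtype=float)
                            - np.asarray(cal_pred, dtype=float)))
    n = scores.size
    # Rank of the calibration score used as the interval half-width.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = scores[k - 1] if k <= n else np.inf
    return new_pred - q, new_pred + q

# Toy calibration set with residuals 1, 2, ..., 9 (made-up numbers).
low, high = conformal_interval(np.arange(1, 10), np.zeros(9), 100.0, alpha=0.5)
```

A scheduler could then compare the upper bounds of candidate time slots or regions instead of the point forecasts when deciding where to shift load.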

翻訳日:2024-08-27 22:57:33 公開日:2024-08-24

# マルチ話者とターゲット話者の同時音声認識システムとしてのウィスパーの活用

Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System ( http://arxiv.org/abs/2407.09817v2 )

ライセンス: Link先を確認

Lingwei Meng, Jiawen Kang, Yuejiao Wang, Zengrui Jin, Xixin Wu, Xunying Liu, Helen Meng,

(参考訳) マルチ話者音声認識とターゲット話者音声認識は、いずれも複数話者が存在する状況での書き起こしを伴い、依然として大きな課題である。しかし、既存の手法が両方のタスクに同時に取り組むことは稀である。本研究では、音声基盤モデルであるWhisperを、マルチ話者とターゲット話者の同時音声認識タスクに対応させる先駆的手法を提案する。具体的には、(i) Whisperを凍結し、Sidecarセパレータをエンコーダに差し込んで、複数話者の混合埋め込みを分離する。(ii) Target Talker Identifierを導入し、手がかりとして3秒間の登録音声のみを用いて、目標話者の埋め込みフローをオンザフライで識別する。(iii) タスク適応性を向上させるため、デコーダのソフトプロンプトチューニングを検討する。提案手法は、2話者および3話者のLibriMixとLibriSpeechMixデータセットにおいて、両タスクで従来手法を上回り、AishellMix Mandarinデータセットでのマルチ話者ASRでも許容できるゼロショット性能を示す。

Multi-talker speech recognition and target-talker speech recognition, both of which involve transcription in multi-talker contexts, remain significant challenges. However, existing methods rarely attempt to address both tasks simultaneously. In this study, we propose a pioneering approach to empower Whisper, a speech foundation model, to tackle joint multi-talker and target-talker speech recognition tasks. Specifically, (i) we freeze Whisper and plug a Sidecar separator into its encoder to separate the mixed embedding for multiple talkers; (ii) a Target Talker Identifier is introduced to identify the embedding flow of the target talker on the fly, requiring only three seconds of enrollment speech as a cue; (iii) soft prompt tuning for the decoder is explored for better task adaptation. Our method outperforms previous methods on the two- and three-talker LibriMix and LibriSpeechMix datasets for both tasks, and delivers acceptable zero-shot performance on multi-talker ASR on the AishellMix Mandarin dataset.

翻訳日:2024-08-27 22:47:47 公開日:2024-08-24

# サルカスム検出は大規模言語モデルにおけるステップバイステップ推論プロセスか?

Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? ( http://arxiv.org/abs/2407.12725v2 )

ライセンス: Link先を確認

Ben Yao, Yazhou Zhang, Qiuchi Li, Jing Qin,

(参考訳) 一連の中間推論ステップを共同作業することで、LLMを逐次的に考えさせるような複雑な問題を解くための大きな言語モデル(LLM)の能力が大幅に向上する。しかしながら、人間の皮肉理解は直感的で全体論的認知過程と見なされ、様々な言語的、文脈的、感情的な手がかりが統合され、必ずしもステップバイステップのやり方に従わないような包括的理解を形成する。本論の妥当性を検証するために,4つのサブメソッド,Viz. chain of contradiction (CoC), Graph of cues (GoC), bagging of cues (BoC), tensor of cues (ToC) を含む新たなプロンプトフレームワーク(SarcasmCue)を導入する。 1) CoC と GoC は GPT-4 や Claude 3.5 といったより高度なモデルで優れた性能を示し,3.5% の改善を実現した。 2)ToCはLLMが小さく評価された場合,F1スコアが最良基準値に対して29.7%向上するなど,他の手法よりも優れていた。 (3)提案したフレームワークは、4つのデータセットでF1スコアの4.2%、2.0%、29.7%、58.2%を継続的に最先端(ToT)にプッシュします。これは提案したフレームワークの有効性と安定性を示している。

Elaborating a series of intermediate reasoning steps significantly improves the ability of large language models (LLMs) to solve complex problems, as such steps prompt LLMs to think sequentially. However, human sarcasm understanding is often considered an intuitive and holistic cognitive process, in which various linguistic, contextual, and emotional cues are integrated to form a comprehensive understanding, in a way that does not necessarily follow a step-by-step fashion. To verify the validity of this argument, we introduce a new prompting framework (called SarcasmCue) containing four sub-methods, viz. chain of contradiction (CoC), graph of cues (GoC), bagging of cues (BoC) and tensor of cues (ToC), which prompts LLMs to detect human sarcasm using both sequential and non-sequential prompting methods. Through a comprehensive empirical comparison on four benchmarks, we highlight three key findings: (1) CoC and GoC show superior performance with more advanced models like GPT-4 and Claude 3.5, with an improvement of 3.5%. (2) ToC significantly outperforms other methods when smaller LLMs are evaluated, boosting the F1 score by 29.7% over the best baseline. (3) Our proposed framework consistently improves on the state-of-the-art (i.e., ToT) by 4.2%, 2.0%, 29.7%, and 58.2% in F1 scores across four datasets. This demonstrates the effectiveness and stability of the proposed framework.

翻訳日:2024-08-27 22:47:47 公開日:2024-08-24

# PriPL-Tree: 局所微分プライバシー下での任意分布の正確なレンジクエリ

PriPL-Tree: Accurate Range Query for Arbitrary Distribution under Local Differential Privacy ( http://arxiv.org/abs/2407.13532v2 )

ライセンス: Link先を確認

Leixia Wang, Qingqing Ye, Haibo Hu, Xiaofeng Meng,

(参考訳) 局所微分プライバシー(LDP)の文脈における範囲クエリの回答は、オンライン分析処理(OLAP)において広く研究されている問題である。既存のLDPソリューションはすべて、各ドメインパーティション内の均一なデータ分布を前提としており、データ分布がさまざまである現実のシナリオと一致しない可能性があるため、不正確な推定をもたらす。この問題に対処するために、任意の分布に対する範囲クエリに答えるために、階層木構造と区分線形(PL)関数を組み合わせた新しいデータ構造であるPriPL-Treeを導入する。 PriPL-Treeは、少数の線分で基礎となるデータ分布を正確にモデル化し、範囲クエリのより正確な結果をもたらす。さらに、新しいデータ認識適応グリッドを用いて多次元の場合に拡張する。これらのグリッドは、PriPL-Treeを通して得られた周辺分布からの洞察を利用してグリッドを適応的に分割し、基礎となる分布の密度に適応する。実データと合成データの両方に対する広範な実験により、任意のデータ分布にまたがる範囲クエリへの回答において、最先端のソリューションに対するPriPL-Treeの有効性と優位性を示した。

Answering range queries in the context of Local Differential Privacy (LDP) is a widely studied problem in Online Analytical Processing (OLAP). Existing LDP solutions all assume a uniform data distribution within each domain partition, which may not align with real-world scenarios where data distribution is varied, resulting in inaccurate estimates. To address this problem, we introduce PriPL-Tree, a novel data structure that combines hierarchical tree structures with piecewise linear (PL) functions to answer range queries for arbitrary distributions. PriPL-Tree precisely models the underlying data distribution with a few line segments, leading to more accurate results for range queries. Furthermore, we extend it to multi-dimensional cases with novel data-aware adaptive grids. These grids leverage the insights from marginal distributions obtained through PriPL-Trees to partition the grids adaptively, adapting to the density of the underlying distributions. Our extensive experiments on both real and synthetic datasets demonstrate the effectiveness and superiority of PriPL-Tree over state-of-the-art solutions in answering range queries across arbitrary data distributions.
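
To see why a piecewise linear model can answer range queries more accurately than a uniform-within-partition assumption, consider the sketch below. It integrates a piecewise linear density over a query range. It deliberately omits the LDP perturbation and the tree structure, and the breakpoints, density values, and function name are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def pl_range_query(breaks, dens, lo, hi):
    """Integrate a piecewise linear density over [lo, hi].

    breaks: increasing breakpoints of the line segments
    dens:   density value at each breakpoint
    lo, hi: query range, assumed to lie within [breaks[0], breaks[-1]]
    """
    breaks = np.asarray(breaks, dtype=float)
    dens = np.asarray(dens, dtype=float)
    inner = breaks[(breaks > lo) & (breaks < hi)]
    xs = np.concatenate(([lo], inner, [hi]))
    ys = np.interp(xs, breaks, dens)
    # Exact area under each linear segment (trapezoid rule).
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))

# Triangular density on [0, 1] peaking at 0.5 (total mass 1).
answer = pl_range_query([0.0, 0.5, 1.0], [0.0, 2.0, 0.0], 0.25, 0.75)
```

For this toy density the query [0.25, 0.75] holds 0.75 of the mass, whereas treating [0, 1] as one uniform partition would report 0.5.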

翻訳日:2024-08-27 22:47:47 公開日:2024-08-24

# トレーディング・デビル・ファイナル:株式市場によるバックドア攻撃とベイズ最適化

Trading Devil Final: Backdoor attack via Stock market and Bayesian Optimization ( http://arxiv.org/abs/2407.14573v4 )

ライセンス: Link先を確認

Orson Mengara,

(参考訳) 生成人工知能の出現以来、あらゆる企業や研究者が、商業的であろうとなかろうと、独自の生成モデルの開発を急いできた。これらの強力な新ツールのユーザ数を考えると、LLM(大規模言語モデル)が学習した時に何が起こるかを説明するための、本質的に検証可能な方法は今のところありません。例えば,Webから収集した膨大な量のデータに頼って高速かつ効率的な結果を得る自動音声認識システムでは,音響データ中毒に基づくMarketBackFinal 2.0と呼ばれるバックドアアタックが開発され,MarketBackFinal 2.0は主に現代の株式市場モデルに基づいている。 LLMに依存する可能性のある音声ベースのトランスフォーマーの脆弱性を示す。

Since the advent of generative artificial intelligence, every company and researcher has been rushing to develop their own generative models, whether commercial or not. Given the large number of users of these powerful new tools, there is currently no intrinsically verifiable way to explain from the ground up what happens when LLMs (large language models) learn, for example those based on automatic speech recognition systems, which must rely on huge amounts of data collected from all over the web to produce fast and efficient results. In this article, we develop a backdoor attack called MarketBackFinal 2.0, based on acoustic data poisoning and built mainly on modern stock market models, in order to show the possible vulnerabilities of speech-based transformers that may rely on LLMs.

翻訳日:2024-08-27 22:47:47 公開日:2024-08-24

# ディープラーニングによるブラック・ショールズ・デルタヘッジの強化

Enhancing Black-Scholes Delta Hedging via Deep Learning ( http://arxiv.org/abs/2407.19367v2 )

ライセンス: Link先を確認

Chunhui Qiao, Xiangwei Wan,

(参考訳) 本稿では、ニューラルネットワークを用いて、ヘッジ関数とインプライド・ブラック・ショールズ・デルタとの残差を学習する、オプションのためのディープ・デルタヘッジの枠組みを提案する。このアプローチは、残差がより滑らかであるという性質を活用し、深層学習の性能を向上させる。10年間の日次S&P 500指数オプションデータを用いた実証分析により、平均2乗1ステップヘッジ誤差を損失関数として残差を学習すると、ヘッジ関数を直接学習する場合に比べてヘッジ性能が大幅に、しばしば100%以上向上することを示す。残差を学習する際に入力特徴量を追加すると、コールよりもプットでヘッジ性能が大きく向上し、市場センチメントの重要性は相対的に低い。さらに、3年間のデータで残差を学習した場合のヘッジ性能は、10年間のデータで直接学習した場合と同等であり、本手法が必要とするデータ量が少ないことを示している。

This paper proposes a deep delta hedging framework for options, utilizing neural networks to learn the residuals between the hedging function and the implied Black-Scholes delta. This approach leverages the smoother properties of these residuals, enhancing deep learning performance. Utilizing ten years of daily S&P 500 index option data, our empirical analysis demonstrates that learning the residuals, using the mean squared one-step hedging error as the loss function, significantly improves hedging performance over directly learning the hedging function, often by more than 100%. Adding input features when learning the residuals enhances hedging performance more for puts than calls, with market sentiment being less crucial. Furthermore, learning the residuals with three years of data matches the hedging performance of directly learning with ten years of data, proving that our method demands less data.
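
The residual idea can be sketched as follows: compute the Black-Scholes delta in closed form, add a learned correction, and score the result with the mean squared one-step hedging error. The formulas for the delta are standard. The zero-dividend assumption, the omission of the cash account in the hedging error, and the placeholder `residual` argument (standing in for a neural network output) are simplifications of ours, not details from the paper.

```python
from math import erf, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call_delta(S, K, T, r, sigma):
    """Black-Scholes delta of a European call (no dividends)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return norm_cdf(d1)

def bs_put_delta(S, K, T, r, sigma):
    """Black-Scholes delta of a European put (no dividends)."""
    return bs_call_delta(S, K, T, r, sigma) - 1.0

def hedge_ratio(bs_delta, residual):
    """Hedge ratio = Black-Scholes delta + learned residual (placeholder value)."""
    return bs_delta + residual

def mean_squared_hedging_error(ratios, dS, dV):
    """Mean squared one-step error of a position hedged with `ratios`.

    dS: one-step changes in the underlying price
    dV: one-step changes in the option price
    """
    errors = [dv - h * ds for h, ds, dv in zip(ratios, dS, dV)]
    return sum(e * e for e in errors) / len(errors)

delta = bs_call_delta(S=100.0, K=100.0, T=1.0, r=0.0, sigma=0.2)
```

In a training loop, the network would output `residual` from the input features and be optimised to minimise `mean_squared_hedging_error`.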

翻訳日:2024-08-27 20:50:26 公開日:2024-08-24

# TVDiag:マルチモーダルデータを用いたタスク指向・ビュー不変の故障診断フレームワーク

TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data ( http://arxiv.org/abs/2407.19711v2 )

ライセンス: Link先を確認

Shuaiyu Xie, Jian Wang, Hanbin He, Zhihao Wang, Yuqi Zhao, Neng Zhang, Bing Li,

(参考訳) マイクロサービスベースのシステムは、複雑なインタラクションとスケールの拡大によって、信頼性上の問題に悩まされることが多い。観測可能性技術の急速な成長に伴い、ログやメトリクス、トレースといった多様なモニタリングデータを活用することにより、根本原因のローカライゼーションや障害タイプ識別など、さまざまな障害診断を実現する方法が提案されている。しかし、単一モーダルデータを使用する従来の障害診断手法では、制限された情報のため、すべての障害シナリオをほとんどカバーできない。近年,深層学習に基づくマルチモーダルデータ統合のための故障診断手法が提案されている。しかしながら、これらの手法は、特定のモダリティと異なる診断タスクとの関係を無視して、非差別的にモダリティを結合し、障害診断においてそれらを等しく扱う傾向にある。この見落としは、各モダリティが提供するユニークな利点の有効利用を妨げる。この制限に対処するため、我々は、マイクロサービスベースのシステムにおいて、犯人のマイクロサービスインスタンスを特定し、それらの障害タイプ(Net-packets Corruptionなど)を特定するためのマルチモーダルな障害診断フレームワークであるTVDiagを提案する。 TVDiagはタスク指向学習を用いて各モダリティの潜在的な優位性を高め、対照学習に基づくクロスモーダルな関連付けを確立し、ビュー不変の障害情報を抽出する。さらに、トレーニング中の一部の正常なマイクロサービスインスタンスの可観測性をランダムに不活性化するグラフレベルのデータ拡張戦略を開発し、トレーニングデータの不足を軽減する。実験結果によると、TVDiagはマルチモーダル故障診断における最先端の手法よりも優れており、2つのデータセットでF1スコアが4.08%以上上昇し、少なくとも55.94%高いHR@1精度を達成した。

Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. With the rapid growth of observability techniques, various methods have been proposed to achieve failure diagnosis, including root cause localization and failure type identification, by leveraging diverse monitoring data such as logs, metrics, or traces. However, traditional failure diagnosis methods that use single-modal data can hardly cover all failure scenarios due to the restricted information. Several failure diagnosis methods have been recently proposed to integrate multimodal data based on deep learning. These methods, however, tend to combine modalities indiscriminately and treat them equally in failure diagnosis, ignoring the relationship between specific modalities and different diagnostic tasks. This oversight hinders the effective utilization of the unique advantages offered by each modality. To address the limitation, we propose TVDiag, a multimodal failure diagnosis framework for locating culprit microservice instances and identifying their failure types (e.g., Net-packets Corruption) in microservice-based systems. TVDiag employs task-oriented learning to enhance the potential advantages of each modality and establishes cross-modal associations based on contrastive learning to extract view-invariant failure information. Furthermore, we develop a graph-level data augmentation strategy that randomly inactivates the observability of some normal microservice instances during training to mitigate the shortage of training data. Experimental results show that TVDiag outperforms state-of-the-art methods in multimodal failure diagnosis, achieving at least a 55.94% higher HR@1 accuracy and over a 4.08% increase in F1-score across two datasets.

翻訳日:2024-08-27 20:50:26 公開日:2024-08-24

# 拡散フィードバックがCLIPの改善に役立つ

Diffusion Feedback Helps CLIP See Better ( http://arxiv.org/abs/2407.20171v4 )

ライセンス: Link先を確認

Wenxuan Wang, Quan Sun, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang,

(参考訳) ドメインやモダリティ間のオープンワールド表現を抽象化するコントラスト言語-画像事前学習(CLIP)は、さまざまなビジョンやマルチモーダルタスクの基盤となっている。しかし、最近の研究では、CLIPには、方向、量、色、構造などの区別がほとんどできない、深刻な視覚的欠点があることが示されている。これらの視覚的欠点は、CLIP上に構築されたマルチモーダルな大規模言語モデル(MLLM)の認識能力を制限している。主な理由は、CLIPのトレーニングに使用される画像テキストペアが、テキストの特異性や画像の多様性が欠如しているため、本質的にバイアスがあるためかもしれない。本稿では,CLIPモデルに対して,自己教師付き拡散プロセスを通じて視覚的欠点を克服する,簡単なポストトレーニング手法を提案する。私たちはDIVAを導入し、DIffusionモデルをCLIPのビジュアルアシスタントとして使用します。特に、DIVAはテキストから画像への拡散モデルからの生成的フィードバックを活用して、画像のみ(対応するテキストなしで)CLIP表現を最適化する。本研究では,きめ細かい視覚能力を評価するMMVP-VLMベンチマークにおいてDIVAがCLIPの性能を大きく(例えば3-7%)向上させ,マルチモーダル理解とセグメンテーションタスクにおけるMLLMとビジョンモデルの性能も向上させることを示す。 29の画像分類と検索ベンチマークの大規模な評価により、我々のフレームワークはCLIPの強力なゼロショット能力を保っていることを確認した。コードは https://github.com/baaivision/DIVA で公開されている。

Contrastive Language-Image Pre-training (CLIP), which excels at abstracting open-world representations across domains and modalities, has become a foundation for a variety of vision and multimodal tasks. However, recent studies reveal that CLIP has severe visual shortcomings: it can hardly distinguish orientation, quantity, color, structure, etc. These visual shortcomings also limit the perception capabilities of multimodal large language models (MLLMs) built on CLIP. The main reason could be that the image-text pairs used to train CLIP are inherently biased, owing to the lack of distinctiveness in the text and of diversity in the images. In this work, we present a simple post-training approach for CLIP models, which largely overcomes these visual shortcomings via a self-supervised diffusion process. We introduce DIVA, which uses the DIffusion model as a Visual Assistant for CLIP. Specifically, DIVA leverages generative feedback from text-to-image diffusion models to optimize CLIP representations, with only images (without corresponding text). We demonstrate that DIVA improves CLIP's performance on the challenging MMVP-VLM benchmark, which assesses fine-grained visual abilities, to a large extent (e.g., 3-7%), and enhances the performance of MLLMs and vision models on multimodal understanding and segmentation tasks. Extensive evaluation on 29 image classification and retrieval benchmarks confirms that our framework preserves CLIP's strong zero-shot capabilities. The code is available at https://github.com/baaivision/DIVA.

翻訳日:2024-08-27 20:50:26 公開日:2024-08-24

# Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ( http://arxiv.org/abs/2408.06266v2 )

ライセンス: Link先を確認

Karel D'Oosterlinck, Winnie Xu, Chris Develder, Thomas Demeester, Amanpreet Singh, Christopher Potts, Douwe Kiela, Shikib Mehri,

(参考訳) 大規模言語モデル(LLM)は、しばしばコントラスト的なアライメント目的と選好ペアデータセットを使って整列される。モデル、ペアデータ、および目的の間の相互作用によりアライメントは複雑な手順となり、時に期待を下回る結果を生む。我々はこれを調査し、次のことを見出した。 (i) 選好データは、基礎となる応答が対照的である場合に、より良い学習信号を与える。 (ii) アライメント目的は、トレーニング中のモデルに対してより多くの制御を指定する場合に、より良い性能をもたらす。これらの知見に基づき、よりコントラスト的な選好ペアを生み出すデータ生成手法であるContrastive Learning from AI Revisions (CLAIR)と、制御可能でより安定したアライメント目的であるAnchored Preference Optimization (APO)を紹介する。我々はLlama-3-8B-Instructを、比較可能なさまざまなデータセットとアライメント目的を用いて調整し、人間の判断と高い相関を持つMixEval-Hardスコアを測定する。 CLAIRの選好はすべてのデータセットの中で最も高い性能をもたらし、APOは制御性の低い目的を一貫して上回る。我々の最良のモデルは、APOで32KのCLAIR選好に基づいて訓練され、Llama-3-8B-Instructを7.65%改善し、GPT4-turboとのギャップを45%縮めた。私たちのコードは https://github.com/ContextualAI/CLAIR_and_APO で公開されています。

Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective makes alignment a complicated procedure, sometimes producing subpar results. We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment objectives lead to better performance when they specify more control over the model during training. Based on these insights, we introduce Contrastive Learning from AI Revisions (CLAIR), a data-creation method which leads to more contrastive preference pairs, and Anchored Preference Optimization (APO), a controllable and more stable alignment objective. We align Llama-3-8B-Instruct using various comparable datasets and alignment objectives and measure MixEval-Hard scores, which correlate highly with human judgments. The CLAIR preferences lead to the strongest performance out of all datasets, and APO consistently outperforms less controllable objectives. Our best model, trained on 32K CLAIR preferences with APO, improves Llama-3-8B-Instruct by 7.65%, closing the gap with GPT4-turbo by 45%. Our code is available at https://github.com/ContextualAI/CLAIR_and_APO.

翻訳日:2024-08-27 20:30:25 公開日:2024-08-24

# 二次元マニピュレーションのための模倣学習アルゴリズムの比較

A Comparison of Imitation Learning Algorithms for Bimanual Manipulation ( http://arxiv.org/abs/2408.06536v2 )

ライセンス: Link先を確認

Michael Drolet, Simon Stepputtis, Siva Kailas, Ajinkya Jain, Jan Peters, Stefan Schaal, Heni Ben Amor,

(参考訳) ロボット工学における模倣学習アルゴリズムの普及の中で、ハイパーパラメータの感度、トレーニングの容易さ、データ効率、パフォーマンスに関するそれらの特性は、高精度産業にインスパイアされた環境ではよく研究されていない。本研究は,顕著な模倣学習アプローチの限界とメリットを実証し,それらの特性を解析する。我々は,操作対象と環境との複数の接触を含む設定において,過剰に制約された動的システムを含む複雑な双方向操作タスクにおいて,各アルゴリズムを評価する。模倣学習は複雑なタスクを解くのに適しているが、全てのアルゴリズムが環境やハイパーパラメータの摂動、訓練要件、性能、使いやすさを扱うという点で等しいわけではない。本研究では,これらの特徴の実証的影響について,慎重に設計した実験手法と学習環境を用いて検討する。 Paper website: https://bimanual-imitation.github.io/

Amidst the wide popularity of imitation learning algorithms in robotics, their properties regarding hyperparameter sensitivity, ease of training, data efficiency, and performance have not been well-studied in high-precision industry-inspired environments. In this work, we demonstrate the limitations and benefits of prominent imitation learning approaches and analyze their capabilities regarding these properties. We evaluate each algorithm on a complex bimanual manipulation task involving an over-constrained dynamics system in a setting involving multiple contacts between the manipulated object and the environment. While we find that imitation learning is well suited to solve such complex tasks, not all algorithms are equal in terms of handling environmental and hyperparameter perturbations, training requirements, performance, and ease of use. We investigate the empirical influence of these key characteristics by employing a carefully designed experimental procedure and learning environment. Paper website: https://bimanual-imitation.github.io/

翻訳日:2024-08-27 20:30:25 公開日:2024-08-24

# ロングテールのアウト・オブ・ディストリビューション検出:テールへの注意の優先

Long-Tailed Out-of-Distribution Detection: Prioritizing Attention to Tail ( http://arxiv.org/abs/2408.06742v2 )

ライセンス: Link先を確認

Yina He, Lei Peng, Yongcun Zhang, Juanjuan Weng, Zhiming Luo, Shaozi Li,

(参考訳) 現在のアウト・オブ・ディストリビューション(OOD)検出法は、通常はバランスの取れたイン・ディストリビューション(ID)データを仮定する。長い尾のOOD検出に対する以前のアプローチは、しばしばヘッドクラスのセマンティクスを減らしてIDデータのバランスをとる。しかし、この削減はIDデータの分類精度に深刻な影響を及ぼす可能性がある。このタスクの主な課題は、テールクラスの機能の深刻な欠如であり、OODデータとの混同につながります。この問題に対処するために,削減ではなく拡張を用いたPATT法を提案する。我々の主な直感は、von Mises-Fisher(vMF)分布を混合してIDデータと温度スケーリングモジュールをモデル化し、IDデータの信頼性を高めることである。これにより、IDとOODデータの区別を促進しながら、IDクラスのセマンティクスを暗黙的に強化し、無限のコントラスト対を生成することができる。 IDデータの分類性能を損なうことなくOODデータの検出をさらに強化するため,推測フェーズにおける特徴キャリブレーションを提案する。テールクラスを優先し、OODデータの信頼性を低下させる訓練セットから注意重みを抽出することにより、OOD検出能力を向上する。大規模実験により,本手法は様々なベンチマークにおいて最先端の手法よりも優れていることを確認した。

Current out-of-distribution (OOD) detection methods typically assume balanced in-distribution (ID) data, while most real-world data follow a long-tailed distribution. Previous approaches to long-tailed OOD detection often involve balancing the ID data by reducing the semantics of head classes. However, this reduction can severely affect the classification accuracy of ID data. The main challenge of this task lies in the severe lack of features for tail classes, leading to confusion with OOD data. To tackle this issue, we introduce a novel Prioritizing Attention to Tail (PATT) method using augmentation instead of reduction. Our main intuition involves using a mixture of von Mises-Fisher (vMF) distributions to model the ID data and a temperature scaling module to boost the confidence of ID data. This enables us to generate infinite contrastive pairs, implicitly enhancing the semantics of ID classes while promoting differentiation between ID and OOD data. To further strengthen the detection of OOD data without compromising the classification performance of ID data, we propose feature calibration during the inference phase. By extracting an attention weight from the training set that prioritizes the tail classes and reduces the confidence in OOD data, we improve the OOD detection capability. Extensive experiments verified that our method outperforms the current state-of-the-art methods on various benchmarks.

翻訳日:2024-08-27 20:30:25 公開日:2024-08-24

# パーキンソン病の重症度評価のためのホームターン角推定

Your Turn: At Home Turning Angle Estimation for Parkinson's Disease Severity Assessment ( http://arxiv.org/abs/2408.08182v2 )

ライセンス: Link先を確認

Qiushuo Cheng, Catherine Morgan, Arindam Sikdar, Alessandro Masullo, Alan Whone, Majid Mirmehdi,

(参考訳) パーキンソン病(PD)の患者は、疾患の進行に伴い、方向転換の仕方の変化を含め、歩行が徐々に悪化することが多い。既存の臨床評価ツールは、診療所内での短時間の評価に限られるため、PD症状の時間ごとの変動を捉えることができない。歩行の回転角を連続的かつ受動的に測定することは、歩行特性をPDの疾患進行の敏感な指標として活用するための一歩である。本稿では、ビデオから3次元骨格を抽出し、股関節と膝関節の回転を計算することで、回転角を自動的に定量化する深層学習ベースの手法を提案する。我々は、最先端の人間のポーズ推定モデルであるFastposeとStrided Transformerを、24人の被験者(PD患者12人、健常対照ボランティア12人)の計1386本の方向転換ビデオクリップに適用する。これらのクリップは、家庭のような環境で撮影された台本なしの自由生活ビデオからなるPDデータセットから切り出したものである(Turn-REMAP)。また、3次元の正解データを持つ公開ベンチマークHuman3.6Mから方向転換ビデオデータセットTurn-H3.6Mを作成し、提案手法をさらに検証する。これまでの歩行研究は、主に診療所や研究室で台本に沿った歩行を評価してきたが、本研究は、だぶだぶの衣服や照明不足などの複雑さがある自由生活の家庭環境に焦点を当てている。自由生活環境では正確な正解データを得ることが難しいため、専門の臨床医による手動ラベル付けに基づき、角度を最も近い$45^\circ$のビンに量子化する。提案手法は、Turn-REMAPにおいて、回転角の計算精度41.6%、平均絶対誤差(MAE)$34.7^\circ$、重み付き精度WPrec 68.3%を達成した。これは、単一の単眼カメラのデータを用いて、家庭環境におけるPD患者の方向転換を定量化する最初の研究である。

People with Parkinson's Disease (PD) often experience progressively worsening gait, including changes in how they turn around, as the disease progresses. Existing clinical rating tools are not capable of capturing hour-by-hour variations of PD symptoms, as they are confined to brief assessments within clinic settings. Measuring gait turning angles continuously and passively is a component step towards using gait characteristics as sensitive indicators of disease progression in PD. This paper presents a deep learning-based approach to automatically quantify turning angles by extracting 3D skeletons from videos and calculating the rotation of hip and knee joints. We utilise state-of-the-art human pose estimation models, Fastpose and Strided Transformer, on a total of 1386 turning video clips from 24 subjects (12 people with PD and 12 healthy control volunteers), trimmed from a PD dataset of unscripted free-living videos in a home-like setting (Turn-REMAP). We also curate a turning video dataset, Turn-H3.6M, from the public Human3.6M human pose benchmark with 3D ground truth, to further validate our method. Previous gait research has primarily taken place in clinics or laboratories evaluating scripted gait outcomes, but this work focuses on free-living home settings where complexities exist, such as baggy clothing and poor lighting. Due to difficulties in obtaining accurate ground truth data in a free-living setting, we quantise the angle into the nearest $45^\circ$ bin based on the manual labelling of expert clinicians. Our method achieves a turning calculation accuracy of 41.6%, a Mean Absolute Error (MAE) of $34.7^\circ$, and a weighted precision WPrec of 68.3% for Turn-REMAP. This is the first work to explore the use of single monocular camera data to quantify turns by PD patients in a home setting.
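
The quantisation and two of the reported metrics can be illustrated with a short sketch. The abstract does not define how ties between bins are resolved or how accuracy is computed exactly, so the round-half-up rule, the exact-bin-match accuracy, and the function names here are assumptions, and the angle values are made-up numbers.

```python
from math import floor

def quantise_angle(angle_deg, bin_size=45.0):
    """Snap an angle to the nearest multiple of bin_size (ties round up)."""
    return bin_size * floor(angle_deg / bin_size + 0.5)

def mean_absolute_error(labels_deg, preds_deg):
    """Mean absolute difference between labelled and predicted angles."""
    return sum(abs(l - p) for l, p in zip(labels_deg, preds_deg)) / len(labels_deg)

def bin_accuracy(labels_deg, preds_deg, bin_size=45.0):
    """Fraction of predictions whose quantised value equals the labelled bin."""
    hits = sum(quantise_angle(p, bin_size) == l
               for l, p in zip(labels_deg, preds_deg))
    return hits / len(labels_deg)

labels = [90.0, 180.0]   # clinician labels, already multiples of 45 degrees
preds = [100.0, 150.0]   # hypothetical model estimates
mae = mean_absolute_error(labels, preds)
acc = bin_accuracy(labels, preds)
```

With these toy values the first estimate falls in the labelled bin and the second does not, which shows how a prediction can miss its bin while the absolute error stays moderate.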

翻訳日:2024-08-27 20:30:25 公開日:2024-08-24

# LLMのフェローシップ:合成選好最適化データセット生成のためのマルチエージェントワークフロー

The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation ( http://arxiv.org/abs/2408.08688v2 )

ライセンス: Link先を確認

Samee Arif, Sualeha Farid, Abdul Hameed Azeemi, Awais Athar, Agha Ali Raza,

(参考訳) 本稿では、マルチエージェントワークフローを用いて生成した合成優先度最適化(PO)データセットについて、データセット生成プロセスにおけるこれらのワークフローの有効性とポテンシャルを評価する。 POデータセット生成には,(1)応答評価,(2)応答生成という2つのモジュールが必要である。応答評価モジュールでは,Lumge Language Models (LLMs) からの応答を評価し,評価する。反応評価モジュールを2段階のプロセスで評価する。ステップ1では,LLMを3つの異なるプロンプト戦略を用いて評価する。ステップ2では, LLM-as-a-Judge, LLMs-as-a-Jury, LLM Debateの性能の比較を行う。それぞれのステップで、人間のアノテーションとLDM間のCohen's Kappaを用いたラスタ間合意を用いる。応答生成モジュールについて、LLM評価器の設定を用いて、LLMフィードバックループの異なる構成を比較する。我々は、勝利率(LLM評価器によって生成フレームワークがベストに選択される回数)を用いて、生成のための最適なマルチエージェント構成を決定する。両方のモジュールで最適な設定を特定した後、GPT、Gemma、Llamaファミリーのモデルを使用して、上記のパイプラインを使用してPOデータセットを生成します。我々は2種類のPOデータセットを生成し、1つは個々のLLMの生成能力を向上し、もう1つはマルチエージェントワークフローを改善する。 GPT4o-as-a-Judgeは,GPTファミリーからの応答を含まない場合,データセット間でより一貫性があることが評価された。さらに、Llamaをジェネレータとし、GemmaをレビュアーとするLLMフィードバックループは、LlamaとGemmaをそれぞれ71.8%、73.8%の勝利率を達成した。

This paper presents synthetic Preference Optimization (PO) datasets generated using multi-agent workflows and evaluates the effectiveness and potential of these workflows in the dataset generation process. PO dataset generation requires two modules: (1) response evaluation, and (2) response generation. In the response evaluation module, the responses from Large Language Models (LLMs) are evaluated and ranked, a task typically carried out by human annotators that we automate using LLMs. We assess the response evaluation module in a two-step process. In step 1, we assess LLMs as evaluators using three distinct prompting strategies. In step 2, we apply the winning prompting strategy to compare the performance of LLM-as-a-Judge, LLMs-as-a-Jury, and LLM Debate. In each step, we measure inter-rater agreement between human annotators and LLMs using Cohen's Kappa. For the response generation module, we compare different configurations for the LLM Feedback Loop using the identified LLM evaluator configuration. We use the win rate (the fraction of times a generation framework is selected as the best by an LLM evaluator) to determine the best multi-agent configuration for generation. After identifying the best configurations for both modules, we use models from the GPT, Gemma, and Llama families to generate our PO datasets using the above pipeline. We generate two types of PO datasets, one to improve the generation capabilities of individual LLMs and the other to improve the multi-agent workflow. Our evaluation shows that GPT-4o-as-a-Judge is more consistent across datasets when the candidate responses do not include responses from the GPT family. Additionally, we find that the LLM Feedback Loop, with Llama as the generator and Gemma as the reviewer, achieves a notable 71.8% and 73.8% win rate over single-agent Llama and Gemma, respectively.
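
The two evaluation measures named above can be written out in a few lines. Cohen's Kappa follows its standard two-rater definition, and the win rate follows the parenthetical definition in the abstract. The function names and the toy labels are ours.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labelling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always give the same single label
    return (observed - expected) / (1.0 - expected)

def win_rate(selected, framework):
    """Fraction of comparisons in which `framework` was chosen as best."""
    return sum(s == framework for s in selected) / len(selected)

human = ["A", "A", "B", "B"]   # toy human preferences
judge = ["A", "A", "B", "A"]   # toy LLM-judge preferences
kappa = cohens_kappa(human, judge)
rate = win_rate(["loop", "loop", "single", "loop"], "loop")
```

Here the raters agree on three of four items, chance agreement is 0.5, and Kappa comes out at 0.5, while the hypothetical feedback-loop framework wins three of four comparisons.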

翻訳日:2024-08-27 20:30:25 公開日:2024-08-24

# Unc-TTP: 文脈内事例選択を改善するLLM不確かさの分類方法

Unc-TTP: A Method for Classifying LLM Uncertainty to Improve In-Context Example Selection ( http://arxiv.org/abs/2408.09172v3 )

ライセンス: Link先を確認

Hsiu-Yuan Huang, Zichen Wu, Yutong Yang, Junzhao Zhang, Yunfang Wu,

(参考訳) 現在、Large Language Models (LLMs) は様々な下流タスクで例外的なパフォーマンスを示している。しかし、ユーザの期待に応えるために、応答が確実に生成されるか、あるいは作られているかを知ることは困難である。 LLMの不確実性を推定することは、その大規模化とホワイトボックスアクセスの欠如により特に困難である。本研究では,ラベル干渉をサンプリングベースアプローチに組み込む際のLCM出力の整合性を評価することによって,LCMの不確かさを分類する新しいUncertainty Tripartite Testing Paradigm(Unc-TTP)を提案する。 Unc-TTP出力に基づいて、インスタンスを特定のカテゴリと不確実なカテゴリに集約する。さらに,LLMの不確かさの詳細な解析を行い,既存のサンプリング法よりもUnc-TTPの方が優れていることを示す。さらに、得られた不確実性情報を利用して、文脈内サンプル選択を誘導し、Unc-TTPが明らかに検索ベースおよびサンプリングベースアプローチより優れていることを示す。本研究は,オープンソース LLM とクローズドソース LLM の両方の不確かさを分類する新たな手法を提案し,この不確実性を利用して LLM の性能を向上させるための実践的アプローチを提案する。

Large Language Models (LLMs) have demonstrated exceptional performance across various downstream tasks. However, it is challenging for users to discern whether the responses are generated with certainty or are fabricated to meet user expectations. Estimating the uncertainty of LLMs is particularly challenging due to their vast scale and the lack of white-box access. In this work, we propose a novel Uncertainty Tripartite Testing Paradigm (Unc-TTP) to classify LLM uncertainty by evaluating the consistency of LLM outputs when label interference is incorporated into the sampling-based approach. Based on Unc-TTP outputs, we aggregate instances into certain and uncertain categories. Further, we conduct a detailed analysis of the uncertainty properties of LLMs and show Unc-TTP's superiority over the existing sampling-based methods. In addition, we leverage the obtained uncertainty information to guide in-context example selection, demonstrating that Unc-TTP clearly outperforms retrieval-based and sampling-based approaches in selecting more informative examples. Our work paves a new way to classify the uncertainty of both open- and closed-source LLMs, and introduces a practical approach to exploit this uncertainty to improve LLM performance.

翻訳日:2024-08-27 20:30:25 公開日:2024-08-24

# 小データを用いたソフト拘束型物理インフォームニューラルネットワークによるオシレータODEの解法

Solving Oscillator ODEs via Soft-constrained Physics-informed Neural Network with Small Data ( http://arxiv.org/abs/2408.11077v2 )

ライセンス: Link先を確認

Kai-liang Lu, Yu-meng Su, Zhuo Bi, Cheng Qiu, Wen-jun Zhang,

(参考訳) 本稿では、物理インフォームドニューラルネットワーク(PINN)、従来のニューラルネットワーク(NN)、および従来の数値離散化法を、微分方程式(DE)の解法について、文献調査と実験的検証を通じて比較した。我々は、ソフト制約付きPINNのアプローチに着目し、常微分方程式と偏微分方程式(ODE/PDE)を解くための数学的枠組みと計算フローを定式化した。その動作機構、精度、効率は、典型的な線形および非線形(例えば、Primer、Van der Pol、Duffing)振動子ODEを解くことによって実験的に検証された。我々は、PINNのDeepXDEベースの実装が、コードが軽量でトレーニングが効率的であるだけでなく、CPU/GPUプラットフォーム間で柔軟であることを実証した。PINNはラベル付きデータの必要性を大幅に低減する。ODEの非線形性が弱い場合、ごく少量の教師ありトレーニングデータと少数の教師なしコロケーションポイントがあれば解を予測するのに十分であり、最小の場合、1階または2階のODEに対して、それぞれ1つまたは2つのトレーニングポイント(初期値)しか必要としない。また、コロケーションポイントと物理情報の利用により、PINNはトレーニングセットの時間領域外へ外挿する能力を持ち、特にノイズの多いデータに対して頑健であるため、一般化能力が高い。データ量の削減によって得られる利得が、損失関数の項の増加による遅延を上回る場合、トレーニングは高速化される。ソフト制約付きPINNは、全損失関数に正則化項を追加することで物理法則(例えばエネルギー保存)の制約を容易に課すことができ、この物理法則に従うODEに対する解の性能を向上させる。さらに、PINNは硬い(スティッフな)ODEやPDE、その他の種類のDEにも利用でき、デジタルツイン時代の有力な触媒になりつつある。

This paper compares physics-informed neural networks (PINNs), conventional neural networks (NNs), and traditional numerical discretization methods for solving differential equations (DEs) through literature investigation and experimental validation. We focus on the soft-constrained PINN approach and formalize its mathematical framework and computational flow for solving ordinary and partial DEs (ODEs/PDEs). Its working mechanism, accuracy, and efficiency are experimentally verified by solving typical linear and non-linear (e.g., Primer, Van der Pol, Duffing) oscillator ODEs. We demonstrate that the DeepXDE-based implementation of PINN is not only lightweight in code and efficient in training, but also flexible across CPU/GPU platforms. PINN greatly reduces the need for labeled data: when the nonlinearity of the ODE is weak, a very small amount of supervised training data plus a few unsupervised collocation points are sufficient to predict the solution; in the minimalist case, only one or two training points (with initial values) are needed for first- or second-order ODEs, respectively. We also find that, with the aid of collocation points and the use of physical information, PINN can extrapolate outside the time domain of the training set and is especially robust to noisy data, which gives it enhanced generalization capabilities. Training is accelerated when the gains from the reduction in the amount of data outweigh the delay caused by the additional loss function terms. The soft-constrained PINN can easily impose a physical law (e.g., conservation of energy) as a constraint by adding a regularization term to the total loss function, thus improving solution performance for ODEs that obey this physical law. Furthermore, PINN can also be used for stiff ODEs, PDEs, and other types of DEs, and is becoming a favorable catalyst for the era of Digital Twins.
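
The soft-constrained loss described in this abstract can be sketched without any deep-learning library. The following is a minimal illustration only, not the DeepXDE implementation used in the paper: it scores a candidate solution of the harmonic oscillator u'' + ω²u = 0 by adding an initial-value (data) term to a physics-residual term evaluated at collocation points. The function names, the finite-difference derivatives, and the weight `lam` are assumptions made for this sketch.

```python
import math

def second_derivative(u, t, h=1e-3):
    # Central finite difference; a real PINN would use automatic differentiation.
    return (u(t + h) - 2.0 * u(t) + u(t - h)) / (h * h)

def pinn_loss(u, omega, collocation, u0, v0, lam=1.0, h=1e-3):
    """Total loss = initial-value (data) loss + lam * mean squared ODE residual."""
    # Data term: the two initial conditions of a second-order ODE, u(0)=u0 and u'(0)=v0.
    du0 = (u(h) - u(-h)) / (2.0 * h)
    data = (u(0.0) - u0) ** 2 + (du0 - v0) ** 2
    # Physics term: residual of u'' + omega^2 * u = 0 at the (unlabeled) collocation points.
    residual = sum((second_derivative(u, t, h) + omega ** 2 * u(t)) ** 2
                   for t in collocation)
    return data + lam * residual / len(collocation)

omega = 2.0
points = [0.1 * k for k in range(1, 31)]
exact = lambda t: math.cos(omega * t)   # satisfies the ODE with u(0)=1, u'(0)=0
wrong = lambda t: math.cos(1.5 * t)     # same initial values, but violates the ODE
print(pinn_loss(exact, omega, points, 1.0, 0.0))
print(pinn_loss(wrong, omega, points, 1.0, 0.0))
```

The second candidate matches the initial values exactly, so only the physics term separates it from the true solution; this is the role the collocation points play when labeled data are scarce.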

翻訳日:2024-08-27 20:20:40 公開日:2024-08-24

# TVG:拡散モデルを用いたトレーニング不要遷移ビデオ生成法

TVG: A Training-free Transition Video Generation Method with Diffusion Models ( http://arxiv.org/abs/2408.13413v1 )

ライセンス: Link先を確認

Rui Zhang, Yaosen Chen, Yuegen Liu, Wei Wang, Xuming Wen, Hongxia Wang,

(参考訳) 遷移ビデオはメディア制作において重要な役割を担い、視覚的物語の流れとコヒーレンスを高める。モーフィングのような伝統的な手法は芸術的な魅力を欠くことが多く、特殊スキルを必要とし、その効果を制限している。拡散モデルに基づくビデオ生成の最近の進歩は、トランジションを作成する新しい可能性を提供するが、フレーム間の関係モデリングの貧弱や突然のコンテンツ変更といった課題に直面している。本稿では,これらの制約に対処するビデオレベルの拡散モデルを用いて,新たなトレーニング不要な遷移ビデオ生成(TVG)手法を提案する。提案手法はガウス過程回帰($\mathcal{GPR}$)を利用して潜在表現をモデル化し,フレーム間のスムーズかつダイナミックな遷移を保証する。さらに、時間的制御と遷移信頼性を高めるために、補間に基づく条件制御と周波数対応双方向融合(FBiF)アーキテクチャを導入する。ベンチマークデータセットとカスタムイメージペアの評価は,高品質なスムーズなトランジションビデオの生成において,我々のアプローチの有効性を示す。コードはhttps://sobeymil.github.io/tvg.comで提供されている。

Transition videos play a crucial role in media production, enhancing the flow and coherence of visual narratives. Traditional methods like morphing often lack artistic appeal and require specialized skills, limiting their effectiveness. Recent advances in diffusion model-based video generation offer new possibilities for creating transitions but face challenges such as poor inter-frame relationship modeling and abrupt content changes. We propose a novel training-free Transition Video Generation (TVG) approach using video-level diffusion models that addresses these limitations without additional training. Our method leverages Gaussian Process Regression ($\mathcal{GPR}$) to model latent representations, ensuring smooth and dynamic transitions between frames. Additionally, we introduce interpolation-based conditional controls and a Frequency-aware Bidirectional Fusion (FBiF) architecture to enhance temporal control and transition reliability. Evaluations on benchmark datasets and custom image pairs demonstrate the effectiveness of our approach in generating high-quality smooth transition videos. The code is provided at https://sobeymil.github.io/tvg.com.

翻訳日:2024-08-27 19:39:20 公開日:2024-08-24

# 開量子系に対する操作的作業ゆらぎ定理

Operational work fluctuation theorem for open quantum systems ( http://arxiv.org/abs/2408.13417v1 )

ライセンス: Link先を確認

Konstantin Beyer, Walter T. Strunz,

(参考訳) 古典的ジャジンスキーの等式は、熱平衡から駆動される系上で実行される確率的仕事と対応する準静的過程における自由エネルギー差との正確な関係を確立する。この揺らぎ定理は、非平衡過程における外部から加えられた仕事の測定を通じて自由エネルギー差を決定できるため、実験的に重要である。量子の場合、ジャジンスキーの等式は、確率的仕事の測定手順が劇的に変化した場合のみ成り立つ:それは、初期および最終ハミルトニアンの知識を必要とするいわゆる2点測定(TPM)スキームに置き換えられ、したがって古典的ジャジンスキー等式で知られている自由エネルギー差の予測力に欠ける。ここでは、駆動プロトコルの間に決定される外部測定可能な量子仕事に有効である量子ゆらぎ定理を提案する。TPMの場合とは対照的に、定理は開量子系にも適用され、系のハミルトニアンを知ることなくシナリオを実現できる。我々の揺らぎ定理は不等式の形で成り立つので、真の自由エネルギー差に対する境界しか与えない。不等式は、プロトコルの開始時と終了時にエネルギーコヒーレンスが消滅する準古典的な場合において飽和する。したがって、明らかに量子的不利がある。

The classical Jarzynski equality establishes an exact relation between the stochastic work performed on a system driven out of thermal equilibrium and the free energy difference in a corresponding quasi-static process. This fluctuation theorem bears experimental relevance, as it enables the determination of the free energy difference through the measurement of externally applied work in a nonequilibrium process. In the quantum case, the Jarzynski equality only holds if the measurement procedure of the stochastic work is drastically changed: it is replaced by a so-called two-point measurement (TPM) scheme that requires the knowledge of the initial and final Hamiltonian and therefore lacks the predictive power for the free energy difference that the classical Jarzynski equation is known for. Here, we propose a quantum fluctuation theorem that is valid for externally measurable quantum work determined during the driving protocol. In contrast to the TPM case, the theorem also applies to open quantum systems and the scenario can be realized without knowing the system Hamiltonian. Our fluctuation theorem comes in the form of an inequality and therefore only yields bounds to the true free energy difference. The inequality is saturated in the quasiclassical case of vanishing energy coherences at the beginning and at the end of the protocol. Thus, there is a clear quantum disadvantage.
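
For reference, the classical Jarzynski equality mentioned in this abstract is usually written as below, with inverse temperature $\beta$, stochastic work $W$, and free energy difference $\Delta F$. The notation is the standard textbook one and is not taken from the paper. The second line follows from Jensen's inequality. The specific inequality proposed in the paper is not reproduced here.

```latex
\begin{align}
  \left\langle e^{-\beta W} \right\rangle &= e^{-\beta \Delta F}, \\
  \langle W \rangle &\ge \Delta F .
\end{align}
```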

翻訳日:2024-08-27 19:39:20 公開日:2024-08-24

# 拡散モデルエキスパートの連鎖による無訓練長ビデオ生成

Training-free Long Video Generation with Chain of Diffusion Model Experts ( http://arxiv.org/abs/2408.13423v1 )

ライセンス: Link先を確認

Wenhao Li, Yichao Cao, Xie Su, Xi Lin, Shan You, Mingkai Zheng, Yi Chen, Chang Xu,

(参考訳) ビデオ生成モデルは、映画製作などの分野で大きな可能性を秘めている。しかし、現在のビデオ拡散モデルでは、高い計算コストが必要であり、ビデオ生成タスクの複雑さのため、最適以下の結果が得られる。本稿では,ビデオ生成を構造制御と時空間リファインメントというより簡単なサブタスクに分解する,効率的な高品質なビデオ生成フレームワークである ConFiner を提案する。オフザシェルフ拡散モデルの専門家の鎖で高品質なビデオを生成することができ、それぞれが切り離されたサブタスクを担当している。リファインメントの過程では,複数の拡散専門家の能力を単一のサンプリングにマージできる協調デノイジングを導入する。さらに,ConFiner-Long フレームワークを設計し,ConFiner 上で3つの制約戦略で長いコヒーレントなビデオを生成する。実験の結果、推論コストのわずか10%で、私たちのConFinerは、すべての客観的および主観的メトリクスでLavieやModelscopeのような代表モデルを超えています。そしてConFiner-Longは、600フレームまでの高品質でコヒーレントなビデオを生成することができる。

Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models incur high computational costs and produce suboptimal results because of the high complexity of the video generation task. In this paper, we propose ConFiner, an efficient, high-quality video generation framework that decouples video generation into easier subtasks: structure control and spatial-temporal refinement. It can generate high-quality videos with a chain of off-the-shelf diffusion model experts, each responsible for a decoupled subtask. During refinement, we introduce coordinated denoising, which can merge multiple diffusion experts' capabilities into a single sampling process. Furthermore, we design the ConFiner-Long framework, which can generate long coherent videos with three constraint strategies on top of ConFiner. Experimental results indicate that with only 10% of the inference cost, ConFiner surpasses representative models such as Lavie and Modelscope across all objective and subjective metrics. ConFiner-Long can generate high-quality, coherent videos of up to 600 frames.

翻訳日:2024-08-27 19:39:20 公開日:2024-08-24

# ターゲット型差分プライバシによる人道的アプリケーションの実現

Enabling Humanitarian Applications with Targeted Differential Privacy ( http://arxiv.org/abs/2408.13424v1 )

ライセンス: Link先を確認

Nitin Kohli, Joshua Blumenstock,

(参考訳) 低所得国や中所得国における携帯電話の普及は、世界の貧困層や最も脆弱な人口が政府や企業によって観察され、追跡される範囲を劇的に増加させてきた。歴史的に「グリッド外」だった何百万もの個人が、いまやデジタルデータを受動的に生成している。これらのデータは、政府の給付を受けられるかどうか、消費者ローンの資格があるかどうかなど、それらの個人について、人生を変える決定を下すために使用されている。本稿では,個人データに基づくアルゴリズム決定を実装しつつ,データ主体に対して形式的なプライバシ保証を提供する手法を開発する。このアプローチは、個人に関する決定を必要とするアプリケーションに差分プライバシを適応させ、データ主体に保証されるプライバシのレベルを意思決定者がきめ細かく制御できるようにする。より強力なプライバシー保証は、一般的にある程度のコストがかかることを示し、実際の2つのアプリケーションからのデータ(トーゴの反貧困プログラムとナイジェリアの消費者貸付プラットフォーム)を使って、それらのコストを例示している。私たちの経験的な結果は、プライバシと予測精度のトレードオフを定量化し、プライバシの保証が異なることがプログラム全体の効果に与える影響を特徴づけます。より広範に、私たちの結果は、人道的プログラムが責任を持って個人データを使用する方法を示し、プログラム設計者がデータプライバシに関して十分な情報に基づく決定を下せるようにする。

The proliferation of mobile phones in low- and middle-income countries has suddenly and dramatically increased the extent to which the world's poorest and most vulnerable populations can be observed and tracked by governments and corporations. Millions of historically "off the grid" individuals are now passively generating digital data; these data, in turn, are being used to make life-altering decisions about those individuals -- including whether or not they receive government benefits, and whether they qualify for a consumer loan. This paper develops an approach to implementing algorithmic decisions based on personal data, while also providing formal privacy guarantees to data subjects. The approach adapts differential privacy to applications that require decisions about individuals, and gives decision makers granular control over the level of privacy guaranteed to data subjects. We show that stronger privacy guarantees typically come at some cost, and use data from two real-world applications -- an anti-poverty program in Togo and a consumer lending platform in Nigeria -- to illustrate those costs. Our empirical results quantify the tradeoff between privacy and predictive accuracy, and characterize how different privacy guarantees impact overall program effectiveness. More broadly, our results demonstrate a way for humanitarian programs to responsibly use personal data, and better equip program designers to make informed decisions about data privacy.
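
The tradeoff between privacy and predictive accuracy quantified in this abstract can be illustrated with randomized response, a textbook mechanism that satisfies ε-differential privacy for a single binary decision. This is a generic sketch under that assumption and is not the targeted differential privacy approach developed in the paper; the function names are hypothetical.

```python
import math
import random

def randomized_response(decision, epsilon, rng=random):
    """Release a binary decision under epsilon-differential privacy.

    The true decision is kept with probability e^eps / (1 + e^eps)
    and flipped otherwise.
    """
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return decision if rng.random() < keep_prob else not decision

def expected_accuracy(epsilon):
    # Probability that the released decision equals the true one.
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

# Smaller epsilon means stronger privacy and lower accuracy.
for eps in (0.1, 1.0, 5.0):
    print(eps, round(expected_accuracy(eps), 3))
```

At ε = 0 the released decision is a fair coin flip, and as ε grows it approaches the true decision, which is the cost of stronger privacy in its simplest form.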

翻訳日:2024-08-27 19:39:20 公開日:2024-08-24

# 潜在データ拡張のための最適層選択

Optimal Layer Selection for Latent Data Augmentation ( http://arxiv.org/abs/2408.13426v1 )

ライセンス: Link先を確認

Tomoumi Takase, Ryo Karakida,

(参考訳) データ拡張(DA)は一般的に入力データに適用されるが、いくつかの研究では、ニューラルネットワークの隠れ層にDAを適用することによりパフォーマンスが向上する、と報告されている。しかし、従来の研究では、DAが適用される層は慎重に検討されておらず、しばしばランダムに均一に、あるいは特定の層にのみ適用され、仲裁の余地は残されている。そこで本研究では,様々な実験構成,例えばスクラッチからのトレーニング,移動学習,各種データセット設定,異なるモデルにおいて,DAの適用に適したレイヤの傾向について検討した。さらに,DAに適したレイヤを自動的に調整するために,トレーニング中の勾配降下法に基づいて各レイヤに対してDAを実行するように更新する適応層選択法(AdaLASE)を提案する。いくつかの画像分類データセットで得られた実験結果から,提案手法が期待どおりに変化し,総合的な試験精度が向上したことが示唆された。

While data augmentation (DA) is generally applied to input data, several studies have reported that applying DA to hidden layers in neural networks, i.e., feature augmentation, can improve performance. However, in previous studies, the layers to which DA is applied have not been carefully considered, often being applied randomly and uniformly or only to a specific layer, leaving room for arbitrariness. Thus, in this study, we investigated the trends of suitable layers for applying DA in various experimental configurations, e.g., training from scratch, transfer learning, various dataset settings, and different models. In addition, to adjust the suitable layers for DA automatically, we propose the adaptive layer selection (AdaLASE) method, which updates the ratio to perform DA for each layer based on the gradient descent method during training. The experimental results obtained on several image classification datasets indicate that the proposed AdaLASE method altered the ratio as expected and achieved high overall test accuracy.

翻訳日:2024-08-27 19:39:20 公開日:2024-08-24

# ICML 2023ランキングデータの分析: 著者の自身の論文に対する意見は機械学習におけるピアレビューに役立つか?

Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning? ( http://arxiv.org/abs/2408.13430v1 )

ライセンス: Link先を確認

Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie J. Su,

(参考訳) 我々は2023年のICML(International Conference on Machine Learning)のレビュープロセスにおいて、複数の論文を投稿した著者に対し、自身が認識する品質に基づいて自分の論文を順位付けするよう依頼する実験を行った。我々は、2,592件の投稿に関する、それぞれ異なる著者からの1,342件のランキングを受け取った。本稿では、著者が提供するランキングをどのように活用して、機械学習会議におけるピアレビュープロセスを改善できるかを実証分析する。著者によるランキングを用いて生のレビュースコアを校正するアイソトニック・メカニズムに注目した。分析の結果,真値である「期待レビュースコア」の推定において,2乗誤差と絶対誤差の両方で,ランキングで校正したスコアが生のスコアを上回っていることが判明した。また,シニアエリアチェアによるエリアチェアの推薦の監督を支援する,論文賞の選定を支援する,緊急審査員の募集を指導するなど,ピアレビュープロセスにおいてアイソトニック・メカニズムと著者によるランキングを利用するための慎重でリスクの低いアプローチをいくつか提案する。論文は,研究の限界に対処し,今後の研究方向性を提案することで締めくくっている。

We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML) that requested authors with multiple submissions to rank their own papers based on perceived quality. We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions. In this paper, we present an empirical analysis of how author-provided rankings could be leveraged to improve peer review processes at machine learning conferences. We focus on the Isotonic Mechanism, which calibrates raw review scores using author-provided rankings. Our analysis demonstrates that the ranking-calibrated scores outperform raw scores in estimating the ground truth "expected review scores" in both squared and absolute error metrics. Moreover, we propose several cautious, low-risk approaches to using the Isotonic Mechanism and author-provided rankings in peer review processes, including assisting senior area chairs' oversight of area chairs' recommendations, supporting the selection of paper awards, and guiding the recruitment of emergency reviewers. We conclude the paper by addressing the study's limitations and proposing future research directions.
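
The calibration step described in this abstract can be sketched as an isotonic regression: the raw scores are projected, in the least-squares sense, onto the set of score vectors that respect the author's ranking. The sketch below uses the pool-adjacent-violators algorithm. The function name and the toy scores are assumptions for illustration, and the paper's actual pipeline (for example, how scores from several reviewers are aggregated) is not reproduced.

```python
def isotonic_calibrate(raw_scores, ranking):
    """Project raw review scores onto the author's ranking (best paper first).

    Solves min sum (y_i - r_i)^2 subject to r being non-increasing along
    `ranking`, using the pool-adjacent-violators algorithm.
    """
    ordered = [raw_scores[i] for i in ranking]
    blocks = []  # each block is [sum, count]; block means must be non-increasing
    for y in ordered:
        blocks.append([y, 1])
        # Merge while a later block has a larger mean than the block before it.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    calibrated = [0.0] * len(raw_scores)
    for position, paper in enumerate(ranking):
        calibrated[paper] = fitted[position]
    return calibrated

# Three papers by one author, who ranks paper 2 best, then paper 0, then paper 1.
print(isotonic_calibrate([5.0, 6.0, 7.0], ranking=[2, 0, 1]))
```

In the toy input, papers 0 and 1 receive raw scores that contradict the author's ordering, so they are pooled to their common mean, while paper 2 keeps its score.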

翻訳日:2024-08-27 19:39:20 公開日:2024-08-24

# 早期停止とエッジリコールによる顔クラスタリング

Face Clustering via Early Stopping and Edge Recall ( http://arxiv.org/abs/2408.13431v1 )

ライセンス: Link先を確認

Junjie Liu,

(参考訳) 大規模顔クラスタリングは大きな進歩を遂げており、教師あり学習を伴う大規模顔クラスタリングの学習に多くの努力が注がれている。しかし、複雑なモデル設計と退屈なクラスタリングプロセスは、既存の手法で典型的である。このような制限は、現実世界のアプリケーションでは実現不可能なクラスタリングをもたらす。合理的で効率的なモデル設計とトレーニングを考慮する必要がある。さらに、教師なしの顔クラスタリングアルゴリズムの開発も重要であり、現実世界のアプリケーションではより現実的である。本稿では,これらの問題に対処するために,教師なし顔クラスタリングアルゴリズムFC-ESと教師あり顔クラスタリングアルゴリズムFC-ESERを提案する。FC-ESでは, 大規模顔クラスタリングの精度とリコールを同時に保証する, 効率的で効果的な近傍ベースのエッジ確率と新しい早期停止戦略を提案する。さらに,教師あり学習を活かすため,FC-ESERでは新たなエッジリコール戦略を提案し,FC-ESで接続されなかったエッジ接続をさらにリコールする。顔,人物,車両のクラスタリングに関する複数のベンチマーク実験により,提案したFC-ESとFC-ESERは,従来の最先端手法よりも大幅に優れていたことがわかった。私たちのコードは https://github.com/jumptoliujj/FC-ESER で公開されます。

Large-scale face clustering has achieved significant progress, with many efforts dedicated to learning to cluster large-scale faces with supervised learning. However, complex model design and tedious clustering processes are typical of existing methods. Such limitations make clustering infeasible in real-world applications. Reasonable and efficient model design and training need to be taken into account. Besides, developing unsupervised face clustering algorithms, which are more realistic in real-world applications, is crucial. In this paper, we propose a novel unsupervised face clustering algorithm, FC-ES, and a novel supervised face clustering algorithm, FC-ESER, to address these issues. An efficient and effective neighbor-based edge probability and a novel early stopping strategy are proposed in FC-ES, guaranteeing the accuracy and recall of large-scale face clustering simultaneously. Furthermore, to take advantage of supervised learning, a novel edge recall strategy is proposed in FC-ESER to further recall the edge connections that are not connected in FC-ES. Extensive experiments on multiple benchmarks for face, person, and vehicle clustering show that our proposed FC-ES and FC-ESER significantly outperform previous state-of-the-art methods. Our code will be available at https://github.com/jumptoliujj/FC-ESER.

翻訳日:2024-08-27 19:39:20 公開日:2024-08-24

# マルチヘッド畳み込みエンコーダとクロスアテンションの統合によるSPARQLクエリ変換の改善

Integrating Multi-Head Convolutional Encoders with Cross-Attention for Improved SPARQL Query Translation ( http://arxiv.org/abs/2408.13432v1 )

ライセンス: Link先を確認

Yi-Hui Chen, Eric Jui-Lin Lu, Kwan-Ho Cheng,

(参考訳) KGQAシステム(Knowledge Graph Question Answering)の主なタスクは、ユーザ入力の質問をクエリ構文(SPARQLなど)に変換することである。TransformerやConvS2Sのようなモダンなエンコーダやデコーダの台頭により、多くの学者がSPARQL生成の研究方向を、ニューラル機械翻訳(NMT)アーキテクチャやText-to-SPARQLの生成AIフィールドに移行した。NMTベースのQAシステムでは、知識ベースクエリ構文を言語として扱う。NMTベースの翻訳モデルを使用して、自然言語の質問をクエリ構文に変換する。学者は、Transformer、ConvS2S、BiLSTMといったクロスアテンションを備えた一般的なアーキテクチャを使用して、クエリ構文の翻訳モデルをトレーニングする。そこで本研究では,より良いクエリ結果を得るために,ConvS2S エンコーダを改良し,Transformer のマルチヘッドアテンションを追加して,n-gram 言語モデルに基づくマルチヘッド Conv エンコーダ (MHC エンコーダ) を提案した。原則は、畳み込みレイヤを使用して、異なる受容フィールドを持つ入力シーケンス内のローカルな隠れた特徴をキャプチャし、マルチヘッドアテンションを使用してそれらの間の依存関係を計算することである。その結果,マルチヘッド Conv エンコーダに基づく翻訳モデルは他のエンコーダより優れた性能を達成し,QALD-9データセットとLC-QuAD-1.0データセットでそれぞれ76.52%,83.37%のBLEU-1(BiLingual Evaluation Understudy)を得た。さらに,QALD-9データセットとLC-QuAD-1.0データセットのエンドツーエンドシステム実験では,他のKGQAシステムを上回る結果を達成し,マクロF1測定値はそれぞれ52%,66%に達した。さらに,実験結果から,計算資源が限られていても,優れたエンコーダ・デコーダアーキテクチャとクロスアテンションがあれば,一般的な埋め込みのみを用いて大規模事前学習モデルに匹敵する優れた性能を達成できることが示唆された。

The main task of the KGQA system (Knowledge Graph Question Answering) is to convert user input questions into query syntax (such as SPARQL). With the rise of modern popular encoders and decoders like Transformer and ConvS2S, many scholars have shifted the research direction of SPARQL generation to the Neural Machine Translation (NMT) architecture or the generative AI field of Text-to-SPARQL. In NMT-based QA systems, the system treats knowledge base query syntax as a language. It uses NMT-based translation models to translate natural language questions into query syntax. Scholars use popular architectures equipped with cross-attention, such as Transformer, ConvS2S, and BiLSTM, to train translation models for query syntax. To achieve better query results, this paper improved the ConvS2S encoder and added multi-head attention from the Transformer, proposing a Multi-Head Conv encoder (MHC encoder) based on the n-gram language model. The principle is to use convolutional layers to capture local hidden features in the input sequence with different receptive fields, using multi-head attention to calculate dependencies between them. Ultimately, we found that the translation model based on the Multi-Head Conv encoder achieved better performance than other encoders, obtaining 76.52% and 83.37% BLEU-1 (BiLingual Evaluation Understudy) on the QALD-9 and LC-QuAD-1.0 datasets, respectively. Additionally, in the end-to-end system experiments on the QALD-9 and LC-QuAD-1.0 datasets, we achieved leading results over other KGQA systems, with Macro F1-measures reaching 52% and 66%, respectively. Moreover, the experimental results show that with limited computational resources, if one possesses an excellent encoder-decoder architecture and cross-attention, experts and scholars can achieve outstanding performance equivalent to large pre-trained models using only general embeddings.
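
The BLEU-1 figures reported in this abstract measure unigram overlap between a generated query and a reference query. A minimal sketch of the metric follows. It assumes a single reference and whitespace tokenization, and the SPARQL strings are made up for illustration, so the paper's exact evaluation setup may differ.

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """Unigram BLEU: clipped unigram precision times the brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Candidates shorter than the reference are penalized.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1.0 - len(ref) / len(cand))
    return brevity * precision

ref = "SELECT ?x WHERE { ?x a dbo:City }"
hyp = "SELECT ?x WHERE { ?x a dbo:Town }"
print(bleu1(hyp, ref))
```

Seven of the eight candidate tokens appear in the reference, so a single wrong entity still scores highly, which is one reason end-to-end F1 on query results is reported alongside BLEU.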

翻訳日:2024-08-27 19:39:20 公開日:2024-08-24

# 視覚言語選好学習による説明可能な概念生成

Explainable Concept Generation through Vision-Language Preference Learning ( http://arxiv.org/abs/2408.13438v1 )

ライセンス: Link先を確認

Aditya Taparia, Som Sagar, Ransalu Senanayake,

(参考訳) 他の説明可能なAI技術とは異なり、機能属性に直接関連しない高レベルの視覚的“概念”をテストするために使用できる。例えば、「ストリップ」の概念は、イメージをシマウマとして分類することが重要である。しかし、概念に基づく説明法では、実践者は複数の候補となる概念イメージを推測し、収集する必要がある。本稿では,この制限に対処するため,画像生成問題として概念セットの作成を行う。しかし, 生成モデルを用いることで意味のある概念が得られないため, 概念のテキスト記述から視覚言語生成モデルを微調整する強化学習に基づく選好最適化アルゴリズムを考案する。一連の実験を通して、手作業で行うのが難しい複雑な抽象概念を記述できる手法の能力を実証した。提案手法の有効性と信頼性に加えて,ニューラルネットワーク解析の診断ツールとしての有用性を示す。

Concept-based explanations have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not directly related to feature attributes. For instance, the concept of "stripes" is important to classify an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and collect multiple candidate concept image sets, which can often be imprecise and labor-intensive. Addressing this limitation, in this paper, we frame concept image set creation as an image generation problem. However, since naively using a generative model does not result in meaningful concepts, we devise a reinforcement learning-based preference optimization algorithm that fine-tunes the vision-language generative model from approximate textual descriptions of concepts. Through a series of experiments, we demonstrate the capability of our method to articulate complex, abstract concepts that are otherwise challenging to craft manually. In addition to showing the efficacy and reliability of our method, we show how our method can be used as a diagnostic tool for analyzing neural networks.

翻訳日:2024-08-27 19:39:20 公開日:2024-08-24

# グラフ畳み込みネットワークを用いた知識を考慮した会話脱線予測

Knowledge-Aware Conversation Derailment Forecasting Using Graph Convolutional Networks ( http://arxiv.org/abs/2408.13440v1 )

ライセンス: Link先を確認

Enas Altarawneh, Ameeta Agrawal, Michael Jenkin, Manos Papagelis,

(参考訳) オンライン会話は特に脱線の影響を受けやすく、不敬なコメントや虐待を含む有害なコミュニケーションパターンの形で現れうる。予測された会話脱線は、事前に脱線兆候を予測し、会話の積極的なモデレーションを可能にする。会話を逐次エンコードし、グラフニューラルネットワークを使用して対話ユーザのダイナミクスをモデル化する、会話脱線予測のための最先端のアプローチ。しかし、既存のグラフモデルは、文脈の伝播や感情の変化のような複雑な会話の特徴を捉えることができない。常識知識を利用することで、モデルがそのような特徴を捉え、性能を向上させることができる。本稿では,対話文脈情報の知識ベースからコモンセンス文を導出し,グラフニューラルネットワークの分類アーキテクチャを充実させる。我々は,発話のマルチソース情報をカプセルに融合し,会話の脱線を予測するためにトランスフォーマーベースの予測器が使用する。我々のモデルは、CGAおよびCMVベンチマークデータセットにおける最先端モデルよりも優れた、会話のダイナミクスと文脈の伝播をキャプチャする。

Online conversations are particularly susceptible to derailment, which can manifest itself in the form of toxic communication patterns including disrespectful comments and abuse. Forecasting conversation derailment predicts signs of derailment in advance, enabling proactive moderation of conversations. State-of-the-art approaches to conversation derailment forecasting sequentially encode conversations and use graph neural networks to model dialogue user dynamics. However, existing graph models are not able to capture complex conversational characteristics such as context propagation and emotional shifts. The use of common sense knowledge enables a model to capture such characteristics, thus improving performance. Following this approach, here we derive commonsense statements from a knowledge base of dialogue contextual information to enrich a graph neural network classification architecture. We fuse the multi-source information on utterance into capsules, which are used by a transformer-based forecaster to predict conversation derailment. Our model captures conversation dynamics and context propagation, outperforming the state-of-the-art models on the CGA and CMV benchmark datasets.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# 大規模言語モデルにおける次トークン予測の法則

A Law of Next-Token Prediction in Large Language Models ( http://arxiv.org/abs/2408.13442v1 )

ライセンス: Link先を確認

Hangfeng He, Weijie J. Su,

(参考訳) 大規模言語モデル(LLM)は、様々なアプリケーションドメインで広く採用されているが、ブラックボックスの性質は、これらのモデルが入力データを内部でどのように処理して予測を行うかを理解する上で、大きな課題となっている。本稿では,事前学習したLLMの中間層を通じた,次トークン予測のための文脈化トークン埋め込みの学習を支配する,正確かつ定量的な法則を提案する。この結果から,TransformerやRWKV,Mambaといったアーキテクチャ上に構築された,さまざまなオープンソース LLM にまたがる普遍的な現象として,最下層から最上層まで,各レイヤが予測精度の向上に等しく寄与していることが判明した。この法則は、モデルスケーリング、事前学習タスク、情報フローなど、LLM開発およびアプリケーションにおけるプラクティスに情報を与え、ガイドするための新しい視点と洞察を提供する。我々の法則は、内部データ処理機構を精査することで、LLMの設計、訓練、解釈に対するよりきめ細かなアプローチを可能にします。

Large language models (LLMs) have been widely employed across various application domains, yet their black-box nature poses significant challenges to understanding how these models process input data internally to make predictions. In this paper, we introduce a precise and quantitative law that governs the learning of contextualized token embeddings through intermediate layers in pre-trained LLMs for next-token prediction. Our findings reveal that each layer contributes equally to enhancing prediction accuracy, from the lowest to the highest layer -- a universal phenomenon observed across a diverse array of open-source LLMs, built on architectures such as Transformer, RWKV, and Mamba. We demonstrate that this law offers new perspectives and insights to inform and guide practices in LLM development and applications, including model scaling, pre-training tasks, and information flow. Overall, our law enables more fine-grained approaches to the design, training, and interpretation of LLMs through scrutinizing their internal data processing mechanisms.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# 非循環性制約のない効率的なDAG学習

Efficient Reinforced DAG Learning without Acyclicity Constraints ( http://arxiv.org/abs/2408.13448v1 )

ライセンス: Link先を確認

Bao Duong, Hung Le, Thin Nguyen,

(参考訳) 単なる観測データに埋め込まれた原因-影響構造は、そのような構造から恩恵を受けることができる知識の豊富さを所有する、非常に科学的な関心事である。近年、強化学習(RL)は、有向非巡回グラフ(DAG)の形で最も考えられる因果的説明を探索する古典的手法の強化として現れている。しかし、DAG空間を効果的に探索することは、多数の候補と複雑な非巡回性の制約のために困難である。本研究では,効率的なDAG生成ポリシを備えたRL機械による新しい因果発見手法であるREACT(Reinforced DAG learning without acyclicity Constraints)を提案する。 DAGの新たなパラメトリゼーションにより、実数値ベクトルを1ステップで有効なDAGを表す隣接行列に直接マッピングし、非巡回性制約を課すことなく、より効率的に探索空間を探索できる。さらに,合成データと実データの両方の多種多様な集合に関する包括的数値評価を行い,現状のベースラインと比較して,本手法の有効性を確認した。

Unraveling cause-effect structures embedded in mere observational data is of great scientific interest, owing to the wealth of knowledge that can benefit from such structures. Recently, reinforcement learning (RL) has emerged as an enhancement of classical techniques for searching for the most probable causal explanation in the form of a directed acyclic graph (DAG). Yet, effectively exploring the DAG space is challenging due to the vast number of candidates and the intricate constraint of acyclicity. In this study, we present REACT (REinforced DAG learning without acyclicity ConstrainTs), a novel causal discovery approach fueled by the RL machinery with an efficient DAG generation policy. Through a novel parametrization of DAGs, which allows for directly mapping a real-valued vector to an adjacency matrix representing a valid DAG in a single step without enforcing any acyclicity constraint, we are able to navigate the search space much more effectively with policy gradient methods. In addition, our comprehensive numerical evaluations on a diverse set of both synthetic and real data confirm the effectiveness of our method compared with state-of-the-art baselines.
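
To make the idea of mapping a real-valued vector to a valid DAG in one step concrete, here is one simple construction of that kind. It is an assumption made for illustration and may differ from the parametrization actually used by REACT: the first `d` entries act as node priorities, the rest as edge scores, and an edge is kept only when it points from a lower-priority node to a higher-priority one.

```python
def vector_to_dag(z, d):
    """Map a real vector of length d + d*d to a d x d adjacency matrix of a DAG.

    Edge i -> j is kept only if its score is positive and
    priority[i] < priority[j], so every edge points "uphill" in priority
    and no cycle can form.
    """
    priority, scores = z[:d], z[d:]
    return [[1 if scores[i * d + j] > 0 and priority[i] < priority[j] else 0
             for j in range(d)] for i in range(d)]

def is_acyclic(adj):
    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    d = len(adj)
    indegree = [sum(adj[i][j] for i in range(d)) for j in range(d)]
    queue = [j for j in range(d) if indegree[j] == 0]
    removed = 0
    while queue:
        i = queue.pop()
        removed += 1
        for j in range(d):
            if adj[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    queue.append(j)
    return removed == d

z = [0.3, -1.2, 2.0] + [1.0] * 9   # three priorities, then all-positive edge scores
dag = vector_to_dag(z, 3)
print(dag, is_acyclic(dag))
```

Because acyclicity holds by construction for every input vector, a search or policy over `z` never needs an explicit acyclicity penalty or check; `is_acyclic` is included only to verify the claim.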

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# 逆勾配エピソードメモリによる連続RLデータの増大

Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory ( http://arxiv.org/abs/2408.13452v1 )

ライセンス: Link先を確認

Sihao Wu, Xingyu Zhao, Xiaowei Huang,

(参考訳) Reinforcement Learning(RL)トレーニングプロセスにおいて重要な役割を果たす学習のデータ効率は、連続環境を持つ連続RLにおいてさらに重要になる。連続RLでは、学習者は定常的でないシーケンシャルなタスクと対話し、以前の知識を忘れずに新しいタスクを学習する必要がある。しかし、連続RLのためのデータ拡張の実装についてはほとんど研究されていない。本稿では,連続RLにおけるデータ拡張の有効性について検討する。具体的には,(1)既存のデータ拡張手法を要約し,(2)連続RLの新たな拡張方法を含む連続RLのためのベンチマークデータ拡張(Adv-GEM)を提案する。大規模な実験により、ロボット制御タスクにおいて、ランダム振幅スケーリング、ステートスウィッチ、ミックスアップ、逆方向拡張、Adv-GEMなどのデータ拡張が、その平均性能、破滅的な忘れ、前方移動といった面で、既存の連続RLアルゴリズムを改善できることが示されている。すべてのデータ拡張メソッドはプラグインモジュールとして実装され、連続RLメソッドに簡単に統合できる。

Data efficiency of learning, which plays a key role in the Reinforcement Learning (RL) training process, becomes even more important in continual RL with sequential environments. In continual RL, the learner interacts with non-stationary, sequential tasks and is required to learn new tasks without forgetting previous knowledge. However, there is little work on implementing data augmentation for continual RL. In this paper, we investigate the efficacy of data augmentation for continual RL. Specifically, we provide benchmarking data augmentations for continual RL, by (1) summarising existing data augmentation methods and (2) including a new augmentation method for continual RL: Adversarial Augmentation with Gradient Episodic Memory (Adv-GEM). Extensive experiments show that data augmentations, such as random amplitude scaling, state-switch, mixup, adversarial augmentation, and Adv-GEM, can improve existing continual RL algorithms in terms of their average performance, catastrophic forgetting, and forward transfer, on robot control tasks. All data augmentation methods are implemented as plug-in modules for trivial integration into continual RL methods.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# AdaOcc:Adaptive-Resolution Occupancy Prediction

AdaOcc: Adaptive-Resolution Occupancy Prediction ( http://arxiv.org/abs/2408.13454v1 )

ライセンス: Link先を確認

Chao Chen, Ruoyu Wang, Yuliang Guo, Cheng Zhao, Xinyu Huang, Chen Feng, Liu Ren,

(参考訳) 複雑な都市シナリオにおける自律運転には、包括的かつ正確な3D知覚が必要である。従来の3D認識手法は物体検出に重点を置いており、環境の詳細を欠く疎らな表現をもたらす。近年のアプローチでは、より包括的なシーン表現のために車両周囲の3次元占有率を推定している。しかし、密度の高い3D占有率予測は計算要求を増大させ、効率と解像度のバランスに挑戦する。高解像度の占有グリッドは精度を提供するが、かなりの計算資源を必要とするが、低解像度のグリッドは効率的だが詳細は欠落している。このジレンマに対処するために,新しい適応分解能・マルチモーダル予測手法であるAdaOccを導入する。提案手法は,対象中心の3次元再構成と全体的占有予測を一つの枠組みに統合し,興味のある領域(ROI)でのみ高度に詳細かつ正確な3次元再構成を行う。これらの高精細な3次元曲面は点雲で表されるので、それらの精度は占有マップの事前定義された格子分解によって制約されない。我々はnuScenesデータセットの総合的な実験を行い、既存の手法よりも大幅に改善されたことを示す。近距離シナリオでは、以前のベースラインを13%以上、ハウスドルフ距離を40%以上上回る。まとめると、AdaOccは多様な運転シナリオで正確な3Dセマンティック占有率予測を提供するための、より汎用的で効果的なフレームワークを提供する。

Autonomous driving in complex urban scenarios requires 3D perception to be both comprehensive and precise. Traditional 3D perception methods focus on object detection, resulting in sparse representations that lack environmental detail. Recent approaches estimate 3D occupancy around vehicles for a more comprehensive scene representation. However, dense 3D occupancy prediction increases computational demands, challenging the balance between efficiency and resolution. High-resolution occupancy grids offer accuracy but demand substantial computational resources, while low-resolution grids are efficient but lack detail. To address this dilemma, we introduce AdaOcc, a novel adaptive-resolution, multi-modal prediction approach. Our method integrates object-centric 3D reconstruction and holistic occupancy prediction within a single framework, performing highly detailed and precise 3D reconstruction only in regions of interest (ROIs). These highly detailed 3D surfaces are represented as point clouds, so their precision is not constrained by the predefined grid resolution of the occupancy map. We conducted comprehensive experiments on the nuScenes dataset, demonstrating significant improvements over existing methods. In close-range scenarios, we surpass previous baselines by over 13% in IoU and over 40% in Hausdorff distance. In summary, AdaOcc offers a more versatile and effective framework for delivering accurate 3D semantic occupancy prediction across diverse driving scenarios.
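
The two metrics named in this abstract can be sketched in a few lines. The version below is a brute-force illustration on toy data: voxels are represented as sets of integer indices and surfaces as small point lists. The function names are assumptions, and the paper's evaluation protocol (class handling, distance ranges, point sampling) is not reproduced.

```python
def voxel_iou(predicted, ground_truth):
    """Intersection over union of two sets of occupied voxel indices."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    union = predicted | ground_truth
    if not union:
        return 1.0  # both grids empty: treat as perfect agreement
    return len(predicted & ground_truth) / len(union)

def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets (brute force)."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    def directed(src, dst):
        # Largest distance from a point in src to its nearest neighbor in dst.
        return max(min(dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))

pred = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
gt = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
print(voxel_iou(pred, gt), hausdorff(pred, gt))
```

IoU is tied to the grid resolution, whereas the Hausdorff distance is computed on points directly, which is why a point-cloud surface representation can improve the latter beyond what the occupancy grid allows.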

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# すべてのペニー数を作る: 費用効率の良い推論のための難易度適応型の自己整合性

Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning ( http://arxiv.org/abs/2408.13457v1 )

ライセンス: Link先を確認

Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li,

(参考訳) 連鎖推論に広く用いられている自己整合性(SC: Self-Consistency)は、様々な多段階推論タスクにおいて顕著な利得を示すが、プリセットサイズで複数のサンプリングを行うため、高いコストがかかる。適応自己整合性 (ASC) とアーリーストッピング自己整合性 (ESC) の変種は、一連のプリサンプルの後方分布に基づいて標本数を動的に調整し、性能への影響を最小限に抑えてSCのコストを下げる。しかし、どちらの手法も質問の難しさに関する事前の情報を利用していない。多くの場合、不必要な繰り返しサンプリングが行われ、簡単な質問が1回の試行で正確に答えられるようになり、リソースを無駄にします。この問題に対処するために,前と後の両方の観点からの難易度情報を活用して推論資源を適応的に割り当てることにより,SCのコストをさらに削減するDifficulty-Adaptive Self-Consistency (DSC)を提案する。 DSCの有効性を示すために、6つのベンチマークで算術、常識、記号的推論という3つの一般的な推論タスクのカテゴリについて広範な実験を行った。実験の結果,DSCは高いベースラインのASCとESCをほぼ上回り,性能は同等であった。

Self-consistency (SC), a widely used decoding strategy for chain-of-thought reasoning, shows significant gains across various multi-step reasoning tasks but comes with a high cost due to multiple sampling with a preset size. Its variants, Adaptive self-consistency (ASC) and Early-stopping self-consistency (ESC), dynamically adjust the number of samples based on the posterior distribution of a set of pre-samples, reducing the cost of SC with minimal impact on performance. Neither method, however, exploits prior information about question difficulty. This often results in unnecessary repeated sampling for easy questions that could be accurately answered with just one attempt, wasting resources. To tackle this problem, we propose Difficulty-Adaptive Self-Consistency (DSC), which leverages difficulty information from both prior and posterior perspectives to adaptively allocate inference resources, further reducing the cost of SC. To demonstrate the effectiveness of DSC, we conduct extensive experiments on three popular categories of reasoning tasks (arithmetic, commonsense, and symbolic reasoning) across six benchmarks. The empirical results show that DSC consistently surpasses the strong baselines ASC and ESC in terms of cost by a significant margin, while attaining comparable performance.
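
The baseline this abstract builds on, self-consistency with an early-stopping rule, can be sketched as follows. The stopping rule here (stop once a whole window of samples agrees) is a simplified stand-in loosely modeled on ESC. It is not the DSC method, and the function name, `window`, and `max_samples` are assumptions made for illustration.

```python
import itertools
from collections import Counter

def self_consistency(sample_answer, max_samples=40, window=5):
    """Majority vote over sampled answers with a simple early-stopping rule.

    `sample_answer` is a zero-argument callable that returns one sampled
    final answer (for example, the result of one chain-of-thought decode).
    """
    answers = []
    while len(answers) < max_samples:
        size = min(window, max_samples - len(answers))
        batch = [sample_answer() for _ in range(size)]
        answers.extend(batch)
        if len(set(batch)) == 1:  # the whole window agrees: likely an easy question
            break
    winner, _ = Counter(answers).most_common(1)[0]
    return winner, len(answers)

# An "easy" question: every sample gives the same answer, so sampling stops early.
easy = iter(["42"] * 5)
print(self_consistency(lambda: next(easy)))

# A "hard" question: samples disagree, so the full budget is spent before voting.
hard = itertools.cycle(["1", "2", "1", "3", "1"])
print(self_consistency(lambda: next(hard)))
```

Even with early stopping, the easy question still costs a full window of samples. Using prior difficulty information to avoid that spend is the gap the abstract says DSC targets.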

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# ウェーブレット対応動的変圧器と拡散モデルによる映像劣化の再考

Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model ( http://arxiv.org/abs/2408.13459v1 )

ライセンス: Link先を確認

Chen Rao, Guangyuan Li, Zehua Lan, Jiakai Sun, Junsheng Luan, Wei Xing, Lei Zhao, Huaizhong Lin, Jianfeng Dong, Dalong Zhang,

(参考訳) 現在のビデオデブロアリング法は、遅延損失が高周波の詳細で保守的であるため、高周波情報の回復に限界がある。拡散モデル(DM)は高頻度細部生成に強力な機能を持つため,ビデオデブロアリングタスクにDMを導入することを検討する。 1) DMはガウスノイズからビデオを生成するために多くの繰り返しステップを必要とするため、多くの計算資源を消費する。 2) DMはビデオのぼやけたアーティファクトによって容易に誤解され,不合理な内容とデブロワードビデオの歪みが生じる。本稿では,この拡散モデルをWADT(Wavelet-Aware Dynamic Transformer)に統合した新しいビデオデブロアリングフレームワークであるVD-Diffを提案する。具体的には、高コンパクトな潜伏空間において拡散モデルを実行し、基底真理分布に適合する高周波情報を含む先行特徴を生成する。拡散モデルにより生成された高周波情報を利用して、映像中の低周波情報を保存・復元するWADTを設計する。我々の提案するVD-Diffは,GoPro,DVD,BSD,Real-World Videoのデータセット上でSOTA法よりも優れていた。

Current video deblurring methods have limitations in recovering high-frequency information since the regression losses are conservative with high-frequency details. Since Diffusion Models (DMs) have strong capabilities in generating high-frequency details, we consider introducing DMs into the video deblurring task. However, we found that directly applying DMs to the video deblurring task has the following problems: (1) DMs require many iteration steps to generate videos from Gaussian noise, which consumes many computational resources. (2) DMs are easily misled by the blurry artifacts in the video, resulting in irrational content and distortion of the deblurred video. To address the above issues, we propose a novel video deblurring framework VD-Diff that integrates the diffusion model into the Wavelet-Aware Dynamic Transformer (WADT). Specifically, we perform the diffusion model in a highly compact latent space to generate prior features containing high-frequency information that conforms to the ground truth distribution. We design the WADT to preserve and recover the low-frequency information in the video while utilizing the high-frequency information generated by the diffusion model. Extensive experiments show that our proposed VD-Diff outperforms SOTA methods on GoPro, DVD, BSD, and Real-World Video datasets.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# DOPPLER:プライバシノイズ低減のための低域フィルタ付き微分プライベートオプティマイザ

DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction ( http://arxiv.org/abs/2408.13460v1 )

ライセンス: Link先を確認

Xinwei Zhang, Zhiqi Bu, Mingyi Hong, Meisam Razaviyayn,

(参考訳) プライバシーは、現代のディープラーニングシステムやアプリケーションにおける関心の高まりだ。差分プライベート(DP)トレーニングは、トレーニングされた機械学習モデルから収集したトレーニングデータの機密情報の漏洩を防止する。 DP確率勾配降下(DPSGD)とその変種を含むDPオプティマイザは、勾配クリッピングとDPノイズ注入によるトレーニング手順を民営化する。しかし、実際には、DPSGDとその変種を用いて訓練されたDPモデルは、しばしばモデルの性能劣化に悩まされる。このような劣化は、基礎モデル事前学習など、多くの重要なタスクにおけるDP最適化の適用を妨げる。本稿では,DPオプティマイザの設計と解析に新しい信号処理の視点を提供する。 DPノイズの影響を効果的に低減するために,低域フィルタリング(low-pass filtering)と呼ばれる「周波数領域」操作が有効であることを示す。より具体的には、勾配と差分プライバシー(DP)ノイズの「周波数領域」を定義することで、DOPPLERと呼ばれる新しいコンポーネントを開発した。このコンポーネントはDPアルゴリズム用に設計されており、この周波数領域内のDPノイズを抑えながら勾配を効果的に増幅する。その結果、プライバシー保証を維持し、DP保護モデルの品質を高める。実験の結果,低域通過フィルタを用いたDPオプティマイザは,各種モデルやデータセットの試験精度を3%-10%向上させることができた。 DOPPLERはDPトレーニングと非DPトレーニングのギャップを埋めるのに有効である。

Privacy is a growing concern in modern deep-learning systems and applications. Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from the trained machine learning models. DP optimizers, including DP stochastic gradient descent (DPSGD) and its variants, privatize the training procedure by gradient clipping and DP noise injection. However, in practice, DP models trained using DPSGD and its variants often suffer from significant model performance degradation. Such degradation prevents the application of DP optimization in many key tasks, such as foundation model pretraining. In this paper, we provide a novel signal processing perspective to the design and analysis of DP optimizers. We show that a "frequency domain" operation called low-pass filtering can be used to effectively reduce the impact of DP noise. More specifically, by defining the "frequency domain" for both the gradient and differential privacy (DP) noise, we have developed a new component, called DOPPLER. This component is designed for DP algorithms and works by effectively amplifying the gradient while suppressing DP noise within this frequency domain. As a result, it maintains privacy guarantees and enhances the quality of the DP-protected model. Our experiments show that the proposed DP optimizers with a low-pass filter outperform their counterparts without the filter by 3%-10% in test accuracy on various models and datasets. Both theoretical and practical evidence suggest that the DOPPLER is effective in closing the gap between DP and non-DP training.
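To make the low-pass filtering idea concrete, here is a small sketch under stated assumptions. It uses a first-order filter (an exponential moving average) on clipped, noised gradients; the abstract does not specify DOPPLER's actual filter, so `low_pass`, `alpha`, and the toy constant gradient are illustrative choices, not the authors' design.

```python
import random


def clip_to_norm(grad, max_norm):
    """Scale the gradient down so its L2 norm is at most max_norm."""
    norm = sum(g * g for g in grad) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize(grad, max_norm, noise_std, rng):
    """DPSGD-style step: clip, then add independent Gaussian noise."""
    return [g + rng.gauss(0.0, noise_std) for g in clip_to_norm(grad, max_norm)]


def low_pass(state, noisy_grad, alpha):
    """First-order low-pass filter (assumed here, not taken from the paper)."""
    return [(1 - alpha) * s + alpha * x for s, x in zip(state, noisy_grad)]


def mean_squared_errors(steps=2000, burn_in=100, alpha=0.1, noise_std=1.0, seed=0):
    """Compare raw and filtered noisy gradients against a constant true gradient."""
    rng = random.Random(seed)
    true_grad = [1.0, 0.0]
    state = [0.0, 0.0]
    raw_err = filt_err = 0.0
    for step in range(steps):
        noisy = privatize(true_grad, 1.0, noise_std, rng)
        state = low_pass(state, noisy, alpha)
        if step >= burn_in:  # skip the filter's warm-up phase
            raw_err += sum((n - t) ** 2 for n, t in zip(noisy, true_grad))
            filt_err += sum((s - t) ** 2 for s, t in zip(state, true_grad))
    count = steps - burn_in
    return raw_err / count, filt_err / count


if __name__ == "__main__":
    raw, filtered = mean_squared_errors()
    print(f"raw MSE: {raw:.3f}, filtered MSE: {filtered:.3f}")
```

The filter acts only on already-privatized gradients, so by the post-processing property of differential privacy it does not weaken the guarantee. A constant true gradient is the most favorable case for any low-pass filter; with gradients that change quickly the filter also introduces lag.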

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# ビジョンランゲージ事前学習モデルのロバスト性を証明する:マルチモーダル・アタックアプローチ

Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach ( http://arxiv.org/abs/2408.13461v1 )

ライセンス: Link先を確認

Jiwei Guan, Tianyu Ding, Longbing Cao, Lei Pan, Chen Wang, Xi Zheng,

(参考訳) トランスフォーマーを用いた視覚言語事前学習(VLP)は、多数のマルチモーダルタスクにおいて例外的な性能を示した。しかし、これらのモデルの対角的堅牢性は十分には研究されていない。既存のマルチモーダルアタック手法は、視覚的・テキスト的モダリティ間の相互モーダル相互作用、特に横断的アテンション機構の文脈において、ほとんど見過ごされている。本稿では,最近のVLPトランスの対角的脆弱性について検討し,ホワイトボックス設定下での視覚的・テキスト的両モードの対角的摂動を同時に導入する新しいJMTFA(Joint Multimodal Transformer Feature Attack)を設計する。 JMTFAは、注意関係スコアを戦略的に対象とし、各モードにおける重要な特徴を妨害し、摂動を融合させて対向サンプルを生成し、誤ったモデル予測をもたらす。実験結果から,提案手法は既存のベースラインと比較して,視覚言語理解や下流タスクの推論において高い攻撃成功率を達成することが示唆された。特に,本研究の結果から,VLP変圧器の複雑な融合過程にテクスチュアル・モダリティが大きな影響を及ぼすことが明らかとなった。また,本研究では,攻撃時のモデルサイズと敵の強靭性との間には明らかな相関が認められなかった。これらの洞察は、マルチモーダルAIシステムの信頼性デプロイメントにおいて、敵の堅牢性と潜在的な潜在的なリスクの新たな次元を強調している。

Vision-language pretraining (VLP) with transformers has demonstrated exceptional performance across numerous multimodal tasks. However, the adversarial robustness of these models has not been thoroughly investigated. Existing multimodal attack methods have largely overlooked cross-modal interactions between visual and textual modalities, particularly in the context of cross-attention mechanisms. In this paper, we study the adversarial vulnerability of recent VLP transformers and design a novel Joint Multimodal Transformer Feature Attack (JMTFA) that concurrently introduces adversarial perturbations in both visual and textual modalities under white-box settings. JMTFA strategically targets attention relevance scores to disrupt important features within each modality, generating adversarial samples by fusing perturbations and leading to erroneous model predictions. Experimental results indicate that the proposed approach achieves high attack success rates on vision-language understanding and reasoning downstream tasks compared to existing baselines. Notably, our findings reveal that the textual modality significantly influences the complex fusion processes within VLP transformers. Moreover, we observe no apparent relationship between model size and adversarial robustness under our proposed attacks. These insights emphasize a new dimension of adversarial robustness and underscore potential risks in the reliable deployment of multimodal AI systems.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# HabitAction:人間の行動認識のためのビデオデータセット

HabitAction: A Video Dataset for Human Habitual Behavior Recognition ( http://arxiv.org/abs/2408.13463v1 )

ライセンス: Link先を確認

Hongwu Li, Zhenliang Zhang, Wei Wang,

(参考訳) HAR(Human Action Recognition)は、コンピュータビジョンにおいて非常に重要なタスクである。人間の行動を理解するなど、一連の下流のタスクを実行するのに役立つ。ヒトの行動の複雑さのため、多くの非常に価値のある行動は、HAR、例えばヒトの習慣行動(HHBs)の利用可能なデータセットにはまだ含まれていない。 HHBは、人の性格、習慣、心理的変化を分析する上で重要な役割を担っている。これらの課題を解決するため,本研究では,様々なHHBを実演するための新しいビデオデータセットを構築した。提案したデータセットのこれらの行動は、内部の精神状態やキャラクターの特定の感情を反映することができる。データセットには、300,000フレーム以上と6,899のアクションインスタンスを含む、30の習慣行動カテゴリが含まれている。これらの動作は通常、人間のアクションビデオの小さな部分に現れるため、既存のアクション認識手法ではこれらの局所的な特徴を扱うことは困難である。そこで本研究では,ヒト骨格とRGB外観の両方を用いた2ストリームモデルを提案する。実験の結果,提案手法は既存手法よりも動作認識性能が優れていることがわかった。

Human Action Recognition (HAR) is a crucial task in computer vision. It supports a series of downstream tasks, such as understanding human behaviors. Due to the complexity of human behaviors, many highly valuable behaviors are not yet covered by the available HAR datasets, e.g., human habitual behaviors (HHBs). HHBs are important for analyzing a person's personality, habits, and psychological changes. To address this gap, in this work, we build a novel video dataset to demonstrate various HHBs. The behaviors in the proposed dataset can reflect internal mental states and specific emotions of the characters, e.g., crossing one's arms suggests shielding oneself from perceived threats. The dataset contains 30 categories of habitual behaviors including more than 300,000 frames and 6,899 action instances. Since these behaviors usually appear in small local parts of human action videos, it is difficult for existing action recognition methods to handle these local features. Therefore, we also propose a two-stream model using both human skeletons and RGB appearances. Experimental results demonstrate that our proposed method has much better performance in action recognition than the existing methods on the proposed dataset.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# 反射型大言語モデルによるバイアスの発見

Uncovering Biases with Reflective Large Language Models ( http://arxiv.org/abs/2408.13464v1 )

ライセンス: Link先を確認

Edward Y. Chang,

(参考訳) 人間の努力に固有のバイアスは、機械学習、特にバイアスのある「地下真実」データに依存する教師あり学習に重大な課題をもたらす。この依存は、統計的な最大可能性に基づいて一般化するモデルの傾向と相まって、バイアスを伝播し、増幅し、社会的問題を悪化させる。そこで本研究では,複数の言語モデル(LLM)を動的対話に用い,多様な視点を明らかにするための反射的手法を提案する。条件付き統計、情報理論、発散メトリクスを活用することで、この新しいアプローチは文脈に依存した言語行動を促進し、バイアスのないアウトプットを促進する。さらに、特定バイアスに対処するために、測定可能な進捗追跡と説明可能な修復アクションを可能にする。

Biases inherent in human endeavors pose significant challenges for machine learning, particularly in supervised learning that relies on potentially biased "ground truth" data. This reliance, coupled with models' tendency to generalize based on statistical maximal likelihood, can propagate and amplify biases, exacerbating societal issues. To address this, our study proposes a reflective methodology utilizing multiple Large Language Models (LLMs) engaged in a dynamic dialogue to uncover diverse perspectives. By leveraging conditional statistics, information theory, and divergence metrics, this novel approach fosters context-dependent linguistic behaviors, promoting unbiased outputs. Furthermore, it enables measurable progress tracking and explainable remediation actions to address identified biases.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# LlamaDuo: サービスLLMから小規模ローカルLLMへのシームレス移行のためのLLMOpsパイプライン

LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ( http://arxiv.org/abs/2408.13467v1 )

ライセンス: Link先を確認

Chansung Park, Juyong Jiang, Fan Wang, Sayak Paul, Jing Tang, Sunghun Kim,

(参考訳) クラウドベースのプロプライエタリな大規模言語モデル(LLM)の普及は、運用上の依存関係、プライバシの懸念、継続的なインターネット接続の必要性など、大きな課題をもたらしている。本研究では,LLMOpsパイプライン"LlamaDuo"を導入し,サービス指向のLLMから,より小型でローカルに管理可能なモデルへの,知識と能力のシームレスな移行を実現する。このパイプラインは、運用上の障害、厳格なプライバシポリシ、あるいはオフライン要件の存在下でのサービス継続性を保証するために不可欠である。私たちのLlamaDuoは、後者によって生成された合成データセットを使用して、サービスLLMに対して小さな言語モデルを微調整します。微調整されたモデルの性能が期待に届かなかった場合、サービスLLMが作成した類似したデータを追加してさらに微調整を行うことで、性能が向上する。この反復的なプロセスは、小さなモデルが最終的に特定の下流タスクでサービスLLMの能力と一致または超えることを保証するもので、制約のある環境でAIデプロイメントを管理するための実用的でスケーラブルなソリューションを提供する。各種下流タスクにおけるLlamaDuoの有効性,適応性,手頃性を示すために,先進LLMを用いた大規模実験を行った。パイプラインの実装は https://github.com/deep-diver/llamaduo で公開しています。

The widespread adoption of cloud-based proprietary large language models (LLMs) has introduced significant challenges, including operational dependencies, privacy concerns, and the necessity of continuous internet connectivity. In this work, we introduce an LLMOps pipeline, "LlamaDuo", for the seamless migration of knowledge and abilities from service-oriented LLMs to smaller, locally manageable models. This pipeline is crucial for ensuring service continuity in the presence of operational failures, strict privacy policies, or offline requirements. Our LlamaDuo involves fine-tuning a small language model against the service LLM using a synthetic dataset generated by the latter. If the performance of the fine-tuned model falls short of expectations, it is enhanced by further fine-tuning with additional similar data created by the service LLM. This iterative process guarantees that the smaller model can eventually match or even surpass the service LLM's capabilities in specific downstream tasks, offering a practical and scalable solution for managing AI deployments in constrained environments. Extensive experiments with leading edge LLMs are conducted to demonstrate the effectiveness, adaptability, and affordability of LlamaDuo across various downstream tasks. Our pipeline implementation is available at https://github.com/deep-diver/llamaduo.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# アンタングル生成グラフ表現学習

Disentangled Generative Graph Representation Learning ( http://arxiv.org/abs/2408.13471v1 )

ライセンス: Link先を確認

Xinyue Hu, Zhibin Duan, Xinyang Liu, Yuxin Li, Bo Chen, Mingyuan Zhou,

(参考訳) 近年,自己教師付き手法によるグラフ表現の学習において,生成グラフモデルが有望な結果を示している。しかし、既存の生成グラフ表現学習(GRL)のアプローチのほとんどは、学習された表現の絡み合いを無視するランダムマスキングに依存している。この監視は、非破壊性と説明可能性の欠如をもたらす。さらに、学習した表現のアンタングル化は依然として重要な課題であり、GRL研究では十分に研究されていない。これらの知見に基づいて,自己教師型学習フレームワークであるDiGGR(Disentangled Generative Graph Representation Learning)を紹介する。 DiGGRは、潜伏不整合因子を学習し、グラフマスクモデリングをガイドし、学習された表現の非整合性を高め、エンドツーエンドのジョイントラーニングを可能にすることを目的としている。 2つの異なるグラフ学習タスクのための11の公開データセットに対する大規模な実験により、DiGGRは、提案手法の有効性を検証し、従来よりも一貫して多くの自己教師付き手法より優れていることが示された。

Recently, generative graph models have shown promising results in learning graph representations through self-supervised methods. However, most existing generative graph representation learning (GRL) approaches rely on random masking across the entire graph, which overlooks the entanglement of learned representations. This oversight results in non-robustness and a lack of explainability. Furthermore, disentangling the learned representations remains a significant challenge and has not been sufficiently explored in GRL research. Based on these insights, this paper introduces DiGGR (Disentangled Generative Graph Representation Learning), a self-supervised learning framework. DiGGR aims to learn latent disentangled factors and utilizes them to guide graph mask modeling, thereby enhancing the disentanglement of learned representations and enabling end-to-end joint learning. Extensive experiments on 11 public datasets for two different graph learning tasks demonstrate that DiGGR consistently outperforms many previous self-supervised methods, verifying the effectiveness of the proposed approach.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# 対称局所ランダム回路のユニタリ設計

Unitary Designs of Symmetric Local Random Circuits ( http://arxiv.org/abs/2408.13472v1 )

ライセンス: Link先を確認

Yosuke Mitsuhashi, Ryotaro Suzuki, Tomohiro Soejima, Nobuyuki Yoshioka,

(参考訳) 我々は、対称局所乱数回路によって生成されるユニタリ設計を特徴付ける方法を確立した。具体的には、近似t-設計を形成する回路に必要な十分条件が、一般対称性と局所性に対する単純な整数最適化によって与えられることを示した。この結果を用いて、一般局所性に対する$\mathbb{Z}_2$, U(1), SU(2)対称性の下で、ユニタリ設計の極大順序を明示的に与える。この研究は、対称性の基本概念とランダム性の観点からの局所性の関係を明らかにする。

We have established the method of characterizing the unitary design generated by a symmetric local random circuit. Concretely, we have shown that the necessary and sufficient condition for the circuit forming an approximate t-design is given by simple integer optimization for general symmetry and locality. By using the result, we explicitly give the maximal order of unitary design under the $\mathbb{Z}_2$, U(1), and SU(2) symmetries for general locality. This work reveals the relation between the fundamental notions of symmetry and locality in terms of randomness.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# アンチワーク:RoBERTaをベースとした作業関連ストレス同定とリード要因分析システム

Why Antiwork: A RoBERTa-Based System for Work-Related Stress Identification and Leading Factor Analysis ( http://arxiv.org/abs/2408.13473v1 )

ライセンス: Link先を確認

Tao Lu, Muzhe Wu, Xinyi Lu, Siyuan Xu, Shuyu Zhan, Anuj Tambwekar, Emily Mower Provost,

(参考訳) ハーシュ労働環境と労働関連ストレスは、不安、抑うつ、自殺の考えといった精神的な健康問題に寄与することが知られている。そのため、従業員の不幸を検知し、問題の根本原因を見つけることができるソリューションを作成することが最重要である。従来の研究は機械学習を用いてメンタルヘルスの原因を調べてきたが、一般的には一般的なメンタルヘルス分析に焦点を合わせており、説明可能なソリューションや職場固有の設定に焦点を絞っているものはほとんどない。 r/antiworkは、反作業運動のサブレディットです。このサブレディットを職場環境の不満のプロキシとして利用し、アンチワーク感情検出のための新しいデータセットを作成し、その後、アンチワーク感情で単語をハイライトするモデルを訓練する。その後、我々は定性的かつ定量的な分析を行い、反作業運動と同一視する個人のマインドセットに対する重要な洞察と、彼らの作業環境がそれらにどのように影響するかを明らかにした。我々は、従業員の権限や責任を与えない作業環境、求職経験のフラストレーション、不公平な報酬が、反作業感情の主因となっていることを発見し、その結果、従業員の自己自信やモチベーションが欠如している。

Harsh working environments and work-related stress have been known to contribute to mental health problems such as anxiety, depression, and suicidal ideation. As such, it is paramount to create solutions that can both detect employee unhappiness and find the root cause of the problem. While prior works have examined causes of mental health using machine learning, they typically focus on general mental health analysis, with few of them focusing on explainable solutions or looking at the workplace-specific setting. r/antiwork is a subreddit for the antiwork movement, which is the desire to stop working altogether. Using this subreddit as a proxy for work environment dissatisfaction, we create a new dataset for antiwork sentiment detection and subsequently train a model that highlights the words with antiwork sentiments. Following this, we performed a qualitative and quantitative analysis to uncover some of the key insights into the mindset of individuals who identify with the antiwork movement and how their working environments influenced them. We find that working environments that do not give employees authority or responsibility, frustrating recruiting experiences, and unfair compensation, are some of the leading causes of the antiwork sentiment, resulting in a lack of self-confidence and motivation among their employees.

翻訳日:2024-08-27 19:29:34 公開日:2024-08-24

# 連続ゲート集合の量子回路におけるランダム性の評価

Characterization of Randomness in Quantum Circuits of Continuous Gate Sets ( http://arxiv.org/abs/2408.13475v1 )

ライセンス: Link先を確認

Yosuke Mitsuhashi, Ryotaro Suzuki, Tomohiro Soejima, Nobuyuki Yoshioka,

(参考訳) arXiv:2408.XXXXX の付録では、対称局所乱数回路によって生成される近似ユニタリな設計の極大順序を特徴付ける方法を確立し、$\mathbb{Z}_2$, U(1), SU(2)対称性の場合にその順序を明示的に指定した。ここでは、一般対称性と具体的な対称性に対する主定理の導出についての詳細を述べる。さらに、対称局所ユニタリゲート集合を含む連結コンパクトユニタリ部分群の有限集合にアクセス可能な一般フレームワークを考える。

In the accompanying paper of arXiv:2408.XXXXX, we have established the method of characterizing the maximal order of approximate unitary designs generated by symmetric local random circuits, and have explicitly specified the order in the cases of $\mathbb{Z}_2$, U(1), and SU(2) symmetries. Here, we provide full details on the derivation of the main theorems for general symmetry and for concrete symmetries. Furthermore, we consider a general framework where we have access to a finite set of connected compact unitary subgroups, which includes symmetric local unitary gate sets.

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# 量子機械による創薬支援 : 調査と展望

Quantum-machine-assisted Drug Discovery: Survey and Perspective ( http://arxiv.org/abs/2408.13479v1 )

ライセンス: Link先を確認

Yidong Zhou, Jintai Chen, Weikang Li, Jinglei Cheng, Gopal Karemore, Marinka Zitnik, Frederic Chong, Junyu Liu, Tianfan Fu, Zhiding Liang,

(参考訳) 医薬品の発見と開発は複雑でコストのかかる取り組みであり、新しい薬を市場に出すには10年以上の資金と相当な資金を必要としている。従来のコンピュータ支援ドラッグデザイン(CADD)は、このプロセスの加速に大きな進歩を遂げてきたが、量子コンピューティングの開発は、そのユニークな能力のために潜在的に有益である。本稿では、量子コンピューティングの創薬・開発への統合について論じ、量子技術が医薬品開発サイクルの様々な段階をいかに加速し、促進するかに焦点を当てる。具体的には,分子シミュレーションや薬物-標的相互作用の予測,臨床試験結果の最適化など,薬物発見に関わる課題への量子コンピューティングの適用について検討する。量子コンピューティングの本質的な能力を活用することで、新しい薬を市場に投入する際の時間とコストを削減できるかもしれません。

Drug discovery and development is a highly complex and costly endeavor, typically requiring over a decade and substantial financial investment to bring a new drug to market. Traditional computer-aided drug design (CADD) has made significant progress in accelerating this process, but the development of quantum computing offers potential due to its unique capabilities. This paper discusses the integration of quantum computing into drug discovery and development, focusing on how quantum technologies might accelerate and enhance various stages of the drug development cycle. Specifically, we explore the application of quantum computing in addressing challenges related to drug discovery, such as molecular simulation and the prediction of drug-target interactions, as well as the optimization of clinical trial outcomes. By leveraging the inherent capabilities of quantum computing, we might be able to reduce the time and cost associated with bringing new drugs to market, ultimately benefiting public health.

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# MPruner: CKAに基づく相互情報処理によるニューラルネットワークサイズ最適化

MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning ( http://arxiv.org/abs/2408.13482v1 )

ライセンス: Link先を確認

Seungbeom Hu, ChanJun Park, Andrew Ferraiuolo, Sang-Ki Ko, Jinwoo Kim, Haein Song, Jieung Kim,

(参考訳) 実行時のパフォーマンスとメモリ使用量に直接影響するため、ニューラルネットワークの最適なサイズを決定することが重要だ。プルーニング(Pruning)は、ニューラルネットワークのサイズを削減し、精度の保存を数学的に保証する、よく確立されたモデル圧縮技術である。しかし、最近のプルーニングメソッドの多くは、個々のモデルコンポーネントのグローバルなコントリビューションを見落としているため、プルーニングされたモデルが望ましいデータセットとパフォーマンス要件を満たすことを保証するのは難しい。これらの課題に対処するため,ベクトル類似性により相互情報を活用する新しいプルーニングアルゴリズムMPrunerを開発した。 MPrunerはCKA(Centered Kernel Alignment)の類似度測定でレイヤクラスタリングを活用し、ニューラルネットワークのグローバル情報をより正確で効率的なレイヤワイドプルーニングに組み込むことができる。我々はMPrunerを様々なアーキテクチャや構成で評価し、その汎用性を実証し、実践的なガイドラインを提供した。 MPrunerはCNNとトランスフォーマーベースのモデルで最大50%のパラメータとメモリ使用量の削減を実現した。

Determining the optimal size of a neural network is critical, as it directly impacts runtime performance and memory usage. Pruning is a well-established model compression technique that reduces the size of neural networks while mathematically guaranteeing accuracy preservation. However, many recent pruning methods overlook the global contributions of individual model components, making it difficult to ensure that a pruned model meets the desired dataset and performance requirements. To address these challenges, we developed a new pruning algorithm, MPruner, that leverages mutual information through vector similarity. MPruner utilizes layer clustering with the Centered Kernel Alignment (CKA) similarity metric, allowing us to incorporate global information from the neural network for more precise and efficient layer-wise pruning. We evaluated MPruner across various architectures and configurations, demonstrating its versatility and providing practical guidelines. MPruner achieved up to a 50% reduction in parameters and memory usage for CNN and transformer-based models, with minimal to no loss in accuracy.
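The CKA similarity metric mentioned above can be sketched in a few lines. The following computes linear CKA between two activation matrices (rows are examples, columns are features) in plain Python. It shows the metric only; how MPruner clusters and prunes layers with it is not described in the abstract, and the function names are illustrative assumptions.

```python
def center(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - mu for v, mu in zip(row, means)] for row in matrix]


def cross_sq_frobenius(a, b):
    """Squared Frobenius norm of a^T b, summed over all feature pairs."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data.

    Assumes neither input is constant across examples (non-zero denominator).
    """
    xc, yc = center(x), center(y)
    numerator = cross_sq_frobenius(xc, yc)
    denominator = (cross_sq_frobenius(xc, xc) * cross_sq_frobenius(yc, yc)) ** 0.5
    return numerator / denominator


if __name__ == "__main__":
    layer_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [2.0, 2.0]]
    layer_b = [[2 * v for v in row] for row in layer_a]  # rescaled copy
    print(linear_cka(layer_a, layer_b))
```

Linear CKA is invariant to isotropic scaling, so a layer and a rescaled copy of it score as near-identical. That property is what makes such a score a plausible signal for spotting redundant layers.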

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# IntOPE: 干渉の有無におけるオフ・ポリティ・アセスメント

IntOPE: Off-Policy Evaluation in the Presence of Interference ( http://arxiv.org/abs/2408.13484v1 )

ライセンス: Link先を確認

Yuqi Bai, Ziyu Zhao, Minqin Zhu, Kun Kuang,

(参考訳) オフ・ポリシィ・アセスメント(OPE: Off-Policy Evaluation)は、個人化された医療やレコメンデーションシステムなど、オンラインインタラクションが重大なリスクやコストに結びついている分野において重要な、ログ化された文脈的包括的フィードバックを用いて、仮説的ポリシーの潜在的影響を評価するために用いられる。伝統的に、OPEの手法は安定単位処理値推定 (SUTVA) に依存しており、これは任意の個人に対する報酬が他人の行動に影響されないと仮定している。しかし、この仮定は、個人が自分の行動だけでなく、仲間の行動にも影響される、干渉の存在によって現実のシナリオで失敗することが多い。この実現は、現実世界のアプリケーションにおける既存のOPEメソッドの重大な制限を明らかにしている。この制限に対処するため,IPW(Inverse Probability Weighting, 逆確率重み付け)フレームワークを拡張したIPW型推定器であるIntIPWを提案する。 IntIPW法の有効性を実証するために, 合成データと実世界のデータの両方を用いて大規模な実験を行った。

Off-Policy Evaluation (OPE) is employed to assess the potential impact of a hypothetical policy using logged contextual bandit feedback, which is crucial in areas such as personalized medicine and recommender systems, where online interactions are associated with significant risks and costs. Traditionally, OPE methods rely on the Stable Unit Treatment Value Assumption (SUTVA), which assumes that the reward for any given individual is unaffected by the actions of others. However, this assumption often fails in real-world scenarios due to the presence of interference, where an individual's reward is affected not just by their own actions but also by the actions of their peers. This realization reveals significant limitations of existing OPE methods in real-world applications. To address this limitation, we propose IntIPW, an IPW-style estimator that extends the Inverse Probability Weighting (IPW) framework by integrating marginalized importance weights to account for both individual actions and the influence of adjacent entities. Extensive experiments are conducted on both synthetic and real-world data to demonstrate the effectiveness of the proposed IntIPW method.
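For orientation, the standard IPW estimator that the abstract builds on can be written down directly. The sketch below is the textbook estimator under SUTVA, not the proposed IntIPW; the log format `(context, action, reward, logging_prob)` and the function names are assumptions made for illustration.

```python
def ipw_value(logs, target_policy):
    """Standard inverse probability weighting estimate of a policy's value.

    logs: iterable of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy gave the action.
    target_policy: function (context, action) -> probability under the
          policy being evaluated.
    """
    logs = list(logs)
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


if __name__ == "__main__":
    # Toy log: a uniform logging policy over two actions; action 1 pays 1.0.
    logs = [
        ("u1", 1, 1.0, 0.5),
        ("u2", 0, 0.0, 0.5),
        ("u3", 1, 1.0, 0.5),
        ("u4", 0, 0.0, 0.5),
    ]

    def always_one(context, action):
        return 1.0 if action == 1 else 0.0

    print(ipw_value(logs, always_one))
```

Each weight here depends only on the unit's own action. Under interference, a unit's reward also depends on its neighbours' actions, which is the gap the abstract says IntIPW addresses with marginalized importance weights.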

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# ターゲットの任意ライブラリーの分類における量子イルミネーションの利点

Quantum Illumination Advantage for Classification Among an Arbitrary Library of Targets ( http://arxiv.org/abs/2408.13489v1 )

ライセンス: Link先を確認

Ali Cox, Quntao Zhuang, Jeffrey H. Shapiro, Saikat Guha,

(参考訳) 量子照明(QI)は、理想記憶に保持された基準ビームと量子状態が絡み合っている送信機プローブを用いてシーンを照会するタスクであり、その後、記憶された基準と共に目標反転光を最適に検出し、同じ明るさおよびその他の同じ条件の古典的な送信機で達成可能なものを超える精度で、スタンオフ範囲の目標特性を決定する。摂動理論のツールを用いて, 透過率の低い輝度, 高損失, 高熱背景の限界において, 古典的コヒーレント状態照明(CI)に比べて, ガウス状態の絡み合ったQIプローブを用いた場合, 任意のアプリオリ反射目標の識別において, 誤り確率のチャーノフ指数が4倍に向上することを示した。この利点は標的の有無を検出することで知られていたが、任意の対象ライブラリを識別する一般的なタスクでは証明されなかった。その結果,QI と CI の量子チャーノフ指数の低次漸近展開に対する簡易な一般解析式を,受光器の入室後に空間モードソータによって分離された場合の信号輝度,損失,熱雑音,および目標反射光の放射光出射プロファイルの変調膨張係数を用いて導出した。

Quantum illumination (QI) is the task of querying a scene using a transmitter probe whose quantum state is entangled with a reference beam retained in ideal storage, followed by optimally detecting the target-returned light together with the stored reference, to make decisions on characteristics of targets at stand-off range, at precision that exceeds what is achievable with a classical transmitter of the same brightness and otherwise identical conditions. Using tools from perturbation theory, we show that in the limit of low transmitter brightness, high loss, and high thermal background, there is a factor of four improvement in the Chernoff exponent of the error probability in discriminating any number of a priori known reflective targets when using a Gaussian-state entangled QI probe, over using classical coherent-state illumination (CI). While this advantage was known for detecting the presence or absence of a target, it had not been proven for the generalized task of discriminating between arbitrary target libraries. In proving our result, we derive simple general analytic expressions for the lowest-order asymptotic expansions of the quantum Chernoff exponents for QI and CI in terms of the signal brightness, loss, thermal noise, and the modal expansion coefficients of the target-reflected light's radiant exitance profiles when separated by a spatial mode sorter after entering the entrance pupil of the receiver's aperture.

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# ESA:セマンティックセグメンテーションのためのアノテーション効率の良いアクティブラーニング

ESA: Annotation-Efficient Active Learning for Semantic Segmentation ( http://arxiv.org/abs/2408.13491v1 )

ライセンス: Link先を確認

Jinchao Ge, Zeyu Zhang, Minh Hieu Phan, Bowen Zhang, Akide Liu, Yang Zhao,

(参考訳) アクティブラーニングは、ラベル付けのための最も明快なサンプルを選択することで、アノテーションの効率を高める。セマンティックセグメンテーションのこれまでの方法は、個々のピクセルや小さな領域を中心に、自然画像の豊富なパターンや高度な事前学習モデルのパワーを無視してきた。まず、局所的な構造的手がかりを捉えるために、スーパーピクセルグループ化と組み合わせた、クラス非依存のマスク提案ネットワークを利用する革新的で効率的なアクティブラーニング戦略であるEntity-Superpixel Annotation (ESA)を導入する。さらに,対象領域の各画像内のエンティティのサブセットを選択し,エントロピーの高いスーパーピクセルを優先し,包括的表現を保証する。同時に、限られた数のキーエンティティに焦点を当て、効率を最適化する。本手法は,画像固有の構造を活かしたアノテータフレンドリな設計により,既存の画素ベースの手法よりも優れ,最小限のクエリで優れた結果を得ることができ,特にクリックコストを98%削減し,性能を1.71%向上させることができる。例えば、従来の手法で要求される5000クリックとは対照的に、アノテーションにはたった40クリックしか必要としない。

Active learning enhances annotation efficiency by selecting the most revealing samples for labeling, thereby reducing reliance on extensive human input. Previous methods in semantic segmentation have centered on individual pixels or small areas, neglecting the rich patterns in natural images and the power of advanced pre-trained models. To address these challenges, we propose three key contributions: Firstly, we introduce Entity-Superpixel Annotation (ESA), an innovative and efficient active learning strategy which utilizes a class-agnostic mask proposal network coupled with super-pixel grouping to capture local structural cues. Additionally, our method selects a subset of entities within each image of the target domain, prioritizing superpixels with high entropy to ensure comprehensive representation. Simultaneously, it focuses on a limited number of key entities, thereby optimizing for efficiency. By utilizing an annotator-friendly design that capitalizes on the inherent structure of images, our approach significantly outperforms existing pixel-based methods, achieving superior results with minimal queries, specifically reducing click cost by 98% and enhancing performance by 1.71%. For instance, our technique requires a mere 40 clicks for annotation, a stark contrast to the 5000 clicks demanded by conventional methods.
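Prioritizing high-entropy regions can be illustrated with a short sketch. It assumes each candidate region (for example, a superpixel) already has an averaged class-probability vector; the names `region_probs` and `budget` are hypothetical, and the paper's actual selection procedure may differ.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_regions(region_probs, budget):
    """Return the names of the `budget` regions with the highest entropy."""
    ranked = sorted(
        region_probs.items(),
        key=lambda item: entropy(item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:budget]]


if __name__ == "__main__":
    regions = {
        "sky": [0.98, 0.01, 0.01],       # confident prediction
        "boundary": [0.4, 0.35, 0.25],   # uncertain prediction
        "road": [0.7, 0.2, 0.1],
    }
    print(select_regions(regions, budget=2))
```

Regions where the model is nearly uniform over classes rank first, so a small annotation budget goes to the areas the model is least sure about.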

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# オンライン連続一般化カテゴリー発見

Online Continuous Generalized Category Discovery ( http://arxiv.org/abs/2408.13492v1 )

ライセンス: Link先を確認

Keon-Hee Park, Hakyung Lee, Kyungwoo Song, Gyeong-Moon Park,

(参考訳) コンピュータビジョンにおけるディープニューラルネットワークの進歩により、人工知能(AI)は現実世界の応用に広く利用されている。しかし、AIは依然として、新しいカテゴリー発見のような高度な人間の能力を模倣する際の限界に直面している。オフライン連続学習を利用した新しいカテゴリー発見手法が提案されているが、実環境におけるデータストリームの連続性は無視されている。本研究では,データがリアルタイムに生成・削除されるデータストリームの動的性質を考慮した,オンライン連続一般化カテゴリー発見(OCGCD)を紹介する。さらに,エネルギー誘導によりオンラインで新たなカテゴリーを発見し,エネルギーに基づくコントラスト損失により識別的学習を促進する新しい手法DEAN(Discovery via Energy guidance and feature AugmentatioN)を提案する。さらに、DEANは分散ベースの特徴拡張を通じて、ラベルなしデータを効果的に擬似ラベルする。実験の結果,提案手法はOCGCDシナリオにおいて優れた性能を発揮することが示された。

With the advancement of deep neural networks in computer vision, artificial intelligence (AI) is widely employed in real-world applications. However, AI still faces limitations in mimicking high-level human capabilities, such as novel category discovery, for practical use. While some methods utilizing offline continual learning have been proposed for novel category discovery, they neglect the continuity of data streams in real-world settings. In this work, we introduce Online Continuous Generalized Category Discovery (OCGCD), which considers the dynamic nature of data streams where data can be created and deleted in real time. Additionally, we propose a novel method, DEAN (Discovery via Energy guidance and feature AugmentatioN), which can discover novel categories in an online manner through energy-guided discovery and facilitate discriminative learning via an energy-based contrastive loss. Furthermore, DEAN effectively pseudo-labels unlabeled data through variance-based feature augmentation. Experimental results demonstrate that our proposed DEAN achieves outstanding performance in the proposed OCGCD scenario.

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# 多目的強化学習における閾値レキソグラフィ

Thresholded Lexicographic Ordered Multiobjective Reinforcement Learning ( http://arxiv.org/abs/2408.13493v1 )

ライセンス: Link先を確認

Alperen Tercan, Vinayak S. Prabhu,

(参考訳) 語彙的多目的問題は、多くの現実のシナリオにおいて、目的に対して語彙的重要性の順序を課す。既存の強化学習では、語彙的タスクに直接対処する作業が不足している。ベルマン方程式はそれらに適用できないため、いくつかの提案されたアプローチは、理論的な保証なしにヒューリスティックであるとみなされた。さらに、これらの従来のアプローチの実践的適用性も、目標状態に到達できないなど、さまざまな問題に悩まされている。これらの問題のいくつかは以前にも知られていたが、本研究ではさらなる欠点を調査し、多くの場合、実用的なパフォーマンスを改善するための修正を提案する。また,Lexicographic Projection Optimization (LPO)アルゴリズムを用いた政策最適化手法を提案する。最後に,ベンチマーク問題に対する提案アルゴリズムの実証を行った。

Lexicographic multi-objective problems, which impose a lexicographic importance order over the objectives, arise in many real-life scenarios. Existing Reinforcement Learning work directly addressing lexicographic tasks has been scarce. The few proposed approaches were all noted to be heuristics without theoretical guarantees as the Bellman equation is not applicable to them. Additionally, the practical applicability of these prior approaches also suffers from various issues such as not being able to reach the goal state. While some of these issues have been known before, in this work we investigate further shortcomings, and propose fixes for improving practical performance in many cases. We also present a policy optimization approach using our Lexicographic Projection Optimization (LPO) algorithm that has the potential to address these theoretical and practical concerns. Finally, we demonstrate our proposed algorithms on benchmark problems.
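The thresholded lexicographic ordering named in the title can be sketched as an action-selection rule. This follows the commonly used formulation in which every objective except the last is clipped at a threshold before a lexicographic comparison. It is a generic illustration, not the paper's LPO algorithm, and the action names and values are made up.

```python
def tlo_key(values, thresholds):
    """Sort key for thresholded lexicographic ordering.

    values: one estimated return per objective, most important first.
    thresholds: one cap per objective except the last, which is left unclipped.
    """
    clipped = [min(v, t) for v, t in zip(values, thresholds)]
    return tuple(clipped) + (values[-1],)


def select_action(q_values, thresholds):
    """Pick the action whose value vector is largest under the ordering."""
    return max(q_values, key=lambda action: tlo_key(q_values[action], thresholds))


if __name__ == "__main__":
    # Objective 1: probability of staying safe; objective 2: task reward.
    q_values = {
        "careful": (0.95, 1.0),
        "direct": (0.90, 5.0),
        "reckless": (0.50, 9.0),
    }
    print(select_action(q_values, thresholds=(0.8,)))
    print(select_action(q_values, thresholds=(1.0,)))
```

With a threshold of 0.8 both "careful" and "direct" are considered safe enough, so the second objective decides between them; raising the threshold to 1.0 makes the first objective decisive again.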

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# Bモード超音波画像からのヒップランドマーク検出のためのトポロジカルGCN

Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images ( http://arxiv.org/abs/2408.13495v1 )

ライセンス: Link先を確認

Tianxiang Huang, Jing Shi, Ge Jin, Juncheng Li, Jun Wang, Jun Du, Jun Shi,

(参考訳) Bモード超音波を用いたコンピュータ支援診断 (CAD) は, 乳児の発育性股関節形成不全 (DDH) の診断に有効であることを示した。しかし, 超音波画像におけるスペックルノイズの影響から, ヒップランドマークを正確に検出することは依然として課題である。本研究では,トポロジカルGCN (TGCN) と Improved Conformer を統合した新しいヒップランドマーク検出モデル (TGCN-ICF) を提案する。 TGCN-ICFには、ヒートマップを生成する Improved Conformer (ICF) サブネットワークと、ランドマーク検出をさらに洗練するTGCNサブネットワークという2つのサブネットワークが含まれている。このTGCNは、クラスラベルのガイダンスにより検出精度を効果的に向上させることができる。さらに,Mutual Modulation Fusion (MMF) モジュールが開発され,ICF の U-Net と Transformer のブランチから抽出した特徴を深く交換し,融合する。実DDHデータセットにおける実験結果から,提案したTGCN-ICFが比較アルゴリズムのすべてより優れていることが示された。

B-mode ultrasound-based computer-aided diagnosis (CAD) has demonstrated its effectiveness for diagnosing Developmental Dysplasia of the Hip (DDH) in infants. However, due to the effect of speckle noise in ultrasound images, accurately detecting hip landmarks remains a challenging task. In this work, we propose a novel hip landmark detection model that integrates the Topological GCN (TGCN) with an Improved Conformer (TGCN-ICF) into a unified framework to improve detection performance. The TGCN-ICF includes two subnetworks: an Improved Conformer (ICF) subnetwork to generate heatmaps and a TGCN subnetwork to additionally refine landmark detection. This TGCN can effectively improve detection accuracy with the guidance of class labels. Moreover, a Mutual Modulation Fusion (MMF) module is developed for deeply exchanging and fusing the features extracted from the U-Net and Transformer branches in ICF. The experimental results on the real DDH dataset demonstrate that the proposed TGCN-ICF outperforms all the compared algorithms.

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# 虹彩周囲画像作成の可能性について

On the Feasibility of Creating Iris Periocular Morphed Images ( http://arxiv.org/abs/2408.13496v1 )

ライセンス: Link先を確認

Juan E. Tapia, Sebastian Gonzalez, Daniel Benalcazar, Christoph Busch,

(参考訳) ここ数年、顔認識システム(FRS)の複雑な課題として、顔の変形が示されている。したがって,指紋,虹彩,その他の生体特性の評価は,生体システムを強化するために検討され,評価されなければならない。本研究は、画像レベルで虹彩形態を生成するためのエンドツーエンドのフレームワークを提案し、眼周囲虹彩画像から虹彩形態を生成する。このフレームワークは、ペア対象の選択、セグメンテーション、形態形成、新しい虹彩認識システムなど、さまざまな段階を考慮している。現実的な形態画像を作成するために、ランダムな選択と類似の半径サイズ選択という2つの対象選択法が検討されている。また,脆弱性解析と単一モーフィング検出アルゴリズムについても検討した。その結果,従来の虹彩認識システムと混同できる非常にリアルな画像が得られた。

In the last few years, face morphing has been shown to be a complex challenge for Face Recognition Systems (FRS). Thus, other biometric modalities such as fingerprint and iris must also be explored and evaluated to enhance biometric systems. This work proposes an end-to-end framework to produce iris morphs at the image level, creating morphs from periocular iris images. This framework considers different stages such as subject pair selection, segmentation, morph creation, and a new iris recognition system. In order to create realistic morphed images, two approaches for subject selection are explored: random selection and similar radius size selection. A vulnerability analysis and a Single Morphing Attack Detection algorithm were also explored. The results show that this approach obtained very realistic images that can confuse conventional iris recognition systems.

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# 因果強化学習における状態分散の再考

Rethinking State Disentanglement in Causal Reinforcement Learning ( http://arxiv.org/abs/2408.13498v1 )

ライセンス: Link先を確認

Haiyao Cao, Zhen Zhang, Panpan Cai, Yuhang Liu, Jinan Zou, Ehsan Abbasnejad, Biwei Huang, Mingming Gong, Anton van den Hengel, Javen Qinfeng Shi,

(参考訳) 雑音に対処する際の強化学習(RL)における重要な課題の1つは、潜在状態を観測から推定することである。因果性は、根底にある状態が識別可能性によって一意に回復できることを保証するための厳密な理論的支援を提供する。その結果、いくつかの既存の研究は、アルゴリズムの設計を支援するために因果的な視点から識別可能性を確立することに重点を置いている。しかしながら、これらの結果はしばしば、特定のRLコンテキストを無視する純粋に因果的な視点から導かれる。我々はこの研究ラインを再考し、RL固有のコンテキストを取り入れることで、潜在状態に対する以前の識別可能性分析における不要な仮定を低減できることを示した。さらに重要なのは、これらの仮定を削除することで、アルゴリズム設計は、それらによって制約された以前の境界を超えることができることだ。これらの知見を生かして、従来手法の複雑な構造的制約を遷移と報酬保存の2つの単純な制約に置き換えることで、一般に観測可能なマルコフ決定過程(POMDP)の新たなアプローチを提案する。この2つの制約により、提案アルゴリズムは、基礎となる力学に忠実な状態とノイズを乱すことが保証される。広範囲なベンチマーク制御タスクによる実証的な証拠は、我々のアプローチが既存の手法よりも優れていることを示す。

One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of algorithms. However, these results are often derived from a purely causal viewpoint, which may overlook the specific RL context. We revisit this research line and find that incorporating RL-specific context can reduce unnecessary assumptions in previous identifiability analyses for latent states. More importantly, removing these assumptions allows algorithm design to go beyond the earlier boundaries constrained by them. Leveraging these insights, we propose a novel approach for general partially observable Markov Decision Processes (POMDPs) by replacing the complicated structural constraints in previous methods with two simple constraints for transition and reward preservation. With the two constraints, the proposed algorithm is guaranteed to disentangle state from noise in a manner that is faithful to the underlying dynamics. Empirical evidence from extensive benchmark control tasks demonstrates the superiority of our approach over existing counterparts in effectively disentangling state belief from noise.

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# R2Gは3Dシーンで地平線に反応する

R2G: Reasoning to Ground in 3D Scenes ( http://arxiv.org/abs/2408.13499v1 )

ライセンス: Link先を確認

Yixuan Li, Zan Wang, Wei Liang,

(参考訳) 本稿では,3次元シーン内の対象物体を理論的にグラウンド化するニューラルネットワークモデルであるReasoning to Ground (R2G)を提案する。従来の作業とは対照的に、R2Gは意味論的概念に基づくシーングラフで3Dシーンを明示的にモデル化し、オブジェクトエンティティ間での注意伝達を反復的にシミュレートすることで、ターゲットオブジェクトを最も高い確率でグラウンド化するプロセスを実現する。具体的には、事前に定義された意味語彙を用いて、グラフノード内に複数のオブジェクト特性を埋め込み、エッジ内にエンティティ間の空間的関係を埋め込む。注意伝達を導くために、私たちは、参照発話を分析して、同じ意味空間内の推論命令に変換する学習やプロンプトベースの手法を採用している。各推論ラウンドにおいて、R2Gは(1)命令と埋め込みエンティティプロパティの類似性と現在の注意分布をマージするか、(2)命令と埋め込み空間関係の類似性に基づいてシーングラフに注目を移す。 Sr3D/Nr3Dベンチマークの実験により、R2Gは3D言語接地のための新しいパスを破り、解釈可能性の改善を維持しながら、以前の作業と同等の結果を得ることが示された。

We propose Reasoning to Ground (R2G), a neural symbolic model that grounds the target objects within 3D scenes in a reasoning manner. In contrast to prior works, R2G explicitly models the 3D scene with a semantic concept-based scene graph and recurrently simulates attention transfer across object entities, which makes the process of grounding the target objects with the highest probability interpretable. Specifically, we embed multiple object properties within the graph nodes and spatial relations among entities within the edges, utilizing a predefined semantic vocabulary. To guide attention transfer, we employ learning or prompting-based methods to analyze the referential utterance and convert it into reasoning instructions within the same semantic space. In each reasoning round, R2G either (1) merges the current attention distribution with the similarity between the instruction and embedded entity properties or (2) shifts the attention across the scene graph based on the similarity between the instruction and embedded spatial relations. The experiments on Sr3D/Nr3D benchmarks show that R2G achieves results comparable to prior works while maintaining improved interpretability, breaking a new path for 3D language grounding.
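The two kinds of reasoning round described above can be illustrated with a toy sketch. It assumes attention is a probability vector over scene-graph nodes, that "merging" is an element-wise product followed by renormalization, and that "shifting" pushes attention along edges weighted by relation similarity. These operations and the names `merge_step` and `shift_step` are illustrative assumptions, not the paper's actual formulation.

```python
def normalize(weights):
    """Rescale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def merge_step(attention, property_similarity):
    """Round type (1): reweight attention by how well each node's
    properties match the current instruction."""
    return normalize([a * s for a, s in zip(attention, property_similarity)])


def shift_step(attention, relation_similarity):
    """Round type (2): move attention from node i to node j in proportion
    to how well edge (i, j) matches the instructed spatial relation."""
    n = len(attention)
    moved = [
        sum(attention[i] * relation_similarity[i][j] for i in range(n))
        for j in range(n)
    ]
    return normalize(moved)


if __name__ == "__main__":
    # Three objects; the instruction first selects node 0 by its
    # properties, then follows a relation edge from node 0 to node 2.
    attention = merge_step([0.5, 0.5, 0.0], [1.0, 0.0, 1.0])
    print(attention)
    edges = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
    print(shift_step(attention, edges))
```

Because every intermediate attention vector can be inspected, a chain of such steps shows which object is in focus after each instruction.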

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# 古代中国医学におけるエンティティ認識のための大規模言語モデルを用いた新型コロナウイルス文学の比較研究

Utilizing Large Language Models for Named Entity Recognition in Traditional Chinese Medicine against COVID-19 Literature: Comparative Study ( http://arxiv.org/abs/2408.13501v1 )

ライセンス: Link先を確認

Xu Tong, Nina Smirnova, Sharmila Upadhyaya, Ran Yu, Jack H. Culbert, Chao Sun, Wolfgang Otto, Philipp Mayr,

(参考訳) 目的: 新型コロナウイルスの文献に対するTCM内のさまざまなエンティティタイプやドメインをカバーするドメイン固有のNERタスクにおいて、ChatGPTや他の最先端のLLMのパフォーマンスを探索し、比較する。方法: 新型コロナウイルスに対するTCMに関する389項目のデータセットを作成し, その内48項目に3つのドメインに属する6種類のエンティティを手動で注釈付けし, LLMのNER性能を評価した。次に,ChatGPT (GPT-3.5, GPT-4) と4つの最先端BERTベースのQAモデル (RoBERTa, MiniLM, PubMedBERT, SciBERT) を用いて,特定のタスクを事前にトレーニングすることなく,NERタスクを実行した。ドメインファインチューニングモデル (GSAP-NER) も包括的な比較に応用された。結果: LLMの総合的な性能は, 正確な一致とファジィマッチにおいて有意に異なっていた。ファジィマッチでは、ChatGPTは6タスク中5タスクでBERTベースのQAモデルを上回ったが、正確なマッチでは、BERTベースのQAモデルは6タスク中5タスクでChatGPTを上回ったが、F-1の差は小さい。 GPT-4はファジィマッチにおける他のモデル、特にTCM式と中国の特許医薬品(TFD)および成分(IG)の実体型に対して有意な優位性を示した。 GPT-4は、エンティティタイプであるハーブ、ターゲット、研究方法においてBERTベースのモデルよりも優れていたが、F-1のスコアは0.5を超えなかった。 GSAP-NERはGPT-4よりもF-1よりもRMにわずかに差があった。 ChatGPTは、特にファジィマッチにおいて、精度よりもかなり高いリコールを達成した。結論: LLMのNERパフォーマンスはエンティティタイプに大きく依存しており、そのパフォーマンスはアプリケーションのシナリオによって異なります。高いリコールが好まれるシナリオでは、ChatGPTがよい選択になるかも知れません。しかし、厳密なシナリオでの知識獲得については、ChatGPTやBERTベースのQAモデルはプロの実践者のための既製のツールではない。

Objective: To explore and compare the performance of ChatGPT and other state-of-the-art LLMs on domain-specific NER tasks covering different entity types and domains in TCM against COVID-19 literature. Methods: We established a dataset of 389 articles on TCM against COVID-19, and manually annotated 48 of them with 6 types of entities belonging to 3 domains as the ground truth, against which the NER performance of LLMs can be assessed. We then performed NER tasks for the 6 entity types using ChatGPT (GPT-3.5 and GPT-4) and 4 state-of-the-art BERT-based question-answering (QA) models (RoBERTa, MiniLM, PubMedBERT and SciBERT) without prior training on the specific task. A domain fine-tuned model (GSAP-NER) was also applied for a comprehensive comparison. Results: The overall performance of LLMs varied significantly between exact match and fuzzy match. In fuzzy match, ChatGPT surpassed BERT-based QA models in 5 out of 6 tasks, while in exact match, BERT-based QA models outperformed ChatGPT in 5 out of 6 tasks but with a smaller F-1 difference. GPT-4 showed a significant advantage over other models in fuzzy match, especially on the entity types of TCM formula and the Chinese patent drug (TFD) and ingredient (IG). Although GPT-4 outperformed BERT-based models on the entity types of herb, target, and research method, none of the F-1 scores exceeded 0.5. GSAP-NER outperformed GPT-4 in terms of F-1 by a slight margin on RM. ChatGPT achieved considerably higher recall than precision, particularly in fuzzy match. Conclusions: The NER performance of LLMs is highly dependent on the entity type, and their performance varies across application scenarios. ChatGPT could be a good choice for scenarios where high recall is favored. However, for knowledge acquisition in rigorous scenarios, neither ChatGPT nor BERT-based QA models are off-the-shelf tools for professional practitioners.
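The gap between exact and fuzzy match scores can be made concrete with a small scoring sketch. The abstract does not define fuzzy match, so the version here assumes a case-insensitive substring test in either direction; that definition, the helper names, and the example entities are illustrative assumptions only.

```python
def exact_match(pred, gold):
    """Case-insensitive string equality."""
    return pred.strip().lower() == gold.strip().lower()


def fuzzy_match(pred, gold):
    """Assumed fuzzy criterion: one string contains the other."""
    p, g = pred.strip().lower(), gold.strip().lower()
    return bool(p) and bool(g) and (p in g or g in p)


def precision_recall_f1(predicted, gold, match):
    """Score predicted entity strings against gold entity strings."""
    matched_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(match(p, g) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["Lianhua Qingwen capsule", "ACE2"]
    predicted = ["Lianhua Qingwen", "ACE2", "fever"]
    print(precision_recall_f1(predicted, gold, exact_match))
    print(precision_recall_f1(predicted, gold, fuzzy_match))
```

In this toy case the partial span "Lianhua Qingwen" counts only under the fuzzy criterion, which is why a model that extracts roughly correct spans can score much higher on fuzzy match than on exact match.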

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# タタミプリンター:タタミパッズ用物理ZKP

Tatami Printer: Physical ZKPs for Tatami Puzzles ( http://arxiv.org/abs/2408.13507v1 )

ライセンス: Link先を確認

Suthee Ruangwises,

(参考訳) 畳パズル(たたみパズル、英: Tatami puzzles)は、矩形格子を四つの領域がコーナーポイントを共有しないような長方形領域に分割する目的を持つ鉛筆と紙の論理パズルである。本稿では,タタミパズルの解法を検証するために,タタミプリンタと呼ばれる物理カードベースのプロトコルを開発する。また、タタミプリンタを用いて、タタミバリとスクエアジャムという2つのパズルの物理的ゼロ知識証明プロトコルを構築する。これらのプロトコルにより、証明者はパズルの解の存在を証明者が明らかにすることなく示すことができる。

Tatami puzzles are pencil-and-paper logic puzzles whose objective is to partition a rectangular grid into rectangular regions such that no four regions share a corner point, while also satisfying other constraints. In this paper, we develop a physical card-based protocol called Tatami printer that can help verify solutions of Tatami puzzles. We also use the Tatami printer to construct physical zero-knowledge proof protocols for two such puzzles: Tatamibari and Square Jam. These protocols enable a prover to show a verifier the existence of the puzzles' solutions without revealing them.
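The core Tatami constraint, that no four regions share a corner point, is easy to state as an ordinary (non-zero-knowledge) check. The sketch below assumes a solution is given as a grid of region labels and that every region is already a rectangle; it is a plain verifier for illustration, not the card-based protocol from the paper, and the function name is hypothetical.

```python
def has_four_region_corner(grid):
    """Return True if some interior grid point touches four distinct regions.

    grid[r][c] is the label of the region containing cell (r, c).
    Each interior corner point is surrounded by a 2x2 window of cells;
    the Tatami rule is violated exactly when all four labels differ.
    """
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - 1):
        for c in range(cols - 1):
            window = {
                grid[r][c], grid[r][c + 1],
                grid[r + 1][c], grid[r + 1][c + 1],
            }
            if len(window) == 4:
                return True
    return False


if __name__ == "__main__":
    bad = [[1, 2],
           [3, 4]]
    good = [[1, 1, 2],
            [3, 4, 2]]
    print(has_four_region_corner(bad))
    print(has_four_region_corner(good))
```

A zero-knowledge protocol has to convince the verifier that this check passes without exposing the label grid itself, which is the role of the card-based construction.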

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# G3DST: シーンとスタイルにわたるニューラルラジアンス場を用いた3次元移動の一般化

G3DST: Generalizing 3D Style Transfer with Neural Radiance Fields across Scenes and Styles ( http://arxiv.org/abs/2408.13508v1 )

ライセンス: Link先を確認

Adil Meric, Umut Kocasari, Matthias Nießner, Barbara Roessle,

(参考訳) NeRF(Neural Radiance Fields)は、高精細でフォトリアリスティックなシーンを作るための強力なツールとして登場した。既存のNeRFベースの3Dスタイル転送手法では、シングルまたは複数スタイルのシーンごとの最適化が必要であり、3Dスタイル転送の適用性と効率が制限される。本研究では, シーンごとの最適化やスタイルごとの最適化を必要とせずに, NeRF からスタイリングされた新しいビューをレンダリングすることで, 既存の手法の限界を克服する。この目的のために、一般化可能なNeRFモデルを利用して3次元のスタイル伝達を容易にし、様々な場面で1つの学習モデルを使用することを可能にした。ハイパーネットワークを一般化可能なNeRFに組み込むことで,スタイリングされた新規ビューをオンザフライで生成することが可能になる。さらに,複数のビューにまたがる一貫性を維持するために,新しいフローベース多視点整合性損失を導入する。シーン固有の暗黙的モデルを必要としない高品質で多視点整合性のあるスタイリング画像を生成する上で,これらの手法を様々なシーンや芸術的スタイルで評価し,その性能を示す。以上の結果から,本手法はシーンごとの手法に匹敵する良質な視覚的品質を実現するだけでなく,効率性や適用性も著しく向上し,3Dスタイル転送の分野における顕著な進歩を示すことが示唆された。

Neural Radiance Fields (NeRF) have emerged as a powerful tool for creating highly detailed and photorealistic scenes. Existing methods for NeRF-based 3D style transfer need extensive per-scene optimization for single or multiple styles, limiting the applicability and efficiency of 3D style transfer. In this work, we overcome the limitations of existing methods by rendering stylized novel views from a NeRF without the need for per-scene or per-style optimization. To this end, we take advantage of a generalizable NeRF model to facilitate style transfer in 3D, thereby enabling the use of a single learned model across various scenes. By incorporating a hypernetwork into a generalizable NeRF, our approach enables on-the-fly generation of stylized novel views. Moreover, we introduce a novel flow-based multi-view consistency loss to preserve consistency across multiple views. We evaluate our method across various scenes and artistic styles and show its performance in generating high-quality and multi-view consistent stylized images without the need for a scene-specific implicit model. Our findings demonstrate that this approach not only achieves a good visual quality comparable to that of per-scene methods but also significantly enhances efficiency and applicability, marking a notable advancement in the field of 3D style transfer.

翻訳日:2024-08-27 19:19:21 公開日:2024-08-24

# DualAnoDiff:Few-Shot異常画像生成のためのDual-Interrelated Diffusion Model

DualAnoDiff: Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation ( http://arxiv.org/abs/2408.13509v1 )

ライセンス: Link先を確認

Ying Jin, Jinlong Peng, Qingdong He, Teng Hu, Hao Chen, Jiafu Wu, Wenbing Zhu, Mingmin Chi, Jun Liu, Yabiao Wang, Chengjie Wang,

(参考訳) 製造業における異常検査の性能は異常データの不足によって制約される。この課題を克服するために、研究者は異常データセットを増大させるために異常生成アプローチを採用し始めた。しかし、既存の異常生成法は、生成した異常の多様性が限られており、この異常を元の画像とシームレスに融合させるのに苦労している。本稿では,これらの課題を新たな視点から克服し,全体像と対応する異常部分を同時に生成する。本稿では,新しい拡散型少数ショット画像生成モデルであるDualAnoDiffを提案する。このモデルでは,2つの相互関連拡散モデルを用いて多種多様な現実的な画像を生成することができ,一方が画像全体を生成するのに使われ,他方が異常部分を生成する。さらに,背景情報や形状情報を抽出することで,画像生成時の歪みやぼやけを緩和する。集約的な実験は,本提案モデルが現実主義と多様性の両方の観点から,最先端の手法よりも優れていることを示す。本手法は, 異常検出, 異常局所化, 異常分類タスクなど, 下流異常検出タスクの性能を大幅に向上させる。

The performance of anomaly inspection in industrial manufacturing is constrained by the scarcity of anomaly data. To overcome this challenge, researchers have started employing anomaly generation approaches to augment the anomaly dataset. However, existing anomaly generation methods suffer from limited diversity in the generated anomalies and struggle to achieve a seamless blending of this anomaly with the original image. In this paper, we overcome these challenges from a new perspective, simultaneously generating a pair of the overall image and the corresponding anomaly part. We propose DualAnoDiff, a novel diffusion-based few-shot anomaly image generation model, which can generate diverse and realistic anomaly images by using a dual-interrelated diffusion model, where one of them is employed to generate the whole image while the other one generates the anomaly part. Moreover, we extract background and shape information to mitigate the distortion and blurriness phenomenon in few-shot image generation. Extensive experiments demonstrate the superiority of our proposed model over state-of-the-art methods in terms of both realism and diversity. Overall, our approach significantly improves the performance of downstream anomaly detection tasks, including anomaly detection, anomaly localization, and anomaly classification tasks.

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# 六方晶窒化ホウ素のスピン対からの量子放出

Quantum Emission from Coupled Spin Pairs in Hexagonal Boron Nitride ( http://arxiv.org/abs/2408.13515v1 )

ライセンス: Link先を確認

Song Li, Anton Pershin, Adam Gali,

(参考訳) 広帯域ギャップ材料における光学的に対応可能な欠陥量子ビットは、室温量子情報処理の候補として好ましい。 2次元(2次元)六方晶窒化ホウ素(hBN)は、量子メモリで明るい量子エミッタをホストする優れた固体プラットフォームであり、2次元材料のポテンシャルを利用して欠陥量子ビットのスケーラブルな調製を実現する。室温の明るい欠陥量子ビットは近年hBNで報告されているが、その微視的起源は、光学遷移の性質と光学的に検出された磁気共鳴(ODMR)の性質が解明されていない。ここでは、量子エミッタの光安定性とスペクトル拡散を、アブイニシアト計算を用いてhBN内のドナー・アクセプター対(DAP)に接続する。 DAPは、ドナーパートナーに依存する非ゼロ磁場において、S = 1/2基底状態の欠陥対のアクセプター対に対してODMR信号を示すことができる。ドナー・アクセプターペアモデルとその遷移機構は、量子アプリケーションのためのhBNにおける欠陥量子ビット識別と性能最適化のためのレシピを提供する。

Optically addressable defect qubits in wide band gap materials are favorable candidates for room temperature quantum information processing. Two-dimensional (2D) hexagonal boron nitride (hBN) is an attractive solid-state platform with great potential for hosting bright quantum emitters with quantum memories, leveraging the potential of 2D materials for the scalable preparation of defect qubits. Although room temperature bright defect qubits have recently been reported in hBN, their microscopic origin and the nature of both the optical transition and the optically detected magnetic resonance (ODMR) have remained elusive. Here we connect the photostability and spectral diffusion of quantum emitters to donor-acceptor pairs (DAP) in hBN by means of ab initio calculations. We find that DAPs can exhibit an ODMR signal for the acceptor counterpart of the defect pair with an S = 1/2 ground state at non-zero magnetic fields, depending on the donor partner. The donor-acceptor pair model and its transition mechanisms provide a recipe for defect qubit identification and performance optimization in hBN for quantum applications.

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# AnoPLe: 正常サンプルのみを用いた双方向プロンプト学習によるFew-Shot異常検出

AnoPLe: Few-Shot Anomaly Detection via Bi-directional Prompt Learning with Only Normal Samples ( http://arxiv.org/abs/2408.13516v1 )

ライセンス: Link先を確認

Yujin Lee, Seoyoon Jang, Hyunsoo Yoon,

(参考訳) FAD (Few-shot Anomaly Detection) は、トレーニングサンプルの入手が限られ、異常サンプルの欠如が頻発しているため、重大な課題となっている。従来のアプローチでは、検出を改善するためにアノテーションや真の異常サンプルに頼っていたが、このようなテキストや視覚的な手がかりは必ずしもアクセスできない。そこで本稿では,異常検出のためのマルチモーダル・プロンプト学習手法であるAnoPLeを紹介する。 AnoPLeは異常をシミュレートし、テキストと視覚のプロンプトを双方向に結合して2つのモード間の深い相互作用を促進する。さらに,学習可能なマルチビュー信号と軽量デコーダを統合し,局所的意味理解を高めるために,マルチスケール画像に基づいて訓練する。パフォーマンスをさらに向上するため、我々は、画像レベルの異常の理解を深め、グローバルとローカルのセマンティクスを整合させる。実験の結果、AnoPLe は MVTec-AD と VisA で 94.1% と 86.2% Image AUROC をそれぞれ記録し、真の異常に晒されていないにもかかわらず、SoTA と比較して 1% の差しか示さなかった。コードは https://github.com/YoojLee/AnoPLe で入手できる。

Few-shot Anomaly Detection (FAD) poses significant challenges due to the limited availability of training samples and the frequent absence of abnormal samples. Previous approaches often rely on annotations or true abnormal samples to improve detection, but such textual or visual cues are not always accessible. To address this, we introduce AnoPLe, a multi-modal prompt learning method designed for anomaly detection without prior knowledge of anomalies. AnoPLe simulates anomalies and employs bidirectional coupling of textual and visual prompts to facilitate deep interaction between the two modalities. Additionally, we integrate a lightweight decoder with a learnable multi-view signal, trained on multi-scale images to enhance local semantic comprehension. To further improve performance, we align global and local semantics, enriching the image-level understanding of anomalies. The experimental results demonstrate that AnoPLe achieves strong FAD performance, recording 94.1% and 86.2% Image AUROC on MVTec-AD and VisA respectively, with only around a 1% gap compared to the SoTA, despite not being exposed to true anomalies. Code is available at https://github.com/YoojLee/AnoPLe.

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# 強化学習によるスケーラブルな類似性を考慮したテストスイートの最小化

Scalable Similarity-Aware Test Suite Minimization with Reinforcement Learning ( http://arxiv.org/abs/2408.13517v1 )

ライセンス: Link先を確認

Sijia Gu, Ali Mesbah,

(参考訳) Multi-Criteria Test Suite Minimization (MCTSM)問題は、コードカバレッジや障害検出機能といった適切な基準でガイドされた冗長なテストケースを削除することで、テストスイートを洗練することを目的としている。しかし、現在の手法は、その実用性を制限するNPハードの性質により、高い障害検出能力を失うか、スケーラビリティの課題に直面している。 Integer Linear Program (ILP) に記述カバレッジや障害検出能力などの従来の基準とテストカバレッジの類似性を統合したTripRLを提案する。 TripRLは2部グラフ表現とその埋め込みを利用して、簡潔なILP定式化を行い、ILPと効果的な強化学習(RL)トレーニングを組み合わせる。この組み合わせにより、大規模テストスイートの最小化がよりスケーラブルになり、テストの有効性が向上する。実験により, TripRL のランタイムは MCTSM 問題の大きさと線形にスケールすることを示した。特に、既存のアプローチが妥当な時間枠でソリューションを提供できない大規模なテストスイートでは、我々の技術は47分以内でソリューションを継続的に提供します。 TripRLによって生成されたテストスイートの削減は、未知の障害を検出する可能性が高くながら、元のステートメントカバレッジと障害検出能力も維持している。

The Multi-Criteria Test Suite Minimization (MCTSM) problem aims to refine test suites by removing redundant test cases, guided by adequacy criteria such as code coverage or fault detection capability. However, current techniques either exhibit a high loss of fault detection ability or face scalability challenges due to the NP-hard nature of the problem, which limits their practical utility. We propose TripRL, a novel technique that integrates traditional criteria such as statement coverage and fault detection ability with test coverage similarity into an Integer Linear Program (ILP), to produce a diverse reduced test suite with high test effectiveness. TripRL leverages bipartite graph representation and its embedding for concise ILP formulation and combines ILP with effective reinforcement learning (RL) training. This combination renders large-scale test suite minimization more scalable and enhances test effectiveness. Our empirical evaluations demonstrate that TripRL's runtime scales linearly with the magnitude of the MCTSM problem. Notably, for large test suites where existing approaches fail to provide solutions within a reasonable time frame, our technique consistently delivers solutions in less than 47 minutes. The reduced test suites produced by TripRL also maintain the original statement coverage and fault detection ability while having a higher potential to detect unknown faults.
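To make the underlying minimization problem concrete, here is a tiny exhaustive-search sketch that finds a smallest subset of tests preserving statement coverage. It is deliberately not TripRL: there is no ILP solver, no similarity term, and no reinforcement learning, and brute force only works for toy inputs, which is exactly the scalability problem the abstract refers to. The function name and the coverage data are hypothetical.

```python
from itertools import combinations


def minimal_covering_suite(coverage):
    """Return a smallest list of test names that keeps full statement coverage.

    coverage maps a test name to the set of statement ids it executes.
    Subsets are tried in increasing size, so the first hit is minimal.
    Runtime is exponential in the number of tests.
    """
    required = set().union(*coverage.values())
    names = sorted(coverage)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            covered = set().union(*(coverage[name] for name in subset))
            if covered == required:
                return list(subset)
    return names


if __name__ == "__main__":
    coverage = {
        "t1": {1, 2},
        "t2": {2, 3},
        "t3": {1, 2, 3},
        "t4": {4},
    }
    print(minimal_covering_suite(coverage))
```

A multi-criteria formulation would add further constraints (for example, keeping every known fault detected) and a diversity objective on top of this coverage requirement.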

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# Token-Level Reward関数推定による選択的選好最適化

Selective Preference Optimization via Token-Level Reward Function Estimation ( http://arxiv.org/abs/2408.13518v1 )

ライセンス: Link先を確認

Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou,

(参考訳) 大規模言語モデルのアライメントの最近の進歩は、トークンレベルの監督を利用して、きめ細かい好みの最適化を行う。しかし、既存のトークンレベルのアライメントメソッドは、ノイズが多く非効率なすべてのトークンを最適化するか、複雑で高価なキー選択戦略で選択的なトレーニングを実行する。本研究では,鍵トークン選択を効率よく行う新しい選択的アライメント戦略であるセレクティブ・パラメータ最適化(SePO)を提案する。 SePOは直接選好最適化(DPO)に基づく最初のトークン選択法を提案し、ターゲットデータ上でトークンレベルの報酬関数を推定するためにオラクルモデルを訓練する。この方法は、応答レベルのアノテーションを持つ既存のアライメントデータセットに適用され、小規模のオラクルモデルとトレーニングデータによるコスト効率の高いトークン選択を可能にする。次に、推定された報酬関数を使用して、ターゲットデータセット内のすべてのトークンをスコアし、キートークンのみを選択して、参照モデルなしのコントラスト目的関数でターゲットポリシーモデルを監督する。 3つの公開評価ベンチマークの大規模な実験により、SEPOはターゲットデータセット上の30%のキートークンを最適化するだけで、競合するベースラインメソッドを著しく上回ります。弱強一般化に対するSePOの応用は、弱いオラクルモデルは最大16.8倍のパラメータを持つ強いポリシーモデルを効果的に監督することを示している。 SePOはまた、配布外データからキートークンを効果的に選択し、強力なポリシーモデルを強化し、過度な最適化問題を緩和する。

Recent advancements in large language model alignment leverage token-level supervision to perform fine-grained preference optimization. However, existing token-level alignment methods either optimize on all available tokens, which can be noisy and inefficient, or perform selective training with complex and expensive key token selection strategies. In this work, we propose Selective Preference Optimization (SePO), a novel selective alignment strategy that centers on efficient key token selection. SePO proposes the first token selection method based on Direct Preference Optimization (DPO), which trains an oracle model to estimate a token-level reward function on the target data. This method applies to any existing alignment dataset with response-level annotations and enables cost-efficient token selection with small-scale oracle models and training data. The estimated reward function is then utilized to score all tokens within the target dataset, where only the key tokens are selected to supervise the target policy model with a reference model-free contrastive objective function. Extensive experiments on three public evaluation benchmarks show that SePO significantly outperforms competitive baseline methods while optimizing only 30% of the tokens (the key tokens) on the target dataset. Applying SePO to weak-to-strong generalization shows that weak oracle models effectively supervise strong policy models with up to 16.8x more parameters. SePO also effectively selects key tokens from out-of-distribution data to enhance strong policy models and alleviate the over-optimization problem.
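The selection step can be pictured as a simple ranking by estimated token reward. The sketch below assumes the per-token scores have already been produced by some oracle model and keeps the highest-scoring fraction; the exact selection rule in SePO may differ (for instance, between chosen and rejected responses), and `select_key_tokens` and the scores are hypothetical.

```python
def select_key_tokens(token_scores, ratio=0.3):
    """Return the positions of the top `ratio` fraction of tokens by score.

    token_scores[i] is an assumed, precomputed reward estimate for token i.
    At least one token is always kept; positions are returned in order.
    """
    if not token_scores:
        return []
    k = max(1, round(ratio * len(token_scores)))
    ranked = sorted(
        range(len(token_scores)),
        key=lambda i: token_scores[i],
        reverse=True,
    )
    return sorted(ranked[:k])


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.05, 0.4, 0.0, 0.6, 0.7]
    mask_positions = select_key_tokens(scores, ratio=0.3)
    print(mask_positions)
```

In training, the returned positions would act as a loss mask so that only the selected tokens contribute to the objective.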

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# WebXRとAフレームを用いたオープンでクロスプラットフォームなWebベースメタバース

An Open, Cross-Platform, Web-Based Metaverse Using WebXR and A-Frame ( http://arxiv.org/abs/2408.13520v1 )

ライセンス: Link先を確認

Giuseppe Macario,

(参考訳) メタバースはここ数年、文学や産業で注目を集めてきたが、オープンでクロスプラットフォームなアーキテクチャが欠如しているため、相互に通信できない多くの異なるメタバースが生まれている。本研究では,A-FrameフレームワークとNetworked-Aframeフレームワークを用いて,Webと拡張現実デバイスの両方からアクセス可能な,オープンかつ相互運用可能なメタバースを考慮した空間Webアプリを開発するための,WebXRベースのクロスプラットフォームアーキテクチャを提案する。プロトタイプが実装され、評価され、様々なプラットフォームやデバイスにまたがる没入的な体験を可能にする技術スタックの機能をサポートする。没入型環境の使いやすさに対する肯定的なフィードバックは、提案したアプローチをさらに裏付け、エンゲージメントとインタラクティブな仮想空間の促進における効果を強調する。相互運用性と傾きの原則に従うことで、Tim Berners-Lee氏のWorld Wide Webというビジョンを、地理的および技術的な境界を越えるオープンなプラットフォームとして実現しています。

The metaverse has received much attention in the literature and industry in the last few years, but the lack of an open and cross-platform architecture has led to many distinct metaverses that cannot communicate with each other. This work proposes a WebXR-based cross-platform architecture for developing spatial web apps using the A-Frame and Networked-Aframe frameworks with a view to an open and interoperable metaverse, accessible from both the web and extended reality devices. A prototype was implemented and evaluated, supporting the capability of the technology stack to enable immersive experiences across different platforms and devices. Positive feedback on ease of use of the immersive environment further corroborates the proposed approach, underscoring its effectiveness in facilitating engaging and interactive virtual spaces. By adhering to principles of interoperability and inclusivity, it lives up to Tim Berners-Lee's vision of the World Wide Web as an open platform that transcends geographical and technical boundaries.

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# HRGraph:情報伝達に基づく求人勧告による人事データ知識グラフのLLM活用

HRGraph: Leveraging LLMs for HR Data Knowledge Graphs with Information Propagation-based Job Recommendation ( http://arxiv.org/abs/2408.13521v1 )

ライセンス: Link先を確認

Azmine Toushik Wasi,

(参考訳) セマンティック・ネットワークとして機能する知識グラフ(KG)は、知識の進化に容易に適応できる、統一され、コンテキスト化され、構造化された表現を提供することによって、異なるドメインにおける複雑な相互接続データを管理するのに非常に効果的である。複雑な人事(HR)データを処理するKGは、採用、仕事のマッチング、学習ギャップの識別、従業員の維持強化など、さまざまな人事機能に役立ちます。その可能性にもかかわらず、実用的な人事知識グラフを実装するための限られた努力がなされている。本研究では,大規模言語モデルを用いた文書から人事知識グラフを効果的に開発するためのフレームワークを提案することにより,このギャップを解消する。結果として得られるKGは、ジョブマッチング、従業員スキルギャップの特定など、さまざまなダウンストリームタスクに使用することができる。この研究では、HR KGが正確な仕事のマッチングに役立ち、雇用主と従業員の両方に利点をもたらす事例を紹介します。 KGs と Graph Neural Nets の情報伝達実験による実証的証拠とケーススタディは、仕事や従業員の推薦や仕事領域の分類といったタスクにおける KGs の有効性を裏付けるものである。コードとデータは、https://github.com/azminewasi/HRGraphで入手できる。

Knowledge Graphs (KGs), serving as semantic networks, prove highly effective in managing complex interconnected data in different domains by offering a unified, contextualized, and structured representation with flexibility that allows for easy adaptation to evolving knowledge. When processing complex Human Resources (HR) data, KGs can help in different HR functions like recruitment, job matching, identifying learning gaps, and enhancing employee retention. Despite their potential, limited efforts have been made to implement practical HR knowledge graphs. This study addresses this gap by presenting a framework for effectively developing HR knowledge graphs from documents using Large Language Models. The resulting KG can be used for a variety of downstream tasks, including job matching, identifying employee skill gaps, and many more. In this work, we showcase instances where HR KGs prove instrumental in precise job matching, yielding advantages for both employers and employees. Empirical evidence from experiments with information propagation in KGs and Graph Neural Nets, along with case studies, underscores the effectiveness of KGs in tasks such as job and employee recommendations and job area classification. Code and data are available at: https://github.com/azminewasi/HRGraph

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# 異常検出のためのエンコーダのみアーキテクチャを用いた直交遅延空間の因子的学習 : 警報管理の観点から

Learning a Factorized Orthogonal Latent Space using Encoder-only Architecture for Fault Detection; An Alarm management perspective ( http://arxiv.org/abs/2408.13526v1 )

ライセンス: Link先を確認

Vahid MohammadZadeh Eivaghi, Mahdi Aliyari Shoorehdeli,

(参考訳) 産業断層検出システムにおける誤報やニュアンスアラームは、しばしば不確実性によって引き起こされ、通常のプロセス変動変動は誤って断層として特定される。本稿では, プロセス変数の確率的, 決定論的成分を, 検出遅延を伴わずに効果的に分離する, エンコーダに基づく残差設計を提案する。提案モデルは2つの異なるエンコーダを用いて、潜在空間を2つの直交空間に分解する: 1つは決定的部分、もう1つは確率的部分である。所望の空間の識別可能性を確保するため、トレーニング中に制約を適用する。決定性空間は、決定性を保証するために滑らかに制約される一方、確率性空間は標準ガウスノイズに類似するように要求される。さらに、デコレーションという用語は、学習された表現の独立を強制する。このアプローチの有効性は、数値的な例を通して示され、テネシー・イーストマン法に応用され、堅牢な断層検出の可能性を強調している。決定論理を決定論的要因のみに焦点をあてることで、提案モデルは、ほぼゼロの誤報と検出の欠如を達成しつつ、予測品質を著しく向上させ、産業環境における運用安全性と整合性を向上させる道を開く。

False and nuisance alarms in industrial fault detection systems are often triggered by uncertainty, causing normal process variable fluctuations to be erroneously identified as faults. This paper introduces a novel encoder-based residual design that effectively decouples the stochastic and deterministic components of process variables without imposing detection delay. The proposed model employs two distinct encoders to factorize the latent space into two orthogonal spaces: one for the deterministic part and the other for the stochastic part. To ensure the identifiability of the desired spaces, constraints are applied during training. The deterministic space is constrained to be smooth to guarantee determinism, while the stochastic space is required to resemble standard Gaussian noise. Additionally, a decorrelation term enforces the independence of the learned representations. The efficacy of this approach is demonstrated through numerical examples and its application to the Tennessee Eastman process, highlighting its potential for robust fault detection. By focusing decision logic solely on deterministic factors, the proposed model significantly enhances prediction quality while achieving nearly zero false alarms and missed detections, paving the way for improved operational safety and integrity in industrial environments.

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# 一般化空間変調の最大近似検出のためのグロバー適応探索

Grover Adaptive Search for Maximum Likelihood Detection of Generalized Spatial Modulation ( http://arxiv.org/abs/2408.13531v1 )

ライセンス: Link先を確認

Kein Yukiyoshi, Taku Mikuriya, Hyeon Seok Rou, Giuseppe Thadeu Freitas de Abreu, Naoki Ishikawa,

(参考訳) 本稿では、一般化空間変調(GSM)信号の最大極大検出(MLD)のための量子支援ソリューションを提案する。具体的には、GSMのMDDは、まず新しい多項式最適化問題として定式化され、次いでGrover適応探索という量子アルゴリズムが適用される。提案手法の問合せ複雑性に関する性能を数値解析により評価し, 提案手法はフォールトトレラント量子計算において, データシンボルの数とコンステレーションサイズが相対的に大きい場合, 古典解よりも優れていることを示す。

We propose a quantum-assisted solution for the maximum likelihood detection (MLD) of generalized spatial modulation (GSM) signals. Specifically, the MLD of GSM is first formulated as a novel polynomial optimization problem, followed by the application of a quantum algorithm, namely, the Grover adaptive search. The performance in terms of query complexity of the proposed method is evaluated and compared to the classical alternative via a numerical analysis, which reveals that under fault-tolerant quantum computation, the proposed method outperforms the classical solution if the number of data symbols and the constellation size are relatively large.
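A rough sense of why the advantage appears only for larger problems comes from counting candidates. The sketch below assumes the textbook GSM setup, in which the number of usable antenna activation patterns is the largest power of two not exceeding the binomial coefficient and each active antenna carries one constellation symbol. It contrasts exhaustive search, which scales with the number of candidates N, against the square-root scaling associated with Grover-type search. Constant factors, oracle cost, and the paper's actual complexity analysis are all ignored, and the function names are hypothetical.

```python
import math


def gsm_search_space(n_tx, n_active, constellation_size):
    """Number of candidate transmit vectors an ML detector must consider.

    Assumes 2**floor(log2(C(n_tx, n_active))) activation patterns and
    one constellation symbol per active antenna.
    """
    patterns = 2 ** (math.comb(n_tx, n_active).bit_length() - 1)
    return patterns * constellation_size ** n_active


def query_scaling(n_candidates):
    """Return (N, floor(sqrt(N))) as order-of-magnitude query counts for
    exhaustive search and Grover-type search, constants omitted."""
    return n_candidates, math.isqrt(n_candidates)


if __name__ == "__main__":
    for n_tx, n_active, size in [(4, 2, 4), (8, 4, 16)]:
        n = gsm_search_space(n_tx, n_active, size)
        print(n_tx, n_active, size, query_scaling(n))
```

The gap between the two counts widens quickly as the number of data symbols or the constellation size grows, which is consistent with the regime in which the abstract reports an advantage.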

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# 実効弾性特性のリアルタイム予測と高速逆設計を備えたオーセティックメタマテリアルのFFTベースサロゲートモデリング

FFT-based surrogate modeling of auxetic metamaterials with real-time prediction of effective elastic properties and swift inverse design ( http://arxiv.org/abs/2408.13532v1 )

ライセンス: Link先を確認

Hooman Danesh, Daniele Di Lorenzo, Francisco Chinesta, Stefanie Reese, Tim Brepols,

(参考訳) 負のポアソン比で知られるオーセティック構造は、その基礎となる構造形状と母材の材料特性に強く影響される有効弾性特性を示す。オーセティック単位セルの周期的均質化はこれらの特性の調査に利用できるが、計算コストが高く、設計空間の探索や逆解析が制限される。本稿では、異なる形状の直交ボイドを持つオーセティック単位セルの有効弾性特性をリアルタイムに予測するサロゲートモデルを開発する。単位セルは、長方形、ダイヤモンド形、楕円形、ピーナッツ形の4種類の直交ボイドを持ち、それぞれ特定のボイド径で特徴づけられる。生成したサロゲートモデルは、幾何学的パラメータと母材の弾性特性を入力として受け取り、有効弾性定数をリアルタイムで予測する。この迅速な評価により、所望の有効応答をもたらす最適な設計パラメータを得るための実用的な逆解析フレームワークが可能になる。サロゲートモデル構築用のデータを効率的に生成するために高速フーリエ変換(FFT)に基づく均質化手法を採用し、有限要素法(FEM)に典型的に伴う周期メッシュ生成や境界条件に関する懸念を回避する。生成したサロゲートモデルの性能を、訓練/テスト分割、パラメトリックスタディ、逆問題を通じて厳密に検証する。最後に、有効接線剛性のリアルタイム予測と、最適な幾何学的パラメータを決定する逆解析を行うグラフィカルユーザインタフェース(GUI)を開発する。

Auxetic structures, known for their negative Poisson's ratio, exhibit effective elastic properties heavily influenced by their underlying structural geometry and base material properties. While periodic homogenization of auxetic unit cells can be used to investigate these properties, it is computationally expensive and limits design space exploration and inverse analysis. In this paper, surrogate models are developed for the real-time prediction of the effective elastic properties of auxetic unit cells with orthogonal voids of different shapes. The unit cells feature orthogonal voids in four distinct shapes, including rectangular, diamond, oval, and peanut-shaped voids, each characterized by specific void diameters. The generated surrogate models accept geometric parameters and the elastic properties of the base material as inputs to predict the effective elastic constants in real-time. This rapid evaluation enables a practical inverse analysis framework for obtaining the optimal design parameters that yield the desired effective response. The fast Fourier transform (FFT)-based homogenization approach is adopted to efficiently generate data for developing the surrogate models, bypassing concerns about periodic mesh generation and boundary conditions typically associated with the finite element method (FEM). The performance of the generated surrogate models is rigorously examined through a train/test split methodology, a parametric study, and an inverse problem. Finally, a graphical user interface (GUI) is developed, offering real-time prediction of the effective tangent stiffness and performing inverse analysis to determine optimal geometric parameters.

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# Pandora の Box あるいは Aladdin の Lamp: 大規模言語モデルにおける RAG ノイズの役割を包括的に分析する

Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models ( http://arxiv.org/abs/2408.13533v1 )

ライセンス: Link先を確認

Jinyang Wu, Feihu Che, Chuyuan Zhang, Jianhua Tao, Shuai Zhang, Pengpeng Shao,

(参考訳) 大規模言語モデル(LLM)における幻覚に対処するための重要な手法として、検索拡張生成(RAG)が登場している。最近の研究はRAGモデルを複雑なノイズのあるシナリオにまで拡張しているが、これらの探索は限られたノイズタイプにとどまることが多く、ノイズはLLMにとって本質的に有害であると前提しているため、現実の検索環境から乖離し、実用的な適用性を制限する可能性がある。本稿では、言語学的観点から7つの異なるノイズタイプを定義し、複数のデータセットと推論タスクを含む包括的な評価フレームワークであるNoise RAG Benchmark(NoiserBench)を確立する。多様なアーキテクチャと規模を持つ8つの代表的なLLMの実証評価により、これらのノイズは、LLMに有益なノイズ(有益ノイズ)とLLMに有害なノイズ(有害ノイズ)という2つの実用的なグループにさらに分類できることを明らかにする。有害ノイズは一般に性能を損なうが、有益ノイズはモデル能力のいくつかの側面と全体的な性能を向上させる可能性がある。我々の分析は、より堅牢で適応性の高いRAGソリューションを開発し、多様な検索シナリオにわたって幻覚を緩和するための洞察を提供する。

Retrieval-Augmented Generation (RAG) has emerged as a crucial method for addressing hallucinations in large language models (LLMs). While recent research has extended RAG models to complex noisy scenarios, these explorations often confine themselves to limited noise types and presuppose that noise is inherently detrimental to LLMs, potentially deviating from real-world retrieval environments and restricting practical applicability. In this paper, we define seven distinct noise types from a linguistic perspective and establish a Noise RAG Benchmark (NoiserBench), a comprehensive evaluation framework encompassing multiple datasets and reasoning tasks. Through empirical evaluation of eight representative LLMs with diverse architectures and scales, we reveal that these noises can be further categorized into two practical groups: noise that is beneficial to LLMs (aka beneficial noise) and noise that is harmful to LLMs (aka harmful noise). While harmful noise generally impairs performance, beneficial noise may enhance several aspects of model capabilities and overall performance. Our analysis offers insights for developing more robust, adaptable RAG solutions and mitigating hallucinations across diverse retrieval scenarios.

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# メニューの文化的適応: きめ細かなアプローチ

Cultural Adaptation of Menus: A Fine-Grained Approach ( http://arxiv.org/abs/2408.13534v1 )

ライセンス: Link先を確認

Zhonghe Zhang, Xiaoyu He, Vivek Iyer, Alexandra Birch,

(参考訳) 文化固有項目(CSI)の機械翻訳は大きな課題である。CSI翻訳に関する最近の研究は、大規模言語モデル(LLM)を用いて異なる言語や文化に適応させることで一定の成功を示しているが、各手法の利点と落とし穴を調べるには、より深い分析が必要である。本稿では、中国語-英語のメニューコーパスとして最大であり、CSI/非CSIラベルときめ細かなテストセットを備えたChineseMenuCSIデータセットを紹介する。よりニュアンスのある分析のためにCSIの比喩性を3段階で定義し、ほとんどのカテゴリでGPTベースのプロンプトを上回る新しい自動CSI識別手法を開発する。重要な点として、我々は人間の翻訳理論をLLM駆動の翻訳プロセスに初めて統合し、COMETスコアが最大7ポイント向上するなど、翻訳精度を大幅に改善する。

Machine Translation of Culture-Specific Items (CSIs) poses significant challenges. Recent work on CSI translation has shown some success using Large Language Models (LLMs) to adapt to different languages and cultures; however, a deeper analysis is needed to examine the benefits and pitfalls of each method. In this paper, we introduce the ChineseMenuCSI dataset, the largest for Chinese-English menu corpora, annotated with CSI vs Non-CSI labels and a fine-grained test set. We define three levels of CSI figurativeness for a more nuanced analysis and develop a novel methodology for automatic CSI identification, which outperforms GPT-based prompts in most categories. Importantly, we are the first to integrate human translation theories into LLM-driven translation processes, significantly improving translation accuracy, with COMET scores increasing by up to 7 points.

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# 少数から学ぶ: 限られたデータセットにおける小児手首病理認識へのきめ細かいアプローチ

Learning from the few: Fine-grained approach to pediatric wrist pathology recognition on a limited dataset ( http://arxiv.org/abs/2408.13542v1 )

ライセンス: Link先を確認

Ammar Ahmed, Ali Shariq Imran, Zenun Kastrati, Sher Muhammad Daudpota, Mohib Ullah, Waheed Noord,

(参考訳) 手首の病態、特に小児や青年に多い骨折は、重大な診断上の課題である。X線撮影は依然として広く使われる診断ツールであるが、誤読率の増加は、多くの外科医や医師に専門的な訓練が不足していることを考えると、より正確な分析の必要性を浮き彫りにしている。近年の深層畳み込みニューラルネットワークの進歩は、外傷X線における病変検出の自動化に期待をもたらしている。しかし、X線における小児手首の病態間の微妙な違いを区別することは依然として困難である。従来の手作業によるアノテーションは効果的だが、手間と費用がかかり、専門知識を必要とする。本稿では、手動の介入なしにX線中の識別的領域を自動的に特定することを目的としたきめ細かなアプローチにより、小児手首病理認識の課題に取り組む。アブレーション分析とLIONの統合により、きめ細かなアーキテクチャを改良する。説明可能なAI技術であるGrad-CAMを用いて、これらの領域を強調表示する。実世界の医学研究の制約を反映した限られたデータを用いているにもかかわらず、本手法は、拡張テストセットとオリジナルの(難易度の高い)テストセットの両方で、最先端の画像認識モデルを一貫して上回る。提案する改良アーキテクチャは、ベースライン手法と比較して精度がそれぞれ1.06%、1.25%向上し、86%、84%の精度を達成する。さらに、本手法は97%という最も高い骨折感度を示し、手首の病理認識を向上させる可能性を示している。実装コードは https://github.com/ammarlodhi255/fine-grained-approach-to-wrist-pathology-recognition で公開されている。

Wrist pathologies, particularly fractures common among children and adolescents, present a critical diagnostic challenge. While X-ray imaging remains a prevalent diagnostic tool, the increasing misinterpretation rates highlight the need for more accurate analysis, especially considering the lack of specialized training among many surgeons and physicians. Recent advancements in deep convolutional neural networks offer promise in automating pathology detection in trauma X-rays. However, distinguishing subtle variations between pediatric wrist pathologies in X-rays remains challenging. Traditional manual annotation, though effective, is laborious, costly, and requires specialized expertise. In this paper, we address the challenge of pediatric wrist pathology recognition with a fine-grained approach, aimed at automatically identifying discriminative regions in X-rays without manual intervention. We refine our fine-grained architecture through ablation analysis and the integration of LION. Leveraging Grad-CAM, an explainable AI technique, we highlight these regions. Despite using limited data, reflective of real-world medical study constraints, our method consistently outperforms state-of-the-art image recognition models on both augmented and original (challenging) test sets. Our proposed refined architecture achieves an increase in accuracy of 1.06% and 1.25% compared to the baseline method, resulting in accuracies of 86% and 84%, respectively. Moreover, our approach demonstrates the highest fracture sensitivity of 97%, highlighting its potential to enhance wrist pathology recognition. The implementation code can be found at https://github.com/ammarlodhi255/fine-grained-approach-to-wrist-pathology-recognition

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# IQA-EVAL:人間モデル対話型質問応答の自動評価

IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering ( http://arxiv.org/abs/2408.13545v1 )

ライセンス: Link先を確認

Ruosen Li, Barry Wang, Ruochen Li, Xinya Du,

(参考訳) 質問応答(QA)における大規模言語モデル(LLM)の評価では、従来の手法は通常、与えられた質問と文脈に基づいてモデルが生成した即時の応答を直接評価することに重点を置いている。人間が情報を探す際にAIアシスタントの助けを求めるという一般的なユースケースにおいて、こうした非対話的な評価は人間とモデルの会話の動的な性質を考慮しておらず、対話を考慮した評価では、正確なQAモデルが人間に好まれることが示されている(Lee et al., 2023)。ヒューマンコンピュータインタラクション(HCI)の最近の研究は、人間の評価者を用いて対話と評価を行っているが、スケールさせるには費用と時間がかかりすぎることが多い。本研究では、対話型質問応答(IQA)評価のための自動評価フレームワークIQA-EVALを導入する。具体的には、(1) 人間の振る舞いをシミュレートしてIQAモデルとの対話を生成し、(2) 生成した対話を自動的に評価できる、LLMベースの評価エージェント(LEA)を導入する。さらに、実際の人間の評価者集団をより良くシミュレートするために、LEAにペルソナを割り当てることを提案する。我々は、(1) GPT-4(またはClaude)をバックボーンモデルとする評価フレームワークがIQAタスクにおいて人間の評価と高い相関を達成すること、(2) 群衆をより良く表現するためにLEAにペルソナを割り当てることで相関がさらに大きく向上することを示す。最後に、我々の自動評価指標を用いて、複雑で曖昧な質問応答タスクからの1000を超える質問で、最近の代表的な5つのLLMを評価する。これは人間が評価した場合、5千ドルという相当なコストがかかる。

To evaluate Large Language Models (LLMs) for question answering (QA), traditional methods typically focus on directly assessing the immediate responses generated by the models based on the given question and context. In the common use case of humans seeking an AI assistant's help in finding information, these non-interactive evaluations do not account for the dynamic nature of human-model conversations, and interaction-aware evaluations have shown that accurate QA models are preferred by humans (Lee et al., 2023). Recent works in human-computer interaction (HCI) have employed human evaluators to conduct interactions and evaluations, but they are often prohibitively expensive and time-consuming to scale. In this work, we introduce an automatic evaluation framework, IQA-EVAL, for Interactive Question Answering Evaluation. More specifically, we introduce an LLM-based Evaluation Agent (LEA) that can: (1) simulate human behaviors to generate interactions with IQA models; (2) automatically evaluate the generated interactions. Moreover, we propose assigning personas to LEAs to better simulate groups of real human evaluators. We show that: (1) our evaluation framework with GPT-4 (or Claude) as the backbone model achieves a high correlation with human evaluations on the IQA task; (2) assigning personas to LEAs to better represent the crowd further significantly improves correlations. Finally, we use our automatic metric to evaluate five recent representative LLMs with over 1000 questions from complex and ambiguous question answering tasks, which comes with a substantial cost of $5k if evaluated by humans.

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# 二重ダイナミクスを伴う車両ネットワークのための機械の共感覚(SoM)強化ISACプリコーディング

Synesthesia of Machines (SoM)-Enhanced ISAC Precoding for Vehicular Networks with Double Dynamics ( http://arxiv.org/abs/2408.13546v1 )

ライセンス: Link先を確認

Zonghui Yang, Shijian Gao, Xiang Cheng, Liuqing Yang,

(参考訳) 統合センシング・通信(ISAC)技術は車両ネットワークにおいて重要な役割を担っている。しかし、この文脈における通信チャネルは時変特性を示し、潜在的なターゲットは高速に移動しうるため、二重ダイナミクスが生じる。これらは、まだ十分に検討されていないリアルタイムISACプリコーディング設計に大きな課題をもたらす。最適化に基づくプリコーディング手法は広く研究されているが、計算が複雑であり、二重ダイナミクスの状況ではほとんど得られない完全な事前情報に大きく依存している。本稿では、機械の共感覚(SoM)強化プリコーディングパラダイムを提案する。このパラダイムでは、基地局が位置情報やチャネル情報などの様々なモダリティを活用して二重ダイナミクスに適応し、深層強化学習フレームワークを通じて環境情報を効果的に利用してISACの性能限界を広げる。さらに、複雑な状態空間と行動空間での学習を高速化するために、パラメータ共有型のアクタークリティックアーキテクチャを設計する。広範な実験的検証により、提案手法が既存手法に対して多面的に優れていることを示した。

Integrated sensing and communication (ISAC) technology plays a crucial role in vehicular networks. However, the communication channel within this context exhibits time-varying characteristics, and potential targets may move rapidly, resulting in double dynamics. These present significant challenges for real-time ISAC precoding design that have not been thoroughly explored. While optimization-based precoding methods have been extensively studied, they are computationally complex and heavily rely on perfect prior information that is rarely available in situations with double dynamics. In this paper, we propose a synesthesia of machine (SoM)-enhanced precoding paradigm, where the base station leverages various modalities such as positioning and channel information to adapt to double dynamics, and effectively utilizes environmental information to stretch ISAC performance boundaries through a deep reinforcement learning framework. Additionally, a parameter-shared actor-critic architecture is tailored to expedite training in complex state and action spaces. Extensive experimental validation has demonstrated the multifaceted superiority of our method over existing approaches.

翻訳日:2024-08-27 19:09:24 公開日:2024-08-24

# トランスモン量子ビットの制御振幅における系統誤差に対するロバスト最適制御

Robust optimal control for a systematic error in the control amplitude of transmon qubits ( http://arxiv.org/abs/2408.13554v1 )

ライセンス: Link先を確認

Max Cykiert, Eran Ginossar,

(参考訳) ノイズのある中規模量子コンピューティングの時代においても、誤り訂正回路においても、物理量子ビットのコヒーレンス時間と高忠実度ゲートは量子コンピュータの機能に不可欠である。本稿では、最適化により設計したパルスを用いることで、トランスモン量子ビットの制御振幅誤差に起因する忠実度の低下を打ち消せることを、理論的かつ実験的に実証する。ロバスト最適制御により得られる制御ランドスケープを解析し、それが誤差範囲に依存すること、すなわち、解が準最適解の引き込み領域に捕捉されうることを見出す。異なる誤差値に対してロバストな制御を求め、有限の緩和率によるインコヒーレントな忠実度低下機構と比較する。これらの制御はIBMQの量子ビット上でテストされ、$\sim 10\%$という大きな誤差に対する耐性を示す。

In the era of Noisy Intermediate-Scale Quantum computing as well as in error correcting circuits, physical qubit coherence times and high fidelity gates are essential to the functioning of quantum computers. In this paper, we demonstrate theoretically and experimentally that pulses designed by optimization can be used to counteract the loss of fidelity due to a control amplitude error of the transmon qubit. We analyze the control landscape obtained by robust optimal control and find it to depend on the error range, namely the solutions can get trapped in the basin of attraction of sub-optimal solutions. Robust controls are found for different error values and are compared to an incoherent loss of fidelity mechanism due to a finite relaxation rate. The controls are tested on the IBMQ's qubit and found to demonstrate resilience against significant $\sim 10\%$ errors.

翻訳日:2024-08-27 18:59:33 公開日:2024-08-24

# もしも? サプライチェーンリスクマネジメントにおける因果機械学習

What if? Causal Machine Learning in Supply Chain Risk Management ( http://arxiv.org/abs/2408.13556v1 )

ライセンス: Link先を確認

Mateusz Wyrembek, George Baryannis, Alexandra Brintrup,

(参考訳) サプライチェーン管理において機械学習モデルを開発するための最終目標は、最適な介入を行うことである。しかし、ほとんどの機械学習モデルは因果関係を推測するのではなく、データの相関関係を識別するので、より優れた結果を体系的に計画することは困難である。本稿では,サプライチェーンのリスク介入モデル開発における因果機械学習の利用を提案するとともに,海洋工学分野におけるサプライチェーンのリスク管理のケーススタディでその利用を実証する。我々の研究は、因果機械学習が、異なるサプライチェーンの介入の下で達成できる変化を識別することで意思決定プロセスを強化することを強調し、シナリオ計画の"What-if"を可能にした。そこで我々は、リスク予測のためのさまざまな機械学習開発経路を提案し、リスク最小化のための介入を計画し、サプライチェーン研究者が因果機械学習を探求するための重要なステップを概説する。

The penultimate goal for developing machine learning models in supply chain management is to make optimal interventions. However, most machine learning models identify correlations in data rather than inferring causation, making it difficult to systematically plan for better outcomes. In this article, we propose and evaluate the use of causal machine learning for developing supply chain risk intervention models, and demonstrate its use with a case study in supply chain risk management in the maritime engineering sector. Our findings highlight that causal machine learning enhances decision-making processes by identifying changes that can be achieved under different supply chain interventions, allowing "what-if" scenario planning. We therefore propose different machine learning developmental pathways for predicting risk and planning for interventions to minimise risk, and outline key steps for supply chain researchers to explore causal machine learning.

翻訳日:2024-08-27 18:59:33 公開日:2024-08-24

# 異常検出のための変分オートエンコーダの比較検討

Variational Autoencoder for Anomaly Detection: A Comparative Study ( http://arxiv.org/abs/2408.13561v1 )

ライセンス: Link先を確認

Huy Hoang Nguyen, Cuong Nhat Nguyen, Xuan Tung Dao, Quoc Trung Duong, Dzung Pham Thi Kim, Minh-Tan Pham,

(参考訳) 本論文は、異常検出に用いられる最近の変分オートエンコーダ(VAE)アーキテクチャを比較分析し、このタスクにおける性能と挙動の特性を明らかにすることを目的とする。検討対象のアーキテクチャ構成は、オリジナルのVAEベースライン、ガウスランダム場を事前分布とするVAE(VAE-GRF)、ビジョントランスフォーマーを組み込んだVAE(ViT-VAE)である。その結果、ViT-VAEは様々なシナリオで優れた性能を示す一方、VAE-GRFは最適な性能に達するためにより複雑なハイパーパラメータチューニングを必要とする可能性があることがわかった。さらに、広く使われているMVTecデータセットから得られる結果への過度な依存を緩和するため、最近公開されたMiADデータセットをベンチマークに用いる。この意図的な追加は、MVTec専用に調整されたドメイン固有モデルの影響を軽減して結果の競争力を高め、より堅牢な評価フレームワークに寄与することを目的としている。コードは https://github.com/endtheme123/VAE-compare.git で入手できる。

This paper aims to conduct a comparative analysis of contemporary Variational Autoencoder (VAE) architectures employed in anomaly detection, elucidating their performance and behavioral characteristics within this specific task. The architectural configurations under consideration encompass the original VAE baseline, the VAE with a Gaussian Random Field prior (VAE-GRF), and the VAE incorporating a vision transformer (ViT-VAE). The findings reveal that ViT-VAE exhibits exemplary performance across various scenarios, whereas VAE-GRF may necessitate more intricate hyperparameter tuning to attain its optimal performance state. Additionally, to mitigate the propensity for over-reliance on results derived from the widely used MVTec dataset, this paper leverages the recently-public MiAD dataset for benchmarking. This deliberate inclusion seeks to enhance result competitiveness by alleviating the impact of domain-specific models tailored exclusively for MVTec, thereby contributing to a more robust evaluation framework. Code is available at https://github.com/endtheme123/VAE-compare.git.

翻訳日:2024-08-27 18:59:33 公開日:2024-08-24

# マルチエージェント強化学習におけるマルチタスク一般化のためのハイブリッドトレーニング

Hybrid Training for Enhanced Multi-task Generalization in Multi-agent Reinforcement Learning ( http://arxiv.org/abs/2408.13567v1 )

ライセンス: Link先を確認

Mingliang Zhang, Sichang Su, Chengyang He, Guillaume Sartoretti,

(参考訳) マルチエージェント強化学習(MARL)では、多様なエージェントや目的に対するマルチタスク一般化の達成が大きな課題となっている。既存のオンラインMARLアルゴリズムは主にシングルタスクの性能に重点を置いており、マルチタスク一般化能力の欠如は、通常、大きな計算の無駄と実世界での適用性の制限をもたらす。一方、既存のオフラインマルチタスクMARLアプローチはデータ品質に大きく依存しており、未知のタスクでの性能が低くなることが多い。本稿では、マルチタスク一般化と学習効率の両方を確保するためにオンライン学習とオフライン学習を統合した新しいハイブリッドMARLフレームワークHyGen(Hybrid Training for Enhanced Multi-Task Generalization)を紹介する。具体的には、我々のフレームワークはオフラインのマルチタスクデータセットから汎用的なスキルの候補を抽出する。次に、中央集権型学習・分散実行(CTDE)パラダイムの下で、最適なスキルを選択する方策を学習する。この段階では、オフラインデータとオンラインインタラクションの両方を統合したリプレイバッファを利用する。我々のフレームワークが汎用スキルを効果的に抽出・洗練し、未知のタスクに対して優れた一般化をもたらすことを実証的に示す。StarCraftマルチエージェントチャレンジでの比較分析により、HyGenが既存のオンラインのみ、オフラインのみの幅広い手法を上回ることを示す。

In multi-agent reinforcement learning (MARL), achieving multi-task generalization to diverse agents and objectives presents significant challenges. Existing online MARL algorithms primarily focus on single-task performance, but their lack of multi-task generalization capabilities typically results in substantial computational waste and limited real-life applicability. Meanwhile, existing offline multi-task MARL approaches are heavily dependent on data quality, often resulting in poor performance on unseen tasks. In this paper, we introduce HyGen, a novel hybrid MARL framework, Hybrid Training for Enhanced Multi-Task Generalization, which integrates online and offline learning to ensure both multi-task generalization and training efficiency. Specifically, our framework extracts potential general skills from offline multi-task datasets. We then train policies to select the optimal skills under the centralized training and decentralized execution paradigm (CTDE). During this stage, we utilize a replay buffer that integrates both offline data and online interactions. We empirically demonstrate that our framework effectively extracts and refines general skills, yielding impressive generalization to unseen tasks. Comparative analyses on the StarCraft multi-agent challenge show that HyGen outperforms a wide range of existing solely online and offline methods.

翻訳日:2024-08-27 18:59:33 公開日:2024-08-24

# 集団強結合のための量子化埋め込みアプローチ -- ab initioおよびマクロQEDとポラリトニクスの単純モデルを接続する

Quantized Embedding Approaches for Collective Strong Coupling -- Connecting ab initio and macroscopic QED to Simple Models in Polaritonics ( http://arxiv.org/abs/2408.13570v1 )

ライセンス: Link先を確認

Frieder Lindel, Dominik Lentrodt, Stefan Yoshi Buhmann, Christian Schäfer,

(参考訳) 集団的な光-物質相互作用は化学やエネルギー移動の制御に用いられてきたが、明示的なシミュレーションの計算コストが急速に増大するため、ab initioの方法論と大規模な多体量子光学系を組み合わせた利用しやすいアプローチは存在しない。本稿では、多体量子光学系に対する利用しやすいab initio量子埋め込みの概念を導入する。これにより、分子構造に対するab initio量子化学の厳密さを保ちながら、分子多体系の集団結合をマクロなQEDの精神で効果的に扱うことができる。我々のアプローチはポラリトニック場の量子ゆらぎを完全に含みながら、動的平均場理論のような複雑な埋め込みアプローチよりもはるかに単純で直感的である。Tavis-Cummingsモデルとの比較により、基礎となる仮定を説明する。量子化埋め込みアプローチの直感的な適用とその透明な限界は、現実的な分子アンサンブルにおける集団効果を記述するための実用的な枠組みを、ab initioポラリトニック化学の分野に提供する。

Collective light-matter interactions have been used to control chemistry and energy transfer, yet accessible approaches that combine ab initio methodology with large many-body quantum optical systems are missing due to the fast increase in computational cost for explicit simulations. We introduce an accessible ab initio quantum embedding concept for many-body quantum optical systems that allows to treat the collective coupling of molecular many-body systems effectively in the spirit of macroscopic QED while keeping the rigor of ab initio quantum chemistry for the molecular structure. Our approach fully includes the quantum fluctuations of the polaritonic field and yet remains much simpler and more intuitive than complex embedding approaches such as dynamical mean-field theory. We illustrate the underlying assumptions by comparison to the Tavis--Cummings model. The intuitive application of the quantized embedding approach and its transparent limitations offer a practical framework for the field of ab initio polaritonic chemistry to describe collective effects in realistic molecular ensembles.

翻訳日:2024-08-27 18:59:33 公開日:2024-08-24

# PointDGMamba: 一般化状態空間モデルによるポイントクラウド分類のドメイン一般化

PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model ( http://arxiv.org/abs/2408.13574v1 )

ライセンス: Link先を確認

Hao Yang, Qianyu Zhou, Haijia Sun, Xiangtai Li, Fengqi Liu, Xuequan Lu, Lizhuang Ma, Shuicheng Yan,

(参考訳) ドメイン一般化(DG)は、点群分類(PCC)モデルの未知ドメインへの一般化性を改善するために近年研究されている。しかし、既存手法は畳み込みニューラルネットワークやビジョントランスフォーマーを用いるため、限られた受容野や二次の計算量に悩まされることが多い。本稿では、DG PCCにおける状態空間モデル(SSM)の一般化性を研究する初めての研究を提示し、SSMをDG PCCに直接適用するといくつかの課題に直面することを見出す。すなわち、点群に固有のトポロジーが破壊されやすく、シリアライズ段階でノイズの蓄積を招く。加えて、ドメインに依存しない特徴学習とデータスキャンに関する設計の欠如により、予期しないドメイン固有情報が3D系列データに持ち込まれる。そこで本研究では、未知ドメインに対する強い一般化性に優れ、大域的な受容野と効率的な線形計算量という利点を持つ新しいフレームワークPointDGMambaを提案する。PointDGMambaは、Masked Sequence Denoising(MSD)、Sequence-wise Cross-domain Feature Aggregation(SCFA)、Dual-level Domain Scanning(DDS)という3つの革新的なコンポーネントで構成される。特に、MSDは点群系列中のノイズを含む点トークンを選択的にマスクし、SCFAはドメインは異なるが同一クラスの点群特徴を導入して、より一般化された特徴の抽出方法をモデルに学習させる。DDSは、特徴間の情報交換を促進するドメイン内スキャンとクロスドメインスキャンを含む。さらに、マルチドメイン一般化のための新しくより困難なベンチマークPointDG-3to1を提案する。広範な実験により、提案するPointDGMambaの有効性と最先端の性能を実証する。

Domain Generalization (DG) has been recently explored to improve the generalizability of point cloud classification (PCC) models toward unseen domains. However, they often suffer from limited receptive fields or quadratic complexity due to the use of convolution neural networks or vision Transformers. In this paper, we present the first work that studies the generalizability of state space models (SSMs) in DG PCC and find that directly applying SSMs into DG PCC will encounter several challenges: the inherent topology of the point cloud tends to be disrupted and leads to noise accumulation during the serialization stage. Besides, the lack of designs in domain-agnostic feature learning and data scanning will introduce unanticipated domain-specific information into the 3D sequence data. To this end, we propose a novel framework, PointDGMamba, that excels in strong generalizability toward unseen domains and has the advantages of global receptive fields and efficient linear complexity. PointDGMamba consists of three innovative components: Masked Sequence Denoising (MSD), Sequence-wise Cross-domain Feature Aggregation (SCFA), and Dual-level Domain Scanning (DDS). In particular, MSD selectively masks out the noised point tokens of the point cloud sequences, SCFA introduces cross-domain but same-class point cloud features to encourage the model to learn how to extract more generalized features. DDS includes intra-domain scanning and cross-domain scanning to facilitate information exchange between features. In addition, we propose a new and more challenging benchmark PointDG-3to1 for multi-domain generalization. Extensive experiments demonstrate the effectiveness and state-of-the-art performance of our presented PointDGMamba.

翻訳日:2024-08-27 18:59:33 公開日:2024-08-24

# Visual Foundation Modelsは長期的ポイントトラッキングを実現することができるか?

Can Visual Foundation Models Achieve Long-term Point Tracking? ( http://arxiv.org/abs/2408.13575v1 )

ライセンス: Link先を確認

Görkay Aydemir, Weidi Xie, Fatma Güney,

(参考訳) 大規模な視覚基盤モデルは様々なタスクで顕著な成功を収めており、その堅牢な一般化能力を裏付けている。2視点間の対応における能力は調べられてきたが、複雑な環境での長期的な対応における有効性は未解明のままである。これに対処するため、点追跡の文脈で視覚基盤モデルの幾何学的認識を、(i) 学習を一切行わないゼロショット設定、(ii) 低容量の層によるプロービング、(iii) 低ランク適応(LoRA)によるファインチューニング、の3つの設定で評価する。その結果、Stable DiffusionとDINOv2の特徴は、ゼロショット設定において優れた幾何学的対応能力を示すことがわかった。さらに、DINOv2は適応設定において教師ありモデルに匹敵する性能を達成し、対応学習のための強力な初期化としての可能性を示している。

Large-scale vision foundation models have demonstrated remarkable success across various tasks, underscoring their robust generalization capabilities. While their proficiency in two-view correspondence has been explored, their effectiveness in long-term correspondence within complex environments remains unexplored. To address this, we evaluate the geometric awareness of visual foundation models in the context of point tracking: (i) in zero-shot settings, without any training; (ii) by probing with low-capacity layers; (iii) by fine-tuning with Low Rank Adaptation (LoRA). Our findings indicate that features from Stable Diffusion and DINOv2 exhibit superior geometric correspondence abilities in zero-shot settings. Furthermore, DINOv2 achieves performance comparable to supervised models in adaptation settings, demonstrating its potential as a strong initialization for correspondence learning.
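For readers unfamiliar with setting (iii), the sketch below shows the core idea of Low Rank Adaptation: the pretrained weight W stays frozen and only a low-rank update (alpha / r) * B A is trained. It is a minimal pure-Python illustration with hypothetical toy matrices and helper names, not the authors' training code or any library's API.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the rank (number of rows of A)."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_parameters(d_out, d_in, rank):
    """Number of entries in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical frozen 4x3 weight with a rank-1 adapter.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
b = [[0.0], [0.0], [0.0], [0.0]]  # B starts at zero, so the adapted weight equals W
a = [[0.5, -0.5, 1.0]]
initial = lora_effective_weight(w, b, a, alpha=1.0)

b[3][0] = 2.0  # pretend one training step changed a single entry of B
adapted = lora_effective_weight(w, b, a, alpha=1.0)

print(initial == w)                    # the zero-initialised adapter leaves W unchanged
print(adapted[3])                      # only the last row receives a rank-1 update
print(trainable_parameters(4, 3, 1))   # 7 trainable values instead of 12
```

The saving grows with layer size: under the same counting, a 768 x 768 layer with rank 8 trains 12,288 values instead of 589,824.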

翻訳日:2024-08-27 18:59:33 公開日:2024-08-24

# Floquet系のStředa公式: Cesàro総和による位相不変量と量子化異常

The Středa Formula for Floquet Systems: Topological Invariants and Quantized Anomalies from Cesàro Summation ( http://arxiv.org/abs/2408.13576v1 )

ライセンス: Link先を確認

Lucila Peralta Gavensky, Gonzalo Usaj, Nathan Goldman,

(参考訳) 本研究は、2次元Floquet系の位相不変量を扱いやすい応答関数で表す一般的な理論的枠組みを導入する。周期駆動系のSambe表現に基づき、静的系のStředa公式に着想を得て、磁気摂動に対する非有界なFloquet状態密度の流れを評価する。このFloquet-Středa応答は、先験的には数学的に不適切に定義されているが、Cesàro総和法によって正則化される。このアプローチの重要な成果として、関連するすべてのFloquet巻き付き数をFloquet-Blochハミルトニアンの単純なバンド特性に関係づける。これらの一般的な関係は、Floquet系の位相的特徴づけが、駆動系のストロボスコピックな時間発展から完全に導出できることを示している。重要な点として、Floquet-Středa応答に対する物理的に区別可能な2つの寄与を特定する。すなわち、系のエッジとバルクの間の量子化された電荷の流れと、系と駆動場の間の「異常な」量子化されたエネルギーの流れであり、これは異常エッジ状態の物理的起源に新たな知見を与える。副産物として、本理論は以下を提供する。Floquet巻き付き数とFloquet-Blochバンドの軌道磁化との一般的な関係。不均一な試料におけるFloquetトポロジーへのアクセスを可能にする、Floquet巻き付き数の局所マーカー。設計された浴の存在下での密度測定からFloquet巻き付き数を抽出する実験プロトコル。そして、Floquet状態密度の磁気応答によるこれらの位相不変量の一般的な表現であり、これは相互作用するFloquet系の位相的特徴づけへの道を開く。

This work introduces a general theoretical framework, which expresses the topological invariants of two-dimensional Floquet systems in terms of tractable response functions: Building on the Sambe representation of periodically-driven systems, and inspired by the Středa formula for static systems, we evaluate the flow of the unbounded Floquet density of states in response to a magnetic perturbation. This Floquet-Středa response, which is a priori mathematically ill-defined, is regularized by means of a Cesàro summation method. As a key outcome of this approach, we relate all relevant Floquet winding numbers to simple band properties of the Floquet-Bloch Hamiltonian. These general relations indicate how the topological characterization of Floquet systems can be entirely deduced from the stroboscopic time-evolution of the driven system. Importantly, we identify two physically distinguishable contributions to the Floquet-Středa response: a quantized flow of charge between the edge and the bulk of the system, and an 'anomalous' quantized flow of energy between the system and the driving field, which provides new insight on the physical origin of the anomalous edge states. As byproducts, our theory provides: a general relation between Floquet winding numbers and the orbital magnetization of Floquet-Bloch bands; a local marker for Floquet winding numbers, which allows to access Floquet topology in inhomogeneous samples; an experimental protocol to extract these Floquet winding numbers from density-measurements in the presence of an engineered bath; as well as general expressions for these topological invariants in terms of the magnetic response of the Floquet density of states, opening a route for the topological characterization of interacting Floquet systems.
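The regularization step mentioned above relies on Cesàro summation, which assigns a value to an oscillating or otherwise non-convergent series by averaging its partial sums. The sketch below illustrates the idea on Grandi's series, a toy example chosen here for illustration; it is not the Floquet density of states treated in the paper, and the function name is hypothetical.

```python
from itertools import accumulate


def cesaro_means(terms):
    """Return the running Cesàro means, i.e. the averages of the first n partial sums."""
    partial_sums = list(accumulate(terms))
    running_totals = accumulate(partial_sums)
    return [total / n for n, total in enumerate(running_totals, start=1)]


# Grandi's series 1 - 1 + 1 - 1 + ... has partial sums 1, 0, 1, 0, ... that never settle.
grandi = [(-1) ** k for k in range(1000)]
means = cesaro_means(grandi)

print(means[:4])   # the averaged partial sums already hover around 1/2
print(means[-1])   # after 1000 terms the Cesàro mean is 0.5
```

The ordinary partial sums keep jumping between 1 and 0, while their averages converge to 1/2, which is the sense in which a formally ill-defined sum can still be given a definite value.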

翻訳日:2024-08-27 18:59:33 公開日:2024-08-24

# 外部磁場中における量子荷電粒子の複雑度

Complexity of Quantum Charged Particle in External Magnetic Field ( http://arxiv.org/abs/2408.13577v1 )

ライセンス: Link先を確認

M. Radomirov,

(参考訳) 本稿では、外部磁場中における量子荷電粒子の回路複雑性について検討する。Nielsenのアプローチを用いて、熱場二重状態の複雑性を時間、温度、サイクロトロン周波数の関数として求める。様々なパラメータ値に対して複雑性とその振動の振幅の両方を解析し、これらの結果が調和振動子の場合の極限としては導出できないことを明らかにする。最後に、複雑性の変化率を計算し、それがLloyd限界に従うことを示す。

In this paper, we investigate the circuit complexity of a quantum charged particle in an external magnetic field. Utilizing the Nielsen approach, we determine the complexity of thermofield double states as functions of time, temperature, and cyclotron frequency. We analyze both the complexity and the amplitude of its oscillations across various parameter values, and reveal that these results cannot be derived as a limit of the harmonic oscillator case. Finally, we calculate the rate of complexity and show that it obeys the Lloyd bound.

翻訳日:2024-08-27 18:59:33 公開日:2024-08-24

# CSS-Segment: LSVOS Challenge VOS Track 第2位報告

CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track ( http://arxiv.org/abs/2408.13582v1 )

ライセンス: Link先を確認

Jinming Chai, Qin Ma, Junpei Zhang, Licheng Jiao, Fang Liu,

(参考訳) ビデオオブジェクトセグメンテーションは、ビデオ編集や自動運転など多くの下流アプリケーションの基礎となる難しいタスクである。本技術報告では、ECCV 2024の第6回LSVOS Challenge VOS Trackにおける、我々のチーム「yuanjie」のビデオオブジェクトセグメンテーション向けソリューションを簡潔に紹介する。提案するCSS-Segmentは、複雑な物体運動や長期的な提示を伴うビデオにおいて、より優れた性能を発揮すると考えている。本報告では、ビデオオブジェクトセグメンテーションにおけるCSS-Segmentの有効性を検証した。最終的に、本手法はテストフェーズでJ&Fスコア80.84を達成し、ECCV 2024の第6回LSVOS Challenge VOS Trackで2位にランクインした。

Video object segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving. In this technical report, we briefly introduce the solution of our team "yuanjie" for video object segmentation in the 6-th LSVOS Challenge VOS Track at ECCV 2024. We believe that our proposed CSS-Segment will perform better in videos of complex object motion and long-term presentation. In this report, we successfully validated the effectiveness of the CSS-Segment in video object segmentation. Finally, our method achieved a J&F score of 80.84 in and test phases, and ultimately ranked 2nd in the 6-th LSVOS Challenge VOS Track at ECCV 2024.

Translated: 2024-08-27 18:59:33; published: 2024-08-24

# FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation ( http://arxiv.org/abs/2408.13585v1 )

License: see the linked page

Garrett Tanzer

Sign language translation has historically been peripheral to mainstream machine translation research. To help the fields converge, we introduce FLEURS-ASL, an extension of the multiway parallel benchmarks FLORES (for text) and FLEURS (for speech) to support their first sign language (as video), American Sign Language, translated by 5 Certified Deaf Interpreters. FLEURS-ASL can be used to evaluate a variety of tasks -- primarily sentence- and discourse-level translation -- between ASL and 200 other languages as text, or 102 languages as speech. We provide baselines for tasks from ASL to English text using a unified modeling approach that incorporates timestamp tokens and previous text tokens in a 34-second context window, trained on random video clips from YouTube-ASL. This model meets or exceeds the performance of phrase-level baselines while supporting a multitude of new tasks. We also use FLEURS-ASL to show that multimodal frontier models have virtually no understanding of ASL, underscoring the importance of including sign languages in standard evaluation suites.

Translated: 2024-08-27 18:59:33; published: 2024-08-24

# Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation ( http://arxiv.org/abs/2408.13586v1 )

License: see the linked page

Yuxuan Zhou, Margret Keuper, Mario Fritz

Sampling-based decoding strategies have been widely adopted for Large Language Models (LLMs) in numerous applications. They aim for a balance between diversity and quality via temperature tuning and tail truncation (e.g., top-k and top-p sampling). Considering the high dynamic range of the candidate next tokens given different prefixes, recent studies propose to adaptively truncate the tail of the LLM's predicted distribution. Although improved results have been reported with these methods on open-ended text generation tasks, the results are highly dependent on the curated truncation parameters and exemplar text. In this paper, we propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step, based on our collected prefix tree, which preserves the context of a full sentence. Our work provides a comprehensive comparison between existing truncation sampling methods, as well as their recommended parameters as a guideline for users.
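
The fixed truncation baselines named in this abstract (temperature, top-k, top-p) can be sketched as follows. This is a minimal illustration of the standard definitions, not the adaptive methods or the evaluation procedure the paper proposes; the function names and the toy distribution are assumptions made for the example.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero the rest, renormalize."""
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[::-1][:k]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()


def top_p_filter(probs, p):
    """Keep the smallest top-ranked set whose cumulative mass reaches p."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    # First rank at which the cumulative mass is >= p (capped for rounding).
    cutoff = min(int(np.searchsorted(cumulative, p)) + 1, len(probs))
    keep = order[:cutoff]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()


def sample_token(probs, rng):
    """Draw one token index from a (truncated) distribution."""
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    toy = np.array([0.5, 0.3, 0.15, 0.05])  # hypothetical next-token distribution
    rng = np.random.default_rng(0)
    print(top_k_filter(toy, 2), top_p_filter(toy, 0.7))
    print(sample_token(top_p_filter(toy, 0.7), rng))
```

A larger `k` or `p` keeps more of the tail, which raises diversity but also the chance of sampling an unsuitable token; that is the trade-off the abstract refers to.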

Translated: 2024-08-27 18:59:33; published: 2024-08-24

# Explainable Convolutional Networks for Crater Detection and Lunar Landing Navigation ( http://arxiv.org/abs/2408.13587v1 )

License: see the linked page

Jianing Song, Nabil Aouf, Duarte Rondao, Christophe Honvault, Luis Mansilla

Lunar landing has drawn great interest in lunar exploration in recent years, and autonomous lunar landing navigation is fundamental to this task. AI is expected to play a critical role in autonomous and intelligent space missions, yet human experts question the reliability of AI solutions. Thus, explainable AI (XAI) for vision-based lunar landing is studied in this paper, with the aim of providing transparent and understandable predictions for intelligent lunar landing. Attention-based Darknet53 is proposed as the feature extraction structure. For the crater detection and navigation tasks, attention-based YOLOv3 and attention-Darknet53-LSTM are presented, respectively. The experimental results show that the proposed networks provide competitive performance on relative crater detection and pose estimation during the lunar landing. The explainability of the networks is achieved by introducing an attention mechanism into the network during model building. Moreover, the PCC is used to quantitatively evaluate the explainability of the proposed networks, with the findings showing the functions of various convolutional layers in the network.

Translated: 2024-08-27 18:59:33; published: 2024-08-24

# Loss-based Bayesian Sequential Prediction of Value at Risk with a Long-Memory and Non-linear Realized Volatility Model ( http://arxiv.org/abs/2408.13588v1 )

License: see the linked page

Rangika Peiris, Minh-Ngoc Tran, Chao Wang, Richard Gerlach

A long-memory and non-linear realized volatility model class is proposed for direct Value at Risk (VaR) forecasting. This model, referred to as RNN-HAR, extends the heterogeneous autoregressive (HAR) model, a framework known for efficiently capturing long memory in realized measures, by integrating a Recurrent Neural Network (RNN) to handle non-linear dynamics. Loss-based generalized Bayesian inference with Sequential Monte Carlo is employed for model estimation and sequential prediction in RNN-HAR. The empirical analysis is conducted using daily closing prices and realized measures from 2000 to 2022 across 31 market indices. The proposed model's one-step-ahead VaR forecasting performance is compared against a basic HAR model and its extensions. The results demonstrate that the proposed RNN-HAR model consistently outperforms all other models considered in the study.
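
The basic HAR baseline mentioned in this abstract can be sketched as a linear regression of next-day realized volatility on its daily value and its weekly and monthly averages. The 5-day and 22-day windows are the conventional choice and are an assumption here, as are the function names; the RNN component and the loss-based Bayesian estimation of RNN-HAR are not reproduced.

```python
import numpy as np


def har_features(rv):
    """Build HAR regressors from a realized-volatility series.

    Row t holds [1, RV_t, mean of the last 5 days, mean of the last 22 days];
    the target is RV_{t+1}.
    """
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for t in range(21, len(rv) - 1):
        daily = rv[t]
        weekly = rv[t - 4:t + 1].mean()    # days t-4 .. t
        monthly = rv[t - 21:t + 1].mean()  # days t-21 .. t
        rows.append([1.0, daily, weekly, monthly])
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)


def fit_har(rv):
    """Ordinary least squares estimate of the four HAR coefficients."""
    X, y = har_features(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rv = np.abs(rng.standard_normal(500)) + 0.5  # synthetic stand-in, not market data
    print(fit_har(rv))
```

The averages over increasing horizons are what let this simple linear model mimic long memory; a non-linear extension would replace or augment the linear map from these regressors to the forecast.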

Translated: 2024-08-27 18:59:33; published: 2024-08-24

# Engineering biphoton spectral wavefunction in a silicon micro-ring resonator with split resonances ( http://arxiv.org/abs/2408.13590v1 )

License: see the linked page

Liao Ye, Haoran Ma, Xiaoqing Guo, Fanjie Ruan, Yuehai Wang, Jianyi Yang

Frequency-time is a degree of freedom suitable for photonic high-dimensional entanglement, with advantages such as compatibility with single-mode devices and insensitivity to dispersion. Engineered control of the frequency-time amplitude of a photon's electric field has been demonstrated on platforms with second-order optical nonlinearity. For integrated photonic platforms with only third-order optical nonlinearity, the engineered generation of the state remains unexplored. Here, we demonstrate a cavity-enhanced photon-pair source on the silicon-on-insulator (SOI) platform that can generate both separable states and controllable entangled states in the frequency domain without post-manipulation. By choosing different resonance combinations and employing on-chip optical field differentiation, we achieve independent control over two functions that affect the joint spectral intensity (JSI) of the state. A semi-analytical model is derived to simulate the biphoton spectral wavefunction in the presence of resonance splitting and pump differentiation, and its parameters can be fully determined through fitting-based parameter extraction from the resonator's measured linear response. The measured spectral purity for the separable state is $95.5\pm 1.2\%$, while the measured JSIs for the entangled states show two- or four-peaked functions in two-dimensional frequency space. The experiments and simulations demonstrate the capacity to manipulate the frequency-domain wavefunction in a silicon-based device, which is promising for applications such as quantum information processing using pulsed temporal-mode encoding or long-distance quantum key distribution.
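
A common way to turn a measured JSI into a spectral purity estimate is a Schmidt (singular value) decomposition of the joint spectral amplitude. The sketch below assumes a flat spectral phase, so the amplitude is taken as the square root of the JSI; that assumption, the function names, and the synthetic Gaussian spectra are illustrative and are not taken from the paper.

```python
import numpy as np


def spectral_purity(jsi):
    """Estimate purity from a JSI sampled on a signal-by-idler frequency grid."""
    jsi = np.asarray(jsi, dtype=float)
    jsa = np.sqrt(jsi)  # assumption: flat phase, so |JSA| carries everything
    singular_values = np.linalg.svd(jsa, compute_uv=False)
    schmidt = singular_values**2 / np.sum(singular_values**2)
    return float(np.sum(schmidt**2))  # 1.0 means a fully separable state


def gaussian(x, center, width):
    """Unnormalized Gaussian line shape used to build toy spectra."""
    return np.exp(-((x - center) ** 2) / (2 * width**2))


if __name__ == "__main__":
    f = np.linspace(-10, 10, 201)
    g1, g2 = gaussian(f, -4, 0.5), gaussian(f, 4, 0.5)
    separable = np.outer(g1, g1)                    # single peak
    two_peak = np.outer(g1, g1) + np.outer(g2, g2)  # two correlated peaks
    print(spectral_purity(separable), spectral_purity(two_peak))
```

A single factorable peak gives a purity close to 1, while two equally weighted, well-separated correlated peaks give a purity close to 0.5, which is the qualitative difference between the separable and two-peaked states described in the abstract.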

Translated: 2024-08-27 18:59:33; published: 2024-08-24

# Optimal Kernel Quantile Learning with Random Features ( http://arxiv.org/abs/2408.13591v1 )

License: see the linked page

Caixing Wang, Xingdong Feng

The random feature (RF) approach is a well-established and efficient tool for scalable kernel methods, but existing literature has primarily focused on kernel ridge regression with random features (KRR-RF), which has limitations in handling heterogeneous data with heavy-tailed noise. This paper presents a generalization study of kernel quantile regression with random features (KQR-RF), which accounts for the non-smoothness of the check loss in KQR-RF by introducing a refined error decomposition and establishing a novel connection between KQR-RF and KRR-RF. Our study establishes the capacity-dependent learning rates for KQR-RF under mild conditions on the number of RFs, which are minimax optimal up to some logarithmic factors. Importantly, our theoretical results, utilizing a data-dependent sampling strategy, can be extended to cover the agnostic setting where the target quantile function may not precisely align with the assumed kernel space. By slightly modifying our assumptions, the capacity-dependent error analysis can also be applied to cases with Lipschitz continuous losses, enabling broader applications in the machine learning community. To validate our theoretical findings, simulated experiments and a real data application are conducted.
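
The ingredients of KQR-RF can be sketched with random Fourier features for a Gaussian kernel and the check (pinball) loss, fitted here by plain subgradient descent with a ridge penalty. The optimizer, the hyperparameters, the function names, and the uniform (data-independent) feature sampling are all assumptions chosen for brevity; the paper's estimator and its data-dependent sampling strategy may differ.

```python
import numpy as np


def make_rff(dim, n_features, sigma, rng):
    """Sample frequencies and phases for a Gaussian kernel of bandwidth sigma."""
    W = rng.normal(scale=1.0 / sigma, size=(n_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b


def rff(X, W, b):
    """Map inputs so that z(x) @ z(y) approximates exp(-|x - y|^2 / (2 sigma^2))."""
    return np.sqrt(2.0 / len(b)) * np.cos(X @ W.T + b)


def check_loss(residual, tau):
    """Check loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    residual = np.asarray(residual, dtype=float)
    return residual * (tau - (residual < 0))


def fit_kqr_rf(Z, y, tau, lam=1e-3, lr=0.1, steps=2000):
    """Subgradient descent on the mean check loss plus (lam / 2) * |w|^2."""
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        residual = y - Z @ w
        # Subgradient of the check loss with respect to w, averaged over samples.
        grad = -(Z.T @ (tau - (residual < 0))) / len(y) + lam * w
        w -= lr * grad
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)  # synthetic data
    W, b = make_rff(dim=1, n_features=100, sigma=1.0, rng=rng)
    Z = rff(X, W, b)
    w = fit_kqr_rf(Z, y, tau=0.5)
    print(check_loss(y - Z @ w, 0.5).mean())
```

Because the check loss is piecewise linear rather than smooth, there is no closed-form solution of the kind ridge regression has, which is why an iterative solver is used in this sketch.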