Fugu-MT: Translations of arXiv Papers

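This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.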

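The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from the use of this site (including all information and translations).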

Papers listed with a publication date of 2024-09-22.

Title	Authors	Abstract	Publication date / translation date
# Zero Knowledge Games ( http://arxiv.org/abs/2009.13521v7 ) License: check the linked page	Ian Malloy	In this paper we model a game such that all strategies are non-revealing, with imperfect recall and incomplete information. We also introduce a modified sliding-block code as a linear transformation which generates common knowledge of how informed a player is under public announcements. Ultimately, we see that between two players or two coalitions, zero-knowledge games where both players are informed have the utility of trust established in the mixed-strategy Nash equilibrium. A zero-knowledge game is one of trust and soundness, placing utility in being informed. Any player who is uninformed reveals that they are uninformed. The "will to verify" may be eroded such that the claimant is never held responsible for their repeated false claims or for being uninformed.	Translated: 2024-11-09 15:57:56 Published: 2024-09-22
# A New Simple Vision Algorithm for Detecting the Enzymic Browning Defects in Golden Delicious Apples ( http://arxiv.org/abs/2110.03574v2 ) License: check the linked page	Hamid Majidi Balanji	In this work, a simple vision algorithm is designed and implemented to extract and identify the surface defects on Golden Delicious apples caused by the enzymic browning process. 34 Golden Delicious apples were selected for the experiments, of which 17 had enzymic browning defects and the other 17 were sound. The image processing part of the proposed vision algorithm extracted the defective surface area of the apples with an accuracy of 97.15%. The area and mean of the segmented images were selected as the 2x1 feature vectors to feed into a designed artificial neural network. The analysis based on the above features indicated that the images with a mean less than 0.0065 did not belong to the defective apples; rather, they were extracted as part of the calyx and stem of the healthy apples. The classification accuracy of the neural network applied in this study was 99.19%.	Translated: 2024-11-09 15:57:56 Published: 2024-09-22
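To make the feature rule in the abstract above concrete, here is a minimal Python sketch; it is not the author's implementation. It assumes the segmented image is a 2D grid of intensities in [0, 1] with background pixels set to 0, that "area" means the count of non-zero pixels, and that "mean" is the average over the whole grid. The function names and both definitions are illustrative assumptions; only the 0.0065 threshold comes from the abstract.

```python
# Hypothetical sketch: the 2x1 feature vector (area, mean) and the mean-threshold rule.
MEAN_THRESHOLD = 0.0065  # value quoted in the abstract


def segment_features(segmented):
    """Return (area, mean) for a 2D grid of intensities in [0, 1].

    Assumption: area = number of non-zero pixels, mean = average over all pixels.
    """
    flat = [p for row in segmented for p in row]
    area = sum(1 for p in flat if p > 0)
    mean = sum(flat) / len(flat)
    return area, mean


def looks_like_calyx_or_stem(segmented):
    """True when the segment's mean falls below the threshold from the abstract."""
    _, mean = segment_features(segmented)
    return mean < MEAN_THRESHOLD
```

In the paper the two features are fed to a neural network; the threshold check here only mirrors the observation that very low means came from the calyx and stem of healthy apples.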
# Privacy-Preserving Logistic Regression Training with A Faster Gradient Variant ( http://arxiv.org/abs/2201.10838v9 ) License: check the linked page	John Chiang	Training logistic regression over encrypted data has been a compelling approach in addressing security concerns for several years. In this paper, we introduce an efficient gradient variant, called the quadratic gradient, for privacy-preserving logistic regression training. We enhance Nesterov's Accelerated Gradient (NAG), Adaptive Gradient Algorithm (Adagrad) and Adam algorithms by incorporating their quadratic gradients and evaluate these improved algorithms on various datasets. Experimental results demonstrate that the enhanced algorithms achieve significantly improved convergence speed compared to traditional first-order gradient methods. Moreover, we applied the enhanced NAG method to implement homomorphic logistic regression training, achieving comparable results within just 4 iterations. There is a good chance that the quadratic gradient approach could integrate first-order gradient descent/ascent algorithms with the second-order Newton-Raphson methods, and that it could be applied to a wide range of numerical optimization problems.	Translated: 2024-11-09 15:46:48 Published: 2024-09-22
# A Coding Framework and Benchmark towards Low-Bitrate Video Understanding ( http://arxiv.org/abs/2202.02813v3 ) License: check the linked page	Yuan Tian, Guo Lu, Yichao Yan, Guangtao Zhai, Li Chen, Zhiyong Gao	Video compression is indispensable to most video analysis systems. Despite saving transportation bandwidth, it also deteriorates downstream video understanding tasks, especially at low-bitrate settings. To systematically investigate this problem, we first thoroughly review the previous methods, revealing that three principles, i.e., task-decoupled, label-free, and data-emerged semantic prior, are critical to a machine-friendly coding framework but are not fully satisfied so far. In this paper, we propose a traditional-neural mixed coding framework that simultaneously fulfills all these principles by taking advantage of both traditional codecs and neural networks (NNs). On one hand, the traditional codecs can efficiently encode the pixel signal of videos but may distort the semantic information. On the other hand, highly non-linear NNs are proficient in condensing video semantics into a compact representation. The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved w.r.t. the coding procedure, which is spontaneously learned from unlabeled data in a self-supervised manner. The videos collaboratively decoded from two streams (codec and NN) are of rich semantics, as well as visually photo-realistic, empirically boosting several mainstream downstream video analysis task performances without any post-adaptation procedure. Furthermore, by introducing the attention mechanism and adaptive modeling scheme, the video semantic modeling ability of our approach is further enhanced. Finally, we build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach. All code, data, and models will be available at https://github.com/tianyuan168326/VCS-Pytorch	Translated: 2024-11-09 15:46:48 Published: 2024-09-22
# Counterfactual inference for sequential experiments ( http://arxiv.org/abs/2202.06891v4 ) License: check the linked page	Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah	We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to $\infty$ together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial, HeartSteps.	Translated: 2024-11-09 15:46:48 Published: 2024-09-22
# Discovering stochastic dynamical equations from biological time series data ( http://arxiv.org/abs/2205.02645v6 ) License: check the linked page	Arshed Nabeel, Ashwin Karichannavar, Shuaib Palathingal, Jitesh Jhawar, David B. Brückner, Danny Raj M., Vishwesha Guttal	Theoretical studies have shown that stochasticity can affect the dynamics of ecosystems in counter-intuitive ways. However, without knowing the equations governing the dynamics of populations or ecosystems, it is difficult to ascertain the role of stochasticity in real datasets. Therefore, the inverse problem of inferring the governing stochastic equations from datasets is important. Here, we present an equation discovery methodology that takes time series data of state variables as input and outputs a stochastic differential equation. We achieve this by combining traditional approaches from stochastic calculus with equation-discovery techniques. We demonstrate the generality of the method via several applications. First, we deliberately choose various stochastic models with fundamentally different governing equations that nevertheless produce nearly identical steady-state distributions. We show that we can recover the correct underlying equations, and thus infer the structure of their stability, accurately from the analysis of time series data alone. We demonstrate our method on two real-world datasets -- fish schooling and single-cell migration -- which have vastly different spatiotemporal scales and dynamics. We illustrate various limitations and potential pitfalls of the method and how to overcome them via diagnostic measures. Finally, we provide our open-source code via a package named PyDaDDy (Python library for Data Driven Dynamics).	Translated: 2024-11-09 15:46:48 Published: 2024-09-22
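The idea of reading a stochastic differential equation off a time series can be sketched with conditional moments of the increments. The Python snippet below is a minimal illustration and not the PyDaDDy API: it assumes a scalar series sampled at a fixed step `dt`, a drift that is linear in the state, and a constant diffusion, and every function name is hypothetical. It simulates an Ornstein-Uhlenbeck process with Euler-Maruyama and then fits the drift by least squares on the increment rate and the squared diffusion by the mean squared increment.

```python
import math
import random


def simulate_ou(theta, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama simulation of dx = -theta * x dt + sigma dW, starting at 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x.append(x[-1] - theta * x[-1] * dt + sigma * dw)
    return x


def estimate_linear_sde(x, dt):
    """Fit drift(x) ~ intercept + slope * x and a constant squared diffusion.

    Drift: ordinary least squares of (x[t+1] - x[t]) / dt on x[t].
    Squared diffusion: mean of (x[t+1] - x[t])**2 / dt (first-order in dt).
    """
    xs = x[:-1]
    dx = [b - a for a, b in zip(x[:-1], x[1:])]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_rate = sum(dx) / (n * dt)
    cov = sum((xi - mean_x) * (di / dt - mean_rate) for xi, di in zip(xs, dx)) / n
    var = sum((xi - mean_x) ** 2 for xi in xs) / n
    slope = cov / var
    intercept = mean_rate - slope * mean_x
    diffusion_sq = sum(di * di for di in dx) / (n * dt)
    return intercept, slope, diffusion_sq


series = simulate_ou(theta=1.0, sigma=0.5, dt=0.01, n_steps=200_000)
intercept, slope, diffusion_sq = estimate_linear_sde(series, dt=0.01)
```

With these settings the fitted slope should land near -theta and `diffusion_sq` near sigma squared, up to sampling noise. The method described in the abstract goes further by searching over candidate function libraries instead of fixing a linear form.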
# Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels ( http://arxiv.org/abs/2209.13476v6 ) License: check the linked page	Chenyu You, Weicheng Dai, Fenglin Liu, Yifei Min, Nicha C. Dvornek, Xiaoxiao Li, David A. Clifton, Lawrence Staib, James S. Duncan	Recent studies on contrastive learning have achieved remarkable performance solely by leveraging few labels in the context of medical image segmentation. Existing methods mainly focus on instance discrimination and invariant mapping. However, they face three common pitfalls: (1) tailness: medical image data usually follows an implicit long-tail class distribution, so blindly leveraging all pixels in training can lead to data imbalance issues and cause deteriorated performance; (2) consistency: it remains unclear whether a segmentation model has learned meaningful and yet consistent anatomical features due to the intra-class variations between different anatomical features; and (3) diversity: the intra-slice correlations within the entire dataset have received significantly less attention. This motivates us to seek a principled approach for strategically making use of the dataset itself to discover similar yet distinct samples from different anatomical views. In this paper, we introduce a novel semi-supervised 2D medical image segmentation framework termed Mine yOur owN Anatomy (MONA), and make three contributions. First, prior work argues that every pixel equally matters to the model training; we observe empirically that this alone is unlikely to define meaningful anatomical features, mainly due to lacking the supervision signal. We show two simple solutions towards learning invariances: the use of stronger data augmentations and nearest neighbors. Second, we construct a set of objectives that encourage the model to be capable of decomposing medical images into a collection of anatomical features in an unsupervised manner. Lastly, we demonstrate, both empirically and theoretically, the efficacy of MONA on three benchmark datasets with different labeled settings, achieving new state-of-the-art under different labeled semi-supervised settings.	Translated: 2024-11-09 15:46:48 Published: 2024-09-22
# FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations ( http://arxiv.org/abs/2209.14399v3 ) License: check the linked page	Marie Siew, Shikhar Sharma, Zekai Li, Kun Guo, Chao Xu, Tania Lorido-Botran, Tony Q. S. Quek, Carlee Joe-Wong	In edge computing, users' service profiles are migrated due to user mobility. Reinforcement learning (RL) frameworks have been proposed to do so, often trained on simulated data. However, existing RL frameworks overlook occasional server failures, which, although rare, impact latency-sensitive applications like autonomous driving and real-time obstacle detection. Nevertheless, these failures (rare events), being not adequately represented in historical training data, pose a challenge for data-driven RL algorithms. As it is impractical to adjust failure frequency in real-world applications for training, we introduce FIRE, a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. FIRE considers delay, migration, failure, and backup placement costs across individual and shared service profiles. We prove ImRE's boundedness and convergence to optimality. Next, we introduce novel deep Q-learning (ImDQL) and actor-critic (ImACRE) versions of our algorithm to enhance scalability. We extend our framework to accommodate users with varying risk tolerances. Through trace-driven experiments, we show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.	Translated: 2024-11-09 15:35:37 Published: 2024-09-22
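The role of importance sampling for rare failures can be illustrated with a one-step cost estimate. The Python sketch below is a generic importance-sampling estimator and not the ImRE algorithm from the paper: the failure probability, the inflated sampling probability `q_fail`, the two cost values, and the function name are all hypothetical. Failures are drawn far more often than they truly occur, and each sample is reweighted by the ratio of true to sampling probability so the estimate stays unbiased.

```python
import random


def importance_sampled_cost(p_fail, q_fail, cost_fail, cost_ok, n, seed=0):
    """Estimate E[cost] under true failure probability p_fail.

    Samples are drawn with failure probability q_fail instead, and each one is
    weighted by (true probability) / (sampling probability) of its outcome.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        if rng.random() < q_fail:
            total += (p_fail / q_fail) * cost_fail
        else:
            total += ((1 - p_fail) / (1 - q_fail)) * cost_ok
    return total / n


# Hypothetical numbers: failures are rare (0.1%) but expensive.
estimate = importance_sampled_cost(
    p_fail=0.001, q_fail=0.2, cost_fail=100.0, cost_ok=1.0, n=100_000
)
exact = 0.001 * 100.0 + 0.999 * 1.0
```

Naive sampling at the true rate would see only about a hundred failures in this many draws, so its estimate is much noisier. Choosing the sampling rate in proportion to each event's contribution to the expected cost reduces the variance further, which is the intuition behind sampling rare events according to their impact on the value function.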
# Superpixel perception graph neural network for intelligent defect detection of aero-engine blade ( http://arxiv.org/abs/2210.07539v2 ) License: check the linked page	Hongbing Shang, Qixiu Yang, Chuang Sun, Xuefeng Chen, Ruqiang Yan	The aero-engine is the core component of aircraft and other spacecraft. The high-speed rotating blades provide power by sucking in air and fully combusting, and various defects will inevitably occur, threatening the operation safety of the aero-engine. Therefore, regular inspections are essential for such a complex system. However, the existing traditional technology, borescope inspection, is labor-intensive, time-consuming, and experience-dependent. To endow this technology with intelligence, a novel superpixel perception graph neural network (SPGNN) is proposed by utilizing a multi-stage graph convolutional network (MSGCN) for feature extraction and a superpixel perception region proposal network (SPRPN) for region proposal. First, to capture complex and irregular textures, the images are transformed into a series of patches to obtain their graph representations. Then, the MSGCN, composed of several GCN blocks, extracts graph structure features and performs graph information processing at graph level. Last but not least, the SPRPN is proposed to generate perceptual bounding boxes by fusing graph representation features and superpixel perception features. Therefore, the proposed SPGNN always implements feature extraction and information transmission at the graph level in the whole SPGNN pipeline, to alleviate the reduction of receptive field and information loss. To verify the effectiveness of SPGNN, we construct a simulated blade dataset with 3000 images. A public aluminum dataset is also used to validate the performances of different methods. The experimental results demonstrate that the proposed SPGNN has superior performance compared with the state-of-the-art methods.	Translated: 2024-11-09 15:35:37 Published: 2024-09-22
# Learning New Tasks from a Few Examples with Soft-Label Prototypes ( http://arxiv.org/abs/2210.17437v4 ) License: check the linked page	Avyav Kumar Singh, Ekaterina Shutova, Helen Yannakoudakis	Existing approaches to few-shot learning in NLP rely on large language models (LLMs) and/or fine-tuning of these to generalise on out-of-distribution data. In this work, we propose a novel few-shot learning approach based on soft-label prototypes (SLPs) designed to collectively capture the distribution of different classes across the input domain space. We focus on learning previously unseen NLP tasks from very few examples (4, 8, 16) per class and experimentally demonstrate that our approach achieves superior performance on the majority of tested tasks in this data-lean setting while being highly parameter efficient. We also show that our few-shot adaptation method can be integrated into more generalised learning settings, primarily meta-learning, to yield superior performance against strong baselines.	Translated: 2024-11-09 15:35:37 Published: 2024-09-22
# An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture ( http://arxiv.org/abs/2211.06860v2 ) License: check the linked page	C G Krishnanunni, Tan Bui-Thanh	This work presents a two-stage adaptive framework for progressively developing deep neural network (DNN) architectures that generalize well for a given training data set. In the first stage, a layerwise training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers. We impose desirable structures on the DNN by employing manifold regularization, sparsity regularization, and physics-informed terms. We introduce an epsilon-delta stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields an epsilon-delta stability-promoting algorithm. Further, we also derive the necessary conditions for the trainability of a newly added layer and investigate the training saturation problem. In the second stage of the algorithm (post-processing), a sequence of shallow networks is employed to extract information from the residual produced in the first stage, thereby improving the prediction accuracy. Numerical investigations on prototype regression and classification problems demonstrate that the proposed approach can outperform fully connected DNNs of the same size. Moreover, by equipping the physics-informed neural network (PINN) with the proposed adaptive architecture strategy to solve partial differential equations, we numerically show that adaptive PINNs not only are superior to standard PINNs but also produce interpretable hidden layers with provable stability. We also apply our architecture design strategy to solve inverse problems governed by elliptic partial differential equations.	Translated: 2024-11-09 15:35:37 Published: 2024-09-22
# Z-SSMNet: Zonal-aware Self-supervised Mesh Network for Prostate Cancer Detection and Diagnosis with Bi-parametric MRI ( http://arxiv.org/abs/2212.05808v2 ) License: check the linked page	Yuan Yuan, Euijoon Ahn, Dagan Feng, Mohamad Khadra, Jinman Kim	Bi-parametric magnetic resonance imaging (bpMRI) has become a pivotal modality in the detection and diagnosis of clinically significant prostate cancer (csPCa). Developing AI-based systems to identify csPCa using bpMRI can transform PCa management by improving efficiency and cost-effectiveness. However, current state-of-the-art methods using convolutional neural networks (CNNs) are limited in learning in-plane and three-dimensional spatial information from anisotropic images. Their performances also depend on the availability of large, diverse, and well-annotated bpMRI datasets. We propose a Zonal-aware Self-supervised Mesh Network (Z-SSMNet) that adaptively integrates multi-dimensional (2D/2.5D/3D) convolutions to learn dense intra-slice information and sparse inter-slice information of the anisotropic bpMRI in a balanced manner. A self-supervised learning (SSL) technique is proposed to pre-train our network using large-scale unlabeled data to learn the appearance, texture, and structure semantics of bpMRI. It aims to capture both intra-slice and inter-slice information during the pre-training stage. Furthermore, we constrained our network to focus on the zonal anatomical regions to further improve the detection and diagnosis capability of csPCa. We conducted extensive experiments on the PI-CAI dataset comprising 10000+ multi-center and multi-scanner data. Our Z-SSMNet excelled in both lesion-level detection (AP score of 0.633) and patient-level diagnosis (AUROC score of 0.881), securing the top position in the Open Development Phase of the PI-CAI challenge. It maintained strong performance in the Closed Testing Phase, achieving an AP score of 0.690 and an AUROC score of 0.909 and securing the second-place ranking.	Translated: 2024-11-09 15:35:37 Published: 2024-09-22
# Nested Dirichlet models for unsupervised attack pattern detection in honeypot data ( http://arxiv.org/abs/2301.02505v3 ) License: check the linked page	Francesco Sanna Passino, Anastasia Mantziou, Daniyar Ghani, Philip Thiede, Ross Bevington, Nicholas A. Heard	Cyber-systems are under near-constant threat from intrusion attempts. Attack types vary, but each attempt typically has a specific underlying intent, and the perpetrators are typically groups of individuals with similar objectives. Clustering attacks appearing to share a common intent is very valuable to threat-hunting experts. This article explores Dirichlet distribution topic models for clustering terminal session commands collected from honeypots, which are special network hosts designed to entice malicious attackers. The main practical implications of clustering the sessions are two-fold: finding similar groups of attacks, and identifying outliers. A range of statistical models are considered, adapted to the structures of command-line syntax. In particular, concepts of primary and secondary topics, and then session-level and command-level topics, are introduced into the models to improve interpretability. The proposed methods are further extended in a Bayesian nonparametric fashion to allow unboundedness in the vocabulary size and the number of latent intents. The methods are shown to discover an unusual MIRAI variant which attempts to take over existing cryptocurrency coin-mining infrastructure, not detected by traditional topic-modelling approaches.	Translated: 2024-11-09 15:24:36 Published: 2024-09-22
# Cesno: The Initial Design of a New Programming Language ( http://arxiv.org/abs/2303.15750v4 ) License: check the linked page	Ozelot Vanilla, Jingxiang Yu, Hemn Barzan Abdalla	Programming languages are incredibly versatile, enabling developers to create applications and programs that suit their individual requirements. This article introduces a new language called Cesno, designed from the ground up to offer an advanced, user-friendly, and easy-to-use programming environment. Cesno's syntax is similar to other popular languages, making it simple to learn and work with. It incorporates features from other languages, such as syntactic sugar, a built-in library, support for functional programming, object-oriented programming, dynamic typing, a type system, and a variety of function parameters and restrictions. This article will explore the design of Cesno's grammar, provide a brief overview of how Cesno processes and compiles code, and provide examples of what Cesno's code looks like and how it can aid in development.	Translated: 2024-11-09 15:24:36 Published: 2024-09-22
# Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout ( http://arxiv.org/abs/2304.12458v2 ) License: check the linked page	Carmel Fiscko, Soummya Kar, Bruno Sinopoli	This work studies a multi-agent Markov decision process (MDP) that can undergo agent dropout and the computation of policies for the post-dropout system based on control and sampling of the pre-dropout system. The central planner's objective is to find an optimal policy that maximizes the value of the expected system given a priori knowledge of the agents' dropout probabilities. For MDPs with a certain transition independence and reward separability structure, we assume that removing agents from the system forms a new MDP comprised of the remaining agents with new state and action spaces, transition dynamics that marginalize the removed agents, and rewards that are independent of the removed agents. We first show that under these assumptions, the value of the expected post-dropout system can be represented by a single MDP; this "robust MDP" eliminates the need to evaluate all $2^N$ realizations of the system, where $N$ denotes the number of agents. More significantly, in a model-free context, it is shown that the robust MDP value can be estimated with samples generated by the pre-dropout system, meaning that robust policies can be found before dropout occurs. This fact is used to propose a policy importance sampling (IS) routine that performs policy evaluation for dropout scenarios while controlling the existing system with good pre-dropout policies. The policy IS routine produces value estimates for both the robust MDP and specific post-dropout system realizations and is justified with exponential confidence bounds. Finally, the utility of this approach is verified in simulation, showing how structural properties of agent dropout can help a controller find good post-dropout policies before dropout occurs.	Translated: 2024-11-09 15:13:22 Published: 2024-09-22
# CADGE: グラフ構造化知識集約による文脈認識対話生成 CADGE: Context-Aware Dialogue Generation Enhanced with Graph-Structured Knowledge Aggregation ( http://arxiv.org/abs/2305.06294v4 ) ライセンス: Link先を確認	Hongbo Zhang, Chen Tang, Tyler Loakman, Bohao Yang, Stefan Goetze, Chenghua Lin,	(参考訳) 常識知識は多くの自然言語処理タスクに不可欠である。既存の研究は通常、グラフ知識を従来のグラフニューラルネットワーク(GNN)で取り込むため、テキスト知識とグラフ知識の符号化処理を区画化する逐次的なパイプラインとなる。しかし、この区画化は、これらの2種類の入力知識間の文脈的相互作用を完全に活用するわけではない。本稿では,文脈強化された知識集約機構により関連する知識グラフからグローバルな特徴を効果的に取り込むよう設計された,文脈対応グラフアテンションモデル (Context-aware GAT) を提案する。具体的には、フラットなグラフ知識とテキストデータとを融合させることにより、不均一な特徴を調和させる表現学習に革新的なアプローチを採用する。コンテクスト情報によって補完される連結部分グラフにおけるグラフ知識集約の階層的適用が、コモンセンス駆動対話の生成をどのように促進するかを分析する。実験により,本フレームワークは従来のGNNベース言語モデルよりも性能が優れていることが示された。自動評価と人的評価の両面から,提案モデルのconcept flowベースラインに対する大幅な性能向上が確認できた。 Commonsense knowledge is crucial to many natural language processing tasks. Existing works usually incorporate graph knowledge with conventional graph neural networks (GNNs), resulting in a sequential pipeline that compartmentalizes the encoding processes for textual and graph-based knowledge. This compartmentalization, however, does not fully exploit the contextual interplay between these two types of input knowledge. In this paper, a novel context-aware graph-attention model (Context-aware GAT) is proposed, designed to effectively assimilate global features from relevant knowledge graphs through a context-enhanced knowledge aggregation mechanism. Specifically, the proposed framework employs an innovative approach to representation learning that harmonizes heterogeneous features by amalgamating flattened graph knowledge with text data. The hierarchical application of graph knowledge aggregation within connected subgraphs, complemented by contextual information, to bolster the generation of commonsense-driven dialogues is analyzed. Empirical results demonstrate that our framework outperforms conventional GNN-based language models. Both automated and human evaluations affirm the significant performance enhancements achieved by our proposed model over the concept flow baseline.	
翻訳日:2024-11-09 15:13:22 公開日:2024-09-22
# 脳腫瘍セグメンテーション(BraTS)課題 : 塗布による健康な脳組織の局所的合成 The Brain Tumor Segmentation (BraTS) Challenge: Local Synthesis of Healthy Brain Tissue via Inpainting ( http://arxiv.org/abs/2305.08992v3 ) ライセンス: Link先を確認	Florian Kofler, Felix Meissen, Felix Steinbauer, Robert Graf, Stefan K Ehrlich, Annika Reinke, Eva Oswald, Diana Waldmannstetter, Florian Hoelzl, Izabela Horvath, Oezguen Turgut, Suprosanna Shit, Christina Bukas, Kaiyuan Yang, Johannes C. Paetzold, Ezequiel de da Rosa, Isra Mekki, Shankeeth Vinayahalingam, Hasan Kassem, Juexin Zhang, Ke Chen, Ying Weng, Alicia Durrer, Philippe C. Cattin, Julia Wolleb, M. S. Sadique, M. M. Rahman, W. Farzana, A. Temtam, K. M. Iftekharuddin, Maruf Adewole, Syed Muhammad Anwar, Ujjwal Baid, Anastasia Janas, Anahita Fathi Kazerooni, Dominic LaBella, Hongwei Bran Li, Ahmed W Moawad, Gian-Marco Conte, Keyvan Farahani, James Eddy, Micah Sheller, Sarthak Pati, Alexandros Karagyris, Alejandro Aristizabal, Timothy Bergquist, Verena Chung, Russell Takeshi Shinohara, Farouk Dako, Walter Wiggins, Zachary Reitman, Chunhao Wang, Xinyang Liu, Zhifan Jiang, Elaine Johanson, Zeke Meier, Ariana Familiar, Christos Davatzikos, John Freymann, Justin Kirby, Michel Bilello, Hassan M Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Rivka R Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko, Arash Nazeri, Marc-André Weber, Abhishek Mahajan, Suyash Mohan, John Mongan, Christopher Hess, Soonmee Cha, Javier Villanueva-Meyer, Errol Colak, Priscila Crivellaro, Andras Jakab, Abiodun Fatade, Olubukola Omidiji, Rachel Akinola Lagos, O O Olatunji, Goldey Khanna, John Kirkpatrick, Michelle Alonso-Basanta, Arif Rashid, Miriam Bornhorst, Ali Nabavizadeh, Natasha Lepore, Joshua Palmer, Antonio Porras, Jake Albrecht, Udunna Anazodo, Mariam Aboian, Evan Calabrese, Jeffrey David Rudie, Marius George Linguraru, Juan Eugenio Iglesias, Koen Van Leemput, Spyridon Bakas, Benedikt Wiestler, Ivan Ezhov, Marie Piraud, Bjoern H Menze,	(参考訳) 
脳MR画像の自動解析のための無数のアルゴリズムが、臨床医の意思決定を支援するために利用可能である。脳腫瘍患者の場合、画像取得の時系列は通常、すでに病理的なスキャンから始まる。多くのアルゴリズムは健康な脳を解析し、病変を特徴とする画像の保証を提供しない。例えば、脳解剖学のパーセレーション、組織セグメンテーション、脳抽出のアルゴリズムがある。このジレンマを解決するために,BraTS塗装の課題を紹介する。そこで参加者は、損傷した脳から健康な脳スキャンを合成するための塗装技術を探る。下記の原稿にはタスクの定式化、データセット、提出手順が含まれている。その後、課題の調査結果をまとめるために更新される。この挑戦はASNR-BraTS MICCAIチャレンジの一部として組織されている。 A myriad of algorithms for the automatic analysis of brain MR images is available to support clinicians in their decision-making. For brain tumor patients, the image acquisition time series typically starts with an already pathological scan. This poses problems, as many algorithms are designed to analyze healthy brains and provide no guarantee for images featuring lesions. Examples include, but are not limited to, algorithms for brain anatomy parcellation, tissue segmentation, and brain extraction. To solve this dilemma, we introduce the BraTS inpainting challenge. Here, the participants explore inpainting techniques to synthesize healthy brain scans from lesioned ones. The following manuscript contains the task formulation, dataset, and submission procedure. Later, it will be updated to summarize the findings of the challenge. The challenge is organized as part of the ASNR-BraTS MICCAI challenge.	翻訳日:2024-11-09 15:13:22 公開日:2024-09-22
# フェデレーション学習のための視覚変換器の連続的適応 Continual Adaptation of Vision Transformers for Federated Learning ( http://arxiv.org/abs/2306.09970v2 ) ライセンス: Link先を確認	Shaunak Halbe, James Seale Smith, Junjiao Tian, Zsolt Kira,	(参考訳) 本稿では,サーバがクライアントの集合と通信し,データを共有したり保存したりすることなく,新たな概念を段階的に学習する,継続的フェデレーション学習(Continual Federated Learning; CFL)という重要でありながら研究の進んでいない課題に焦点を当てる。この問題の複雑さは、継続学習とフェデレート学習の両方の観点からの課題によって増している。具体的には、CFLセットアップでトレーニングされたモデルは、クライアント間のデータの異質性によって悪化する破滅的忘却に悩まされる。この問題に対する既存の試みは、クライアントや通信チャネルに大きなオーバーヘッドを課す傾向にあり、あるいは保存されたデータにアクセスする必要があるため、プライバシの観点から実際の使用には適さない。本稿では,保存データへのアクセスを必要とせず,オーバーヘッドコストを最小限に抑えながら,忘却と不均一性に取り組む。本研究では,視覚変換器の文脈でこの問題を考察し,忘却を最小限に抑えながら動的分布に適応するパラメータ効率の良いアプローチを検討する。我々は、プロンプトベースのアプローチ(プロンプトとクラシファイアヘッドのみを通信すればよい)を活用し、サーバにおいてクライアントモデルを統合するための、新しくて軽量な生成と蒸留方式を提案する。我々は、画像分類としてこの問題を定式化し、比較のための強力なベースラインを確立し、CIFAR-100に加えて、ImageNet-RやDomainNetのような難易度の高い大規模データセットで実験を行う。提案手法は,通信コストとクライアントレベルの計算コストを大幅に削減しつつ,既存手法と独自のベースラインを最大7%上回る。コードはhttps://github.com/shaunak27/hepco-fed で公開されている。 In this paper, we focus on the important yet understudied problem of Continual Federated Learning (CFL), where a server communicates with a set of clients to incrementally learn new concepts over time without sharing or storing any data. The complexity of this problem is compounded by challenges from both the Continual and Federated Learning perspectives. Specifically, models trained in a CFL setup suffer from catastrophic forgetting, which is exacerbated by data heterogeneity across clients. Existing attempts at this problem tend to impose large overheads on clients and communication channels or require access to stored data, which renders them unsuitable for real-world use due to privacy. In this paper, we attempt to tackle forgetting and heterogeneity while minimizing overhead costs and without requiring access to any stored data. We study this problem in the context of Vision Transformers and explore parameter-efficient approaches to adapt to dynamic distributions while minimizing forgetting.
We achieve this by leveraging a prompting-based approach (such that only prompts and classifier heads have to be communicated) and proposing a novel and lightweight generation and distillation scheme to consolidate client models at the server. We formulate this problem for image classification, establish strong baselines for comparison, and conduct experiments on CIFAR-100 as well as challenging, large-scale datasets like ImageNet-R and DomainNet. Our approach outperforms both existing methods and our own baselines by as much as 7% while significantly reducing communication and client-level computation costs. Code is available at https://github.com/shaunak27/hepco-fed.	翻訳日:2024-11-09 14:51:04 公開日:2024-09-22
# 恒常的ホモロジーランク関数を用いた推論の安定性 Stability for Inference with Persistent Homology Rank Functions ( http://arxiv.org/abs/2307.02904v2 ) ライセンス: Link先を確認	Qiquan Wang, Inés García-Redondo, Pierre Faugère, Gregory Henselman-Petrusek, Anthea Monod,	(参考訳) 永続ホモロジーバーコードとダイアグラムは、点雲、ネットワーク、関数など、幅広い複雑なデータ構造の「形」を捉えたトポロジ的データ解析の基盤である。しかし、その複雑な幾何学的構造のため、統計的な設定での使用は困難である。本稿では,統計と機械学習のツールとして,バーコードと永続化図に数学的に等価な永続的ホモロジーランク関数を再検討する。ランク関数は、関数であり、関数の形でデータに適合する統計の領域である、機能データ分析(FDA)の統計理論の直接的な適用を可能にする。しかし、実際にバーコードに対して提示される重要な課題は、安定性の欠如である。データの忠実な表現としての使用を検証する上で重要な特性であり、したがって実行可能な要約統計量である。本稿では,FDA 統合のための適切な基準の下で,永続的ホモロジーランク関数に対する2つの安定性結果を導出することにより,このギャップを埋める。次に、機能的推論統計学および機械学習におけるランク関数の性能を、単パラメータおよび多パラメータの永続的ホモロジーの両方において、実データアプリケーション上で研究する。階数関数によって捕捉される永続的ホモロジーの使用は、既存の非永続的アプローチよりも明らかな改善をもたらす。 Persistent homology barcodes and diagrams are a cornerstone of topological data analysis that capture the "shape" of a wide range of complex data structures, such as point clouds, networks, and functions. However, their use in statistical settings is challenging due to their complex geometric structure. In this paper, we revisit the persistent homology rank function, which is mathematically equivalent to a barcode and persistence diagram, as a tool for statistics and machine learning. Rank functions, being functions, enable the direct application of the statistical theory of functional data analysis (FDA)-a domain of statistics adapted for data in the form of functions. A key challenge they present over barcodes in practice, however, is their lack of stability-a property that is crucial to validate their use as a faithful representation of the data and therefore a viable summary statistic. In this paper, we fill this gap by deriving two stability results for persistent homology rank functions under a suitable metric for FDA integration. 
We then study the performance of rank functions in functional inferential statistics and machine learning on real data applications, in both single and multiparameter persistent homology. We find that the use of persistent homology captured by rank functions offers a clear improvement over existing non-persistence-based approaches.	翻訳日:2024-11-09 14:51:04 公開日:2024-09-22
# コードLLMのための高リソースから低リソースプログラミング言語への知識伝達 Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs ( http://arxiv.org/abs/2308.09895v6 ) ライセンス: Link先を確認	Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha,	(参考訳) ここ数年、Large Language Models of Code (Code LLMs) はプログラミングの実践に大きな影響を与え始めています。プログラミング言語やソフトウェア工学の研究のためのビルディングブロックとして、コードLLMが登場している。しかし、Code LLMはトレーニングデータ(例えば、Java、Python、JavaScript)でよく表現されているが、トレーニングデータに制限のある低リソースの言語では苦労しているプログラミング言語に対して印象的な結果をもたらす。低リソース言語にはOCaml、Racket、その他いくつかのものがある。本稿では,半合成データを用いた低リソース言語上でのコードLLMの性能向上に有効な手法を提案する。我々のアプローチであるMultiPL-Tは、ハイソース言語からのトレーニングデータを、以下の方法で低リソース言語のトレーニングデータに変換する。 1) Code LLMを使用して、高ソース言語からのコメント付きコードのテストの合成を行い、欠陥のあるテストとテストカバレッジの低いコードをフィルタリングします。 2) コードLLMを使用してPythonコードをターゲットとする低リソース言語に翻訳し,テストを使用して翻訳を検証する。このアプローチを適用して,Julia,Lua,OCaml,R,Racketの各トレーニング項目を数万個生成する。さらに、オープンなトレーニングデータ(The Stack)を備えたオープンモデル(StarCoderBase)を使用することで、ベンチマークの削除や、ライセンスに違反することなくモデルをトレーニングし、それ以外の方法では不可能な実験を実行することが可能になります。 MultiPL-T 生成データを用いて,Julia,Lua,OCaml,R,Racket 用の StarCoderBase と Code Llama の微調整版を提示する。確立されたベンチマーク(MultiPL-E)では、これらのモデルは他のオープンコードLLMよりも優れている。 MultiPL-Tアプローチは、新しい言語に簡単に適用でき、トレーニングのような代替手段よりもはるかに効率的で効果的である。 Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available. Low resource languages include OCaml, Racket, and several others. This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. 
Our approach, MultiPL-T, translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize tests for commented code from a high-resource language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate Python code to a target low-resource language, and use tests to validate the translation. We apply this approach to generate tens of thousands of validated training items for Julia, Lua, OCaml, R, and Racket. Furthermore, we use an open model (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket. On established benchmarks (MultiPL-E), these models outperform other open Code LLMs. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.	翻訳日:2024-11-09 14:40:04 公開日:2024-09-22
# 量子擬似ランダムスクランブラ Quantum Pseudorandom Scramblers ( http://arxiv.org/abs/2309.08941v2 ) ライセンス: Link先を確認	Chuhan Lu, Minglong Qin, Fang Song, Penghui Yao, Mingnan Zhao,	(参考訳) 量子擬似ランダム状態発生器(PRSG)は近年、エキサイティングな発展を促している。固定初期(例えば全ゼロ)状態のPRSGは、Haarランダム状態と計算的に区別できない出力状態を生成する。しかし、出力状態の擬似ランダム性は他の初期状態では保証されない。実際、既知のPRSG構造はいくつかの初期状態で失敗することが証明できる。本研究では、任意の初期状態上で擬似ランダム状態を生成する量子擬似ランダム状態スクランブラ(PRSS)を提案し、構築する。情報理論的な設定では、任意の初期状態を全変動距離においてHaarランダムに近い量子状態の分布にマッピングするスクランブラを得る。その結果,我々のスクランブラは分散特性を示す。大まかに言えば、状態空間の$\epsilon$-netにまたがることができる。標準PRSGは、平均出力状態がHaarランダム状態を近似していれば状態空間の小さな領域のみに集中しうるため、これは標準PRSGが誘導できるものを大幅に強化する。我々のPRSS構造は有名なKacの歩行を並列に拡張したものであり、標準のKacの歩行よりも指数関数的に高速に混合することを示す。これは我々の証明の核となる。 PRSSの応用についてもいくつか述べる。 我々のPRSSの構成は耐量子一方向性関数を仮定するが、PRSSはより弱いプリミティブである可能性があり、標準PRSGと同様に、相対化された世界では一方向性関数から分離することができる。 Quantum pseudorandom state generators (PRSGs) have stimulated exciting developments in recent years. A PRSG, on a fixed initial (e.g., all-zero) state, produces an output state that is computationally indistinguishable from a Haar random state. However, pseudorandomness of the output state is not guaranteed on other initial states. In fact, known PRSG constructions provably fail on some initial states. In this work, we propose and construct quantum Pseudorandom State Scramblers (PRSSs), which can produce a pseudorandom state on an arbitrary initial state. In the information-theoretical setting, we obtain a scrambler which maps an arbitrary initial state to a distribution of quantum states that is close to Haar random in total variation distance. As a result, our scrambler exhibits a dispersing property. Loosely, it can span an $\epsilon$-net of the state space. This significantly strengthens what standard PRSGs can induce, as they may only concentrate on a small region of the state space provided that the average output state approximates a Haar random state. Our PRSS construction develops a parallel extension of the famous Kac's walk, and we show that it mixes exponentially faster than the standard Kac's walk.
This constitutes the core of our proof. We also describe a few applications of PRSSs. While our PRSS construction assumes a post-quantum one-way function, PRSSs are potentially a weaker primitive and can be separated from one-way functions in a relativized world similar to standard PRSGs.	翻訳日:2024-11-09 14:28:50 公開日:2024-09-22
# 脊柱側弯症のインテリジェントなスクリーニングと診断 : サーベイ Intelligent Scoliosis Screening and Diagnosis: A Survey ( http://arxiv.org/abs/2310.08756v2 ) ライセンス: Link先を確認	Zhenlin Zhang, Lixin Pu, Ang Li, Jun Zhang, Xianjie Li, Jipeng Fan,	(参考訳) 脊柱側弯症(scoliosis)は3次元の脊椎変形であり、胸郭変形や骨盤傾斜などの異常な形態を呈する可能性がある。重度の患者は神経損傷や尿路異常に悩まされることがある。現在、中国では小学校・中学校の脊柱側弯症患者が500万人を超えており、発症率は約3%から5%で、年々増加している。したがって、脊柱側弯症の研究は重要な臨床的価値を持っている。本稿では,コンピュータ支援による脊柱側弯症のスクリーニングと診断を体系的に紹介し,この分野におけるさまざまなアルゴリズムモデルの利点と限界を分析する。また、この分野での現在の開発ボトルネックについても論じ、今後の開発動向を展望する。 Scoliosis is a three-dimensional spinal deformity, which may lead to abnormal morphologies such as thoracic deformity and pelvic tilt. Patients with severe cases may suffer from nerve damage and urinary abnormalities. At present, the number of scoliosis patients in primary and secondary schools in China has exceeded five million, and the incidence rate of about 3% to 5% is growing every year. The research on scoliosis, therefore, has important clinical value. This paper systematically introduces computer-assisted scoliosis screening and diagnosis and analyzes the advantages and limitations of different algorithm models in this field. Moreover, the paper discusses the current development bottlenecks in this field and looks ahead to future development trends.	翻訳日:2024-11-09 10:01:09 公開日:2024-09-22
# ドメイン横断点雲の分割にSAMを適応させる学習 Learning to Adapt SAM for Segmenting Cross-domain Point Clouds ( http://arxiv.org/abs/2310.08820v4 ) ライセンス: Link先を確認	Xidong Peng, Runnan Chen, Feng Qiao, Lingdong Kong, Youquan Liu, Yujing Sun, Tai Wang, Xinge Zhu, Yuexin Ma,	(参考訳) 3Dセグメンテーションタスクにおける非教師なしドメイン適応(UDA)は、主にポイントクラウドデータの希薄で非秩序な性質から生じる、恐ろしい挑戦である。特にLiDARの点雲では、様々な撮影シーン、変動する気象条件、使用中の様々なLiDARデバイス間でドメインの差が明らかになる。従来のUDA手法では、ソースとターゲットのドメイン間の特徴を整列させることで、このギャップを緩和しようと試みてきたが、ドメインのかなりの変動により、3Dセグメンテーションに適用した場合、このアプローチは不十分である。イメージセグメンテーションの領域において,視覚基盤モデル SAM が示す顕著な一般化能力に着想を得て,SAM 内に埋め込まれた豊富な一般知識を活用し,多様な3次元領域にまたがる特徴表現を統一し,さらに3次元領域適応問題を解く。具体的には,3次元特徴空間とSAMの特徴空間との整合性を大幅に向上させ,シーンレベルとインスタンスレベルの両方で動作する,革新的なハイブリッド機能拡張手法を提案する。提案手法は,広く認識されている多くのデータセットで評価され,最先端の性能を実現する。 Unsupervised domain adaptation (UDA) in 3D segmentation tasks presents a formidable challenge, primarily stemming from the sparse and unordered nature of point cloud data. Especially for LiDAR point clouds, the domain discrepancy becomes obvious across varying capture scenes, fluctuating weather conditions, and the diverse array of LiDAR devices in use. While previous UDA methodologies have often sought to mitigate this gap by aligning features between source and target domains, this approach falls short when applied to 3D segmentation due to the substantial domain variations. Inspired by the remarkable generalization capabilities exhibited by the vision foundation model, SAM, in the realm of image segmentation, our approach leverages the wealth of general knowledge embedded within SAM to unify feature representations across diverse 3D domains and further solves the 3D domain adaptation problem. Specifically, we harness the corresponding images associated with point clouds to facilitate knowledge transfer and propose an innovative hybrid feature augmentation methodology, which significantly enhances the alignment between the 3D feature space and SAM's feature space, operating at both the scene and instance levels. 
Our method is evaluated on many widely-recognized datasets and achieves state-of-the-art performance.	翻訳日:2024-11-09 10:01:09 公開日:2024-09-22
# ソノルミネッセンス:時間依存アナログ系における光子生成 Sonoluminescence: Photon production in time dependent analog system ( http://arxiv.org/abs/2311.03305v2 ) ライセンス: Link先を確認	Rajesh Karmakar, Debaprasad Maity,	(参考訳) ソノルミネッセンス(英: Sonoluminescence)は、適切な環境で振動するガスの泡が周期的に可視域の光を放出する、よく知られた実験室現象である。本稿では,アナログ重力の枠組みにおけるシステムについて考察する。アナログ幾何学の観点から発振気泡をモデル化し,電磁場の最小結合処方則を提案する。この幾何学は、量子真空からのパラメトリック共鳴を通じて、光子の繰り返し束が広い周波数範囲で生成される類似の発振時間依存背景として振る舞う。数値的な制限のため、$\sim 10^5 ~\mbox{m}^{-1}$まで到達することができた。しかし、観測周波数範囲を$\sim 10^7 ~\mbox{m}^{-1}$とする多項式形式に数値的に適合する。現在の分析は、アナログ背景におけるパラメトリック共鳴が、量子場理論の枠組みにおいてそのような現象を説明する上で、基本的な役割を担っていることを示唆している。 Sonoluminescence is a well known laboratory phenomenon where an oscillating gas bubble in the appropriate environment periodically emits a flash of light in the visible frequency range. In this submission, we study the system in the framework of analog gravity. We model the oscillating bubble in terms of analog geometry and propose a non-minimal coupling prescription of the electromagnetic field with the geometry. The geometry behaves as an analogous oscillating time dependent background in which repeated flux of photons are produced in a wide frequency range through parametric resonance from quantum vacuum. Due to our numerical limitation, we could reach the frequency up to $\sim 10^5 ~\mbox{m}^{-1}$. However, we numerically fit the spectrum in a polynomial form including the observed frequency range around $\sim 10^7 ~\mbox{m}^{-1}$. Our current analysis seems to suggest that parametric resonance in analog background may play a fundamental role in explaining such phenomena in the quantum field theory framework.	翻訳日:2024-11-09 09:50:02 公開日:2024-09-22
# 非監督的遠隔生理計測のための自己相似事前蒸留法 Self-similarity Prior Distillation for Unsupervised Remote Physiological Measurement ( http://arxiv.org/abs/2311.05100v2 ) ライセンス: Link先を確認	Xinyu Zhang, Weiyu Sun, Hao Lu, Ying Chen, Yun Ge, Xiaolin Huang, Jie Yuan, Yingcong Chen,	(参考訳) 遠隔光胸腺造影法(remote Photoplethysmography, RPPG)は、心臓活動による血液量の変化による顔画素の微妙な変化を捉えることを目的とした非侵襲的手法である。既存のrPPGタスクの教師なし手法のほとんどは、生理的信号の前の自己相似性を無視しながら、サンプル間の対照的な学習に焦点を当てている。本稿では,心活動の本質的な自己相似性に着目した,教師なしrPPG推定のための自己相似事前蒸留(SSPD)フレームワークを提案する。具体的には、まず、様々な種類のノイズの影響を軽減するために、物理的に適切な組込み拡張手法を導入する。そして、より信頼性の高い自己相似的生理的特徴を抽出するために、自己相似性認識ネットワークを調整する。最後に,顔画像から自己相似的な生理的パターンを引き離すネットワークを支援するために,階層的な自己蒸留パラダイムを開発する。包括的実験により、教師なしのSSPDフレームワークは、最先端の教師付き手法と比較して、同等またはそれ以上のパフォーマンスを達成することが示された。一方、SSPDは、エンドツーエンドモデルの中で最も低い推論時間と計算コストを維持している。 Remote photoplethysmography (rPPG) is a noninvasive technique that aims to capture subtle variations in facial pixels caused by changes in blood volume resulting from cardiac activities. Most existing unsupervised methods for rPPG tasks focus on the contrastive learning between samples while neglecting the inherent self-similar prior in physiological signals. In this paper, we propose a Self-Similarity Prior Distillation (SSPD) framework for unsupervised rPPG estimation, which capitalizes on the intrinsic self-similarity of cardiac activities. Specifically, we first introduce a physical-prior embedded augmentation technique to mitigate the effect of various types of noise. Then, we tailor a self-similarity-aware network to extract more reliable self-similar physiological features. Finally, we develop a hierarchical self-distillation paradigm to assist the network in disentangling self-similar physiological patterns from facial videos. Comprehensive experiments demonstrate that the unsupervised SSPD framework achieves comparable or even superior performance compared to the state-of-the-art supervised methods. Meanwhile, SSPD maintains the lowest inference time and computation cost among end-to-end models.	
翻訳日:2024-11-09 09:50:02 公開日:2024-09-22
# 量子過程の実現のための絡み合いコスト Entanglement cost of realizing quantum processes ( http://arxiv.org/abs/2311.10649v2 ) ライセンス: Link先を確認	Xin Wang, Mingrui Jing, Chengkai Zhu,	(参考訳) 量子絡み合い(quantum entanglement)は、量子コンピューティングやセキュアな通信といった強力な技術を支える粒子間の特異な接続である。しかし、量子状態を作成し、量子プロセスを実装するのに必要な最小の絡み合いを定量化することは重要な課題である。我々は、特定の物理原理を尊重する任意の量子過程を実現するのに必要な絡み合いの量を確実に推定する効率的な計算可能なツールを開発する。我々のツールは、従来の方法の制限を超越して、漸近的状態における幅広い量子状態の準備に必要な絡み合いに適用する。また、量子演算のクラスを考えるために一度消費された絡み合いが、漸近的にさえ完全に回復できないことも確認する。この不可逆な振る舞いは、部分的転置の正当性を完全に保存する量子演算でさえも、フルランクの絡み合った状態と実質的に関連する振幅減衰チャネルに対して明らかである。本稿では,SWAPチャネルの両分極化を実現するための絡み合い条件の推定や,熱相互作用下でのハミルトンシミュレーションの解法などの例を通して,我々のアプローチのパワーを実証する。我々の研究は、一般的な状態と量子力学の絡み合い要求をベンチマークするための実用的なツールキットを提供し、量子技術の性能を評価し、最適化する方法を開拓する。 Quantum entanglement, a peculiar connection between particles, underpins powerful technologies such as quantum computing and secure communication. However, quantifying the minimum entanglement required to prepare quantum states and implement quantum processes remains a significant challenge. We develop an efficiently computable tool that reliably estimates the amount of entanglement needed for realizing arbitrary quantum processes respecting certain physical principles. Our tool applies to the entanglement required to prepare a broad range of quantum states in the asymptotic regime, surpassing previous methods' limitations. We also confirm that entanglement, once consumed to realize the considered class of quantum operations, cannot be fully recovered, even asymptotically. This irreversible behavior is evident for full-rank entangled states and practically relevant amplitude damping channels, even under quantum operations that completely preserve the positivity of partial transpose. We showcase our approach's power through examples such as estimating entanglement requirements for realizing bipartite dephasing SWAP channels and solving Hamiltonian simulations under thermal interaction, highlighting its advantages over existing techniques. 
Our work provides a practical toolkit for benchmarking entanglement requirements for generic states and quantum dynamics, paving the way for assessing and optimizing the performances of quantum technologies.	翻訳日:2024-11-09 09:38:58 公開日:2024-09-22
# イベントカメラデータの高密度事前学習 Event Camera Data Dense Pre-training ( http://arxiv.org/abs/2311.11533v2 ) ライセンス: Link先を確認	Yan Yang, Liyuan Pan, Liu Liu,	(参考訳) 本稿では,イベントカメラデータを用いた高密度予測タスクに適したニューラルネットワークの事前学習を目的とした,自己教師付き学習フレームワークを提案する。当社のアプローチでは,トレーニングにイベントデータのみを活用する。高密度RGB事前学習の成果をイベントカメラデータへ直接転用すると、期待を下回る性能しか得られない。これは、多くのピクセルが情報を含まないイベント画像(イベントデータから変換される)に固有の空間的スパース性に起因する。このスパース性の問題を緩和するために、イベントイメージをイベントパッチ機能にエンコードし、パッチ間のコンテキスト的類似性関係を自動的にマイニングし、パッチ機能を固有のコンテキストにグループ化し、コンテキストとコンテキストの類似性を強制し、識別可能なイベント機能を学ぶ。フレームワークをトレーニングするために、さまざまなシーンと動きパターンを特徴とする合成イベントカメラデータセットをキュレートする。下流の高密度予測タスクにおける伝達学習性能は,最先端手法よりも提案手法が優れていることを示す。 This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data. Our approach utilizes solely event data for training. Transferring achievements from dense RGB pre-training directly to event camera data yields subpar performance. This is attributed to the spatial sparsity inherent in an event image (converted from event data), where many pixels do not contain information. To mitigate this sparsity issue, we encode an event image into event patch features, automatically mine contextual similarity relationships among patches, group the patch features into distinctive contexts, and enforce context-to-context similarities to learn discriminative event features. For training our framework, we curate a synthetic event camera dataset featuring diverse scene and motion patterns. Transfer learning performance on downstream dense prediction tasks illustrates the superiority of our method over state-of-the-art approaches.	翻訳日:2024-11-09 09:38:57 公開日:2024-09-22
# 流通福祉による政策学習 Policy Learning with Distributional Welfare ( http://arxiv.org/abs/2311.15878v3 ) ライセンス: Link先を確認	Yifan Cui, Sukjin Han,	(参考訳) 本稿では,分配福祉を対象とする最適治療配分政策について検討する。治療選択に関する文献の多くは、条件付き平均治療効果(ATE)に基づく実用的福祉を考察している。平均的な福祉は直感的であるが、特に個人が異質な(例えば、外れ値を持つ)場合、望ましくない割り当てをもたらす可能性がある。本研究の動機は,個別処理効果の条件量子化(QoTE)に基づいて治療を割り当てる最適政策を提案することである。量的確率の選択によっては、この基準は慎重または無神経な政策立案者に対応することができる。 QoTEを特定することの課題は、実験データにおいても回復が困難である対実的な結果の共分散に関する知識の要求にある。したがって、不確実性をモデル化する上で堅牢なミニマックスポリシーを導入する。仮定を特定できる範囲は、より情報的なポリシーを生み出すのに利用できる。確率的・決定論的両政策については,提案された政策の実施を後悔することによる漸近的境界を確立する。シミュレーションと2つの経験的応用において、QoTEに基づく最適決定と他の基準に基づく決定を比較した。この枠組みは、福祉が潜在的な成果の共役分布の関数として定義されるあらゆる状況に一般化することができる。 In this paper, we explore optimal treatment allocation policies that target distributional welfare. Most literature on treatment choice has considered utilitarian welfare based on the conditional average treatment effect (ATE). While average welfare is intuitive, it may yield undesirable allocations especially when individuals are heterogeneous (e.g., with outliers) - the very reason individualized treatments were introduced in the first place. This observation motivates us to propose an optimal policy that allocates the treatment based on the conditional quantile of individual treatment effects (QoTE). Depending on the choice of the quantile probability, this criterion can accommodate a policymaker who is either prudent or negligent. The challenge of identifying the QoTE lies in its requirement for knowledge of the joint distribution of the counterfactual outcomes, which is generally hard to recover even with experimental data. Therefore, we introduce minimax policies that are robust to model uncertainty. A range of identifying assumptions can be used to yield more informative policies. For both stochastic and deterministic policies, we establish the asymptotic bound on the regret of implementing the proposed policies. 
In simulations and two empirical applications, we compare optimal decisions based on the QoTE with decisions based on other criteria. The framework can be generalized to any setting where welfare is defined as a functional of the joint distribution of the potential outcomes.	翻訳日:2024-11-09 09:38:57 公開日:2024-09-22
# ネットワーク別Uniswap日次取引指標のデータセット A Dataset of Uniswap daily transaction indices by network ( http://arxiv.org/abs/2312.02660v2 ) ライセンス: Link先を確認	Nir Chemaya, Lin William Cong, Emma Jorgensen, Dingyue Liu, Luyao Zhang,	(参考訳) 分散型金融(DeFi)は、仲介者なしで直接取引を可能にし、豊富なオープンファイナンスデータを作成することによって、従来のファイナンスを再構築している。レイヤ2(L2)ソリューションは、レイヤ1(L1)システムを超えて、DeFiエコシステムのスケーラビリティと効率を高めるために登場しています。しかし、L2ソリューションの影響は、主に経済分析のための包括的トランザクションデータ指標が欠如していることから、いまだ研究が進んでいない。この研究は、L1ネットワークとL2ネットワークの両方にわたる主要な分散型取引所であるUniswapの5000万件以上のトランザクションを分析し、ギャップを埋める。私たちはEthereum、Optimism、Arbitrum、Polygonのブロックチェーンデータから日次インデックスを作成し、DeFiの採用、スケーラビリティ、分散化、富の分散に関する洞察を提供しました。さらに、分散化指標を計算するためのオープンソースのPythonフレームワークを開発し、このデータセットが高度な機械学習研究に非常に役立つようにした。私たちの仕事はデータサイエンティストに貴重なリソースを提供し、インテリジェントなWeb3エコシステムの成長に貢献しています。 Decentralized Finance (DeFi) is reshaping traditional finance by enabling direct transactions without intermediaries, creating a rich source of open financial data. Layer 2 (L2) solutions are emerging to enhance the scalability and efficiency of the DeFi ecosystem, surpassing Layer 1 (L1) systems. However, the impact of L2 solutions is still underexplored, mainly due to the lack of comprehensive transaction data indices for economic analysis. This study bridges that gap by analyzing over 50 million transactions from Uniswap, a major decentralized exchange, across both L1 and L2 networks. We created a set of daily indices from blockchain data on Ethereum, Optimism, Arbitrum, and Polygon, offering insights into DeFi adoption, scalability, decentralization, and wealth distribution. Additionally, we developed an open-source Python framework for calculating decentralization indices, making this dataset highly useful for advanced machine learning research. Our work provides valuable resources for data scientists and contributes to the growth of the intelligent Web3 ecosystem.	翻訳日:2024-11-09 09:27:53 公開日:2024-09-22
# SkyScenes: 航空シーン理解のための合成データセット SkyScenes: A Synthetic Dataset for Aerial Scene Understanding ( http://arxiv.org/abs/2312.06719v2 ) ライセンス: Link先を確認	Sahil Khose, Anisha Pal, Aayushi Agarwal, Deepanshi, Judy Hoffman, Prithvijit Chattopadhyay,	(参考訳) 実世界の航空シーンの理解は、様々な条件の下でキュレーションされた濃密な注釈付き画像を含むデータセットの欠如によって制限される。制御された実世界の環境でこのような画像を得るための固有の課題のため、無人航空機(UAV)の視点から捉えた高密度の注釈付き空中画像の合成データセットSkyScenesを提示する。我々は、CARLAのSkyScenes画像を慎重にキュレートし、レイアウト(都市や農村の地図)、気象条件、日時、ピッチ角、高度を、対応する意味、例、深さアノテーションで包括的に把握する。 1)SkyScenesを用いた実験により,(1)SkyScenesで訓練されたモデルが現実のシナリオに順応し,(2)SkyScenesデータによる実画像のトレーニングが実世界のパフォーマンスを向上させること,(3)SkyScenesの制御されたバリエーションが,視点条件(高さとピッチ),天気と日時の変化にモデルがどう反応するか,(4)付加的なセンサモーダル性(深度)を組み込むことにより,空間の理解が向上すること,などが示されている。私たちのデータセットと関連する生成コードは、https://hoffman-group.github.io/SkyScenes/で公開されています。 Real-world aerial scene understanding is limited by a lack of datasets that contain densely annotated images curated under a diverse set of conditions. Due to inherent challenges in obtaining such images in controlled real-world settings, we present SkyScenes, a synthetic dataset of densely annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives. We carefully curate SkyScenes images from CARLA to comprehensively capture diversity across layouts (urban and rural maps), weather conditions, times of day, pitch angles and altitudes with corresponding semantic, instance and depth annotations. Through our experiments using SkyScenes, we show that (1) models trained on SkyScenes generalize well to different real-world scenarios, (2) augmenting training on real images with SkyScenes data can improve real-world performance, (3) controlled variations in SkyScenes can offer insights into how models respond to changes in viewpoint conditions (height and pitch), weather and time of day, and (4) incorporating additional sensor modalities (depth) can improve aerial scene understanding. 
Our dataset and associated generation code are publicly available at: https://hoffman-group.github.io/SkyScenes/	翻訳日:2024-11-09 09:16:50 公開日:2024-09-22
# オンライン連続手話認識と翻訳に向けて Towards Online Continuous Sign Language Recognition and Translation ( http://arxiv.org/abs/2401.05336v2 ) ライセンス: Link先を確認	Ronglai Zuo, Fangyun Wei, Brian Mak,	(参考訳) ろう者と聴者のコミュニケーションギャップを埋めるためには,連続手話認識(CSLR)の研究が不可欠である。過去の多くの研究では、コネクショニスト時間分類(CTC)の損失を用いてモデルを訓練してきた。推論時、これらのCTCベースのモデルは一般に、予測を行うために入力としてサインビデオ全体を必要とする。これはオフライン認識と呼ばれ、高いレイテンシとかなりのメモリ使用量が問題となる。本研究では,オンラインCSLRに向けた第一歩を踏み出す。私たちのアプローチは3つのフェーズで構成されています。 1)手話辞書の作成 2)辞書上で孤立手話認識モデルを訓練すること、及び 3)入力サインシーケンスにスライディングウインドウアプローチを適用し,各サインクリップを最適化したオンライン認識モデルに供給する。さらに、我々のオンライン認識モデルは、グロス・トゥ・テキスト・ネットワークを統合することで、オンライン翻訳をサポートするように拡張することができ、任意のオフラインモデルの性能を向上させることができる。これらの拡張により、オンラインアプローチは、様々なタスク設定にまたがる3つの人気のあるベンチマークに対して、最先端のパフォーマンスを新たに達成する。コードとモデルは https://github.com/FangyunWei/SLRT で公開されている。 Research on continuous sign language recognition (CSLR) is essential to bridge the communication gap between deaf and hearing individuals. Numerous previous studies have trained their models using the connectionist temporal classification (CTC) loss. During inference, these CTC-based models generally require the entire sign video as input to make predictions, a process known as offline recognition, which suffers from high latency and substantial memory usage. In this work, we take the first step towards online CSLR. Our approach consists of three phases: 1) developing a sign dictionary; 2) training an isolated sign language recognition model on the dictionary; and 3) employing a sliding window approach on the input sign sequence, feeding each sign clip to the optimized model for online recognition. Additionally, our online recognition model can be extended to support online translation by integrating a gloss-to-text network and can enhance the performance of any offline model. With these extensions, our online approach achieves new state-of-the-art performance on three popular benchmarks across various task settings. Code and models are available at https://github.com/FangyunWei/SLRT.	翻訳日:2024-11-09 05:28:28 公開日:2024-09-22
# フォレストプルーニングにおける理論的・実証的進展 Theoretical and Empirical Advances in Forest Pruning ( http://arxiv.org/abs/2401.05535v3 ) ライセンス: Link先を確認	Albert Dorador,	(参考訳) 登場から数十年を経ても、回帰フォレストは最先端の精度を提供し続けており、この点において、回帰木やニューラルネットワークのような代替機械学習モデルよりも優れています。しかし、アンサンブル手法である回帰フォレストは、解釈可能性の点で回帰木に大きく劣る傾向にある。本研究は,回帰フォレストの精度と回帰木の解釈可能性という,両者の長所を併せ持つことを目指すアプローチであるフォレストプルーニングを再考するものである。この試みはランダムフォレスト理論の核心に基礎を置き、経験的研究において大きな成功を収めている。本稿では,これらの経験的知見を裏付け,条件付ける理論的な結果を示す。すなわち,非常に弱い仮定のもとで,未剪定のフォレストに対するLasso剪定フォレストの漸近的優位性を証明し,また,主要な手法により剪定された回帰フォレストに対する高確率の有限サンプル一般化境界を証明し,シミュレーションにより検証する。次に,19の異なるデータセット (合成16、実データ3) 上で, 剪定された回帰フォレストの精度を未剪定のものと比較して検証した。テストされたほとんどのシナリオでは、ごく一部の木だけを使いながら、元の完全なフォレストと(期待値において)同等かそれ以上の精度が得られるフォレストプルーニング手法が少なくとも1つ存在する。場合によってはフォレストのサイズの削減が劇的であり,結果として得られた部分フォレストを1本の木に有意にマージでき,ブラックボックスのままである元の回帰フォレストよりも質的に優れた解釈可能性を得ることができる。 Decades after their inception, regression forests continue to provide state-of-the-art accuracy, outperforming in this respect alternative machine learning models such as regression trees or even neural networks. However, being an ensemble method, the one aspect where regression forests tend to severely underperform regression trees is interpretability. In the present work, we revisit forest pruning, an approach that aims to have the best of both worlds: the accuracy of regression forests and the interpretability of regression trees. This pursuit, whose foundation lies at the core of random forest theory, has seen vast success in empirical studies. In this paper, we contribute theoretical results that support and qualify those empirical findings; namely, we prove the asymptotic advantage of a Lasso-pruned forest over its unpruned counterpart under extremely weak assumptions, as well as high-probability finite-sample generalization bounds for regression forests pruned according to the main methods, which we then validate by way of simulation. Then, we test the accuracy of pruned regression forests against their unpruned counterparts on 19 different datasets (16 synthetic, 3 real). 
We find that in the vast majority of scenarios tested, there is at least one forest-pruning method that yields equal or better accuracy than the original full forest (in expectation), while just using a small fraction of the trees. We show that, in some cases, the reduction in the size of the forest is so dramatic that the resulting sub-forest can be meaningfully merged into a single tree, obtaining a level of interpretability that is qualitatively superior to that of the original regression forest, which remains a black box.	翻訳日:2024-11-09 05:28:28 公開日:2024-09-22
# 幻覚を芽のうちに摘むための知識検証 Knowledge Verification to Nip Hallucination in the Bud ( http://arxiv.org/abs/2401.10768v5 ) ライセンス: Link先を確認	Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi,	(参考訳) 大規模言語モデル(LLM)は、人間のアライメントに続く様々なタスクにおいて例外的な性能を示したが、それでも、もっともらしく聞こえるが事実知識と矛盾する応答を生成する可能性があり、この現象は幻覚として知られている。本稿では、アライメントデータに存在する外部知識と基礎LLM内に埋め込まれた固有の知識との矛盾を検証し、最小化することにより、幻覚を緩和する可能性を示す。具体的には,知識一貫性アライメント(KCA, Knowledge Consistent Alignment)と呼ばれる新しい手法を提案する。この手法は、十分にアライメントされたLLMを用いて外部知識に基づく評価を自動的に定式化し,基礎LLMの知識境界を評価する。アライメントデータにおける知識の不整合に対処するため、KCAはこれらのデータインスタンスを扱うためのいくつかの具体的な戦略を実装している。バックボーンとスケールの異なる基礎LLMを用い, 6つのベンチマークにおいて, 幻覚の低減におけるKCAの優れた効果を実証した。これは、知識の不整合を減らして幻覚を緩和する効果を確認する。私たちのコード、モデルウェイト、データは、 https://github.com/fanqiwan/KCA で公開アクセスできます。 While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination. In this paper, we demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs. Specifically, we propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge to evaluate the knowledge boundaries of foundation LLMs. To address knowledge inconsistencies in the alignment data, KCA implements several specific strategies to deal with these data instances. We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales. This confirms the effectiveness of mitigating hallucinations by reducing knowledge inconsistency. Our code, model weights, and data are openly accessible at https://github.com/fanqiwan/KCA.	翻訳日:2024-11-09 05:17:11 公開日:2024-09-22
# Sum-Product Networks を用いた類似物生成 Generating Likely Counterfactuals Using Sum-Product Networks ( http://arxiv.org/abs/2401.14086v3 ) ライセンス: Link先を確認	Jiri Nemecek, Tomas Pevny, Jakub Marecek,	(参考訳) AIシステムによる決定の説明責任は、最近の規制とユーザ要求の両方によって引き起こされる。これらの決定はしばしば、事実の後に \emph{post hoc} のみを説明することができる。反事実的説明において、最も優れた反事実的説明を構成するものは何であるかを問うことができる。明らかに、"サンプルからの距離"は重要な基準であるが、複数の基準を考慮する必要がある。カウンターファクトの妥当性を考える最近の手法は、この本来の目的を犠牲にしているようだ。本稿では,密接かつ疎密な高次説明を提供するシステムを提案する。そこで本研究では,多くのデシデラタを満足する最も可能性の高い説明の探索を混合整数最適化 (MIO) を用いてモデル化できることを述べる。本プロセスでは,SPN(Sum-Product Network)のMIO定式化を提案し,SPNを用いて,独立利害関係にある可能性のある反事実の可能性を推定する。 Explainability of decisions made by AI systems is driven by both recent regulation and user demand. These decisions are often explainable only \emph{post hoc}, after the fact. In counterfactual explanations, one may ask what constitutes the best counterfactual explanation. Clearly, multiple criteria must be taken into account, although "distance from the sample" is a key criterion. Recent methods that consider the plausibility of a counterfactual seem to sacrifice this original objective. Here, we present a system that provides high-likelihood explanations that are, at the same time, close and sparse. We show that the search for the most likely explanations satisfying many common desiderata for counterfactual explanations can be modeled using mixed-integer optimization (MIO). In the process, we propose an MIO formulation of a Sum-Product Network (SPN) and use the SPN to estimate the likelihood of a counterfactual, which can be of independent interest.	翻訳日:2024-11-09 05:17:11 公開日:2024-09-22
# 光学鋼ロープの非破壊損傷検出法 A new method for optical steel rope non-destructive damage detection ( http://arxiv.org/abs/2402.03843v4 ) ライセンス: Link先を確認	Yunqing Bao, Bin Hu,	(参考訳) 本稿では,高高度(空中ロープウェイ)における鋼ロープの非破壊損傷検出のための新しいアルゴリズムを提案する。まず、RGBD-UNetという名前のセグメンテーションモデルは、複雑な背景から鋼のロープを正確に抽出するように設計されている。このモデルは、提案したCMAモジュールを通して色と深度情報を処理・結合する機能を備えている。第2に、VovNetV3.5と呼ばれる検出モデルは、通常の鋼ロープと異常鋼ロープを区別するために開発された。 VovNetアーキテクチャとDBBモジュールを統合してパフォーマンスを向上させる。また,セグメンテーションモデルの一般化能力を高めるために,新たなバックグラウンド拡張手法を提案する。セグメンテーションと検出モデルの両方のトレーニングとテストのために、異なるシナリオでスチールロープの画像を含むデータセットを作成する。実験はベースラインモデルよりも大幅に改善された。提案したデータセットでは,検出モデルにより達成された最高精度が0.975に達し,セグメンテーションモデルにより達成された最大F値が0.948に達した。 This paper presents a novel algorithm for non-destructive damage detection for steel ropes in high-altitude environments (aerial ropeway). The algorithm comprises two key components: First, a segmentation model named RGBD-UNet is designed to accurately extract steel ropes from complex backgrounds. This model is equipped with the capability to process and combine color and depth information through the proposed CMA module. Second, a detection model named VovNetV3.5 is developed to differentiate between normal and abnormal steel ropes. It integrates the VovNet architecture with a DBB module to enhance performance. Besides, a novel background augmentation method is proposed to enhance the generalization ability of the segmentation model. Datasets containing images of steel ropes in different scenarios are created for the training and testing of both the segmentation and detection models. Experiments demonstrate a significant improvement over baseline models. On the proposed dataset, the highest accuracy achieved by the detection model reached 0.975, and the maximum F-measure achieved by the segmentation model reached 0.948.	翻訳日:2024-11-09 04:54:55 公開日:2024-09-22
# 正則フラー理論 I:有限次元および無限次元における指数積分 Holomorphic Floer theory I: exponential integrals in finite and infinite dimensions ( http://arxiv.org/abs/2402.07343v2 ) ライセンス: Link先を確認	Maxim Kontsevich, Yan Soibelman,	(参考訳) プロジェクト『ホロモルフィック・フレアー理論』にまつわる一連の論文の第一部において、指数積分と関連する壁交差構造について論じる。この主題について、変形量子化の考えに基づくものと、フレアー理論の考えに基づくものという2つの視点を強調する。それらの等価性は、一般化されたリーマン・ヒルベルト対応の系である。指数積分の場合には、デ・ラムとベッチコホモロジーの局所版と大域版の比較同型性に相当する。我々は、モーゼ・ノヴィコフ理論を正則なケースに特に一般化する対応する理論を発展させる。また、壁交差構造が解析可能であることを証明する。我々は、量子波動関数の一般理論を開発し、チャーン・サイモンズ理論の場合、一般化されたナーム和の概念に基づくチャーン・サイモンズ壁交差構造の代替記述を与えることを示す。対応する摂動級数の解析性と復活に関するいくつかの予想を提案する。 In the first of the series of papers devoted to our project ``Holomorphic Floer Theory" we discuss exponential integrals and related wall-crossing structures. We emphasize two points of view on the subject: the one based on the ideas of deformation quantization and the one based on the ideas of Floer theory. Their equivalence is a corollary of our generalized Riemann-Hilbert correspondence. In the case of exponential integrals this amounts to several comparison isomorphisms between local and global versions of de Rham and Betti cohomology. We develop the corresponding theories in particular generalizing Morse-Novikov theory to the holomorphic case. We prove that arising wall-crossing structures are analytic. As a corollary, perturbative expansions of exponential integrals are resurgent. Based on a careful study of finite-dimensional exponential integrals we propose a conjectural approach to infinite-dimensional exponential integrals. We illustrate this approach in the case of Feynman path integral with holomorphic Lagrangian boundary conditions as well as in the case of the complexified Chern-Simons theory. We discuss the arising perverse sheaf of infinite rank as well as analyticity of the corresponding ``Chern-Simons wall-crossing structure". 
We develop a general theory of quantum wave functions and show that in the case of Chern-Simons theory it gives an alternative description of the Chern-Simons wall-crossing structure based on the notion of generalized Nahm sum. We propose several conjectures about analyticity and resurgence of the corresponding perturbative series.	翻訳日:2024-11-09 04:43:41 公開日:2024-09-22
# 思慮に注意を払う:オンライン談話における行動検出のための実践的ニュアンスをマイニングする Paying Attention to Deflections: Mining Pragmatic Nuances for Whataboutism Detection in Online Discourse ( http://arxiv.org/abs/2402.09934v2 ) ライセンス: Link先を確認	Khiem Phi, Noushin Salek Faramarzi, Chenlu Wang, Ritwik Banerjee,	(参考訳) 物語をディスラプトし、不信を喚起する強力なツールである「Whataboutism」は、量的NLP研究において未発見のままである。また、過去の研究は、誤情報やプロパガンダの戦略としての使用と、実用的・意味的なフレーミングの道具としての使用とを区別していない。我々は、TwitterとYouTubeからの新しいデータセットを導入し、オーバーラップと、どこが問題なのか、プロパガンダ、そしてTu quoqueの誤用の区別を明らかにした。さらに、言語意味論における最近の研究に基づき、「何について」の語彙構造と「何について」を区別する。我々の実験は、その正確な検出において、非常に独特な課題をもたらし、負のサンプルマイニングに注意重みを用いた新しい方法が導入された。本誌のTwitterとYouTubeのコレクションでは、これまでの最先端の手法に比べて、4%と10%の大幅な改善が報告されている。 Whataboutism, a potent tool for disrupting narratives and sowing distrust, remains under-explored in quantitative NLP research. Moreover, past work has not distinguished its use as a strategy for misinformation and propaganda from its use as a tool for pragmatic and semantic framing. We introduce new datasets from Twitter and YouTube, revealing overlaps as well as distinctions between whataboutism, propaganda, and the tu quoque fallacy. Furthermore, drawing on recent work in linguistic semantics, we differentiate the `what about' lexical construct from whataboutism. Our experiments bring to light unique challenges in its accurate detection, prompting the introduction of a novel method using attention weights for negative sample mining. We report significant improvements of 4% and 10% over previous state-of-the-art methods in our Twitter and YouTube collections, respectively.	翻訳日:2024-11-09 04:43:41 公開日:2024-09-22
# Magic Mirror on the Wall, How to Benchmark Quantum Error Correction Codes, overall ? Magic Mirror on the Wall, How to Benchmark Quantum Error Correction Codes, Overall ? ( http://arxiv.org/abs/2402.11105v4 ) ライセンス: Link先を確認	Avimita Chatterjee, Swaroop Ghosh,	(参考訳) 量子誤り訂正符号(Quantum Error Correction Codes、QECC)は、ノイズやエラーの悪影響から量子状態を保護することにより、量子コンピューティングの進歩において重要なものである。既存のものの新しい開発や修正を含む様々なQECCの開発により、特定の条件に合わせて適切なQECCを選択することが重要である。 QECCの分野では大幅な改善があったが、それらを一貫した基準で評価するための統一的な方法論はいまだ解明されていない。このギャップに対処するため,本論文では,QECCの最初のベンチマークフレームワークを提案する。 8つの重要なQECCを評価し,その分析のために8つのパラメータからなる包括的スイートを提案する。提案手法は普遍的なベンチマーク手法を確立し,量子誤り訂正の複雑さを強調し,QECCの選択は各シナリオのユニークな要件と制限に依存することを示す。さらに、与えられたシナリオの特定の要求に適応するQECCを選択するための体系的な戦略を開発し、量子誤り訂正に対する調整されたアプローチを容易にする。さらに,ユーザが提供したシナリオの特徴を評価する新しいQECCレコメンデーションツールを導入し,各コードに対して達成可能な最大距離とともに,最も適度なQECCのスペクトルを推奨する。このツールは適応可能なように設計されており、新しいQECCを組み込んで、最小限の労力でパラメータを修正できる。 Quantum Error Correction Codes (QECCs) are pivotal in advancing quantum computing by protecting quantum states against the adverse effects of noise and errors. With a variety of QECCs developed, including new developments and modifications of existing ones, selecting an appropriate QECC tailored to specific conditions is crucial. Despite significant improvements in the field of QECCs, a unified methodology for evaluating them on a consistent basis has remained elusive. Addressing this gap, this paper presents the first benchmarking framework for QECCs, introducing a set of universal parameters. By evaluating eight prominent QECCs, we propose a comprehensive suite of eight parameters for their analysis. Our methodology establishes a universal benchmarking approach and highlights the complexity of quantum error correction, indicating that the choice of a QECC depends on the unique requirements and limitations of each scenario. 
Furthermore, we develop a systematic strategy for selecting QECCs that adapts to the specific requirements of a given scenario, facilitating a tailored approach to quantum error correction. Additionally, we introduce a novel QECC recommendation tool that assesses the characteristics of a given scenario provided by the user, subsequently recommending a spectrum of QECCs from most to least suitable, along with the maximum achievable distance for each code. This tool is designed to be adaptable, allowing for the inclusion of new QECCs and the modification of their parameters with minimal effort, ensuring its relevance in the evolving landscape of quantum computing.	翻訳日:2024-11-09 04:43:41 公開日:2024-09-22
# 超伝導ジョセフソン接合による第5の力のゲージボソンの検出 Detecting a Fifth-Force Gauge Boson via Superconducting Josephson Junctions ( http://arxiv.org/abs/2402.14514v2 ) ライセンス: Link先を確認	Yu Cheng, Jie Sheng, Tsutomu T. Yanagida,	(参考訳) $B-L$電荷を持つ粒子間の新しい第5の力は、標準モデルの興味深い$U(1)_{B-L}$拡張によって動機付けられる。ゲージボソンメディエーターのFéetonもダークマター候補として機能している。本稿では,超伝導ジョセフソン接合を用いた第5の力による量子位相差を検出するための新しい実験設計を提案する。この実験は、ゲージボソンがFéetonダークマターにとって興味深い質量領域である$0.01\,$eVから$10\,$eVの範囲内にあるときに、ゲージカップリングに最も敏感である。これは、新しい物理をミリメートル以下の小さなスケールで測定するための新しい道を開く。 A new fifth force between particles carrying $B-L$ charges is well-motivated by the intriguing $U(1)_{B-L}$ extension of the standard model. The gauge boson mediator, Féeton, also serves as a dark matter candidate. In this letter, we propose a novel experimental design to detect the quantum phase difference caused by this fifth force using a superconducting Josephson junction. We find that the experiment has the best sensitivity to the gauge coupling when the gauge boson is within the mass range of $0.01\,$eV to $10\,$eV, which is an interesting mass region for the Féeton dark matter. This opens up a new avenue for the measurement of new physics at small scales below a millimeter.	翻訳日:2024-11-09 04:32:42 公開日:2024-09-22
# マルチモーダルLDMとチェーン・オブ・ソート推論が相反する画像に出会うとき Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image ( http://arxiv.org/abs/2402.14899v3 ) ライセンス: Link先を確認	Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu,	(参考訳) テキストや画像の理解能力に優れたマルチモーダルLLM(MLLM)が注目されている。 MLLMを用いたより優れた推論を実現するために、CoT推論が広く研究され、中間的推論ステップを与えることでMLLMの説明可能性をさらに向上させる。 MLLMによるマルチモーダル推論の強い力にもかかわらず、最近の研究はMLLMがいまだに敵対的なイメージに悩まされていることを示している。 CoTはまた、MLLMの対角的堅牢性を強化しますか? CoTの中間的推論ステップは、敵対的攻撃にどのような意味があるのか? これらの質問に答えるために、我々はまず、CoTベースの推論に対する既存の攻撃を2つの主要なコンポーネント、すなわち理性と答えを攻撃することによって一般化する。 CoTは,マルチステップ推論プロセスを活用することで,既存の攻撃手法に対するMLLMの対角的堅牢性を向上させるが,実質的には向上しない。そこで本研究では,CoT推論過程をバイパスしながらモデルを攻撃する新たな攻撃手法を提案する。 3つのMLLMと2つの視覚的推論データセットによる実験により,提案手法の有効性が検証された。本研究は, 停止共振攻撃は, 誤認予測やベースライン攻撃の精度を著しく向上させる可能性があることを示す。 Multimodal LLMs (MLLMs) with a great ability of text and image understanding have received great attention. To achieve better reasoning with MLLMs, Chain-of-Thought (CoT) reasoning has been widely explored, which further promotes MLLMs' explainability by giving intermediate reasoning steps. Despite the strong power demonstrated by MLLMs in multimodal reasoning, recent studies show that MLLMs still suffer from adversarial images. This raises the following open questions: Does CoT also enhance the adversarial robustness of MLLMs? What do the intermediate reasoning steps of CoT entail under adversarial attacks? To answer these questions, we first generalize existing attacks to CoT-based inferences by attacking the two main components, i.e., rationale and answer. We find that CoT indeed improves MLLMs' adversarial robustness against the existing attack methods by leveraging the multi-step reasoning process, but not substantially. Based on our findings, we further propose a novel attack method, termed as stop-reasoning attack, that attacks the model while bypassing the CoT reasoning process. 
Experiments on three MLLMs and two visual reasoning datasets verify the effectiveness of our proposed method. We show that the stop-reasoning attack can result in misled predictions and outperform baseline attacks by a significant margin.	翻訳日:2024-11-09 04:32:42 公開日:2024-09-22
# SAFDNet: 完全スパース3Dオブジェクト検出のためのシンプルで効果的なネットワーク SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection ( http://arxiv.org/abs/2403.05817v3 ) ライセンス: Link先を確認	Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, Xiaolin Hu,	(参考訳) LiDARベースの3Dオブジェクト検出は、自動運転において重要な役割を果たす。既存の高性能な3Dオブジェクト検出器は通常、バックボーンネットワークと予測ヘッドに密度の高い特徴マップを構築する。しかし、高密度特徴写像によって引き起こされる計算コストは、知覚範囲が大きくなるにつれて2次的に増大し、これらのモデルが長距離検出にスケールアップすることが困難になる。いくつかの最近の研究は、この問題を解決するために完全なスパース検出器を構築しようとしたが、結果として得られたモデルは複雑な多段パイプラインに依存するか、劣った性能を示すかのいずれかであった。本研究では,SAFDNetを提案する。SAFDNetは,完全スパースな3Dオブジェクト検出に適した,単純かつ高効率なアーキテクチャである。 SAFDNetでは、中心的特徴不足問題に対処するために適応的特徴拡散戦略が設計されている。 Waymo Open、nuScenes、Argoverse2データセットについて広範な実験を行った。 SAFDNetは、最初の2つのデータセットでは以前のSOTAよりもわずかに優れていたが、最後のデータセットでは、長距離検出を必要とするシナリオにおいて、SAFDNetの有効性を検証した。特にArgoverse2では、SAFDNetは以前の最高のハイブリッド検出器であるHEDNetを2.1倍高速で2.6%上回り、以前の最高のスパース検出器であるFSDv2よりも2.1%上回った。コードはhttps://github.com/zhanggang001/HEDNetで入手できる。 LiDAR-based 3D object detection plays an essential role in autonomous driving. Existing high-performing 3D object detectors usually build dense feature maps in the backbone network and prediction head. However, the computational costs introduced by the dense feature maps grow quadratically as the perception range increases, making these models hard to scale up to long-range detection. Some recent works have attempted to construct fully sparse detectors to solve this issue; nevertheless, the resulting models either rely on a complex multi-stage pipeline or exhibit inferior performance. In this work, we propose SAFDNet, a straightforward yet highly effective architecture, tailored for fully sparse 3D object detection. In SAFDNet, an adaptive feature diffusion strategy is designed to address the center feature missing problem. We conducted extensive experiments on Waymo Open, nuScenes, and Argoverse2 datasets. 
SAFDNet performed slightly better than the previous SOTA on the first two datasets but much better on the last dataset, which features long-range detection, verifying the efficacy of SAFDNet in scenarios where long-range detection is required. Notably, on Argoverse2, SAFDNet surpassed the previous best hybrid detector HEDNet by 2.6% mAP while being 2.1x faster, and yielded 2.1% mAP gains over the previous best sparse detector FSDv2 while being 1.3x faster. The code will be available at https://github.com/zhanggang001/HEDNet.	翻訳日:2024-11-09 04:10:35 公開日:2024-09-22
# 多様なユーザシミュレーションによる非協調対話の戦略計画の改善 Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation ( http://arxiv.org/abs/2403.06769v3 ) ライセンス: Link先を確認	Tong Zhang, Chen Huang, Yang Deng, Hongru Liang, Jia Liu, Zujie Wen, Wenqiang Lei, Tat-Seng Chua,	(参考訳) 我々は,多様なユーザとの戦略的対話を期待する非協力的対話エージェントについて,システム目標に好意的に依存する相互合意を確保するために検討する。これは、既存の対話エージェントに2つの大きな課題をもたらす。 1) ユーザ固有の特徴を戦略的計画に組み込むことができないこと、及び 2)多様な利用者に一般化できる戦略プランナーの育成が困難である。これらの課題に対処するため,我々は,ユーザ対応戦略計画モジュールと人口ベーストレーニングパラダイムを取り入れた,適切な戦略計画の能力を高めるためのTripを提案する。協調的でない対話タスクのベンチマーク実験を通じて,多様なユーザを対象としたTripの有効性を実証した。 We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users, for securing a mutual agreement that leans favorably towards the system's objectives. This poses two main challenges for existing dialogue agents: 1) The inability to integrate user-specific characteristics into the strategic planning, and 2) The difficulty of training strategic planners that can be generalized to diverse users. To address these challenges, we propose Trip to enhance the capability in tailored strategic planning, incorporating a user-aware strategic planning module and a population-based training paradigm. Through experiments on benchmark non-collaborative dialogue tasks, we demonstrate the effectiveness of Trip in catering to diverse users.	翻訳日:2024-11-09 04:10:35 公開日:2024-09-22
# PoIFusion:関心点での核融合による多モード3次元物体検出 PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest ( http://arxiv.org/abs/2403.09212v2 ) ライセンス: Link先を確認	Jiajun Deng, Sha Zhang, Feras Dayoub, Wanli Ouyang, Yanyong Zhang, Ian Reid,	(参考訳) 本稿では,RGB画像とLiDAR点雲の情報を興味ある点(PoI)に融合させる,概念的にシンプルで効果的なマルチモーダル3Dオブジェクト検出フレームワークPoIFusionを提案する。マルチセンサデータを統一的なビューに変換する,あるいは統合を容易にするグローバルアテンション機構を活用する,これまでで最も正確な方法とは違い,本手法は各モードのビューを維持し,計算に優しい投影と補間によりマルチモーダル特徴を得る。特に、私たちのPoIFusionは、クエリベースのオブジェクト検出のパラダイムに従い、オブジェクトクエリを動的3Dボックスとして定式化し、各クエリボックスに基づいてPoIのセットを生成します。 PoIは3Dオブジェクトを表すキーポイントとして機能し、マルチモーダル融合において基本ユニットの役割を担う。具体的には、PoIを各モードのビューに投影し、対応する特徴をサンプリングし、動的融合ブロックを介して各PoIのマルチモーダル特徴を統合する。さらに、同じクエリボックスから派生したPoIの機能を集約してクエリ機能を更新する。本手法は、ビュー変換による情報損失を防止し、計算集約的なグローバルな注目を排除し、マルチモーダル3Dオブジェクト検出器をより適用できるようにする。我々はnuScenesとArgoverse2データセットについて広範囲に実験を行い、我々のアプローチを評価した。注目すべきことに、提案手法は、ベルとホイッスルを使わずに両方のデータセットで最先端の結果を得る。 \emph{i.e.}, 74.9\% NDS and 73.4\% mAP on nuScenes, 31.6\% CDS and 40.6\% mAP on Argoverse2。コードは \url{https://djiajunustc.github.io/projects/poifusion} で公開される。 In this work, we present PoIFusion, a conceptually simple yet effective multi-modal 3D object detection framework to fuse the information of RGB images and LiDAR point clouds at the points of interest (PoIs). Different from the most accurate methods to date that transform multi-sensor data into a unified view or leverage the global attention mechanism to facilitate fusion, our approach maintains the view of each modality and obtains multi-modal features by computation-friendly projection and interpolation. In particular, our PoIFusion follows the paradigm of query-based object detection, formulating object queries as dynamic 3D boxes and generating a set of PoIs based on each query box. The PoIs serve as the keypoints to represent a 3D object and play the role of the basic units in multi-modal fusion. 
Specifically, we project PoIs into the view of each modality to sample the corresponding feature and integrate the multi-modal features at each PoI through a dynamic fusion block. Furthermore, the features of PoIs derived from the same query box are aggregated together to update the query feature. Our approach prevents information loss caused by view transformation and eliminates the computation-intensive global attention, making the multi-modal 3D object detector more applicable. We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach. Remarkably, the proposed approach achieves state-of-the-art results on both datasets without any bells and whistles, i.e., 74.9% NDS and 73.4% mAP on nuScenes, and 31.6% CDS and 40.6% mAP on Argoverse2. The code will be made available at https://djiajunustc.github.io/projects/poifusion.	翻訳日:2024-11-09 04:10:35 公開日:2024-09-22
# 憎しみの解読:憎しみのあるミームとそのターゲットを識別する Deciphering Hate: Identifying Hateful Memes and Their Targets ( http://arxiv.org/abs/2403.10829v2 ) ライセンス: Link先を確認	Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque, Sarah M. Preum,	(参考訳) インターネットミームは、個人がソーシャルメディア上で感情、思考、視点を表現するための強力な手段となっている。ユーモアやエンターテイメントの源と見なされることが多いが、ミームは個人やコミュニティをターゲットにしたヘイトフルなコンテンツを広めることもできる。既存の研究は、ベンガル語(バングラ語としても知られる)のような低リソース言語にまつわる独特な課題を見越して、高リソース言語のミームの負の側面に焦点を当てている。さらに、ベンガルのミームに関する以前の研究は、憎しみのあるミームを検出することに焦点を合わせてきたが、その対象物を検出するための研究は行われていない。このギャップを埋め、この領域での研究を促進するために、ベンガルのBHM(Bengali Hateful Memes)のための新しいマルチモーダルデータセットを導入する。データセットは、ベンガル語で書かれた7,148のミームと、2つのタスクに合わせたコードミキシングされたキャプションで構成されている。一憎しみのあるミームを検知し、 (二)対象とする社会団体(個人、組織、コミュニティ、社会)を検出すること。これらの課題を解決するために,メメから重要なモダリティ特徴を体系的に抽出し,その文脈をよりよく理解するためのモダリティ特化特徴と共同で評価するマルチモーダルディープニューラルネットワークであるDORA(Dual cO attention fRAmework)を提案する。我々の実験は、DORAが他の低リソースのヘイトフルミームデータセットで一般化可能であることを示し、最先端の競合するいくつかのベースラインを上回っている。 Internet memes have become a powerful means for individuals to express emotions, thoughts, and perspectives on social media. While often considered as a source of humor and entertainment, memes can also disseminate hateful content targeting individuals or communities. Most existing research focuses on the negative aspects of memes in high-resource languages, overlooking the distinctive challenges associated with low-resource languages like Bengali (also known as Bangla). Furthermore, while previous work on Bengali memes has focused on detecting hateful memes, there has been no work on detecting their targeted entities. To bridge this gap and facilitate research in this arena, we introduce a novel multimodal dataset for Bengali, BHM (Bengali Hateful Memes). The dataset consists of 7,148 memes with Bengali as well as code-mixed captions, tailored for two tasks: (i) detecting hateful memes, and (ii) detecting the social entities they target (i.e., Individual, Organization, Community, and Society). 
To solve these tasks, we propose DORA (Dual cO attention fRAmework), a multimodal deep neural network that systematically extracts the significant modality features from the memes and jointly evaluates them with the modality-specific features to understand the context better. Our experiments show that DORA is generalizable on other low-resource hateful meme datasets and outperforms several state-of-the-art rivaling baselines.	翻訳日:2024-11-09 03:59:25 公開日:2024-09-22
# m&m's: マルチステップマルチモーダルタスクのためのツール利用評価ベンチマーク m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks ( http://arxiv.org/abs/2403.11085v4 ) ライセンス: Link先を確認	Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna,	(参考訳) 実世界のマルチモーダル問題は、単一の機械学習モデルではほとんど解決されず、しばしば複数のモデルを縫合する多段階の計算計画を必要とする。ツール拡張 LLM は、そのような計算計画の自動生成に非常に有望である。しかし、マルチステップマルチモーダルタスクのプランナーとしてLLMを評価するための標準ベンチマークが欠如していることは、プランナー設計決定の体系的な研究を妨げている。 LLMは、ひとつのショットで完全なプランを生成するべきか、ステップバイステップで生成すべきか? ツールを直接PythonコードやJSONのような構造化データフォーマットで呼び出すべきか? フィードバックは計画を改善するか? マルチモーダルモデル、(無料)パブリックAPI、画像処理モジュールを含む33のツールを含む4K以上のマルチモーダルタスクを含むベンチマーク。これら各タスククエリに対して、この現実的なツールセットを使用して自動生成されたプランを提供する。我々はさらに,人間による検証と正確な実行が可能な,1,565のタスクプランの高品質なサブセットを提供する。 m&mでは,2つの計画戦略(複数ステップ対ステップバイステッププランニング),2つの計画形式(JSON対コード),3種類のフィードバック(パーシング/検証/実行)を備えた10のLLMを評価した。最後に、我々の広範な実験の要点を要約する。私たちのデータセットとコードは、HuggingFace (https://huggingface.co/datasets/zixianma/mnms)とGithub (https://github.com/RAIVNLab/mnms)で利用可能です。 Real-world multi-modal problems are rarely solved by a single machine learning model, and often require multi-step computational plans that involve stitching several models. Tool-augmented LLMs hold tremendous promise for automating the generation of such computational plans. However, the lack of standardized benchmarks for evaluating LLMs as planners for multi-step multi-modal tasks has prevented a systematic study of planner design decisions. Should LLMs generate a full plan in a single shot or step-by-step? Should they invoke tools directly with Python code or through structured data formats like JSON? Does feedback improve planning? To answer these questions and more, we introduce m&m's: a benchmark containing 4K+ multi-step multi-modal tasks involving 33 tools that include multi-modal models, (free) public APIs, and image processing modules. For each of these task queries, we provide automatically generated plans using this realistic toolset. 
We further provide a high-quality subset of 1,565 task plans that are human-verified and correctly executable. With m&m's, we evaluate 10 popular LLMs with 2 planning strategies (multi-step vs. step-by-step planning), 2 plan formats (JSON vs. code), and 3 types of feedback (parsing/verification/execution). Finally, we summarize takeaways from our extensive experiments. Our dataset and code are available on Hugging Face (https://huggingface.co/datasets/zixianma/mnms) and GitHub (https://github.com/RAIVNLab/mnms).	翻訳日:2024-11-09 03:59:25 公開日:2024-09-22
# B-LoRAを用いたインプシットスタイル・コンテンツ分離 Implicit Style-Content Separation using B-LoRA ( http://arxiv.org/abs/2403.14572v2 ) ライセンス: Link先を確認	Yarden Frenkel, Yael Vinker, Ariel Shamir, Daniel Cohen-Or,	(参考訳) イメージスタイリングは、画像の視覚的な外観とテクスチャ(スタイル)を操作しつつ、その基盤となるオブジェクト、構造、概念(コンテンツ)を保存することを含む。スタイルと内容の分離は、画像のスタイルをその内容から独立して操作するために不可欠であり、調和し、視覚的に喜ぶ結果を保証する。この分離を実現するには、画像の視覚的特徴と意味的特徴の両方を深く理解する必要がある。本稿では,LoRA(Low-Rank Adaptation)を利用して,画像のスタイルとコンテンツコンポーネントを暗黙的に分離し,画像スタイリング作業を容易にする手法であるB-LoRAを紹介する。 SDXLのアーキテクチャをLoRAと組み合わせて解析することにより、B-LoRAと呼ばれる2つのブロックのLoRA重みを共同で学習することで、各B-LoRAを個別に訓練することでは達成できないスタイル-コンテンツ分離を実現する。トレーニングを2ブロックに集約し、スタイルとコンテンツを分離することで、スタイル操作を大幅に改善し、モデル微調整に関連する過度な問題を克服できます。トレーニングが完了すると、2つのB-LoRAは独立したコンポーネントとして使用でき、画像スタイルの転送、テキストベースの画像スタイリング、一貫したスタイル生成、スタイル内容の混合など、様々な画像スタイリングタスクが可能である。 Image stylization involves manipulating the visual appearance and texture (style) of an image while preserving its underlying objects, structures, and concepts (content). The separation of style and content is essential for manipulating the image's style independently from its content, ensuring a harmonious and visually pleasing result. Achieving this separation requires a deep understanding of both the visual and semantic characteristics of images, often necessitating the training of specialized models or employing heavy optimization. In this paper, we introduce B-LoRA, a method that leverages LoRA (Low-Rank Adaptation) to implicitly separate the style and content components of a single image, facilitating various image stylization tasks. By analyzing the architecture of SDXL combined with LoRA, we find that jointly learning the LoRA weights of two specific blocks (referred to as B-LoRAs) achieves style-content separation that cannot be achieved by training each B-LoRA independently. Consolidating the training into only two blocks and separating style and content allows for significantly improving style manipulation and overcoming overfitting issues often associated with model fine-tuning. 
Once trained, the two B-LoRAs can be used as independent components to allow various image stylization tasks, including image style transfer, text-based image stylization, consistent style generation, and style-content mixing.	翻訳日:2024-11-09 03:48:22 公開日:2024-09-22
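The abstract above builds on LoRA (Low-Rank Adaptation), so a minimal sketch of the generic low-rank weight update may help. This is not the B-LoRA training procedure and does not touch SDXL; it only shows the standard LoRA form W0 + (alpha / rank) * B A on a toy matrix. The helper names (`matmul`, `lora_weight`) and all numbers are hypothetical and chosen purely for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w0, b, a, alpha, rank):
    """Return W0 + (alpha / rank) * B @ A, the adapted weight matrix.

    W0 stays frozen; only the small factors B and A would be trained.
    """
    delta = matmul(b, a)          # low-rank update, same shape as W0
    scale = alpha / rank          # standard LoRA scaling factor
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


# Toy 2x2 frozen weight with a rank-1 update (made-up numbers).
w0 = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]        # shape (2, rank)
a = [[0.5, -0.5]]         # shape (rank, 2)
w_adapted = lora_weight(w0, b, a, alpha=2.0, rank=1)
```

In this picture, the two B-LoRAs described in the abstract would each be one such (B, A) pair attached to a specific block, which is why they can later be swapped in and out as independent components.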
# GeNet: グラフニューラルネットワークによるタスク指向セマンティック通信パラダイム GeNet: A Graph Neural Network-based Anti-noise Task-Oriented Semantic Communication Paradigm ( http://arxiv.org/abs/2403.18296v3 ) ライセンス: Link先を確認	Chunhang Zheng, Kechao Cai,	(参考訳) 意味コミュニケーションタスクに対する従来のアプローチは、チャネルノイズを軽減するためにSNR(Signal-to-Noise ratio)の知識に依存していた。さらに、これらの手法は特定のSNR条件下でのトレーニングを必要とし、かなりの時間と計算資源を必要とする。本稿では,ノイズ対策を目的とした意味コミュニケーションのためのグラフニューラルネットワーク(GNN)に基づくパラダイムであるGeNetを提案し,タスク指向通信(TOC)を容易にする。入力データイメージをグラフ構造に変換する新しい手法を提案する。そして、GNNベースのエンコーダを利用して、ソースデータから意味情報を抽出する。この抽出された意味情報はチャネルを介して送信される。受信側の最後には、GNNベースのデコーダを使用して、TOCのソースデータから関連する意味情報を再構成する。実験により,SNR依存性を疎結合化しながら,アンチノイズTOCにおけるGeNetの有効性を示す。さらに,ノード数を変えてGeNetの性能を評価し,その汎用性を意味コミュニケーションの新しいパラダイムとして明らかにした。さらに,GeNetの幾何変換に対する頑健さを,データ拡張に頼ることなく,異なる回転角度でテストすることで示す。 Traditional approaches to semantic communication tasks rely on the knowledge of the signal-to-noise ratio (SNR) to mitigate channel noise. Moreover, these methods necessitate training under specific SNR conditions, entailing considerable time and computational resources. In this paper, we propose GeNet, a Graph Neural Network (GNN)-based paradigm for semantic communication aimed at combating noise, thereby facilitating Task-Oriented Communication (TOC). We propose a novel approach where we first transform the input data image into graph structures. Then we leverage a GNN-based encoder to extract semantic information from the source data. This extracted semantic information is then transmitted through the channel. At the receiver's end, a GNN-based decoder is utilized to reconstruct the relevant semantic information from the source data for TOC. Through experimental evaluation, we show GeNet's effectiveness in anti-noise TOC while decoupling the SNR dependency. We further evaluate GeNet's performance by varying the number of nodes, revealing its versatility as a new paradigm for semantic communication. 
Additionally, we show GeNet's robustness to geometric transformations by testing it with different rotation angles, without resorting to data augmentation.	翻訳日:2024-11-09 03:37:10 公開日:2024-09-22
# SugarcaneNet2024:Sugarcane病分類のためのLASSO正規化事前訓練モデルの最適化された平均アンサンブルアプローチ SugarcaneNet2024: An Optimized Weighted Average Ensemble Approach of LASSO Regularized Pre-trained Models for Sugarcane Disease Classification ( http://arxiv.org/abs/2403.18870v2 ) ライセンス: Link先を確認	Md. Simul Hasan Talukder, Sharmin Akter, Abdullah Hafez Nur, Mohammad Aljaidi, Rejwan Bin Sulaiman,	(参考訳) 世界の砂糖産業にとって重要な作物であるサトウキビは、その収量と品質の両方にかなりの悪影響を及ぼすいくつかの病気の傾向にある。予防イニシアチブを効果的に管理し、実施するには、疾患を迅速かつ正確に検出する必要がある。本研究では,サトウキビ病を自動的にかつ迅速に検出するための従来の手法よりも優れたサトウキビNet2024というユニークなモデルを提案する。 InceptionV3、InceptionResNetV2、DenseNet201、DenseNet169、Xception、ResNet152V2の7つのカスタマイズおよびLASSO正規化事前学習モデルの最適化された平均アンサンブルを集約した。当初、0.0001 LASSO正則化、30%のドロップアウト層、3つのバッチ正規化を加えた。この添加によりサトウキビ葉病分類の精度が大幅に向上した。その後、平均アンサンブルと個々のモデルの比較研究を行い、アンサンブルの手法がより良くなったことを示唆した。すべての改良された事前訓練されたモデルの平均アンサンブルは、それぞれ100%、99%、99%、99.45%のスコア、精度、リコール、精度で優れた結果をもたらした。グリッドサーチを組み込んだ最適化された平均アンサンブル手法の実装により、さらに性能が向上した。この最適化されたサトウキビNet2024モデルは、精度、精度、リコール、F1スコアの99.67%、100%、100%、100%を達成し、サトウキビ病の診断に最善を尽くした。 Sugarcane, a key crop for the world's sugar industry, is prone to several diseases that have a substantial negative influence on both its yield and quality. To effectively manage and implement preventative initiatives, diseases must be detected promptly and accurately. In this study, we present a unique model called sugarcaneNet2024 that outperforms previous methods for automatically and quickly detecting sugarcane disease through leaf image processing. Our proposed model consolidates an optimized weighted average ensemble of seven customized and LASSO-regularized pre-trained models, particularly InceptionV3, InceptionResNetV2, DenseNet201, DenseNet169, Xception, and ResNet152V2. Initially, we added three more dense layers with 0.0001 LASSO regularization, three 30% dropout layers, and three batch normalizations with renorm enabled at the bottom of these pre-trained models to improve the performance. 
The accuracy of sugarcane leaf disease classification was greatly increased by this addition. Following this, several comparative studies between the average ensemble and individual models were carried out, indicating that the ensemble technique performed better. The average ensemble of all modified pre-trained models produced outstanding outcomes: 100%, 99%, 99%, and 99.45% for F1 score, precision, recall, and accuracy, respectively. Performance was further enhanced by the implementation of an optimized weighted average ensemble technique incorporated with grid search. This optimized sugarcaneNet2024 model performed the best for detecting sugarcane diseases, having achieved accuracy, precision, recall, and F1 score of 99.67%, 100%, 100%, and 100%, respectively.	翻訳日:2024-11-09 03:37:10 公開日:2024-09-22
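The abstract above describes a weighted average ensemble whose weights are tuned with grid search. The sketch below illustrates that general idea on toy data. It is not the authors' implementation: the weight grid, the use of accuracy as the selection metric, the function names, and the probability values are all assumptions made for illustration.

```python
from itertools import product


def weighted_average(prob_sets, weights):
    """Blend per-model class probabilities using normalized weights."""
    total = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    blended = []
    for i in range(n_samples):
        row = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        blended.append(row)
    return blended


def accuracy(probs, labels):
    """Fraction of samples whose argmax class matches the label."""
    hits = sum(1 for row, y in zip(probs, labels) if row.index(max(row)) == y)
    return hits / len(labels)


def grid_search_weights(prob_sets, labels, grid=(0.0, 0.5, 1.0)):
    """Try every weight combination on the grid; keep the first most accurate one."""
    best_weights, best_acc = None, -1.0
    for weights in product(grid, repeat=len(prob_sets)):
        if sum(weights) == 0:
            continue  # all-zero weights cannot be normalized
        acc = accuracy(weighted_average(prob_sets, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Toy validation data (hypothetical): two models, three samples, two classes.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4]]
model_b = [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]]
labels = [0, 1, 1]
best_weights, best_acc = grid_search_weights([model_a, model_b], labels)
```

On this toy data each model alone misclassifies one sample, while an equal-weight blend classifies all three correctly, which is the kind of gain the ensemble is meant to capture. In practice the weights would be selected on a held-out validation split rather than on the test set.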
# ハンドオブジェクト接触セマンティックマッピングによるクラッタ環境における多指ロボットハンドグラッピング Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping ( http://arxiv.org/abs/2404.08844v2 ) ライセンス: Link先を確認	Lei Zhang, Kaixin Bai, Guowen Huang, Zhenshan Bing, Zhaopeng Chen, Alois Knoll, Jianwei Zhang,	(参考訳) 深層学習モデルには,多指ハンドグリップのための巧妙な操作技術が著しく進歩している。しかし, 乱雑な環境下での接触情報誘導の把握は, いまだに過小評価されている。このギャップに対処するため,接触セマンティックマップを用いて,乱雑な環境下でのマルチフィンガーハンドグリップサンプルを生成する手法を開発した。オブジェクトポイントクラウドから総合的な接触セマンティックマップを作成するための接触セマンティック条件変分オートエンコーダネットワーク(CoSe-CVAE)を導入する。接触セマンティックマップから手つかみポーズを推定するために把握検出法を利用する。最後に, 把握品質と衝突確率を定量的に評価する統合的把握評価モデルを構築し, 散在シナリオにおける最適把握の信頼性を著しく向上する。実世界の単一物体環境における把握成功率の平均は81.0%、散在するシーンでの把握成功率は75.3%である。また,マルチモーダルなマルチフィンガーグリップデータセット生成手法を提案する。マルチフィンガーハンドグルーピングデータセットは、シーンの多様性、モダリティの多様性において、過去のデータセットよりも優れています。データセット、コード、補足資料はhttps://sites.google.com/view/ffh-cluttered-graspingで見ることができる。 Deep learning models have significantly advanced dexterous manipulation techniques for multi-fingered hand grasping. However, contact information-guided grasping in cluttered environments remains largely underexplored. To address this gap, we have developed a method for generating multi-fingered hand grasp samples in cluttered settings through a contact semantic map. We introduce a contact semantic conditional variational autoencoder network (CoSe-CVAE) for creating a comprehensive contact semantic map from an object point cloud. We utilize a grasp detection method to estimate hand grasp poses from the contact semantic map. Finally, a unified grasp evaluation model is designed to assess grasp quality and collision probability, substantially improving the reliability of identifying optimal grasps in cluttered scenarios. Our grasp generation method has demonstrated remarkable success, outperforming state-of-the-art methods by at least 4.65% with an 81.0% average grasping success rate in real-world single-object environments and a 75.3% grasping success rate in cluttered scenes. 
We also propose a multi-modal multi-fingered grasping dataset generation method. Our multi-fingered hand grasping dataset outperforms previous datasets in scene diversity and modality diversity. The dataset, code and supplementary materials can be found at https://sites.google.com/view/ffh-cluttered-grasping.	翻訳日:2024-11-09 03:14:33 公開日:2024-09-22
# 拡散モデルを用いた頑健な深度推定のためのコントラスト学習 Digging into contrastive learning for robust depth estimation with diffusion models ( http://arxiv.org/abs/2404.09831v4 ) ライセンス: Link先を確認	Jiyuan Wang, Chunyu Lin, Lang Nie, Kang Liao, Shuwei Shao, Yao Zhao,	(参考訳) 近年, 拡散型深度推定法は, エレガントなデノナイジングパターンと有望な性能により, 広く注目を集めている。しかし、雨や雪などの現実のシナリオでよく見られる悪条件下では、信頼できないのが普通である。本稿では,複雑な環境における性能劣化を軽減するために,拡散モデルに適した独自のコントラスト学習モードを備えた,D4RDと呼ばれる新しい頑健な深度推定手法を提案する。具体的には、知識蒸留の強みを対照的な学習に統合し、「真性」の対照的なスキームを構築する。このスキームは前方拡散過程のサンプルノイズを自然参照として利用し、様々な場面で予測されたノイズをより安定かつ正確な最適化に向けて導く。さらに、より汎用的な特徴や画像レベルを包含する雑音レベルトリニティを拡張し、マルチレベルコントラストを確立し、ネットワーク全体にわたって頑健な知覚の重荷を分散する。複雑なシナリオに対処する前に、3つの単純かつ効果的な改善によりベースライン拡散モデルの安定性を高め、収束を容易にし、奥行きの外れを除去する。大規模な実験により、D4RDは、合成汚職データセットや現実世界の気象条件に関する既存の最先端のソリューションを超越していることが示された。ソースコードとデータは https://github.com/wangjiyuan9/D4RD で公開されている。 Recently, diffusion-based depth estimation methods have drawn widespread attention due to their elegant denoising patterns and promising performance. However, they are typically unreliable under adverse conditions prevalent in real-world scenarios, such as rain and snow. In this paper, we propose a novel robust depth estimation method called D4RD, featuring a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments. Concretely, we integrate the strength of knowledge distillation into contrastive learning, building the 'trinity' contrastive scheme. This scheme utilizes the sampled noise of the forward diffusion process as a natural reference, guiding the predicted noise in diverse scenes toward a more stable and precise optimum. Moreover, we extend noise-level trinity to encompass more generic feature and image levels, establishing a multi-level contrast to distribute the burden of robust perception across the overall network. 
Before addressing complex scenarios, we enhance the stability of the baseline diffusion model with three straightforward yet effective improvements, which facilitate convergence and remove depth outliers. Extensive experiments demonstrate that D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions. Source code and data are available at https://github.com/wangjiyuan9/D4RD.	翻訳日:2024-11-09 03:14:33 公開日:2024-09-22
# ViViDex:人間のビデオから視覚に基づく有害な操作を学習する ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos ( http://arxiv.org/abs/2404.15709v2 ) ライセンス: Link先を確認	Zerui Chen, Shizhe Chen, Etienne Arlaud, Ivan Laptev, Cordelia Schmid,	(参考訳) 本研究は,多指ロボットによる多様なポーズでさまざまな物体を操作するための統一的な視覚ベースのポリシーを学習することを目的としている。以前の研究は、政策学習にヒューマンビデオを使うことの利点を示してきたが、推定軌跡のノイズによって性能向上は制限されてきた。さらに、接地木オブジェクトのような特権オブジェクト情報への依存は、現実的なシナリオにおける適用性をさらに制限する。これらの制約に対処するため、人間のビデオから視覚に基づくポリシー学習を改善するための新しいフレームワークViViDexを提案する。最初は、強化学習と軌道誘導報酬を使って、各ビデオのステートベースのポリシーを訓練し、ビデオから視覚的に自然と身体的にもっともらしい軌跡の両方を得る。次に、州ベースのポリシーから成功したエピソードをロールアウトし、特権情報を使用しずに統一された視覚ポリシーをトレーニングします。本稿では,視覚的視点のクラウド表現をさらに強化するコーディネート変換を提案し,視覚的ポリシートレーニングのための行動クローニングと拡散ポリシーを比較した。シミュレーションと実際のロボットの両方の実験では、ViViDexは3つの巧妙な操作タスクにおける最先端のアプローチよりも優れていることが示されている。 In this work, we aim to learn a unified vision-based policy for multi-fingered robot hands to manipulate a variety of objects in diverse poses. Though prior work has shown benefits of using human videos for policy learning, performance gains have been limited by the noise in estimated trajectories. Moreover, reliance on privileged object information such as ground-truth object states further limits the applicability in realistic scenarios. To address these limitations, we propose a new framework ViViDex to improve vision-based policy learning from human videos. It first uses reinforcement learning with trajectory guided rewards to train state-based policies for each video, obtaining both visually natural and physically plausible trajectories from the video. We then rollout successful episodes from state-based policies and train a unified visual policy without using any privileged information. We propose coordinate transformation to further enhance the visual point cloud representation, and compare behavior cloning and diffusion policy for the visual policy training. 
Experiments both in simulation and on the real robot demonstrate that ViViDex outperforms state-of-the-art approaches on three dexterous manipulation tasks.	翻訳日:2024-11-09 03:03:34 公開日:2024-09-22
# 半データと400倍の計算量を持つ高性能網膜基礎モデルの訓練 Training a high-performance retinal foundation model with half-the-data and 400 times less compute ( http://arxiv.org/abs/2405.00117v2 ) ライセンス: Link先を確認	Justin Engelmann, Miguel O. Bernabeu,	(参考訳) 医学における人工知能は、伝統的に大規模なトレーニングデータセットの欠如によって制限されている。ファンデーションモデルは、小さなデータセットで下流タスクに適応できる事前訓練されたモデルであり、この問題を軽減する可能性がある。ムーアフィールドズアイ病院(MEH)の研究者たちは、90万枚の画像でトレーニングされた網膜基盤モデルであるRETFound-MEHを提案した。最近、データ効率のよいDERETFoundが提案され、わずか15万の公開画像でトレーニングされている。しかし、これら2つのモデルは、当初トレーニングするために非常に重要なリソースを必要とし、下流での使用にリソースが集中していた。本稿では,75,000枚しか公開されていない画像と400倍の計算量でトレーニングされた網膜基盤モデルであるRETFound-Greenのトレーニングに使用する,新しいToken Restructionの目標を提案する。本稿では,RETFound-MEH と DERETFound のトレーニング費用をそれぞれ10,000 ドル,および 14,000 ドルと見積もる。 RETFound-Greenは100ドル以下でトレーニングできる。ダウンロード速度は14倍、ベクトル埋め込みは2.7倍、ストレージ容量は2.6倍である。それにもかかわらず、RETFound-Greenは体系的に悪いパフォーマンスをしない。実際、ブラジル、インド、中国の3つの下流データセットのさまざまなタスクにおいて、119の比較のうち68のタスクで最善を尽くし、DERETFoundでは21、RETFound-MEHでは13である。以上の結果から,RETFound-Greenは非常に効率的で高性能な網膜基盤モデルであることが示唆された。われわれは、Token Restructionの目的を、さらに高いパフォーマンスのためにスケールアップし、網膜画像以外の他の領域にも適用できることを期待している。 Artificial Intelligence in medicine is traditionally limited by the lack of massive training datasets. Foundation models, pre-trained models that can be adapted to downstream tasks with small datasets, could alleviate this problem. Researchers at Moorfields Eye Hospital (MEH) proposed RETFound-MEH, a retinal foundation model trained on 900,000 images, including private hospital data. Recently, data-efficient DERETFound was proposed providing comparable performance while being trained on only 150,000 publicly available images. However, both these models required very substantial resources to train initially and are resource-intensive in downstream use. We propose a novel Token Reconstruction objective that we use to train RETFound-Green, a retinal foundation model trained using only 75,000 publicly available images and 400 times less compute. 
We estimate the cost of training RETFound-MEH and DERETFound at $10,000 and $14,000, respectively. RETFound-Green could be trained for less than $100, with equally reduced environmental impact. RETFound-Green is also far more efficient in downstream use: it can be downloaded 14 times faster and computes vector embeddings 2.7 times faster, and those embeddings require 2.6 times less storage space. Despite this, RETFound-Green does not perform systematically worse. In fact, across various tasks on three downstream datasets from Brazil, India and China, it performs best in 68 of 119 comparisons, versus 21 for DERETFound and 13 for RETFound-MEH. Our results suggest that RETFound-Green is a very efficient, high-performance retinal foundation model. We anticipate that our Token Reconstruction objective could be scaled up for even higher performance and be applied to other domains beyond retinal imaging.	翻訳日:2024-11-09 02:52:30 公開日:2024-09-22
# UniGen: ゼロショットデータセット生成による感覚分類のためのユニバーサルドメインの一般化 UniGen: Universal Domain Generalization for Sentiment Classification via Zero-shot Dataset Generation ( http://arxiv.org/abs/2405.01022v3 ) ライセンス: Link先を確認	Juhwan Choi, Yeonghwa Kim, Seunguk Yu, JungMin Yun, YoungBin Kim,	(参考訳) 事前学習された言語モデルは、プロンプトベースの数発の学習で非常に柔軟性と汎用性を示してきたが、広いパラメータサイズと推論の適用性に悩まされている。近年の研究では、PLMをデータセットジェネレータとして使用し、効率的な推論を実現するために、タスク固有の小さなモデルを訓練することが示唆されている。しかし、ドメイン固有のデータセットを生成する傾向があるため、さまざまなドメインへの適用性は制限されている。本研究では,対象領域によらずデータセットを生成する普遍的領域一般化に対する新しいアプローチを提案する。これにより、ラベル空間を共有する任意のドメインに小さなタスクモデルを一般化することができ、データセット生成パラダイムの現実的な適用性を高めることができる。提案手法は, PLM よりも桁違いの小さいパラメータ集合を用いて, 各領域にまたがる一般化性を実現する。 Although pre-trained language models have exhibited great flexibility and versatility with prompt-based few-shot learning, they suffer from the extensive parameter size and limited applicability for inference. Recent studies have suggested that PLMs be used as dataset generators and a tiny task-specific model be trained to achieve efficient inference. However, their applicability to various domains is limited because they tend to generate domain-specific datasets. In this work, we propose a novel approach to universal domain generalization that generates a dataset regardless of the target domain. This allows for generalization of the tiny task model to any domain that shares the label space, thus enhancing the real-world applicability of the dataset generation paradigm. Our experiments indicate that the proposed method accomplishes generalizability across various domains while using a parameter set that is orders of magnitude smaller than PLMs.	翻訳日:2024-11-09 02:52:29 公開日:2024-09-22
# EconLogicQA: 経済シーケンス推論における大規模言語モデル評価のための質問応答ベンチマーク EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning ( http://arxiv.org/abs/2405.07938v2 ) ライセンス: Link先を確認	Yinzhu Quan, Zefang Liu,	(参考訳) 本稿では,経済,ビジネス,サプライチェーン管理の複雑な領域において,大規模言語モデル(LLM)の逐次推論能力を評価するための厳密なベンチマークであるEconLogicQAを紹介する。 EconLogicQAは、後続のイベントを個別に予測する従来のベンチマークとは違い、複数の相互接続されたイベントを識別してシーケンスする必要があるため、経済論理の複雑さを捉える必要がある。 EconLogicQAは、時間的および論理的事象の関係に関する洞察に富んだ理解を必要とする、経済的な記事から派生した多段階シナリオで構成されている。 EconLogicQAは、包括的な評価を通じて、経済的な文脈に固有のシーケンシャルな複雑さをナビゲートするLLMの習熟度を効果的に評価することを示した。本稿では,EconLogicQAデータセットの詳細な説明と,各種先進LLMのベンチマーク評価結果について述べる。ベンチマークデータセットはhttps://huggingface.co/datasets/yinzhu-quan/econ_logic_qaで公開されています。 In this paper, we introduce EconLogicQA, a rigorous benchmark designed to assess the sequential reasoning capabilities of large language models (LLMs) within the intricate realms of economics, business, and supply chain management. Diverging from traditional benchmarks that predict subsequent events individually, EconLogicQA poses a more challenging task: it requires models to discern and sequence multiple interconnected events, capturing the complexity of economic logic. EconLogicQA comprises an array of multi-event scenarios derived from economic articles, which necessitate an insightful understanding of both temporal and logical event relationships. Through comprehensive evaluations, we show that EconLogicQA effectively gauges an LLM's proficiency in navigating the sequential complexities inherent in economic contexts. We provide a detailed description of the EconLogicQA dataset and present the outcomes of evaluating the benchmark across various leading-edge LLMs, thereby offering a thorough perspective on their sequential reasoning potential in economic contexts. Our benchmark dataset is available at https://huggingface.co/datasets/yinzhu-quan/econ_logic_qa.	翻訳日:2024-11-09 02:30:11 公開日:2024-09-22
# 映像オブジェクトセグメンテーション参照のための時間アウェア適応を用いたハラスティング視覚言語事前学習モデル Harnessing Vision-Language Pretrained Models with Temporal-Aware Adaptation for Referring Video Object Segmentation ( http://arxiv.org/abs/2405.10610v2 ) ライセンス: Link先を確認	Zikun Zhou, Wentao Xiong, Li Zhou, Xin Li, Zhenyu He, Yaowei Wang,	(参考訳) Referring Video Object Segmentation (RVOS) の要点は、抽象言語概念とピクセルレベルでの動的視覚的内容とを関連付けるために、密集したテキストとビデオの関係をモデル化することにある。現在のRVOSメソッドは一般的に、バックボーンとして独立して事前訓練された視覚と言語モデルを使用する。画像とテキストは結合しない特徴空間にマッピングされるため、視覚・言語関係モデリング(VL)をスクラッチから学習する難しい課題に直面します。 VLP(Vision-Language Pretrained)モデルの成功に気付き、協調したVL特徴空間に基づいてRVOSの関連モデリングを学ぶことを提案する。それでも、VLPモデルをRVOSに転送するのは、事前訓練タスク(静的画像/領域レベルの予測)とRVOSタスク(動的ピクセルレベルの予測)の間にかなりのギャップがあるため、非常に難しい作業である。この移行問題に対処するため,時相適応によりRVOSのVLPモデルを利用するVLP-RVOSというフレームワークを導入する。まず、画素レベルの予測のために事前訓練された表現を適応させるだけでなく、視覚エンコーダを時間文脈のモデル化に活用する時間適応型プロンプトチューニング手法を提案する。さらに、頑健な時空間推論のための立方体フレームアテンション機構をカスタマイズする。さらに,包括的VL理解のための特徴抽出における多段階VL関係モデリングを提案する。大規模な実験により,本手法は最先端のアルゴリズムに対して良好に動作し,強力な一般化能力を示すことが示された。 The crux of Referring Video Object Segmentation (RVOS) lies in modeling dense text-video relations to associate abstract linguistic concepts with dynamic visual contents at pixel-level. Current RVOS methods typically use vision and language models pretrained independently as backbones. As images and texts are mapped to uncoupled feature spaces, they face the arduous task of learning Vision-Language (VL) relation modeling from scratch. Witnessing the success of Vision-Language Pretrained (VLP) models, we propose to learn relation modeling for RVOS based on their aligned VL feature space. Nevertheless, transferring VLP models to RVOS is a deceptively challenging task due to the substantial gap between the pretraining task (static image/region-level prediction) and the RVOS task (dynamic pixel-level prediction). To address this transfer challenge, we introduce a framework named VLP-RVOS which harnesses VLP models for RVOS through temporal-aware adaptation. 
We first propose a temporal-aware prompt-tuning method, which not only adapts pretrained representations for pixel-level prediction but also empowers the vision encoder to model temporal contexts. We further customize a cube-frame attention mechanism for robust spatial-temporal reasoning. Besides, we propose to perform multi-stage VL relation modeling while and after feature extraction for comprehensive VL understanding. Extensive experiments demonstrate that our method performs favorably against state-of-the-art algorithms and exhibits strong generalization abilities.	翻訳日:2024-11-09 02:30:11 公開日:2024-09-22
# データアロケーションによる選択的アノテーション:これらのデータはモデルではなくアノテーションのために専門家にトリアージされるべきである Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model ( http://arxiv.org/abs/2405.12081v2 ) ライセンス: Link先を確認	Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Ido Dagan,	(参考訳) 限られた予算下で高品質なアノテーションを得るために、半自動アノテーション法が一般的に用いられ、データの一部を専門家によって注釈付けされ、残りのデータに対するアノテーションを完成させるためにモデルが訓練される。しかしながら、これらの手法は主に、モデル予測能力(トリアージ・トゥ・ヒューマン・データ)を改善するために専門家アノテーションのための情報的データを選択することに焦点を当て、残りのデータはモデルアノテーション(トリアージ・トゥ・モデル・データ)に無差別に割り当てられている。これはアノテーションの予算配分の非効率につながる可能性がある。モデルが正確にアノテートできる簡単なデータは専門家に不要に割り当てられる可能性があるし、ハードデータはモデルによって誤って分類される可能性があるからだ。その結果、全体的なアノテーションの品質が損なわれる可能性がある。この問題に対処するため、我々はSANTと呼ばれる選択的なアノテーションフレームワークを提案する。提案した誤り認識トリアージと二重み付け機構により、トリアージ・ツー・ヒューマンデータとトリアージ・ツー・モデルデータの両方を効果的に活用する。そのため、情報的あるいはハードなデータは専門家にアノテーションとして割り当てられ、簡単なデータはモデルによって処理される。実験の結果、SANTは他のベースラインを一貫して上回り、専門家とモデルワーカーの両方にデータの適切な割り当てを通じて高品質なアノテーションをもたらすことが示された。我々は、予算制約の中でデータアノテーションに関する先駆的な研究を行い、将来のトリアージベースのアノテーション研究のランドマークを確立します。 To obtain high-quality annotations under limited budget, semi-automatic annotation methods are commonly used, where a portion of the data is annotated by experts and a model is then trained to complete the annotations for the remaining data. However, these methods mainly focus on selecting informative data for expert annotations to improve the model predictive ability (i.e., triage-to-human data), while the rest of the data is indiscriminately assigned to model annotation (i.e., triage-to-model data). This may lead to inefficiencies in budget allocation for annotations, as easy data that the model could accurately annotate may be unnecessarily assigned to the expert, and hard data may be misclassified by the model. As a result, the overall annotation quality may be compromised. To address this issue, we propose a selective annotation framework called SANT. 
It effectively takes advantage of both the triage-to-human and triage-to-model data through the proposed error-aware triage and bi-weighting mechanisms. As such, informative or hard data is assigned to the expert for annotation, while easy data is handled by the model. Experimental results show that SANT consistently outperforms other baselines, leading to higher-quality annotation through its proper allocation of data to both expert and model workers. We provide pioneering work on data annotation within budget constraints, establishing a landmark for future triage-based annotation studies.	翻訳日:2024-11-09 02:30:11 公開日:2024-09-22
# テキストフリーマルチドメイングラフ事前学習:グラフ基礎モデルに向けて Text-Free Multi-domain Graph Pre-training: Toward Graph Foundation Models ( http://arxiv.org/abs/2405.13934v4 ) ライセンス: Link先を確認	Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhang,	(参考訳) さまざまな領域にまたがる幅広いグラフデータに基づいてグラフ基盤モデルをトレーニングすることは可能ですか? この目標への大きなハードルは、異なる領域のグラフがしばしば非常に異なる特性を示すという事実にある。事前トレーニングのためのマルチドメイングラフの統合には、最初はいくつかの取り組みがあったが、主にグラフを整列させるためにテキスト記述に依存しており、そのアプリケーションはテキスト対応グラフに制限されている。さらに、異なるソースドメインが互いに衝突したり干渉したりし、ターゲットドメインとの関係は著しく変化する。これらの問題に対処するため,MDGPTというテキストフリーなマルチドメイングラフ事前学習・適応フレームワークを提案する。まず、シナジスティックな事前学習のために、ソースドメインにまたがる機能を調整するために、一連のドメイントークンを提案する。第2に、統一的なプロンプトと混合プロンプトからなる二重プロンプトを提案し、統合されたマルチドメイン知識とドメイン固有の知識の調整された混合により、ターゲットドメインをさらに適応させる。最後に、6つの公開データセットによる広範な実験を行い、MDGPTを評価し分析する。 Given the ubiquity of graph data, it is intriguing to ask: Is it possible to train a graph foundation model on a broad range of graph data across diverse domains? A major hurdle toward this goal lies in the fact that graphs from different domains often exhibit profoundly divergent characteristics. Although there have been some initial efforts in integrating multi-domain graphs for pre-training, they primarily rely on textual descriptions to align the graphs, limiting their application to text-attributed graphs. Moreover, different source domains may conflict or interfere with each other, and their relevance to the target domain can vary significantly. To address these issues, we propose MDGPT, a text-free Multi-Domain Graph Pre-Training and adaptation framework designed to exploit multi-domain knowledge for graph learning. First, we propose a set of domain tokens to align features across source domains for synergistic pre-training. Second, we propose dual prompts, consisting of a unifying prompt and a mixing prompt, to further adapt the target domain with unified multi-domain knowledge and a tailored mixture of domain-specific knowledge. 
Finally, we conduct extensive experiments involving six public datasets to evaluate and analyze MDGPT, which outperforms prior art by up to 37.9%.	翻訳日:2024-11-09 02:18:45 公開日:2024-09-22
# SF-DQN:Deep Reinforcement Learningのための継承機能を用いた確率的知識伝達 SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning ( http://arxiv.org/abs/2405.15920v2 ) ライセンス: Link先を確認	Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang,	(参考訳) 本稿では、複数のRL問題が異なる報酬関数を持つが、基礎となる遷移力学を共有する転写強化学習(RL)問題を考察する。この設定では、各RL問題(タスク)のQ-関数を後継特徴(SF)と報酬マッピング(前者は遷移ダイナミクスを、後者はタスク固有報酬関数を特徴付ける)に分解することができる。このQ関数分解は、一般化政策改善(GPI)と呼ばれる政策改善演算子と組み合わせて、最適なQ関数を見つける際のサンプルの複雑さを低減し、SF & GPIフレームワークは、Q学習のような従来のRL手法と比較して有望な経験的性能を示す。しかし、その理論的基盤は、特に深層ニューラルネットワーク(SF-DQN)を用いて後継機能を学ぶ際には、ほとんど確立されていない。本稿では,移動RL問題におけるSFs-DQNを用いた証明可能な知識伝達について検討する。 GPIを用いたSF-DQNの証明可能な一般化保証を用いた最初の収束解析を確立する。この理論は、GPI を持つ SF-DQN が、より高速な収束率とより優れた一般化の両面から、ディープQ-ネットワークのような従来の RL アプローチより優れていることを明らかにしている。実および合成RLタスクの数値実験により, SF-DQN & GPIの優れた性能が得られた。 This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when learning the successor features using deep neural networks (SF-DQN). This paper studies the provable knowledge transfer using SFs-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. 
The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.	翻訳日:2024-11-09 02:07:29 公開日:2024-09-22
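The decomposition described in the abstract above, Q(s, a) = ψ(s, a)ᵀw with successor features ψ and task reward weights w, together with generalized policy improvement (act greedily with respect to the maximum Q over stored policies), can be sketched on toy numbers as follows. This uses hand-written tabular features for a single state rather than a deep network, so it is an illustration of the SF and GPI idea and not the SF-DQN algorithm analyzed in the paper. The data layout, function names, and values are hypothetical.

```python
def q_value(psi, w):
    """Q(s, a) as the inner product of a successor feature vector and reward weights."""
    return sum(p * wi for p, wi in zip(psi, w))


def gpi_action(psi_per_policy, w):
    """Generalized policy improvement for one state.

    psi_per_policy[i][a] is the successor feature vector of stored policy i
    for action a in the current state (toy layout, assumed for illustration).
    Returns the greedy action and the per-action maximum Q over policies.
    """
    n_actions = len(psi_per_policy[0])
    best_q = [
        max(q_value(policy[a], w) for policy in psi_per_policy)
        for a in range(n_actions)
    ]
    return best_q.index(max(best_q)), best_q


# Two stored policies, two actions, two-dimensional features (made-up numbers).
psi = [
    [[1.0, 0.0], [0.0, 1.0]],   # policy 0
    [[0.5, 0.5], [0.2, 2.0]],   # policy 1
]
w_new_task = [1.0, 0.5]          # reward weights of the new task
action, q_max = gpi_action(psi, w_new_task)
```

Because only `w_new_task` changes between tasks that share transition dynamics, the stored successor features can be reused as-is, which is the source of the knowledge transfer discussed in the abstract.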
# XRec: 説明可能な推奨のための大規模言語モデル XRec: Large Language Models for Explainable Recommendation ( http://arxiv.org/abs/2406.02377v2 ) ライセンス: Link先を確認	Qiyao Ma, Xubin Ren, Chao Huang,	(参考訳) リコメンダシステムは、ユーザが好みに合わせてパーソナライズされたレコメンデーションを提供することによって、情報の過負荷をナビゲートするのに役立つ。協調フィルタリング(CF)は広く採用されているアプローチであるが、グラフニューラルネットワーク(GNN)や自己教師付き学習(SSL)といった高度な技術は、より良いユーザ表現のためにCFモデルを拡張しているが、推奨項目の説明を提供する能力に欠けることが多い。説明可能なレコメンデーションは、レコメンデーション決定プロセスに対する透明性と洞察を提供することで、ユーザの理解を深めることによって、このギャップに対処することを目的としている。この作業は、LLM(Large Language Models)の言語機能を活用して、説明可能なレコメンデータシステムのバウンダリを押し上げる。我々は、LLMがレコメンデーションシステムにおけるユーザの振る舞いを包括的に説明できるXRecというモデルに依存しないフレームワークを紹介した。協調的な信号の統合と軽量な協調的適応器の設計により、このフレームワークはLLMにユーザとアイテムのインタラクションにおける複雑なパターンを理解し、ユーザの好みをより深く理解する権限を与える。我々はXRecの有効性を実証し、説明可能なレコメンデーションシステムにおけるベースラインアプローチよりも優れた、包括的で意味のある説明を生成する能力を示した。私たちはモデル実装を https://github.com/HKUDS/XRec でオープンソース化しました。 Recommender systems help users navigate information overload by providing personalized recommendations aligned with their preferences. Collaborative Filtering (CF) is a widely adopted approach, but while advanced techniques like graph neural networks (GNNs) and self-supervised learning (SSL) have enhanced CF models for better user representations, they often lack the ability to provide explanations for the recommended items. Explainable recommendations aim to address this gap by offering transparency and insights into the recommendation decision-making process, enhancing users' understanding. This work leverages the language capabilities of Large Language Models (LLMs) to push the boundaries of explainable recommender systems. We introduce a model-agnostic framework called XRec, which enables LLMs to provide comprehensive explanations for user behaviors in recommender systems. By integrating collaborative signals and designing a lightweight collaborative adaptor, the framework empowers LLMs to understand complex patterns in user-item interactions and gain a deeper understanding of user preferences. 
Our extensive experiments demonstrate the effectiveness of XRec, showcasing its ability to generate comprehensive and meaningful explanations that outperform baseline approaches in explainable recommender systems. We open-source our model implementation at https://github.com/HKUDS/XRec.	翻訳日:2024-11-09 01:56:09 公開日:2024-09-22
# 任意下流予測タスクのためのフェアネス最適化合成EHR生成 Fairness-Optimized Synthetic EHR Generation for Arbitrary Downstream Predictive Tasks ( http://arxiv.org/abs/2406.02510v2 ) ライセンス: Link先を確認	Mirza Farhan Bin Tarek, Raphael Poulain, Rahmatollah Beheshti,	(参考訳) 医療アプリケーションのためのAIツールの責任ある設計を保証するためのさまざまな側面の中で、公平性に関する懸念に対処することが、重要な焦点となっている。具体的には、電子健康記録(EHR)データの普及と、幅広い臨床的意思決定支援タスクを通知する大きな可能性を考慮し、このカテゴリの健康AIツールの公平性を向上させることが重要である。このような広範な問題(EHRベースのAIモデルにおける公平性を緩和する)は、様々な手法を用いて取り組まれてきたが、タスクやモデルに依存しない手法は顕著に稀である。本研究では,実データと合成されたERHデータを生成するパイプラインを新たに提示し,実データと組み合わせることで,下流タスクにおける公平性(エンドユーザが定義する)の懸念を軽減することを目的とした。下流タスクと2つの異なるEHRデータセットにまたがるパイプラインの有効性を実証する。提案したパイプラインは、ダウンストリームモデルの設計を変更するような、健康なAIアプリケーションにおける公平性に対処する既存のツールボックスに、広く適用可能な補完ツールを追加することができる。プロジェクトのコードベースはhttps://github.com/healthylaife/FairSynthで公開されています。 Among various aspects of ensuring the responsible design of AI tools for healthcare applications, addressing fairness concerns has been a key focus area. Specifically, given the wide spread of electronic health record (EHR) data and their huge potential to inform a wide range of clinical decision support tasks, improving fairness in this category of health AI tools is of key importance. While such a broad problem (mitigating fairness in EHR-based AI models) has been tackled using various methods, task- and model-agnostic methods are noticeably rare. In this study, we aimed to target this gap by presenting a new pipeline that generates synthetic EHR data, which is not only consistent with (faithful to) the real EHR data but also can reduce the fairness concerns (defined by the end-user) in the downstream tasks, when combined with the real data. We demonstrate the effectiveness of our proposed pipeline across various downstream tasks and two different EHR datasets. Our proposed pipeline can add a widely applicable and complementary tool to the existing toolbox of methods to address fairness in health AI applications, such as those modifying the design of a downstream model. 
The codebase for our project is available at https://github.com/healthylaife/FairSynth	翻訳日:2024-11-09 01:56:09 公開日:2024-09-22
# DICE:数学推論のためのLLMの微調整段階における分布内汚染の検出 DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning ( http://arxiv.org/abs/2406.04197v2 ) ライセンス: Link先を確認	Shangqing Tu, Kejian Zhu, Yushi Bai, Zijun Yao, Lei Hou, Juanzi Li,	(参考訳) 大規模言語モデル(LLM)の進歩は、公開ベンチマークによる評価に依存するが、データ汚染は性能の過大評価をもたらす可能性がある。従来の研究は、トレーニング中にモデルが全く同じデータを見たかどうかを判断することで汚染を検出することに重点を置いていた。さらに、以前の研究では、ベンチマークデータに類似したデータに対するトレーニングでさえパフォーマンスを膨らませること、すなわち \emph{In-distribution contamination} がすでに示されている。本研究では, 分布内汚染がOODベンチマークの性能低下につながることを論じる。そこで本研究では,LLMの内部状態を利用して汚染を特定してから検出する新しい手法であるDICEを提案する。 DICEはまず汚染に対して最も敏感な層を特定し、その層の内部状態に基づいて分類器を訓練する。実験により、DICEは様々なLLMと数学推論データセットにわたって分布内汚染を高い精度で検出できることが示された。また、類似した分布を持つ複数のベンチマーク間で汚染を検出することができる訓練されたDICE検出器の一般化能力を示す。さらに、DICEの予測は、私たちまたは他の組織によって微調整されたLLMの性能と相関し、0.61から0.75の決定係数($R^2$)を達成する。コードとデータはhttps://github.com/THU-KEG/DICEで公開されている。 The advancement of large language models (LLMs) relies on evaluation using public benchmarks, but data contamination can lead to overestimated performance. Previous research focuses on detecting contamination by determining whether the model has seen the exact same data during training. Besides, prior work has already shown that even training on data similar to benchmark data inflates performance, namely \emph{In-distribution contamination}. In this work, we argue that in-distribution contamination can lead to a performance drop on OOD benchmarks. To effectively detect in-distribution contamination, we propose DICE, a novel method that leverages the internal states of LLMs to locate-then-detect the contamination. DICE first identifies the most sensitive layer to contamination, then trains a classifier based on the internal states of that layer. Experiments reveal DICE's high accuracy in detecting in-distribution contamination across various LLMs and math reasoning datasets. 
We also show the generalization capability of the trained DICE detector, which is able to detect contamination across multiple benchmarks with similar distributions. Additionally, we find that DICE's predictions correlate with the performance of LLMs fine-tuned by either us or other organizations, achieving a coefficient of determination ($R^2$) between 0.61 and 0.75. The code and data are available at https://github.com/THU-KEG/DICE.	翻訳日:2024-11-09 01:44:51 公開日:2024-09-22
# 時系列モデルに対する会員推測攻撃 Membership Inference Attacks Against Time-Series Models ( http://arxiv.org/abs/2407.02870v2 ) ライセンス: Link先を確認	Noam Koren, Abigail Goldsteen, Guy Amit, Ariel Farkash,	(参考訳) 個人情報、特に医療分野での時系列データを分析すると、深刻なプライバシー上の懸念が浮かび上がっている。患者からの敏感な健康データは、診断と継続的なケアのための機械学習モデルのトレーニングにしばしば使用される。このようなモデルのプライバシリスクを評価することは、プロダクションでモデルを使用するか、サードパーティと共有するかに関して、知識に富んだ決定を行う上で極めて重要です。メンバーシップ推論攻撃(MIA)はこの種の評価の鍵となる手法であるが、時系列予測モデルは、この文脈では十分に研究されていない。時系列モデルにおける既存のMIA技術について検討し、データの季節性やトレンドに焦点をあてた新機能を紹介する。季節性は多変量フーリエ変換を用いて推定され、低次多項式を用いて傾向を近似する。健康領域のデータセットを用いて,これらの手法を各種時系列モデルに適用した。以上の結果から,これらの新機能はMIAの識別における有効性を高め,医療データアプリケーションにおけるプライバシリスクの理解を向上させることが示唆された。 Analyzing time-series data that contains personal information, particularly in the medical field, presents serious privacy concerns. Sensitive health data from patients is often used to train machine learning models for diagnostics and ongoing care. Assessing the privacy risk of such models is crucial to making knowledgeable decisions on whether to use a model in production or share it with third parties. Membership Inference Attacks (MIA) are a key method for this kind of evaluation, however time-series prediction models have not been thoroughly studied in this context. We explore existing MIA techniques on time-series models, and introduce new features, focusing on the seasonality and trend components of the data. Seasonality is estimated using a multivariate Fourier transform, and a low-degree polynomial is used to approximate trends. We applied these techniques to various types of time-series models, using datasets from the health domain. Our results demonstrate that these new features enhance the effectiveness of MIAs in identifying membership, improving the understanding of privacy risks in medical data applications.	翻訳日:2024-11-09 00:59:29 公開日:2024-09-22
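The abstract above names two feature families, trend (a low-degree polynomial fit) and seasonality (a Fourier transform). The following is only a minimal sketch of that idea, not the authors' implementation: it assumes a univariate series, a degree-1 polynomial, and a plain discrete Fourier transform, whereas the paper reports a multivariate transform. The function names `linear_trend`, `dominant_period`, and `trend_seasonality_features` are hypothetical, and how such features are fed to the attack model is not specified in the abstract.

```python
import cmath
import math


def linear_trend(series):
    """Least-squares fit of a degree-1 polynomial; returns (slope, intercept)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, y_mean - slope * t_mean


def dominant_period(series):
    """Period (in samples) of the strongest non-zero DFT frequency."""
    n = len(series)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        # Naive O(n^2) DFT coefficient at frequency index k.
        coeff = sum(
            y * cmath.exp(-2j * math.pi * k * t / n) for t, y in enumerate(series)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k


def trend_seasonality_features(series):
    """Hypothetical feature extractor: fit the trend, then analyse the residual."""
    slope, intercept = linear_trend(series)
    residual = [y - (slope * t + intercept) for t, y in enumerate(series)]
    return {"slope": slope, "intercept": intercept, "period": dominant_period(residual)}


# Synthetic example: linear trend plus a sine wave with a period of 8 samples.
series = [0.5 * t + math.sin(2 * math.pi * t / 8) for t in range(64)]
features = trend_seasonality_features(series)
```

Removing the trend before the Fourier step keeps the ramp from dominating the low-frequency bins; a real pipeline would likely use an FFT routine and a higher polynomial degree where the data call for it.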
# ゲノム言語モデル:機会と課題 Genomic Language Models: Opportunities and Challenges ( http://arxiv.org/abs/2407.11435v2 ) ライセンス: Link先を確認	Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S. Song,	(参考訳) 大規模言語モデル(LLM)は、幅広い科学分野、特に生物医学分野において、変革的な影響を及ぼしている。自然言語処理の目的が単語の列を理解することにあるように、生物学の主要な目的は生物学的列を理解することである。ゲノム言語モデル(gLM)は、DNA配列に基づいて訓練されたLLMであり、ゲノムの理解を深め、様々なスケールのDNA要素がどのように相互作用して複雑な機能を引き起こすかを示す可能性がある。この可能性を示すために,機能制約予測,シーケンス設計,伝達学習など,gLMの重要応用を強調した。しかし、最近の顕著な進歩にもかかわらず、効率的かつ効率的なgLMの開発は、特に大型で複雑なゲノムを持つ種に対して多くの課題を呈している。本稿では,gLMの開発と評価について論じる。 Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to significantly advance our understanding of genomes and how DNA elements at various scales interact to give rise to complex functions. To showcase this potential, we highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning. Despite notable recent progress, however, developing effective and efficient gLMs presents numerous challenges, especially for species with large, complex genomes. Here, we discuss major considerations for developing and evaluating gLMs.	翻訳日:2024-11-08 21:10:26 公開日:2024-09-22
# 拡散モデルにおけるブレンド概念 How to Blend Concepts in Diffusion Models ( http://arxiv.org/abs/2407.14280v2 ) ライセンス: Link先を確認	Lorenzo Olearo, Giorgio Longari, Simone Melzi, Alessandro Raganato, Rafael Peñaloza,	(参考訳) 過去10年間、多次元(ラテント)空間を使って概念を表現しようとする動きがあったが、それでもこれらの概念や理由をどう操作するかは明らかになっていない。最近の手法では複数の潜在表現とその関連性を利用しており、この研究はさらに絡み合っている。我々のゴールは、潜在空間における操作が根底にある概念にどのように影響するかを理解することです。そこで本研究では,拡散モデルを用いた概念ブレンディングの課題について検討する。拡散モデルは、テキストプロンプトの潜時表現と画像再構成と生成を可能にする潜時空間との間の接続に基づいている。このタスクにより、異なるテキストベースの組み合わせ戦略を試すことができ、視覚分析により容易に評価できる。我々の結論は、宇宙操作によるブレンドの概念は可能であるが、最良の戦略はブレンドの文脈に依存する。 For the last decade, there has been a push to use multi-dimensional (latent) spaces to represent concepts; and yet how to manipulate these concepts or reason with them remains largely unclear. Some recent methods exploit multiple latent representations and their connection, making this research question even more entangled. Our goal is to understand how operations in the latent space affect the underlying concepts. To that end, we explore the task of concept blending through diffusion models. Diffusion models are based on a connection between a latent representation of textual prompts and a latent space that enables image reconstruction and generation. This task allows us to try different text-based combination strategies, and evaluate easily through a visual analysis. Our conclusion is that concept blending through space manipulation is possible, although the best strategy depends on the context of the blend.	翻訳日:2024-11-08 19:38:31 公開日:2024-09-22
# グローバルStructure-from-Motionの再考 Global Structure-from-Motion Revisited ( http://arxiv.org/abs/2407.20219v2 ) ライセンス: Link先を確認	Linfei Pan, Dániel Baráth, Marc Pollefeys, Johannes L. Schönberger,	(参考訳) 画像から3D構造とカメラの動きを復元することは、コンピュータビジョン研究の長年の焦点であり、Structure-from-Motion (SfM)として知られている。この問題に対する解決策は、漸進的およびグローバルなアプローチに分類される。これまでのところ、最もポピュラーなシステムは精度と堅牢性に優れる漸進的なパラダイムを踏襲しているが、グローバルなアプローチは劇的にスケーラブルで効率的である。本研究は,グローバルSfMの問題を再考し,グローバルSfMにおける最先端技術を上回る新しい汎用システムとしてGLOMAPを提案する。精度とロバスト性の観点では、最も広く使われている増分SfMであるCOLMAPと同等以上の結果を、桁違いに高速に達成する。当社のシステムは,https://github.com/colmap/glomap でオープンソース実装として公開しています。 Recovering 3D structure and camera motion from images has been a long-standing focus of computer vision research and is known as Structure-from-Motion (SfM). Solutions to this problem are categorized into incremental and global approaches. Until now, the most popular systems follow the incremental paradigm due to its superior accuracy and robustness, while global approaches are drastically more scalable and efficient. With this work, we revisit the problem of global SfM and propose GLOMAP as a new general-purpose system that outperforms the state of the art in global SfM. In terms of accuracy and robustness, we achieve results on par with or superior to COLMAP, the most widely used incremental SfM, while being orders of magnitude faster. We share our system as an open-source implementation at https://github.com/colmap/glomap.	翻訳日:2024-11-08 14:05:01 公開日:2024-09-22
# 機能的MRI理解のための階層型量子制御ゲート Hierarchical Quantum Control Gates for Functional MRI Understanding ( http://arxiv.org/abs/2408.03596v3 ) ライセンス: Link先を確認	Xuan-Bac Nguyen, Hoang-Quan Nguyen, Hugh Churchill, Samee U. Khan, Khoa Luu,	(参考訳) 量子コンピューティングは、古典的コンピュータ、特に暗号、最適化、ニューロコンピューティングといった一般的な分野において、難解な複雑な問題を解決する強力なツールとして登場した。本稿では,fMRI(Functional Magnetic Resonance Imaging)データを効率的に理解するために,HQCG(Hierarchical Quantum Control Gates)法という新しい量子ベース手法を提案する。このアプローチには、それぞれfMRI信号の局所的特徴とグローバルな特徴を抽出するために設計されたローカル量子制御ゲート(LQCG)とグローバル量子制御ゲート(GQCG)の2つの新しいモジュールが含まれている。提案手法は,量子マシン上でエンドツーエンドで動作し,量子力学を利用して,古典コンピュータの課題である30,000サンプルなどの超高次元fMRI信号のパターンを学習する。実験結果から,本手法は古典的手法よりも有意に優れていることが示された。さらに、提案した量子モデルは古典的手法よりも安定性が高く、過度に適合する傾向が低いことが判明した。 Quantum computing has emerged as a powerful tool for solving complex problems intractable for classical computers, particularly in popular fields such as cryptography, optimization, and neurocomputing. In this paper, we present a new quantum-based approach named the Hierarchical Quantum Control Gates (HQCG) method for efficient understanding of Functional Magnetic Resonance Imaging (fMRI) data. This approach includes two novel modules: the Local Quantum Control Gate (LQCG) and the Global Quantum Control Gate (GQCG), which are designed to extract local and global features of fMRI signals, respectively. Our method operates end-to-end on a quantum machine, leveraging quantum mechanics to learn patterns within extremely high-dimensional fMRI signals, such as 30,000 samples which is a challenge for classical computers. Empirical results demonstrate that our approach significantly outperforms classical methods. Additionally, we found that the proposed quantum model is more stable and less prone to overfitting than the classical methods.	翻訳日:2024-11-08 12:33:46 公開日:2024-09-22
# 境界駆動量子系の定常状態:いくつかの正確な結果 Stationary states of boundary driven quantum systems: some exact results ( http://arxiv.org/abs/2408.06887v2 ) ライセンス: Link先を確認	Eric A. Carlen, David A. Huse, Joel L. Lebowitz,	(参考訳) 密度行列がリンドブラディアン $\dot{\rho}=-i[H,\rho]+{\mathcal D}\rho$ を介して進化する有限次元開量子系について検討する。ここで、$H$は孤立系のハミルトニアンであり、${\mathcal D}$は散逸子である。系が「境界」$A$と「バルク」$B$の2つの部分からなり、${\mathcal D}$が$A$にのみ作用する場合を考える。すなわち ${\mathcal D}={\mathcal D}_A\otimes{\mathcal I}_B$ であり、${\mathcal D}_A$は部分$A$にのみ作用し、${\mathcal I}_B$は部分$B$上の恒等超作用素である。${\mathcal D}_A$ をエルゴード的とする。すなわち ${\mathcal D}_A\hat{\rho}_A=0$ となる密度行列 $\hat{\rho}_A$ はただ1つである。このとき、$H$ と可換な全系上の任意の定常密度行列 $\bar{\rho}$ は、ある$\rho_B$ に対して積形式 $\bar{\rho}=\hat{\rho}_A\otimes\rho_B$ でなければならないことを示す。これにより、部分$A$と$B$の間に相互作用がない場合を除き、$\beta\neq 0$ のギブス測度 $\rho_\beta\sim e^{-\beta H}$ を定常状態として持つ ${\mathcal D}_A$ は存在しないことがわかる。 $A$ と $B$ の間に相互作用を持つ系に対して、定常状態 $\bar{\rho}$ の一意性の基準を与える。非エルゴードの場合の関連結果についても論じる。 We study finite-dimensional open quantum systems whose density matrix evolves via a Lindbladian, $\dot{\rho}=-i[H,\rho]+{\mathcal D}\rho$. Here $H$ is the Hamiltonian of the isolated system and ${\mathcal D}$ is the dissipator. We consider the case where the system consists of two parts, the ``boundary'' $A$ and the ``bulk'' $B$, and ${\mathcal D}$ acts only on $A$, so ${\mathcal D}={\mathcal D}_A\otimes{\mathcal I}_B$, where ${\mathcal D}_A$ acts only on part $A$, while ${\mathcal I}_B$ is the identity superoperator on part $B$. Let ${\mathcal D}_A$ be ergodic, so ${\mathcal D}_A\hat{\rho}_A=0$ only for one unique density matrix $\hat{\rho}_A$. We show that any stationary density matrix $\bar{\rho}$ on the full system which commutes with $H$ must be of the product form $\bar{\rho}=\hat{\rho}_A\otimes\rho_B$ for some $\rho_B$. This rules out finding any ${\mathcal D}_A$ that has the Gibbs measure $\rho_\beta\sim e^{-\beta H}$ as a stationary state with $\beta\neq 0$, unless there is no interaction between parts $A$ and $B$. We give criteria for the uniqueness of the stationary state $\bar{\rho}$ for systems with interactions between $A$ and $B$. 
Related results for non-ergodic cases are also discussed.	翻訳日:2024-11-08 07:53:35 公開日:2024-09-22
# 大規模言語モデルによるソースコードの品質評価 : 比較研究 Evaluating Source Code Quality with Large Languagem Models: a comparative study ( http://arxiv.org/abs/2408.07082v2 ) ライセンス: Link先を確認	Igor Regis da Silva Simões, Elaine Venson,	(参考訳) コード品質は、複雑さ、可読性、テスト容易性、相互運用性、再利用可能性、良いプラクティスや悪いプラクティスの使用など、さまざまなメトリクスで構成されている属性です。静的コード解析ツールは、コード品質を評価するための属性のセットを測定することを目的としている。しかしながら、いくつかの品質特性は、コードレビューアクティビティにおいて人間によってのみ測定され、可読性はその例です。自然言語のテキスト処理能力を考えると、LLM(Large Language Model)がコードの品質を評価することができると仮定する。本稿では,LLMを静的解析ツールとして使用して得られた結果を記述し,解析し,コード全体の品質を評価することを目的とする。 GPT 3.5 Turbo と GPT 4o の2つのバージョンを比較し,総計1,641 のクラスを解析した。 GPT 3.5 Turbo LLMにはコード品質を評価する能力があり,Sonarのメトリクスと相関関係があることを実証した。しかし、LSMがSonarQubeと異なる具体的な側面がある。 GPT 4o版では、低品質と評価されたコードに高い分類を割り当てることで、以前のモデルとSonarから切り離された結果が示されなかった。本研究では,LLMによるコード品質評価の可能性を示す。しかし, LLMのコスト, 出力のばらつき, 従来の静的解析ツールでは測定されない品質特性の探索など, さらなる研究が必要である。 Code quality is an attribute composed of various metrics, such as complexity, readability, testability, interoperability, reusability, and the use of good or bad practices, among others. Static code analysis tools aim to measure a set of attributes to assess code quality. However, some quality attributes can only be measured by humans in code review activities, readability being an example. Given their natural language text processing capability, we hypothesize that a Large Language Model (LLM) could evaluate the quality of code, including attributes currently not automatable. This paper aims to describe and analyze the results obtained using LLMs as a static analysis tool, evaluating the overall quality of code. We compared the LLM with the results obtained with the SonarQube software and its Maintainability metric for two Open Source Software (OSS) Java projects, one with Maintainability Rating A and the other B. A total of 1,641 classes were analyzed, comparing the results in two versions of models: GPT 3.5 Turbo and GPT 4o. 
We demonstrated that the GPT 3.5 Turbo LLM has the ability to evaluate code quality, showing a correlation with Sonar's metrics. However, there are specific aspects that differ in what the LLM measures compared to SonarQube. The GPT 4o version did not present the same results, diverging from the previous model and Sonar by assigning a high classification to code that was assessed as lower quality. This study demonstrates the potential of LLMs in evaluating code quality. However, further research is necessary to investigate limitations such as LLM cost and output variability, and to explore quality characteristics not measured by traditional static analysis tools.	翻訳日:2024-11-08 07:53:35 公開日:2024-09-22
# ソリューション設計による自動プログラム修復の強化 Enhancing Automated Program Repair with Solution Design ( http://arxiv.org/abs/2408.12056v2 ) ライセンス: Link先を確認	Jiuang Zhao, Donghao Yang, Li Zhang, Xiaoli Lian, Zitian Yang, Fang Liu,	(参考訳) 自動プログラム修正(APR)は、バグ解決、新機能開発、機能強化の3つのカテゴリを含む、特定のプロジェクト内の問題を自律的に修正する試みである。様々な方法論を提唱する広範な研究にもかかわらず、実際の問題に対処する効果は相変わらず不十分である。一般的に、エンジニアは、ソリューション計画のソリューションと基本的な理由のセットについて、設計の合理性(DR)を持っています。オープンソースプロジェクトでは、これらのDRはJiraのようなプロジェクト管理ツールを通じて、イシューログにキャプチャされることが多い。問題ログに散在するDRを活用して、APRを効率的に拡張するにはどうすればよいのか? DRCodePilot は GPT-4-Turbo の APR 機能を強化し,DR をプロンプト命令に組み込む手法である。さらに, GPT-4のプロジェクトコンテキストを十分に把握する上での制約や, 正確な識別子を生成する上での欠点を考慮し, フィードバックに基づく自己回帰フレームワークを考案し, 提案したパッチや提案した識別子を参照して, GPT-4のアウトプットを再検討し, 改善するよう促した。 GitHubとJiraにホストされている2つのオープンソースリポジトリからソースされた938のイシューパッチペアからなるベンチマークを確立しました。 DRCodePilotはGPT-4を直接利用するよりも4.7倍高いフルマッチ比を達成しています。さらに、CodeBLEUスコアも有望な拡張を示している。さらに,本研究では, DRのスタンドアロン適用により, ベンチマークスイート内でのCodeLlama, GPT-3.5, GPT-4間のフルマッチ比が向上する可能性が示唆された。我々は、DRCodePilotイニシアチブが、APRの分野を前進させる新しい人道となると信じている。 Automatic Program Repair (APR) endeavors to autonomously rectify issues within specific projects, which generally encompasses three categories of tasks: bug resolution, new feature development, and feature enhancement. Despite extensive research proposing various methodologies, their efficacy in addressing real issues remains unsatisfactory. It's worth noting that, typically, engineers have design rationales (DR) on solution-planed solutions and a set of underlying reasons-before they start patching code. In open-source projects, these DRs are frequently captured in issue logs through project management tools like Jira. This raises a compelling question: How can we leverage DR scattered across the issue logs to efficiently enhance APR? To investigate this premise, we introduce DRCodePilot, an approach designed to augment GPT-4-Turbo's APR capabilities by incorporating DR into the prompt instruction. 
Furthermore, given GPT-4's constraints in fully grasping the broader project context and occasional shortcomings in generating precise identifiers, we have devised a feedback-based self-reflective framework, in which we prompt GPT-4 to reconsider and refine its outputs by referencing a provided patch and suggested identifiers. We have established a benchmark comprising 938 issue-patch pairs sourced from two open-source repositories hosted on GitHub and Jira. Our experimental results are impressive: DRCodePilot achieves a full-match ratio that is a remarkable 4.7x higher than when GPT-4 is utilized directly. Additionally, the CodeBLEU scores also exhibit promising enhancements. Moreover, our findings reveal that the standalone application of DR can yield a promising increase in the full-match ratio across CodeLlama, GPT-3.5, and GPT-4 within our benchmark suite. We believe that our DRCodePilot initiative heralds a novel human-in-the-loop avenue for advancing the field of APR.	翻訳日:2024-11-08 05:49:00 公開日:2024-09-22
# パラメタライズドシンボリック抽象グラフによるワンショット映像の模倣 One-shot Video Imitation via Parameterized Symbolic Abstraction Graphs ( http://arxiv.org/abs/2408.12674v2 ) ライセンス: Link先を確認	Jianren Wang, Kangni Liu, Dingkun Guo, Xian Zhou, Christopher G Atkeson,	(参考訳) 動的で変形可能なオブジェクトを単一のデモビデオから操作することを学ぶことは、スケーラビリティという面で大きな約束である。これまでのアプローチでは、オブジェクト関係のリプレイやアクターの軌跡に主に焦点が当てられていた。前者は様々なタスクを一般化するのに苦労するが、後者はデータ非効率に悩まされる。さらに、どちらの手法も、力などの見えない物理的特性を捉える際の課題に直面している。本稿では,ノードがオブジェクトを,エッジがオブジェクト間の関係を表すパラメータ化シンボル抽象グラフ(PSAG)を用いて,ビデオデモを解釈することを提案する。さらに,非幾何学的,視覚的に知覚できない属性を推定するために,シミュレーションによる幾何学的制約を基礎とする。強化PSAGは実際のロボット実験に応用される。我々のアプローチは、Cutting Avocado、Cutting Vegetable、Pouring Liquid、Rolling Dough、Slicing Pizzaといった様々なタスクで検証されている。視覚的・物理的特性の異なる新しい物体への一般化に成功することを示す。 Learning to manipulate dynamic and deformable objects from a single demonstration video holds great promise in terms of scalability. Previous approaches have predominantly focused on either replaying object relationships or actor trajectories. The former often struggles to generalize across diverse tasks, while the latter suffers from data inefficiency. Moreover, both methodologies encounter challenges in capturing invisible physical attributes, such as forces. In this paper, we propose to interpret video demonstrations through Parameterized Symbolic Abstraction Graphs (PSAG), where nodes represent objects and edges denote relationships between objects. We further ground geometric constraints through simulation to estimate non-geometric, visually imperceptible attributes. The augmented PSAG is then applied in real robot experiments. Our approach has been validated across a range of tasks, such as Cutting Avocado, Cutting Vegetable, Pouring Liquid, Rolling Dough, and Slicing Pizza. We demonstrate successful generalization to novel objects with distinct visual and physical properties.	翻訳日:2024-11-08 05:37:29 公開日:2024-09-22
# ADRS-CNet:DNAストレージクラスタリングアルゴリズムのための適応次元削減選択と分類ネットワーク ADRS-CNet: An adaptive dimensionality reduction selection and classification network for DNA storage clustering algorithms ( http://arxiv.org/abs/2408.12751v2 ) ライセンス: Link先を確認	Bowen Liu, Jiankun Li,	(参考訳) DNAストレージ技術は、高いストレージ密度、長期保存、低いメンテナンスコスト、コンパクトサイズのために、大量のデータストレージに対処する新たな可能性を提供します。記憶されている情報の信頼性を向上させるために、ベースエラーと不足するストレージシーケンスは、直面するべき課題である。現在、元のシーケンス情報を可能な限り回復するために、シーケンスシーケンスのクラスタリングと比較が採用されている。それでも、異なる長さのDNA配列を特徴として抽出すると、寸法の呪いが生じる。これを解決するために、PCA、UMAP、t-SNEといった技術は、高次元の特徴を低次元空間に投影するために一般的に用いられる。そこで本研究では,これらの手法が,異なるデータセットを扱う場合の次元削減に様々な効果を示すことを考慮し,入力DNA配列の特徴を分類するための多層パーセプトロンモデルを訓練し,その後のクラスタリング結果を高めるために最適な次元縮小法を適応的に選択することを提案する。オープンソースデータセットのテストや,さまざまなベースライン手法との比較を通じて,本モデルが優れた分類性能を示し,クラスタリング結果を大幅に改善することを示す実験結果が得られた。これにより,クラスタリングモデルに対する次元の呪いの影響を効果的に軽減できることを示す。 DNA storage technology offers new possibilities for addressing massive data storage due to its high storage density, long-term preservation, low maintenance cost, and compact size. To improve the reliability of stored information, base errors and missing storage sequences are challenges that must be faced. Currently, clustering and comparison of sequenced sequences are employed to recover the original sequence information as much as possible. Nonetheless, extracting DNA sequences of different lengths as features leads to the curse of dimensionality, which needs to be overcome. To address this, techniques like PCA, UMAP, and t-SNE are commonly employed to project high-dimensional features into low-dimensional space. Considering that these methods exhibit varying effectiveness in dimensionality reduction when dealing with different datasets, this paper proposes training a multilayer perceptron model to classify input DNA sequence features and adaptively select the most suitable dimensionality reduction method to enhance subsequent clustering results. 
Through testing on open-source datasets and comparing our approach with various baseline methods, experimental results demonstrate that our model exhibits superior classification performance and significantly improves clustering outcomes. This shows that our approach effectively mitigates the impact of the curse of dimensionality on clustering models.	翻訳日:2024-11-08 05:37:29 公開日:2024-09-22
# Zeoformer: OSDA-Zeolite親和性予測のための粗粒周期グラフ変換器 Zeoformer: Coarse-Grained Periodic Graph Transformer for OSDA-Zeolite Affinity Prediction ( http://arxiv.org/abs/2408.12984v3 ) ライセンス: Link先を確認	Xiangxiang Shen, Zheng Wan, Lingfeng Wen, Licheng Sun, Ou Yang Ming Jie, Xuan Tang, Xian Zeng, Mingsong Chen, Xiao He, Xian Wei,	(参考訳) 国際ゼオライト協会構造委員会(IZA-SC)はこれまでに255の異なるゼオライト構造をカタログ化しており、数百万もの理論上可能な構造がまだ発見されていない。特定のゼオライトの合成は、主にOSDAとゼオライトの親和性によって決定されるため、有機構造誘導剤(OSDA)の使用を必要とする。したがって、最も親和性が高いOSDA-ゼオライトペアが標的ゼオライトの合成の鍵となる。しかし、OSDA-ゼオライト対はしばしば複雑な幾何学構造、すなわち多数の原子によって形成される複雑な結晶構造を示す。既存の機械学習手法では結晶の周期性を表現できるが、局所的な可変性を持つ結晶構造を正確に表現することはできない。この問題に対処するため,Zeoformerという,粗粒度結晶周期性と粒度局所変動性を効果的に表現する手法を提案する。ゼオフォーマーは各原子を中心に単位細胞を再構成し、この中心原子と他の原子との対距離を再構成された単位細胞内に符号化する。再構成ユニットセル内の対距離の導入は、ユニットセルの全体構造と異なるユニットセルの違いをより効果的に表現し、OSDA-ゼオライト対と一般的な結晶構造の性質をより正確に効率的に予測することができる。総合評価により,OSDA-ゼオライトペアデータセットと2種類の結晶材料データセットで最高の性能を示す。 To date, the International Zeolite Association Structure Commission (IZA-SC) has cataloged merely 255 distinct zeolite structures, with millions of theoretically possible structures yet to be discovered. The synthesis of a specific zeolite typically necessitates the use of an organic structure-directing agent (OSDA), since the selectivity for a particular zeolite is largely determined by the affinity between the OSDA and the zeolite. Therefore, finding the best affinity OSDA-zeolite pair is the key to the synthesis of targeted zeolite. However, OSDA-zeolite pairs frequently exhibit complex geometric structures, i.e., a complex crystal structure formed by a large number of atoms. Although some existing machine learning methods can represent the periodicity of crystals, they cannot accurately represent crystal structures with local variability. To address this issue, we propose a novel approach called Zeoformer, which can effectively represent coarse-grained crystal periodicity and fine-grained local variability. 
Zeoformer reconstructs the unit cell centered around each atom and encodes the pairwise distances between this central atom and other atoms within the reconstructed unit cell. The introduction of pairwise distances within the reconstructed unit cell more effectively represents the overall structure of the unit cell and the differences between different unit cells, enabling the model to more accurately and efficiently predict the properties of OSDA-zeolite pairs and general crystal structures. Through comprehensive evaluation, our Zeoformer model demonstrates the best performance on OSDA-zeolite pair datasets and two types of crystal material datasets.	翻訳日:2024-11-08 05:26:28 公開日:2024-09-22
# CorefUDにおけるマルチリンガル干渉分解能向上のための多重戦略の探索 Exploring Multiple Strategies to Improve Multilingual Coreference Resolution in CorefUD ( http://arxiv.org/abs/2408.16893v2 ) ライセンス: Link先を確認	Ondřej Pražák, Miloslav Konopík,	(参考訳) テキスト中の同じエンティティを参照する式を識別するタスクである参照解決は、さまざまな自然言語処理(NLP)アプリケーションにおいて重要なコンポーネントである。本稿では,12言語にまたがる17のデータセットにまたがるCorefUD 1.1データセットを用いて,エンドツーエンドのニューラルコア参照解決システムを提案する。我々のモデルは、エンドツーエンドのニューラルコア参照解決システムに基づいている。まず、単言語と言語間変異を含む強力なベースラインモデルを構築し、その後、多様な言語文脈における性能向上のためのいくつかの拡張を提案する。これらの拡張には、言語間のトレーニング、構文情報の取り込み、最適化された単語予測のためのSpan2Headモデル、高度なシングルトンモデリングが含まれる。また,重なり合うセグメントによる単語スパン表現と長文書モデリングについても実験を行った。提案された拡張、特にヘッドオンリーのアプローチ、シングルトンモデリング、長いドキュメント予測は、ほとんどのデータセットのパフォーマンスを大幅に改善した。また、ゼロショット言語間実験を行い、コア参照分解における言語間移動の可能性と限界を強調した。本研究は,マルチリンガル・コアス・リゾリューションのための堅牢でスケーラブルなコアス・システムの開発に寄与する。最後に、CorefUD 1.1テストセット上でのモデルの評価を行い、CRAC 2023共有タスクの最良のモデルよりも大きなマージンで比較した。私たちのモデルはGitHubで利用可能です。 Coreference resolution, the task of identifying expressions in text that refer to the same entity, is a critical component in various natural language processing (NLP) applications. This paper presents our end-to-end neural coreference resolution system, utilizing the CorefUD 1.1 dataset, which spans 17 datasets across 12 languages. Our model is based on the end-to-end neural coreference resolution system. We first establish strong baseline models, including monolingual and cross-lingual variations, and then propose several extensions to enhance performance across diverse linguistic contexts. These extensions include cross-lingual training, incorporation of syntactic information, a Span2Head model for optimized headword prediction, and advanced singleton modeling. We also experiment with headword span representation and long-document modeling through overlapping segments. The proposed extensions, particularly the heads-only approach, singleton modeling, and long document prediction, significantly improve performance across most datasets. 
We also perform zero-shot cross-lingual experiments, highlighting the potential and limitations of cross-lingual transfer in coreference resolution. Our findings contribute to the development of robust and scalable coreference systems for multilingual coreference resolution. Finally, we evaluate our model on the CorefUD 1.1 test set and surpass the best model from the CRAC 2023 shared task of comparable size by a large margin. Our model is available on GitHub: https://github.com/ondfa/coref-multiling	翻訳日:2024-11-08 04:08:49 公開日:2024-09-22
# ICの認証・追跡のための永続的階層型ブルームフィルタベースフレームワーク A Persistent Hierarchical Bloom Filter-based Framework for Authentication and Tracking of ICs ( http://arxiv.org/abs/2408.16950v2 ) ライセンス: Link先を確認	Fairuz Shadmani Shishir, Md Mashfiq Rizvee, Tanvir Hossain, Tamzidul Hoque, Domenic Forte, Sumaiya Shomaji,	(参考訳) 信頼できないサプライチェーンにおける偽造集積回路(IC)の検出には、堅牢な追跡と認証が必要である。 Physical Unclonable Function (PUF) はユニークなIC識別子を提供するが、ノイズはそれらの実用性を損なう。本研究では,PHBF(Persistent Hierarchical Bloom Filter)フレームワークを導入し,ノイズのあるPUF生成シグネチャであっても,サプライチェーン全体で100%の精度で高速かつ正確なIC認証を実現する。 Detecting counterfeit integrated circuits (ICs) in unreliable supply chains demands robust tracking and authentication. Physical Unclonable Functions (PUFs) offer unique IC identifiers, but noise undermines their utility. This study introduces the Persistent Hierarchical Bloom Filter (PHBF) framework, ensuring swift and accurate IC authentication with an accuracy rate of 100% across the supply chain even with noisy PUF-generated signatures.	翻訳日:2024-11-08 04:08:49 公開日:2024-09-22
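For readers unfamiliar with the building block named in the abstract above, here is a minimal sketch of a plain Bloom filter. It is not the PHBF framework: the persistence, the hierarchy, and the handling of noisy PUF signatures are not described in the abstract and are not reproduced here. The class name, the parameters `num_bits` and `num_hashes`, the salted SHA-256 hashing scheme, and the chip signature strings are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly registered"; False means "definitely not".
        return all(self.bits[pos] for pos in self._positions(item))


registry = BloomFilter()
signatures = [b"chip-0001-signature", b"chip-0002-signature"]  # hypothetical IDs
for sig in signatures:
    registry.add(sig)
```

A plain filter like this only matches byte-identical signatures, so a single flipped bit in a noisy PUF response would miss; tolerating that noise is the problem the paper reports addressing.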
# マルコフ連鎖変動推定法 : 確率近似法 Markov Chain Variance Estimation: A Stochastic Approximation Approach ( http://arxiv.org/abs/2409.05733v2 ) ライセンス: Link先を確認	Shubhada Agrawal, Prashanth L. A., Siva Theja Maguluri,	(参考訳) マルコフ連鎖上で定義される関数の漸近的分散を推定する問題は、定常平均の統計的推測の重要なステップである。我々は各ステップで$O(1)$計算を必要とする新しい再帰的推定器を設計し、履歴サンプルやラン長に関する事前知識を保存する必要がなく、証明可能な有限標本保証付き平均二乗誤差(MSE)に対する最適$O(\frac{1}{n})$収束率を有する。ここで、$n$は生成されたサンプルの総数を指す。我々の推定子は、ポアソン方程式の解の項による漸近分散の等価な定式化の線形確率近似に基づいている。我々は,ベクトル値関数の共分散行列の推定,マルコフ鎖の定常分散の推定,および基礎となるマルコフ鎖の状態空間が大きくなるような条件下での漸近分散の推定など,いくつかの方向の近似器を一般化する。また, 平均報酬強化学習(RL)における推定器の応用について述べる。この文脈でポリシー評価に適した時間差型アルゴリズムを設計する。表型および線形関数近似の設定について検討する。我々の研究は、分散制約付きRLのためのアクター・クリティカルなスタイルのアルゴリズムを開発するための道を開いた。 We consider the problem of estimating the asymptotic variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean. We design a novel recursive estimator that requires $O(1)$ computation at each step, does not require storing any historical samples or any prior knowledge of run-length, and has optimal $O(\frac{1}{n})$ rate of convergence for the mean-squared error (MSE) with provable finite sample guarantees. Here, $n$ refers to the total number of samples generated. Our estimator is based on linear stochastic approximation of an equivalent formulation of the asymptotic variance in terms of the solution of the Poisson equation. We generalize our estimator in several directions, including estimating the covariance matrix for vector-valued functions, estimating the stationary variance of a Markov chain, and approximately estimating the asymptotic variance in settings where the state space of the underlying Markov chain is large. We also show applications of our estimator in average reward reinforcement learning (RL), where we work with asymptotic variance as a risk measure to model safety-critical applications. We design a temporal-difference type algorithm tailored for policy evaluation in this context. 
We consider both the tabular and linear function approximation settings. Our work paves the way for developing actor-critic style algorithms for variance-constrained RL.	翻訳日:2024-11-07 22:27:40 公開日:2024-09-22
# 離散音声ユニットの完全性の推定 Estimating the Completeness of Discrete Speech Units ( http://arxiv.org/abs/2409.06109v2 ) ライセンス: Link先を確認	Sung-Lin Yeh, Hao Tang,	(参考訳) 離散単位による音声表現は音声コーデックや音声生成に広く用いられている。しかし、k-meansで音声情報や話者情報を混同したり、k-means以降の情報損失を仮定したりするなど、自己管理された離散単位に関する不確実な主張がいくつかある。本研究では,情報理論の観点を用いて,情報量(情報完全性)と情報量(情報アクセシビリティ)(情報アクセシビリティ)を,残差ベクトル量子化前後に求める。残差ベクトル量子化後の離散化HuBERT表現に対して,情報完全性と推定完全性に対する低い境界を示す。我々は,HuBERT離散単位には話者情報が十分に存在しており,残音には音声情報が十分存在しており,ベクトル量子化が絡み合っていないことを示す。この結果から, 離散単位の選択に関する総合的な評価が得られ, 残余の情報は廃棄されるよりも多く掘り下げるべきであることが示唆された。 Representing speech with discrete units has been widely used in speech codec and speech generation. However, there are several unverified claims about self-supervised discrete units, such as disentangling phonetic and speaker information with k-means, or assuming information loss after k-means. In this work, we take an information-theoretic perspective to answer how much information is present (information completeness) and how much information is accessible (information accessibility), before and after residual vector quantization. We show a lower bound for information completeness and estimate completeness on discretized HuBERT representations after residual vector quantization. We find that speaker information is sufficiently present in HuBERT discrete units, and that phonetic information is sufficiently present in the residual, showing that vector quantization does not achieve disentanglement. Our results offer a comprehensive assessment on the choice of discrete units, and suggest that a lot more information in the residual should be mined rather than discarded.	翻訳日:2024-11-07 22:16:23 公開日:2024-09-22
# 変態等価性を用いたチャネル拡張による半監督型3次元物体検出 Semi-Supervised 3D Object Detection with Channel Augmentation using Transformation Equivariance ( http://arxiv.org/abs/2409.06583v2 ) ライセンス: Link先を確認	Minju Kang, Taehun Kong, Tae-Kyun Kim,	(参考訳) 正確な3Dオブジェクト検出は、自動運転車やロボットにとって、安全かつ効果的に環境をナビゲートし、対話する上で不可欠である。一方、3D検出器の性能は高価であるデータサイズとアノテーションに依存している。その結果,ラベル付きデータによるトレーニングの需要が高まっている。本稿では,3次元半教師対象検出のためのチャネル拡張を用いた新しい教師学生フレームワークについて検討する。教師の学生SSLは、教師と生徒にそれぞれ弱い増補と強い増補を採用するのが一般的である。本研究では、変換等分散検出器(TED)を用いて、両方のネットワークに多重チャネル拡張を適用する。 TEDにより、点雲上の拡張の異なる組み合わせを探索し、マルチチャネル変換等式を効率的に集約することができる。原則として、教師ネットワークに固定チャネル拡張を適用することにより、学生は信頼できる擬似ラベルで安定的に訓練することができる。強力なチャネル拡張を採用することで、データの多様性を強化し、変換に対する堅牢性を高め、学生ネットワークの一般化性能を向上させることができる。我々はSOTA階層的監視をベースラインとして使用し、その二重閾値をTEDに適応させ、これはチャネルIoU整合性と呼ばれる。提案手法をKITTIデータセットを用いて評価し,SOTA3D半教師付き物体検出モデルを上回る性能向上を実現した。 Accurate 3D object detection is crucial for autonomous vehicles and robots to navigate and interact with the environment safely and effectively. Meanwhile, the performance of 3D detector relies on the data size and annotation which is expensive. Consequently, the demand of training with limited labeled data is growing. We explore a novel teacher-student framework employing channel augmentation for 3D semi-supervised object detection. The teacher-student SSL typically adopts a weak augmentation and strong augmentation to teacher and student, respectively. In this work, we apply multiple channel augmentations to both networks using the transformation equivariance detector (TED). The TED allows us to explore different combinations of augmentation on point clouds and efficiently aggregates multi-channel transformation equivariance features. In principle, by adopting fixed channel augmentations for the teacher network, the student can train stably on reliable pseudo-labels. Adopting strong channel augmentations can enrich the diversity of data, fostering robustness to transformations and enhancing generalization performance of the student network. 
We use state-of-the-art (SOTA) hierarchical supervision as a baseline and adapt its dual-threshold scheme to TED, which we call channel IoU consistency. We evaluate our method on the KITTI dataset and achieve a significant performance gain, surpassing SOTA 3D semi-supervised object detection models.	翻訳日:2024-11-07 22:05:05 公開日:2024-09-22
# コンピュータビジョンにおける倫理的課題: 公開データセットにおけるプライバシの確保とバイアスの緩和 Ethical Challenges in Computer Vision: Ensuring Privacy and Mitigating Bias in Publicly Available Datasets ( http://arxiv.org/abs/2409.10533v3 ) ライセンス: Link先を確認	Ghalib Ahmed Tahir,	(参考訳) 本稿では,コンピュータビジョン技術の創造と展開に関する倫理的問題,特に公開データセットの利用に関して,光を当てることを目的としている。機械学習と人工知能の急速な成長により、コンピュータビジョンは医療、セキュリティシステム、貿易など多くの産業において重要なツールとなっている。しかし、その影響についての情報的な議論により、同意なく収集されることが多い視覚的データの広範な使用は、プライバシーと偏見に関する重大な懸念を提起する。また、コンピュータビジョンモデルのトレーニングに通常使用されるCOCO、LFW、ImageNet、CelebA、PASCAL VOCなどの一般的なデータセットを分析して、これらの問題についても検討する。我々は、個人の権利の保護、バイアスの最小化、開放性と責任に関するこれらの課題に対処する包括的な倫理的枠組みを提供する。我々は、社会的な価値と倫理的基準を考慮に入れたAI開発を奨励し、公共の害を避けることを目指している。 This paper aims to shed light on the ethical problems of creating and deploying computer vision tech, particularly in using publicly available datasets. Due to the rapid growth of machine learning and artificial intelligence, computer vision has become a vital tool in many industries, including medical care, security systems, and trade. However, extensive use of visual data that is often collected without consent due to an informed discussion of its ramifications raises significant concerns about privacy and bias. The paper also examines these issues by analyzing popular datasets such as COCO, LFW, ImageNet, CelebA, PASCAL VOC, etc., that are usually used for training computer vision models. We offer a comprehensive ethical framework that addresses these challenges regarding the protection of individual rights, minimization of bias as well as openness and responsibility. We aim to encourage AI development that will take into account societal values as well as ethical standards to avoid any public harm.	翻訳日:2024-11-07 20:35:12 公開日:2024-09-22
# Green Federated Learning: Green Aware AIの新しい時代 Green Federated Learning: A new era of Green Aware AI ( http://arxiv.org/abs/2409.12626v2 ) ライセンス: Link先を確認	Dipanwita Thakur, Antonella Guzzo, Giancarlo Fortino, Francesco Piccialli,	(参考訳) AIアプリケーション、特に大規模無線ネットワークにおける開発は、使用されるアーキテクチャのサイズと複雑さとともに指数関数的に増加している。特に、機械学習は、今日の最もエネルギー集約的な計算応用の1つとして認められており、次世代インテリジェントシステムの環境持続可能性に重大な課題を提起している。環境の持続可能性を達成するには、すべてのAIアルゴリズムが持続可能性を考慮して設計され、アーキテクチャフェーズ以降の緑の考慮事項を統合する必要がある。最近、フェデレートラーニング(FL)は、その分散した性質から、このニーズに対処する新たな機会を提示している。したがって、最近のFLの進歩と持続可能性への影響から生じる可能性と課題を解明することが不可欠である。さらに、グリーンアウェアなAIアルゴリズムの既存の取り組みとギャップをナビゲートし、理解するためのロードマップを研究者、ステークホルダ、関心のある関係者に提供することが重要です。この調査は主に、100を超えるFL作品を特定し分析し、持続可能な環境のためのグリーンアウェアな人工知能への貢献を評価し、IoT研究に特に焦点をあてることによって、この目標を達成することを目的としている。エネルギー効率の観点から、グリーンフェデレーション学習の現在の課題を掘り下げ、グリーンIoTアプリケーション研究の潜在的な課題と今後の展望について論じている。 The development of AI applications, especially in large-scale wireless networks, is growing exponentially, alongside the size and complexity of the architectures used. Particularly, machine learning is acknowledged as one of today's most energy-intensive computational applications, posing a significant challenge to the environmental sustainability of next-generation intelligent systems. Achieving environmental sustainability entails ensuring that every AI algorithm is designed with sustainability in mind, integrating green considerations from the architectural phase onwards. Recently, Federated Learning (FL), with its distributed nature, presents new opportunities to address this need. Hence, it's imperative to elucidate the potential and challenges stemming from recent FL advancements and their implications for sustainability. Moreover, it's crucial to furnish researchers, stakeholders, and interested parties with a roadmap to navigate and understand existing efforts and gaps in green-aware AI algorithms. 
This survey primarily aims to achieve this objective by identifying and analyzing over a hundred FL works, assessing their contributions to green-aware artificial intelligence for sustainable environments, with a specific focus on IoT research. It delves into current issues in green federated learning from an energy-efficient standpoint, discussing potential challenges and future prospects for green IoT application research.	翻訳日:2024-11-07 14:08:12 公開日:2024-09-22
# 自動運転における半監督セマンティックセマンティックセグメンテーションのための小型擬似ラベルの爆発 Exploiting Minority Pseudo-Labels for Semi-Supervised Semantic Segmentation in Autonomous Driving ( http://arxiv.org/abs/2409.12680v2 ) ライセンス: Link先を確認	Yuting Hong, Hui Xiao, Huazheng Hao, Xiaojie Qiu, Baochen Yao, Chengbin Peng,	(参考訳) 自律運転の進歩により、セマンティックセグメンテーションは目覚ましい進歩を遂げた。このようなネットワークのトレーニングは画像アノテーションに大きく依存する。半教師付き学習は、擬似ラベルの助けを借りてラベル付きデータと未ラベル付きデータの両方を利用することができる。しかし、クラスがバランスの取れない現実のシナリオでは、多数派クラスがトレーニングにおいて支配的な役割を果たすことが多く、マイノリティクラスの学習品質が損なわれることがある。この制限を克服するために、マイノリティクラス学習を強化する専門的なトレーニングモジュールと、より包括的な意味情報を学ぶための一般的なトレーニングモジュールを含む、シナジスティックなトレーニングフレームワークを提案する。画素選択戦略に基づいて、互いから反復的に学習し、エラーの蓄積と結合を低減する。さらに、より明確な決定境界を保証するために、アンカーを用いた二重コントラスト学習を提案する。実験では,ベンチマークデータセットの最先端手法と比較して優れた性能を示す。 With the advancement of autonomous driving, semantic segmentation has achieved remarkable progress. The training of such networks heavily relies on image annotations, which are very expensive to obtain. Semi-supervised learning can utilize both labeled data and unlabeled data with the help of pseudo-labels. However, in many real-world scenarios where classes are imbalanced, majority classes often play a dominant role during training and the learning quality of minority classes can be undermined. To overcome this limitation, we propose a synergistic training framework, including a professional training module to enhance minority class learning and a general training module to learn more comprehensive semantic information. Based on a pixel selection strategy, they can iteratively learn from each other to reduce error accumulation and coupling. In addition, a dual contrastive learning with anchors is proposed to guarantee more distinct decision boundaries. In experiments, our framework demonstrates superior performance compared to state-of-the-art methods on benchmark datasets.	翻訳日:2024-11-07 13:56:59 公開日:2024-09-22
# 量子貯水池計算にはコヒーレンス流入が不可欠である Coherence influx is indispensable for quantum reservoir computing ( http://arxiv.org/abs/2409.12693v2 ) ライセンス: Link先を確認	Shumpei Kobayashi, Quoc Hoan Tran, Kohei Nakajima,	(参考訳) Echo状態特性(ESP)は、入力駆動の動的システムが情報処理タスクを実行できる基本的な特性である。近年、非定常系やサブシステム、すなわち非定常ESPやサブサブスペースESPへのESPの拡張が提案されている。本稿では,非定常ESPとサブセット/サブスペース非定常ESPを満たすために,量子システムに必要な十分かつ必要な条件を理論的に数値解析する。パウリ変換行列 (PTM) 形式を広く用いた結果,(1)$\textit{coherence influx}$ と呼ばれる量子コヒーレントな環境との相互作用は非定常ESPの実現には不可欠であり,(2)PTMのスペクトル半径は量子貯水池計算 (QRC) のフェードメモリ特性を特徴づけることができることがわかった。スピングラス/マニーボディの局在相を含むハミルトニアン系を含む数値解析実験により, PTMのスペクトル半径は, そのような系に固有の動的相転移を記述できることが判明した。 QRCのESP下でのメカニズムを包括的に理解するために,1次元乗算入力を持つ貯水池計算(RC)システムである簡易モデルである乗算貯水池計算(mRC)を提案する。理論的,数値的には,mRCにおけるスペクトル半径とコヒーレンス流入に対応するパラメータは,その線形記憶容量(MC)と直接相関することを示す。 QRC と mRC に関する知見は PTM の理論的側面と QRC の入力乗算率をもたらす。その結果、オープン量子システムにおけるQRCと情報処理の理解を深めることになる。 Echo state property (ESP) is a fundamental property that allows an input-driven dynamical system to perform information processing tasks. Recently, extensions of ESP to potentially nonstationary systems and subsystems, that is, nonstationary ESP and subset/subspace ESP, have been proposed. In this paper, we theoretically and numerically analyze the sufficient and necessary conditions for a quantum system to satisfy nonstationary ESP and subset/subspace nonstationary ESP. Based on extensive usage of the Pauli transfer matrix (PTM) form, we find that (1) the interaction with a quantum-coherent environment, termed $\textit{coherence influx}$, is indispensable in realizing nonstationary ESP, and (2) the spectral radius of PTM can characterize the fading memory property of quantum reservoir computing (QRC). Our numerical experiment, involving a system with a Hamiltonian that entails a spin-glass/many-body localization phase, reveals that the spectral radius of PTM can describe the dynamical phase transition intrinsic to such a system. 
To comprehensively understand the mechanisms underlying the ESP of QRC, we propose a simplified model, multiplicative reservoir computing (mRC), which is a reservoir computing (RC) system with a one-dimensional multiplicative input. Theoretically and numerically, we show that the parameters corresponding to the spectral radius and coherence influx in mRC directly correlate with its linear memory capacity (MC). Our findings about QRC and mRC provide a theoretical perspective on the PTM and the input multiplicativity of QRC. The results will lead to a better understanding of QRC and information processing in open quantum systems.	翻訳日:2024-11-07 13:56:58 公開日:2024-09-22
# 対話的で学習可能な協調運転自動化を目指して--大規模言語モデル駆動意思決定フレームワーク Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework ( http://arxiv.org/abs/2409.12812v2 ) ライセンス: Link先を確認	Shiyu Fang, Jiaqi Liu, Mingyu Ding, Yiming Cui, Chen Lv, Peng Hang, Jian Sun,	(参考訳) 現在、コネクテッド・オートモービルズ(CAV)は世界中の道路試験を開始したが、複雑なシナリオにおける安全性と効率性はまだ不十分である。協調運転は、CAVの接続能力を活用して、複雑なシナリオにおいてCAVの性能を改善するための有望なアプローチとなる。しかしながら、インタラクションと継続的学習能力の欠如は、現在の協調運転を単一シナリオアプリケーションと特定の協調運転自動化(CDA)に制限する。これらの課題に対処するため,本研究では,対話型かつ学習可能なLLM駆動協調運転フレームワークであるCoDrivingLLMを提案し,全シナリオと全CDAを実現する。まず,Large Language Models (LLMs) は数学的計算に適さないため,セマンティックな決定に基づく車両位置の更新を行う環境モジュールを導入し,車両位置のLLM制御による潜在的なエラーを回避する。第2に、SAE J3216規格で定義された4段階のCDAに基づいて、状態認識、意図共有、交渉、意思決定を含むChain-of-Thought(COT)ベースの推論モジュールを提案し、多段階推論タスクにおけるLCMの安定性を向上させる。中央集権的な紛争解決は、推論プロセスのコンフリクトコーディネータを通じて管理される。最後に、メモリモジュールを導入し、検索拡張世代を採用することで、CAVには過去の経験から学ぶ能力が与えられている。提案したCoDrivingLLMは,交渉モジュール上でのアブレーション実験,ショット経験の相違による推論,および他の協調運転法との比較により検証した。 At present, Connected Autonomous Vehicles (CAVs) have begun to open road testing around the world, but their safety and efficiency performance in complex scenarios is still not satisfactory. Cooperative driving leverages the connectivity ability of CAVs to achieve synergies greater than the sum of their parts, making it a promising approach to improving CAV performance in complex scenarios. However, the lack of interaction and continuous learning ability limits current cooperative driving to single-scenario applications and specific Cooperative Driving Automation (CDA). To address these challenges, this paper proposes CoDrivingLLM, an interactive and learnable LLM-driven cooperative driving framework, to achieve all-scenario and all-CDA. First, since Large Language Models(LLMs) are not adept at handling mathematical calculations, an environment module is introduced to update vehicle positions based on semantic decisions, thus avoiding potential errors from direct LLM control of vehicle positions. 
Second, based on the four levels of CDA defined by the SAE J3216 standard, we propose a Chain-of-Thought (CoT)-based reasoning module that includes state perception, intent sharing, negotiation, and decision-making, enhancing the stability of LLMs in multi-step reasoning tasks. Centralized conflict resolution is then managed through a conflict coordinator in the reasoning process. Finally, by introducing a memory module and employing retrieval-augmented generation, CAVs are endowed with the ability to learn from their past experiences. We validate the proposed CoDrivingLLM through ablation experiments on the negotiation module, reasoning with different numbers of experience shots, and comparison with other cooperative driving methods.	翻訳日:2024-11-07 13:23:33 公開日:2024-09-22
# ビザンチン攻撃による分散型マルチエージェント政策評価の難しさについて On the Hardness of Decentralized Multi-Agent Policy Evaluation under Byzantine Attacks ( http://arxiv.org/abs/2409.12882v2 ) ライセンス: Link先を確認	Hairi, Minghong Fang, Zifan Zhang, Alvaro Velasquez, Jia Liu,	(参考訳) 本稿では,協調型マルチエージェント強化学習において重要なサブプロブレムである完全分散型マルチエージェント政策評価問題について,最大$f$の障害エージェントの存在下で検討する。特に、モデル中毒設定を伴ういわゆるビザンツの欠陥モデルに焦点を当てる。一般に、政策評価は、任意の政策の価値関数を評価することである。協調型マルチエージェントシステムでは、システム全体の報酬は通常、すべてのエージェントからの報酬の均一平均としてモデル化される。ビザンチン系エージェントの存在下でのマルチエージェント政策評価問題,特に異種局所報酬の設定について検討する。理想的には、エージェントの目標は、与えられたポリシーに対する通常のエージェントの報酬の均一な平均である、蓄積されたシステム全体の報酬を評価することである。これは、すべてのエージェントが共通値(コンセンサス部)に合意し、さらにコンセンサス値が値関数(収束部)であることを意味する。しかし、我々はこの目標が達成できないことを証明している。代わりに、エージェントの目標は蓄積されたシステム全体の報酬を評価することであり、通常のエージェントの適切な重み付けされた平均報酬である。さらに、正の重みの総数が $\|\mathcal{N}\|-f $ を超えることを保証できる正のアルゴリズムが存在しないことを証明している。最後に、スカラー関数近似の下で漸近的コンセンサスを保証するビザンチン耐性の分散時間差分法を提案する。次に,提案アルゴリズムの有効性を実証的に検証する。 In this paper, we study a fully-decentralized multi-agent policy evaluation problem, which is an important sub-problem in cooperative multi-agent reinforcement learning, in the presence of up to $f$ faulty agents. In particular, we focus on the so-called Byzantine faulty model with model poisoning setting. In general, policy evaluation is to evaluate the value function of any given policy. In cooperative multi-agent system, the system-wide rewards are usually modeled as the uniform average of rewards from all agents. We investigate the multi-agent policy evaluation problem in the presence of Byzantine agents, particularly in the setting of heterogeneous local rewards. Ideally, the goal of the agents is to evaluate the accumulated system-wide rewards, which are uniform average of rewards of the normal agents for a given policy. It means that all agents agree upon common values (the consensus part) and furthermore, the consensus values are the value functions (the convergence part). However, we prove that this goal is not achievable. 
Instead, we consider a relaxed version of the problem, where the goal of the agents is to evaluate the accumulated system-wide reward, defined as an appropriately weighted average of the normal agents' rewards. We further prove that no correct algorithm can guarantee that the total number of positive weights exceeds $\|\mathcal{N}\|-f$, where $\|\mathcal{N}\|$ is the number of normal agents. Finally, we propose a Byzantine-tolerant decentralized temporal difference algorithm that can guarantee asymptotic consensus under scalar function approximation. We then empirically test the effectiveness of the proposed algorithm.	翻訳日:2024-11-07 12:59:09 公開日:2024-09-22
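The abstract above does not say which aggregation rule its Byzantine-tolerant algorithm uses. As a hedged illustration of why a scalar setting is tractable, the sketch below shows a trimmed mean, a common robust aggregation step that is assumed here and not taken from the paper. The function name `trimmed_mean` and the sample values are hypothetical.

```python
def trimmed_mean(values, f):
    """Average scalar values after discarding the f smallest and f largest.

    Assumption for illustration: at most f of the received values come
    from faulty agents, and more than 2*f values are available.
    """
    if len(values) <= 2 * f:
        raise ValueError("need more than 2*f values to trim f from each side")
    ordered = sorted(values)
    kept = ordered[f:len(ordered) - f]  # drop f extremes on both sides
    return sum(kept) / len(kept)


# Three honest neighbors report similar estimates; two faulty ones do not.
received = [1.0, 2.0, 3.0, 100.0, -50.0]
aggregate = trimmed_mean(received, f=1)
```

Trimming keeps the aggregate within the range of the honest values whenever at most `f` inputs are faulty. The result is in general a weighted average of the honest values and not their uniform average, which matches the relaxed goal stated in the abstract.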
# VLMのロールプレイングゲームは可能か? ブラックマイスウォンを研究事例に Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case ( http://arxiv.org/abs/2409.12889v2 ) ライセンス: Link先を確認	Peng Chen, Pi Bu, Jun Song, Yuan Gao, Bo Zheng,	(参考訳) 近年,大規模言語モデル(LLM)に基づくエージェントは,様々な分野において大きな進歩を遂げている。最も人気のある研究分野の1つは、これらのエージェントをビデオゲームに適用することである。伝統的に、これらの手法はゲーム内の環境および行動データにアクセスするためにゲームAPIに依存してきた。しかし、このアプローチはAPIの可用性によって制限されており、人間がゲームをする方法を反映していない。視覚言語モデル(VLM)の出現により、エージェントは視覚的理解能力を強化し、視覚入力のみを使用してゲームと対話できるようになった。これらの進歩にもかかわらず、現在のアプローチはアクション指向のタスク、特に強化学習法が一般的だが一般化が不十分で広範な訓練を必要とするアクションロールプレイングゲーム(ARPG)において、依然として課題に直面している。これらの制限に対処するため、視覚のみの入力と複雑なアクション出力を必要とするシナリオにおいて、既存のVLMの機能境界を探索する研究プラットフォームとして、ARPGの ``Black Myth: Wukong'' を選択する。ゲーム内の12のタスクを定義し、75%が戦闘に焦点を当て、いくつかの最先端のVLMをこのベンチマークに組み込む。さらに、記録されたゲームプレイビデオとマウスとキーボードアクションを含む操作ログを含む人間の操作データセットをリリースする。さらに,行動計画システムと視覚軌道システムからなるVARP(Vision Action Role-Playing)エージェントフレームワークを提案する。我々のフレームワークは、基本的なタスクを実行し、簡単かつ中程度の戦闘シナリオの90%を成功させる能力を示している。本研究の目的は,複雑なアクションゲーム環境にマルチモーダルエージェントを適用するための新たな洞察と方向性を提供することである。コードとデータセットはhttps://varp-agent.github.io/で公開される。 Recently, large language model (LLM)-based agents have made significant advances across various fields. One of the most popular research areas involves applying these agents to video games. Traditionally, these methods have relied on game APIs to access in-game environmental and action data. However, this approach is limited by the availability of APIs and does not reflect how humans play games. With the advent of vision language models (VLMs), agents now have enhanced visual understanding capabilities, enabling them to interact with games using only visual inputs. Despite these advances, current approaches still face challenges in action-oriented tasks, particularly in action role-playing games (ARPGs), where reinforcement learning methods are prevalent but suffer from poor generalization and require extensive training. 
To address these limitations, we select an ARPG, ``Black Myth: Wukong'', as a research platform to explore the capability boundaries of existing VLMs in scenarios requiring visual-only input and complex action output. We define 12 tasks within the game, with 75% focusing on combat, and incorporate several state-of-the-art VLMs into this benchmark. Additionally, we will release a human operation dataset containing recorded gameplay videos and operation logs, including mouse and keyboard actions. Moreover, we propose a novel VARP (Vision Action Role-Playing) agent framework, consisting of an action planning system and a visual trajectory system. Our framework demonstrates the ability to perform basic tasks and succeed in 90% of easy and medium-level combat scenarios. This research aims to provide new insights and directions for applying multimodal agents in complex action game environments. The code and datasets will be made available at https://varp-agent.github.io/.	翻訳日:2024-11-07 12:59:09 公開日:2024-09-22
# オープンワールドにおけるライダーパノプティクスのセグメンテーション Lidar Panoptic Segmentation in an Open World ( http://arxiv.org/abs/2409.14273v1 ) ライセンス: Link先を確認	Anirudh S Chakravarthy, Meghana Reddy Ganesina, Peiyun Hu, Laura Leal-Taixe, Shu Kong, Deva Ramanan, Aljosa Osep,	(参考訳) ライダー・パノプティクス・セグメンテーション(LPS)への対処は、自動運転車の安全な配備に不可欠である。 LPSは、可算オブジェクト(例えば歩行者や車両)のモノクラスや、非定型領域(例えば、植生や道路)のモノクラスを含む、セマンティッククラスの事前に定義された語彙を認識・セグメントすることを目的としている。重要なのは、LPSは個々のインスタンス(例えば、すべての車両)をセグメント化する必要があることだ。現在のLPS法は、意味クラス語彙が実際のオープンな世界で固定されているという非現実的な仮定をしているが、実際には、クラスオントロジは通常、事前に定義されたクラス語彙のように未知であると考えられる新しいクラスのインスタンスに遭遇するにつれて、時間とともに進化する。この非現実的な仮定に対処するため、我々はOpen World (LiPSOW): 定義済みのセマンティッククラスボキャブラリを持つデータセット上でモデルをトレーニングし、それらの一般化を、モノやモノの新たなインスタンスが現れるような大きなデータセットに研究する。この実験的な設定は興味深い結論をもたらす。先行技術訓練では、クラス固有のインスタンスセグメンテーション法と、既知のクラスにおける最先端の結果を得るが、クラスに依存しないボトムアップグルーピング法は、初期クラス語彙以外のクラス(すなわち未知クラス)で好意的に機能する。残念ながら、これらのメソッドは、既知のクラスで完全にデータ駆動のメソッドと同等に動作しない。分類に依存しない点クラスタリングを行い、階層的な方法で入力クラウドを過剰に分離し、次に領域提案ネットワークのようにバイナリポイントセグメントの分類を行う。我々は、意味分類とは独立に、点セグメントの重み付き階層木におけるカットを計算することで、最終点雲のセグメンテーションを得る。注目すべきは、この統一されたアプローチは、既知のクラスと未知のクラスの両方で強力なパフォーマンスをもたらすことだ。 Addressing Lidar Panoptic Segmentation (LPS ) is crucial for safe deployment of autonomous vehicles. LPS aims to recognize and segment lidar points w.r.t. a pre-defined vocabulary of semantic classes, including thing classes of countable objects (e.g., pedestrians and vehicles) and stuff classes of amorphous regions (e.g., vegetation and road). Importantly, LPS requires segmenting individual thing instances (e.g., every single vehicle). Current LPS methods make an unrealistic assumption that the semantic class vocabulary is fixed in the real open world, but in fact, class ontologies usually evolve over time as robots encounter instances of novel classes that are considered to be unknowns w.r.t. the pre-defined class vocabulary. 
To address this unrealistic assumption, we study LPS in the Open World (LiPSOW): we train models on a dataset with a pre-defined semantic class vocabulary and study their generalization to a larger dataset where novel instances of thing and stuff classes can appear. This experimental setting leads to interesting conclusions. While prior art trains class-specific instance segmentation methods and obtains state-of-the-art results on known classes, methods based on class-agnostic bottom-up grouping perform favorably on classes outside of the initial class vocabulary (i.e., unknown classes). Unfortunately, these methods do not perform on par with fully data-driven methods on known classes. Our work suggests a middle ground: we perform class-agnostic point clustering and over-segment the input cloud in a hierarchical fashion, followed by binary point segment classification, akin to a Region Proposal Network [1]. We obtain the final point cloud segmentation by computing a cut in the weighted hierarchical tree of point segments, independently of semantic classification. Remarkably, this unified approach leads to strong performance on both known and unknown classes.	翻訳日:2024-11-06 23:26:16 公開日:2024-09-22
# 大規模言語モデルによる証明自動化 Proof Automation with Large Language Models ( http://arxiv.org/abs/2409.14274v1 ) ライセンス: Link先を確認	Minghai Lu, Benjamin Delaware, Tianyi Zhang,	(参考訳) Coqのようなインタラクティブな定理証明器は、ソフトウェアの正しさを正式に保証する強力なツールである。しかし、これらのツールを使用するには、かなりの手作業と専門知識が必要である。大規模言語モデル(LLM)は、自然言語の非公式な証明を自動生成する可能性を示しているが、対話型定理証明器では形式的な証明を生成できない。本稿では,LLMが形式的証明を生成する際に犯した一般的な誤りを特定するための形式的研究を行う。 GPT-3.5による520個の証明生成誤差を解析した結果、GPT-3.5は証明の正しい高次構造をしばしば特定するが、下位レベルの詳細を正しく把握するのに苦労していることがわかった。この知見に基づいて,まず LLM に初期証明を生成することを促し,次に目標とする記号法を利用して低レベルの問題を反復的に修復する,新しい生成・再生手法である PALM を提案する。 10K以上の定理を含む大規模データセット上でPALMを評価する。その結果、PALMは他の最先端の手法よりも大幅に優れており、76.6%から180.4%の定理を証明できた。さらに、PALMは既存のアプローチの範囲を超えて1270の定理を証明している。また,異なるLLM間のPALMの一般化可能性を示す。 Interactive theorem provers such as Coq are powerful tools to formally guarantee the correctness of software. However, using these tools requires significant manual effort and expertise. While Large Language Models (LLMs) have shown promise in automatically generating informal proofs in natural language, they are less effective at generating formal proofs in interactive theorem provers. In this paper, we conduct a formative study to identify common mistakes made by LLMs when asked to generate formal proofs. By analyzing 520 proof generation errors made by GPT-3.5, we found that GPT-3.5 often identified the correct high-level structure of a proof, but struggled to get the lower-level details correct. Based on this insight, we propose PALM, a novel generate-then-repair approach that first prompts an LLM to generate an initial proof and then leverages targeted symbolic methods to iteratively repair low-level problems. We evaluate PALM on a large dataset that includes more than 10K theorems. Our results show that PALM significantly outperforms other state-of-the-art approaches, successfully proving 76.6% to 180.4% more theorems. Moreover, PALM proves 1270 theorems beyond the reach of existing approaches. We also demonstrate the generalizability of PALM across different LLMs.	
翻訳日:2024-11-06 23:26:16 公開日:2024-09-22
# 動的散乱チャンネルを用いたマルチユーザ画像暗号化 Dynamic Scattering-channel-based Approach for Multiuser Image Encryption ( http://arxiv.org/abs/2409.14275v1 ) ライセンス: Link先を確認	Mohammadrasoul Taghavi, Edwin A. Marengo,	(参考訳) 全てのユーザが使用する静的な複雑な媒体に基づいて動作する従来の散乱ベースの暗号化システムは、暗号文とプレーンテキストのペアを利用して散乱媒体の応答をモデル化しリバースエンジニアリングする学習ベースの攻撃に対して脆弱であり、物理媒体を使わずに不正な復号を可能にする。本研究では,マルチユーザ画像暗号化のための動的散乱チャネルに基づく新しい手法を開発した。確立されたアプローチは、複数の散乱ナノ粒子の調整可能な集合体としてモデル化された可変な動的散乱媒質を用いる。提案システムでは,異なる時間ブロックに対する散乱行列の異なる組み合わせと,ユーザ固有の複素値係数を組み合わせることで,ユーザごとに一意な暗号鍵を作成できるようにすることにより,複数のユーザを支援する。本手法は,分散メディアを暗号化機構として用いたマルチユーザセキュア通信およびストレージチャネルの実現可能性を高める。 Conventional scattering-based encryption systems that operate based on a static complex medium which is used by all users are vulnerable to learning-based attacks that exploit ciphertext-plaintext pairs to model and reverse-engineer the scattering medium's response, enabling unauthorized decryption without the physical medium. In this contribution, a new dynamic scattering-channel-based technique for multiuser image encryption is developed. The established approach employs variable, dynamic scattering media which are modeled as tunable aggregates of multiple scattering nanoparticles. The proposed system supports multiple users by allowing distinct combinations of scattering matrices for different time blocks, each combined with user-specific complex-valued coefficients, enabling the creation of unique, hard-to-guess encryption keys for each user. The derived methodology enhances the practical feasibility of multiuser secure communication and storage channels employing scattering media as the encryption mechanism.	翻訳日:2024-11-06 23:26:16 公開日:2024-09-22
# Can-Do! 大規模マルチモーダルモデルを用いた身体的計画のためのデータセットとニューロシンボリック接地フレームワーク Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models ( http://arxiv.org/abs/2409.14277v1 ) ライセンス: Link先を確認	Yew Ken Chia, Qi Sun, Lidong Bing, Soujanya Poria,	(参考訳) 大規模マルチモーダルモデルは、視覚や言語タスクにおいて、目覚しい問題解決能力を示し、幅広い世界の知識をエンコードする可能性を持っている。しかし、これらのモデルが現実的な環境で知覚、理性、計画、行動を理解することは、依然としてオープンな課題である。本研究では,従来のデータセットよりも多様で複雑なシナリオを通じて,具体的計画能力を評価するために設計されたベンチマークデータセットであるCan-Doを紹介する。私たちのデータセットには400のマルチモーダルサンプルが含まれており、それぞれが自然言語のユーザ指示、環境を描写した視覚イメージ、状態変化、対応するアクションプランで構成されています。データは、常識知識、身体的理解、安全意識の様々な側面を含んでいる。 GPT-4Vを含む最先端のモデルでは、視覚知覚、理解、推論能力のボトルネックに直面している。これらの課題に対処するために,ニューログラウンド(NeuroGround)を提案する。このフレームワークは,まず認識された環境状態におけるプラン生成を基盤として,モデル生成計画の強化にシンボリックな計画エンジンを活用する。実験により,強いベースラインと比較して,フレームワークの有効性が示された。私たちのコードとデータセットはhttps://embodied-planning.github.io.comで公開されています。 Large multimodal models have demonstrated impressive problem-solving abilities in vision and language tasks, and have the potential to encode extensive world knowledge. However, it remains an open challenge for these models to perceive, reason, plan, and act in realistic environments. In this work, we introduce Can-Do, a benchmark dataset designed to evaluate embodied planning abilities through more diverse and complex scenarios than previous datasets. Our dataset includes 400 multimodal samples, each consisting of natural language user instructions, visual images depicting the environment, state changes, and corresponding action plans. The data encompasses diverse aspects of commonsense knowledge, physical understanding, and safety awareness. Our fine-grained analysis reveals that state-of-the-art models, including GPT-4V, face bottlenecks in visual perception, comprehension, and reasoning abilities. 
To address these challenges, we propose NeuroGround, a neurosymbolic framework that first grounds the plan generation in the perceived environment states and then leverages symbolic planning engines to augment the model-generated plans. Experimental results demonstrate the effectiveness of our framework compared to strong baselines. Our code and dataset are available at https://embodied-planning.github.io.	翻訳日:2024-11-06 23:26:16 公開日:2024-09-22
# 確率的外勾配の加速:分散およびフェデレート学習におけるコミュニケーションの低減を目的としたヘッセンとグラディエント類似性の混合 Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning ( http://arxiv.org/abs/2409.14280v1 ) ライセンス: Link先を確認	Dmitry Bylinkin, Kirill Degtyarev, Aleksandr Beznosikov,	(参考訳) 学習の現代的現実と傾向はモデルのより一般化能力を必要とし、モデルとトレーニングサンプルサイズの両方が増加する。このようなタスクを単一のデバイスモードで解決することは、すでに困難である。これが、分散学習アプローチとフェデレーション学習アプローチが毎日人気を増している理由です。分散コンピューティングはデバイス間の通信を伴うため、効率性とプライバシという2つの重要な問題を解決する必要がある。通信コストと戦うための最もよく知られたアプローチの1つは、ローカルデータの類似性を活用することである。ヘッセンの類似性と同質勾配の両方が文献で研究されているが、別々に研究されている。本稿では、データ類似性とクライアントサンプリングのアイデアを取り入れた新しい手法を解析する上で、これら2つの仮定を組み合わせる。さらに, プライバシー問題に対処するため, 付加雑音の手法を適用し, 提案手法の収束への影響を解析する。この理論は、実際のデータセットのトレーニングによって確認される。 Modern realities and trends in learning require more and more generalization ability of models, which leads to an increase in both models and training sample size. It is already difficult to solve such tasks in a single device mode. This is the reason why distributed and federated learning approaches are becoming more popular every day. Distributed computing involves communication between devices, which requires solving two key problems: efficiency and privacy. One of the most well-known approaches to combat communication costs is to exploit the similarity of local data. Both Hessian similarity and homogeneous gradients have been studied in the literature, but separately. In this paper, we combine both of these assumptions in analyzing a new method that incorporates the ideas of using data similarity and clients sampling. Moreover, to address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method. The theory is confirmed by training on real datasets.	翻訳日:2024-11-06 23:26:16 公開日:2024-09-22
# Flag Proxy Networks: 量子LDPCコードのアーキテクチャ、スケジューリング、デコード Flag Proxy Networks: Tackling the Architectural, Scheduling, and Decoding Obstacles of Quantum LDPC codes ( http://arxiv.org/abs/2409.14283v1 ) ライセンス: Link先を確認	Suhas Vittal, Ali Javadi-Abhari, Andrew W. Cross, Lev S. Bishop, Moinuddin Qureshi,	(参考訳) 重要なアプリケーションにおいて指数的スピードアップを達成するためには、量子エラー補正が必要である。平面曲面符号は比較的単純であるため、過去20年間で最も研究されている誤り訂正符号である。しかし、平面曲面符号で単一の論理量子ビットを符号化するには、符号距離($d$)の2乗に比例する数の物理量子ビットが必要であり、将来的なアプリケーションに必要な大距離符号には空間非効率である。したがって、平面曲面符号の代替として量子低密度パリティ検査(QLDPC)符号が登場したが、より高い接続性を必要とする。さらに,これらの符号ではフォールトトレラントなシンドローム抽出と復号化の問題が十分に研究されておらず,使用上の障害として残っている。本稿では,双曲曲面符号と双曲カラー符号の2種類のQLDPC符号について考察する。上記の3つの課題に次のように対処する。第一に,フラグ量子ビットとプロキシ量子ビットによって低接続を実現する,量子符号の一般化可能なアーキテクチャであるフラグ・プロキシ・ネットワーク(FPN)を提案する。第二に,一般量子符号に対する貪欲なシンドローム抽出スケジューリングアルゴリズムを提案するとともに,このアルゴリズムをFPN上のフォールトトレラントシンドローム抽出に応用する。第三に,フラグ計測を利用して双曲符号を正確に復号する2つの復号器を提案する。我々の研究は、双曲曲面符号と双曲カラー符号の次数4のFPNが、$d = 5$の平面曲面符号よりもそれぞれ$2.9\times$、$5.5\times$空間効率が高く、より高い距離を考慮するとさらに空間効率が良くなることを示した。双曲符号は、平面符号に匹敵するエラー率を持つ。 Quantum error correction is necessary for achieving exponential speedups on important applications. The planar surface code has remained the most studied error-correcting code for the last two decades because of its relative simplicity. However, encoding a single logical qubit with the planar surface code requires a number of physical qubits quadratic in the code distance ($d$), making it space-inefficient for the large-distance codes necessary for promising applications. Thus, Quantum Low-Density Parity-Check (QLDPC) codes have emerged as an alternative to the planar surface code but require a higher degree of connectivity. Furthermore, the problems of fault-tolerant syndrome extraction and decoding are understudied for these codes and also remain obstacles to their usage. In this paper, we consider two under-studied families of QLDPC codes: hyperbolic surface codes and hyperbolic color codes. We tackle the three challenges mentioned above as follows. 
First, we propose Flag-Proxy Networks (FPNs), a generalizable architecture for quantum codes that achieves low connectivity through flag and proxy qubits. Second, we propose a greedy syndrome extraction scheduling algorithm for general quantum codes and further use this algorithm for fault-tolerant syndrome extraction on FPNs. Third, we present two decoders that leverage flag measurements to decode the hyperbolic codes accurately. Our work finds that degree-4 FPNs of the hyperbolic surface and color codes are respectively $2.9\times$ and $5.5\times$ more space-efficient than the $d = 5$ planar surface code, and become even more space-efficient when considering higher distances. The hyperbolic codes also have error rates comparable to their planar counterparts.	翻訳日:2024-11-06 23:26:16 公開日:2024-09-22
# ESPERANTO:テキスト生成のためのAI検出におけるロバスト性を高めるための合成句の評価 ESPERANTO: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination ( http://arxiv.org/abs/2409.14285v1 ) ライセンス: Link先を確認	Navid Ayoobi, Lily Knab, Wen Cheng, David Pantoja, Hamidreza Alikhani, Sylvain Flamant, Jin Kim, Arjun Mukherjee,	(参考訳) 大規模言語モデル(LLM)は、様々な領域で重要な有用性を示すが、学術的不正行為や誤情報の拡散など、非倫理的目的の搾取に同時に影響を受けやすい。その結果,AIによるテキスト検出システムが出現した。しかし、これらの検出メカニズムは、回避技術に対する脆弱性を示し、テキスト操作に対する堅牢性を欠いている。本稿では,検出回避のための新しい手法としてバックトランスレーションを導入し,電流検出システムのロバスト性を高める必要性を浮き彫りにした。提案手法では、AI生成したテキストを複数の言語で翻訳し、その後に英語に翻訳する。本稿では、これらの裏書きされたテキストを組み合わせて、オリジナルのAI生成テキストの操作されたバージョンを生成するモデルを提案する。その結果,操作したテキストは元の意味を保ちつつ,既存の検出手法の真の正の率(TPR)を著しく低減していることがわかった。我々は,この手法を,オープンソースと3つのプロプライエタリなシステムを含む9つのAI検出器上で評価し,バックトランスレーション操作に対する感受性を明らかにした。既存のAIテキスト検出装置が抱える欠点に対処し,この形態の操作に対する堅牢性を改善するための対策を提案する。提案手法のTPRは, 逆翻訳操作後, 1.85%しか低下しないことがわかった。さらに,8つの LLM を用いて 720k テキストの大規模なデータセットを構築した。本データセットは,提案手法と既存の検出器の性能を評価するために,各種ドメインの人間によるテキストとLLMによるテキストの両方を格納する。このデータセットは、研究コミュニティの利益のために公開されています。 While large language models (LLMs) exhibit significant utility across various domains, they simultaneously are susceptible to exploitation for unethical purposes, including academic misconduct and dissemination of misinformation. Consequently, AI-generated text detection systems have emerged as a countermeasure. However, these detection mechanisms demonstrate vulnerability to evasion techniques and lack robustness against textual manipulations. This paper introduces back-translation as a novel technique for evading detection, underscoring the need to enhance the robustness of current detection systems. The proposed method involves translating AI-generated text through multiple languages before back-translating to English. We present a model that combines these back-translated texts to produce a manipulated version of the original AI-generated text. 
Our findings demonstrate that the manipulated text retains the original semantics while significantly reducing the true positive rate (TPR) of existing detection methods. We evaluate this technique on nine AI detectors, including six open-source and three proprietary systems, revealing their susceptibility to back-translation manipulation. In response to the identified shortcomings of existing AI text detectors, we present a countermeasure to improve the robustness against this form of manipulation. Our results indicate that the TPR of the proposed method declines by only 1.85% after back-translation manipulation. Furthermore, we build a large dataset of 720k texts using eight different LLMs. Our dataset contains both human-authored and LLM-generated texts in various domains and writing styles to assess the performance of our method and existing detectors. This dataset is publicly shared for the benefit of the research community.	翻訳日:2024-11-06 23:26:16 公開日:2024-09-22
# 環境工学のためのオフショア風力エネルギーのオピニオンマイニング Opinion Mining on Offshore Wind Energy for Environmental Engineering ( http://arxiv.org/abs/2409.14292v1 ) ライセンス: Link先を確認	Isabele Bittencourt, Aparna S. Varde, Pankaj Lal,	(参考訳) 本稿では,ソーシャルメディアデータに対する感情分析を行い,オフショア風力エネルギーに関する世論調査を行う。我々は3つの機械学習モデル、すなわちTextBlob, VADER, SentiWordNetを適用する。 TextBlobは、主観性分析と極性分類を提供する。 VADERは累積的な感情スコアを提供する。 SentiWordNetは、感情を文脈を参照して考慮し、それに従って分類を行う。 NLPの手法は、ソーシャルメディアのテキストデータから意味を収集するために利用される。データ視覚化ツールは、全体的な結果を表示するために好適にデプロイされる。この作業は、大量意見の関与による市民科学やスマートガバナンスと密接に結びついており、意思決定支援のガイドとなっている。機械学習とNLPの役割を例示します。 In this paper, we conduct sentiment analysis on social media data to study mass opinion about offshore wind energy. We adapt three machine learning models, namely, TextBlob, VADER, and SentiWordNet because different functions are provided by each model. TextBlob provides subjectivity analysis as well as polarity classification. VADER offers cumulative sentiment scores. SentiWordNet considers sentiments with reference to context and performs classification accordingly. Techniques in NLP are harnessed to gather meaning from the textual data in social media. Data visualization tools are suitably deployed to display the overall results. This work is much in line with citizen science and smart governance via involvement of mass opinion to guide decision support. It exemplifies the role of Machine Learning and NLP here.	翻訳日:2024-11-06 23:26:16 公開日:2024-09-22
# HM3D-OVON:オープン語彙オブジェクトゴールナビゲーションのためのデータセットとベンチマーク HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation ( http://arxiv.org/abs/2409.14296v1 ) ライセンス: Link先を確認	Naoki Yokoyama, Ram Ramrakhya, Abhishek Das, Dhruv Batra, Sehoon Ha,	(参考訳) 本稿では,従来のObject Goal Navigation(ObjectNav)ベンチマークの範囲と意味範囲を広げる大規模ベンチマークであるHabitat-Matterport 3D Open Vocabulary Object Goal Navigation dataset (HM3D-OVON)を提案する。 HM3DSemデータセットを活用することで、HM3D-OVONは、実環境の写実的な3Dスキャンから得られた、379の異なるカテゴリにわたる1万5千以上の注釈付き家庭用オブジェクトのインスタンスを組み込む。目標オブジェクトを6-20カテゴリの事前定義されたセットに制限する以前のObjectNavデータセットとは対照的に、HM3D-OVONはテスト時にフリーフォーム言語で定義された目標のオープンセットでモデルのトレーニングと評価を容易にする。このオープン語彙の定式化を通じて、HM3D-OVONは、オープン語彙的な方法でテキストによって指定された任意のオブジェクトを検索できるビジュオ・セマンティックなナビゲーション行動の学習を促進する。さらに,HM3D-OVONの様々なアプローチを系統的に評価し,比較した。我々は,HM3D-OVONを用いて,オープン語彙のObjectNavエージェントを訓練し,高い性能を実現し,最先端のObjectNavアプローチよりもローカライゼーションやアクチュエーションのノイズに頑健であることを確認した。われわれのベンチマークとベースラインの結果が、現実世界の空間をナビゲートしてフリーフォーム言語で指定された家庭用のオブジェクトを見つけるエンボディエージェントの開発への関心を喚起し、より柔軟で人間らしいセマンティックなビジュアルナビゲーションへの一歩となることを期待している。コードとビデオは naoki.io/ovon で入手できる。 We present the Habitat-Matterport 3D Open Vocabulary Object Goal Navigation dataset (HM3D-OVON), a large-scale benchmark that broadens the scope and semantic range of prior Object Goal Navigation (ObjectNav) benchmarks. Leveraging the HM3DSem dataset, HM3D-OVON incorporates over 15k annotated instances of household objects across 379 distinct categories, derived from photo-realistic 3D scans of real-world environments. In contrast to earlier ObjectNav datasets, which limit goal objects to a predefined set of 6-20 categories, HM3D-OVON facilitates the training and evaluation of models with an open set of goals defined through free-form language at test time. Through this open-vocabulary formulation, HM3D-OVON encourages progress towards learning visuo-semantic navigation behaviors that are capable of searching for any object specified by text in an open-vocabulary manner. 
Additionally, we systematically evaluate and compare several different types of approaches on HM3D-OVON. We find that HM3D-OVON can be used to train an open-vocabulary ObjectNav agent that achieves higher performance and is more robust to localization and actuation noise than the state-of-the-art ObjectNav approach. We hope that our benchmark and baseline results will drive interest in developing embodied agents that can navigate real-world spaces to find household objects specified through free-form language, taking a step towards more flexible and human-like semantic visual navigation. Code and videos are available at naoki.io/ovon.	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
# DBSCANアルゴリズムのニューロモルフィックな実装 A Neuromorphic Implementation of the DBSCAN Algorithm ( http://arxiv.org/abs/2409.14298v1 ) ライセンス: Link先を確認	Charles P. Rizzo, James S. Plank,	(参考訳) DBSCANはノイズの存在下でクラスタリングを行うアルゴリズムである。本稿では、スパイクニューラルネットワークを用いて、DBSCANをニューロモルフィックに実装するための2つの構成法を提案する。最初の構成は「フラット」と呼ばれ、結果として大きなスパイクニューラルネットワークが高速にアルゴリズムを計算し、5つのステップで計算する。さらに、ネットワークはパイプライン化が可能であり、新しいDBSCAN計算をタイムステップ毎に実行することができる。 2番目の構成は"systolic"と呼ばれ、より小さなネットワークを生成するが、列ごとに複数のタイムステップで入力をスパイクする必要がある。構築の正確な仕様を提供し、実用的なニューロモルフィック・コンピューティング・セッティングで解析する。オープンソース実装も提供しています。 DBSCAN is an algorithm that performs clustering in the presence of noise. In this paper, we provide two constructions that allow DBSCAN to be implemented neuromorphically, using spiking neural networks. The first construction is termed "flat," resulting in large spiking neural networks that compute the algorithm quickly, in five timesteps. Moreover, the networks allow pipelining, so that a new DBSCAN calculation may be performed every timestep. The second construction is termed "systolic", and generates much smaller networks, but requires the inputs to be spiked in over several timesteps, column by column. We provide precise specifications of the constructions and analyze them in practical neuromorphic computing settings. We also provide an open-source implementation.	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
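For reference, the conventional (non-neuromorphic) DBSCAN procedure that the entry above builds on can be sketched in plain Python. This is a minimal illustration only, not the spiking-network construction from the paper. The function name `dbscan` and the parameters `eps` and `min_pts` are illustrative choices, and counting a point as its own neighbor is an assumed (though common) convention.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point if at least `min_pts` points (itself
    included) lie within distance `eps` of it.
    """
    n = len(points)
    # Brute-force neighborhoods: O(n^2), fine for a small illustration.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Grow a new cluster outward from core point i.
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # j is core: keep expanding
        cluster += 1
    return labels


# Two tight groups of three points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this toy input the two groups should each form a cluster and the isolated point should be labeled as noise. The quadratic neighborhood search is chosen for clarity, not speed.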
# 条件付きガウスアンサンブルKalmanフィルタを用いたディープラーニング強化データ同化の競争ベースライン A competitive baseline for deep learning enhanced data assimilation using conditional Gaussian ensemble Kalman filtering ( http://arxiv.org/abs/2409.14300v1 ) ライセンス: Link先を確認	Zachariah Malik, Romit Maulik,	(参考訳) Ensemble Kalman Filtering (EnKF) はデータ同化の一般的な手法であり、幅広い応用がある。しかし、摂動が非線形であるとき、バニラ EnKF フレームワークは十分に定義されていない。本研究では,条件付きガウス式 EnKF (CG-EnKF) と正規値 EnKF (NS-EnKF) と呼ばれるバニラ型 EnKF の非線形拡張について検討した。次に、これらのモデルを、スコアフィルタ(SF)と呼ばれる最先端のディープラーニングに基づく粒子フィルタと比較する。このモデルは、密度を推定するために高価なスコア拡散モデルを使用し、有効性のために摂動作用素に強い仮定を必要とする。比較の結果, CG-EnKF と NS-EnKF は, ローレンツ96 システムによって与えられる高次元多次元データ同化法において, SF を劇的に上回っていることがわかった。解析の結果,CG-EnKFとNS-EnKFは非ガウス的な雑音摂動を処理できることがわかった。 Ensemble Kalman Filtering (EnKF) is a popular technique for data assimilation, with far ranging applications. However, the vanilla EnKF framework is not well-defined when perturbations are nonlinear. We study two non-linear extensions of the vanilla EnKF - dubbed the conditional-Gaussian EnKF (CG-EnKF) and the normal score EnKF (NS-EnKF) - which sidestep assumptions of linearity by constructing the Kalman gain matrix with the `conditional Gaussian' update formula in place of the traditional one. We then compare these models against a state-of-the-art deep learning based particle filter called the score filter (SF). This model uses an expensive score diffusion model for estimating densities and also requires a strong assumption on the perturbation operator for validity. In our comparison, we find that CG-EnKF and NS-EnKF dramatically outperform SF for a canonical problem in high-dimensional multiscale data assimilation given by the Lorenz-96 system. Our analysis also demonstrates that the CG-EnKF and NS-EnKF can handle highly non-Gaussian additive noise perturbations, with the latter typically outperforming the former.	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
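The "conditional Gaussian" update mentioned in the entry above can be related to the traditional Kalman gain with a short derivation. The block below is a textbook-style sketch under assumed notation ($x$ for the state, $y$ for the observation, $h$ for the observation operator, $\hat{C}$ for ensemble sample covariances); it is not claimed to be the exact formulation used in the paper.

```latex
% Conditioning a jointly Gaussian pair (x, y) on an observed value y:
\[
\mathbb{E}[x \mid y] = \mu_x + \Sigma_{xy}\,\Sigma_{yy}^{-1}\,(y - \mu_y).
\]
% Ensemble analogue: each forecast member x_i^f is paired with a simulated
% observation y_i = h(x_i^f) + \epsilon_i, and sample covariances replace
% the exact ones:
\[
x_i^a = x_i^f + \hat{C}_{xy}\,\hat{C}_{yy}^{-1}\,\bigl(y^{\mathrm{obs}} - y_i\bigr).
\]
% Special case: linear h(x) = Hx with additive noise of covariance R and
% forecast covariance P gives C_{xy} = P H^\top and C_{yy} = H P H^\top + R,
% which recovers the traditional Kalman gain:
\[
K = P H^\top \bigl(H P H^\top + R\bigr)^{-1}.
\]
```

The point of the sketch is that the gain built from $\hat{C}_{xy}$ and $\hat{C}_{yy}$ never requires $h$ to be linear, which is one way to read the claim that such variants sidestep linearity assumptions.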
# UU-Mamba: 心血管拡張のための不確かさを意識したU-Mamba UU-Mamba: Uncertainty-aware U-Mamba for Cardiovascular Segmentation ( http://arxiv.org/abs/2409.14305v1 ) ライセンス: Link先を確認	Ting Yu Tsai, Li Lin, Shu Hu, Connie W. Tsao, Xin Li, Ming-Ching Chang, Hongtu Zhu, Xin Wang,	(参考訳) 心血管構造のセグメンテーションにおけるディープラーニングモデルの成功に基づいて、特に小さな注釈付きデータセットにおいて、一般化と堅牢性の改善に注目が集まっている。最近の進歩にもかかわらず、現在のアプローチは、大きなデータセットや狭い最適化技術に依存しているため、過度な適合や精度の制限といった課題に直面していることが多い。本稿では,U-Mambaアーキテクチャの拡張であるU-Mambaモデルを紹介する。 Sharpness-Aware Minimization (SAM) を取り入れたモデルにより、損失景観におけるフラットなミニマをターゲットとした一般化が促進される。さらに、地域ベース、分布ベース、画素ベースコンポーネントを組み合わせた不確実性認識損失関数を提案し、局所的特徴とグローバル的特徴の両方をキャプチャすることでセグメンテーション精度を向上させる。 UU-Mambaモデルはすでに優れた性能を示しているが、その一般化とロバスト性を完全に評価するにはさらなるテストが必要である。 ImageCAS(冠状動脈)とAorta(大動脈枝とゾーン)のデータセットを新たに試行することで評価を拡大し、これまでの研究で用いたACDCデータセット(左および右心室)よりも複雑なセグメンテーション課題を提示し、モデルの適応性とレジリエンスを示す。 UU-Mamba は TransUNet, Swin-Unet, nnUNet, nnFormer などの先行モデルよりも優れた性能を示している。さらに,より広範な実験で示すように,モデルの堅牢性とセグメント化の精度をより包括的に評価する。 Building on the success of deep learning models in cardiovascular structure segmentation, increasing attention has been focused on improving generalization and robustness, particularly in small, annotated datasets. Despite recent advancements, current approaches often face challenges such as overfitting and accuracy limitations, largely due to their reliance on large datasets and narrow optimization techniques. This paper introduces the UU-Mamba model, an extension of the U-Mamba architecture, designed to address these challenges in both cardiac and vascular segmentation. By incorporating Sharpness-Aware Minimization (SAM), the model enhances generalization by targeting flatter minima in the loss landscape. Additionally, we propose an uncertainty-aware loss function that combines region-based, distribution-based, and pixel-based components to improve segmentation accuracy by capturing both local and global features. 
While the UU-Mamba model has already demonstrated great performance, further testing is required to fully assess its generalization and robustness. We expand our evaluation by conducting new trials on the ImageCAS (coronary artery) and Aorta (aortic branches and zones) datasets, which present more complex segmentation challenges than the ACDC dataset (left and right ventricles) used in our previous work, showcasing the model's adaptability and resilience. We confirm UU-Mamba's superior performance over leading models such as TransUNet, Swin-Unet, nnUNet, and nnFormer. Moreover, we provide a more comprehensive evaluation of the model's robustness and segmentation accuracy, as demonstrated by extensive experiments.	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
# LLM は One-Shot URL 分類器と Explainer である LLMs are One-Shot URL Classifiers and Explainers ( http://arxiv.org/abs/2409.14306v1 ) ライセンス: Link先を確認	Fariza Rashid, Nishavi Ranaweera, Ben Doyle, Suranga Seneviratne,	(参考訳) 悪意のあるURL分類は、サイバーセキュリティの重要な側面である。既存の作業には、多数の機械学習とディープラーニングベースのURL分類モデルが含まれているが、その多くは、一般的なトレーニングデータセットの欠如に起因する一般化とドメイン適応の問題に悩まされている。さらに、これらのモデルでは、自然言語で与えられたURL分類の説明が得られない。本研究では,この問題に対するLarge Language Models (LLM) の使用について検討し,その実例を示す。具体的には、所与のURLが良性であるかフィッシングであるかを予測するために、Chain-of-Thought(CoT)推論を用いるLLMベースのワンショット学習フレームワークを提案する。 3つのURLデータセットと5つの最先端LLMを使用してフレームワークを評価し、一発のLCMプロンプトが実際に教師付きモデルに近いパフォーマンスを提供し、GPT 4-Turboが最高のモデルであり、Claude 3 Opusが続くことを示した。我々は, LLMの説明を定量的に分析し, LLMによる説明のほとんどは, 教師付き分類器のポストホックな説明と一致し, 高い可読性, 一貫性, 情報性を有することを示す。 Malicious URL classification represents a crucial aspect of cyber security. Although existing work comprises numerous machine learning and deep learning-based URL classification models, most suffer from generalisation and domain-adaptation issues arising from the lack of representative training datasets. Furthermore, these models fail to provide explanations for a given URL classification in natural human language. In this work, we investigate and demonstrate the use of Large Language Models (LLMs) to address this issue. Specifically, we propose an LLM-based one-shot learning framework that uses Chain-of-Thought (CoT) reasoning to predict whether a given URL is benign or phishing. We evaluate our framework using three URL datasets and five state-of-the-art LLMs and show that one-shot LLM prompting indeed provides performances close to supervised models, with GPT 4-Turbo being the best model, followed by Claude 3 Opus. We conduct a quantitative analysis of the LLM explanations and show that most of the explanations provided by LLMs align with the post-hoc explanations of the supervised classifiers, and the explanations have high readability, coherency, and informativeness.	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
# ゼイガーのNP完全性とゼロ知識証明 NP-Completeness and Physical Zero-Knowledge Proofs for Zeiger ( http://arxiv.org/abs/2409.14308v1 ) ライセンス: Link先を確認	Suthee Ruangwises,	(参考訳) ゼイガー(Zeiger)は、長方形格子からなる鉛筆パズルで、各セルには水平方向または垂直方向を指す矢印がある。一部のセルは正の整数も含む。このパズルの目的は、数字の入っていないすべてのセルに正の整数を書き込み、各セルの整数が、そのセルの矢印が指す方向に並ぶすべてのセルに現れる異なる整数の個数と等しくなるようにすることである。本稿では,与えられたZeigerパズルの可解性の判定が,not-all-equal positive 3SAT (NAE3SAT+) 問題からの還元によりNP完全であることを証明する。また,Zeigerに対するカードベースの物理ゼロ知識証明プロトコルを構築し,証明者がパズルの解を明かすことなく,その存在を検証者に物理的に示せるようにする。 Zeiger is a pencil puzzle consisting of a rectangular grid, with each cell having an arrow pointing in a horizontal or vertical direction. Some cells also contain a positive integer. The objective of this puzzle is to fill a positive integer into every unnumbered cell such that the integer in each cell is equal to the number of different integers in all cells along the direction in which that cell's arrow points. In this paper, we prove that deciding solvability of a given Zeiger puzzle is NP-complete via a reduction from the not-all-equal positive 3SAT (NAE3SAT+) problem. We also construct a card-based physical zero-knowledge proof protocol for Zeiger, which enables a prover to physically show a verifier the existence of the puzzle's solution without revealing it.	翻訳日:2024-11-06 23:15:03 published:2024-09-22
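The Zeiger rule described above is easy to check for a fully filled grid, which is what places the puzzle in NP. The following is a minimal verifier sketch, not taken from the paper. The name `is_zeiger_solution`, the arrow symbols `>`, `<`, `^`, `v`, and the reading of "cells along the direction" as every cell strictly beyond the current one up to the grid edge are all assumptions made for illustration.

```python
# Unit steps (row delta, column delta) for each arrow symbol (assumed encoding).
STEP = {">": (0, 1), "<": (0, -1), "^": (-1, 0), "v": (1, 0)}


def is_zeiger_solution(arrows, values):
    """Check a completely filled Zeiger grid.

    arrows: list of strings, one character from STEP per cell.
    values: list of lists of positive integers, same shape as arrows.
    """
    rows, cols = len(arrows), len(arrows[0])
    for r in range(rows):
        for c in range(cols):
            dr, dc = STEP[arrows[r][c]]
            seen = set()
            rr, cc = r + dr, c + dc
            # Walk from the neighboring cell to the edge of the grid,
            # collecting the distinct integers encountered.
            while 0 <= rr < rows and 0 <= cc < cols:
                seen.add(values[rr][cc])
                rr, cc = rr + dr, cc + dc
            # Every entry must be positive and equal the distinct count.
            if values[r][c] < 1 or values[r][c] != len(seen):
                return False
    return True


# A 1x3 toy grid: two right-pointing arrows and one left-pointing arrow.
arrows = [">><"]
good = is_zeiger_solution(arrows, [[2, 1, 2]])
bad = is_zeiger_solution(arrows, [[1, 1, 2]])
```

Here `good` is expected to be true and `bad` false, since in the second grid the first cell sees two distinct integers but holds 1. The check runs in time polynomial in the grid size.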
# Sketch-and-Solve:ランダム化数値線形代数を用いた過決定最小二乗の最適化 Sketch-and-Solve: Optimized Overdetermined Least-Squares Using Randomized Numerical Linear Algebra ( http://arxiv.org/abs/2409.14309v1 ) ライセンス: Link先を確認	Alex Lavaee,	(参考訳) スケッチ・アンド・ソルブ (Sketch-and-solve) は、スケッチ行列を用いて次元を小さくすることで、大規模計算問題に取り組むための強力なパラダイムである。本稿では, 機械学習や信号処理, 数値最適化など, 様々な領域で基本となる過決定最小二乗問題を効率的に解くために, スケッチ・アンド・ソルブアルゴリズムを適用することに焦点を当てる。本稿では、スケッチ・アンド・ソルブのパラダイムを概観し、密な変種と疎な変種を含む様々なスケッチ演算子を解析する。ランダム化数値線形代数の手法を用いて近似解を効率的に計算するSketch-and-Apply (SAA-SAS) アルゴリズムを提案する。大規模最小二乗問題に対する広範な実験により,提案手法は従来のLeast-Squares QR (LSQR) アルゴリズムを実行時間の面で大幅に上回りつつ,同等の精度を維持することを示す。本結果は,大規模数値線形代数問題を効率的に扱う上でのスケッチ・アンド・ソルブ法の可能性を強調した。 Sketch-and-solve is a powerful paradigm for tackling large-scale computational problems by reducing their dimensionality using sketching matrices. This paper focuses on applying sketch-and-solve algorithms to efficiently solve the overdetermined least squares problem, which is fundamental in various domains such as machine learning, signal processing, and numerical optimization. We provide a comprehensive overview of the sketch-and-solve paradigm and analyze different sketching operators, including dense and sparse variants. We introduce the Sketch-and-Apply (SAA-SAS) algorithm, which leverages randomized numerical linear algebra techniques to compute approximate solutions efficiently. Through extensive experiments on large-scale least squares problems, we demonstrate that our proposed approach significantly outperforms the traditional Least-Squares QR (LSQR) algorithm in terms of runtime while maintaining comparable accuracy. Our results highlight the potential of sketch-and-solve techniques in efficiently handling large-scale numerical linear algebra problems.	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
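The generic sketch-and-solve idea referred to above can be stated compactly. The block below is a standard formulation under assumed notation ($A$, $b$, sketch matrix $S$ with $s$ rows, distortion $\varepsilon$); it illustrates the paradigm in general and is not a description of the specific SAA-SAS algorithm.

```latex
% Original overdetermined problem and its sketched surrogate:
\begin{align*}
x^\star &= \arg\min_{x \in \mathbb{R}^n} \|Ax - b\|_2,
  && A \in \mathbb{R}^{m \times n},\ m \gg n, \\
\hat{x} &= \arg\min_{x \in \mathbb{R}^n} \|S(Ax - b)\|_2,
  && S \in \mathbb{R}^{s \times m},\ n < s \ll m .
\end{align*}
% If S is a subspace embedding for span([A, b]), i.e.
% (1-\varepsilon)\|y\|_2 \le \|Sy\|_2 \le (1+\varepsilon)\|y\|_2
% for every y in that span, then chaining the inequalities gives
\[
\|A\hat{x} - b\|_2
  \le \frac{1}{1-\varepsilon}\,\|S(A\hat{x} - b)\|_2
  \le \frac{1}{1-\varepsilon}\,\|S(Ax^\star - b)\|_2
  \le \frac{1+\varepsilon}{1-\varepsilon}\,\|Ax^\star - b\|_2 .
\]
```

The middle inequality holds because $\hat{x}$ minimizes the sketched residual. The practical appeal is that the sketched problem has only $s$ rows, so it is much cheaper to solve when $s \ll m$.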
# ヘラルド単一光子の全ファイバ光源のフルキャラクタリゼーション Full characterization of an all fiber source of heralded single photons ( http://arxiv.org/abs/2409.14310v1 ) ライセンス: Link先を確認	Yunxiao Zhang, Liang Cui, Xueshi Guo, Wen Zhao, Xiaoying Li, Z. Y. Ou,	(参考訳) 本研究では,市販の分散シフトファイバ中のパルス励起自発四光波混合で発生する光子対に基づくヘラルド単一光子源を実証する。 1550nm通信波長帯のこの単一光子源を、光子計数法とホモダイン検出法の両方で特徴付ける。ヘラルディング効率とモード純度は光子計数により測定でき、真空寄与分はホモダイン検出により求めることができる。 We demonstrate a heralded single photon source based on the photon pairs generated from pulse-pumped spontaneous four-wave mixing in a piece of commercially available dispersion-shifted fiber. The single photon source in the 1550 nm telecom band is characterized with both the photon counting technique and the homodyne detection method. The heralding efficiency and mode purity can be measured by photon counting while the vacuum contribution part can be found by homodyne detection.	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
# 不均衡画像分類のための異方性拡散確率モデル Anisotropic Diffusion Probabilistic Model for Imbalanced Image Classification ( http://arxiv.org/abs/2409.14313v1 ) ライセンス: Link先を確認	Jingyu Kong, Yuan Guo, Yu Wang, Yuping Duan,	(参考訳) 現実世界のデータはしばしば長い尾の分布を持ち、尾のサンプルの不足はモデルの一般化能力を著しく制限する。 Denoising Diffusion Probabilistic Models (DDPM) は確率微分方程式理論に基づく生成モデルであり、画像分類タスクにおいて顕著な性能を示した。しかし、既存の拡散確率モデルは、末尾類を分類する際に満足に機能しない。本研究では,不均衡な画像分類問題に対するAnisotropic Diffusion Probabilistic Model (ADPM)を提案する。我々は,データ分布を利用して,前処理中の異なるクラスサンプルの拡散速度を制御し,逆処理におけるデノイザの分類精度を効果的に向上する。具体的には,不均衡な分類問題に対処するために,誤差解析理論に基づく拡散過程の異なるカテゴリの雑音レベルを選択する理論的戦略を提案する。さらに,前処理に先立ってグローバル画像と局所画像を統合し,空間次元におけるモデルの識別能力を高めるとともに,逆処理に意味レベルの文脈情報を組み込んで,モデルの識別力と堅牢性を高める。 4つの医用ベンチマークデータセットの最先端手法との比較により,提案手法の有効性を検証した。その結果, 異方性拡散モデルにより, ヘッドクラスの精度を維持しつつ, 希少クラスの分類精度が著しく向上することが確認された。皮膚病変データセット,PAD-UFES,HAM10000では,元の拡散確率モデルと比較してF1スコアが4%,3%改善した。 Real-world data often has a long-tailed distribution, where the scarcity of tail samples significantly limits the model's generalization ability. Denoising Diffusion Probabilistic Models (DDPM) are generative models based on stochastic differential equation theory and have demonstrated impressive performance in image classification tasks. However, existing diffusion probabilistic models do not perform satisfactorily in classifying tail classes. In this work, we propose the Anisotropic Diffusion Probabilistic Model (ADPM) for imbalanced image classification problems. We utilize the data distribution to control the diffusion speed of different class samples during the forward process, effectively improving the classification accuracy of the denoiser in the reverse process. Specifically, we provide a theoretical strategy for selecting noise levels for different categories in the diffusion process based on error analysis theory to address the imbalanced classification problem. 
Furthermore, we integrate global and local image priors in the forward process to enhance the model's discriminative ability in the spatial dimension, while incorporating semantic-level contextual information in the reverse process to boost the model's discriminative power and robustness. Through comparisons with state-of-the-art methods on four medical benchmark datasets, we validate the effectiveness of the proposed method in handling long-tail data. Our results confirm that the anisotropic diffusion model significantly improves the classification accuracy of rare classes while maintaining the accuracy of head classes. On the skin lesion datasets PAD-UFES and HAM10000, the F1-scores of our method improved by 4% and 3%, respectively, compared to the original diffusion probabilistic model.	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
# MVPGS: スパース入力ビューからガウススプラッティングのためのマルチビュー事前知識を掘り起こす MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views ( http://arxiv.org/abs/2409.14316v1 ) ライセンス: Link先を確認	Wangze Xu, Huachen Gao, Shihe Shen, Rui Peng, Jianbo Jiao, Ronggang Wang,	(参考訳) 近年,Neural Radiance Field (NeRF) の進歩により,3Dビジョン応用における重要な課題である少数ショットの新規視点合成(Novel View Synthesis, NVS)が進展している。 NeRFの高密度な入力要求を減らそうとする試みは数多くあるが、それでも時間を要するトレーニングとレンダリングのプロセスに悩まされている。最近では、3D Gaussian Splatting (3DGS) が、明示的な点ベース表現でリアルタイムな高品質なレンダリングを実現している。しかし、NeRFと同様に、制約の欠如のために訓練ビューに過度に適合する傾向がある。本稿では,3次元ガウススプラッティングに基づいてマルチビュー事前知識を掘り起こす少数ショットNVS法であるMVPGSを提案する。我々は3DGSの幾何学的初期化の質を高めるために,最近の学習ベースマルチビューステレオ(MVS)を活用している。オーバーフィッティングを緩和するため、計算された幾何学に基づいて、シーンに適合する外観制約を追加するフォワードウォーピング手法を提案する。さらに、適切な最適化収束を促進するためにガウスパラメータに対するビュー一貫性幾何制約を導入し、補償として単眼深度正規化を利用する。実験により,提案手法はリアルタイムレンダリング速度で最先端の性能を実現することを示す。プロジェクトページ:https://zezeaaa.github.io/projects/MVPGS/ Recently, the Neural Radiance Field (NeRF) advancement has facilitated few-shot Novel View Synthesis (NVS), which is a significant challenge in 3D vision applications. Despite numerous attempts to reduce the dense input requirement in NeRF, it still suffers from time-consuming training and rendering processes. More recently, 3D Gaussian Splatting (3DGS) achieves real-time high-quality rendering with an explicit point-based representation. However, similar to NeRF, it tends to overfit the training views for lack of constraints. In this paper, we propose MVPGS, a few-shot NVS method that excavates the multi-view priors based on 3D Gaussian Splatting. We leverage the recent learning-based Multi-view Stereo (MVS) to enhance the quality of geometric initialization for 3DGS. To mitigate overfitting, we propose a forward-warping method for additional appearance constraints conforming to scenes based on the computed geometry. 
Furthermore, we introduce a view-consistent geometry constraint for Gaussian parameters to facilitate proper optimization convergence and utilize a monocular depth regularization as compensation. Experiments show that the proposed method achieves state-of-the-art performance with real-time rendering speed. Project page: https://zezeaaa.github.io/projects/MVPGS/	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
# テキストによるビデオ質問応答のためのシーンテキストグラウンドリング Scene-Text Grounding for Text-Based Video Question Answering ( http://arxiv.org/abs/2409.14319v1 ) ライセンス: Link先を確認	Sheng Zhou, Junbin Xiao, Xun Yang, Peipei Song, Dan Guo, Angela Yao, Meng Wang, Tat-Seng Chua,	(参考訳) テキストベースのビデオ質問応答(TextVideoQA)の既存の取り組みは、不透明な意思決定とシーンテキスト認識への依存で批判されている。本稿では,シーンテキスト領域の時空間的ローカライズをモデルに強制し,シーンテキスト認識からQAを分離し,解釈可能なQAに向けた研究を促進することによって,グラウンドドテキストビデオQAを研究することを提案する。その仕事は3倍の意義がある。まず、シーンテキストのエビデンスを他のショートカットと比較して、回答の予測を推奨する。第2に、シーンテキスト領域を直接視覚的回答として受け入れ、文字列マッチングによる非効率な回答評価の問題を回避している。第3に、ビデオQAとシーンテキスト認識で継承された課題を分離する。これにより、失敗予測の根本原因(例えば、間違ったQAや間違ったシーンテキスト認識など)の診断が可能になる。弱教師付きシーン・テキスト・グラウンドイングとグラウンドド・テキスト・コントラスト学習を両立させるT2S-QAモデルを提案する。評価を容易にするために,52Kのシーンテキスト境界ボックスを,2Kの質問と729の動画に関連する2.2Kの時間セグメント内に配置した新しいデータセットViTXT-GQAを構築した。また,VTXT-GQAを用いて実験を行い,既存の手法の厳密な限界を実証する。 T2S-QAは優れた結果が得られたが、ヒトの葉に対する大きな性能ギャップは改善の余地が十分にある。オラクルのシーンテキスト入力のさらなる分析は、シーンテキスト認識が大きな課題であることを示している。 Grounded TextVideoQAの研究を進めるために、我々のデータセットとコードは \url{https://github.com/zhousheng97/ViTXT-GQA.git} にある。 Existing efforts in text-based video question answering (TextVideoQA) are criticized for their opaque decisionmaking and heavy reliance on scene-text recognition. In this paper, we propose to study Grounded TextVideoQA by forcing models to answer questions and spatio-temporally localize the relevant scene-text regions, thus decoupling QA from scenetext recognition and promoting research towards interpretable QA. The task has three-fold significance. First, it encourages scene-text evidence versus other short-cuts for answer predictions. Second, it directly accepts scene-text regions as visual answers, thus circumventing the problem of ineffective answer evaluation by stringent string matching. Third, it isolates the challenges inherited in VideoQA and scene-text recognition. This enables the diagnosis of the root causes for failure predictions, e.g., wrong QA or wrong scene-text recognition? 
To achieve Grounded TextVideoQA, we propose the T2S-QA model that highlights a disentangled temporal-to-spatial contrastive learning strategy for weakly-supervised scene-text grounding and grounded TextVideoQA. To facilitate evaluation, we construct a new dataset ViTXT-GQA which features 52K scene-text bounding boxes within 2.2K temporal segments related to 2K questions and 729 videos. With ViTXT-GQA, we perform extensive experiments and demonstrate the severe limitations of existing techniques in Grounded TextVideoQA. While T2S-QA achieves superior results, the large performance gap to humans leaves ample room for improvement. Our further analysis of oracle scene-text inputs indicates that the major challenge is scene-text recognition. To advance the research of Grounded TextVideoQA, our dataset and code are available at https://github.com/zhousheng97/ViTXT-GQA.git	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
# 映画におけるトロープ付き大言語モデルの物語的推論限界の解き明かす Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses ( http://arxiv.org/abs/2409.14324v1 ) ライセンス: Link先を確認	Hung-Ting Su, Ya-Ching Hsu, Xudong Lin, Xiang-Qian Shi, Yulei Niu, Han-Yuan Hsu, Hung-yi Lee, Winston H. Hsu,	(参考訳) 大型言語モデル (LLM) にはチェーン・オブ・シンクレット (CoT) のプロンプトが備わっており、数学、常識、論理学などの実コンテンツにおいて、重要な多段階の推論能力を示している。しかし、より抽象的な能力を必要とする物語的推論における彼らのパフォーマンスは、まだ解明されていない。本研究は,映画シナプスのトロープを利用して,最先端のLDMの抽象的推論能力を評価し,その低性能を明らかにする。本稿では,これらの課題に対処し,F1スコアを11.8ポイント向上するためのトロープワイズクエリ手法を提案する。さらに, 先行研究は, CoTが多段階推論を強化することを示唆する一方で, 本研究は, CoTが物語内容の幻覚を引き起こす可能性を示し, GPT-4の性能を低下させることを示した。また, トロープ関連テキストトークンを露骨なトロープなしで映画シンプに埋め込み, それらのインジェクションに対するCoTの高感度化を明らかにした。我々の総合的な分析は将来の研究の方向性についての洞察を提供する。 Large language models (LLMs) equipped with chain-of-thoughts (CoT) prompting have shown significant multi-step reasoning capabilities in factual content like mathematics, commonsense, and logic. However, their performance in narrative reasoning, which demands greater abstraction capabilities, remains unexplored. This study utilizes tropes in movie synopses to assess the abstract reasoning abilities of state-of-the-art LLMs and uncovers their low performance. We introduce a trope-wise querying approach to address these challenges and boost the F1 score by 11.8 points. Moreover, while prior studies suggest that CoT enhances multi-step reasoning, this study shows CoT can cause hallucinations in narrative content, reducing GPT-4's performance. We also introduce an Adversarial Injection method to embed trope-related text tokens into movie synopses without explicit tropes, revealing CoT's heightened sensitivity to such injections. Our comprehensive analysis provides insights for future research directions.	翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
# ISC4DGF: LLM駆動初期種子コーパス生成による直接グレーボックスファジリングの強化 ISC4DGF: Enhancing Directed Grey-box Fuzzing with LLM-Driven Initial Seed Corpus Generation ( http://arxiv.org/abs/2409.14329v1 ) ライセンス: Link先を確認	Yijiang Xu, Hongrui Jia, Liguo Chen, Xin Wang, Zhengran Zeng, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang, Zhonghai Wu,	(参考訳) ファズテストはソフトウェア脆弱性の特定に不可欠であり、AFLやAngoraのようなカバレッジガイド付きグレーボックスファズーは広範な検出に優れています。しかし、ターゲット検出の必要性が高まるにつれて、特定の脆弱性に焦点を当てたディレクトグレーボックスファジング(DGF)が不可欠になっている。最初のシードコーパスは、ファザーが出発点として使用する、慎重に選択された入力サンプルで構成され、ファザーが探索する経路を決定するのに基本的なものである。十分に設計されたシードコーパスは、ファッザをより効果的にコードの重要な領域へ誘導し、ファッザリングプロセスの効率と成功を改善することができる。その重要性にもかかわらず、多くの研究は、初期種子コーパスの最適化に注意を払わずに指導機構の精錬に集中している。本稿では,Large Language Models (LLMs) を用いた DGF の初期シードコーパス生成手法である ISC4DGF を紹介する。 LLMの深いソフトウェア理解と洗練されたユーザー入力を活用することで、ISC4DGFは特定の脆弱性を効率的に引き起こす正確なシードコーパスを生成する。 AFLに実装され、AFLGo、FairFuzz、Entropicといった最先端のファジターに対してMagmaベンチマークを用いてテストを行い、ISC4DGFは35.63倍のスピードアップと616.10倍の目標到達を達成した。さらに、ISC4DGFは、より効果的にターゲットの脆弱性を検知し、コードカバレッジを減らし操作しながら効率を向上させることに重点を置いている。 Fuzz testing is crucial for identifying software vulnerabilities, with coverage-guided grey-box fuzzers like AFL and Angora excelling in broad detection. However, as the need for targeted detection grows, directed grey-box fuzzing (DGF) has become essential, focusing on specific vulnerabilities. The initial seed corpus, which consists of carefully selected input samples that the fuzzer uses as a starting point, is fundamental in determining the paths that the fuzzer explores. A well-designed seed corpus can guide the fuzzer more effectively towards critical areas of the code, improving the efficiency and success of the fuzzing process. Even with its importance, many works concentrate on refining guidance mechanisms while paying less attention to optimizing the initial seed corpus. In this paper, we introduce ISC4DGF, a novel approach to generating optimized initial seed corpus for DGF using Large Language Models (LLMs). 
By leveraging LLMs' deep software understanding and refined user inputs, ISC4DGF creates a precise seed corpus that efficiently triggers specific vulnerabilities. Implemented on AFL and tested against state-of-the-art fuzzers like AFLGo, FairFuzz, and Entropic using the Magma benchmark, ISC4DGF achieved a 35.63x speedup and 616.10x fewer target reaches. Moreover, ISC4DGF focused on more effectively detecting target vulnerabilities, enhancing efficiency while operating with reduced code coverage.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
# 粒度を考える:多粒度曲線による画像超解像の動的量子化 Thinking in Granularity: Dynamic Quantization for Image Super-Resolution by Intriguing Multi-Granularity Clues ( http://arxiv.org/abs/2409.14330v1 ) ライセンス: Link先を確認	Mingshen Wang, Zhao Zhang, Feng Li, Ke Xu, Kang Miao, Meng Wang,	(参考訳) ダイナミック量子化は、画像超解像(SR)において、競争性能を維持しながら、重いSRモデルのモバイルデバイスへの可能性を拡張することで注目を集めている。既存の手法では、各レイヤとパッチにビットを適応的に割り当て、各ローカル領域のレイヤ間構成を探索する。この利点にもかかわらず、SRの精度と量子化効率のトレードオフにはまだ不足している。これとは別に、各層に対して個別に量子化レベルを適用することは、元の層間関係を乱す可能性があるため、量子化モデルの表現能力は低下する。本研究では,画像の固有特性を生かしたグラニュラーDQを提案する。グラニュラーDQは、局所パッチの多粒度解析を行い、その情報密度をさらに探求し、固有のパッチワイドおよび層不変な動的量子化パラダイムを実現する。具体的には、Granular-DQは、異なるパッチの粗い粒度の表現を識別する粒度ビットコントローラ(GBC)を開発し、画像全体への比例的な寄与を一致させて適切なビット幅割り当てを決定する。本研究では,ビット幅と情報密度の関係を考察し,高ビットパッチのよりきめ細かな動的ビット適応を実現するエントロピー・ト・ビット(E2B)機構を考案する。広範囲な実験により、様々なSRモデルにおける最近の最先端手法よりもグラニュラーDQの優位性と一般化能力が検証された。コードは \url{https://github.com/MmmingS/Granular-DQ.git} で入手できる。 Dynamic quantization has attracted rising attention in image super-resolution (SR) as it expands the potential of heavy SR models onto mobile devices while preserving competitive performance. Existing methods explore layer-to-bit configuration upon varying local regions, adaptively allocating the bit to each layer and patch. Despite the benefits, they still fall short in the trade-off of SR accuracy and quantization efficiency. Apart from this, adapting the quantization level for each layer individually can disturb the original inter-layer relationships, thus diminishing the representation capability of quantized models. In this work, we propose Granular-DQ, which capitalizes on the intrinsic characteristics of images while dispensing with the previous consideration for layer sensitivity in quantization. Granular-DQ conducts a multi-granularity analysis of local patches with further exploration of their information densities, achieving a distinctive patch-wise and layer-invariant dynamic quantization paradigm. 
Specifically, Granular-DQ begins by developing a granularity-bit controller (GBC) that captures the coarse-to-fine granular representations of different patches and matches their proportional contribution to the entire image to determine the proper bit-width allocation. On this premise, we investigate the relation between bit-width and information density and devise an entropy-to-bit (E2B) mechanism that enables further fine-grained dynamic bit adaptation of high-bit patches. Extensive experiments validate the superiority and generalization ability of Granular-DQ over recent state-of-the-art methods on various SR models. Code will be available at \url{https://github.com/MmmingS/Granular-DQ.git}.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
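The abstract above ties bit-width to the information density of a patch. As a rough illustration only, the Python sketch below computes the Shannon entropy of a patch's pixel-value histogram and maps it to a bit-width through fixed thresholds. The function names (`patch_entropy`, `entropy_to_bits`), the thresholds, and the candidate bit-widths are assumptions made for this example; they are not taken from the paper or from the Granular-DQ code.

```python
import math
from collections import Counter

def patch_entropy(pixels):
    """Shannon entropy (in bits) of the pixel-value histogram of one patch."""
    counts = Counter(pixels)
    total = len(pixels)
    # Sum p * log2(1/p) over the observed pixel values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def entropy_to_bits(entropy, thresholds=(2.0, 5.0), bit_widths=(4, 6, 8)):
    """Map entropy to a bit-width.

    The thresholds and bit-widths are hypothetical placeholders,
    not the E2B rule described in the paper.
    """
    for threshold, bits in zip(thresholds, bit_widths):
        if entropy < threshold:
            return bits
    return bit_widths[-1]

flat_patch = [128] * 64            # constant patch: no information
busy_patch = list(range(256))      # every 8-bit value appears once
print(entropy_to_bits(patch_entropy(flat_patch)))
print(entropy_to_bits(patch_entropy(busy_patch)))
```

Under these assumed thresholds, a flat patch falls into the lowest bit-width bucket and a maximally varied patch into the highest, which is the qualitative behavior the abstract describes.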
# PISR: ポーラリメトリック・ニューラルインシシット表面再構成 PISR: Polarimetric Neural Implicit Surface Reconstruction for Textureless and Specular Objects ( http://arxiv.org/abs/2409.14331v1 ) ライセンス: Link先を確認	Guangcheng Chen, Yicheng He, Li He, Hong Zhang,	(参考訳) 神経性暗黙表面再構成は近年顕著な進歩を遂げている。複雑な放射率モデリングを頼りにしているにもかかわらず、最先端の手法はテクスチャレスやスペキュラーな表面といまだに苦労している。 RGB画像と異なり、偏光画像は表面正規の方位角に直接的な制約を与えることができる。本稿では,幾何学的に正確な偏光損失を利用して形状を洗練させる新しい手法であるPISRを提案する。さらに、PISRは画像空間の表面の正規化を円滑にし、厳密な形状の歪みを排除し、ハッシュグリッドベースのニューラルサイン距離関数を活用して再構成を加速する。実験の結果、PISRは0.5mmのL1チャンファー距離と1mmのFスコアが99.5%であり、従来の偏光面再構成法よりも4～30倍高速であることがわかった。 Neural implicit surface reconstruction has achieved remarkable progress recently. Despite resorting to complex radiance modeling, state-of-the-art methods still struggle with textureless and specular surfaces. Different from RGB images, polarization images can provide direct constraints on the azimuth angles of the surface normals. In this paper, we present PISR, a novel method that utilizes a geometrically accurate polarimetric loss to refine shape independently of appearance. In addition, PISR smooths surface normals in image space to eliminate severe shape distortions and leverages the hash-grid-based neural signed distance function to accelerate the reconstruction. Experimental results demonstrate that PISR achieves higher accuracy and robustness, with an L1 Chamfer distance of 0.5 mm and an F-score of 99.5% at 1 mm, while converging 4~30 times faster than previous polarimetric surface reconstruction methods.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
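The PISR abstract reports an L1 Chamfer distance and an F-score at a 1 mm threshold. The sketch below shows one common way to compute both metrics for two small point sets. Definitions vary between papers; here the Chamfer distance is assumed to be the average of the two directional mean nearest-neighbor Euclidean distances, and the F-score is the harmonic mean of precision and recall at a distance threshold. The function names and sample points are illustrative and do not come from the paper.

```python
import math

def _nearest_distances(source, target):
    """For each point in source, the Euclidean distance to its nearest point in target."""
    return [min(math.dist(p, q) for q in target) for p in source]

def chamfer_l1(pred, ref):
    """Average of the two directional mean nearest-neighbor distances (assumed convention)."""
    d_pred = _nearest_distances(pred, ref)
    d_ref = _nearest_distances(ref, pred)
    return 0.5 * (sum(d_pred) / len(d_pred) + sum(d_ref) / len(d_ref))

def f_score(pred, ref, threshold):
    """Harmonic mean of precision and recall at the given distance threshold."""
    precision = sum(d <= threshold for d in _nearest_distances(pred, ref)) / len(pred)
    recall = sum(d <= threshold for d in _nearest_distances(ref, pred)) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical reconstruction offset by 0.5 units along z from the reference.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]
print(chamfer_l1(pred, ref), f_score(pred, ref, threshold=1.0))
```

The brute-force nearest-neighbor search is quadratic in the number of points; a spatial index would be the usual choice for real meshes.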
# 量子カオスにおける情報取得、スクランブル、およびエラーに対する感度 Information acquisition, scrambling, and sensitivity to errors in quantum chaos ( http://arxiv.org/abs/2409.14332v1 ) ライセンス: Link先を確認	Sreeram PG, Abinash Sahu, Naga Dileep Varikuti, Bishal Kumar Das, Sourav Manna, Vaibhav Madhok,	(参考訳) カオスのシグネチャは、古典的な対応物がカオスである量子系を研究することによって理解することができる。しかし、可積分性、非可積分性、カオスの概念は古典的な類似を持たないシステムにまで拡張される。ここでは、秩序からカオスへの古典的なルートを最初にレビューする。自然は基本的に量子であるため、量子領域におけるカオスがどのように現れるかについて議論する。半古典的手法を簡潔に記述し、量子情報処理におけるカオスの帰結について議論する。我々は、時間外順序相関器(OTOC)、コルモゴロフ-シナイ(KS)エントロピーと誤りに対する感度によって定量化されたリアプノフ指数の量子バージョンをレビューする。次に、量子トモグラフィーを用いた量子カオスのシグネチャの研究をレビューする。古典的には、ダイナミクスを正確に知っていれば、軌道の粗い追跡を一定に保ちながら、初期状態に関する指数関数的にきめ細かな情報を得る。量子環境では、測定記録を固定の信号対雑音比で追跡すると、初期状態に関する情報が増大する。この過程で我々は、量子状態再構成を用いて、クリロフ部分空間における作用素の広がりの新しい定量化を与える。これらのシグネチャの研究は理論的な関心だけでなく、実際的な重要性も持っている。 Signatures of chaos can be understood by studying quantum systems whose classical counterparts are chaotic. However, the concepts of integrability, non-integrability and chaos extend to systems without a classical analogue. Here, we first review the classical route from order into chaos. Since nature is fundamentally quantum, we discuss how chaos manifests in the quantum domain. We briefly describe semi-classical methods, and discuss the consequences of chaos in quantum information processing. We review the quantum version of Lyapunov exponents, as quantified by the out-of-time ordered correlators (OTOC), Kolmogorov-Sinai (KS) entropy and sensitivity to errors. We then review the study of signatures of quantum chaos using quantum tomography. Classically, if we know the dynamics exactly, as we maintain a constant coarse-grained tracking of the trajectory, we gain exponentially fine-grained information about the initial condition. In the quantum setting, as we track the measurement record with a fixed signal-to-noise ratio, we gain increasing information about the initial condition. 
In the process, we give a new quantification of operator spreading in Krylov subspaces using quantum state reconstruction. The study of these signatures is not only of theoretical interest but also of practical importance.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
# MQM-APE:LLM翻訳評価器における自動後編集による高品質エラーアノテーション予測 MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators ( http://arxiv.org/abs/2409.14335v1 ) ライセンス: Link先を確認	Qingyu Lu, Liang Ding, Kanjian Zhang, Jinxia Zhang, Dacheng Tao,	(参考訳) 大規模言語モデル(LLM)は、機械翻訳(MT)の品質評価の裁判官として大きな可能性を示し、スコアときめ細かいフィードバックを提供する。 GEMBA-MQMのような手法は、参照なし評価においてSOTAの性能を示すが、予測されたエラーは人間によって注釈付けされたものとうまく一致せず、フィードバック信号としての解釈可能性を制限する。 LLM評価器によって予測されるエラーアノテーションの品質を高めるために、各エラーに基づいて元の翻訳を自動ポスト編集(APE)することで影響のないエラーをフィルタリングし、品質改善に寄与するエラーのみを残すというアイデアに基づいて、普遍的でトレーニング不要なフレームワークであるMQM-APEを導入する。具体的には、LLMに 1) エラーアノテーションを提供する evaluator、2) エラーが品質改善に影響するかどうかを判定する post-editor、3) エラーフィルタとしての pairwise quality verifier として機能するよう促す。実験の結果、本手法は高リソース言語と低リソース言語の両方において、8つのLLMにわたって、GEMBA-MQMに対してエラースパンの信頼性と品質の両方を一貫して改善することが示された。 MQM-APEは、訓練されたアプローチと直交し、Towerのような翻訳固有の評価器を補完し、その幅広い適用性を強調している。さらなる分析により,各モジュールの有効性を確認し,評価器の設計とLLMの選択に関する貴重な知見を提供する。コードはコミュニティのためにリリースされる予定である。 Large Language Models (LLMs) have shown significant potential as judges for Machine Translation (MT) quality assessment, providing both scores and fine-grained feedback. Although approaches such as GEMBA-MQM have shown SOTA performance on reference-free evaluation, the predicted errors do not align well with those annotated by humans, limiting their interpretability as feedback signals. To enhance the quality of error annotations predicted by LLM evaluators, we introduce a universal and training-free framework, MQM-APE, based on the idea of filtering out non-impactful errors by Automatically Post-Editing (APE) the original translation based on each error, leaving only those errors that contribute to quality improvement. Specifically, we prompt the LLM to act as 1) an evaluator that provides error annotations, 2) a post-editor that determines whether errors impact quality improvement, and 3) a pairwise quality verifier that serves as the error filter. 
Experiments show that our approach consistently improves both the reliability and quality of error spans against GEMBA-MQM, across eight LLMs in both high- and low-resource languages. Orthogonal to trained approaches, MQM-APE complements translation-specific evaluators such as Tower, highlighting its broad applicability. Further analyses confirm the effectiveness of each module and offer valuable insights into evaluator design and LLM selection. The code will be released for the benefit of the community.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
# デュアルビジュアルテキストアライメントを用いたゼロショット骨格に基づく行動認識 Zero-Shot Skeleton-based Action Recognition with Dual Visual-Text Alignment ( http://arxiv.org/abs/2409.14336v1 ) ライセンス: Link先を確認	Jidong Kuang, Hongsong Wang, Chaolei Han, Jie Gui,	(参考訳) ゼロショットアクション認識は、アクション認識におけるスケーラビリティと一般化の問題に対処し、モデルが新しいアクションや未知のアクションに動的に適応できるようにするものであり、コンピュータビジョン分野の重要な研究課題である。ゼロショットアクション認識の鍵は、視覚的特徴とアクションカテゴリを表す意味ベクトルの整合にある。既存のほとんどの手法は、視覚的特徴を直接テキストカテゴリのセマンティック空間に投影するか、2つのモダリティ間の共有埋め込み空間を学習する。しかし、直接投影は2つのモダリティを正確に整合させることはできず、視覚的表現とテキスト表現の間の堅牢で識別的な埋め込み空間を学習することはしばしば困難である。これらの問題に対処するために、骨格に基づくゼロショット動作認識のためのデュアルビジュアルテキストアライメント(DVTA)を導入する。 DVTAは2つのアライメントモジュール、Direct Alignment (DA)とAugmented Alignment (AA)で構成され、さらにSemantic Description Enhancement (SDE)を備える。 DAモジュールは、特別に設計された視覚プロジェクタを通して骨格の特徴を意味空間にマッピングし、続くSDEは、クロスアテンションに基づいてスケルトンとテキストの接続を強化し、モダリティ間のギャップを減らす。 AAモジュールは、深層距離学習を利用して骨格とテキストの類似性を学び、埋め込み空間の学習をさらに強化する。提案手法は、一般的な複数のゼロショットスケルトンに基づく動作認識ベンチマークにおいて、最先端のパフォーマンスを実現する。 Zero-shot action recognition, which addresses the issue of scalability and generalization in action recognition and allows the models to adapt to new and unseen actions dynamically, is an important research topic in the computer vision community. The key to zero-shot action recognition lies in aligning visual features with semantic vectors representing action categories. Most existing methods either directly project visual features onto the semantic space of text categories or learn a shared embedding space between the two modalities. However, a direct projection cannot accurately align the two modalities, and learning a robust and discriminative embedding space between visual and text representations is often difficult. To address these issues, we introduce Dual Visual-Text Alignment (DVTA) for skeleton-based zero-shot action recognition. The DVTA consists of two alignment modules, Direct Alignment (DA) and Augmented Alignment (AA), along with a designed Semantic Description Enhancement (SDE). 
The DA module maps the skeleton features to the semantic space through a specially designed visual projector, followed by the SDE, which is based on cross-attention to enhance the connection between skeleton and text, thereby reducing the gap between modalities. The AA module further strengthens the learning of the embedding space by utilizing deep metric learning to learn the similarity between skeleton and text. Our approach achieves state-of-the-art performance on several popular zero-shot skeleton-based action recognition benchmarks.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
# セルフ・スーパービジョンオーディオ・ビジュアル・サウンドスケープ・スティライゼーション Self-Supervised Audio-Visual Soundscape Stylization ( http://arxiv.org/abs/2409.14340v1 ) ライセンス: Link先を確認	Tingle Li, Renhao Wang, Po-Yao Huang, Andrew Owens, Gopala Anumanchipalli,	(参考訳) 音声はシーンに関する情報を多く伝達し、残響から追加の環境音まで様々な効果をもたらす。本稿では、そのシーンから録音された音声-視覚条件の例から、入力音声を異なるシーンで録音されたかのように操作する。本モデルは,自然映像が繰り返し発生する音のイベントやテクスチャを含むという事実を活かして,自己監督を通じて学習する。ビデオから音声クリップを抽出し、音声強調を行う。次に、ビデオ内の他の場所から撮影した別の音声映像クリップを条件付きヒントとして、潜時拡散モデルを訓練し、元の音声を復元する。このプロセスを通じて、モデルは条件付きサンプルの音響特性を入力音声に転送することを学ぶ。提案手法は,未ラベル・イン・ザ・ワイルドビデオによるトレーニングが成功し,付加的な視覚信号による予測能力の向上が期待できることを示す。ビデオの結果については、プロジェクトのWebページをご覧ください。 Speech sounds convey a great deal of information about the scenes, resulting in a variety of effects ranging from reverberation to additional ambient sounds. In this paper, we manipulate input speech to sound as though it was recorded within a different scene, given an audio-visual conditional example recorded from that scene. Our model learns through self-supervision, taking advantage of the fact that natural video contains recurring sound events and textures. We extract an audio clip from a video and apply speech enhancement. We then train a latent diffusion model to recover the original speech, using another audio-visual clip taken from elsewhere in the video as a conditional hint. Through this process, the model learns to transfer the conditional example's sound properties to the input speech. We show that our model can be successfully trained using unlabeled, in-the-wild videos, and that an additional visual signal can improve its sound prediction abilities. Please see our project webpage for video results: https://tinglok.netlify.app/files/avsoundscape/	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
# メモリマッチングは不十分:ビデオオブジェクトセグメンテーションのためのメモリマッチングとデコーディングを共同で改善 Memory Matching is not Enough: Jointly Improving Memory Matching and Decoding for Video Object Segmentation ( http://arxiv.org/abs/2409.14343v1 ) ライセンス: Link先を確認	Jintu Zheng, Yun Liang, Yuqing Zhang, Wanchao Su,	(参考訳) メモリベースビデオオブジェクトセグメンテーション手法は、メモリバンクを確立することで、長い時空間スパンにわたる複数のオブジェクトをモデル化し、優れた性能を達成している。しかし、これらの手法は偽のマッチングを克服するのに苦労し、重要な情報を失う傾向にあり、異なるオブジェクト間で混乱が生じる。本稿では、マッチングとデコーディングの段階を共同で改善し、偽マッチング問題を緩和する効果的な手法を提案する。メモリマッチングの段階では、短期記憶のわずかな誤差を抑えるコスト認識機構と、様々なオブジェクトスケールに対して広範囲のマッチング空間を確立する長期記憶用のシャント型クロススケールマッチングを提案する。読み出し復号の段階では、マッチング段階で欠落した重要な情報を回復することを目的とした補償機構を実装した。提案手法は、DAVIS 2016&2017 Val (92.4%&88.1%)、DAVIS 2017 Test (83.9%) などの一般的なベンチマークで優れた性能を達成し、YouTubeVOS 2018&2019 Valでは84.8%&84.6%を達成している。 Memory-based video object segmentation methods model multiple objects over long temporal-spatial spans by establishing a memory bank, and they achieve remarkable performance. However, they struggle to overcome false matching and are prone to losing critical information, resulting in confusion among different objects. In this paper, we propose an effective approach that jointly improves the matching and decoding stages to alleviate the false-matching issue. For the memory matching stage, we present a cost-aware mechanism that suppresses slight errors in short-term memory, and a shunted cross-scale matching for long-term memory that establishes wide-field matching spaces for various object scales. For the readout decoding stage, we implement a compensatory mechanism that aims to recover essential information missing at the matching stage. Our approach achieves outstanding performance on several popular benchmarks (i.e., DAVIS 2016&2017 Val (92.4%&88.1%) and DAVIS 2017 Test (83.9%)), and achieves 84.8%&84.6% on YouTubeVOS 2018&2019 Val.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
# 絶対分離可能状態とPPT状態の集合の極点について On the extreme points of sets of absolutely separable and PPT states ( http://arxiv.org/abs/2409.14347v1 ) ライセンス: Link先を確認	Zhiwei Song, Lin Chen,	(参考訳) 絶対分離可能状態 (resp. 絶対PPT状態) は、任意の大域的ユニタリ演算の下で分離可能 (resp. positive partial transpose) のままである。絶対分離可能状態と絶対PPT状態の集合における極点のコンパクトな形式を、2量子ビット系および qubit-qudit 系で提示する。結果は、各極点が高々3つの異なる固有値を持つことを示している。2キュートリット系および qutrit-qudit 系における絶対PPT状態の集合の極点を決定する必要十分条件を、可解な線型方程式として確立する。また、qutrit-qudit系における任意の極点が、高々7つの異なる固有値を持つことを示す。非絶対分離性のロバスト性(robustness of nonabsolute separability)の概念を導入する。これは、全体の状態が絶対分離可能となるために、状態を他の状態と混合する必要がある最小の量を定量化する。このロバスト性は、正値性、ユニタリ変換の下での不変性、単調性、凸性を満たすことを示し、したがって非絶対分離性の資源理論における良い測度である。この測度の解析的表現は、任意の系における純粋状態と、2量子ビット系における階数2の混合状態に対して与えられる。 The absolutely separable (resp. PPT) states remain separable (resp. positive partial transpose) under any global unitary operation. We present a compact form of the extreme points in the sets of absolutely separable states and PPT states in two-qubit and qubit-qudit systems. The results imply that each extreme point has at most three distinct eigenvalues. We establish a necessary and sufficient condition for determining extreme points of the set of absolutely PPT states in two-qutrit and qutrit-qudit systems, expressed as solvable linear equations. We also demonstrate that any extreme point in a qutrit-qudit system has at most seven distinct eigenvalues. We introduce the concept of robustness of nonabsolute separability. It quantifies the minimal amount by which a state needs to mix with other states such that the overall state is absolutely separable. We show that the robustness satisfies positivity, invariance under unitary transformation, monotonicity and convexity, so it is a good measure within the resource theory of nonabsolute separability. Analytical expressions for this measure are given for pure states in an arbitrary system and rank-two mixed states in a two-qubit system.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
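Because absolute separability depends only on a state's spectrum, it can be tested from eigenvalues alone. For two qubits, a criterion usually attributed to Verstraete, Audenaert and De Moor says that a state with eigenvalues λ1 ≥ λ2 ≥ λ3 ≥ λ4 is absolutely separable exactly when λ1 ≤ λ3 + 2√(λ2 λ4). The sketch below checks that inequality. The criterion is background knowledge and is not stated in the abstract above; the function name and tolerance are assumptions for this example.

```python
import math

def is_absolutely_separable_two_qubit(eigenvalues, tol=1e-12):
    """Check lambda1 <= lambda3 + 2*sqrt(lambda2*lambda4) for a two-qubit spectrum."""
    lam = sorted(eigenvalues, reverse=True)
    if len(lam) != 4 or min(lam) < -tol or abs(sum(lam) - 1.0) > 1e-9:
        raise ValueError("expected four non-negative eigenvalues summing to 1")
    # Clip tiny negative rounding errors before taking the square root.
    l1, l2, l3, l4 = (max(x, 0.0) for x in lam)
    return l1 <= l3 + 2.0 * math.sqrt(l2 * l4) + tol

print(is_absolutely_separable_two_qubit([0.25, 0.25, 0.25, 0.25]))  # maximally mixed state
print(is_absolutely_separable_two_qubit([1.0, 0.0, 0.0, 0.0]))      # pure state
```

The maximally mixed spectrum satisfies the inequality, while any pure-state spectrum violates it, since a pure state can always be rotated into an entangled one by a global unitary.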
# 1D-CNNを用いた文字・口語タミル音声分類のための特徴工学的アプローチ A Feature Engineering Approach for Literary and Colloquial Tamil Speech Classification using 1D-CNN ( http://arxiv.org/abs/2409.14348v1 ) ライセンス: Link先を確認	M. Nanmalar, S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan,	(参考訳) 理想的なヒューマンコンピュータインタラクション(HCI)では、日常会話で使用される形式であるため、言語の口語形式がほとんどのユーザーに好まれる。しかし、形式的な文語体を保存する必要性も否定できない。新しいものを受け入れ、古いものを保存することで、一般の人々への貢献(実用性)と言語自体への貢献(保存)の両方を果たすことができる。したがって、コンピュータが必要に応じて両方の言語形式を受け入れ、処理し、会話する能力を持つことが理想的である。この問題に対処するためには、まず入力音声の形式(本研究では文語または口語のタミル語音声)を特定することが必要である。このようなフロントエンドシステムは、音声信号の基本となるパターンを捉えることができる少数の効果的な特徴に基づいて訓練された、シンプルで効果的で軽量な分類器でなければならない。これを実現するために、時間方向に特徴の包絡を学習する1次元畳み込みニューラルネットワーク(1D-CNN)を提案する。このネットワークは、最初は選択された少数の手作り特徴に基づいて訓練され、その後、比較のためにMel周波数ケプストラム係数(MFCC)で訓練される。音声のスペクトル特性や時間特性,韻律,声質など,音声の様々な側面に対処するために,手作りの特徴が選択された。特徴は、まず10の並行発話を考慮し、時間に関する各特徴の傾向を観察することによって分析される。提案された1D-CNNは、手作りの特徴で訓練した場合のF1スコアが0.9803、MFCCで訓練した場合のF1スコアが0.9895である。これを踏まえて、特徴アブレーションと特徴の組み合わせを探索する。特徴アブレーションで上位にランクされた手作り特徴をMFCCと組み合わせた場合、F1スコアは0.9946と最良の結果となる。 In ideal human-computer interaction (HCI), the colloquial form of a language would be preferred by most users, since it is the form used in their day-to-day conversations. However, there is also an undeniable necessity to preserve the formal literary form. By embracing the new and preserving the old, both service to the common man (practicality) and service to the language itself (conservation) can be rendered. Hence, it is ideal for computers to have the ability to accept, process, and converse in both forms of the language, as required. To address this, it is first necessary to identify the form of the input speech, which in the current work is either literary or colloquial Tamil. Such a front-end system must consist of a simple, effective, and lightweight classifier that is trained on a few effective features that are capable of capturing the underlying patterns of the speech signal. 
To accomplish this, a one-dimensional convolutional neural network (1D-CNN) that learns the envelope of features across time is proposed. The network is trained on a select number of handcrafted features initially, and then on Mel frequency cepstral coefficients (MFCC) for comparison. The handcrafted features were selected to address various aspects of speech such as the spectral and temporal characteristics, prosody, and voice quality. The features are initially analyzed by considering ten parallel utterances and observing the trend of each feature with respect to time. The proposed 1D-CNN, trained using the handcrafted features, offers an F1 score of 0.9803, while that trained on the MFCC offers an F1 score of 0.9895. In light of this, feature ablation and feature combination are explored. When the best-ranked handcrafted features from the feature ablation study are combined with the MFCC, they offer the best results with an F1 score of 0.9946.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
# 自然言語処理によるテキスト分類によるバーンアウトの表示:オンラインデータから実世界データへ Using Natural Language Processing to find Indication for Burnout with Text Classification: From Online Data to Real-World Data ( http://arxiv.org/abs/2409.14357v1 ) ライセンス: Link先を確認	Mascha Kurpicz-Briki, Ghofrane Merhbene, Alexandre Puttick, Souhir Ben Souissi, Jannic Bieri, Thomas Jörg Müller, Christoph Golz,	(参考訳) ICD-11の症候群に分類されるバーンアウトは、効果的に管理されていない慢性的な職場ストレスから生じる。疲労、シニシズム、職業効果の低下を特徴とし、測定方法の不整合によりその有病率は著しく異なる。自然言語処理(NLP)と機械学習の最近の進歩は、テキストデータ分析を通じてバーンアウトを検出するための有望なツールを提供する。本稿では,ドイツ語テキストのバーンアウト検出に寄与する。 (a)自由文回答とOldenburg Burnout Inventory(OLBI)応答を含む匿名現実世界データセットの収集 b) オンラインデータに基づいて訓練されたジャーマンバートに基づく分類器の限界を示すこと (c) 実世界のアプリケーションでよく機能するモデルを生成する、キュレートされたBurnoutExpressionsデータセットの2つのバージョンを提示します。 (d)バーンアウト検出に使用されるAIモデルの解釈可能性に関する学際的焦点グループからの質的な洞察を提供する。我々の発見は、燃え尽き症候群の検出モデルを洗練するために、AI研究者と臨床専門家とのより深いコラボレーションの必要性を強調した。さらに、NLP研究で開発された現在のAIメソッドの有効性を検証し、強化するためには、より現実的なデータが不可欠である。これは、AIツールが実用的なアプリケーションに適していることを保証するために不可欠である。 Burnout, classified as a syndrome in the ICD-11, arises from chronic workplace stress that has not been effectively managed. It is characterized by exhaustion, cynicism, and reduced professional efficacy, and estimates of its prevalence vary significantly due to inconsistent measurement methods. Recent advancements in Natural Language Processing (NLP) and machine learning offer promising tools for detecting burnout through textual data analysis, with studies demonstrating high predictive accuracy. 
This paper contributes to burnout detection in German texts by: (a) collecting an anonymous real-world dataset including free-text answers and Oldenburg Burnout Inventory (OLBI) responses; (b) demonstrating the limitations of a GermanBERT-based classifier trained on online data; (c) presenting two versions of a curated BurnoutExpressions dataset, which yielded models that perform well in real-world applications; and (d) providing qualitative insights from an interdisciplinary focus group on the interpretability of AI models used for burnout detection. Our findings emphasize the need for greater collaboration between AI researchers and clinical experts to refine burnout detection models. Additionally, more real-world data is essential to validate and enhance the effectiveness of current AI methods developed in NLP research, which are often based on data automatically scraped from online sources and not evaluated in a real-world context. This is essential for ensuring AI tools are well suited for practical applications.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
# MANTA -- 手頃なモデルアダプタネイティブ生成 MANTA -- Model Adapter Native generations that's Affordable ( http://arxiv.org/abs/2409.14363v1 ) ライセンス: Link先を確認	Ansh Chaurasia,	(参考訳) 現在主流のモデル生成アルゴリズムは、パーソナライズされた結果を提供するために、単純で柔軟性のないアダプタの選択に依存している。本稿では,実用上のハードウェアと費用の制約を考慮した、過去の研究の一般化問題としてモデルアダプタ合成問題を提案し,その新しいアプローチとしてMANTAを導入する。 COCO 2014バリデーションの実験では、MANTAはアライメントがわずかに低下する代わりに、画像タスクの多様性と品質において優れていることが示されている。本システムは,最もよく知られたシステムに対して,タスクの多様性において94%の勝率,タスク品質において80%の勝率を達成し,合成データ生成や創造的アートドメインにおいて,直接的な利用の可能性を示す。 The prevailing model generation algorithms rely on simple, inflexible adapter selection to provide personalized results. We propose the model-adapter composition problem as a generalization of past work that factors in practical hardware and affordability constraints, and introduce MANTA as a new approach to the problem. Experiments on COCO 2014 validation show MANTA to be superior in image task diversity and quality at the cost of a modest drop in alignment. Our system achieves a 94% win rate in task diversity and an 80% task quality win rate versus the best known system, and demonstrates strong potential for direct use in synthetic data generation and the creative art domains.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
# 初心者プログラマのための大規模言語モデルによるコードコメントの品質評価 Evaluating the Quality of Code Comments Generated by Large Language Models for Novice Programmers ( http://arxiv.org/abs/2409.14368v1 ) ライセンス: Link先を確認	Aysa Xuemo Fan, Arun Balajiee Lekshmi Narayanan, Mohammad Hassany, Jiaze Ke,	(参考訳) 大規模言語モデル (LLM) は初心者プログラマ向けのコードコメント生成において有望であるが、その教育的有効性は十分に評価されていない。本研究は, GPT-4, GPT-3.5-Turbo, Llama2によるコードコメントの指導的品質を、専門家が作成したコメントと比較し、初心者への適合性に焦点を当てて評価する。 LeetCodeの「easy」レベルのJavaソリューションのデータセットを分析したところ、GPT-4は、明快さ、初心者フレンドリさ、概念の解明、ステップバイステップのガイダンスなど、初心者にとって重要な側面において、専門家のコメントに匹敵する品質を示す。 GPT-4は複雑性を議論する上でLlama2よりも優れており(chi-square = 11.40, p = 0.001)、GPT-3.5やLlama2よりも初心者に対してはるかに支援的であると認識されている(マン・ホイットニーのU統計量 = 300.5, 322.5, p = 0.0017, 0.0003)。この研究は、初心者プログラマに適したコードコメントを生成するLLMの可能性を強調した。 Large Language Models (LLMs) show promise in generating code comments for novice programmers, but their educational effectiveness remains under-evaluated. This study assesses the instructional quality of code comments produced by GPT-4, GPT-3.5-Turbo, and Llama2, compared to expert-developed comments, focusing on their suitability for novices. Analyzing a dataset of "easy" level Java solutions from LeetCode, we find that GPT-4 exhibits comparable quality to expert comments in aspects critical for beginners, such as clarity, beginner-friendliness, concept elucidation, and step-by-step guidance. GPT-4 outperforms Llama2 in discussing complexity (chi-square = 11.40, p = 0.001) and is perceived as significantly more supportive for beginners than GPT-3.5 and Llama2 (Mann-Whitney U-statistics = 300.5 and 322.5, p = 0.0017 and 0.0003). This study highlights the potential of LLMs for generating code comments tailored to novice programmers.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
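The abstract above reports Mann-Whitney U statistics for rating comparisons. As a minimal sketch of what that statistic counts, the function below computes U for one sample by comparing every pair of observations across the two samples. The ratings are made up for illustration and are not data from the study; the p-value computation is omitted.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic of sample_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical 1-5 supportiveness ratings for two comment generators.
ratings_model_a = [5, 4, 4, 3]
ratings_model_b = [3, 2, 2, 1]
print(mann_whitney_u(ratings_model_a, ratings_model_b))
```

The two U values for a pair of samples always add up to the product of the sample sizes, which gives a quick sanity check on any implementation.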
# オープンエンド要求に対するエージェント応答における制約満足度評価のための大規模言語モデルの能力 The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests ( http://arxiv.org/abs/2409.14371v1 ) ライセンス: Link先を確認	Lior Madmoni, Amir Zait, Ilia Labzovsky, Danny Karmon,	(参考訳) 生成AIエージェントは、唯一の正解がない(NORA: No One Right Answer)複雑なユーザリクエストに応答することがしばしば期待されている。このようなリクエストは、エージェントが従うべき一連の制約を伴うことがある。 NORAシナリオのエージェントをうまく開発するには、正確な自動評価フレームワーク、具体的にはエージェントの応答における制約の満足度を検証できるフレームワークが不可欠である。近年,大規模言語モデル (LLM) が多くのNORAタスクに対して汎用的な評価器として採用されているが,生成されたテキストの制約満足度を評価する能力は未だ不明である。そこで本研究では,ACS(Arithmetic Constraint-Satisfaction)ベンチマークデータセットの開発とリリースを行う。データセットは、対応する制約のある複雑なユーザリクエスト、エージェント応答、応答における各制約の満足度を示す人手によるラベルで構成されている。このデータセットのユニークな特性は、その制約の多くを検証するには、レスポンス全体をレビューする必要があることである(単一の独立した項目の検証を必要とする他の多くのベンチマークとは対照的に)。さらに、推論、文脈内データ抽出、算術演算、計数を行う際のLLMを評価する。次に、制約満足度の評価についてオープンとプロプライエタリの両方のLLMをベンチマークし、ほとんどのモデルにまだ大きな改善の余地があること、エラーは主に推論の問題に起因することを示す。さらに、ほとんどのモデルは偏った制約満足度予測パターンを示し、正解ラベルが「satisfied」である場合の精度が高い。最後に,調査したモデルの多くはfew-shotプロンプトの導入時に性能が低下したことから,本タスクにおけるfew-shotプロンプトはかなり困難であることが判明した。 Generative AI agents are often expected to respond to complex user requests that have No One Right Answer (NORA), e.g., "design a vegetarian meal plan below 1800 calories". Such requests may entail a set of constraints that the agent should adhere to. To successfully develop agents for NORA scenarios, an accurate automatic evaluation framework is essential, specifically one capable of validating the satisfaction of constraints in the agent's response. Recently, large language models (LLMs) have been adopted as versatile evaluators for many NORA tasks, but their ability to evaluate constraint-satisfaction in generated text remains unclear. To study this, we develop and release a novel Arithmetic Constraint-Satisfaction (ACS) benchmarking dataset. The dataset consists of complex user requests with corresponding constraints, agent responses and human labels indicating each constraint's satisfaction level in the response. 
A unique property of this dataset is that validating many of its constraints requires reviewing the response as a whole (in contrast to many other benchmarks that require the validation of a single independent item). Moreover, it assesses LLMs in performing reasoning, in-context data extraction, arithmetic calculations, and counting. We then benchmark both open and proprietary LLMs on evaluating constraint-satisfaction, and show that most models still have significant headroom for improvement, and that errors primarily stem from reasoning issues. In addition, most models exhibit a skewed constraint-satisfaction prediction pattern, with higher accuracy where the ground-truth label is "satisfied". Lastly, few-shot prompting for our task proved to be rather challenging, since many of the studied models showed a degradation in performance when it was introduced.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
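To make the "review the response as a whole" point concrete, the sketch below checks the abstract's own example constraint, a meal plan below 1800 calories. No single item decides the outcome; the check has to extract every calorie figure and sum them. The meal names, calorie values, and function names are hypothetical and are not drawn from the ACS dataset.

```python
def total_calories(meals):
    """Sum the calorie figures of every (name, calories) item in the plan."""
    return sum(calories for _name, calories in meals)

def satisfies_calorie_limit(meals, limit=1800):
    """True when the whole plan stays strictly below the limit."""
    return total_calories(meals) < limit

# Hypothetical agent response, already parsed into (meal, calories) pairs.
plan = [("oatmeal with fruit", 450), ("lentil soup", 600), ("vegetable curry", 700)]
print(total_calories(plan), satisfies_calorie_limit(plan))
```

Adding one more 100-calorie snack to this plan flips the label from satisfied to unsatisfied, even though every individual item looks unremarkable on its own.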
# AIシステムへの適切な信頼性を実現するための介入としてデバッグする To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems ( http://arxiv.org/abs/2409.14377v1 ) ライセンス: Link先を確認	Gaole He, Abri Bharos, Ujwal Gadiraju,	(参考訳) 強力な予測AIシステムは、人間の意思決定を増強する大きな可能性を証明している。最近の実証研究は、最適な人間とAIのコラボレーションのビジョンは、AIシステムに対する人間の「適切な依存」を必要とすると主張している。しかし、特にAIシステムに関連するパフォーマンスフィードバックがない場合には、インスタンスレベルでAIアドバイスの信頼性を正確に見積もることは非常に難しい。実際には、アウト・オブ・ディストリビューションデータにおける機械学習モデルのパフォーマンス格差は、データセット固有のパフォーマンスフィードバックを、人間とAIのコラボレーションでは信頼できないものにしている。批判的思考と批判的マインドセットに関する既存の文献にヒントを得て、適切な信頼を育むための介入として、AIシステムをデバッグすることを提案する。本稿では,デバッギング設定におけるAI性能の批判的評価が,ユーザのAIシステム評価を校正し,より適切な信頼性をもたらすかどうかを検討する。定量的実証研究(N = 234)により,提案するデバッグ介入は,適切な依存を促す上では期待通りに機能しないことがわかった。その代わりに、介入後のAIシステムへの依存の減少を観察します。我々は、不適切な信頼パターンの発生を説明するために、異なるパフォーマンスレベルを持つグループ間でのユーザ信頼度とAI信頼度の推定のダイナミクスについて検討する。本研究は、適切な信頼とより良い人間とAIのコラボレーションを促進するために効果的な介入を設計する上で重要な意味を持つ。 Powerful predictive AI systems have demonstrated great potential in augmenting human decision making. Recent empirical work has argued that the vision for optimal human-AI collaboration requires 'appropriate reliance' of humans on AI systems. However, accurately estimating the trustworthiness of AI advice at the instance level is quite challenging, especially in the absence of performance feedback pertaining to the AI system. In practice, the performance disparity of machine learning models on out-of-distribution data makes the dataset-specific performance feedback unreliable in human-AI collaboration. Inspired by existing literature on critical thinking and a critical mindset, we propose the use of debugging an AI system as an intervention to foster appropriate reliance. In this paper, we explore whether a critical evaluation of AI performance within a debugging setting can better calibrate users' assessment of an AI system and lead to more appropriate reliance. 
Through a quantitative empirical study (N = 234), we found that our proposed debugging intervention does not work as expected in facilitating appropriate reliance. Instead, we observe a decrease in reliance on the AI system after the intervention -- potentially resulting from an early exposure to the AI system's weakness. We explore the dynamics of user confidence and user estimation of AI trustworthiness across groups with different performance levels to help explain how inappropriate reliance patterns occur. Our findings have important implications for designing effective interventions to facilitate appropriate reliance and better human-AI collaboration.	翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
# GroupDiff: 拡散に基づくグループポートレート編集 GroupDiff: Diffusion-based Group Portrait Editing ( http://arxiv.org/abs/2409.14379v1 ) ライセンス: Link先を確認	Yuming Jiang, Nanxuan Zhao, Qing Liu, Krishna Kumar Singh, Shuai Yang, Chen Change Loy, Ziwei Liu,	(参考訳) グループ肖像画編集は、ユーザーが常に人を追加したり、削除したり、既存の人を操ったりすることを望んでいるため、非常に望ましい。また、人間同士の相互作用の複雑なダイナミクスや多様なジェスチャーによっても困難である。本稿では,グループ写真編集の先駆的取り組みであるGroupDiffを紹介する。 1) データエンジン:グループ写真編集のためのラベル付きデータがないため、トレーニング用のペアデータを生成するデータエンジンを作成します。トレーニングデータエンジンは、グループ肖像画編集の多様なニーズをカバーしている。 2) 外観保存: 編集後の外観の整合性を維持するため, グループ写真からの人物像を注目モジュールに注入し, 骨格を用いて人体内指導を行う。 3)制御フレキシビリティ:各人物の位置を示す境界ボックスを用いて注意行列を重み付けし、各人物の特徴を正しい場所に注入する。この対人的指導は、操作の柔軟な方法を提供する。大規模な実験では、GroupDiffは既存の方法と比較して最先端のパフォーマンスを示している。 GroupDiffは、オリジナルの写真の忠実さを編集し、維持するためのコントロール機能を提供する。 Group portrait editing is highly desirable since users constantly want to add a person, delete a person, or manipulate existing persons. It is also challenging due to the intricate dynamics of human interactions and the diverse gestures. In this work, we present GroupDiff, a pioneering effort to tackle group photo editing with three dedicated contributions: 1) Data Engine: Since there is no labeled data for group photo editing, we create a data engine to generate paired data for training. The training data engine covers the diverse needs of group portrait editing. 2) Appearance Preservation: To keep the appearance consistent after editing, we inject the images of persons from the group photo into the attention modules and employ skeletons to provide intra-person guidance. 3) Control Flexibility: Bounding boxes indicating the locations of each person are used to reweight the attention matrix so that the features of each person can be injected into the correct places. This inter-person guidance provides flexible manners for manipulation. Extensive experiments demonstrate that GroupDiff exhibits state-of-the-art performance compared to existing methods. 
GroupDiff offers controllability for editing and maintains the fidelity of the original photos.	翻訳日:2024-11-06 22:52:53 公開日:2024-09-22
# 大規模言語モデルにおける層の重要性の調査 Investigating Layer Importance in Large Language Models ( http://arxiv.org/abs/2409.14381v1 ) ライセンス: Link先を確認	Yang Zhang, Yanfei Dong, Kenji Kawaguchi,	(参考訳) 大規模言語モデル (LLM) は、テキストを理解し処理する優れた能力により、注目を集めている。しかし、LLMはいまだに大部分が不透明である。 LLMの理解の欠如は、安全クリティカルなシナリオへの展開を妨げ、より良いモデルの開発を妨げる。本研究では,LLMにおける個々の層の重要性を調査し,LLMの理解を深める。本稿では,特徴属性とデータ評価に広く用いられている説明フレームワークであるShapley値を用いて,レイヤの重要性を忠実に評価する効率的なサンプリング手法を提案する。さらに,特定の層を除外して生じる性能劣化を評価するために,層アブレーション実験を実施している。以上の結果から,コーナーストーン層の存在が明らかとなり,一部の初期層が他の層に対して支配的な寄与を示しうることが示唆された。 1つのコーナーストーン層を除去すると、モデルの性能が劇的に低下し、しばしばランダムな推測と同程度になる。逆に、非コーナーストーン層を除去しても、性能の変化はわずかにとどまる。本研究は, LLMのコーナーストーン層を同定し, 今後の研究におけるその重要な役割を浮き彫りにする。 Large language models (LLMs) have gained increasing attention due to their prominent ability to understand and process texts. Nevertheless, LLMs largely remain opaque. The lack of understanding of LLMs has obstructed the deployment in safety-critical scenarios and hindered the development of better models. In this study, we advance the understanding of LLMs by investigating the significance of individual layers in LLMs. We propose an efficient sampling method to faithfully evaluate the importance of layers using Shapley values, a widely used explanation framework in feature attribution and data valuation. In addition, we conduct layer ablation experiments to assess the performance degradation resulting from the exclusion of specific layers. Our findings reveal the existence of cornerstone layers, wherein certain early layers can exhibit a dominant contribution over others. Removing one cornerstone layer leads to a drastic collapse of the model performance, often reducing it to random guessing. Conversely, removing non-cornerstone layers results in only marginal performance changes. This study identifies cornerstone layers in LLMs and underscores their critical role for future research.	翻訳日:2024-11-06 22:52:53 公開日:2024-09-22
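The abstract above estimates layer importance with Shapley values and a sampling method. The sketch below uses plain Monte Carlo permutation sampling, a standard Shapley estimator that is not necessarily the paper's sampling scheme, on a toy value function in which layer 0 plays the role of a cornerstone layer. The function names, the accuracy numbers, and the four-layer setup are all assumptions for illustration.

```python
import random

def toy_accuracy(layers, cornerstone=0):
    """Hypothetical accuracy of a model that keeps only the given set of layers."""
    if cornerstone not in layers:
        return 0.25  # chance level on an assumed 4-way task
    # With the cornerstone present, every other kept layer adds a small gain.
    return 0.25 + 0.45 + 0.10 * (len(layers) - 1)

def shapley_by_permutation_sampling(num_layers, value_fn, num_samples=200, seed=0):
    """Estimate each layer's Shapley value by averaging marginal gains over random orders."""
    rng = random.Random(seed)
    totals = [0.0] * num_layers
    for _ in range(num_samples):
        order = list(range(num_layers))
        rng.shuffle(order)
        kept = set()
        previous = value_fn(kept)
        for layer in order:
            kept.add(layer)
            current = value_fn(kept)
            totals[layer] += current - previous  # marginal contribution of this layer
            previous = current
    return [t / num_samples for t in totals]

values = shapley_by_permutation_sampling(4, toy_accuracy)
print(values)
```

In this toy setting the estimated value of layer 0 dominates the others, and the estimates sum to the gap between the full model and the empty model, which is the efficiency property of Shapley values.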
# 顔超解像のための事前知識蒸留ネットワーク Prior Knowledge Distillation Network for Face Super-Resolution ( http://arxiv.org/abs/2409.14385v1 ) ライセンス: Link先を確認	Qiu Yang, Xiao Sun, Xin-yu Li, Feng-Qi Cui, Yu-Tong Guo, Shuang-Zhen Hu, Ping Luo, Si-Ying Li,	(参考訳) 顔超解像(FSR)の目的は、低分解能(LR)入力から高分解能(HR)顔画像を再構成することである。ディープラーニング技術の継続的な進歩により、現代の先進的なFSR法は、当初は顔の事前を推定し、この情報を用いて超解像再構成プロセスを支援する。しかし、事前推定の正確性を保証することは依然として困難であり、単純なカスケードと畳み込み操作は、しばしば事前の知識を十分に活用できない。不正確な、あるいは不十分に利用された事前情報は、必然的にFSR性能を低下させる。この問題に対処するため,教師ネットワークから学生ネットワークに事前情報を転送するFSRのための事前知識蒸留ネットワーク(PKDN)を提案する。このアプローチにより、ネットワークは、テスト段階では低解像度の顔画像のみを頼りながら、トレーニング段階での事前学習を可能にし、事前推定の不正確さによる悪影響を軽減することができる。さらに,事前情報を効果的に活用する解析マップ融合ブロックの設計に,ロバストな注意機構を取り入れた。特徴の喪失を防止するため,特徴抽出段階ではマルチスケールの特徴を保ち,その後の超解像再構成プロセスで採用する。ベンチマークデータセットによる実験結果から,我々のPKDNアプローチは,高品質な顔画像を生成する上で,既存のFSR手法を超越していることが示された。 The purpose of face super-resolution (FSR) is to reconstruct high-resolution (HR) face images from low-resolution (LR) inputs. With the continuous advancement of deep learning technologies, contemporary prior-guided FSR methods initially estimate facial priors and then use this information to assist in the super-resolution reconstruction process. However, ensuring the accuracy of prior estimation remains challenging, and straightforward cascading and convolutional operations often fail to fully leverage prior knowledge. Inaccurate or insufficiently utilized prior information inevitably degrades FSR performance. To address this issue, we propose a prior knowledge distillation network (PKDN) for FSR, which involves transferring prior information from the teacher network to the student network. This approach enables the network to learn priors during the training stage while relying solely on low-resolution facial images during the testing stage, thus mitigating the adverse effects of prior estimation inaccuracies. Additionally, we incorporate robust attention mechanisms to design a parsing map fusion block that effectively utilizes prior information. 
To prevent feature loss, we retain multi-scale features during the feature extraction stage and employ them in the subsequent super-resolution reconstruction process. Experimental results on benchmark datasets demonstrate that our PKDN approach surpasses existing FSR methods in generating high-quality face images.	翻訳日:2024-11-06 22:52:53 公開日:2024-09-22
# 非エルミートハミルトニアンによるTE波とTM波の散乱と量子力学 Scattering of TE and TM waves and quantum dynamics generated by non-Hermitian Hamiltonians ( http://arxiv.org/abs/2409.14386v1 ) ライセンス: Link先を確認	Farhang Loran, Ali Mostafazadeh,	(参考訳) 平面対称性を持つ線形等方性媒質による電磁波の散乱の研究は、TEおよびTMモードの散乱に還元することができる。媒体が平行な均一なスラブで構成されている状況では、標準転送行列技術を用いてこれらのモードの散乱問題に対処することができる。本手法は, TEおよびTM波の散乱の動的定式化を提案し, 有効非単項量子系の進化演算子を用いて媒体の遷移行列を与えることにより, 不均一誘電率および透過性プロファイルに拡張する。これにより、反射振幅と透過振幅の力学方程式の系が導かれる。これらの方程式を分離することにより、TEモードとTMモードの散乱問題の解を、リカティ方程式の初期値問題に還元する。 TE波やTM波を反射しない媒体を所定の波数と入射角で同定する上で,この観測の適用について論じる。 The study of the scattering of electromagnetic waves by a linear isotropic medium with planar symmetry can be reduced to that of their TE and TM modes. For situations where the medium consists of parallel homogeneous slabs, one may use the standard transfer matrix technique to address the scattering problem for these modes. We extend the utility of this technique to inhomogeneous permittivity and permeability profiles by proposing a dynamical formulation of the scattering of TE and TM waves in which the transfer matrix for the medium is given in terms of the evolution operator for an effective non-unitary quantum system. This leads to a system of dynamical equations for the reflection and transmission amplitudes. Decoupling these equations we reduce the solution of the scattering problem for TE and TM modes to that of an initial-value problem for a Riccati equation. We discuss the application of this observation in identifying media that do not reflect TE or TM waves with given wavenumber and incidence angle.	翻訳日:2024-11-06 22:52:53 公開日:2024-09-22
# 新しい視点の定義:エンタープライズ情報ガバナンス Defining a new perspective: Enterprise Information Governance ( http://arxiv.org/abs/2409.14388v1 ) ライセンス: Link先を確認	Alastair McCullough,	(参考訳) 本稿では,組織における情報及びデータ資産に関する意思決定権の管理において,説明責任を保証するための制御機構を通じて機能する戦略的な枠組みとして,規制エンタープライズ情報ガバナンスの新たな定義を提唱する。この新たな実践的定義は、実践者と学者の両方の視点を捉えている。この定義は、新しい、より明確に規制されたアプローチを取り入れ、そのようなガバナンスのための新しい定義を合成するために、以前の定義に基づいて構築されている。この論文は学術的な考察とさらなる研究を支援する。情報とデータの定義、情報とデータに関する戦略、データ管理に関する戦略、エンタープライズアーキテクチャの戦略、ガバナンスの戦略的な取り組みとしてのガバナンス、そしてそのようなガバナンスの基礎となる戦略的および戦術的政策と標準の性質について考察する。 This paper adduces a novel definition of regulatory enterprise information governance as a strategic framework that acts through control mechanisms designed to assure accountability in managing decision rights over information and data assets in organizations. This new pragmatic definition takes the perspectives of both the practitioner and of the scholar. It builds upon earlier definitions to take a novel and more clearly regulatory approach and to synthesize a new definition for such governance; to build out a view of it as a scalable regulatory framework for large or complex organizations that sees governance from this new perspective as a business architecture or target operating model in this increasingly critical domain. The paper supports and enables scholarly consideration and further research. It looks at definitions of information and data; of strategy in relation to information and data; of data management; of enterprise architecture; of governance, and governance as a type of strategic endeavor, and of the nature of strategic and tactical policies and standards that form the basis for such governance.	翻訳日:2024-11-06 22:52:53 公開日:2024-09-22
# MaskedMimic: Masked Motion Inpaintingによる統一された物理ベースの文字制御 MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting ( http://arxiv.org/abs/2409.14393v1 ) ライセンス: Link先を確認	Chen Tessler, Yunrong Guo, Ofir Nabati, Gal Chechik, Xue Bin Peng,	(参考訳) さまざまなシナリオにまたがって、対話的なキャラクターに人生を吹き込むことのできる、多用途な物理ベースのコントローラーを作れば、キャラクターアニメーションのエキサイティングなフロンティアになる。理想的なコントローラは、スパースターゲットキーフレーム、テキスト命令、シーン情報など、多様な制御モダリティをサポートする必要がある。以前の研究では、物理的にシミュレートされたシーン認識制御モデルが提案されていたが、これらのシステムは、それぞれのタスクの狭いセットと制御モダリティを専門とするコントローラの開発に主に焦点を絞っている。この研究はMaskedMimicという、物理に基づく文字制御を一般的な運動インペイント問題として定式化する新しいアプローチを提示している。私たちの重要な洞察は、マスクされたキーフレーム、オブジェクト、テキスト記述、あるいはそれらの組み合わせのような部分的な(マスキングされた)モーション記述から、モーションを合成するための単一の統一モデルをトレーニングすることです。これは、動作追跡データを活用するとともに、多様な動作記述を効果的に活用してコヒーレントなアニメーションを生成するスケーラブルなトレーニング手法を設計することで実現される。このプロセスを通じて,興味のあるすべての動作に対して面倒な報酬工学を必要とせず,直感的な制御インタフェースを提供する物理ベースのコントローラを学習する。コントローラは幅広い制御モードをサポートし、異なるタスク間のシームレスな遷移を可能にする。 MaskedMimicは、モーションインペイントによる文字制御を統一することにより、多目的な仮想文字を生成する。これらのキャラクターは複雑なシーンに動的に適応し、必要に応じて多様な動きを構成でき、よりインタラクティブで没入的な体験を可能にする。 Crafting a single, versatile physics-based controller that can breathe life into interactive characters across a wide spectrum of scenarios represents an exciting frontier in character animation. An ideal controller should support diverse control modalities, such as sparse target keyframes, text instructions, and scene information. While previous works have proposed physically simulated, scene-aware control models, these systems have predominantly focused on developing controllers that each specializes in a narrow set of tasks and control modalities. This work presents MaskedMimic, a novel approach that formulates physics-based character control as a general motion inpainting problem. Our key insight is to train a single unified model to synthesize motions from partial (masked) motion descriptions, such as masked keyframes, objects, text descriptions, or any combination thereof. 
This is achieved by leveraging motion tracking data and designing a scalable training method that can effectively utilize diverse motion descriptions to produce coherent animations. Through this process, our approach learns a physics-based controller that provides an intuitive control interface without requiring tedious reward engineering for all behaviors of interest. The resulting controller supports a wide range of control modalities and enables seamless transitions between disparate tasks. By unifying character control through motion inpainting, MaskedMimic creates versatile virtual characters. These characters can dynamically adapt to complex scenes and compose diverse motions on demand, enabling more interactive and immersive experiences.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
# スパースビュートモグラフィ再構成のための周波数規則化ニューラル表現法 Frequency-regularized Neural Representation Method for Sparse-view Tomographic Reconstruction ( http://arxiv.org/abs/2409.14394v1 ) ライセンス: Link先を確認	Jingmou Xian, Jian Zhu, Haolin Liao, Si Li,	(参考訳) スパース・ビュー・トモグラフィーは放射線線量削減と臨床応用性向上のための重要な方向である。多くの研究がスパース2次元投影からのトモグラフィ画像の再構成を提案しているが、既存のモデルはスパース入力画像内の低周波成分を見落としながら、過度に高周波情報に集中する傾向にある。高周波情報に対するこのバイアスは、しばしば過度に適合し、特に再建されたスライスにおけるエッジとバウンダリで強められる。本稿では,周波数正規化ニューラル減衰/活動場(Freq-NAF)を自己教師付きスパース・ビュー・トモグラフィーの再構成に適用する。 Freq-NAFは、ニューラルネットワーク入力の可視周波数帯域を直接制御し、周波数正規化を導入することで過度な適合を緩和する。このアプローチは、高周波と低周波の情報を効果的にバランスさせる。 CBCTおよびSPECTデータセットの数値実験を行い,その精度を実証した。 Sparse-view tomographic reconstruction is a pivotal direction for reducing radiation dose and augmenting clinical applicability. While many research works have proposed the reconstruction of tomographic images from sparse 2D projections, existing models tend to excessively focus on high-frequency information while overlooking low-frequency components within the sparse input images. This bias towards high-frequency information often leads to overfitting, particularly intense at edges and boundaries in the reconstructed slices. In this paper, we introduce the Frequency Regularized Neural Attenuation/Activity Field (Freq-NAF) for self-supervised sparse-view tomographic reconstruction. Freq-NAF mitigates overfitting by incorporating frequency regularization, directly controlling the visible frequency bands in the neural network input. This approach effectively balances high-frequency and low-frequency information. We conducted numerical experiments on CBCT and SPECT datasets, and our method demonstrates state-of-the-art accuracy.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
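One common way to "control the visible frequency bands in the neural network input" is to multiply a sinusoidal positional encoding by a mask that opens up higher bands as training progresses. The sketch below assumes a linear schedule; the function names, the schedule, and the band layout are illustrative assumptions and are not taken from the Freq-NAF paper.

```python
import math


def frequency_mask(num_bands, step, total_steps):
    """Per-band weights in [0, 1]; low bands become visible first.
    Assumption: visibility grows linearly with training progress."""
    progress = min(1.0, max(0.0, step / total_steps))
    visible = num_bands * progress
    return [min(1.0, max(0.0, visible - band)) for band in range(num_bands)]


def masked_positional_encoding(x, num_bands, step, total_steps):
    """Encode a scalar coordinate x as [x, sin/cos pairs per band],
    with each band scaled by its current mask weight."""
    mask = frequency_mask(num_bands, step, total_steps)
    features = [x]
    for band, weight in enumerate(mask):
        freq = (2.0 ** band) * math.pi
        features.append(weight * math.sin(freq * x))
        features.append(weight * math.cos(freq * x))
    return features


if __name__ == "__main__":
    for step in (0, 25, 50, 100):
        print(step, frequency_mask(4, step, 100))
```

Early in training the network input contains only the raw coordinate and the lowest bands, which limits how quickly it can fit high-frequency detail from sparse views.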
# 大規模言語モデルを用いたターゲット非依存情報からのユーザスタンス予測 Predicting User Stances from Target-Agnostic Information using Large Language Models ( http://arxiv.org/abs/2409.14395v1 ) ライセンス: Link先を確認	Siyuan Brandon Loh, Liang Ze Wong, Prasanta Bhattacharya, Joseph Simons, Wei Gao, Hong Zhang,	(参考訳) 本研究では,ターゲットを意識しないソーシャルメディア投稿(ユーザレベルのスタンス予測)の収集から,ターゲットに対するユーザのスタンスを予測できるLarge Language Models(LLMs)能力について検討する。 LLMがこのタスクをこなせることを示す初期の証拠を示す一方で、モデル全体の性能にかなりのばらつきが浮かび上がっている。 (i)スタンスターゲットの種類 (二)予測戦略及び予測戦略 (三)対象不明の官職の個数ポストホック分析は、表面レベル(例えば、ターゲット関連キーワード)とユーザレベル機能(例えば、ユーザーの道徳的価値をエンコードする)の両方の存在を通して、LLMに関連情報を提供するターゲット非依存ポストの有用性をさらに示唆している。以上の結果から,LLMは歴史的・目標非依存のデータに基づいて,新たなトピックに対する公衆のスタンスを決定するための有効な方法である可能性が示唆された。同時に、姿勢予測タスクにおけるLCMの強みと、その効果がタスクコンテキストによってどのように変化するかをよりよく理解するために、さらなる研究も求めている。 We investigate Large Language Models' (LLMs) ability to predict a user's stance on a target given a collection of his/her target-agnostic social media posts (i.e., user-level stance prediction). While we show early evidence that LLMs are capable of this task, we highlight considerable variability in the performance of the model across (i) the type of stance target, (ii) the prediction strategy and (iii) the number of target-agnostic posts supplied. Post-hoc analyses further hint at the usefulness of target-agnostic posts in providing relevant information to LLMs through the presence of both surface-level (e.g., target-relevant keywords) and user-level features (e.g., encoding users' moral values). Overall, our findings suggest that LLMs might offer a viable method for determining public stances towards new topics based on historical and target-agnostic data. At the same time, we also call for further research to better understand LLMs' strong performance on the stance prediction task and how their effectiveness varies across task contexts.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
# Flat-LoRA: 平坦な損失ランドスケープ上の低ランク適応 Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape ( http://arxiv.org/abs/2409.14396v1 ) ライセンス: Link先を確認	Tao Li, Zhengbao He, Yujun Li, Yasheng Wang, Lifeng Shang, Xiaolin Huang,	(参考訳) 大規模事前学習モデルの微調整は、計算とメモリコストの点で極めて高価である。 Low-Rank Adaptation (LoRA) は広く使われているパラメータ効率の良いファインチューニング(PEFT)法であり、低ランク行列のみを最適化することで、モデルを微調整する効率的な方法を提供する。 LoRAの性能改善の最近の進歩にもかかわらず、LoRA最適化空間と元の完全なパラメータ空間との関係はしばしば見過ごされる。LoRA空間では平坦に見える解でも、全パラメータ空間では鋭い方向が存在し、一般化性能を損なう可能性がある。本稿では、フルパラメータ空間の平坦な領域に位置する低ランク適応を求める効率的なアプローチであるFlat-LoRAを提案する。計算量やメモリの負担が大きくなりうる既存のシャープネス認識最小化アプローチに頼る代わりに、ベイズ期待損失目標によるランダムな重み摂動を利用してトレーニング効率を維持し、性能向上のために改良された摂動生成戦略を設計する。自然言語処理と様々なアーキテクチャを用いた画像分類タスクの実験により,提案手法の有効性が示された。 Fine-tuning large-scale pre-trained models is prohibitively expensive in terms of computational and memory costs. Low-Rank Adaptation (LoRA), a popular Parameter-Efficient Fine-Tuning (PEFT) method, provides an efficient way to fine-tune models by optimizing only a low-rank matrix. Despite recent progress made in improving LoRA's performance, the connection between the LoRA optimization space and the original full parameter space is often overlooked. A solution that appears flat in the LoRA space may still have sharp directions in the full parameter space, potentially harming generalization performance. In this paper, we propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space. Instead of relying on the well-established sharpness-aware minimization approach, which can incur significant computational and memory burdens, we utilize random weight perturbation with a Bayesian expectation loss objective to maintain training efficiency and design a refined perturbation generation strategy for improved performance. Experiments on natural language processing and image classification tasks with various architectures demonstrate the effectiveness of our approach.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
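The core idea of evaluating flatness in the full parameter space can be sketched as follows: merge the frozen weight with the low-rank update, add random noise to the merged matrix, and average the loss over several noise draws. This is a minimal sketch under stated assumptions: it uses plain isotropic Gaussian noise rather than the paper's refined perturbation strategy, and `merged_weight` and `expected_perturbed_loss` are hypothetical helper names.

```python
import random


def merged_weight(W, B, A):
    """Return W + B @ A for list-of-lists matrices (W: m x n, B: m x r, A: r x n)."""
    rows, cols, rank = len(W), len(W[0]), len(A)
    return [
        [W[i][j] + sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(cols)]
        for i in range(rows)
    ]


def expected_perturbed_loss(loss_fn, W, B, A, sigma=0.05, num_samples=64, seed=0):
    """Monte Carlo estimate of E[loss(W + BA + noise)].
    Assumption: i.i.d. Gaussian noise with standard deviation `sigma`
    applied to every entry of the merged (full-space) weight."""
    rng = random.Random(seed)
    merged = merged_weight(W, B, A)
    total = 0.0
    for _ in range(num_samples):
        noisy = [[w + rng.gauss(0.0, sigma) for w in row] for row in merged]
        total += loss_fn(noisy)
    return total / num_samples


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pre-trained weight
    B = [[0.1], [0.2]]             # trainable low-rank factors (rank 1)
    A = [[0.3, 0.4]]
    squared_norm = lambda m: sum(w * w for row in m for w in row)
    print(expected_perturbed_loss(squared_norm, W, B, A))
```

Because the noise is applied to the merged matrix rather than to `B` and `A` separately, the averaged loss reflects sharpness in directions that the low-rank factors alone cannot represent.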
# ハードサンプルがデータ不均衡の正確性に及ぼす影響の検討 Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance ( http://arxiv.org/abs/2409.14401v1 ) ライセンス: Link先を確認	Pawel Pukowski, Haiping Lu,	(参考訳) AutoMLドメインでは、テスト精度がモデルの有効性を評価するための重要な指標として認識され、ニューラルアーキテクチャサーチからハイパーパラメータ最適化まで幅広いアプリケーションを支える。しかし、実験精度の信頼性は、特にラベルノイズがいかに最先端モデルの真のランキングを曖昧にするかを明らかにする研究によって疑問視されている。データセット内のハードサンプルの存在が、テスト精度だけで推測される一般化能力にさらに疑念を抱くという、別の視点に沿って、私たちはさらに先進的です。本研究は, トレーニングセットとテストセット間のハードサンプルの分布が, それらの集合の難易度に影響を及ぼし, モデルの一般化能力に影響を及ぼすことを明らかにした。そこで本研究では,バランスモデル評価の複雑さを浮き彫りにして,より容易かつハードな2つの一般化経路を明らかにした。最後に,この領域におけるよりニュアンスなアプローチの進展を促進するため,ハードサンプル識別法の比較のためのベンチマーク手法を提案する。我々の第一の目的は、決定的な解決策を提案することではなく、クラス内のデータ不均衡問題を導入することで、バランスの取れたデータセットを扱う場合でも、評価基準としてテスト精度に主に依存する制限を強調することです。そこで我々は,研究コミュニティにおける批判的な議論を刺激し,モデル評価基準の範囲を広く検討する研究のための新たな道を開くことを目的としている。匿名のコードは https://github.com/PawPuk/CurvBIM でGPL-3.0 ライセンスの下で利用可能である。 In the AutoML domain, test accuracy is heralded as the quintessential metric for evaluating model efficacy, underpinning a wide array of applications from neural architecture search to hyperparameter optimization. However, the reliability of test accuracy as the primary performance metric has been called into question, notably through research highlighting how label noise can obscure the true ranking of state-of-the-art models. We venture beyond, along another perspective where the existence of hard samples within datasets casts further doubt on the generalization capabilities inferred from test accuracy alone. Our investigation reveals that the distribution of hard samples between training and test sets affects the difficulty levels of those sets, thereby influencing the perceived generalization capability of models. We unveil two distinct generalization pathways, toward easy and toward hard samples, highlighting the complexity of achieving balanced model evaluation. 
Finally, we propose a benchmarking procedure for comparing hard sample identification methods, facilitating the advancement of more nuanced approaches in this area. Our primary goal is not to propose a definitive solution but to highlight the limitations of relying primarily on test accuracy as an evaluation metric, even when working with balanced datasets, by introducing the in-class data imbalance problem. By doing so, we aim to stimulate a critical discussion within the research community and open new avenues for research that consider a broader spectrum of model evaluation criteria. The anonymous code is available at https://github.com/PawPuk/CurvBIM under the GPL-3.0 license.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
# GraspMamba: 階層的特徴学習を備えた言語駆動型Grasp検出フレームワーク GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning ( http://arxiv.org/abs/2409.14403v1 ) ライセンス: Link先を確認	Huy Hoang Nguyen, An Vuong, Anh Nguyen, Ian Reid, Minh Nhat Vu,	(参考訳) グラフ検出は、多くの産業アプリケーションの成功に欠かせない基本的なロボット作業である。しかしながら、このタスクの現在の言語駆動モデルは、乱雑なイメージ、長いテキスト記述、遅い推論速度に悩まされることが多い。この課題に対処するために,Mambaビジョンと階層的特徴融合を用いた言語駆動型グリップ検出手法であるGraspMambaを紹介した。本手法は,マンバをベースとしたバックボーンのリッチな視覚的特徴とテキスト情報を活用することにより,マルチモーダルな特徴の融合を効果的に促進する。 GraspMambaは、複数のスケールで視覚と言語の特徴を抽出し、堅牢なパフォーマンスと高速な推論時間を提供する、最初のMambaベースのグリップ検出モデルである。集中的な実験により、GraspMambaは最近の手法よりも明確なマージンで優れていることが示された。実際のロボット実験を通じて、我々のアプローチを検証し、その高速な推論速度を強調します。 Grasp detection is a fundamental robotic task critical to the success of many industrial applications. However, current language-driven models for this task often struggle with cluttered images, lengthy textual descriptions, or slow inference speed. We introduce GraspMamba, a new language-driven grasp detection method that employs hierarchical feature fusion with Mamba vision to tackle these challenges. By leveraging rich visual features of the Mamba-based backbone alongside textual information, our approach effectively enhances the fusion of multimodal features. GraspMamba represents the first Mamba-based grasp detection model to extract vision and language features at multiple scales, delivering robust performance and rapid inference time. Intensive experiments show that GraspMamba outperforms recent methods by a clear margin. We validate our approach through real-world robotic experiments, highlighting its fast inference speed.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
# COSBO:保守的なオフラインシミュレーションに基づく政策最適化 COSBO: Conservative Offline Simulation-Based Policy Optimization ( http://arxiv.org/abs/2409.14412v1 ) ライセンス: Link先を確認	Eshagh Kargar, Ville Kyrki,	(参考訳) オフライン強化学習は、ライブデプロイメントのデータに関する強化学習モデルのトレーニングを可能にする。しかし、トレーニングデータに存在する行動の最良の組み合わせを選択することは限られている。対照的に、ライブ環境を再現しようとするシミュレーション環境は、ライブデータの代わりに利用できるが、この手法はシミュレーションと現実のギャップによって制限され、バイアスをもたらす。両世界を最大限に活用するために,不完全なシミュレーション環境と対象環境のデータを組み合わせてオフラインの強化学習ポリシーを訓練する手法を提案する。実験により,提案手法はCQL,MOPO,COMBO,特に多種多様かつ挑戦的な動的シナリオにおいて,最先端の手法よりも優れており,様々な実験条件において頑健な動作を示す。その結果,シミュレータ生成データを用いることで,実世界との直接のインタラクションが不可能な場合,シミュレートと現実のギャップにもかかわらず,オフラインポリシ学習を効果的に向上させることができることがわかった。 Offline reinforcement learning allows training reinforcement learning models on data from live deployments. However, it is limited to choosing the best combination of behaviors present in the training data. In contrast, simulation environments attempting to replicate the live environment can be used instead of the live data, yet this approach is limited by the simulation-to-reality gap, resulting in a bias. In an attempt to get the best of both worlds, we propose a method that combines an imperfect simulation environment with data from the target environment, to train an offline reinforcement learning policy. Our experiments demonstrate that the proposed method outperforms state-of-the-art approaches CQL, MOPO, and COMBO, especially in scenarios with diverse and challenging dynamics, and demonstrates robust behavior across a variety of experimental conditions. The results highlight that using simulator-generated data can effectively enhance offline policy learning despite the sim-to-real gap, when direct interaction with the real-world is not possible.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
# EDK2ファームウェア欠陥の発見:コード監査ツールからの洞察 Uncovering EDK2 Firmware Flaws: Insights from Code Audit Tools ( http://arxiv.org/abs/2409.14416v1 ) ライセンス: Link先を確認	Mahsa Farahani, Ghazal Shenavar, Ali Hosseinghorban, Alireza Ejlali,	(参考訳) ファームウェアは現代のコンピュータの基盤となるソフトウェア層として機能し、最小限のオペレーティングシステムと同様に、プラットフォームハードウェア上で最初に実行されるコードとして開始される。オペレーティングシステムとプラットフォームファームウェアの間のソフトウェアインターフェースとして定義された統一拡張ファームウェアインタフェース(UEFI)は、システムの初期化と管理を標準化する。 EFI Development Kit II (EDK2) は、ファームウェアアーキテクチャを形成する上で重要な役割を担っている。広く採用されているにもかかわらず、アーキテクチャは、初期段階のシステムリソースの制限や、標準的なセキュリティ機能の欠如といった課題に直面している。さらに、ファームウェア分析用に特別に設計されたオープンソースツールの不足は、適応的で革新的なソリューションの必要性を強調している。本稿では,汎用コード監査ツールのファームウェアへの適用について検討し,EDK2に着目した。これらのツールはもともとファームウェア分析のために設計されたものではないが、ファームウェアのセキュリティを強化する重要な領域を特定するのに有効であることが証明されている。 EDK2にキー監査ツールを配置した結果,これらのツールを方法論に基づいて分類し,ユニークなファームウェア属性を明らかにする能力を示すとともに,ファームウェアセキュリティの理解と改善に大きく貢献した。 Firmware serves as a foundational software layer in modern computers, initiating as the first code executed on platform hardware, similar in function to a minimal operating system. Defined as a software interface between an operating system and platform firmware, the Unified Extensible Firmware Interface (UEFI) standardizes system initialization and management. A prominent open-source implementation of UEFI, the EFI Development Kit II (EDK2), plays a crucial role in shaping firmware architecture. Despite its widespread adoption, the architecture faces challenges such as limited system resources at early stages and a lack of standard security features. Furthermore, the scarcity of open-source tools specifically designed for firmware analysis emphasizes the need for adaptable, innovative solutions. In this paper, we explore the application of general code audit tools to firmware, with a particular focus on EDK2. Although these tools were not originally designed for firmware analysis, they have proven effective in identifying critical areas for enhancement in firmware security. 
Our findings, derived from deploying key audit tools on EDK2, categorize these tools based on their methodologies and illustrate their capability to uncover unique firmware attributes, significantly contributing to the understanding and improvement of firmware security.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
# ドミナント:人間による人間のイメージアニメーションを擁護する Dormant: Defending against Pose-driven Human Image Animation ( http://arxiv.org/abs/2409.14424v1 ) ライセンス: Link先を確認	Jiachen Zhou, Mingsi Wang, Tianlin Li, Guozhu Meng, Kai Chen,	(参考訳) ポーズ駆動の人間の画像アニメーションは、非常に進歩し、1枚の写真から鮮明でリアルな人間のビデオを生成することができる。しかし、これは逆に画像誤用のリスクを悪化させ、攻撃者は1つの画像を使って政治、暴力、その他の違法コンテンツを含むビデオを作成することができる。この脅威に対処するため,ポーズ駆動型ヒューマンイメージアニメーション技術に対する防御に適した新しい保護手法であるDormantを提案する。 Dormantは、人間の1つのイメージに保護的摂動を適用し、オリジナルと視覚的類似性を保ちながら、品質の悪いビデオ生成をもたらす。保護摂動は、画像から外観特徴の誤抽出を誘発し、生成された映像フレーム間に不整合を生じさせるように最適化される。 8つのアニメーション手法と4つのデータセットにまたがる広範囲な評価は、6つのベースライン保護手法よりもDormantの方が優れていることを示す。さらに、Dormantは、完全なブラックボックスアクセスであっても、6つの現実世界の商用サービスで有効性を示す。 Pose-driven human image animation has achieved tremendous progress, enabling the generation of vivid and realistic human videos from just one single photo. However, it conversely exacerbates the risk of image misuse, as attackers may use one available image to create videos involving politics, violence and other illegal content. To counter this threat, we propose Dormant, a novel protection approach tailored to defend against pose-driven human image animation techniques. Dormant applies protective perturbation to one human image, preserving the visual similarity to the original but resulting in poor-quality video generation. The protective perturbation is optimized to induce misextraction of appearance features from the image and create incoherence among the generated video frames. Our extensive evaluation across 8 animation methods and 4 datasets demonstrates the superiority of Dormant over 6 baseline protection methods, leading to misaligned identities, visual distortions, noticeable artifacts, and inconsistent frames in the generated videos. Moreover, Dormant shows effectiveness on 6 real-world commercial services, even with fully black-box access.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
# Kerr修飾キャビティマグノメカニクスにおける双安定性とリミットサイクルの量子シグネチャ Quantum signatures of bistability and limit cycle in Kerr-modified cavity magnomechanics ( http://arxiv.org/abs/2409.14427v1 ) ライセンス: Link先を確認	Pooja Kumari Gupta, Subhadeep Chakraborty, Sampreet Kalita, Amarendra K. Sarma,	(参考訳) 双安定領域に着目してKerr修飾キャビティマグノメカニクス系を調べる。双安定性が現れる明確なパラメータ条件を特定し、そこでは2つの安定な枝とその中間の1つの不安定な枝が存在する。興味深いことに、上枝が十分に強い駆動下で安定性を失い、リミットサイクル振動が生じるという特有の遷移を明らかにした。その結果、双安定解と周期解の両方からなる豊富な相図を報告し、それらの周りの量子相関について研究する。双安定領域ではエンタングルメントが異なる定常値に達するのに対し、不安定領域ではエンタングルメントは時間とともに振動する。この研究は、Kerr修飾キャビティマグノメカニクス系で生じる、異なる安定点および不安定点における量子エンタングルメントを理解する上で特に重要である。 We study a Kerr-modified cavity magnomechanical system with a focus on its bistable regime. We identify a distinct parametric condition under which bistability appears, featuring two stable branches and one unstable branch in the middle. Interestingly, our study reveals a unique transition where the upper branch loses its stability under a sufficiently strong drive, giving rise to limit-cycle oscillations. Consequently, we report a rich phase diagram consisting of both bistable and periodic solutions and study quantum correlations around them. In the bistable regime, the entanglement reaches different steady-state values, whereas in the unstable regime it oscillates in time. This study is especially important in understanding quantum entanglement at different stable and unstable points arising in a Kerr-modified cavity magnomechanical system.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
# 性能-解釈可能性トレードオフの整合性:解釈可能な機械学習モデルの評価 Challenging the Performance-Interpretability Trade-off: An Evaluation of Interpretable Machine Learning Models ( http://arxiv.org/abs/2409.14429v1 ) ライセンス: Link先を確認	Sven Kruschel, Nico Hambauer, Sven Weinzierl, Sandra Zilker, Mathias Kraus, Patrick Zschech,	(参考訳) 機械学習は、データ駆動決定サポートを促進するために、すべての認識可能なドメインに浸透している。性能上の利点が想定されるため、高度なブラックボックスモデルに焦点が当てられることが多いが、解釈可能なモデルは、しばしば劣った予測品質に関連付けられている。しかし、近年では、完全解釈可能なまま複雑で非線形なパターンをキャプチャするための有望な特性を提供するGAM(Generalized Additive Model)が提案されている。これらのモデルの利点と限界を明らかにするため、20の表付きベンチマークデータセットの集合に基づく7つの一般的な機械学習モデルと比較して、7つの異なるGAMの予測性能について検討した。公平かつロバストなモデル比較を保証するため、クロスバリデーションと組み合わせた広範囲なハイパーパラメータ探索を行い、68,500回のモデル実行を実現した。さらに,本研究では,モデルの視覚的出力を質的に検討し,解釈可能性のレベルを評価する。これらの結果から,グラフデータに対する予測性能とモデル解釈性の間に厳密なトレードオフがないことを示すことによって,ブラックボックスモデルのみが高い精度を達成できるという誤解を解消する。さらに,情報システム分野における強力な解釈可能なモデルとしてのGAMの重要性を論じ,社会技術的観点からの今後の研究への示唆を導出する。 Machine learning is permeating every conceivable domain to promote data-driven decision support. The focus is often on advanced black-box models due to their assumed performance advantages, whereas interpretable models are often associated with inferior predictive qualities. More recently, however, a new generation of generalized additive models (GAMs) has been proposed that offer promising properties for capturing complex, non-linear patterns while remaining fully interpretable. To uncover the merits and limitations of these models, this study examines the predictive performance of seven different GAMs in comparison to seven commonly used machine learning models based on a collection of twenty tabular benchmark datasets. To ensure a fair and robust model comparison, an extensive hyperparameter search combined with cross-validation was performed, resulting in 68,500 model runs. In addition, this study qualitatively examines the visual output of the models to assess their level of interpretability. 
Based on these results, the paper dispels the misconception that only black-box models can achieve high accuracy by demonstrating that there is no strict trade-off between predictive performance and model interpretability for tabular data. Furthermore, the paper discusses the importance of GAMs as powerful interpretable models for the field of information systems and derives implications for future work from a socio-technical perspective.	翻訳日:2024-11-06 22:52:52 公開日:2024-09-22
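The reason a generalized additive model stays interpretable is structural: the prediction is an intercept plus one independent shape function per feature, so each feature's contribution can be read off directly. The sketch below illustrates only that structure; the shape functions, feature names, and numbers are made-up examples, not models or results from the study.

```python
def gam_predict(intercept, shape_functions, row):
    """Additive prediction: intercept + sum of per-feature shape functions.
    Returns the prediction and the individual feature contributions."""
    contributions = [shape(value) for shape, value in zip(shape_functions, row)]
    return intercept + sum(contributions), contributions


def age_effect(age):
    """Hypothetical smooth effect of an 'age' feature."""
    return 0.02 * (age - 40)


def income_effect(income):
    """Hypothetical non-linear (step-shaped) effect of an 'income' feature."""
    return 1.0 if income >= 50 else 0.0


if __name__ == "__main__":
    prediction, parts = gam_predict(0.5, [age_effect, income_effect], [50, 60])
    print("prediction:", prediction)
    print("per-feature contributions:", parts)
```

Plotting each shape function over its feature range gives the kind of visual output that the study inspects qualitatively.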
# Pomo3D: 3D対応のポートレートアクセサイティングなど Pomo3D: 3D-Aware Portrait Accessorizing and More ( http://arxiv.org/abs/2409.14430v1 ) ライセンス: Link先を確認	Tzu-Chieh Liu, Chih-Ting Liu, Shao-Yi Chien,	(参考訳) ポートレートとアクセサリーを分解・再コンパイルすることで,フリーアクセゾライズを可能にする3Dポートレート操作フレームワークであるPomo3Dを提案する。これにより、アバターは複数のアクセサリーを同時に装着することで、アウト・オブ・ディストリビューション(OOD)の外観を達成できる。既存の方法は、そのような明示的できめ細かな編集に苦慮しており、特定の肖像画に追加のオブジェクトを生成したり、アクセサリーを生成する際にポートレート(例えばアイデンティティシフト)を変更することに失敗する。この制限は、人々が通常仮想空間で多様でファッション可能なアクセサリーで魅力的な外観を創り出そうとする際、注目すべき障害となる。私たちのアプローチは、この適応の少ない問題に対する効果的な解決策を提供します。 Scribble2Accessoriesモジュールも導入したので,Pomo3Dはユーザが描いたアクセサリのスクリブルマップから3Dアクセサリを作成できる。さらに、実世界のデータセットに存在するバイアス付き関連性を緩和するバイアス対応マッパーを設計する。上記のオブジェクトレベルの操作に加えて、Pomo3Dは、幾何学やテクスチャのグローバルなあるいはローカルな編集や、アバターのスタイラス化、ニューラルポートレートの3D編集をより包括的なレベルに高めるといった、ポートレートの編集オプションも備えている。 We propose Pomo3D, a 3D portrait manipulation framework that allows free accessorizing by decomposing and recomposing portraits and accessories. It enables the avatars to attain out-of-distribution (OOD) appearances of simultaneously wearing multiple accessories. Existing methods still struggle to offer such explicit and fine-grained editing; they either fail to generate additional objects on given portraits or cause alterations to portraits (e.g., identity shift) when generating accessories. This restriction presents a noteworthy obstacle as people typically seek to create charming appearances with diverse and fashionable accessories in the virtual universe. Our approach provides an effective solution to this less-addressed issue. We further introduce the Scribble2Accessories module, enabling Pomo3D to create 3D accessories from user-drawn accessory scribble maps. Moreover, we design a bias-conscious mapper to mitigate biased associations present in real-world datasets. 
In addition to object-level manipulation above, Pomo3D also offers extensive editing options on portraits, including global or local editing of geometry and texture and avatar stylization, elevating 3D editing of neural portraits to a more comprehensive level.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# EM-DARTS:眼球運動認識のための階層的微分可能なアーキテクチャ探索 EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition ( http://arxiv.org/abs/2409.14432v1 ) ライセンス: Link先を確認	Huafeng Qin, Hongyu Zhu, Xin Jin, Xin Yu, Mounim A. El-Yacoubi, Xinbo Gao,	(参考訳) アイムーブメントバイオメトリクスは、高いセキュアな識別によって注目を集めている。近年, 深層学習(DL)モデルが眼球運動認識に成功しているが, DLアーキテクチャは人間の事前知識によって決定されている。微分可能なニューラルアーキテクチャサーチ(DARTS)は、高い探索効率でアーキテクチャ設計のマニュアルプロセスを自動化する。しかしDARTSは、通常同じ複数の学習細胞を積み重ねて最終的なニューラルネットワークを構築し、ネットワークの多様性を制限する。ちなみに、DARTSは通常、浅いネットワークでアーキテクチャを検索し、より深いネットワークで評価する。本稿では,眼球運動認識のためのDLアーキテクチャを自動設計する階層的微分可能なアーキテクチャ探索アルゴリズムEM-DARTSを提案する。まず、スーパーネットを定義し、グローバルかつ局所的なニューラルアーキテクチャ探索法を提案し、最適なアーキテクチャを識別可能なニューラルアーキテクチャ探索と交互に探索する。ローカル検索戦略は,対象ネットワークのアーキテクチャを最適化するグローバル検索戦略において,異なるセルに対して最適なアーキテクチャを求めることを目的としている。さらに冗長性を低減するため,各層の情報量を計算するために転送エントロピーを提案し,検索ネットワークをさらに単純化した。提案するEM-DARTSは,3つの公開データベースを対象とした実験により,最先端の認識性能を実現する最適アーキテクチャを実現できることを示した。 Eye movement biometrics has received increasing attention thanks to its highly secure identification. Although deep learning (DL) models have recently been applied successfully to eye movement recognition, the DL architecture is still determined by human prior knowledge. Differentiable Neural Architecture Search (DARTS) automates the manual process of architecture design with high search efficiency. DARTS, however, usually stacks multiple copies of the same learned cell to form the final neural network for evaluation, thereby limiting the diversity of the network. In addition, DARTS usually searches the architecture in a shallow network while evaluating it in a deeper one, which results in a large gap between the architecture depths in the search and evaluation scenarios. To address this issue, we propose EM-DARTS, a hierarchical differentiable architecture search algorithm to automatically design the DL architecture for eye movement recognition. 
First, we define a supernet and propose a global and local alternating neural architecture search method that searches for the optimal architecture alternately using differentiable neural architecture search. The local search strategy aims to find an optimal architecture for different cells, while the global search strategy is responsible for optimizing the architecture of the target network. To further reduce redundancy, transfer entropy is used to compute the amount of information in each layer, so as to further simplify the search network. Our experiments on three public databases demonstrate that the proposed EM-DARTS is capable of producing an optimal architecture that leads to state-of-the-art recognition performance.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# OStr-DARTS:操作強度に基づく微分可能なニューラルアーキテクチャ探索 OStr-DARTS: Differentiable Neural Architecture Search based on Operation Strength ( http://arxiv.org/abs/2409.14433v1 ) ライセンス: Link先を確認	Le Yang, Ziwei Zheng, Yizeng Han, Shiji Song, Gao Huang, Fan Li,	(参考訳) 微分可能なアーキテクチャ探索(DARTS)は、効果的なニューラルアーキテクチャ探索のための有望なテクニックとして登場し、高性能アーキテクチャを見つけるために主に2つのステップを含んでいる。第一に、混合操作からなるDARTSスーパーネットを勾配降下法で最適化する。第二に、スーパーネットに最も貢献する選択された操作によって最終的なアーキテクチャを構築する。 DARTSはNASの効率を向上するが、よく知られた退化問題に悩まされ、アーキテクチャの劣化につながりうる。既存の研究は、退化問題を主にスーパーネット最適化の失敗に帰しており、選択法にはほとんど注意が払われていない。本稿では,広く使用されているマグニチュードベースの選択法の適用をやめ,最終損失への影響によって操作の重要性を推定する,操作強度に基づく新しい基準を提案する。スーパーネット最適化を一切修正することなく、提案した基準を用いることで退化問題に効果的に対処できることを示す。これは、マグニチュードベースの選択法がDARTSの不安定性の重要な原因となりうることを示唆する。 NAS-Bench-201とDARTSの探索空間における実験により,本手法の有効性が示された。 Differentiable architecture search (DARTS) has emerged as a promising technique for effective neural architecture search, and it mainly contains two steps to find the high-performance architecture: First, the DARTS supernet that consists of mixed operations will be optimized via gradient descent. Second, the final architecture will be built by the selected operations that contribute the most to the supernet. Although DARTS improves the efficiency of NAS, it suffers from the well-known degeneration issue which can lead to deteriorating architectures. Existing works mainly attribute the degeneration issue to the failure of its supernet optimization, while little attention has been paid to the selection method. In this paper, we cease to apply the widely-used magnitude-based selection method and propose a novel criterion based on operation strength that estimates the importance of an operation by its effect on the final loss. We show that the degeneration issue can be effectively addressed by using the proposed criterion without any modification of supernet optimization, indicating that the magnitude-based selection method can be a critical reason for the instability of DARTS. 
The experiments on NAS-Bench-201 and DARTS search spaces show the effectiveness of our method.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# LLMを用いた自動車イノベーションのランドスケープ分析 Automotive innovation landscaping using LLM ( http://arxiv.org/abs/2409.14436v1 ) ライセンス: Link先を確認	Raju Gorain, Omkar Salunke,	(参考訳) 特許分析による自動車イノベーションのランドスケープ分析は、研究開発チームにとって不可欠である。イノベーションのトレンド、技術進歩、そして競合他社の最新技術を理解するのに役立つ。従来、このプロセスには集中的な手作業が必要だった。しかし、Large Language Models (LLMs) の出現により自動化が可能となり、特許分類と最先端の発明概念抽出をより高速かつ効率的に行えるようになった。この自動化は、広範な特許データベースから関連情報を抽出する上で、さまざまなR&Dチームを支援することができる。本稿では,ランドスケープ分析に必要な情報を抽出するための,プロンプトエンジニアリングに基づく手法を紹介する。この情報には、特許が対処する課題、利用された技術、車両エコシステム内のイノベーションの領域(安全、先進運転支援システムなど)が含まれる。結果として、オープンソースの特許データを用いて燃料電池技術のランドスケープを作成する本手法の実装を示す。このアプローチは、燃料電池技術の現状を包括的に概観し、この分野における将来の研究開発に有用な洞察を提供する。 The process of landscaping automotive innovation through patent analysis is crucial for Research and Development teams. It aids in comprehending innovation trends, technological advancements, and the latest technologies from competitors. Traditionally, this process required intensive manual efforts. However, with the advent of Large Language Models (LLMs), it can now be automated, leading to faster and more efficient patent categorization & state-of-the-art of inventive concept extraction. This automation can assist various R&D teams in extracting relevant information from extensive patent databases. This paper introduces a method based on prompt engineering to extract essential information for landscaping. The information includes the problem addressed by the patent, the technology utilized, and the area of innovation within the vehicle ecosystem (such as safety, Advanced Driver Assistance Systems and more). The result demonstrates the implementation of this method to create a landscape of fuel cell technology using open-source patent data. This approach provides a comprehensive overview of the current state of fuel cell technology, offering valuable insights for future research and development in this field.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# CNNと条件付きGANを用いた視覚的マルウェア検出フレームワーク A Visualized Malware Detection Framework with CNN and Conditional GAN ( http://arxiv.org/abs/2409.14439v1 ) ライセンス: Link先を確認	Fang Wang, Hussam Al Hamadi, Ernesto Damiani,	(参考訳) 機械学習(ML)を取り入れたマルウェアの可視化分析は、さまざまなプラットフォームのセキュリティ防御を改善するための有望なソリューションであることが証明されている。本研究では,ML活用者によるマルウェア検出システム開発における共通問題に対処する統合フレームワークを提案する。すなわち、各変数を二進数にエンコードし、それらを白黒のピクセルにマッピングすることで、良性/悪性サンプルの同一性を保存するために、拡張付き画像提示システムを構築する。条件付き敵対的生成ネットワーク(Conditional GAN)に基づくモデルを用いて、合成画像を生成し、不均衡クラスの問題を緩和する。 Convolutional Neural Networksによって構築された検出モデルは、合成サンプルを含むデータセットと含まないデータセットで訓練した場合の性能を検証するためのものである。この2つのトレーニングシナリオの精度は98.51%と97.26%である。 Malware visualization analysis incorporating with Machine Learning (ML) has been proven to be a promising solution for improving security defenses on different platforms. In this work, we propose an integrated framework for addressing common problems experienced by ML utilizers in developing malware detection systems. Namely, a pictorial presentation system with extensions is designed to preserve the identities of benign/malign samples by encoding each variable into binary digits and mapping them into black and white pixels. A conditional Generative Adversarial Network based model is adopted to produce synthetic images and mitigate issues of imbalance classes. Detection models architected by Convolutional Neural Networks are for validating performances while training on datasets with and without artifactual samples. Result demonstrates accuracy rates of 98.51% and 97.26% for these two training scenarios.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# キュムラント発展の効率的な計算と完全計数統計:無限温度量子スピン鎖への応用 Efficient computation of cumulant evolution and full counting statistics: application to infinite temperature quantum spin chains ( http://arxiv.org/abs/2409.14442v1 ) ライセンス: Link先を確認	Angelo Valli, Cătălin Paşcu Moca, Miklós Antal Werner, Márton Kormos, Žiga Krajnik, Tomaž Prosen, Gergely Zaránd,	(参考訳) 本稿では,高温の1次元量子系における幅広いクラスの観測量に対して,量子生成関数(QGF)を効率的に計算する数値計算法を提案する。キュムラントの高精度な推定値を取得し,QGFから完全計数統計を再構築する。スピン$S=1/2$の異方性ハイゼンベルク鎖でその可能性を示し、これまで最先端の古典および量子シミュレーションでは到達できなかった時間スケールに到達できることを示す。我々の結果は、等方的な可積分量子スピン鎖に対するKardar-Parisi-Zhang(KPZ)普遍性の予想に異議を唱えるものである。 We propose a numerical method to efficiently compute quantum generating functions (QGF) for a wide class of observables in one-dimensional quantum systems at high temperature. We obtain high-accuracy estimates for the cumulants and reconstruct full counting statistics from the QGF. We demonstrate its potential on spin $S=1/2$ anisotropic Heisenberg chain, where we can reach time scales hitherto inaccessible to state-of-the-art classical and quantum simulations. Our results challenge the conjecture of the Kardar--Parisi--Zhang universality for isotropic integrable quantum spin chains.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# Fake It till You Make It: 汎用ディープフェイク検出に向けたカリキュラム型動的偽造拡張 Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection ( http://arxiv.org/abs/2409.14444v1 ) ライセンス: Link先を確認	Yuzhen Lin, Wentang Song, Bin Li, Yuezun Li, Jiangqun Ni, Han Chen, Qiushi Li,	(参考訳) ディープフェイク検出に関するこれまでの研究は、トレーニングと同じデータセットの顔偽造をテストする場合に有望な結果を示している。しかし、未知のデータセットに由来し未知の手法で生成された偽造に検出器を一般化しようとすると、問題は依然として困難である。本研究では,ディープフェイク検出器と偽造拡張ポリシーネットワークを共同で訓練する,新しい汎用ディープフェイク検出手法 Curricular Dynamic Forgery Augmentation (CDFA) を提案する。従来と異なり,トレーニング中に単調なカリキュラムに従って偽造拡張を段階的に適用することを提案する。さらに,訓練段階ごとに各画像に対して適切な偽造拡張操作を1つ選択する動的偽造探索戦略を提案し,より良い一般化に向けて最適化された偽造拡張ポリシーを作成する。加えて, ディープフェイク生成の時間的不整合を簡易に模倣するために, 自己シフトブレンディング画像という新しい偽造拡張を提案する。包括的実験により,CDFAは様々な素朴なディープフェイク検出器のクロスデータセットとクロスマニピュレーションの両方の性能をプラグ・アンド・プレイ方式で大幅に改善し,いくつかのベンチマークデータセットで既存の手法よりも優れた性能を達成できることが示されている。 Previous studies in deepfake detection have shown promising results when testing face forgeries from the same dataset as the training. However, the problem remains challenging when one tries to generalize the detector to forgeries from unseen datasets and created by unseen methods. In this work, we present a novel general deepfake detection method, called Curricular Dynamic Forgery Augmentation (CDFA), which jointly trains a deepfake detector with a forgery augmentation policy network. Unlike the previous works, we propose to progressively apply forgery augmentations following a monotonic curriculum during the training. We further propose a dynamic forgery searching strategy to select one suitable forgery augmentation operation for each image varying between training stages, producing a forgery augmentation policy optimized for better generalization. In addition, we propose a novel forgery augmentation named self-shifted blending image to simply imitate the temporal inconsistency of deepfake generation. 
Comprehensive experiments show that CDFA can significantly improve both cross-datasets and cross-manipulations performances of various naive deepfake detectors in a plug-and-play way, and make them attain superior performances over the existing methods in several benchmark datasets.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# 畳み込みニューラルネットワーク, データ拡張, ResNet50, Vision Transformer を用いた肺病変の検出 Detection of pulmonary pathologies using convolutional neural networks, Data Augmentation, ResNet50 and Vision Transformers ( http://arxiv.org/abs/2409.14446v1 ) ライセンス: Link先を確認	Pablo Ramirez Amador, Dinarle Milagro Ortega, Arnold Cesarano,	(参考訳) 肺疾患は、正確で迅速な診断技術を必要とする公衆衛生上の問題である。本稿では, 畳み込みニューラルネットワーク(CNN), データ拡張, ResNet50, ビジョントランスフォーマー(ViT)を用いて, 医用画像から肺病変を検出する手法を提案する。がん、肺炎、結核、線維症などの異なる肺疾患患者のX線画像とCTスキャンのデータセットを使用する。提案手法で得られた結果を, 精度, 感度, 特異度, ROC曲線下面積などの評価指標を用いて, 他の既存手法の結果と比較した。その結果,提案手法は全ての指標において他の手法よりも優れており,精度は98%,ROC曲線下面積は99%であった。本手法は, 医用画像による肺病変の診断に有効で有望なツールであると結論づけられる。 Pulmonary diseases are a public health problem that requires accurate and fast diagnostic techniques. In this paper, a method based on convolutional neural networks (CNN), Data Augmentation, ResNet50 and Vision Transformers (ViT) is proposed to detect lung pathologies from medical images. A dataset of X-ray images and CT scans of patients with different lung diseases, such as cancer, pneumonia, tuberculosis and fibrosis, is used. The results obtained by the proposed method are compared with those of other existing methods, using performance metrics such as accuracy, sensitivity, specificity and area under the ROC curve. The results show that the proposed method outperforms the other methods in all metrics, achieving an accuracy of 98% and an area under the ROC curve of 99%. It is concluded that the proposed method is an effective and promising tool for the diagnosis of pulmonary pathologies by medical imaging.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# 電力系統発電機とインバータ資源のダイナミクス学習のための統一的アプローチ A Unified Approach for Learning the Dynamics of Power System Generators and Inverter-based Resources ( http://arxiv.org/abs/2409.14454v1 ) ライセンス: Link先を確認	Shaohui Liu, Weiqian Cai, Hao Zhu, Brian Johnson,	(参考訳) 再生可能エネルギーの統合と電化のためのインバータベースの資源(IBR)の普及は、電力系統の動的解析に大きな課題をもたらす。同期発電機(SG)とIBRの両方を考慮するため、この研究は個々の動的コンポーネントのモデルを学ぶためのアプローチを示す。リカレントニューラルネットワーク(RNN)モデルを用いて、終端バス電圧とセットポイント入力からコンポーネントの主要な動的状態を予測する際の再帰構造に適合させる。 特にIBRに起因する高速な過渡現象に対処するため,動的学習タスクの安定性と精度を高める高次積分法を模倣する安定積分(SI-)RNNを開発した。提案したSI-RNNモデルは,コンポーネントの動的挙動を予測できるだけでなく,セットポイント変化に対する動的感度を効率的に計算できる可能性も提供することを示す。これらの能力は、SGとIBRの両方を持つ小さな試験システム上でのフルオーダー電磁過渡(EMT)シミュレーションに基づいて、特にグリッドフォーミングインバータのダイナミクス予測について数値的に検証された。 The growing prevalence of inverter-based resources (IBRs) for renewable energy integration and electrification greatly challenges power system dynamic analysis. To account for both synchronous generators (SGs) and IBRs, this work presents an approach for learning the model of an individual dynamic component. The recurrent neural network (RNN) model is used to match the recursive structure in predicting the key dynamical states of a component from its terminal bus voltage and set-point input. To deal with the fast transients especially due to IBRs, we develop a Stable Integral (SI-)RNN to mimic high-order integral methods that can enhance the stability and accuracy for the dynamic learning task. We demonstrate that the proposed SI-RNN model not only can successfully predict the component's dynamic behaviors, but also offers the possibility of efficiently computing the dynamic sensitivity relative to a set-point change. These capabilities have been numerically validated based on full-order Electromagnetic Transient (EMT) simulations on a small test system with both SGs and IBRs, particularly for predicting the dynamics of grid-forming inverters.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# クラスタ数の多いクラスタリングのための高性能外部妥当性指数 A High-Performance External Validity Index for Clustering with a Large Number of Clusters ( http://arxiv.org/abs/2409.14455v1 ) ライセンス: Link先を確認	Mohammad Yasin Karbasian, Ramin Javadi,	(参考訳) 本稿では,多数のクラスタを持つ大規模データセットにおけるクラスタリング評価のための高性能な外部妥当性指標である Stable Matching Based Pairing (SMBP) アルゴリズムを提案する。 SMBPは安定マッチングの枠組みを利用して異なるクラスタリング手法間でクラスタをペアリングし、計算複雑性を、従来の最大重みマッチング(MWM)の$O(N^3)$から$O(N^2)$へと大幅に削減する。実世界のデータセットと合成データセットの総合的な評価を通じて、SMBPはMWMと同等の精度と優れた計算効率を示す。特に、均衡データセット、不均衡データセット、および多数のクラスタを持つ大規模データセットに対して有効であり、現代のクラスタリングタスクのためのスケーラブルで実用的なソリューションとなる。加えて、SMBPはPyTorchやTensorFlowといった機械学習フレームワークで簡単に実装でき、ビッグデータアプリケーションのための堅牢なツールを提供する。このアルゴリズムは広範な実験を通じて検証され、最大マッチング測度 (MMM) や Centroid Ratio (CR) といった既存の手法の強力な代替手段としての可能性を示している。 This paper introduces the Stable Matching Based Pairing (SMBP) algorithm, a high-performance external validity index for clustering evaluation in large-scale datasets with a large number of clusters. SMBP leverages the stable matching framework to pair clusters across different clustering methods, significantly reducing computational complexity to $O(N^2)$, compared to traditional Maximum Weighted Matching (MWM) with $O(N^3)$ complexity. Through comprehensive evaluations on real-world and synthetic datasets, SMBP demonstrates comparable accuracy to MWM and superior computational efficiency. It is particularly effective for balanced, unbalanced, and large-scale datasets with a large number of clusters, making it a scalable and practical solution for modern clustering tasks. Additionally, SMBP is easily implementable within machine learning frameworks like PyTorch and TensorFlow, offering a robust tool for big data applications. The algorithm is validated through extensive experiments, showcasing its potential as a powerful alternative to existing methods such as Maximum Match Measure (MMM) and Centroid Ratio (CR).	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# スコアリングルールネット:多変量回帰における平均目標予測を超えて Scoring rule nets: beyond mean target prediction in multivariate regression ( http://arxiv.org/abs/2409.14456v1 ) ライセンス: Link先を確認	Daan Roordink, Sibylle Hess,	(参考訳) 最尤推定(MLE)で訓練された確率的回帰モデルは、時に許容できない程度に分散を過大評価することがある。これは主に多変量領域において問題となる。単変量モデルはよく用いられる連続ランク確率スコア(CRPS)を最適化することが多いが、多変量領域では、そのようなMLEの代替はまだ広く受け入れられていない。最もよく研究されている代替であるエネルギースコア(Energy Score)は、閉形式の表現を持たず、ターゲット変数間の相関に対する感度も欠くことで知られている。本稿では,CRPSを拡張した多変量の厳密に適正なスコアリングルールであるConditional CRPSを提案する。一般的な分布に対して閉形式の表現が存在することを示し,相関に対する感度を例示する。次に、合成データと実データの両方における様々な実験により、Conditional CRPSがしばしばMLEより優れ、分布ランダムフォレスト(DRF)のような最先端の非パラメトリックモデルに匹敵する結果が得られることを示す。 Probabilistic regression models trained with maximum likelihood estimation (MLE) can sometimes overestimate variance to an unacceptable degree. This is mostly problematic in the multivariate domain. While univariate models often optimize the popular Continuous Ranked Probability Score (CRPS), in the multivariate domain, no such alternative to MLE has yet been widely accepted. The Energy Score - the most investigated alternative - notoriously lacks closed-form expressions and sensitivity to the correlation between target variables. In this paper, we propose Conditional CRPS: a multivariate strictly proper scoring rule that extends CRPS. We show that closed-form expressions exist for popular distributions and illustrate their sensitivity to correlation. We then show in a variety of experiments on both synthetic and real data, that Conditional CRPS often outperforms MLE, and produces results comparable to state-of-the-art non-parametric models, such as Distributional Random Forest (DRF).	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# 大規模モデルエージェント:現状,協力パラダイム,セキュリティとプライバシ,今後の動向 Large Model Agents: State-of-the-Art, Cooperation Paradigms, Security and Privacy, and Future Trends ( http://arxiv.org/abs/2409.14457v1 ) ライセンス: Link先を確認	Yuntao Wang, Yanghe Pan, Quan Zhao, Yi Deng, Zhou Su, Linkang Du, Tom H. Luan,	(参考訳) GPT-4やDALL-E 2のような大きな基盤モデルによって駆動される大型モデル(LM)エージェントは、汎用人工知能(AGI)の実現に向けた重要なステップである。 LMエージェントは、自律性、身体性、接続性という重要な特徴を示し、物理的、仮想的、複合現実的な環境を横断して動作しながら、人間や他のエージェント、周囲の環境とシームレスに相互作用することを可能にする。本稿では,アーキテクチャ,協調パラダイム,セキュリティ,プライバシ,今後の展望を中心に,LMエージェントの現状を包括的に調査する。具体的には, まず汎用アーキテクチャ, キーコンポーネント, 実現技術, 最新のアプリケーションなど, LMエージェントの基本原理について検討する。次に,LMエージェントのコネクテッドインテリジェンスに向けて,データ,計算,知識の観点から実践的な協調パラダイムについて議論する。さらに,LMエージェントに関連するセキュリティ脆弱性やプライバシ侵害を,特にマルチエージェント設定で系統的に解析する。また,その基盤となるメカニズムについて検討し,既存および潜在的な対策をレビューする。最後に、堅牢でセキュアなLMエージェントエコシステムを構築するための今後の研究方針について概説する。 Large Model (LM) agents, powered by large foundation models such as GPT-4 and DALL-E 2, represent a significant step towards achieving Artificial General Intelligence (AGI). LM agents exhibit key characteristics of autonomy, embodiment, and connectivity, allowing them to operate across physical, virtual, and mixed-reality environments while interacting seamlessly with humans, other agents, and their surroundings. This paper provides a comprehensive survey of the state-of-the-art in LM agents, focusing on the architecture, cooperation paradigms, security, privacy, and future prospects. Specifically, we first explore the foundational principles of LM agents, including general architecture, key components, enabling technologies, and modern applications. Then, we discuss practical collaboration paradigms from data, computation, and knowledge perspectives towards connected intelligence of LM agents. Furthermore, we systematically analyze the security vulnerabilities and privacy breaches associated with LM agents, particularly in multi-agent settings. We also explore their underlying mechanisms and review existing and potential countermeasures. 
Finally, we outline future research directions for building robust and secure LM agent ecosystems.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# 大規模言語モデルにおける多言語プロービングの探究:言語横断解析 Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis ( http://arxiv.org/abs/2409.14459v1 ) ライセンス: Link先を確認	Daoyang Li, Mingyu Jin, Qingcheng Zeng, Haiyan Zhao, Mengnan Du,	(参考訳) 大規模言語モデル(LLM)のプロービング技術は主に英語に焦点を合わせており、世界の言語の大部分を見落としている。本稿では,これらのプロービング手法を多言語の文脈に拡張し,多様な言語にわたるLLMの挙動について検討する。複数のオープンソースのLLMで実験を行い、プロービング精度、層間の傾向、および複数の言語に対するプロービングベクトル間の類似性を解析した。その結果,(1)高リソース言語と低リソース言語の間に一貫した性能差があり、高リソース言語のプロービング精度が有意に高いこと,(2)層ごとの精度の傾向が異なり、高リソース言語は英語と同様に深い層で大幅に向上すること,(3)高リソース言語間では表現の類似性が高く、低リソース言語は低リソース言語同士でも高リソース言語との間でも類似性が低いこと,が明らかになった。これらの結果は,LLMの多言語能力に大きな格差があることを示し,低リソース言語のモデリングを改善する必要性を強調している。 Probing techniques for large language models (LLMs) have primarily focused on English, overlooking the vast majority of the world's languages. In this paper, we extend these probing methods to a multilingual context, investigating the behaviors of LLMs across diverse languages. We conduct experiments on several open-source LLM models, analyzing probing accuracy, trends across layers, and similarities between probing vectors for multiple languages. Our key findings reveal: (1) a consistent performance gap between high-resource and low-resource languages, with high-resource languages achieving significantly higher probing accuracy; (2) divergent layer-wise accuracy trends, where high-resource languages show substantial improvement in deeper layers similar to English; and (3) higher representational similarities among high-resource languages, with low-resource languages demonstrating lower similarities both among themselves and with high-resource languages. These results highlight significant disparities in LLMs' multilingual capabilities and emphasize the need for improved modeling of low-resource languages.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# 低照度画像強調が分類・検出に与える影響:実証的研究 Low-Light Enhancement Effect on Classification and Detection: An Empirical Study ( http://arxiv.org/abs/2409.14461v1 ) ライセンス: Link先を確認	Xu Wu, Zhihui Lai, Zhou Jie, Can Gao, Xianxu Hou, Ya-nan Zhang, Linlin Shen,	(参考訳) 低照度画像は現実のシナリオで一般的に見られ、これらの画像の視認性を改善するために多くの低照度画像強調法(LLIE)が提案されている。 LLIEの主な目標は、人間にとって視覚的により好ましい、より鮮明な画像を作ることである。しかし、高品質な画像データセットに依存する画像分類や物体検出などの高レベル視覚タスクに対するLLIE法の影響は、十分に検討されていない。本研究は,画像分類と物体検出の実験からなる実証的調査により,これらの高レベル視覚タスクにおけるLLIE手法を総合的に評価する。評価の結果、LLIE法は人間の視覚的解釈を高める一方で、コンピュータビジョンタスクに対する効果は一貫せず、時には有害となりうるという二面性が明らかになった。以上の結果は、人間の視覚認知のための画像強調と機械解析のための画像強調との間に乖離があることを示唆しており、高レベル視覚タスクを効果的に支援するよう調整されたLLIE法の必要性を示している。この洞察は、人間と機械の視覚の両方のニーズに合致するLLIE技術の開発に不可欠である。 Low-light images are commonly encountered in real-world scenarios, and numerous low-light image enhancement (LLIE) methods have been proposed to improve the visibility of these images. The primary goal of LLIE is to generate clearer images that are more visually pleasing to humans. However, the impact of LLIE methods in high-level vision tasks, such as image classification and object detection, which rely on high-quality image datasets, is not well explored. To explore the impact, we comprehensively evaluate LLIE methods on these high-level vision tasks by utilizing an empirical investigation comprising image classification and object detection experiments. The evaluation reveals a dichotomy: While Low-Light Image Enhancement (LLIE) methods enhance human visual interpretation, their effect on computer vision tasks is inconsistent and can sometimes be harmful. Our findings suggest a disconnect between image enhancement for human visual perception and for machine analysis, indicating a need for LLIE methods tailored to support high-level vision tasks effectively. This insight is crucial for the development of LLIE techniques that align with the needs of both human and machine vision.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# AggregHate: ソーシャルプラットフォーム上でのヘイトモンガー検出のための効率的な集約的アプローチ AggregHate: An Efficient Aggregative Approach for the Detection of Hatemongers on Social Platforms ( http://arxiv.org/abs/2409.14464v1 ) ライセンス: Link先を確認	Tom Marzea, Abraham Israeli, Oren Tsur,	(参考訳) オンラインヘイトスピーチの自動検出は、オンライン言論の解毒における重要なステップである。さらに、正確な分類は、社会現象としての憎悪の拡散をよりよく理解する上で有効である。これまでの研究のほとんどは憎悪的な発話の検出に重点を置いているが、ユーザレベルに焦点を当てることも、困難ではあるが同様に重要であると我々は主張する。本稿では,ヘイトモンガーを検出するためのマルチモーダルな集約的アプローチについて考察し,潜在的に憎悪的なテキスト、ユーザの活動、ユーザネットワークを考慮に入れる。 X(Twitter)、Gab、Parlerの3つのユニークなデータセットで手法を評価し,ユーザのテキストをその社会的文脈の中で処理することで,これまで使用されていたテキストベースおよびグラフベースの手法と比較して,ヘイトモンガーの検出が大幅に向上することを示す。提案手法は, コード化されたメッセージ、ドッグホイッスル、人種的ガスライティングの分類の改善や、介入措置の策定に利用することができる。さらに、我々のアプローチは非常に大きなデータセットやネットワークに対しても非常に効率的である。 Automatic detection of online hate speech serves as a crucial step in the detoxification of the online discourse. Moreover, accurate classification can promote a better understanding of the proliferation of hate as a social phenomenon. While most prior work focuses on the detection of hateful utterances, we argue that focusing on the user level is as important, albeit challenging. In this paper we consider a multimodal aggregative approach for the detection of hate-mongers, taking into account the potentially hateful texts, user activity, and the user network. We evaluate our methods on three unique datasets, X (Twitter), Gab, and Parler, showing that processing a user's texts in her social context significantly improves the detection of hate-mongers, compared to previously used text and graph-based methods. Our method can then be used to improve the classification of coded messages, dog-whistling, and racial gas-lighting, as well as inform intervention measures. Moreover, our approach is highly efficient even for very large datasets and networks.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# 論理と生成AIについて On logic and generative AI ( http://arxiv.org/abs/2409.14465v1 ) ライセンス: Link先を確認	Yuri Gurevich, Andreas Blass,	(参考訳) 100年前、論理学は基礎研究とほぼ同義であった。進行中のAI革命は、神経科学、哲学、コンピュータ科学、論理学に関わる多くの深い基礎的問題を提起している。以下の対話の目的は、基礎論に関心を持つ若い論理学者を刺激し、AI革命が提起する基礎的問題に目を向けてもらうことである。 A hundred years ago, logic was almost synonymous with foundational studies. The ongoing AI revolution raises many deep foundational problems involving neuroscience, philosophy, computer science, and logic. The goal of the following dialog is to provoke young logicians with a taste for foundations to notice the foundational problems raised by the AI revolution.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# 大規模言語モデルのセマンティックパーシング再考:セマンティックヒントによるLLM性能向上 Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints ( http://arxiv.org/abs/2409.14469v1 ) ライセンス: Link先を確認	Kaikai An, Shuzheng Si, Helan Hu, Haozhe Zhao, Yuchi Wang, Qingyan Guo, Baobao Chang,	(参考訳) 意味的パーシング(Semantic Parsing)は、文の意味を捉え、論理的に構造化された形式に変換することを目的としている。従来の研究では、セマンティックパーシングは下流タスクにおけるより小さなモデル(例えばBERT)の性能を高めることが示されている。しかし、この改善がLLMにも同様に及ぶかどうかは不明である。本稿では,より小さなモデルとは異なり,LLMに意味解析結果を直接付加すると性能が低下することを実験的に示す。これを克服するため,意味的ヒントをプロンプト内に埋め込む新しいプロンプト手法であるSENSEを提案する。実験の結果、SENSEは様々なタスクにわたってLLMの性能を一貫して改善し、LLMの能力を向上させるために意味情報を統合する可能性を強調している。 Semantic Parsing aims to capture the meaning of a sentence and convert it into a logical, structured form. Previous studies show that semantic parsing enhances the performance of smaller models (e.g., BERT) on downstream tasks. However, it remains unclear whether the improvements extend similarly to LLMs. In this paper, our empirical findings reveal that, unlike smaller models, directly adding semantic parsing results into LLMs reduces their performance. To overcome this, we propose SENSE, a novel prompting approach that embeds semantic hints within the prompt. Experiments show that SENSE consistently improves LLMs' performance across various tasks, highlighting the potential of integrating semantic information to improve LLM capabilities.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# ブロックチェーンによる情報セキュリティとプライバシ保護 : 計算文献レビューによる課題と今後の方向性 Blockchain Based Information Security and Privacy Protection: Challenges and Future Directions using Computational Literature Review ( http://arxiv.org/abs/2409.14472v1 ) ライセンス: Link先を確認	Gauri Shankar, Md Raihan Uddin, Saddam Mukta, Prabhat Kumar, Shareeful Islam, A. K. M. Najmul Islam,	(参考訳) ブロックチェーン技術は新たなデジタルイノベーションであり、情報システム(IS)内の個々のセキュリティとプライバシの強化で大きな人気を集めている。この関心の高まりは、ブロックチェーン技術に関する研究論文の急増を反映しており、デジタルランドスケープにおけるその重要性の高まりを浮き彫りにしている。しかし,論文の急激な普及は,膨大な情報量による手作業の分析と合成に重大な課題をもたらす。トピックの複雑さと広さは、人間のデータ処理能力の固有の制限と相まって、文献から意味のある洞察を包括的に分析し引き出すのを困難にしている。そこで我々は、LDA(Latent Dirichlet Allocation)技術を用いて、関連する文献の影響とトピックモデリングを分析するために、CLR(Computational Literature Review)を採用した。セキュリティとプライバシに関するトピックを10つ特定し、各トピックについて詳細な説明を提供しました。批判的分析から,いくつかの限界を観察し,今後の方向性を概説した。 Blockchain technology is an emerging digital innovation that has gained immense popularity in enhancing individual security and privacy within Information Systems (IS). This surge in interest is reflected in the exponential increase in research articles published on blockchain technology, highlighting its growing significance in the digital landscape. However, the rapid proliferation of published research presents significant challenges for manual analysis and synthesis due to the vast volume of information. The complexity and breadth of topics, combined with the inherent limitations of human data processing capabilities, make it difficult to comprehensively analyze and draw meaningful insights from the literature. To this end, we adopted the Computational Literature Review (CLR) to analyze pertinent literature impact and topic modelling using the Latent Dirichlet Allocation (LDA) technique. We identified 10 topics related to security and privacy and provided a detailed description of each topic. From the critical analysis, we have observed several limitations, and several future directions are provided as an outcome of this review.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# 自然言語の命令による微細構造設計のための大規模言語モデルとデノイジング拡散フレームワーク A Large Language Model and Denoising Diffusion Framework for Targeted Design of Microstructures with Commands in Natural Language ( http://arxiv.org/abs/2409.14473v1 ) ライセンス: Link先を確認	Nikita Kartashov, Nikolaos N. Vlassis,	(参考訳) 微細構造は材料のマクロな特性を決定する上で重要な役割を担っており、合金設計、MEMSデバイス、組織工学など多くの応用がある。微細構造と材料挙動の複雑な関係を捉えるために、計算フレームワークが開発されてきた。しかし、これらの進歩にもかかわらず、ドメイン固有の知識と複雑なアルゴリズムに伴う急峻な学習曲線が、これらのツールの幅広い適用を制限している。この障壁を低くするために,自然言語処理(NLP),大規模言語モデル(LLM),デノイジング拡散確率モデル(DDPM)を統合し,直感的な自然言語コマンドを用いた微細構造設計を可能にするフレームワークを提案する。我々のフレームワークは、事前訓練されたLLMによって駆動される文脈的データ拡張を用いて、多様な微細構造記述子のデータセットを生成し、拡張する。再学習されたNERモデルは、ユーザが提供する自然言語入力から関連する微細構造記述子を抽出し、DDPMはそれを用いて目標とする機械的特性とトポロジ的特徴を持つ微細構造を生成する。フレームワークのNLPとDDPMコンポーネントはモジュール化されており、個別のトレーニングとバリデーションを可能にし、異なるデータセットやユースケースにフレームワークを適用する際の柔軟性を保証する。サロゲートモデルシステムを用いて, 目標特性との整合性に基づいて生成したサンプルのランク付けとフィルタリングを行う。非線形超弾性微細構造のデータベース上で実証されたこのフレームワークは、直感的な自然言語コマンドから始まる、微細構造のアクセス可能な逆設計のプロトタイプとして機能する。 Microstructure plays a critical role in determining the macroscopic properties of materials, with applications spanning alloy design, MEMS devices, and tissue engineering, among many others. Computational frameworks have been developed to capture the complex relationship between microstructure and material behavior. However, despite these advancements, the steep learning curve associated with domain-specific knowledge and complex algorithms restricts the broader application of these tools. To lower this barrier, we propose a framework that integrates Natural Language Processing (NLP), Large Language Models (LLMs), and Denoising Diffusion Probabilistic Models (DDPMs) to enable microstructure design using intuitive natural language commands. Our framework employs contextual data augmentation, driven by a pretrained LLM, to generate and expand a diverse dataset of microstructure descriptors. 
A retrained NER model extracts relevant microstructure descriptors from user-provided natural language inputs, which are then used by the DDPM to generate microstructures with targeted mechanical properties and topological features. The NLP and DDPM components of the framework are modular, allowing for separate training and validation, which ensures flexibility in adapting the framework to different datasets and use cases. A surrogate model system is employed to rank and filter generated samples based on their alignment with target properties. Demonstrated on a database of nonlinear hyperelastic microstructures, this framework serves as a prototype for accessible inverse design of microstructures, starting from intuitive natural language commands.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# SynBench: 非剛体3D点群位置合わせのための合成ベンチマーク SynBench: A Synthetic Benchmark for Non-rigid 3D Point Cloud Registration ( http://arxiv.org/abs/2409.14474v1 ) ライセンス: Link先を確認	Sara Monji-Azad, Marvin Kinz, Claudia Scherl, David Männle, Jürgen Hesser, Nikolas Löw,	(参考訳) 非剛体点群位置合わせは、コンピュータビジョンにおいて重要なタスクである。非剛体点群位置合わせ手法を評価するには、大きな変形レベル、ノイズ、外れ値、不完全性といった課題を含むデータセットが必要である。変形可能な点群位置合わせのためのデータセットはいくつか存在するものの、すべての課題を備えた包括的なベンチマークが存在しないため、異なる手法間の公正な評価は難しい。本稿では,Flex と Unreal Engine におけるソフトボディシミュレーション用ツールセットである SimTool を用いて作成した,新たな非剛体点群位置合わせデータセット SynBench を紹介する。SynBenchは、2つの点集合間の対応点の正解(ground truth)を提供し、さまざまなレベルの変形、ノイズ、外れ値、不完全性を含む主要な位置合わせの課題を包含する。著者らの知る限り、既存のデータセットと比較してSynBenchには3つの特徴がある。(1)非剛体点群位置合わせのための様々な課題を提供する最初のベンチマークであり、(2)難易度の異なる課題を包含し、(3)変形の前後両方の正解対応点を含んでいる。著者らは、SynBenchによって将来の非剛体点群位置合わせ手法がその成果を公平に比較できるようになると考えている。 SynBenchは、https://doi.org/10.11588/data/R9IKCFで公開されている。 Non-rigid point cloud registration is a crucial task in computer vision. Evaluating a non-rigid point cloud registration method requires a dataset with challenges such as large deformation levels, noise, outliers, and incompleteness. Despite the existence of several datasets for deformable point cloud registration, the absence of a comprehensive benchmark with all challenges makes it difficult to achieve fair evaluations among different methods. This paper introduces SynBench, a new non-rigid point cloud registration dataset created using SimTool, a toolset for soft body simulation in Flex and Unreal Engine. SynBench provides the ground truth of corresponding points between two point sets and encompasses key registration challenges, including varying levels of deformation, noise, outliers, and incompleteness. 
To the best of the authors' knowledge, compared to existing datasets, SynBench possesses three particular characteristics: (1) it is the first benchmark that provides various challenges for non-rigid point cloud registration, (2) SynBench encompasses challenges of varying difficulty levels, and (3) it includes ground truth corresponding points both before and after deformation. The authors believe that SynBench enables future non-rigid point cloud registration methods to present a fair comparison of their achievements. SynBench is publicly available at: https://doi.org/10.11588/data/R9IKCF.	翻訳日:2024-11-06 22:41:53 公開日:2024-09-22
# 全身PET-CT画像における病変分割 : AutoPET 2024 チャレンジへの貢献 Lesion Segmentation in Whole-Body Multi-Tracer PET-CT Images; a Contribution to AutoPET 2024 Challenge ( http://arxiv.org/abs/2409.14475v1 ) ライセンス: Link先を確認	Mehdi Astaraki, Simone Bendazzoli,	(参考訳) 全身PET-CTボリューム内の病理領域の自動分割は、診断、予後、治療計画などの様々な臨床応用を効率化する可能性がある。本研究は,画像前処理,トレーサ分類,病変分割のステップを組み込んだワークフローを提案し,AutoPET MICCAI 2024チャレンジに貢献することで,この問題に対処することを目的とする。このパイプラインの実装により、モデルのセグメンテーション精度が大幅に向上した。この改善は、訓練対象者1611名の平均Diceスコア0.548、訓練セットのFDGとPSMAの0.631と0.559、予備試験段階データセットの0.792で実証されている。 The automatic segmentation of pathological regions within whole-body PET-CT volumes has the potential to streamline various clinical applications such as diagnosis, prognosis, and treatment planning. This study aims to address this challenge by contributing to the AutoPET MICCAI 2024 challenge through a proposed workflow that incorporates image preprocessing, tracer classification, and lesion segmentation steps. The implementation of this pipeline led to a significant enhancement in the segmentation accuracy of the models. This improvement is evidenced by an average overall Dice score of 0.548 across 1611 training subjects, 0.631 and 0.559 for classified FDG and PSMA subjects of the training set, and 0.792 on the preliminary testing phase dataset.	翻訳日:2024-11-06 22:30:40 公開日:2024-09-22
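The Dice scores quoted in the record above measure the overlap between a predicted mask and a reference mask. The sketch below is only an illustration of that metric on flat binary masks. The function name `dice_score` and the convention that two empty masks score 1.0 are assumptions made here, not details taken from the paper or from the AutoPET evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total number of foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 0, 0, 1, 1]
    # Two voxels overlap, three are foreground in each mask: 2 * 2 / 6.
    print(dice_score(predicted, reference))
```

A challenge pipeline would typically compute this per subject on 3D volumes and then average, which is how a single number such as 0.548 over 1611 subjects can arise.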
# 大規模言語モデルは心筋梗塞を論理的に予測できるか?英国バイオバンクコホートによる評価 Can Large Language Models Logically Predict Myocardial Infarction? Evaluation based on UK Biobank Cohort ( http://arxiv.org/abs/2409.14478v1 ) ライセンス: Link先を確認	Yuxing Zhi, Yuan Guo, Kai Yuan, Hesong Wang, Heng Xu, Haina Yao, Albert C Yang, Guangrui Huang, Yuping Duan,	(参考訳) 背景: 大規模言語モデル (LLMs) は臨床決定支援の分野で極めて進歩している。しかし、現実の医療データに基づく正確な臨床診断を行う上で、LLMの可能性と限界について、高品質な証拠が緊急に必要である。目的: 汎用の最先端LLM(ChatGPTおよびGPT-4)が、論理的推論により心筋梗塞(MI)の発生リスクを予測できるかどうかを定量的に評価し、さらに様々なモデルの比較を行い、LLMの性能を包括的に評価する。方法: この振り返りコホート調査では、2006年から2010年までの482,310人の参加者が英国バイオバンクのデータベースに登録され、後に690人の最終コホートに再サンプリングされた。各参加者に対して、MIの危険因子の表データをChatGPT認識のための標準化されたテキスト記述に変換する。リスクを表すスコアを0から10まで選択するようChatGPTに頼んだ結果,反応が得られた。思考の連鎖(Chain of Thought, CoT)による質問を用いて、LLMが論理的に予測を行うかどうかを評価した。 ChatGPTの予測性能は、発行された医療指標、従来の機械学習モデル、その他の大規模言語モデルと比較された。結論:現在のLLMは臨床医学分野に適用される準備ができていない。将来の医学 LLM は、自然言語と定量化された医療データの両方を理解し、さらに論理的推論を行うために、医学領域の知識の専門家であることが示唆されている。 Background: Large language models (LLMs) have seen extraordinary advances with applications in clinical decision support. However, high-quality evidence is urgently needed on the potential and limitation of LLMs in providing accurate clinical decisions based on real-world medical data. Objective: To evaluate quantitatively whether universal state-of-the-art LLMs (ChatGPT and GPT-4) can predict the incidence risk of myocardial infarction (MI) with logical inference, and to further make comparison between various models to assess the performance of LLMs comprehensively. Methods: In this retrospective cohort study, 482,310 participants recruited from 2006 to 2010 were initially included in the UK Biobank database and later resampled into a final cohort of 690 participants. For each participant, tabular data of the risk factors of MI were transformed into standardized textual descriptions for ChatGPT recognition.
Responses were generated by asking ChatGPT to select a score ranging from 0 to 10 representing the risk. Chain of Thought (CoT) questioning was used to evaluate whether LLMs make prediction logically. The predictive performance of ChatGPT was compared with published medical indices, traditional machine learning models and other large language models. Conclusions: Current LLMs are not ready to be applied in clinical medicine fields. Future medical LLMs are suggested to be expert in medical domain knowledge to understand both natural languages and quantified medical data, and further make logical inferences.	翻訳日:2024-11-06 22:30:40 公開日:2024-09-22
# 2つの課題の1つのモデル:反復的相互指導による低解像度シーン画像の協調認識と検索 One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance ( http://arxiv.org/abs/2409.14483v1 ) ライセンス: Link先を確認	Minyi Zhao, Yang Wang, Jihong Guan, Shuigeng Zhou,	(参考訳) 高分解能(HR)画像からのシーンテキスト認識(STR)は大きな成功を収めているが、低分解能(LR)画像でのテキスト読み出しは視覚情報不足のため依然として困難である。そのため、近年、LR画像の超解像度(SR)画像を生成するために多くのシーンテキスト画像超解像度(STISR)モデルが提案され、SR画像上でSTRが実行され、認識性能が向上した。しかし、これらの手法には2つの大きな弱点がある。一方、STISRアプローチは不完全または誤ったSR画像を生成し、STRモデルのその後の認識を誤解させる可能性がある。一方、STISRとSTRモデルは高い認識精度を追求するために共同最適化されているため、SR画像の忠実度は損なわれる可能性がある。その結果、STISRモデルの認識性能や忠実度は望ましいものではなかった。では、高い認識性能と良好な忠実さを両立できるだろうか? そこで本研究では,LRシーンのテキストイメージを同時に認識し,復元する,画像(Iterative MutuAl GuidancEの略)と呼ばれる新しい手法を提案する。具体的には、認識のための特殊なSTRモデルと、別々に最適化されたLR画像を復元するSTISRモデルから構成される。また,STISRモデルがSTRモデルに必要不可欠な低レベル画素の手がかりを提供し,より正確な認識を行うために,STRモデルがSTISRモデルへの手がかりとして高レベル意味情報を提供する反復的相互誘導機構を開発した。 2つのLRデータセットに対する大規模な実験は、認識性能と超解像忠実度の両方に関する既存の研究よりも、我々の手法が優れていることを示す。 Scene text recognition (STR) from high-resolution (HR) images has been significantly successful, however text reading on low-resolution (LR) images is still challenging due to insufficient visual information. Therefore, recently many scene text image super-resolution (STISR) models have been proposed to generate super-resolution (SR) images for the LR ones, then STR is done on the SR images, which thus boosts recognition performance. Nevertheless, these methods have two major weaknesses. On the one hand, STISR approaches may generate imperfect or even erroneous SR images, which mislead the subsequent recognition of STR models. On the other hand, as the STISR and STR models are jointly optimized, to pursue high recognition accuracy, the fidelity of SR images may be spoiled. As a result, neither the recognition performance nor the fidelity of STISR models are desirable. Then, can we achieve both high recognition performance and good fidelity? 
To this end, in this paper we propose a novel method called IMAGE (the abbreviation of Iterative MutuAl GuidancE) to effectively recognize and recover LR scene text images simultaneously. Concretely, IMAGE consists of a specialized STR model for recognition and a tailored STISR model to recover LR images, which are optimized separately. And we develop an iterative mutual guidance mechanism, with which the STR model provides high-level semantic information as clue to the STISR model for better super-resolution, meanwhile the STISR model offers essential low-level pixel clue to the STR model for more accurate recognition. Extensive experiments on two LR datasets demonstrate the superiority of our method over the existing works on both recognition performance and super-resolution fidelity.	翻訳日:2024-11-06 22:30:40 公開日:2024-09-22
# Prompt Augmentation と Caption の利用による視覚言語大モデルの実現 Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization ( http://arxiv.org/abs/2409.14484v1 ) ライセンス: Link先を確認	Minyi Zhao, Jie Wang, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Shuigeng Zhou,	(参考訳) 近年の研究では、VLLM(Vision Language Large Models)が入力画像に関連のないコンテンツを出力できることが示されている。この問題は幻覚現象と呼ばれ、間違いなくVLLM性能を低下させる。そのため、モデル出力をより合理的かつ正確なものにするために、様々なアンチハロシン化技術が提案されている。彼らの成功にもかかわらず、広範なテストから、プロンプト(例えば、単語の付加、書き直し、スペルエラーなど)を増強することで、モデルの出力が変更され、出力が再び幻覚化することを発見した。そこで本研究では,VLLMの生成能力を高めるために,Prompt Augmentation and Caption utilization (PACU) と呼ばれる新しいインストラクションチューニングフレームワークを提案する。具体的には、PACUは既存のLCMを利用して、多様なプロンプトを自動で拡張し評価する。結果として生じる高品質なプロンプトは、異なるプロンプトを処理するVLLMの能力を高めるために利用される。一方、PACUは画像キャプションを利用して、画像の特徴と応答生成のプロンプトを併用する。視覚的特徴が不正確な場合、LCMは、応答生成のための画像キャプションから有用な情報をキャプチャすることができる。 VLLMモデルの性能を効果的に向上するために,我々のPACU法が既存の手法とうまく連携できることを示す。コードはhttps://github.com/zhaominyiz/PACUで入手できる。 Recent studies have shown that Vision Language Large Models (VLLMs) may output content not relevant to the input images. This problem, called the hallucination phenomenon, undoubtedly degrades VLLM performance. Therefore, various anti-hallucination techniques have been proposed to make model output more reasonable and accurate. Despite their successes, from extensive tests we found that augmenting the prompt (e.g. word appending, rewriting, and spell error etc.) may change model output and make the output hallucinate again. To cure this drawback, we propose a new instruct-tuning framework called Prompt Augmentation and Caption Utilization (PACU) to boost VLLM's generation ability under the augmented prompt scenario. Concretely, on the one hand, PACU exploits existing LLMs to augment and evaluate diverse prompts automatically. The resulting high-quality prompts are utilized to enhance VLLM's ability to process different prompts. 
On the other hand, PACU exploits image captions to jointly work with image features as well as the prompts for response generation. When the visual feature is inaccurate, LLM can capture useful information from the image captions for response generation. Extensive experiments on hallucination evaluation and prompt-augmented datasets demonstrate that our PACU method can work well with existing schemes to effectively boost VLLM model performance. Code is available in https://github.com/zhaominyiz/PACU.	翻訳日:2024-11-06 22:30:40 公開日:2024-09-22
# 教師なし単語発見:クラスタリングによる境界検出と動的プログラミング Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming ( http://arxiv.org/abs/2409.14486v1 ) ライセンス: Link先を確認	Simon Malan, Benjamin van Niekerk, Herman Kamper,	(参考訳) 我々は、ラベルなし音声を単語のようなセグメントに分割し、それらを辞書に集約するという長年の課題について考察する。いくつかの従来の手法では、スコアリングモデルと動的プログラミングを組み合わせて最適なセグメンテーションを見つける。そこで我々は, 隣接した自己教師付き特徴の相似性を用いて単語境界を予測し, 予測セグメントをクラスタ化して辞書を構築するという, より単純な戦略を提案する。公平な比較のために、より優れた機能と境界制約を持つ古いES-KMeans動的プログラミング手法を更新する。 5言語によるZeroSpeechベンチマークでは、新しいES-KMeans+法と同じような結果が得られるが、ほぼ5倍高速である。 We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Several previous methods use a scoring model coupled with dynamic programming to find an optimal segmentation. Here we propose a much simpler strategy: we predict word boundaries using the dissimilarity between adjacent self-supervised features, then we cluster the predicted segments to construct a lexicon. For a fair comparison, we update the older ES-KMeans dynamic programming method with better features and boundary constraints. On the five-language ZeroSpeech benchmarks, our simple approach gives similar state-of-the-art results compared to the new ES-KMeans+ method, while being almost five times faster.	翻訳日:2024-11-06 22:30:40 公開日:2024-09-22
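The boundary-prediction step described in the record above (placing word boundaries where adjacent self-supervised features are dissimilar) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the cosine dissimilarity, the fixed `threshold`, the local-maximum rule, and the names `cosine_dissimilarity` and `predict_boundaries` are all assumptions, and the subsequent clustering of segments into a lexicon is left out.

```python
import math


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return 1.0 - dot / (norm_a * norm_b)


def predict_boundaries(features, threshold=0.5):
    """Return frame indices i such that a boundary is placed between frame i-1 and frame i.

    A boundary is hypothesised wherever the dissimilarity between adjacent
    frames is a local maximum and reaches the (assumed) threshold.
    """
    dissim = [
        cosine_dissimilarity(features[i - 1], features[i])
        for i in range(1, len(features))
    ]
    boundaries = []
    for j, value in enumerate(dissim):
        left = dissim[j - 1] if j > 0 else float("-inf")
        right = dissim[j + 1] if j + 1 < len(dissim) else float("-inf")
        if value >= threshold and value >= left and value > right:
            boundaries.append(j + 1)
    return boundaries


if __name__ == "__main__":
    # Toy 2-D "features": the direction changes sharply before frames 2 and 4.
    frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0], [1.0, 0.0]]
    print(predict_boundaries(frames))  # [2, 4]
```

In a real system the frames would be high-dimensional self-supervised speech features, and the predicted segments would then be pooled and clustered to form the lexicon.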
# LLMによる自律運転エージェントの強化による知覚障害の軽減 Enhancing LLM-based Autonomous Driving Agents to Mitigate Perception Attacks ( http://arxiv.org/abs/2409.14488v1 ) ライセンス: Link先を確認	Ruoyu Song, Muslum Ozgur Ozmen, Hyungsub Kim, Antonio Bianchi, Z. Berkay Celik,	(参考訳) 大規模言語モデル (LLM) と自律運転システム (AD) を統合することへの関心が高まっている。しかし、ADシステムはオブジェクトの検出と追跡(ODT)機能に対する攻撃に対して脆弱である。残念ながら、ODT攻撃に対する最近のLLMエージェント4件の評価は、(1)過去の意思決定経験を提供するメモリモジュールの誤誘導、(2)不整合の特定におけるプロンプトの制限、(3)真実の認識データへの依存によるトラフィックルールのクラッシュや違反に63.26%の成功を収めたことを示している。本稿では,従来のLLM駆動システムを拡張した運転推論エージェントであるHudsonを紹介し,良質な条件下での有効性を維持しつつ,認識攻撃時の安全な意思決定を可能にする。 Hudsonはまず ADソフトウェアを計装し、運転シーンからリアルタイムの知覚結果とコンテキスト情報を収集する。このデータはその後、ドメイン固有言語(DSL)に形式化されます。 ODT攻撃中のLLMの検出と安全な制御決定をガイドするために、HudsonはDSLを自然言語に変換するとともに、カスタムアタック検出命令のリストを作成した。クエリ実行後、Hudsonはその因果推論プロセスを理解するためのLLMの制御決定を分析する。我々は,プロプライエタリなLLM (GPT-4) とオープンソース LLM (Llama と Gemma) を用いて,Hudson の有効性を評価する。 GPT-4,Llama,Gemmaの攻撃検出精度は平均83.3%、63.6%、73.6%であった。その結果、攻撃の86.4%、73.9%、80%の安全管理決定がなされた。 LLMをADシステムに統合することへの関心が高まった結果、LLMの強みとODT攻撃の検出・緩和の可能性を強調した。 There is a growing interest in integrating Large Language Models (LLMs) with autonomous driving (AD) systems. However, AD systems are vulnerable to attacks against their object detection and tracking (ODT) functions. Unfortunately, our evaluation of four recent LLM agents against ODT attacks shows that the attacks are 63.26% successful in causing them to crash or violate traffic rules due to (1) misleading memory modules that provide past experiences for decision making, (2) limitations of prompts in identifying inconsistencies, and (3) reliance on ground truth perception data. In this paper, we introduce Hudson, a driving reasoning agent that extends prior LLM-based driving systems to enable safer decision making during perception attacks while maintaining effectiveness under benign conditions. Hudson achieves this by first instrumenting the AD software to collect real-time perception results and contextual information from the driving scene.
This data is then formalized into a domain-specific language (DSL). To guide the LLM in detecting and making safe control decisions during ODT attacks, Hudson translates the DSL into natural language, along with a list of custom attack detection instructions. Following query execution, Hudson analyzes the LLM's control decision to understand its causal reasoning process. We evaluate the effectiveness of Hudson using a proprietary LLM (GPT-4) and two open-source LLMs (Llama and Gemma) in various adversarial driving scenarios. GPT-4, Llama, and Gemma achieve, on average, an attack detection accuracy of 83.3%, 63.6%, and 73.6%. Consequently, they make safe control decisions in 86.4%, 73.9%, and 80% of the attacks. Our results, following the growing interest in integrating LLMs into AD systems, highlight the strengths of LLMs and their potential to detect and mitigate ODT attacks.	翻訳日:2024-11-06 22:30:40 公開日:2024-09-22
# 知能の尺度について On a measure of intelligence ( http://arxiv.org/abs/2409.14496v1 ) ライセンス: Link先を確認	Yuri Gurevich,	(参考訳) Bulletin of EATCS の2024年秋号の Logic in Computer Science コラムは、François Chollet による必読の論文「On the measure of intelligence」に触発された、知能、知能の測定、および関連する問題に関する小さな議論である。議論には同論文へのささやかな批判も含まれている。 The Fall 2024 Logic in Computer Science column of the Bulletin of EATCS is a little discussion on intelligence, measuring intelligence, and related issues, provoked by a fascinating must-read article "On the measure of intelligence" by François Chollet. The discussion includes a modicum of critique of the article.	翻訳日:2024-11-06 22:30:40 公開日:2024-09-22
# 古典的無線通信・センシングのためのRydberg原子量子受信器 Rydberg Atomic Quantum Receivers for Classical Wireless Communication and Sensing ( http://arxiv.org/abs/2409.14501v1 ) ライセンス: Link先を確認	Tierui Gong, Aveek Chandra, Chau Yuen, Yong Liang Guan, Rainer Dumke, Chong Meng Samson See, Mérouane Debbah, Lajos Hanzo,	(参考訳) Rydberg原子量子受信器(Rydberg atomic quantum receiver、RAQR)は、高周波(RF)信号を受信するように設計された量子精密センシングプラットフォームである。これは通常の原子の1つ以上の電子を非常に高いエネルギーレベルへ励起してRydberg原子を生成することに基づいており、その結果原子はRF信号に敏感になる。 RAQRは、いわゆる電磁誘導透過 (EIT) と Autler-Townes splitting (ATS) に依存する光原子相互作用に基づくRF-光変換を実現し、所望のRF信号を光学的に読み出す。 Rydberg 状態の豊富な選択と様々な変調スキームに付随する大きな双極子モーメントは、超高感度($\sim$ nV/cm/$\sqrt{\text{Hz}}$)と超広帯域チューナビリティ(ほぼ直流からテラヘルツまで)を促進する。 RAQRはまた魅力的なスケーラビリティを示し、革新的でコンパクトな受信機の構築に貢献する。初期の実験的研究は、古典的な無線通信とセンシングにおけるその能力を実証した。様々な応用においてそれらのポテンシャルを完全に活用するために、Rydberg 原子の基本原理を概説し、続いて RAQR の原理、構造、理論を概説する。最後に、従来の無線システムとRAQRの統合を容易にするために、Rydberg原子量子単一入力単一出力(RAQ-SISO)と多入力多出力(RAQ-MIMO)のスキームを考案し、有力な研究の方向性をまとめる。 The Rydberg atomic quantum receiver (RAQR) is an emerging quantum precision sensing platform designed for receiving radio frequency (RF) signals. It relies on the creation of Rydberg atoms from normal atoms by exciting one or more electrons to a very high energy level, which in turn makes the atom sensitive to RF signals. The RAQR realizes RF-to-optical conversion based on light-atom interaction relying on the so-called electromagnetically induced transparency (EIT) and Autler-Townes splitting (ATS), so that the desired RF signal can be read out optically. The large dipole moments of Rydberg atoms associated with rich choices of Rydberg states and various modulation schemes facilitate an ultra-high sensitivity ($\sim$ nV/cm/$\sqrt{\text{Hz}}$) and an ultra-broadband tunability (near direct-current to Terahertz). RAQRs also exhibit compelling scalability and lend themselves to the construction of innovative, compact receivers. Initial experimental studies have demonstrated their capabilities in classical wireless communications and sensing.
To fully harness their potential in a wide variety of applications, we commence by outlining the underlying fundamentals of Rydberg atoms, followed by the principles, structures, and theories of RAQRs. Finally, we conceive Rydberg atomic quantum single-input single-output (RAQ-SISO) and multiple-input multiple-output (RAQ-MIMO) schemes for facilitating the integration of RAQRs with classical wireless systems, and conclude with a set of potent research directions.	翻訳日:2024-11-06 22:30:40 公開日:2024-09-22
# SPAQ-DL-SLAM:資源制約付組込みプラットフォームのためのディープラーニングベースのSLAMの最適化に向けて SPAQ-DL-SLAM: Towards Optimizing Deep Learning-based SLAM for Resource-Constrained Embedded Platforms ( http://arxiv.org/abs/2409.14515v1 ) ライセンス: Link先を確認	Niraj Pudasaini, Muhammad Abdullah Hanif, Muhammad Shafique,	(参考訳) リソース制約の組込みプラットフォーム上での効率的な実装には,ディープラーニングに基づく同時局所マッピング(DL-SLAM)アルゴリズムの最適化が不可欠である。本稿では,SPAQ-DL-SLAM(Structured Pruning and Quantization, SPAQ)を最先端のDL-SLAMアルゴリズムであるDROID-SLAMのアーキテクチャに戦略的に適用する手法を提案する。具体的には,DROID-SLAMの深層学習モジュール上で,階層感度解析に基づく微調整を伴う構造化プルーニングと,それに続く8ビットの学習後静的量子化(PTQ)を行う。 SPAQ-DL-SLAMフレームワークで20%の構造化プルーニングと8ビットPTQを適用したDROID-SLAMの最適化版であるSPAQ-DROID-SLAMモデルは、DROID-SLAMモデルと比較してFLOPsを18.9%、モデル全体のサイズを79.8%削減する。 TUM-RGBDベンチマークによる評価は,SPAQ-DROID-SLAMモデルが絶対軌道誤差(ATE)指標で平均10.5%、DROID-SLAMモデルを上回ることを示している。さらに、ETH3D SLAMトレーニングベンチマークの結果、AUCスコアが高いSPAQ-DROID-SLAMモデルの一般化能力が向上し、DROID-SLAMモデルと比較して2つの追加データシーケンスが成功した。これらの改善にもかかわらず、モデルは高角速度でキャプチャされるEuRoCデータセットと異なるVicon Roomシーケンスのパフォーマンスのばらつきを示す。 DL-SLAMアルゴリズムを設計し、運用環境やタスクを考慮し、リソース制約のある組込みプラットフォームに配置する際の最適なパフォーマンスとリソース効率を実現することを示唆している。 Optimizing Deep Learning-based Simultaneous Localization and Mapping (DL-SLAM) algorithms is essential for efficient implementation on resource-constrained embedded platforms, enabling real-time on-board computation in autonomous mobile robots. This paper presents SPAQ-DL-SLAM, a framework that strategically applies Structured Pruning and Quantization (SPAQ) to the architecture of one of the state-of-the-art DL-SLAM algorithms, DROID-SLAM, for resource and energy-efficiency. Specifically, we perform structured pruning with fine-tuning based on layer-wise sensitivity analysis followed by 8-bit post-training static quantization (PTQ) on the deep learning modules within DROID-SLAM.
Our SPAQ-DROID-SLAM model, an optimized version of the DROID-SLAM model obtained with our SPAQ-DL-SLAM framework using 20% structured pruning and 8-bit PTQ, achieves an 18.9% reduction in FLOPs and a 79.8% reduction in overall model size compared to the DROID-SLAM model. Our evaluations on the TUM-RGBD benchmark show that the SPAQ-DROID-SLAM model surpasses the DROID-SLAM model by an average of 10.5% on the absolute trajectory error (ATE) metric. Additionally, our results on the ETH3D SLAM training benchmark demonstrate enhanced generalization capabilities of the SPAQ-DROID-SLAM model, seen by a higher Area Under the Curve (AUC) score and success in 2 additional data sequences compared to the DROID-SLAM model. Despite these improvements, the model exhibits performance variance on the distinct Vicon Room sequences from the EuRoC dataset, which are captured at high angular velocities. This varying performance in some distinct scenarios suggests that designing DL-SLAM algorithms with operating environments and tasks taken into consideration can achieve optimal performance and resource efficiency for deployment in resource-constrained embedded platforms.	翻訳日:2024-11-06 22:19:40 publ開日:2024-09-22
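As a rough sanity check on the reported model-size reduction, the back-of-the-envelope sketch below combines pruning and quantization multiplicatively. It rests on assumptions that the abstract does not state: that the baseline stores weights as 32-bit floats, that pruning removes 20% of the stored parameters, and that every remaining weight is stored in 8 bits. The function `size_reduction` is hypothetical and is not part of the SPAQ-DL-SLAM framework.

```python
def size_reduction(pruned_fraction, bits_before=32, bits_after=8):
    """Estimated fractional reduction in model size after pruning and quantization.

    Assumes size is proportional to (number of weights) * (bits per weight)
    and ignores any overhead such as quantization scales or unpruned layers.
    """
    remaining = (1.0 - pruned_fraction) * bits_after / bits_before
    return 1.0 - remaining


if __name__ == "__main__":
    # 20% pruning followed by 32-bit -> 8-bit storage.
    estimate = size_reduction(0.20)
    print(f"estimated size reduction: {estimate * 100:.1f}%")
```

Under these assumptions the estimate comes out at about 80%, which is in the same range as the 79.8% quoted above; the small gap and the different FLOPs figure (18.9%) suggest the real accounting is more detailed than this sketch.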
# 単語を超えて:交通計画における大規模言語モデルの評価 Beyond Words: Evaluating Large Language Models in Transportation Planning ( http://arxiv.org/abs/2409.14516v1 ) ライセンス: Link先を確認	Shaowei Ying, Zhenlong Li, Manzhu Yu,	(参考訳) 2023年のジェネレーティブ・人工知能(GenAI)の復活と急速な進歩は、都市交通や物流を含む多くの産業分野における変革的な変化を触媒にした。本研究では,大規模言語モデル(LLM),特にGPT-4とPhi-3-miniの評価を行い,交通計画の充実を図る。本研究は, 一般地理空間技術, 一般交通領域技術, 実世界の交通問題解決などを含む交通インフォームド評価フレームワークを用いて, これらのモデルの性能と空間的理解を評価する。混成手法を用いて,LLMの一般地理情報システム(GIS)技術,一般交通領域の知識,および渋滞価格の現実的な交通計画シナリオにおける人間の意思決定を支援する能力の評価を行う。その結果, GPT-4 は Phi-3-mini と比較して, 各種GIS および輸送特化タスクにおいて高い精度と信頼性を示し, 輸送計画立案者にとって堅牢なツールとしての可能性を強調した。それでも、Phi-3-miniは特定の分析シナリオにおける能力を示し、資源制約環境におけるその有用性を示唆している。この結果は、都市交通計画におけるGenAI技術の変革の可能性を示している。将来的な研究は、より新しいLSMの適用と、より広範な現実的な輸送計画と運用上の課題に対する検索・拡張世代(RAG)技術の影響を探求し、輸送管理プラクティスにおける高度なAIモデルの統合をさらに深めることができる。 The resurgence and rapid advancement of Generative Artificial Intelligence (GenAI) in 2023 has catalyzed transformative shifts across numerous industry sectors, including urban transportation and logistics. This study investigates the evaluation of Large Language Models (LLMs), specifically GPT-4 and Phi-3-mini, to enhance transportation planning. The study assesses the performance and spatial comprehension of these models through a transportation-informed evaluation framework that includes general geospatial skills, general transportation domain skills, and real-world transportation problem-solving. Utilizing a mixed-methods approach, the research encompasses an evaluation of the LLMs' general Geographic Information System (GIS) skills, general transportation domain knowledge as well as abilities to support human decision-making in the real-world transportation planning scenarios of congestion pricing. Results indicate that GPT-4 demonstrates superior accuracy and reliability across various GIS and transportation-specific tasks compared to Phi-3-mini, highlighting its potential as a robust tool for transportation planners. 
Nonetheless, Phi-3-mini exhibits competence in specific analytical scenarios, suggesting its utility in resource-constrained environments. The findings underscore the transformative potential of GenAI technologies in urban transportation planning. Future work could explore the application of newer LLMs and the impact of Retrieval-Augmented Generation (RAG) techniques, on a broader set of real-world transportation planning and operations challenges, to deepen the integration of advanced AI models in transportation management practices.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
# RPKI:完璧ではないが十分すぎる RPKI: Not Perfect But Good Enough ( http://arxiv.org/abs/2409.14518v1 ) ライセンス: Link先を確認	Haya Schulmann, Niklas Vogel, Michael Waidner,	(参考訳) Resource Public Key Infrastructure (RPKI)プロトコルは、インターネットルーティングに暗号化セキュリティを追加するために標準化された。今日RPKIで保護されているインターネットリソースの50%以上が、このプロトコルはインターネットトラフィックのかなりの部分にすでに影響を与えている。採用の増加に加えて、RPKIに対する政治的関心も高まっている。ホワイトハウスは2024年9月4日、インターネットルーティングセキュリティを強化するロードマップで、RPKIはドメイン間ルーティングを確保するための成熟した、容易に利用できる技術であることを示した。ロードマップは、RPKIを広く採用する主な障害として、理解の欠如、優先順位付けの欠如、管理上の障壁がある。本研究は、RPKIの成熟度を実運用レベルの技術として初めて包括的に研究したものである。現在のRPKI実装には、まだ製品レベルのレジリエンスが欠如しており、ソフトウェアの脆弱性、一貫性のない仕様、運用上の課題に悩まされており、重大なセキュリティ上の懸念が生じています。運用環境での厳格なRPKIバリデーションの経験が不足し、フェールオープンテストモードで動作する。我々は、RPKIレジリエンスの改善と、新興脅威に対するデプロイメントの保護に関するステークホルダーの指導を推奨する。 RPKIの現在の仕様と実装で発見された多くの問題は、必然的に問題に繋がる: RPKIは、ホワイトハウスのロードマップに概説されている期待に合致するほど安定していますか? もちろん完璧ではありませんが、それで十分でしょうか? 答えは、私たちが探求するとおり、人の視点によって異なります。 The Resource Public Key Infrastructure (RPKI) protocol was standardized to add cryptographic security to Internet routing. With over 50% of Internet resources protected with RPKI today, the protocol already impacts significant parts of Internet traffic. In addition to its growing adoption, there is also increasing political interest in RPKI. The White House indicated in its Roadmap to Enhance Internet Routing Security, on 4 September 2024, that RPKI is a mature and readily available technology for securing inter-domain routing. The Roadmap attributes the main obstacles towards wide adoption of RPKI to a lack of understanding, lack of prioritization, and administrative barriers. This work presents the first comprehensive study of the maturity of RPKI as a viable production-grade technology. We find that current RPKI implementations still lack production-grade resilience and are plagued by software vulnerabilities, inconsistent specifications, and operational challenges, raising significant security concerns. 
The deployments lack experience with full-fledged strict RPKI-validation in production environments and operate in fail-open test mode. We provide recommendations to improve RPKI resilience and guide stakeholders in securing their deployments against emerging threats. The numerous issues we have discovered with the current RPKI specifications and implementations inevitably lead to the question: Is RPKI sufficiently stable to align with the expectations outlined in the White House roadmap? Certainly, it is not perfect, but is it good enough? The answer, as we will explore, varies depending on one's viewpoint.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
# RobotFingerPrint:マルチグルーパーグラフ合成のための統一グルーパー座標空間 RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis ( http://arxiv.org/abs/2409.14519v1 ) ライセンス: Link先を確認	Ninad Khargonkar, Luis Felipe Casas, Balakrishnan Prabhakaran, Yu Xiang,	(参考訳) 本稿では,複数のグリップの合成を把握するための統一グリップ座標空間として,新しい表現を導入する。空間は3次元の球面の2次元表面であり、緯度と緯度を座標とし、全てのロボットグリップパーに共有される。本稿では,グッパーのヤシ面を統一グッパー座標空間にマッピングする新しいアルゴリズムを提案し,入力対象のグッパー座標を予測する条件付き変分オートエンコーダを設計する。予測された統一グリップパ座標は、グリップとオブジェクトとの対応性を確立し、最適化問題において、グリップポーズとフィンガージョイントを解き、グリップ合成を行う。統一グリップパ座標空間を用いることで、複数のグリップパのグリップ合成における成功率と多様性が向上することを示した。 We introduce a novel representation named as the unified gripper coordinate space for grasp synthesis of multiple grippers. The space is a 2D surface of a sphere in 3D using longitude and latitude as its coordinates, and it is shared for all robotic grippers. We propose a new algorithm to map the palm surface of a gripper into the unified gripper coordinate space, and design a conditional variational autoencoder to predict the unified gripper coordinates given an input object. The predicted unified gripper coordinates establish correspondences between the gripper and the object, which can be used in an optimization problem to solve the grasp pose and the finger joints for grasp synthesis. We demonstrate that using the unified gripper coordinate space improves the success rate and diversity in the grasp synthesis of multiple grippers.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
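The record above describes a coordinate space that is the 2D surface of a sphere, addressed by longitude and latitude. The sketch below only illustrates that parameterization, converting a 3D point to (longitude, latitude) and back onto the unit sphere. How the paper maps a gripper's palm surface into this space is not given in the abstract, so the function names `to_lon_lat` and `from_lon_lat` and the axis conventions are assumptions for illustration.

```python
import math


def to_lon_lat(x, y, z):
    """Project a non-zero 3D point onto the unit sphere and return (longitude, latitude) in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the origin has no direction on the sphere")
    lon = math.atan2(y, x)  # range (-pi, pi]
    # Clamp to guard against tiny floating-point overshoot before asin.
    lat = math.asin(max(-1.0, min(1.0, z / r)))  # range [-pi/2, pi/2]
    return lon, lat


def from_lon_lat(lon, lat):
    """Return the unit-sphere point (x, y, z) for a given longitude and latitude."""
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


if __name__ == "__main__":
    lon, lat = to_lon_lat(1.0, 2.0, 2.0)
    print(lon, lat)
    print(from_lon_lat(lon, lat))  # the input direction, normalised to unit length
```

Because the two coordinates do not depend on any particular gripper geometry, points from different grippers can be compared once they are expressed this way, which is the property the shared space relies on.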
# 彼らは何をしているのか? 共同音声合成 What Are They Doing? Joint Audio-Speech Co-Reasoning ( http://arxiv.org/abs/2409.14526v1 ) ライセンス: Link先を確認	Yingzhi Wang, Pooneh Mousavi, Artem Ploujnikov, Mirco Ravanelli,	(参考訳) 音声処理や音声処理では、通常、同じ音声クリップに音声と人間の音声の両方が存在する場合でも、音声または音声のモダリティに焦点が当てられる。近年のAuditory Large Language Models (ALLMs) により、単一モデル内で音声と音声を同時に処理することが可能となり、共同音声合成タスクのさらなる検討がなされている。本稿では, ALLMの音声合成処理の精度について検討する。具体的には、音声処理と音声処理を一体化する新しいタスクであるJoint Audio-Speech Co-Reasoning (JASCO)を導入する。我々は,"What Are They Doing"と呼ばれるシーン推論データセットを公開し,一般的なALLMの協調推論能力を評価するために,共同音声合成ベンチマークを構築した。さらに、各モダリティへの依存を分析することにより、モデルの振舞いについてより深い洞察を提供する。 In audio and speech processing, tasks usually focus on either the audio or speech modality, even when both sounds and human speech are present in the same audio clip. Recent Auditory Large Language Models (ALLMs) have made it possible to process audio and speech simultaneously within a single model, leading to further considerations of joint audio-speech tasks. In this paper, we investigate how well ALLMs can perform joint audio-speech processing. Specifically, we introduce Joint Audio-Speech Co-Reasoning (JASCO), a novel task that unifies audio and speech processing, strictly requiring co-reasoning across both modalities. We release a scene-reasoning dataset called "What Are They Doing" and establish a joint audio-speech benchmark to evaluate the joint reasoning capability of popular ALLMs. Additionally, we provide deeper insights into the models' behaviors by analyzing their dependence on each modality.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
# ミドルマンアプローチによるセキュアかつ効率的なソースコードリポジトリホスティングのための統合ブロックチェーンとIPFSソリューション An Integrated Blockchain and IPFS Solution for Secure and Efficient Source Code Repository Hosting using Middleman Approach ( http://arxiv.org/abs/2409.14530v1 ) ライセンス: Link先を確認	Md. Rafid Haque, Sakibul Islam Munna, Sabbir Ahmed, Md. Tahmid Islam, Md Mehedi Hassan Onik, A. B. M. Ashikur Rahman,	(参考訳) バージョン管理システム(VCS)はソフトウェア開発に不可欠であるが、集中型VCSはデータ損失、セキュリティ侵害、所有権問題などのリスクを生じさせる。分散型ソースコードリポジトリホスティングに対するブロックチェーンベースのアプローチが検討されているが、既存のソリューションの多くは、セキュリティ、スケーラビリティ、効率性、リアルタイムコラボレーションに関わる課題に苦慮している。この研究は、EthereumブロックチェーンとIPFSを活用して、セキュアで効率的でレジリエントなコードリポジトリホスティングとガバナンスを実現する、新たな分散ソリューションを提案することで、これらの取り組みを強化することを目指している。当社のアプローチでは,ブロックチェーンの不変性と分散性と,オフチェーンストレージにおけるIPFSの効率性を組み合わせたハイブリッドアーキテクチャを導入しています。リアルタイムのコラボレーションを容易にするため,トランザクション処理を管理し,長期的セキュリティを損なうことなく運用効率を向上させる,一時集中型ミドルマンIPFSを統合した。このミドルマンIPFSは仲介役として機能し、中央集権システムの速度と分散アーキテクチャのレジリエンスのバランスをとる。本システムでは,アクセス権限を動的に検証することで,アクセス制御とキー管理をスマートコントラクトで管理し,IPFSに格納されたデータの取得と復号化を可能にする。この統合により、複数の共同作業者が共有リソースへの同時アクセスを必要とする環境で、セキュアでリアルタイムなコラボレーションが可能になる。本システムでは,対称暗号と非対称暗号を組み合わせたハイブリッド暗号方式を採用している。暗号化されたキーはブロックチェーン上に格納され、IPFSはコードベース自体の効率的なストレージを処理する。 Version control systems (VCS) are essential for software development, yet centralized VCS present risks such as data loss, security breaches, and ownership disputes. While blockchain-based approaches to decentralized source code repository hosting have been explored, many existing solutions struggle with challenges related to security, scalability, efficiency, and real-time collaboration. This study seeks to enhance these efforts by proposing a novel decentralized solution that leverages the Ethereum blockchain and IPFS for secure, efficient, and resilient code repository hosting and governance. Our approach introduces a hybrid architecture that combines the immutable and decentralized nature of blockchain with the efficiency of IPFS for off-chain storage. 
To facilitate real-time collaboration, we integrate a temporary centralized Middleman IPFS that manages transaction processing and enhances operational efficiency without compromising long-term security. This Middleman IPFS acts as an intermediary, balancing the speed of centralized systems with the resilience of decentralized architectures. Our system uses smart contracts to maintain access control and key management by dynamically verifying access rights, ensuring that only authorized users can retrieve and decrypt data stored on IPFS. This integration allows for secure, real-time collaboration in environments where multiple collaborators need concurrent access to shared resources. Our system employs a hybrid encryption scheme that combines symmetric and asymmetric cryptography. The encrypted keys are stored on the blockchain, while IPFS handles the efficient storage of the codebase itself, with a Middleman IPFS maintaining concurrent collaboration, providing a robust and scalable solution for managing large-scale, collaborative coding projects.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
# 不均一モデルによるモデル非依存的データセット凝縮に向けて Towards Model-Agnostic Dataset Condensation by Heterogeneous Models ( http://arxiv.org/abs/2409.14538v1 ) ライセンス: Link先を確認	Jun-Yeong Moon, Jung Uk Kim, Gyeong-Moon Park,	(参考訳) ディープラーニングの進歩は、モデルと利用可能なデータの両方の拡散と一致している。データセットサイズの増加とその後の計算要求の増加は、データセット凝縮(Dataset Condensation:DC)の開発につながった。従来の研究では、より効率的なモデルトレーニングのために、分散アライメントやトレーニング軌道追跡などの方法で合成画像を生成する方法が検討されてきたが、これらの凝縮されたイメージを実用的に利用する場合には、大きな課題が生じる。特に、これらの凝縮された画像は特定のモデルに特有であり、その汎用性と実用性を制約する傾向がある。この制限に対応するために,異種モデルデータセット凝縮法 (HMDC) を提案する。異種モデルを用いたモデルにおける勾配等級差と意味的距離の問題に対処するため,空間意味分解法を用いてグラディエント・バランス・モジュール (GBM) と相互蒸留 (MD) を提案する。提案手法は,各モデルのコントリビューションのバランスとセマンティックな意味の密接な維持により,モデル固有凝縮画像に関連する制約を克服し,より広範な有用性を高める。ソースコードはhttps://github.com/KHU-AGI/HMDCで入手できる。 The advancement of deep learning has coincided with the proliferation of both models and available data. The surge in dataset sizes and the subsequent surge in computational requirements have led to the development of Dataset Condensation (DC). While prior studies have delved into generating synthetic images through methods like distribution alignment and training trajectory tracking for more efficient model training, a significant challenge arises when employing these condensed images practically. Notably, these condensed images tend to be specific to particular models, constraining their versatility and practicality. In response to this limitation, we introduce a novel method, Heterogeneous Model Dataset Condensation (HMDC), designed to produce universally applicable condensed images through cross-model interactions. To address the issues of gradient magnitude difference and semantic distance in models when utilizing heterogeneous models, we propose the Gradient Balance Module (GBM) and Mutual Distillation (MD) with the Spatial-Semantic Decomposition method.
By balancing the contribution of each model and maintaining their semantic meaning closely, our approach overcomes the limitations associated with model-specific condensed images and enhances the broader utility. The source code is available in https://github.com/KHU-AGI/HMDC.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
# 最小コストによる量子制御の幾何学的最適化 Geometric Optimization of Quantum Control with Minimum Cost ( http://arxiv.org/abs/2409.14540v1 ) ライセンス: Link先を確認	Chengming Tan, Yuhao Cai, Jinyi Zhang, Shengli Ma, Chenwei Lv, Ren Zhang,	(参考訳) 微分幾何学の観点から量子制御の最適化について検討する。ここでは、最適量子制御は、量子状態の輸送の最小コストを取る。コスト関数を定義することにより、関連するリーマン多様体の軌跡の長さによってコストを定量化する。広範な物理的シナリオをカバーする SU(2) および SU(1,1) の動的対称系を用いて最適化プロトコルを実証する。これらの系では、時間発展は三次元多様体で可視化される。初期状態と最終状態が与えられたとき、最小コストの量子制御は多様体の測地線に対応する。初期状態と最終状態を結ぶ軌道が特定されると、最小コストの量子制御は三次元多様体に埋め込まれた部分多様体の測地線に対応する。この状況における最適量子制御は、断熱駆動へのショートカットを最適化する幾何学的な手段を提供する。 We study the optimization of quantum control from the perspective of differential geometry. Here, optimal quantum control takes the minimum cost of transporting a quantum state. By defining a cost function, we quantify the cost by the length of a trajectory in the relevant Riemannian manifold. We demonstrate the optimization protocol using SU(2) and SU(1,1) dynamically symmetric systems, which cover a large class of physical scenarios. For these systems, time evolution is visualized in the three-dimensional manifold. Given the initial and final states, the minimum-cost quantum control corresponds to the geodesic of the manifold. When the trajectory linking the initial and final states is specified, the minimum-cost quantum control corresponds to the geodesic in a sub-manifold embedded in the three-dimensional manifold. Optimal quantum control in this situation provides a geometrical means of optimizing shortcuts to adiabatic driving.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
# マルチエージェント協調センシングのための分布ロバスト逆強化学習 Distributionally Robust Inverse Reinforcement Learning for Identifying Multi-Agent Coordinated Sensing ( http://arxiv.org/abs/2409.14542v1 ) ライセンス: Link先を確認	Luke Snow, Vikram Krishnamurthy,	(参考訳) 我々は、マルチエージェントセンシングシステムの実用機能を再構築するために、分布性に頑健な逆強化学習(IRL)アルゴリズムを導出する。具体的には,雑音信号の観測を中心としたワッサーシュタインのあいまいさに対して,最悪のケース予測誤差を最小限に抑えるユーティリティ推定器を構築する。このロバストな推定と半無限の最適化再構成の等価性を証明し、計算解に対する一貫したアルゴリズムを提案する。本稿では,観測された追跡信号から認知レーダネットワークの実用機能を再構築するための数値的研究において,この堅牢なIRL方式の有効性について述べる。 We derive a minimax distributionally robust inverse reinforcement learning (IRL) algorithm to reconstruct the utility functions of a multi-agent sensing system. Specifically, we construct utility estimators which minimize the worst-case prediction error over a Wasserstein ambiguity set centered at noisy signal observations. We prove the equivalence between this robust estimation and a semi-infinite optimization reformulation, and we propose a consistent algorithm to compute solutions. We illustrate the efficacy of this robust IRL scheme in numerical studies to reconstruct the utility functions of a cognitive radar network from observed tracking signals.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
# TrackNetV4: モーションアテンションマップによる高速スポーツオブジェクト追跡の強化 TrackNetV4: Enhancing Fast Sports Object Tracking with Motion Attention Maps ( http://arxiv.org/abs/2409.14543v1 ) ライセンス: Link先を確認	Arjun Raj, Lei Wang, Tom Gedeon,	(参考訳) スポーツビデオのボールのような、高速で小さな物体を正確に検出し、追跡することは、動きのぼやけや閉塞などの要因により困難である。 TrackNetV1、V2、V3といった最近のディープラーニングフレームワークはテニスボールやシャトルコックの追跡を進歩させてきたが、部分的な閉塞や可視性の低いシナリオでは、しばしば苦労する。これは、これらのモデルが運動情報を明示的に組み込むことなく視覚的特徴に強く依存するためであり、運動情報は正確な追跡と軌道予測に不可欠である。本稿では,高次視覚特徴と学習可能な運動注意マップを動き認識型の融合機構によって融合し,移動するボールの位置を効果的に強調して追跡性能を向上させる,TrackNetファミリーの拡張を提案する。提案手法は,モーションプロンプト層によって変調されたフレーム差分マップを利用して,時間とともに重要な動き領域をハイライトする。テニスボールとシャトルコックデータセットの実験結果から,TrackNetV2とV3のトラッキング性能が向上することが示された。我々は、既存のTrackNet上に構築された軽量のプラグイン・アンド・プレイソリューションをTrackNetV4と呼びます。 Accurately detecting and tracking high-speed, small objects, such as balls in sports videos, is challenging due to factors like motion blur and occlusion. Although recent deep learning frameworks like TrackNetV1, V2, and V3 have advanced tennis ball and shuttlecock tracking, they often struggle in scenarios with partial occlusion or low visibility. This is primarily because these models rely heavily on visual features without explicitly incorporating motion information, which is crucial for precise tracking and trajectory prediction. In this paper, we introduce an enhancement to the TrackNet family by fusing high-level visual features with learnable motion attention maps through a motion-aware fusion mechanism, effectively emphasizing the moving ball's location and improving tracking performance. Our approach leverages frame differencing maps, modulated by a motion prompt layer, to highlight key motion regions over time. Experimental results on the tennis ball and shuttlecock datasets show that our method enhances the tracking performance of both TrackNetV2 and V3. We refer to our lightweight, plug-and-play solution, built on top of the existing TrackNet, as TrackNetV4.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
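As a rough illustration of the TrackNetV4 entry above, the sketch below turns a frame difference into an attention map and multiplies it into a feature map. It is a minimal toy under stated assumptions, not the authors' implementation: the sigmoid form, the parameter names `slope` and `shift` (standing in for the learnable motion prompt layer), and the element-wise fusion are all hypothetical choices.

```python
import math

def motion_attention(prev_frame, next_frame, slope=10.0, shift=0.1):
    """Turn an absolute frame difference into a (0, 1) attention map.

    `slope` and `shift` are hypothetical stand-ins for the learnable
    parameters of a motion prompt layer; the sigmoid form is an assumption.
    """
    return [
        [1.0 / (1.0 + math.exp(-slope * (abs(b - a) - shift)))
         for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]

def fuse(features, attention):
    """Element-wise fusion: keep features mostly where motion was detected."""
    return [[f * a for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(features, attention)]

# Two tiny grayscale frames in [0, 1]; only the top-right pixel changes.
frame_t0 = [[0.2, 0.2], [0.2, 0.2]]
frame_t1 = [[0.2, 0.9], [0.2, 0.2]]
attention = motion_attention(frame_t0, frame_t1)
fused = fuse([[1.0, 1.0], [1.0, 1.0]], attention)
```

In this toy, the pixel that changed between frames receives attention close to 1, while static pixels are damped, which is the qualitative effect the abstract describes.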
# 擬1次元量子ビットアレイを用いたシュウィンガーモデルダイナミクスのシミュレーション Simulating Schwinger model dynamics with quasi-one-dimensional qubit arrays ( http://arxiv.org/abs/2409.14544v1 ) ライセンス: Link先を確認	Alessio Lerose,	(参考訳) シュウィンガーモデルのリアルタイム力学は、クォーク閉じ込めを平衡から効果的に記述し、素粒子物理事象発生器のハドロン化過程をモデル化するために日常的に用いられる。このような非摂動過程のアブ・イニシアトシミュレーションは、既存の計算ツールの到達範囲をはるかに超えており、現在まで量子シミュレーターのオープンクエストとして傑出した存在である。本研究では、中性原子や超伝導量子ビットアレイなどの合成量子スピン格子上でシュウィンガーモデルダイナミクスを実行するための一般的な戦略を開発する。我々の構造は、モデルの制約されたフェルミオンおよびボゾン自由度を磁気界面の幾何学的形状にエンコードする。我々は、大域磁場パターンが格子シュヴィンガー・ハミルトニアンと同等の界面のコヒーレント量子力学を駆動できることを示した。実時間ウェーブパケット衝突と文字列フラグメンテーション過程を精度$\epsilon$の連続体-理論極限でシミュレートするために必要となる最適配列は、多項式長とポリ対数幅が$\epsilon^{-1}$の擬一次元リボンである、という厳密な証明を行う。最終的に、最先端の2種のRydberg原子配列を用いて、具体的な有利な実装について議論する。この研究は、短期量子シミュレーターが素粒子物理学に即時関係する問題に対処する道を開く。 Real-time dynamics of the Schwinger model provide an effective description of quark confinement out of equilibrium, routinely employed to model hadronization processes in particle-physics event generators. Ab-initio simulations of such non-perturbative processes are far beyond the reach of existing computational tools, and remain an outstanding open quest for quantum simulators to date. In this work we develop a general strategy to run Schwinger model dynamics on synthetic quantum spin lattices, such as neutral-atom or superconducting-qubit arrays. Our construction encodes the constrained fermionic and bosonic degrees of freedom of the model into the geometric shape of a magnetic interface. We show that global magnetic field patterns can drive coherent quantum dynamics of the interface equivalent to the lattice Schwinger Hamiltonian. We rigorously establish that the optimal array required for simulating real-time wave packet collisions and string fragmentation processes with accuracy $\epsilon$ in the continuum field-theory limit, is a quasi-one-dimensional ribbon with polynomial length and polylogarithmic width in $\epsilon^{-1}$. 
We finally discuss a concrete advantageous implementation using a state-of-the-art dual-species Rydberg atom array. This work opens up a path for near-term quantum simulators to address questions of immediate relevance to particle physics.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
# ニューラルODEにおける適応的フィードフォワード勾配推定 Adaptive Feedforward Gradient Estimation in Neural ODEs ( http://arxiv.org/abs/2409.14549v1 ) ライセンス: Link先を確認	Jaouad Dabounou,	(参考訳) ニューラル常微分方程式(Neural Ordinary Differential Equations, Neural ODE)は、機械学習と数世紀にわたって様々な数学分野で発展したリッチな理論フレームワークの間のギャップを埋めることを約束する、ディープラーニングにおける重要なブレークスルーである。本研究では,適応フィードフォワード勾配推定を利用してニューラルODEの効率,一貫性,解釈性を向上させる手法を提案する。提案手法では,バックプロパゲーションと随伴法を不要にし,精度を維持しつつ計算オーバーヘッドとメモリ使用量を削減する。提案手法は実用的応用によって検証され,最先端のニューラルODE手法と比較して良好な性能を示した。 Neural Ordinary Differential Equations (Neural ODEs) represent a significant breakthrough in deep learning, promising to bridge the gap between machine learning and the rich theoretical frameworks developed in various mathematical fields over centuries. In this work, we propose a novel approach that leverages adaptive feedforward gradient estimation to improve the efficiency, consistency, and interpretability of Neural ODEs. Our method eliminates the need for backpropagation and the adjoint method, reducing computational overhead and memory usage while maintaining accuracy. The proposed approach has been validated through practical applications, and showed good performance relative to state-of-the-art Neural ODE methods.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
# GlamTry: ハイエンドアクセサリーのための仮想トライオンの強化 GlamTry: Advancing Virtual Try-On for High-End Accessories ( http://arxiv.org/abs/2409.14553v1 ) ライセンス: Link先を確認	Ting-Yu Chang, Seretsi Khabane Lekena,	(参考訳) 本論文は,ジュエリーや時計などのアクセサリーのフォトリアリスティックな仮想試行モデルが欠如していることに対処することを目的としている。既存の仮想試用モデルは、主に衣料品に焦点を当てているが、アクセサリーの市場はギャップがある。本研究は,VITON-HDなどの衣服用2次元仮想試着モデルの応用について検討し,他のコンピュータビジョンモデル,特にMediaPipe Hand Landmarkerと統合する。既存の文献に基づいて、この研究はアクセサリー固有のデータとネットワークアーキテクチャの変更を使用してユニークなモデルをカスタマイズし、トレーニングし、仮想トライオン技術をアクセサリーに拡張する可能性を評価する。その結果、小さなデータセットであっても、衣服の原型モデルと比較して位置予測が改善された。これは、このモデルの可能性を1万枚を超える大きなデータセットで示し、仮想アクセサリートライオンアプリケーションにおける将来の研究の道を開くものだ。 The paper aims to address the lack of photorealistic virtual try-on models for accessories such as jewelry and watches, which are particularly relevant for online retail applications. While existing virtual try-on models focus primarily on clothing items, there is a gap in the market for accessories. This research explores the application of techniques from 2D virtual try-on models for clothing, such as VITON-HD, and integrates them with other computer vision models, notably MediaPipe Hand Landmarker. Drawing on existing literature, the study customizes and retrains a unique model using accessory-specific data and network architecture modifications to assess the feasibility of extending virtual try-on technology to accessories. Results demonstrate improved location prediction compared to the original model for clothes, even with a small dataset. This underscores the model's potential with larger datasets exceeding 10,000 images, paving the way for future research in virtual accessory try-on applications.	翻訳日:2024-11-06 22:19:40 公開日:2024-09-22
# ランダム密度行列に対する2乗ヘリンガー距離の厳密平均と分散 Exact mean and variance of the squared Hellinger distance for random density matrices ( http://arxiv.org/abs/2409.14560v1 ) ライセンス: Link先を確認	Vinay Kumar, Kaushik Vasan, Santosh Kumar,	(参考訳) 量子状態間のヘリンガー距離は、リーマン的性質と単調性で知られる、量子情報理論における重要な測度である。また、これらの性質を共有する別の測度であるBures距離よりも計算が容易である。本研究では,一方または両方がランダムである密度行列の対の間のヘリンガー距離の平均と分散を導出する。その過程で、平均アフィニティと平均二乗アフィニティについても厳密な結果が得られる。ヘリンガー距離の最初の2つのキュムラントにより、ガンマ分布に基づいて対応する確率密度関数の近似を提案できる。解析結果はモンテカルロシミュレーションによって裏付けられ,非常に良い一致を示した。 The Hellinger distance between quantum states is a significant measure in quantum information theory, known for its Riemannian and monotonic properties. It is also easier to compute than the Bures distance, another measure that shares these properties. In this work, we derive the mean and variance of the Hellinger distance between pairs of density matrices, where one or both matrices are random. Along the way, we also obtain exact results for the mean affinity and mean square affinity. The first two cumulants of the Hellinger distance allow us to propose an approximation for the corresponding probability density function based on the gamma distribution. Our analytical results are corroborated through Monte Carlo simulations, showing excellent agreement.	翻訳日:2024-11-06 22:08:18 publication placeholder
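To make the quantities in the Hellinger-distance entry above concrete, here is a small sketch that evaluates the affinity and the squared Hellinger distance for a pair of states. It assumes one common convention, affinity A = tr(sqrt(rho) sqrt(sigma)) and squared distance 2 - 2A; the abstract does not state the normalization, so treat that as an assumption. To stay dependency-free, the sketch is limited to real symmetric 2x2 density matrices, where the matrix square root has a closed form.

```python
import math

def sqrt_psd_2x2(m):
    """Principal square root of a real symmetric positive semidefinite 2x2 matrix.

    Uses the closed form sqrt(M) = (M + s*I) / t with s = sqrt(det M)
    and t = sqrt(tr M + 2*s), which is valid whenever t > 0.
    """
    (a, b), (c, d) = m
    s = math.sqrt(max(a * d - b * c, 0.0))
    t = math.sqrt(a + d + 2.0 * s)
    return [[(a + s) / t, b / t], [c / t, (d + s) / t]]

def affinity(rho, sigma):
    """Affinity A = tr(sqrt(rho) sqrt(sigma))."""
    r, s = sqrt_psd_2x2(rho), sqrt_psd_2x2(sigma)
    # Trace of the product of two 2x2 matrices.
    return (r[0][0] * s[0][0] + r[0][1] * s[1][0]
            + r[1][0] * s[0][1] + r[1][1] * s[1][1])

def squared_hellinger(rho, sigma):
    """Squared Hellinger distance under the assumed convention 2 - 2A."""
    return 2.0 - 2.0 * affinity(rho, sigma)

ket0 = [[1.0, 0.0], [0.0, 0.0]]    # pure state |0><0|
ket1 = [[0.0, 0.0], [0.0, 1.0]]    # pure state |1><1|
mixed = [[0.5, 0.0], [0.0, 0.5]]   # maximally mixed state
```

Under this convention the distance vanishes for identical states and reaches its maximum of 2 for orthogonal pure states; averaging `squared_hellinger` over sampled random states would be the Monte Carlo counterpart of the exact mean derived in the paper.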
# 脳波信号から筋肉アーチファクトを除去するためのEMDエンコーダ Encoder with the Empirical Mode Decomposition (EMD) to remove muscle artefacts from EEG signal ( http://arxiv.org/abs/2409.14571v1 ) ライセンス: Link先を確認	Ildar Rakhmatulin,	(参考訳) 本稿では,経験的モード分解(EMD)法と機械学習アーキテクチャを組み合わせることで,脳波信号からアーティファクトを効果的に除去する手法を提案する。提案手法は, 上側および下側の包絡線の補間によりEMD法を改良し, 既存のアーティファクト除去技術の限界に対処するものである。従来のアーティファクト除去法では、EMD技術が一般的である。しかし、課題は信号の欠落した成分を正確に補間しつつ、固有の周波数成分を保存することである。この制限を克服するために、我々は、データを直接操作することなく補間処理を慎重に処理できる機械学習技術を導入した。我々のアプローチの主な利点は、アーティファクト除去時の脳波信号の自然特性の保存である。機械学習を補間に利用することにより、EMD法で得られた平均成分が元の信号の重要な周波数成分を保持することを保証する。この保存は、脳波データの完全性と忠実性を維持するために不可欠であり、正確な分析と解釈を可能にする。評価の結果は本手法の有効性を裏付けるものであり,脳波信号処理と解析のさらなる進歩への道を開く。 This paper introduces a novel method for effectively removing artifacts from EEG signals by combining the Empirical Mode Decomposition (EMD) method with a machine learning architecture. The proposed method addresses the limitations of existing artifact removal techniques by enhancing the EMD method through interpolation of the upper and lower envelopes. The EMD technique is commonly employed in conventional artifact removal methods. However, the challenge lies in accurately interpolating the missing components of the signal while preserving its inherent frequency components. To overcome this limitation, we incorporate a machine learning technique, which enables us to carefully handle the interpolation process without directly manipulating the data. The key advantage of our approach lies in the preservation of the natural characteristics of the EEG signal during artifact removal. By utilizing machine learning for interpolation, we ensure that the average component obtained through the EMD method retains the crucial frequency components of the original signal. This preservation is essential for maintaining the integrity and fidelity of the EEG data, allowing for accurate analysis and interpretation. 
The results obtained from our evaluation serve to validate the effectiveness of our approach and pave the way for further advancements in EEG signal processing and analysis.	翻訳日:2024-11-06 22:08:18 公開日:2024-09-22
# 材料科学Q&Aおよび特性予測におけるLLMの性能とロバスト性の評価 Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions ( http://arxiv.org/abs/2409.14572v1 ) ライセンス: Link先を確認	Hongchen Wang, Kangming Li, Scott Ramsay, Yao Fehlis, Edward Kim, Jason Hattrick-Simpers,	(参考訳) 大規模言語モデル(LLM)は科学的研究に革命をもたらす可能性があるが、ドメイン固有のアプリケーションにおける堅牢性と信頼性はいまだ十分に検討されていない。本研究では, 材料科学分野におけるLLMの総合的評価とロバスト性解析を行い, ドメイン固有質問応答と材料特性予測に着目した。この研究には3つの異なるデータセットが使われている。 1)学部レベルの材料科学講座からの多肢選択問題、 2)各種鋼の組成及び降伏強度を含むデータセット、及び 3)材料結晶構造とバンドギャップ値のテキスト記述を含むバンドギャップデータセット。 LLMのパフォーマンスは、ゼロショットのChain-of-Thought、エキスパートプロンプト、少数ショットインコンテキスト学習など、さまざまなプロンプト戦略を用いて評価される。これらのモデルのロバスト性は、現実的な乱れから意図的に敵対的な操作に至るまで、様々な種類の「ノイズ」に対してテストされ、現実の条件下でのそれらの弾力性と信頼性を評価する。さらに, 訓練/テストのミスマッチによる性能向上や, プロンプト例の近さを変えた場合のモード崩壊挙動など, 予測作業中のLLMの特異な現象を明らかにする。本研究の目的は, LLMを材料科学に広く活用するための情報的懐疑論を提供することと, その堅牢性と信頼性を高めるための進歩を促すことにある。 Large Language Models (LLMs) have the potential to revolutionize scientific research, yet their robustness and reliability in domain-specific applications remain insufficiently explored. This study conducts a comprehensive evaluation and robustness analysis of LLMs within the field of materials science, focusing on domain-specific question answering and materials property prediction. Three distinct datasets are used in this study: 1) a set of multiple-choice questions from undergraduate-level materials science courses, 2) a dataset including various steel compositions and yield strengths, and 3) a band gap dataset, containing textual descriptions of material crystal structures and band gap values. The performance of LLMs is assessed using various prompting strategies, including zero-shot chain-of-thought, expert prompting, and few-shot in-context learning. The robustness of these models is tested against various forms of 'noise', ranging from realistic disturbances to intentionally adversarial manipulations, to evaluate their resilience and reliability under real-world conditions. 
Additionally, the study uncovers unique phenomena of LLMs during predictive tasks, such as mode collapse behavior when the proximity of prompt examples is altered and performance enhancement from train/test mismatch. The findings aim to provide informed skepticism for the broad use of LLMs in materials science and to inspire advancements that enhance their robustness and reliability for practical applications.	翻訳日:2024-11-06 22:08:18 公開日:2024-09-22
# リチウムイオン電池の健康状態評価のためのドメイン知識誘導機械学習フレームワーク Domain knowledge-guided machine learning framework for state of health estimation in Lithium-ion batteries ( http://arxiv.org/abs/2409.14575v1 ) ライセンス: Link先を確認	Andrea Lanubile, Pietro Bosoni, Gabriele Pozzato, Anirudh Allam, Matteo Acquarone, Simona Onori,	(参考訳) 電気自動車のバッテリ管理には,バッテリの健康状態の正確な推定が不可欠である。そこで本研究では,現実世界の電気自動車運転からオンラインで抽出できる5つの健康指標を提案し,健康状態を推定する機械学習手法を開発した。提案するインジケータは,電池のエネルギーと電力の劣化に関する物理的知見を提供し,一部欠落したデータであっても正確なキャパシティ推定を可能にする。さらに、充電プロファイルの一部や実世界の走行時の放電条件に対して計算でき、リアルタイムのバッテリー劣化推定を容易にする。この指標は、電気自動車の条件下で劣化させた5つのセルの実験データを用いて算出され、健康状態を推定するために線形回帰モデルが用いられる。その結果、電力自己相関とエネルギーベースの特徴で訓練されたモデルでは、最大絶対パーセンテージ誤差が1.5%から2.5%の範囲でキャパシティ推定が達成された。 Accurate estimation of battery state of health is crucial for effective electric vehicle battery management. Here, we propose five health indicators that can be extracted online from real-world electric vehicle operation and develop a machine learning-based method to estimate the battery state of health. The proposed indicators provide physical insights into the energy and power fade of the battery and enable accurate capacity estimation even with partially missing data. Moreover, they can be computed for portions of the charging profile and real-world driving discharging conditions, facilitating real-time battery degradation estimation. The indicators are computed using experimental data from five cells aged under electric vehicle conditions, and a linear regression model is used to estimate the state of health. The results show that models trained with power autocorrelation and energy-based features achieve capacity estimation with maximum absolute percentage error within 1.5% to 2.5%.	翻訳日:2024-11-06 22:08:18 公開日:2024-09-22
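The battery entry above pairs a linear regression model with a maximum absolute percentage error metric. The sketch below shows that pairing on a single feature. It is only an illustration under assumptions: the variable `indicator` is a hypothetical stand-in for one of the paper's health indicators, and all numbers are made up for the example rather than taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def max_abs_percentage_error(y_true, y_pred):
    """Largest |true - predicted| / |true|, expressed in percent."""
    return max(abs(t - p) / abs(t) * 100.0 for t, p in zip(y_true, y_pred))

# Made-up illustrative numbers, not data from the paper.
indicator = [0.0, 0.1, 0.2, 0.3]     # hypothetical health indicator values
capacity = [1.00, 0.95, 0.90, 0.85]  # normalized capacity at those values
intercept, slope = fit_line(indicator, capacity)

# Evaluate on one held-out (again made-up) measurement.
held_out_indicator, held_out_capacity = [0.4], [0.81]
predicted = [intercept + slope * x for x in held_out_indicator]
error = max_abs_percentage_error(held_out_capacity, predicted)
```

A real pipeline would use several indicators at once and cross-validate across cells; the single-feature fit is kept here only to show how the reported error metric is computed.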
# ARオーバーレイ: 曲面の合成による画像空間の推定 AR Overlay: Training Image Pose Estimation on Curved Surface in a Synthetic Way ( http://arxiv.org/abs/2409.14577v1 ) ライセンス: Link先を確認	Sining Huang, Yukun Song, Yixiao Kang, Chang Yu,	(参考訳) 空間コンピューティングの分野において、最も重要なタスクの1つは、3Dオブジェクトのポーズ推定である。任意の3Dオブジェクトの剛性変換は、照明不足や閉塞といった要因を取り入れた様々な環境のため、比較的検出が難しいが、事前に定義された形状のオブジェクトは、幾何学的制約を利用して、追跡が容易であることが多い。曲がりくねった画像は、フレキシブルな寸法だが狭い形状であり、しばしば3Dトラッキングの標的となる。伝統的に、プロプライエタリなアルゴリズムは、単一の画像ターゲットのポーズ推定を可能にするために、入力と元の平坦な画像と共に、特定の曲率測定を必要とすることが多い。本稿では,複数のロゴイメージを同時に検出できるパイプラインを提案し,入力として元の画像のみを必要とする。 In the field of spatial computing, one of the most essential tasks is the pose estimation of 3D objects. While rigid transformations of arbitrary 3D objects are relatively hard to detect due to varying environment introducing factors like insufficient lighting or even occlusion, objects with pre-defined shapes are often easy to track, leveraging geometric constraints. Curved images, with flexible dimensions but a confined shape, are essential shapes often targeted in 3D tracking. Traditionally, proprietary algorithms often require specific curvature measures as the input along with the original flattened images to enable pose estimation for a single image target. In this paper, we propose a pipeline that can detect several logo images simultaneously and only requires the original images as the input, unlocking more effects in downstream fields such as Augmented Reality (AR).	翻訳日:2024-11-06 22:08:18 公開日:2024-09-22
# X型 -- Twitter Sphereのセマンティックスをマッピングする The X Types -- Mapping the Semantics of the Twitter Sphere ( http://arxiv.org/abs/2409.14584v1 ) ライセンス: Link先を確認	Ogen Schlachet Drukerman, Einat Minkov,	(参考訳) ソーシャルネットワークは、影響力のあるエンティティが人気のあるアカウントに対応する世界知識の貴重な情報源を形成する。意味オントロジーを保持する事実知識ベース(KB)とは異なり、構造化された意味情報はソーシャルメディアでは利用できない。本研究では、関心対象のエンティティを表す約20万の人気TwitterアカウントからなるソーシャルKBについて検討する。これらのエンティティに関するセマンティック情報を抽出する。特に、特定のエンティティアカウントが政治家または音楽アーティストのものであるかどうかを判断するなど、136のセマンティックタイプからなるきめ細かいセットを関連付ける。 Twitter には明示的な型情報がないため,DBpedia および Wikidata の KB と整合して,アカウントのサブセットのセマンティックラベルを取得する。ラベル付きデータセットが与えられたら、トランスフォーマーベースのテキストエンコーダを微調整して、アカウントの内容に基づいてエンティティのセマンティック埋め込みを生成する。次に、このエビデンスとネットワークベースの埋め込みを併用して、エンティティのセマンティックタイプを予測する。実験ではラベル付きデータセット上で高い型予測性能を示す。その結果,ソーシャルKBのすべてのエンティティアカウントに型分類モデルを適用した。この結果から,Twitterのグローバルセマンティクスに関する知見が得られた。本研究で生成したソーシャルエンティティのセマンティックな型情報とセマンティックな埋め込みの恩恵を受けるべき下流アプリケーションについて論じる。特に,この情報を用いたエンティティ類似度評価の重要課題における性能向上を示す。 Social networks form a valuable source of world knowledge, where influential entities correspond to popular accounts. Unlike factual knowledge bases (KBs), which maintain a semantic ontology, structured semantic information is not available on social media. In this work, we consider a social KB of roughly 200K popular Twitter accounts, which denote entities of interest. We elicit semantic information about those entities. In particular, we associate them with a fine-grained set of 136 semantic types, e.g., determine whether a given entity account belongs to a politician, or a musical artist. In the absence of explicit type information in Twitter, we obtain semantic labels for a subset of the accounts via alignment with the KBs of DBpedia and Wikidata. Given the labeled dataset, we finetune a transformer-based text encoder to generate semantic embeddings of the entities based on the contents of their accounts. We then exploit this evidence alongside network-based embeddings to predict the entities' semantic types. 
In our experiments, we show high type prediction performance on the labeled dataset. Consequently, we apply our type classification model to all of the entity accounts in the social KB. Our analysis of the results offers insights about the global semantics of the Twitter sphere. We discuss downstream applications that should benefit from semantic type information and the semantic embeddings of social entities generated in this work. In particular, we demonstrate enhanced performance on the key task of entity similarity assessment using this information.	翻訳日:2024-11-06 22:08:18 公開日:2024-09-22
# Fokker-Planck方程式とディープスプリッティングに基づくベイズフィルタ問題の収束スキーム A convergent scheme for the Bayesian filtering problem based on the Fokker–Planck equation and deep splitting ( http://arxiv.org/abs/2409.14585v1 ) ライセンス: Link先を確認	Kasper Bågmark, Adam Andersson, Stig Larsson, Filip Rydin,	(参考訳) 非線形フィルタリング密度を近似する数値スキームを導入し、その収束速度を放物型Hörmander条件の下で理論的に確立し、2つの例で経験的に検証する。予測ステップでは、離散時刻に得られる雑音を含む部分的な観測の間で、このスキームはフォッカー・プランク方程式を深い分割スキームで近似し、ベイズの公式を通じて正確な更新を行う。その結果、訓練後の新しい観測シーケンスに対してオンラインで動作する、古典的な予測・更新型のフィルタリングアルゴリズムが得られる。このアルゴリズムは、次元の呪いを軽減するために設計されたサンプリングベースのFeynman-Kacアプローチを採用している。我々の収束証明は、マリアヴァン(Malliavin)の部分積分公式に依存している。系として、フィルタ問題から切り離されたフォッカー・プランク方程式のみの近似の収束率を得る。 A numerical scheme for approximating the nonlinear filtering density is introduced and its convergence rate is established, theoretically under a parabolic Hörmander condition, and empirically for two examples. For the prediction step, between the noisy and partial measurements at discrete times, the scheme approximates the Fokker–Planck equation with a deep splitting scheme, and performs an exact update through Bayes' formula. This results in a classical prediction-update filtering algorithm that operates online for new observation sequences post-training. The algorithm employs a sampling-based Feynman–Kac approach, designed to mitigate the curse of dimensionality. Our convergence proof relies on the Malliavin integration-by-parts formula. As a corollary we obtain the convergence rate for the approximation of the Fokker–Planck equation alone, disconnected from the filtering problem.	翻訳日:2024-11-06 22:08:18 公開日:2024-09-22
# バックトラッキングはジェネレーションセーフティを改善する Backtracking Improves Generation Safety ( http://arxiv.org/abs/2409.14586v1 ) ライセンス: Link先を確認	Yiming Zhang, Jianfeng Chi, Hailey Nguyen, Kartikeya Upasani, Daniel M. Bikel, Jason Weston, Eric Michael Smith,	(参考訳) テキスト生成は定義上、基本的な制限がある: 明らかに問題のある場合でも、生成されたトークンを取り返すことはできない。言語モデルの安全性の文脈では、部分的な安全でない生成が生成されると、言語モデルは、同様に安全でない追加のテキストを生成し続ける傾向にある。実際これは、フロンティアモデルの安全性向上に多大な努力を払っているにもかかわらず、フロンティアモデルの安全性の整合性が野生で回避される方法である。安全アライメントを予防(有害な応答の確率の低下)としてアプローチするパラダイムから脱却し,特別な[RESET]トークンを導入することで,言語モデルが自身の安全でない生成を「取り消し(undo)」て回復することを可能にするバックトラック手法を提案する。本手法は, 有用性と無害性を最適化するために, SFT あるいは DPO トレーニングに組み込むことができる。バックトラックを学習したモデルはベースラインモデルよりも一貫して安全であり,バックトラックを行うLlama-3-8Bは,有用性を損なうことなくベースラインモデルよりも4倍安全である (6.1% → 1.5%)。また,そのための訓練を受けていないにもかかわらず,適応攻撃を含む4つの敵攻撃に対する防御も提供する。 Text generation has a fundamental limitation almost by definition: there is no taking back tokens that have been generated, even when they are clearly problematic. In the context of language model safety, when a partial unsafe generation is produced, language models by their nature tend to happily keep on generating similarly unsafe additional text. This is in fact how safety alignment of frontier models gets circumvented in the wild, despite great efforts in improving their safety. Deviating from the paradigm of approaching safety alignment as prevention (decreasing the probability of harmful responses), we propose backtracking, a technique that allows language models to "undo" and recover from their own unsafe generation through the introduction of a special [RESET] token. Our method can be incorporated into either SFT or DPO training to optimize helpfulness and harmlessness. We show that models trained to backtrack are consistently safer than baseline models: backtracking Llama-3-8B is four times safer than the baseline model (6.1% → 1.5%) in our evaluations without regression in helpfulness. Our method additionally provides protection against four adversarial attacks including an adaptive attack, despite not being trained to do so.	翻訳日:2024-11-06 22:08:18 公開日:2024-09-22
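The backtracking entry above hinges on a special [RESET] token that lets a model discard its own partial generation. A minimal sketch of how a decoded token stream might be post-processed follows. The function name and the rule "keep only what follows the last [RESET]" are assumptions made for illustration; the abstract does not specify the decoding details.

```python
RESET = "[RESET]"

def apply_backtracking(tokens):
    """Return the tokens that survive backtracking.

    Everything up to and including the last [RESET] is discarded,
    so only the generation after the final reset is kept.
    """
    if RESET not in tokens:
        return list(tokens)
    last_reset = len(tokens) - 1 - tokens[::-1].index(RESET)
    return tokens[last_reset + 1:]

generated = ["Sure", ",", "here", RESET, "Sorry", ",", "I", "cannot", "help"]
visible = apply_backtracking(generated)
```

Here `visible` contains only the tokens after the reset, so the user never sees the abandoned prefix; a generation without any [RESET] passes through unchanged.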
# URSimulator:拡散モデルによる仮想都市再生のための人間知覚駆動型プロンプトチューニング URSimulator: Human-Perception-Driven Prompt Tuning for Enhanced Virtual Urban Renewal via Diffusion Models ( http://arxiv.org/abs/2409.14589v1 ) ライセンス: Link先を確認	Chuanbo Hu, Shan Jia, Xin Li,	(参考訳) 都市身体障害(放棄された建物、ゴミ、乱雑な植生、落書きなど)に取り組むことは、コミュニティの安全、幸福、心理的状態に悪影響を及ぼすため不可欠である。都市再生 (Urban Renewal) は、住民の身体環境と生活の質を改善するために、都市内のこれら無視され、崩壊した地域を再活性化する過程である。効果的な都市再生努力は、これらの環境を変革し、その魅力と自由性を高めることができる。しかし、現在の研究では、しばしば主観的な判断に依存して、更新作業の影響を定量的に評価し視覚化するシミュレーションツールが欠如している。このようなツールは、潜在的な変化とその影響を明確に可視化することによって、効果的な戦略を計画し、実行するために不可欠です。本稿では、人間の知覚フィードバックを用いて、街路環境の強化をシミュレートすることで、このギャップに対処する新しい枠組みを提案する。我々は、テキスト駆動の安定拡散と人間の知覚フィードバックを統合し、ストリートビュー画像の局所的な領域を反復的に編集し、美、活気、安全の知覚とよりよく一致させる即時チューニングアプローチを開発した。この枠組みは, 安全性17.60%, 美容31.15%, 生活環境28.82%の増加とともに, 都市環境の認識を著しく向上させることを示した。対照的に、DiffEditのような先進的な手法は、それぞれ2.31%、11.87%、および15.84%の改善しか達成していない。本研究では, この枠組みを, 地区改良, ビル再開発, 緑地拡張, コミュニティガーデン作成など, 様々な仮想シナリオに適用した。その結果, 都市再生をシミュレーションする上での有効性が示され, 都市計画や政策立案に貴重な知見が得られた。 Tackling Urban Physical Disorder (e.g., abandoned buildings, litter, messy vegetation, graffiti) is essential, as it negatively impacts the safety, well-being, and psychological state of communities. Urban Renewal is the process of revitalizing these neglected and decayed areas within a city to improve the physical environment and quality of life for residents. Effective urban renewal efforts can transform these environments, enhancing their appeal and livability. However, current research lacks simulation tools that can quantitatively assess and visualize the impacts of renewal efforts, often relying on subjective judgments. Such tools are crucial for planning and implementing effective strategies by providing a clear visualization of potential changes and their impacts. This paper presents a novel framework addressing this gap by using human perception feedback to simulate street environment enhancement. 
We develop a prompt tuning approach that integrates text-driven Stable Diffusion with human perception feedback, iteratively editing local areas of street view images to better align with perceptions of beauty, liveliness, and safety. Our experiments show that this framework significantly improves perceptions of urban environments, with increases of 17.60% in safety, 31.15% in beauty, and 28.82% in liveliness. In contrast, advanced methods like DiffEdit achieve only 2.31%, 11.87%, and 15.84% improvements, respectively. We applied this framework across various virtual scenarios, including neighborhood improvement, building redevelopment, green space expansion, and community garden creation. The results demonstrate its effectiveness in simulating urban renewal, offering valuable insights for urban planning and policy-making.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# 条件付き独立性による多項式遅延における隠れ変数をもつ因果関係モデルの検証 Testing Causal Models with Hidden Variables in Polynomial Delay via Conditional Independencies ( http://arxiv.org/abs/2409.14593v1 ) ライセンス: Link先を確認	Hyunchai Jeong, Adiba Ejaz, Jin Tian, Elias Bareinboim,	(参考訳) 観測データに対して仮説化された因果モデルをテストすることは、多くの因果推論タスクにとって重要な前提条件である。自然なアプローチは、モデルがデータに持つ条件付き独立関係(CI)をテストすることである。モデルは(変数の数に関して)指数関数的に多くのCIを仮定できるが、これらすべてをテストすることは非現実的であり、不要である。これらのCIを多項式空間にエンコードする因果グラフは、CIのかなり小さな部分集合でモデルテストを可能にする局所マルコフ特性をもたらす。ローカルプロパティに基づいたモデルテストでは、関連するCIをリストアップするアルゴリズムが必要である。しかし、隠れ変数や非パラメトリック分布を持つ現実的な設定のための既存のアルゴリズムは、ひとつのCI制約を発生させるのに指数関数的な時間を要する可能性がある。本稿では、隠れ変数を持つ因果グラフに対するc-component local Markov property (C-LMP)を提案する。 C-LMPは指数関数的なCIを起動できるため、多項式遅延アルゴリズムを開発し、これらのCIを多時間間隔でリストアップする。我々の知る限り、これは任意のデータ分布に対して隠れ変数を持つ因果グラフにおけるCIの多遅延テストを可能にする最初のアルゴリズムである。実世界のデータと合成データの実験は、我々のアルゴリズムの実用性を実証している。 Testing a hypothesized causal model against observational data is a key prerequisite for many causal inference tasks. A natural approach is to test whether the conditional independence relations (CIs) assumed in the model hold in the data. While a model can assume exponentially many CIs (with respect to the number of variables), testing all of them is both impractical and unnecessary. Causal graphs, which encode these CIs in polynomial space, give rise to local Markov properties that enable model testing with a significantly smaller subset of CIs. Model testing based on local properties requires an algorithm to list the relevant CIs. However, existing algorithms for realistic settings with hidden variables and non-parametric distributions can take exponential time to produce even a single CI constraint. In this paper, we introduce the c-component local Markov property (C-LMP) for causal graphs with hidden variables. Since C-LMP can still invoke an exponential number of CIs, we develop a polynomial delay algorithm to list these CIs in poly-time intervals. 
To our knowledge, this is the first algorithm that enables poly-delay testing of CIs in causal graphs with hidden variables against arbitrary data distributions. Experiments on real-world and synthetic data demonstrate the practicality of our algorithm.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# EchoAtt: より効率的な大規模言語モデルのためのテンプレート、コピー、調整 EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models ( http://arxiv.org/abs/2409.14595v1 ) ライセンス: Link先を確認	Hossein Rajabzadeh, Aref Jafari, Aman Sharma, Benyamin Jami, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh,	(参考訳) 大きな言語モデル(LLM)は、その深度とパラメータの数の増加とともに、様々な自然言語処理タスクにおいて優れたパフォーマンスを示している。しかし、このスケールの増大は、特に推論と微調整の間、計算要求の増大につながる。これらの課題に対処するために,レイヤ間の注目パターンの類似性を解析し活用することにより,トランスフォーマーベースのモデルの最適化を目的とした,新しいフレームワークであるEchoAttを紹介した。解析の結果, LLMの内部層, 特に大きな層は, 非常に類似した注意行列を示すことが明らかとなった。この類似性を活用することで、EchoAttは注意行列をあまり重要でない層で共有することができ、性能を損なうことなく計算要求を大幅に削減できる。本手法を知識蒸留システムに組み込むことにより,教師モデルがより小規模な学生モデルの訓練を指導する。学生モデルは、教師から重要なパラメータを継承しながら、高い類似性を持つ層に注意行列を選択的に共有する。 TinyLLaMA-1.1Bによる最良の結果は、EchoAttが推論速度を15倍改善し、トレーニング速度を25倍改善し、パラメータ数を約4倍削減し、ゼロショット性能を向上することを示した。これらの知見は,LLMの効率を高めるためにアテンションマトリックス共有の可能性を強調し,リアルタイムおよびリソース制限されたアプリケーションにおいてより実用的なものとなる。 Large Language Models (LLMs), with their increasing depth and number of parameters, have demonstrated outstanding performance across a variety of natural language processing tasks. However, this growth in scale leads to increased computational demands, particularly during inference and fine-tuning. To address these challenges, we introduce EchoAtt, a novel framework aimed at optimizing transformer-based models by analyzing and leveraging the similarity of attention patterns across layers. Our analysis reveals that many inner layers in LLMs, especially larger ones, exhibit highly similar attention matrices. By exploiting this similarity, EchoAtt enables the sharing of attention matrices in less critical layers, significantly reducing computational requirements without compromising performance. We incorporate this approach within a knowledge distillation setup, where a pre-trained teacher model guides the training of a smaller student model. The student model selectively shares attention matrices in layers with high similarity while inheriting key parameters from the teacher. 
Our best results with TinyLLaMA-1.1B demonstrate that EchoAtt improves inference speed by 15%, training speed by 25%, and reduces the number of parameters by approximately 4%, all while improving zero-shot performance. These findings highlight the potential of attention matrix sharing to enhance the efficiency of LLMs, making them more practical for real-time and resource-limited applications.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# DarkGram:Telegramチャンネルで共有されるサイバー犯罪コンテンツの探索と緩和 DarkGram: Exploring and Mitigating Cybercriminal content shared in Telegram channels ( http://arxiv.org/abs/2409.14596v1 ) ライセンス: Link先を確認	Sayak Saha Roy, Elham Pourabbas Vafa, Kobra Khanmohammadi, Shirin Nilizadeh,	(参考訳) 2024年2月から5月にかけてTelegramで339のサイバー犯罪活動チャネル(CAC)の大規模分析を行った。合計で2380万人を超えるユーザーを擁するこれらのチャンネルは、侵入された資格情報、海賊版ソフトウェアとメディア、マルウェア、ソーシャルエンジニアリング詐欺、エクスプロイトキットなどのブラックハットハッキングリソースのためのツールを含む、幅広い不正コンテンツを共有した。 BERTベースのフレームワークであるDarkGramを開発し、CACからの悪意のある投稿を96%の精度で識別し、これらのチャンネルから53,605件の投稿を定量的に分析し、共有コンテンツの鍵となる特徴を明らかにした。これらのコンテンツの多くは無料で配布されているが、チャンネル管理者はユーザーをエンゲージメントし、プレミアムなサイバー犯罪コンテンツの販売を促進するためにプロモーションやオファーを頻繁に採用している。これらのチャンネルは、自身の購読者にも重大なリスクをもたらす。特に、共有リンクの28.1%はフィッシング攻撃を含んでおり、実行ファイルの38%はマルウェアにバンドルされている。さらに,CACにおける回答の質的分析は,非合法なコンテンツに対する要求や,不正な知識共有,コラボレーティブなハッキング活動を通じて,コミュニティの危険な感覚をいかに育むかを示し,絵文字応答を含む投稿に対する反応は,その内容に対する認識をさらに強調する。また、CACは、サブスクライバの損失を最小限に抑えた新しいチャネルに素早く移行することで、監視を回避することができ、このエコシステムのレジリエンスを浮き彫りにしている。これに対抗するために、DarkGramを使用して新しいチャンネルを検出し、Telegramや影響を受けた組織に悪意のあるコンテンツを報告し、3ヶ月で196チャンネルが削除された。これらのチャネルをダウンさせるためのさらなる共同作業を支援するため、データセットとDarkGramフレームワークをオープンソースにしています。 We present the first large scale analysis of 339 cybercriminal activity channels (CACs) on Telegram from February to May 2024. Collectively followed by over 23.8 million users, these channels shared a wide array of illicit content, including compromised credentials, pirated software and media, tools for blackhat hacking resources such as malware, social engineering scams, and exploit kits. We developed DarkGram, a BERT based framework that identifies malicious posts from the CACs with an accuracy of 96%, using which we conducted a quantitative analysis of 53,605 posts from these channels, revealing key characteristics of shared content. While much of this content is distributed for free, channel administrators frequently employ promotions and giveaways to engage users and boost the sales of premium cybercriminal content. 
These channels also pose significant risks to their own subscribers. Notably, 28.1% of shared links contained phishing attacks, and 38% of executable files were bundled with malware. Moreover, our qualitative analysis of replies in CACs shows how subscribers cultivate a dangerous sense of community through requests for illegal content, illicit knowledge sharing, and collaborative hacking efforts, while their reactions to posts, including emoji responses, further underscore their appreciation for such content. We also find that the CACs can evade scrutiny by quickly migrating to new channels with minimal subscriber loss, highlighting the resilience of this ecosystem. To counteract this, we further utilized DarkGram to detect new channels, reporting malicious content to Telegram and the affected organizations which resulted in the takedown of 196 such channels over three months. To aid further collaborative efforts in taking down these channels, we open source our dataset and the DarkGram framework.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# 動的デカップリングを用いたクロストークによる量子攻撃の回避 Defending crosstalk-mediated quantum attacks using dynamical decoupling ( http://arxiv.org/abs/2409.14598v1 ) ライセンス: Link先を確認	Devika Mehra, Amir Kalev,	(参考訳) ここ数年、量子コンピューティングの分野はアルゴリズム開発において大きな進歩を遂げ、新たな高度に達している。上昇する研究分野と並行して、企業や研究所は、さまざまな実験の正確で迅速な結果を提供するために、フォールトトレラントな量子コンピュータの構築に積極的に取り組んでいる。量子コンピュータの需要の増大は、幅広いユーザーベースのためにマルチテナントを可能にするためにハードウェアの共有を必要とする。このアプローチは、限られた量子リソースの利用を最適化する一方で、潜在的なセキュリティ脆弱性も導入する。本稿では,このような脅威から正規回路を保護する対策として動的デカップリング(DD)について検討する。我々は,Groverの検索アルゴリズムに対するクロストークによる攻撃に焦点を当てた。他の対策と比較してDDは攻撃の軽減に成功し、場合によってはノンアタック以上の回路性能を向上させることができる。そこで本研究では,マルチテナンシ量子ハードウェア上でのアルゴリズム実行にDDを組み込むことの重要性を強調した。 In the past few years, the field of quantum computing has reached new heights with significant advancements in algorithm development. In parallel to rising research areas, companies and research labs are actively working to build fault-tolerant quantum computers which can help provide accurate and speedy results for the various experiments. The increasing demand for quantum computers necessitates the sharing of hardware to enable multi-tenancy for a broad user base. While this approach optimizes the utilization of limited quantum resources, it also introduces potential security vulnerabilities. In this paper we examine dynamical decoupling (DD) as a countermeasure to protect the legitimate circuit from such threats. We focus on crosstalk-mediated attacks on Grover's search algorithm. We find that, when compared to other countermeasures, DD successfully mitigates the attack and in some cases is able to improve the performance of the circuit beyond the level of no-attack. Thus our results emphasize the importance of incorporating DD into algorithm executions on multi-tenancy quantum hardware.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# 脳外科:概念消去による大規模言語モデルにおけるGDPRコンプライアンスの確保 Brain Surgery: Ensuring GDPR Compliance in Large Language Models via Concept Erasure ( http://arxiv.org/abs/2409.14603v1 ) ライセンス: Link先を確認	Michele Laurelli,	(参考訳) 大規模なAIシステムが普及するにつれて、GDPR(General Data Protection Regulation)のようなデータプライバシ規則の遵守が重要になっている。本稿では、リアルタイムプライバシ管理とターゲット未学習を可能にすることにより、すべてのローカルAIモデルGDPR対応化のための変革的手法であるBrain Surgeryを紹介する。 Embedding-Corrupted Prompts (ECO Prompts)、ブロックチェーンベースのプライバシ管理、プライバシを意識した継続的学習といった高度な技術に基づいて、Brain Surgeryは、さまざまなAIアーキテクチャにデプロイ可能なモジュラーソリューションを提供する。このツールは、プライバシ規則の遵守を保証するだけでなく、ユーザが自身のプライバシ制限を定義できるようにし、AI倫理とガバナンスの新しいパラダイムを作成する。 As large-scale AI systems proliferate, ensuring compliance with data privacy laws such as the General Data Protection Regulation (GDPR) has become critical. This paper introduces Brain Surgery, a transformative methodology for making every local AI model GDPR-ready by enabling real-time privacy management and targeted unlearning. Building on advanced techniques such as Embedding-Corrupted Prompts (ECO Prompts), blockchain-based privacy management, and privacy-aware continual learning, Brain Surgery provides a modular solution that can be deployed across various AI architectures. This tool not only ensures compliance with privacy regulations but also empowers users to define their own privacy limits, creating a new paradigm in AI ethics and governance.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# パッチランク付け: ローカルパッチのランク付けを学習した効率の良いCLIP Patch Ranking: Efficient CLIP by Learning to Rank Local Patches ( http://arxiv.org/abs/2409.14607v1 ) ライセンス: Link先を確認	Cheng-En Wu, Jinhong Lin, Yu Hen Hu, Pedro Morgado,	(参考訳) CLIPのような対照的な画像テキスト事前学習モデルは、下流タスクに顕著な適応性を示している。しかし、ViT(Vision Transformer)のバックボーンの計算能力が高いため、問題に直面している。 ViT効率を向上する現在の戦略はパッチトークンのプルーニングに重点を置いているが、CLIPのマルチモーダルな性質に対処し、最大パフォーマンスのためにトークンの最適なサブセットを特定することには不足している。そこで我々は「黄金ランキング」を確立するための欲求探索手法を提案し、このランキングを推定するために特別に訓練された軽量な予測器を導入する。トークンプルーニングによるパフォーマンス劣化を補うため、学習可能な視覚トークンを組み込み、モデルの性能を回復し、潜在的に向上させる。本研究は,CLIPモデルのViTバックボーン内でのプルーニングトークンの包括的,体系的な調査である。フレームワークを通じて、CLIPのViTのパッチトークンの40%を削減しましたが、7つのデータセットの平均精度損失は0.3でした。本研究は,より計算効率のよいマルチモーダルモデルを構築する上で,その性能を犠牲にすることなく基礎を築き,先進的な視覚言語モデルの適用における重要な課題に対処するものである。 Contrastive image-text pre-trained models such as CLIP have shown remarkable adaptability to downstream tasks. However, they face challenges due to the high computational requirements of the Vision Transformer (ViT) backbone. Current strategies to boost ViT efficiency focus on pruning patch tokens but fall short in addressing the multimodal nature of CLIP and identifying the optimal subset of tokens for maximum performance. To address this, we propose greedy search methods to establish a "Golden Ranking" and introduce a lightweight predictor specifically trained to approximate this Ranking. To compensate for any performance degradation resulting from token pruning, we incorporate learnable visual tokens that aid in restoring and potentially enhancing the model's performance. Our work presents a comprehensive and systematic investigation of pruning tokens within the ViT backbone of CLIP models. Through our framework, we successfully reduced 40% of patch tokens in CLIP's ViT while only suffering a minimal average accuracy loss of 0.3 across seven datasets. 
Our study lays the groundwork for building more computationally efficient multimodal models without sacrificing their performance, addressing a key challenge in the application of advanced vision-language models.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# Nirjas: ソースコードからメタデータを抽出するオープンソースフレームワーク Nirjas: An open source framework for extracting metadata from the source code ( http://arxiv.org/abs/2409.14609v1 ) ライセンス: Link先を確認	Ayush Bhardwaj, Sahil, Kaushlendra Pratap, Gaurav Mishra,	(参考訳) メタデータとコメントはどんなソフトウェア開発プロセスにおいても重要な要素です。本稿では,ソースコードのメタデータやコメントが,ソフトウェアを理解する上で重要な役割を担っているかを説明する。我々はPythonベースのオープンソースフレームワークであるNirjasを紹介し、構造化された方法でメタデータを抽出するのに役立つ。様々な構文、型、広く受け入れられている規約は、異なるプログラミング言語のソースファイルにコメントを追加するために存在する。エッジケースは抽出時にノイズを発生させ、メタデータを正確に取得するためにRegexを使用します。非Regex法は結果を与えるが、しばしば精度とノイズの分離を見逃す。 Nirjasはまた、異なるタイプのコメント、ソースコードを分離し、行番号、ファイル名、使用される言語、合計SLOCなど、それらのコメントの詳細を提供する。 NirjasはスタンドアロンのPythonフレームワーク/ライブラリで、ソースまたはpip(Pythonパッケージインストーラ)経由で簡単にインストールできる。 Nirjasは最初、Google Summer of Codeプロジェクトの一部として開発され、現在はFOSSology組織の下で開発、維持されている。 Metadata and comments are critical elements of any software development process. In this paper, we explain how metadata and comments in source code can play an essential role in comprehending software. We introduce a Python-based open-source framework, Nirjas, which helps in extracting this metadata in a structured manner. Various syntaxes, types, and widely accepted conventions exist for adding comments in source files of different programming languages. Edge cases can create noise in extraction, for which we use Regex to accurately retrieve metadata. Non-Regex methods can give results but often miss accuracy and noise separation. Nirjas also separates different types of comments, source code, and provides details about those comments, such as line number, file name, language used, total SLOC, etc. Nirjas is a standalone Python framework/library and can be easily installed via source or pip (the Python package installer). Nirjas was initially created as part of a Google Summer of Code project and is currently developed and maintained under the FOSSology organization.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# エンジンバグの補修に関する実証的研究 An Empirical Study of Refactoring Engine Bugs ( http://arxiv.org/abs/2409.14610v1 ) ライセンス: Link先を確認	Haibo Wang, Zhuolin Xu, Huaien Zhang, Nikolaos Tsantalis, Shin Hwei Tan,	(参考訳) リファクタリングはソフトウェア開発において重要なプロセスであり、外部の振舞いを保ちながらコードの内部構造を改善することを目的としています。リファクタリングエンジンは、現代の統合開発環境(IDE)の不可欠なコンポーネントであり、このプロセスを自動化または半自動化することで、コードの可読性を高め、複雑さを減らし、ソフトウェア製品の保守性を向上させることができる。従来のソフトウェアシステムと同じように、リファクタリングエンジンは誤ったリファクタリングプログラムを生成し、予期せぬ振る舞いやクラッシュを引き起こします。本稿では,3つの一般的なリファクタリングエンジン(Eclipse,IntelliJ IDEA,Netbeans)で発生するバグを分析することで,エンジンのバグをリファクタリングする最初の体系的な研究について述べる。これらのバグは, リファクタリングタイプ, 症状, 根本原因, トリガー条件によって分析した。 12の結果を得て、バグ検出とデバッグのリファクタリングに関する一連の貴重なガイドラインを公開しました。さらに,これらのリファクタリングエンジンの最新バージョンでは130の新たなバグが報告された。私たちが提出した21のバグのうち、10のバグが開発者によって確認され、7つのバグはすでに修正されています。 Refactoring is a critical process in software development, aiming at improving the internal structure of code while preserving its external behavior. Refactoring engines are integral components of modern Integrated Development Environments (IDEs) and can automate or semi-automate this process to enhance code readability, reduce complexity, and improve the maintainability of software products. Like traditional software systems, refactoring engines can generate incorrect refactored programs, resulting in unexpected behaviors or even crashes. In this paper, we present the first systematic study of refactoring engine bugs by analyzing bugs arising in three popular refactoring engines (i.e., Eclipse, IntelliJ IDEA, and Netbeans). We analyzed these bugs according to their refactoring types, symptoms, root causes, and triggering conditions. We obtained 12 findings and provided a series of valuable guidelines for future work on refactoring bug detection and debugging. Furthermore, our transferability study revealed 130 new bugs in the latest version of those refactoring engines. Among the 21 bugs we submitted, 10 bugs are confirmed by their developers, and seven of them have already been fixed.	
翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# イベントベースビジョンのためのエッジインフォームドコントラスト最大化の秘密 Secrets of Edge-Informed Contrast Maximization for Event-Based Vision ( http://arxiv.org/abs/2409.14611v1 ) ライセンス: Link先を確認	Pritam P. Karmokar, Quan H. Nguyen, William J. Beksi,	(参考訳) イベントカメラは、高速非同期イベントの形で画像平面内の強度勾配(エッジ)の動きをキャプチャする。 2Dヒストグラムに蓄積すると、これらの事象は動き中のエッジのオーバーレイを描写し、結果として生成するエッジの空間構造を隠蔽する。コントラスト最大化(CM)は、この効果を逆転させ、イベントの運動軌跡を推定することにより、運動強度勾配に類似した鋭い空間構造を生成する最適化フレームワークである。それでも、CMはいまだ未調査の分野であり、改善の道もある。本稿では,CMをユニモーダル(イベントのみ)からバイモーダル(イベントとエッジ)に拡張する新しいハイブリッドアプローチを提案する。我々は、基準時間に応じて、最適に歪んだイベントは、その時点で動くエッジと整合した鋭い勾配を生み出すというアンダーピンニングの概念を活用する。具体的には、CMを支援するための相関に基づく目的を定式化し、マルチスケールおよびマルチ参照技術の導入に関する重要な洞察を提供する。さらに、エッジインフォームドCM法では、優れたシャープネススコアが得られ、MVSEC, DSEC, ECDデータセット上で、新しい最先端のイベント光フローベンチマークが確立される。 Event cameras capture the motion of intensity gradients (edges) in the image plane in the form of rapid asynchronous events. When accumulated in 2D histograms, these events depict overlays of the edges in motion, consequently obscuring the spatial structure of the generating edges. Contrast maximization (CM) is an optimization framework that can reverse this effect and produce sharp spatial structures that resemble the moving intensity gradients by estimating the motion trajectories of the events. Nonetheless, CM is still an underexplored area of research with avenues for improvement. In this paper, we propose a novel hybrid approach that extends CM from uni-modal (events only) to bi-modal (events and edges). We leverage the underpinning concept that, given a reference time, optimally warped events produce sharp gradients consistent with the moving edge at that time. Specifically, we formalize a correlation-based objective to aid CM and provide key insights into the incorporation of multiscale and multireference techniques. Moreover, our edge-informed CM method yields superior sharpness scores and establishes new state-of-the-art event optical flow benchmarks on the MVSEC, DSEC, and ECD datasets.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# Erbium-170ドープイットリウムオルトシリケートにおけるトリプライ共鳴マイクロ波と光変換 Triply Resonant Microwave to Optical Conversion in Erbium-170 Doped Yttrium Orthosilicate ( http://arxiv.org/abs/2409.14612v1 ) ライセンス: Link先を確認	Gavin G. G. King, Luke S. Trainor, Jevon J. Longdell,	(参考訳) ミリケルビン温度におけるFabry-Pérot共振器において, 等方的に精製したエルビウムドープイットリウムオルソシリケート中のマイクロ波-光変換を報告する。これは、高温で行った研究や、エルビウムのドーパントに対する天然同位体比から続く。これらの以前の調査では、高い効率性は中程度の強いマイクロ波パワーにのみ見られた。超微細構造を持ち、望ましくない背景光吸収を与える不必要なエルビウム-167を除去し、低い温度でこの問題を取り除いた。出力を計測できる最小の入力パワーに達すると、マイクロ波電力が減少するので、効率は依然として上昇している。 $2\times10^{-6}$の効率が観察され,光共振器の周波数安定性の向上やエルビウムスピンの熱処理の改善など,潜在的な改善が議論された。 We report microwave to optical upconversion in isotopically purified erbium-doped yttrium orthosilicate in a Fabry-Pérot resonator at millikelvin temperatures. This follows on from investigations made at higher temperatures and with natural isotopic ratios for the erbium dopants. In these previous investigations the highest efficiency was seen only for moderately strong microwave powers. The removal of the unwanted erbium-167 which has hyperfine structure and provides unwanted background optical absorption, and the lower temperatures has removed this problem. We now see efficiencies still increasing as the microwave power is decreased when we reach the smallest input powers for which we could measure an output. Efficiencies of $2\times10^{-6}$ were observed and we discuss potential improvements, including better optical cavity frequency stability and better thermalisation of the erbium spins.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# 高次元ランダム可逆回路の高速混合 Faster Mixing of Higher-Dimensional Random Reversible Circuits ( http://arxiv.org/abs/2409.14614v1 ) ライセンス: Link先を確認	William Gay, William He, Nicholas Kocurek,	(参考訳) 我々は、$\{\pm1\}^n$ の置換として、ランダム可逆回路の近似 $k$-wise 独立性の研究を継続する。我々の主な成果は、深さに依存するサブ線形-in-$n$のランダム可逆回路の自然なクラスを初めて構築することである。我々の構築は、実用的な暗号の考慮によって動機付けられており、DESやAESといった実用的なブロック暗号の設計に着想を得ている。 1次元格子上のゲートアーキテクチャで構築されたHeとO'Donnell [HO24] の以前の構成は、深さへの固有の線形-in-$n$依存に悩まされていた。我々の回路モデルの主な特徴は、高次元格子上に構築されたゲートアーキテクチャである。 We continue the study of the approximate $k$-wise independence of random reversible circuits as permutations of $\{\pm1\}^n$. Our main result is the first construction of a natural class of random reversible circuits with a sublinear-in-$n$ dependence on depth. Our construction is motivated by considerations in practical cryptography and is somewhat inspired by the design of practical block ciphers, such as DES and AES. Previous constructions of He and O'Donnell [HO24], which were built with gate architectures on one-dimensional lattices, suffered from an inherent linear-in-$n$ dependence on depth. The main novelty of our circuit model is a gate architecture built on higher-dimensional lattices.	翻訳日:2024-11-06 21:57:16 公開日:2024-09-22
# タンパク質-マンバ:タンパク質機能予測のための生物学的マンバモデル Protein-Mamba: Biological Mamba Models for Protein Function Prediction ( http://arxiv.org/abs/2409.14617v1 ) ライセンス: Link先を確認	Bohao Xu, Yingzhou Lu, Yoshitaka Inoue, Namkyeong Lee, Tianfan Fu, Jintai Chen,	(参考訳) タンパク質機能予測は、薬物発見において重要な課題であり、効果的で安全な治療薬の開発に大きな影響を及ぼす。従来の機械学習モデルは、タンパク質機能の予測に固有の複雑さと多様性に苦しむことが多く、より洗練されたアプローチを必要とする。本研究では,タンパク質機能予測を改善するために,自己教師付き学習と微調整の両方を活用する新しい2段階モデルであるProtein-Mambaを紹介する。事前トレーニングの段階では、モデルが大規模でラベル付けされていないデータセットから一般的な化学構造や関係をキャプチャし、微調整の段階では特定のラベル付きデータセットを使用してこれらの洞察を洗練し、予測性能が向上する。我々の広範囲な実験により、タンパク質-マンバは、さまざまなタンパク質機能データセットにまたがる最先端のいくつかの手法と比較して、競争力を発揮することが示された。このモデルがラベル付きデータとラベル付きデータの両方を効果的に活用する能力は、タンパク質機能予測の進歩における自己教師付き学習の可能性を強調し、薬物発見の今後の研究に有望な方向を提供する。 Protein function prediction is a pivotal task in drug discovery, significantly impacting the development of effective and safe therapeutics. Traditional machine learning models often struggle with the complexity and variability inherent in predicting protein functions, necessitating more sophisticated approaches. In this work, we introduce Protein-Mamba, a novel two-stage model that leverages both self-supervised learning and fine-tuning to improve protein function prediction. The pre-training stage allows the model to capture general chemical structures and relationships from large, unlabeled datasets, while the fine-tuning stage refines these insights using specific labeled datasets, resulting in superior prediction performance. Our extensive experiments demonstrate that Protein-Mamba achieves competitive performance, compared with a couple of state-of-the-art methods across a range of protein function datasets. This model's ability to effectively utilize both unlabeled and labeled data highlights the potential of self-supervised learning in advancing protein function prediction and offers a promising direction for future research in drug discovery.	翻訳日:2024-11-06 21:45:58 公開日:2024-09-22
# LazyからRichへ:Deep Linear Networksにおける実践的な学習ダイナミクス From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks ( http://arxiv.org/abs/2409.14623v1 ) ライセンス: Link先を確認	Clémentine C. J. Dominé, Nicolas Anguita, Alexandra M. Proca, Lukas Braun, Daniel Kunin, Pedro A. M. Mediano, Andrew M. Saxe,	(参考訳) 生物学的および人工ニューラルネットワークは、複雑なタスクを実行できる内部表現を開発する。人工ネットワークでは、これらのモデルの有効性は、データセット、アーキテクチャ、初期化戦略、最適化アルゴリズム間の相互作用に影響されるタスク固有表現を構築する能力に依存している。以前の研究では、異なる初期化によって、表現が静的な遅延状態にあるネットワークや、表現が動的に進化するリッチ/フィーチャーな学習体制のいずれかにネットワークを配置できることが強調されていた。本稿では,階層間の重みの相対スケールによって定義されるラムダ均衡初期化の正確な解を導出し,初期化が深層線形ニューラルネットワークの学習力学にどのように影響するかを検討する。これらの解は、豊かな状態から遅延状態までのスペクトルを横断する表現とニューラル・タンジェント・カーネルの進化を捉えている。本研究は, 重み初期化が学習体制に与える影響に関する理論的理解を深め, 継続学習, 逆学習, 転帰学習に影響を及ぼす。 Biological and artificial neural networks develop internal representations that enable them to perform complex tasks. In artificial networks, the effectiveness of these models relies on their ability to build task specific representation, a process influenced by interactions among datasets, architectures, initialization strategies, and optimization algorithms. Prior studies highlight that different initializations can place networks in either a lazy regime, where representations remain static, or a rich/feature learning regime, where representations evolve dynamically. Here, we examine how initialization influences learning dynamics in deep linear neural networks, deriving exact solutions for lambda-balanced initializations-defined by the relative scale of weights across layers. These solutions capture the evolution of representations and the Neural Tangent Kernel across the spectrum from the rich to the lazy regimes. Our findings deepen the theoretical understanding of the impact of weight initialization on learning regimes, with implications for continual learning, reversal learning, and transfer learning, relevant to both neuroscience and practical applications.	翻訳日:2024-11-06 21:45:58 公開日:2024-09-22
# 2Qubit Clifford群の部分群の分類 Classification of the Subgroups of the Two-Qubit Clifford Group ( http://arxiv.org/abs/2409.14624v1 ) ライセンス: Link先を確認	Eric Kubischta, Ian Teixeira,	(参考訳) 2量子パウリ群を含む2量子クリフォード群の56個の部分群の完全分類を行う。我々は,これらのグループに対して,量子情報コミュニティに慣れ親しんだゲートを用いたジェネレータを提供し,GAPが提供するグループライブラリに対してこれらのグループを参照する。また、2量子クリフォード階層の上位の群のいくつかの族を列挙する。 We perform a complete classification of all 56 subgroups of the two-qubit Clifford group containing the two-qubit Pauli group. We provide generators for these groups using gates familiar to the quantum information community and we reference these groups against the group libraries provided in GAP. We also list several families of groups in higher levels of the two-qubit Clifford hierarchy.	翻訳日:2024-11-06 21:45:58 公開日:2024-09-22
# SOS: オブジェクトプライオリティを用いたオープンワールドインスタンスセグメンテーションのためのセグメンテーションオブジェクトシステム SOS: Segment Object System for Open-World Instance Segmentation With Object Priors ( http://arxiv.org/abs/2409.14627v1 ) ライセンス: Link先を確認	Christian Wilms, Tim Rolff, Maris Hillemann, Robert Johanson, Simone Frintrop,	(参考訳) 本研究では,任意の未知のオブジェクトを画像に分割するタスクであるOpen-World Instance Segmentation (OWIS) を提案する。我々のセグメンテーションオブジェクトシステム(SOS)は、背景検出をしばしば生成する最先端システムの一般化能力と低い精度に明示的に対応している。この目的のために,基礎モデルSAMに基づいて高品質な擬似アノテーションを生成する。我々は、SAMのプロンプトを生成するために、様々なオブジェクト先行を徹底的に研究し、基礎モデルをオブジェクトに明示的にフォーカスする。最強対象は自己監督型視覚変換器の自己注意マップで, SAMの促進に利用した。最後に、SAMからの後処理セグメントは、標準的なインスタンスセグメンテーションシステムをトレーニングするために擬似アノテーションとして使用される。提案手法はCOCO, LVIS, ADE20kデータセットに対して強力な一般化能力を示し, 最先端技術と比較して81.6%の精度向上を実現している。ソースコードは、https://github.com/chwilms/SOSで入手できる。 We propose an approach for Open-World Instance Segmentation (OWIS), a task that aims to segment arbitrary unknown objects in images by generalizing from a limited set of annotated object classes during training. Our Segment Object System (SOS) explicitly addresses the generalization ability and the low precision of state-of-the-art systems, which often generate background detections. To this end, we generate high-quality pseudo annotations based on the foundation model SAM. We thoroughly study various object priors to generate prompts for SAM, explicitly focusing the foundation model on objects. The strongest object priors were obtained by self-attention maps from self-supervised Vision Transformers, which we utilize for prompting SAM. Finally, the post-processed segments from SAM are used as pseudo annotations to train a standard instance segmentation system. Our approach shows strong generalization capabilities on COCO, LVIS, and ADE20k datasets and improves on the precision by up to 81.6% compared to the state-of-the-art. Source code is available at: https://github.com/chwilms/SOS	翻訳日:2024-11-06 21:45:58 公開日:2024-09-22
# ニューラルモデルガイドフィールドワークは可能か? : 形態的インフレクションのケーススタディ Can a Neural Model Guide Fieldwork? A Case Study on Morphological Inflection ( http://arxiv.org/abs/2409.14628v1 ) ライセンス: Link先を確認	Aso Mahmudi, Borja Herce, Demian Inostroza Amestica, Andreas Scherbakov, Eduard Hovy, Ekaterina Vylomova,	(参考訳) 言語学のフィールドワークは、言語の文書化と保存において重要な要素である。しかし、それは長く、徹底的で、時間を要するプロセスです。本稿では,言語学者をフィールドワーク中に指導し,言語学者と話者の相互作用のダイナミクスを説明する新しいモデルを提案する。本稿では,モルフォロジーデータを得るための様々なサンプリング戦略の効率を評価し,モルフォロジー構造を一般化する上での最先端のニューラルネットワークの有効性を評価する新しいフレームワークを提案する。実験では,(1)パラダイムテーブルのセル間の一様サンプリングによる注釈データの多様性の向上,(2)アノテーション中に信頼性のある予測を提供することで,肯定的な相互作用を高めるためのガイドとしてモデル信頼度を用いた。 Linguistic fieldwork is an important component in language documentation and preservation. However, it is a long, exhaustive, and time-consuming process. This paper presents a novel model that guides a linguist during the fieldwork and accounts for the dynamics of linguist-speaker interactions. We introduce a novel framework that evaluates the efficiency of various sampling strategies for obtaining morphological data and assesses the effectiveness of state-of-the-art neural models in generalising morphological structures. Our experiments highlight two key strategies for improving the efficiency: (1) increasing the diversity of annotated data by uniform sampling among the cells of the paradigm tables, and (2) using model confidence as a guide to enhance positive interaction by providing reliable predictions during annotation.	翻訳日:2024-11-06 21:45:58 公開日:2024-09-22
# PPRM変換によるNEQR量子回路のゲート最適化 Gate Optimization of NEQR Quantum Circuits via PPRM Transformation ( http://arxiv.org/abs/2409.14629v1 ) ライセンス: Link先を確認	Shahab Iranmanesh, Hossein Aghababa, Kazim Fouladi,	(参考訳) 量子画像表現(QIR)は、画像中の多数のピクセルが量子ゲートや量子ビットの必要性を高めるため、量子画像処理(QIP)において鍵となる課題である。しかし、現在の量子システムは、実行時の複雑さと利用可能な量子ビットの制限に直面している。本研究の目的は, 新規拡張量子表現(NEQR)方式の量子回路を, 排他的論理式(ESOP)を正極性リード・ミュラー(PPRM)同値に変換することで圧縮することである。 m制御量子ビット(m \rightarrow \infty$)を持つNEQR回路では、多重制御NOTゲートの分解に応じて、実行時複雑性の2つの例、指数関数と線形回路が考慮されている。非線形回帰を用いて、提案した変換は指数複雑性を$O(2^m)$から$O(1.5^m)$に減らし、圧縮比が100%に近づくと推定される。線形複雑性では、圧縮比が52%に近づき、変換時間は半減すると推定される。 6つの256$\times$256画像に対するテストでは、指数的ケースでは平均105.5倍、線形ケースでは2.4倍、平均圧縮比は99.05%、58.91%である。 Quantum image representation (QIR) is a key challenge in quantum image processing (QIP) due to the large number of pixels in images, which increases the need for quantum gates and qubits. However, current quantum systems face limitations in run-time complexity and available qubits. This work aims to compress the quantum circuits of the Novel Enhanced Quantum Representation (NEQR) scheme by transforming their Exclusive-Or Sum-of-Products (ESOP) expressions into Positive Polarity Reed-Muller (PPRM) equivalents without adding ancillary qubits. Two cases of run-time complexity, exponential and linear, are considered for NEQR circuits with m controlling qubits ($m \rightarrow \infty$), depending on the decomposition of multi-controlled NOT gates. Using nonlinear regression, the proposed transformation is estimated to reduce the exponential complexity from $O(2^m)$ to $O(1.5^m)$, with a compression ratio approaching 100%. For linear complexity, the transformation is estimated to halve the run-time, with a compression ratio approaching 52%. Tests on six 256$\times$256 images show average reductions of 105.5 times for exponential cases and 2.4 times for linear cases, with average compression ratios of 99.05% and 58.91%, respectively.	翻訳日:2024-11-06 21:45:58 公開日:2024-09-22
# EQ-CBM:エネルギーモデルと量子ベクトルを用いた確率論的概念ボトルネック EQ-CBM: A Probabilistic Concept Bottleneck with Energy-based Models and Quantized Vectors ( http://arxiv.org/abs/2409.14630v1 ) ライセンス: Link先を確認	Sangwon Kim, Dasom Ahn, Byoung Chul Ko, In-su Jang, Kwang-Ju Kim,	(参考訳) 信頼性の高いAIシステムの需要により、ディープニューラルネットワークの解釈の必要性が高まっている。概念ボトルネックモデル(CBM)は、人間の理解可能な概念を活用して解釈可能性を高める効果的なアプローチとして注目されている。しかし、既存のCBMは、決定論的概念の符号化と一貫性のない概念への依存によって問題に直面し、不正確な結果となった。エネルギーベースモデル(EBM)と量子化概念アクティベーションベクトル(qCAV)を用いた確率論的概念符号化によりCBMを強化する新しいフレームワークであるEQ-CBMを提案する。 EQ-CBMは、不確実性を効果的に捕捉し、予測信頼性と精度を向上させる。提案手法は,qCAVを用いて,概念エンコーディング中の均質ベクトルを選択し,より決定的なタスク性能を実現し,高いレベルの人的介入を容易にする。ベンチマークデータセットを用いた実証実験の結果,提案手法は概念とタスクの正確性の両方において最先端であることがわかった。 The demand for reliable AI systems has intensified the need for interpretable deep neural networks. Concept bottleneck models (CBMs) have gained attention as an effective approach by leveraging human-understandable concepts to enhance interpretability. However, existing CBMs face challenges due to deterministic concept encoding and reliance on inconsistent concepts, leading to inaccuracies. We propose EQ-CBM, a novel framework that enhances CBMs through probabilistic concept encoding using energy-based models (EBMs) with quantized concept activation vectors (qCAVs). EQ-CBM effectively captures uncertainties, thereby improving prediction reliability and accuracy. By employing qCAVs, our method selects homogeneous vectors during concept encoding, enabling more decisive task performance and facilitating higher levels of human intervention. Empirical results using benchmark datasets demonstrate that our approach outperforms the state-of-the-art in both concept and task accuracy.	翻訳日:2024-11-06 21:45:58 公開日:2024-09-22
# 誤差緩和型多層量子ルーティング Error-Mitigated Multi-Layer Quantum Routing ( http://arxiv.org/abs/2409.14632v1 ) ライセンス: Link先を確認	Wenbo Shi, Neel Kanth Kundu, Robert Malaney,	(参考訳) 現在の量子デバイスには多くの制限があるため、量子エラー緩和法は近い将来、実用的な量子アプリケーションを実現するための潜在的な解決策となる。 Zero-Noise Extrapolation (ZNE) と Clifford Data Regression (CDR) は、2つの有望な量子エラー軽減手法である。これら2つの手法の特性に基づいて,外挿型CDR (eCDR) と呼ばれる新しい手法を提案する。本手法をベンチマークするために、eCDRを量子アプリケーション、特に多層量子ルーティングに組み込む。量子ルータは、ある入力経路から複数の出力経路の量子重ね合わせへ量子信号を誘導し、将来の量子ネットワークの重要な要素とみなされる。多層量子ルータは、経路のさらなる重ね合わせを可能にすることにより、量子ネットワークのスケーラビリティを拡張する。超伝導量子デバイス上に実装された多層量子ルータの性能をZNE, CDR, eCDR法で評価した。実験の結果,新しいeCDR法は2層量子ルータのZNEとCDRを著しく上回ることがわかった。私たちの研究は、既存のメソッドの異なるコンポーネントから構築された新しい緩和メソッドがどのようにして、コアアプリケーションを考慮して設計され、大幅なパフォーマンス向上をもたらすかを強調しています。 Due to the numerous limitations of current quantum devices, quantum error mitigation methods become potential solutions for realizing practical quantum applications in the near term. Zero-Noise Extrapolation (ZNE) and Clifford Data Regression (CDR) are two promising quantum error mitigation methods. Based on the characteristics of these two methods, we propose a new method named extrapolated CDR (eCDR). To benchmark our method, we embed eCDR into a quantum application, specifically multi-layer quantum routing. Quantum routers direct a quantum signal from one input path to a quantum superposition of multiple output paths and are considered important elements of future quantum networks. Multi-layer quantum routers extend the scalability of quantum networks by allowing for further superposition of paths. We benchmark the performance of multi-layer quantum routers implemented on current superconducting quantum devices instantiated with the ZNE, CDR, and eCDR methods. Our experimental results show that the new eCDR method significantly outperforms ZNE and CDR for the 2-layer quantum router. 
Our work highlights how new mitigation methods built from different components of pre-existing methods, and designed with a core application in mind, can lead to significant performance enhancements.	翻訳日:2024-11-06 21:45:58 公開日:2024-09-22
# インターネット・オブ・メディカルモノを用いた終末期疾患の予測と検出 Prediction and Detection of Terminal Diseases Using Internet of Medical Things: A Review ( http://arxiv.org/abs/2410.00034v1 ) ライセンス: Link先を確認	Akeem Temitope Otapo, Alice Othmani, Ghazaleh Khodabandelou, Zuheng Ming,	(参考訳) 機械学習(ML)とディープラーニング(DL)技術による医療における人工知能(AI)と医療物のインターネット(IoMT)の統合は、慢性疾患の予測と診断を進歩させた。 XGBoost、ランダムフォレスト、CNN、LSTM RNNといったAI駆動型モデルは、Kaggle、UCI、民間機関、リアルタイムIoMTソースなどのプラットフォームからのデータセットを使用して、心臓病、慢性腎臓病(CKD)、アルツハイマー病、肺がんの予測において98%以上の精度を達成した。しかし、データ品質、患者人口、異なる病院や研究資料からのフォーマットの変動により、課題は継続する。 IoMTデータは巨大で異種であり、患者のプライバシを保護するための相互運用性とセキュリティを確保するための複雑さが増している。 AIモデルはオーバーフィットに苦しむことが多く、制御された環境ではうまく機能するが、実際の臨床環境では効果的に機能しない。さらに、特に認知症、脳卒中、がんなどの稀な疾患に対する多疾患シナリオは未解決のままである。今後の研究は、データ品質と相互運用性を改善するために、データの標準化と高度な前処理技術に焦点を当てるべきである。トランスファーラーニングとアンサンブル法は,臨床現場におけるモデルの一般化性向上に不可欠である。また, 疾患の相互作用の探索や, 慢性疾患交差点の予測モデルの開発も必要である。 IoMTシステムにフェデレーション学習、ブロックチェーン、差分プライバシーを統合するための標準化されたフレームワークとオープンソースツールを作成することで、堅牢なデータプライバシとセキュリティも確保できる。 The integration of Artificial Intelligence (AI) and the Internet of Medical Things (IoMT) in healthcare, through Machine Learning (ML) and Deep Learning (DL) techniques, has advanced the prediction and diagnosis of chronic diseases. AI-driven models such as XGBoost, Random Forest, CNNs, and LSTM RNNs have achieved over 98% accuracy in predicting heart disease, chronic kidney disease (CKD), Alzheimer's disease, and lung cancer, using datasets from platforms like Kaggle, UCI, private institutions, and real-time IoMT sources. However, challenges persist due to variations in data quality, patient demographics, and formats from different hospitals and research sources. The incorporation of IoMT data, which is vast and heterogeneous, adds complexities in ensuring interoperability and security to protect patient privacy. AI models often struggle with overfitting, performing well in controlled environments but less effectively in real-world clinical settings. 
Moreover, multi-morbidity scenarios, especially for rare diseases like dementia, stroke, and cancers, remain insufficiently addressed. Future research should focus on data standardization and advanced preprocessing techniques to improve data quality and interoperability. Transfer learning and ensemble methods are crucial for improving model generalizability across clinical settings. Additionally, the exploration of disease interactions and the development of predictive models for chronic illness intersections are needed. Creating standardized frameworks and open-source tools for integrating federated learning, blockchain, and differential privacy into IoMT systems will also ensure robust data privacy and security.	翻訳日:2024-11-05 15:29:12 公開日:2024-09-22
# SAC-KG:ドメイン知識グラフのための熟練コンストラクタとしての大規模言語モデルの爆発 SAC-KG: Exploiting Large Language Models as Skilled Automatic Constructors for Domain Knowledge Graphs ( http://arxiv.org/abs/2410.02811v1 ) ライセンス: Link先を確認	Hanzhu Chen, Xu Shen, Qitan Lv, Jie Wang, Xiaoqi Ni, Jieping Ye,	(参考訳) 知識グラフ(KG)は、正確で信頼性の高い知識の獲得が不可欠である専門分野において、知識集約的なタスクにおいて重要な役割を担っている。しかし、既存のKG構築手法は人間の介入に大きく依存しており、現実のシナリオにおける実践的適用性を著しく妨げている。この課題に対処するため、我々はSAC-KGと呼ばれる一般的なKG構築フレームワークを提案し、大規模言語モデル(LLM)をドメイン知識グラフのためのスキル自動コンストラクタとして活用する。 SAC-KGは、専門的で正確なマルチレベルKGを生成するために、LLMをドメインエキスパートとして効果的に扱う。具体的には、SAC-KGは、Generator、Verifier、Prunerの3つのコンポーネントで構成される。与えられたエンティティに対して、Generatorはその関係とテールを生のドメインコーパスから生成し、特殊な単一レベルKGを構築する。実験では、SAC-KGが100万以上のノードのスケールでドメインKGを自動構築し、89.32%の精度を達成し、既存のKG構築タスクと比べて20%以上精度が向上した。 Knowledge graphs (KGs) play a pivotal role in knowledge-intensive tasks across specialized domains, where the acquisition of precise and dependable knowledge is crucial. However, existing KG construction methods heavily rely on human intervention to attain qualified KGs, which severely hinders the practical applicability in real-world scenarios. To address this challenge, we propose a general KG construction framework, named SAC-KG, to exploit large language models (LLMs) as Skilled Automatic Constructors for domain Knowledge Graph. SAC-KG effectively involves LLMs as domain experts to generate specialized and precise multi-level KGs. Specifically, SAC-KG consists of three components: Generator, Verifier, and Pruner. For a given entity, Generator produces its relations and tails from raw domain corpora, to construct a specialized single-level KG. 
Verifier and Pruner then work together to ensure precision by correcting generation errors and determining whether newly produced tails require further iteration for the next-level KG. Experiments demonstrate that SAC-KG automatically constructs a domain KG at the scale of over one million nodes and achieves a precision of 89.32%, an increase of more than 20% in precision over existing state-of-the-art methods for the KG construction task.	翻訳日:2024-11-03 05:34:38 公開日:2024-09-22
# 浮動浮動小数点:柔軟な数列を持つ高精度な数表現 Floating-floating point: a highly accurate number representation with flexible Counting ranges ( http://arxiv.org/abs/2410.03692v1 ) ライセンス: Link先を確認	Itamar Cohen, Gil Einziger,	(参考訳) 効率の良い数表現は、連合学習、自然言語処理、ネットワーク計測ソリューションに不可欠である。タイミング、面積、電力の制約により、そのようなアプリケーションは狭いビット幅(例えば8ビット)の番号システムを使用する。広く使われている浮動小数点系は、カウント範囲と精度のトレードオフを示す。本稿では,浮動小数点数である浮動小数点(F2P)について紹介する。このような柔軟性は、選択したサブレンジに対する精度の向上と、大きなカウント範囲につながる。現状からF2Pに移行することで,ネットワーク計測精度とフェデレーション学習が向上することを示す。 Efficient number representation is essential for federated learning, natural language processing, and network measurement solutions. Due to timing, area, and power constraints, such applications use narrow bit-width (e.g., 8-bit) number systems. The widely used floating-point systems exhibit a trade-off between the counting range and accuracy. This paper introduces Floating-Floating-Point (F2P) - a floating point number that varies the partition between mantissa and exponent. Such flexibility leads to a large counting range combined with improved accuracy over a selected sub-range. Our evaluation demonstrates that moving to F2P from the state-of-the-art improves network measurement accuracy and federated learning.	翻訳日:2024-11-02 20:38:13 公開日:2024-09-22
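The trade-off between counting range and accuracy mentioned in the F2P abstract above can be illustrated with a toy 8-bit format. This is only a sketch of the trade-off, not the F2P encoding: the function name `minifloat_stats`, the unsigned layout without bias, subnormals, or special values, and the fixed exponent/mantissa split are all assumptions made for illustration.

```python
def minifloat_stats(exp_bits, total_bits=8):
    """Return (largest value, relative step) for a toy unsigned float format.

    Assumed layout (hypothetical, not the F2P encoding): no sign bit, no bias,
    no subnormals, no inf/NaN; value = (1 + m / 2**man_bits) * 2**e.
    """
    man_bits = total_bits - exp_bits
    max_exp = 2 ** exp_bits - 1            # largest stored exponent
    max_mantissa = 2 ** man_bits - 1       # largest stored mantissa
    largest = (1 + max_mantissa / 2 ** man_bits) * 2 ** max_exp
    relative_step = 2.0 ** -man_bits       # gap between neighbours, relative to 2**e
    return largest, relative_step


# Sweep every fixed split of the 8 bits between exponent and mantissa.
for exp_bits in range(1, 8):
    largest, step = minifloat_stats(exp_bits)
    print(f"exp_bits={exp_bits} man_bits={8 - exp_bits} "
          f"max={largest:.3g} rel_step={step:.3g}")
```

With a fixed split, each extra exponent bit widens the range but doubles the relative step. A format that varies the split, as the abstract describes, aims to keep a large range while improving accuracy over a chosen sub-range.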
# 一般化ニューロンと関連関数の線形独立性 Linear Independence of Generalized Neurons and Related Functions ( http://arxiv.org/abs/2410.03693v1 ) ライセンス: Link先を確認	Leyang Zhang,	(参考訳) ニューロンの線形独立性は、ニューラルネットワークの理論解析において重要な役割を果たす。具体的には、ニューロン $H_1, ..., H_n: \mathbb{R}^N \times \mathbb{R}^d \to \mathbb{R}$ が与えられたとき、パラメータ $\theta_1, ..., \theta_n$ が $\mathbb{R}^N$ 上を動く際に $\{H_1(\theta_1, \cdot), ..., H_n(\theta_n, \cdot)\}$ がいつ線形独立になるか、という問題に関心がある。これまでの研究は、一般的な滑らかな活性化関数に対して、バイアスのない2層ニューロンの完全な特徴づけを与えている。本稿では、任意の層と幅を持つニューロンの問題を考察し、一般的な解析的活性化関数に対して単純かつ完全な特徴づけを与える。 The linear independence of neurons plays a significant role in theoretical analysis of neural networks. Specifically, given neurons $H_1, ..., H_n: \mathbb{R}^N \times \mathbb{R}^d \to \mathbb{R}$, we are interested in the following question: when are $\{H_1(\theta_1, \cdot), ..., H_n(\theta_n, \cdot)\}$ linearly independent as the parameters $\theta_1, ..., \theta_n$ of these functions vary over $\mathbb{R}^N$? Previous works give a complete characterization of two-layer neurons without bias, for generic smooth activation functions. In this paper, we study the problem for neurons with arbitrary layers and widths, giving a simple but complete characterization for generic analytic activation functions.	翻訳日:2024-11-02 20:38:13 公開日:2024-09-22
# 複数の誤測定汚染物質の健康影響評価のためのダブル/デバイアス機械学習による因果推論 Causal Inference with Double/Debiased Machine Learning for Evaluating the Health Effects of Multiple Mismeasured Pollutants ( http://arxiv.org/abs/2410.07135v1 ) ライセンス: Link先を確認	Gang Xu, Xin Zhou, Molin Wang, Boya Zhang, Wenhao Jiang, Francine Laden, Helen H. Suh, Adam A. Szpiro, Donna Spiegelman, Zuoheng Wang,	(参考訳) 大気汚染とその成分を疫学研究で定量化する方法の1つは、個人の最も近いモニターを使用することである。この戦略は実際の個人曝露における潜在的な不正確性をもたらし、大気汚染とその成分の健康影響を推定するバイアスを導入し、特に相関性のある多汚染成分の因果効果を相関誤差で評価する。本稿では,他のPM2.5成分の存在下での1成分の因果効果の推定と推定を行い,測定誤差と相関性について考察する。線形回帰キャリブレーションモデルを用いて, 一般化された推定方程式を外的検証実験に適用し, 測定誤差を補正し, 主研究における利害効果を推定するために, 二重偏差機械学習(DML)アプローチを拡張した。回帰キャリブレーションを用いたDML推定器は一貫性があり,その漸近的分散を導出することを示した。シミュレーションにより、提案した推定器はバイアスを低減し、ほとんどのシミュレーション設定において名目カバレッジ確率を得た。本手法を用いて,PM2.5成分の認知機能に対する因果効果を評価し,測定誤差補正後の認知機能に対する負の因果効果を示す2つのPM2.5成分,Br,Mnを同定した。 One way to quantify exposure to air pollution and its constituents in epidemiologic studies is to use an individual's nearest monitor. This strategy results in potential inaccuracy in the actual personal exposure, introducing bias in estimating the health effects of air pollution and its constituents, especially when evaluating the causal effects of correlated multi-pollutant constituents measured with correlated error. This paper addresses estimation and inference for the causal effect of one constituent in the presence of other PM2.5 constituents, accounting for measurement error and correlations. We used a linear regression calibration model, fitted with generalized estimating equations in an external validation study, and extended a double/debiased machine learning (DML) approach to correct for measurement error and estimate the effect of interest in the main study. We demonstrated that the DML estimator with regression calibration is consistent and derived its asymptotic variance. Simulations showed that the proposed estimator reduced bias and attained nominal coverage probability across most simulation settings. 
We applied this method to assess the causal effects of PM2.5 constituents on cognitive function in the Nurses' Health Study and identified two PM2.5 constituents, Br and Mn, that showed a negative causal effect on cognitive function after measurement error correction.	翻訳日:2024-10-31 22:06:43 公開日:2024-09-22
# 融合脳結合グラフを用いた自閉症スペクトラム障害の診断と病因解析 Diagnosis and Pathogenic Analysis of Autism Spectrum Disorder Using Fused Brain Connection Graph ( http://arxiv.org/abs/2410.07138v1 ) ライセンス: Link先を確認	Lu Wei, Yi Huang, Guosheng Yin, Fode Zhang, Manxue Zhang, Bin Liu,	(参考訳) マルチモーダル磁気共鳴画像(MRI)データを用いた自閉症スペクトラム障害(ASD)の診断モデルを提案する。本手法は拡散テンソルイメージング(DTI)と機能MRI(fMRI)の脳接続データを統合し,グラフニューラルネットワーク(GNN)を融合グラフ分類に用いる。診断精度を向上させるために,クラス間マージンの最大化とクラス内マージンの最小化を行う損失関数を導入する。また, ネットワークノードの集中度, 計算度, サブグラフ, 固有ベクトル集中度を2次元融合脳グラフ上で解析し, ASDに関連付けられた病理領域を同定した。 2つの非パラメトリックテストは、ASD患者と健常者の間のこれらの中心性の統計的意義を評価する。以上の結果から, テスト間の整合性は明らかだが, 同定された領域は中心領域によって大きく異なっており, 生理的解釈の相違が示唆された。これらの知見は, ASDの神経生物学的基盤の理解を高め, 臨床診断のための新たな方向性を提供する。 We propose a model for diagnosing Autism spectrum disorder (ASD) using multimodal magnetic resonance imaging (MRI) data. Our approach integrates brain connectivity data from diffusion tensor imaging (DTI) and functional MRI (fMRI), employing graph neural networks (GNNs) for fused graph classification. To improve diagnostic accuracy, we introduce a loss function that maximizes inter-class and minimizes intra-class margins. We also analyze network node centrality, calculating degree, subgraph, and eigenvector centralities on a bimodal fused brain graph to identify pathological regions linked to ASD. Two non-parametric tests assess the statistical significance of these centralities between ASD patients and healthy controls. Our results reveal consistency between the tests, yet the identified regions differ significantly across centralities, suggesting distinct physiological interpretations. These findings enhance our understanding of ASD's neurobiological basis and offer new directions for clinical diagnosis.	翻訳日:2024-10-31 22:06:43 公開日:2024-09-22
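The three node centralities named in the ASD abstract above (degree, subgraph, and eigenvector centrality) can be sketched in plain Python. This is a minimal illustration on a hypothetical four-node star graph, not the authors' pipeline: the unweighted adjacency matrix, the function names, and the iteration counts are all assumptions.

```python
def degree_centrality(adj):
    """Fraction of the other nodes that each node is connected to."""
    n = len(adj)
    return [sum(1 for w in row if w != 0) / (n - 1) for row in adj]


def eigenvector_centrality(adj, iters=200):
    """Leading eigenvector of the adjacency matrix via power iteration."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        # Iterate with (A + I) so the method also converges on bipartite graphs.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def subgraph_centrality(adj, terms=30):
    """Diagonal of exp(A), approximated by a truncated power series."""
    n = len(adj)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 / 0!
    diag = [1.0] * n
    for k in range(1, terms):
        # power <- power @ A / k, which equals A^k / k!
        power = [[sum(power[i][m] * adj[m][j] for m in range(n)) / k
                  for j in range(n)] for i in range(n)]
        for i in range(n):
            diag[i] += power[i][i]
    return diag


# Hypothetical star graph: node 0 is the hub, nodes 1-3 are leaves.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
deg = degree_centrality(star)
eig = eigenvector_centrality(star)
sub = subgraph_centrality(star)
print(deg, eig, sub)
```

All three measures rank the hub highest in this toy graph. On real connectivity graphs they can disagree, which is consistent with the abstract's remark that the identified regions differ across centralities.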
# SARF:感情拡張ランダムフォレストによる株式市場予測の強化 SARF: Enhancing Stock Market Prediction with Sentiment-Augmented Random Forest ( http://arxiv.org/abs/2410.07143v1 ) ライセンス: Link先を確認	Saber Talazadeh, Dragan Perakovic,	(参考訳) 金融分野で難しい問題である株価トレンド予測には、膨大なデータと関連する指標が含まれる。経験分析だけに頼れば、持続不可能で効果の低い結果が得られることが多い。機械学習研究者は、ランダムフォレストアルゴリズムの適用により、この文脈での予測が向上し、株価トレンドの予測において重要な補助的役割を果たすことを示した。本研究では、FinGPT生成AIモデルと従来のランダムフォレストモデルを用いた感情分析を統合することで、株式市場の予測に新たなアプローチを導入する。提案手法は、FinGPTが提供する財務感情の微妙な理解を活用して、株価予測の精度を最適化することを目的としている。本稿では,感情特徴をランダムフォレストフレームワークに組み込んだ「感情拡張ランダムフォレスト(SARF)」という新たな方法論を提案する。実験の結果,SARFは従来のランダムフォレストモデルやLSTMモデルよりも平均精度が9.23%向上し,予測誤差が低かった。 Stock trend forecasting, a challenging problem in the financial domain, involves extensive data and related indicators. Relying solely on empirical analysis often yields unsustainable and ineffective results. Machine learning researchers have demonstrated that applying the random forest algorithm can enhance predictions in this context, playing a crucial auxiliary role in forecasting stock trends. This study introduces a new approach to stock market prediction by integrating sentiment analysis using the FinGPT generative AI model with the traditional Random Forest model. The proposed technique aims to optimize the accuracy of stock price forecasts by leveraging the nuanced understanding of financial sentiments provided by FinGPT. We present a new methodology called "Sentiment-Augmented Random Forest" (SARF), which incorporates sentiment features into the Random Forest framework. Our experiments demonstrate that SARF outperforms conventional Random Forest and LSTM models with an average accuracy improvement of 9.23% and lower prediction errors in predicting stock market movements.	翻訳日:2024-10-31 22:06:43 公開日:2024-09-22
# アウト・オブ・ディストリビューション検出のためのMargin-bounded Confidence Scores Margin-bounded Confidence Scores for Out-of-Distribution Detection ( http://arxiv.org/abs/2410.07185v1 ) ライセンス: Link先を確認	Lakpa D. Tamang, Mohamed Reda Bouadjenek, Richard Dazeley, Sunil Aryal,	(参考訳) 自律運転や医用画像診断などの多くの重要な機械学習アプリケーションにおいて、アウト・オブ・ディストリビューション(OOD)サンプルの検出は、イン・ディストリビューション(ID)入力を正確に分類するのと同じくらい重要である。近年,OE(Outlier Exposure)に基づく手法は,補助外乱データを用いたモデル微調整によるOOD入力の検出に有望な結果を示している。しかし、以前のOEベースのアプローチのほとんどは、余分な外れ値サンプルの合成や、OODサンプル空間の多様化に重点を置いていた。本研究では,ID値とOOD値の差を大きくすることで,非自明なOOD検出問題に対処するMargin bounded Confidence Scores (MaCS) という手法を提案する。具体的には、IDと比較してOOD入力の高信頼度スコアをペナルティ化する補足制約付き正規化分類器の学習目標を増大させ、ID分類精度を維持しながらOOD検出性能を大幅に向上させる。画像分類タスクのための様々なベンチマークデータセットに対する広範囲な実験は、様々なベンチマーク指標上で、最先端(S.O.T.A)メソッドを著しく上回って提案手法の有効性を示す。コードはhttps://github.com/lakpa-tamang9/margin_oodで公開されている。 In many critical Machine Learning applications, such as autonomous driving and medical image diagnosis, the detection of out-of-distribution (OOD) samples is as crucial as accurately classifying in-distribution (ID) inputs. Recently Outlier Exposure (OE) based methods have shown promising results in detecting OOD inputs via model fine-tuning with auxiliary outlier data. However, most of the previous OE-based approaches emphasize more on synthesizing extra outlier samples or introducing regularization to diversify OOD sample space, which is rather unquantifiable in practice. In this work, we propose a novel and straightforward method called Margin bounded Confidence Scores (MaCS) to address the nontrivial OOD detection problem by enlarging the disparity between ID and OOD scores, which in turn makes the decision boundary more compact facilitating effective segregation with a simple threshold. 
Specifically, we augment the learning objective of an OE regularized classifier with a supplementary constraint, which penalizes high confidence scores for OOD inputs compared to that of ID and significantly enhances the OOD detection performance while maintaining the ID classification accuracy. Extensive experiments on various benchmark datasets for image classification tasks demonstrate the effectiveness of the proposed method by significantly outperforming state-of-the-art (S.O.T.A) methods on various benchmarking metrics. The code is publicly available at https://github.com/lakpa-tamang9/margin_ood	翻訳日:2024-10-31 21:46:48 公開日:2024-09-22
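The MaCS abstract above describes a supplementary constraint that penalizes OOD confidence scores that are high relative to ID scores. One way such a term could look is a hinge on the gap between mean ID and mean OOD confidence. This is a sketch under stated assumptions, not the paper's loss: the abstract does not give the exact formulation, so the maximum-softmax confidence, the hinge form, the `margin` value, and all function names here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_confidence(logits):
    """Maximum softmax probability, used here as the confidence score."""
    return max(softmax(logits))


def margin_penalty(id_logits, ood_logits, margin=0.5):
    """Hinge term: zero once mean ID confidence exceeds mean OOD confidence by `margin`."""
    id_conf = sum(max_confidence(z) for z in id_logits) / len(id_logits)
    ood_conf = sum(max_confidence(z) for z in ood_logits) / len(ood_logits)
    return max(0.0, margin - (id_conf - ood_conf))


# Hypothetical logits: a confident ID sample vs. a flat (uncertain) OOD sample.
print(margin_penalty([[5.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]))
# The same confident logits on both sides leave no gap, so the penalty is active.
print(margin_penalty([[5.0, 0.0, 0.0]], [[5.0, 0.0, 0.0]]))
```

In training, a term of this kind would be added to the usual classification and outlier-exposure losses with a weighting coefficient, which is another hyperparameter not specified in the abstract.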
# 動物の磁気受容は量子的か?エネルギー分解限界の観点から Is animal magnetoreception quantum? A perspective from the energy resolution limit ( http://arxiv.org/abs/2410.07186v1 ) ライセンス: Link先を確認	I. K. Kominis, E. Gkoudinakis,	(参考訳) 超伝導量子干渉装置、光ポンピング、窒素空孔磁力計など、多くの磁気センサがエネルギー分解能の限界を満たすことが示されている。この制限は、センサの磁気感度がエネルギーと時間の積に変換されると、プランク定数$\hbar$によって下界されることを示している。この境界は、磁気センシングにおいて何が達成できるかに関する基本的な制限を意味する。ここでは、生物磁気センサ、特に動物が磁場を感知すると考えられる3つの磁気受容機構(ラジカル対、マグネタイト、MagR)を探索する。これらのメカニズムがどの程度エネルギー分解能限界に近づいたかという問題に対処する。定量的レベルでは、エネルギー分解限界の有効性は、モデルに依存しない方法で磁気センシングの動作を知らせ、理論的モデルと、特に複雑な生物学的システムで必要とされる推定または測定されたパラメータ値の微妙な整合性チェックを提供することができることである。定性的レベルでは、エネルギー分解能が$\hbar$に近づくほど、センサーはより「量子的」となる。これは磁気受容の量子生物学を理解するための代替ルートを提供する。また、改良の余地を定量化し、ネイチャーが達成したことを照明し、ネイチャーの磁気センシング性能を超えるバイオミメティクスセンサーの工学を刺激する。 A large number of magnetic sensors, like superconducting quantum interference devices, optical pumping and nitrogen vacancy magnetometers, were shown to satisfy the energy resolution limit. This limit states that the magnetic sensitivity of the sensor, when translated into a product of energy with time, is bounded below by Planck's constant, $\hbar$. This bound implies a fundamental limitation as to what can be achieved in magnetic sensing. Here we explore biological magnetometers, in particular three magnetoreception mechanisms thought to underlie animals' geomagnetic field sensing: the radical-pair, the magnetite and the MagR mechanism. We address the question of how closely these mechanisms approach the energy resolution limit. At the quantitative level, the utility of the energy resolution limit is that it informs the workings of magnetic sensing in model-independent ways, and thus can provide subtle consistency checks for theoretical models and estimated or measured parameter values, particularly needed in complex biological systems. At the qualitative level, the closer the energy resolution is to $\hbar$, the more "quantum" is the sensor. 
This offers an alternative route towards understanding the quantum biology of magnetoreception. It also quantifies the room for improvement, illuminating what Nature has achieved, and stimulating the engineering of biomimetic sensors exceeding Nature's magnetic sensing performance.	翻訳日:2024-10-31 21:46:48 公開日:2024-09-22
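The energy resolution limit in the magnetoreception abstract above can be made concrete with a short calculation. The abstract only says that the sensitivity, translated into energy times time, is bounded below by $\hbar$. The specific expression used here, $(\delta B)^2 V / (2\mu_0)$ with $\delta B$ in T/√Hz and $V$ the sensing volume, is one form commonly quoted in the magnetometry literature and is taken as an assumption, as are the sensor numbers, which are purely hypothetical.

```python
import math

HBAR = 1.054571817e-34       # reduced Planck constant, J*s
MU_0 = 4e-7 * math.pi        # vacuum permeability, T*m/A (classical defined value)


def energy_resolution(sensitivity_t_per_sqrt_hz, volume_m3):
    """Energy resolution per unit bandwidth, (dB)^2 * V / (2 * mu_0), in J*s."""
    return sensitivity_t_per_sqrt_hz ** 2 * volume_m3 / (2 * MU_0)


# Hypothetical sensor: 1 fT/sqrt(Hz) sensitivity and a 1 cm^3 sensing volume.
ratio = energy_resolution(1e-15, 1e-6) / HBAR
print(f"energy resolution = {ratio:.0f} hbar")
```

Under this reading, a ratio near 1 marks a sensor operating close to the limit, and larger ratios quantify the remaining room for improvement that the abstract mentions.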
# 自発音声からのアルツハイマー病検出のクラス内変動に向けて Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech ( http://arxiv.org/abs/2409.16322v1 ) ライセンス: Link先を確認	Jiawen Kang, Dongrui Han, Lingwei Meng, Jingyan Zhou, Jinchao Li, Xixin Wu, Helen Meng,	(参考訳) アルツハイマー病(AD)の検出は、ADと非ADの個人を識別するために機械学習分類モデルを使用する有望な研究領域として浮上している。従来の分類課題とは異なり、AD検出において、クラス内変異は重要な課題として認識され、ADを持つ個人は認知障害のスペクトルを示す。多くのAD検出タスクにはきめ細かいラベルがないため、単純化されたバイナリ分類は、クラス内差とインスタンスレベルの不均衡という2つの重要な側面を見落としてしまう可能性がある。前者は、認知機能の特定の変化を無視して、様々な障害度のADサンプルを1つの診断ラベルにマッピングすることをモデルに強いる。後者はモデルが過剰に表現された重大度レベルに偏っている。この研究はこれらの課題に対処するための初期の取り組みを示す。本稿では,ソフトターゲット蒸留 (SoTD) とインスタンスレベルの再分散 (InRe) の2つの新しい手法を提案する。 ADReSSとADReSSoデータセットの実験により,提案手法は検出精度を大幅に向上することが示された。さらに分析したところ、SoTDは複数のコンポーネントモデルの強みを効果的に活用し、InReはモデルの過度な適合を実質的に軽減していることがわかった。これらの結果は、より堅牢で信頼性の高いAD検出モデルを開発するための洞察を与える。 Alzheimer's Disease (AD) detection has emerged as a promising research area that employs machine learning classification models to distinguish between individuals with AD and those without. Unlike conventional classification tasks, we identify within-class variation as a critical challenge in AD detection: individuals with AD exhibit a spectrum of cognitive impairments. Given that many AD detection tasks lack fine-grained labels, simplistic binary classification may overlook two crucial aspects: within-class differences and instance-level imbalance. The former compels the model to map AD samples with varying degrees of impairment to a single diagnostic label, disregarding certain changes in cognitive function, while the latter biases the model towards overrepresented severity levels. This work presents early efforts to address these challenges. We propose two novel methods: Soft Target Distillation (SoTD) and Instance-level Re-balancing (InRe), targeting the two problems respectively. Experiments on the ADReSS and ADReSSo datasets demonstrate that the proposed methods significantly improve detection accuracy. 
Further analysis reveals that SoTD effectively harnesses the strengths of multiple component models, while InRe substantially alleviates model over-fitting. These findings provide insights for developing more robust and reliable AD detection models.	翻訳日:2024-09-27 09:03:58 公開日:2024-09-22
# 都市環境マッピングにおける街路景観の被覆とバイアス Coverage and Bias of Street View Imagery in Mapping the Urban Environment ( http://arxiv.org/abs/2409.15386v1 ) ライセンス: Link先を確認	Zicheng Fan, Chen-Chieh Feng, Filip Biljecki,	(参考訳) ストリートビュー画像(SVI)は、都市研究において貴重なデータ形式として出現し、都市環境を地図化し、知覚する新しい方法を可能にしている。しかし、SVIの代表性、品質、信頼性に関する根本的な懸念は未解決のままであり、例えば、都市がそのようなデータによってどの程度捉えられるか、データのギャップがバイアスをもたらすか、といった点である。空間データ品質と都市分析の交差点に位置するこの研究は、SVIの都市環境に関する特徴レベルカバレッジを推定する新しいワークフローを提案することによって、これらの懸念に対処する。ワークフローは、SVIとターゲット特徴の位置関係、および環境障害の影響を統合する。データ品質の領域をSVIに拡張し、網羅範囲を評価する指標システムを導入し、完全性と周波数次元に着目した。ロンドンをケーススタディとして、SVIが都市の特徴をカバーし表現する能力の潜在的なバイアスを特定するために、3つの実験が実施され、ファサードの構築に焦点が当てられている。この研究は、SVI評価における従来の空間データ品質指標の限界と、異なるデータ取得プラクティスの下でのSVIカバレッジのばらつきを強調している。 SVIのユニークなメタデータと水平視点を考慮に入れたテーラー的なアプローチも強調されている。この結果は、SVIが貴重な洞察を提供する一方で、万能薬ではないことを示唆している。都市研究への応用には、信頼性の高い結果を保証するために、データカバレッジと特徴レベルの代表性を慎重に検討する必要がある。 Street View Imagery (SVI) has emerged as a valuable data form in urban studies, enabling new ways to map and sense urban environments. However, fundamental concerns regarding the representativeness, quality, and reliability of SVI remain underexplored, e.g., to what extent cities can be captured by such data and whether data gaps result in bias. This research, positioned at the intersection of spatial data quality and urban analytics, addresses these concerns by proposing a novel workflow to estimate SVI's feature-level coverage of the urban environment. The workflow integrates the positional relationships between SVI and target features, as well as the impact of environmental obstructions. Expanding the domain of data quality to SVI, we introduce an indicator system that evaluates the extent of coverage, focusing on the completeness and frequency dimensions. Using London as a case study, three experiments are conducted to identify potential biases in SVI's ability to cover and represent urban features, with a focus on building facades. 
The research highlights the limitations of traditional spatial data quality metrics in assessing SVI, and variability of SVI coverage under different data acquisition practices. Tailored approaches that consider the unique metadata and horizontal perspective of SVI are also underscored. The findings suggest that while SVI offers valuable insights, it is no panacea -- its application in urban research requires careful consideration of data coverage and feature-level representativeness to ensure reliable results.	翻訳日:2024-09-26 13:20:55 公開日:2024-09-22
# カストディアルウォレットと非カストディアルウォレット Custodial and Non-Custodial Wallets ( http://arxiv.org/abs/2409.15389v1 ) ライセンス: Link先を確認	Tony Seymour, Geoff Goodell,	(参考訳) 非カストディアルウォレット(Non-custodial wallet)は、暗号通貨ウォレットの一種で、所有者が秘密鍵を完全にコントロールし、保有するデジタル資産の管理と確保に責任を負う。取引所などの第三者が管理するカストディアルウォレットとは異なり、非カストディアルウォレットは、資金がエンドユーザーによってのみ管理されることを保証する。本研究は,カストディアルウォレットと非カストディアルウォレットの違いを特徴付けるとともに,その重要な特徴と関連するリスクについて検討する。 Non-custodial wallets are a type of cryptocurrency wallet wherein the owner has full control over the private keys and is solely responsible for managing and securing the digital assets that it contains. Unlike custodial wallets, which are managed by third parties, such as exchanges, non-custodial wallets ensure that funds are controlled exclusively by the end user. We characterise the difference between custodial and non-custodial wallets and examine their key features and related risks.	翻訳日:2024-09-26 13:20:55 公開日:2024-09-22
# 不明瞭性と長時間粒子検出による量子決定性と完全性回復 Quantum determinism and completeness restored by indistinguishability and long-time particle detection ( http://arxiv.org/abs/2409.15390v1 ) ライセンス: Link先を確認	Patrick Navez, Henni Ouerdane,	(参考訳) 量子物理学における測定データは、同一粒子の識別不可能な性質を考慮して、統計的、マクロなプロセスの結果のみ厳密に解釈できると論じる。量子決定論(Quantum determinism)は、研究対象と相互作用する測定装置を1つの系として記述するために、本格的な量子モデルを用いることで、原則として可能である。対照的に、ボルンの法則に依存するあらゆるアプローチは、測定中に相互作用する検出器と量子系の力学を区別する。本研究は, 単点信号に適用した計測仮定の有効性を批判的に解析する。実際、「個々の」粒子の概念は、非識別性と、有効検出のための無限の相互作用時間を許容する散乱アプローチの両方が、本来あるべきものとみなすと不適切なものになるため、2つの連続した測定事象の分離を防止できる。この文脈では、測定データは多くの事象に関する統計の結果としてのみ理解されるべきである。情報源と検出器の内在的なノイズを考慮すると、シュレーディンガーの猫とベルの実験では、ボルン・ルールが1つの粒子のレベルで放棄されると、リアリズム、局所性、因果関係が復元されることが示されている。量子物理学は、認識不能と長期検出のプロセスによって、根本的な確率的ではないと結論付けている。 We argue that measurement data in quantum physics can be rigorously interpreted only as a result of a statistical, macroscopic process, taking into account the indistinguishable character of identical particles. Quantum determinism is in principle possible on the condition that a fully-fledged quantum model is used to describe the measurement device in interaction with the studied object as one system. In contrast, any approach that relies on Born's rule discriminates the dynamics of a quantum system from that of the detector with which it interacts during measurement. In this work, we critically analyze the validity of this measurement postulate applied to single-event signals. In fact, the concept of an "individual" particle becomes inadequate once both indistinguishability and a scattering approach allowing an unlimited interaction time for an effective detection are considered as they should be, hence preventing the separability of two successive measurement events. In this context, measurement data should therefore be understood only as a result of statistics over many events. 
Accounting for the intrinsic noise of the sources and the detectors, we also show with the illustrative cases of the Schrödinger cat and the Bell experiment that once the Born rule is abandoned on the level of a single particle, realism, locality and causality are restored. We conclude that indiscernibility and long-time detection process make quantum physics not fundamentally probabilistic.	翻訳日:2024-09-26 13:20:54 公開日:2024-09-22
# 供給リスクに配慮した合金の発見と設計 Supply Risk-Aware Alloy Discovery and Design ( http://arxiv.org/abs/2409.15391v1 ) ライセンス: Link先を確認	Mrinalini Mulukutla, Robert Robinson, Danial Khatamsaz, Brent Vela, Nhu Vu, Raymundo Arróyave,	(参考訳) 材料設計はイノベーションの重要な要因であるが、材料やサプライチェーンに固有の技術的、経済的、環境的なリスクを見落としれば、持続不可能でリスクを伴う解決策につながる可能性がある。そこで本研究では,サプライチェーン・アウェア・デザイン・ストラテジーを素材開発プロセスに統合するリスク対応設計手法を提案する。このアプローチは、既存の言語モデルとテキスト分析を活用して、原料供給リスク指標を予測するための特殊なモデルを開発する。多目的・多制約設計空間を効率的にナビゲートするために,我々はBatch Bayesian Optimization (BBO) を用い,パレート最適高エントロピー合金 (HEA) の同定を可能にした。 MoNbTiVWシステムを用いたケーススタディでは, 設計プロセスにサプライリスクを組み込むことによる大きな影響を, 4つのシナリオで示した。性能と供給リスクの両方を最適化することにより、開発した合金が高性能であるだけでなく、持続可能かつ経済的に実現可能であることを保証する。この統合されたアプローチは、サステナビリティ、サプライチェーンのダイナミクス、包括的なライフサイクル分析をシームレスに検討する材料発見と設計が将来に向けて重要なステップである。 Materials design is a critical driver of innovation, yet overlooking the technological, economic, and environmental risks inherent in materials and their supply chains can lead to unsustainable and risk-prone solutions. To address this, we present a novel risk-aware design approach that integrates Supply-Chain Aware Design Strategies into the materials development process. This approach leverages existing language models and text analysis to develop a specialized model for predicting materials feedstock supply risk indices. To efficiently navigate the multi-objective, multi-constraint design space, we employ Batch Bayesian Optimization (BBO), enabling the identification of Pareto-optimal high entropy alloys (HEAs) that balance performance objectives with minimized supply risk. A case study using the MoNbTiVW system demonstrates the efficacy of our approach in four scenarios, highlighting the significant impact of incorporating supply risk into the design process. By optimizing for both performance and supply risk, we ensure that the developed alloys are not only high-performing but also sustainable and economically viable. 
This integrated approach represents a critical step towards a future where materials discovery and design seamlessly consider sustainability, supply chain dynamics, and comprehensive life cycle analysis.	翻訳日:2024-09-26 13:20:54 公開日:2024-09-22
# LLMを用いたマルチパス・テキスト・ビデオアライメントによる授業映像中の行動の局所化学習 Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment ( http://arxiv.org/abs/2409.16145v1 ) ライセンス: Link先を確認	Yuxiao Chen, Kai Li, Wentao Bao, Deep Patel, Yu Kong, Martin Renqiang Min, Dimitris N. Metaxas,	(参考訳) 注釈付き大規模トレーニングビデオの可用性が限られているため,プロシージャステップの時間的境界のローカライズは困難である。最近の研究は、コントラスト学習を通じてビデオセグメントとASRで書き起こされたナレーションテキスト間の相互アライメントを学習することに焦点を当てている。しかし、これらの手法はアライメントノイズ、すなわちビデオの教示課題とナレーションの信頼性の低いタイムスタンプに無関係なナレーションを考慮できない。これらの課題に対処するため、本研究では、新しいトレーニングフレームワークを提案する。手順理解とテキスト要約におけるLLM(Large Language Models)の強みを生かして,まずLLMを適用し,課題関連情報をフィルタリングし,課題関連手順(LLM-steps)をナレーションから要約する。 LLMステップとトレーニング用ビデオとの信頼性の高い擬似マッチングを生成するために,MPTVA(Multi-Pathway Text-Video Alignment)戦略を提案する。 1)ナレーションタイムスタンプを用いたステップナレーション・ビデオアライメント,(2)長期のセマンティックな類似性に基づく直接ステップ・ツー・ビデオアライメント,(3)一般的なビデオドメインから学んだ短期的な微粒なセマンティックな類似性に焦点を当てた直接ステップ・ツー・ビデオアライメントなどである。異なる経路からの結果は融合し、信頼できる擬似ステップビデオマッチングを生成する。提案手法を評価するため,様々なタスクや問題設定について広範な実験を行った。提案手法は, 3 つの下流タスクにおいて, 手順ステップグラウンド, ステップローカライゼーション, ナレーショングラウンドリングの5.9 %, 3.1 %, 2.8 % の最先端手法を超える。 Learning to localize temporal boundaries of procedure steps in instructional videos is challenging due to the limited availability of annotated large-scale training videos. Recent works focus on learning the cross-modal alignment between video segments and ASR-transcripted narration texts through contrastive learning. However, these methods fail to account for the alignment noise, i.e., irrelevant narrations to the instructional task in videos and unreliable timestamps in narrations. To address these challenges, this work proposes a novel training framework. Motivated by the strong capabilities of Large Language Models (LLMs) in procedure understanding and text summarization, we first apply an LLM to filter out task-irrelevant information and summarize task-related procedure steps (LLM-steps) from narrations. 
To further generate reliable pseudo-matching between the LLM-steps and the video for training, we propose the Multi-Pathway Text-Video Alignment (MPTVA) strategy. The key idea is to measure alignment between LLM-steps and videos via multiple pathways, including: (1) step-narration-video alignment using narration timestamps, (2) direct step-to-video alignment based on their long-term semantic similarity, and (3) direct step-to-video alignment focusing on short-term fine-grained semantic similarity learned from general video domains. The results from different pathways are fused to generate reliable pseudo step-video matching. We conducted extensive experiments across various tasks and problem settings to evaluate our proposed method. Our approach surpasses state-of-the-art methods in three downstream tasks: procedure step grounding, step localization, and narration grounding by 5.9%, 3.1%, and 2.8%, respectively.	翻訳日:2024-09-26 05:27:07 公開日:2024-09-22
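The fusion step in the MPTVA abstract above can be sketched as combining per-pathway score matrices and thresholding the result. The abstract does not say how the pathways are fused, so the element-wise mean, the `threshold` value, the function names, and the toy scores are all assumptions for illustration.

```python
def fuse_pathways(pathway_scores):
    """Average several step-by-segment score matrices element-wise."""
    n_steps = len(pathway_scores[0])
    n_segments = len(pathway_scores[0][0])
    return [[sum(p[s][t] for p in pathway_scores) / len(pathway_scores)
             for t in range(n_segments)] for s in range(n_steps)]


def pseudo_matches(fused, threshold=0.5):
    """For each step, return the best segment index, or None if its score is too low."""
    matches = []
    for row in fused:
        best = max(range(len(row)), key=row.__getitem__)
        matches.append(best if row[best] >= threshold else None)
    return matches


# Hypothetical alignment scores for 2 steps x 3 video segments from three pathways.
p1 = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.4]]   # e.g. step-narration-video pathway
p2 = [[0.8, 0.2, 0.1], [0.1, 0.2, 0.6]]   # e.g. long-term similarity pathway
p3 = [[0.7, 0.3, 0.2], [0.3, 0.1, 0.2]]   # e.g. short-term similarity pathway
print(pseudo_matches(fuse_pathways([p1, p2, p3])))
```

In this toy input the pathways agree on the first step and disagree on the second, so the second step gets no pseudo-label rather than a noisy one.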
# リモートセンシングデータのオブジェクト検出とセグメンテーションにおける人間アノテーションの性能 Performance of Human Annotators in Object Detection and Segmentation of Remotely Sensed Data ( http://arxiv.org/abs/2409.10272v2 ) ライセンス: Link先を確認	Roni Blushtein-Livnon, Tal Svoray, Michael Dorman,	(参考訳) 本研究では,アノテータの性能に及ぼすアノテーション戦略,不均衡データのレベル,事前経験の影響を評価する実験室実験を紹介する。この実験は、長方形の物体のケーススタディとして選択された小さな太陽電池パネルを検出し、セグメンテーションするために、ArcGIS Proツールを使用して空中画像のラベル付けに重点を置いている。この実験は0.15 mのピクセルサイズを持つ画像を用いて行われ、専門家と非専門家の両方が異なるセットアップ戦略とターゲット/バックグラウンド比データセットにまたがって参加する。以上の結果から,ヒトのアノテータは,一般的にセグメンテーションタスクよりもオブジェクト検出において効果的に機能することが示唆された。タイプIエラー(False Positives、存在しないオブジェクトの誤検出)よりも多くのタイプIIエラー(False Negatives、未検出オブジェクト)を犯す顕著な傾向は、すべての実験的な設定と条件で観察され、検出とセグメンテーションプロセスにおける一貫したバイアスが示唆された。目標/背景比が高いタスク(単位面積当たりのオブジェクト数)では、パフォーマンスが向上した。以前の経験はパフォーマンスに大きな影響を与えず、場合によってはセグメンテーションの過大評価に繋がることもある。これらの結果は、人間のアノテーターが比較的慎重であり、信頼しているときにのみ対象を識別し、過大評価よりも過小評価を優先する傾向があることを示す。アノテーションのパフォーマンスはオブジェクトの不足の影響も受けており、極めて不均衡なデータセットとターゲット・ツー・バックグラウンドの比率の低い領域が減少している。これらの知見は, セグメンテーションと検出モデルを改善するための高品質なトレーニングデータに対する要求が増大する時代において, 効率的なアノテータが不可欠である一方で, リモートセンシング研究のためのアノテーション戦略を強化する可能性がある。 This study introduces a laboratory experiment designed to assess the influence of annotation strategies, levels of imbalanced data, and prior experience, on the performance of human annotators. The experiment focuses on labeling aerial imagery, using ArcGIS Pro tools, to detect and segment small-scale photovoltaic solar panels, selected as a case study for rectangular objects. The experiment is conducted using images with a pixel size of 0.15 m, involving both expert and non-expert participants, across different setup strategies and target-background ratio datasets. Our findings indicate that human annotators generally perform more effectively in object detection than in segmentation tasks. A marked tendency to commit more Type II errors (False Negatives, i.e., undetected objects) than Type I errors (False Positives, i.e., 
falsely detecting objects that do not exist) was observed across all experimental setups and conditions, suggesting a consistent bias in detection and segmentation processes. Performance was better in tasks with higher target-background ratios (i.e., more objects per unit area). Prior experience did not significantly impact performance and may, in some cases, even lead to overestimation in segmentation. These results provide evidence that human annotators are relatively cautious and tend to identify objects only when they are confident about them, prioritizing underestimation over overestimation. Annotators' performance is also influenced by object scarcity, showing a decline in areas with extremely imbalanced datasets and a low target-to-background ratio. These findings may help improve annotation strategies for remote sensing research, where efficient human annotators remain crucial given the growing demand for high-quality training data to improve segmentation and detection models.	翻訳日:2024-09-24 13:39:07 公開日:2024-09-22
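The error pattern reported in the annotation abstract above, more Type II errors (False Negatives) than Type I errors (False Positives), shows up directly in precision and recall. The following sketch uses hypothetical counts that are not taken from the study; the function name is likewise an assumption.

```python
def detection_scores(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for an object detection annotation task."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical cautious annotator: few false detections, many missed objects.
precision, recall = detection_scores(true_positives=80,
                                     false_positives=5,
                                     false_negatives=20)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

A cautious annotator of this kind keeps precision high at the cost of recall, which matches the abstract's observation that annotators favour underestimation over overestimation.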
# BRDF-NeRF:光衛星画像とBRDFモデリングを用いたニューラルラジアンス場 BRDF-NeRF: Neural Radiance Fields with Optical Satellite Images and BRDF Modelling ( http://arxiv.org/abs/2409.12014v2 ) ライセンス: Link先を確認	Lulin Zhang, Ewelina Rupnik, Tri Dung Nguyen, Stéphane Jacquemoud, Yann Klinger,	(参考訳) 衛星画像から複雑な地球表面の異方性反射を理解することは、多くの応用に不可欠である。ニューラルレイディアンス場(NeRF)は、複数の画像からシーンの双方向反射率分布関数(BRDF)を推定できる機械学習技術として人気がある。しかし、従来の研究はNeRFを近距離画像に適用することに集中しており、多くの地球表面で不足している基礎的なMicrofacet BRDFモデルを推定している。さらに、高品質のNeRFは、一般的に複数の画像を同時に撮影する必要がある。これらの制約に対処するために,遠隔センシングによく用いられる半経験的BRDFモデルであるRahman-Pinty-Verstraete(RPV)モデルを明示的に推定するために開発されたBRDF-NeRFを提案する。我々は,(1) 太陽位置が固定で視角が異なる単一エポックで撮影されたジブチと,(2) 視角と太陽位置が異なる複数エポックで撮影された蘭州の2つのデータセットを用いて,アプローチを評価する。その結果, BRDF-NeRFは, トレーニングデータから遠く離れた方向からの新たなビューを効果的に合成し, 高品質なデジタル表面モデル(DSM)を作成できることが実証された。 Understanding the anisotropic reflectance of complex Earth surfaces from satellite imagery is crucial for numerous applications. Neural radiance fields (NeRF) have become popular as a machine learning technique capable of deducing the bidirectional reflectance distribution function (BRDF) of a scene from multiple images. However, prior research has largely concentrated on applying NeRF to close-range imagery, estimating basic Microfacet BRDF models, which fall short for many Earth surfaces. Moreover, high-quality NeRFs generally require several images captured simultaneously, a rare occurrence in satellite imaging. To address these limitations, we propose BRDF-NeRF, developed to explicitly estimate the Rahman-Pinty-Verstraete (RPV) model, a semi-empirical BRDF model commonly employed in remote sensing. We assess our approach using two datasets: (1) Djibouti, captured in a single epoch at varying viewing angles with a fixed Sun position, and (2) Lanzhou, captured over multiple epochs with different viewing angles and Sun positions. 
Our results, based on only three to four satellite images for training, demonstrate that BRDF-NeRF can effectively synthesize novel views from directions far removed from the training data and produce high-quality digital surface models (DSMs).	翻訳日:2024-09-24 13:39:07 公開日:2024-09-22
# Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression ( http://arxiv.org/abs/2409.11718v2 ) License: see the linked page	Yuan Tian, Guo Lu, Guangtao Zhai	Unsupervised video semantic compression (UVSC), i.e., compressing videos to better support various analysis tasks, has recently garnered attention. However, the semantic richness of previous methods remains limited, due to the single semantic learning objective, limited training data, etc. To address this, we propose to boost the UVSC task by absorbing the off-the-shelf rich semantics from visual foundation models (VFMs). Specifically, we introduce a VFMs-shared semantic alignment layer, complemented by VFM-specific prompts, to flexibly align semantics between the compressed video and various VFMs. This allows different VFMs to collaboratively build a mutually-enhanced semantic space, guiding the learning of the compression model. Moreover, we introduce a dynamic trajectory-based inter-frame compression scheme, which first estimates the semantic trajectory based on the historical content, and then traverses along the trajectory to predict the future semantics as the coding context. This reduces the overall bit cost of the system, further improving the compression efficiency. Our approach outperforms previous coding methods on three mainstream tasks and six datasets.	Translated: 2024-09-24 11:55:37 Published: 2024-09-22

Title

Authors

Abstract

Publication date / translation date

# Zero Knowledge Games ( http://arxiv.org/abs/2009.13521v7 )

License: see the linked page

Ian Malloy

In this paper we model a game such that all strategies are non-revealing, with imperfect recall and incomplete information. We also introduce a modified sliding-block code as a linear transformation which generates common knowledge of how informed a player is under public announcements. Ultimately, we see that, between two players or two coalitions, zero-knowledge games in which both players are informed have the utility of trust established in the mixed-strategy Nash equilibrium. A zero-knowledge game is one of trust and soundness, placing utility in being informed. For any player who may be uninformed, such players reveal they are uninformed. The "will to verify" may be eroded such that the claimant is never held responsible for their repeated false claims or being uninformed.

Translated: 2024-11-09 15:57:56 Published: 2024-09-22

# A New Simple Vision Algorithm for Detecting the Enzymic Browning Defects in Golden Delicious Apples ( http://arxiv.org/abs/2110.03574v2 )

License: see the linked page

Hamid Majidi Balanji

In this work, a simple vision algorithm is designed and implemented to extract and identify the surface defects on Golden Delicious apples caused by the enzymic browning process. Thirty-four Golden Delicious apples were selected for the experiments, of which 17 had enzymic browning defects and the other 17 were sound. The image processing part of the proposed vision algorithm extracted the defective surface area of the apples with a high accuracy of 97.15%. The area and mean of the segmented images were selected as a 2x1 feature vector to feed into a designed artificial neural network. The analysis based on the above features indicated that the images with a mean less than 0.0065 did not belong to the defective apples; rather, they were extracted as part of the calyx and stem of the healthy apples. The classification accuracy of the neural network applied in this study was 99.19%.

Translated: 2024-11-09 15:57:56 Published: 2024-09-22
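The two features named in the abstract above (area and mean of a segmented image) can be sketched as follows. This is only an illustration: the abstract does not define how area and mean are computed, so the definitions below (fraction of non-zero pixels, and average intensity over all pixels of an image scaled to [0, 1]) are assumptions, as are the function names. Only the 0.0065 threshold comes from the abstract.

```python
from typing import List, Tuple

# Grayscale image with intensities in [0, 1]; 0 is treated as background.
Image = List[List[float]]

# Mean value quoted in the abstract as separating calyx/stem regions
# of healthy apples from real defects.
MEAN_THRESHOLD = 0.0065


def area_and_mean(segmented: Image) -> Tuple[float, float]:
    """Return the (area, mean) feature pair for one segmented image.

    Assumed definitions (not taken from the paper):
      area = fraction of pixels that are non-zero
      mean = average intensity over all pixels
    """
    pixels = [p for row in segmented for p in row]
    if not pixels:
        raise ValueError("empty image")
    area = sum(1 for p in pixels if p > 0.0) / len(pixels)
    mean = sum(pixels) / len(pixels)
    return area, mean


def likely_calyx_or_stem(segmented: Image) -> bool:
    """True when the image mean is below the threshold from the abstract."""
    _, mean = area_and_mean(segmented)
    return mean < MEAN_THRESHOLD
```

In the paper the feature pair is fed to a neural network; the threshold check here only mirrors the single observation the abstract reports about low-mean images.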

# Privacy-Preserving Logistic Regression Training with A Faster Gradient Variant ( http://arxiv.org/abs/2201.10838v9 )

License: see the linked page

John Chiang

Training logistic regression over encrypted data has been a compelling approach in addressing security concerns for several years. In this paper, we introduce an efficient gradient variant, called the quadratic gradient, for privacy-preserving logistic regression training. We enhance Nesterov's Accelerated Gradient (NAG), Adaptive Gradient Algorithm (Adagrad) and Adam algorithms by incorporating their quadratic gradients and evaluate these improved algorithms on various datasets. Experimental results demonstrate that the enhanced algorithms achieve significantly improved convergence speed compared to traditional first-order gradient methods. Moreover, we applied the enhanced NAG method to implement homomorphic logistic regression training, achieving comparable results within just 4 iterations. There is a good chance that the quadratic gradient approach could integrate first-order gradient descent/ascent algorithms with the second-order Newton-Raphson methods, and that it could be applied to a wide range of numerical optimization problems.

Translated: 2024-11-09 15:46:48 Published: 2024-09-22
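The abstract does not spell out how the quadratic gradient is built, so the following is a hedged sketch of one plausible reading, not the paper's definition: the ordinary gradient is rescaled per coordinate by the inverse row sums of absolute values of a fixed Hessian bound, here taken to be one quarter of X^T X. That construction, the step size of 1, the `eps` constant, and all function names are assumptions made for illustration. The sketch runs on plaintext data and involves no encryption.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(X: Matrix, y: Vector, w: Vector) -> float:
    # Sum of y*z - log(1 + exp(z)) with labels y in {0, 1}, written stably.
    total = 0.0
    for row, label in zip(X, y):
        z = sum(a * b for a, b in zip(row, w))
        total += label * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def gradient(X: Matrix, y: Vector, w: Vector) -> Vector:
    # Ordinary first-order gradient of the log-likelihood.
    g = [0.0] * len(w)
    for row, label in zip(X, y):
        z = sum(a * b for a, b in zip(row, w))
        err = label - sigmoid(z)
        for j, a in enumerate(row):
            g[j] += err * a
    return g


def diagonal_scaling(X: Matrix, eps: float = 1e-8) -> Vector:
    # Assumed construction: 1 / (eps + sum_k |H_jk|) with H = X^T X / 4.
    d = len(X[0])
    scale = []
    for j in range(d):
        row_sum = 0.0
        for k in range(d):
            row_sum += abs(0.25 * sum(row[j] * row[k] for row in X))
        scale.append(1.0 / (eps + row_sum))
    return scale


def train(X: Matrix, y: Vector, iterations: int) -> Tuple[Vector, List[float]]:
    # Gradient ascent using the rescaled gradient with step size 1.
    w = [0.0] * len(X[0])
    scale = diagonal_scaling(X)
    history = [log_likelihood(X, y, w)]
    for _ in range(iterations):
        g = gradient(X, y, w)
        w = [wj + sj * gj for wj, sj, gj in zip(w, scale, g)]
        history.append(log_likelihood(X, y, w))
    return w, history
```

With this particular scaling, the diagonal matrix dominates the Hessian bound, so each step cannot decrease the log-likelihood; that property is what makes a fixed step size of 1 a reasonable choice in the sketch.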

# A Coding Framework and Benchmark towards Low-Bitrate Video Understanding ( http://arxiv.org/abs/2202.02813v3 )

License: see the linked page

Yuan Tian, Guo Lu, Yichao Yan, Guangtao Zhai, Li Chen, Zhiyong Gao

Video compression is indispensable to most video analysis systems. Despite saving transportation bandwidth, it also deteriorates downstream video understanding tasks, especially at low-bitrate settings. To systematically investigate this problem, we first thoroughly review the previous methods, revealing that three principles, i.e., task-decoupled, label-free, and data-emerged semantic prior, are critical to a machine-friendly coding framework but are not fully satisfied so far. In this paper, we propose a traditional-neural mixed coding framework that simultaneously fulfills all these principles, by taking advantage of both traditional codecs and neural networks (NNs). On one hand, the traditional codecs can efficiently encode the pixel signal of videos but may distort the semantic information. On the other hand, highly non-linear NNs are proficient in condensing video semantics into a compact representation. The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved w.r.t. the coding procedure, which is spontaneously learned from unlabeled data in a self-supervised manner. The videos collaboratively decoded from two streams (codec and NN) are semantically rich as well as visually photo-realistic, empirically boosting the performance of several mainstream downstream video analysis tasks without any post-adaptation procedure. Furthermore, by introducing the attention mechanism and adaptive modeling scheme, the video semantic modeling ability of our approach is further enhanced. Finally, we build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach. All code, data, and models will be available at https://github.com/tianyuan168326/VCS-Pytorch.

Translated: 2024-11-09 15:46:48 Published: 2024-09-22

# Counterfactual inference for sequential experiments ( http://arxiv.org/abs/2202.06891v4 )

License: see the linked page

Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah

We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grow to $\infty$ together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial HeartSteps.

Translated: 2024-11-09 15:46:48 Published: 2024-09-22
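A nearest-neighbors estimate of a counterfactual mean, as mentioned in the abstract above, might look roughly like the sketch below. The abstract only says "a variant of nearest neighbors", so the distance (mean squared difference over time points where both units received the treatment of interest), the threshold `eta`, and the function name are assumptions made for illustration rather than the authors' estimator.

```python
from typing import List, Optional


def nn_counterfactual_mean(
    Y: List[List[float]],   # Y[i][t]: observed outcome of unit i at time t
    A: List[List[int]],     # A[i][t]: treatment given to unit i at time t
    unit: int,
    time: int,
    treatment: int,
    eta: float,             # hypothetical distance threshold for neighbors
) -> Optional[float]:
    """Estimate the mean outcome of `unit` at `time` under `treatment`.

    Neighbors are other units that (a) received `treatment` at `time` and
    (b) have outcomes close to those of `unit` on the other time points
    where both units received `treatment`.
    """
    neighbour_outcomes = []
    n_times = len(Y[unit])
    for j in range(len(Y)):
        if j == unit or A[j][time] != treatment:
            continue
        shared = [
            s for s in range(n_times)
            if s != time and A[unit][s] == treatment and A[j][s] == treatment
        ]
        if not shared:
            continue  # no overlap, so the distance cannot be computed
        dist = sum((Y[unit][s] - Y[j][s]) ** 2 for s in shared) / len(shared)
        if dist <= eta:
            neighbour_outcomes.append(Y[j][time])
    if not neighbour_outcomes:
        return None  # no usable neighbor for this unit and time
    return sum(neighbour_outcomes) / len(neighbour_outcomes)
```

On noise-free data generated from a product of a unit factor and a time factor, units sharing the target's unit factor are at distance zero, so their outcomes at the target time recover the missing counterfactual value.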

# Discovering stochastic dynamical equations from biological time series data ( http://arxiv.org/abs/2205.02645v6 )

License: see the linked page

Arshed Nabeel, Ashwin Karichannavar, Shuaib Palathingal, Jitesh Jhawar, David B. Brückner, Danny Raj M., Vishwesha Guttal

Theoretical studies have shown that stochasticity can affect the dynamics of ecosystems in counter-intuitive ways. However, without knowing the equations governing the dynamics of populations or ecosystems, it is difficult to ascertain the role of stochasticity in real datasets. Therefore, the inverse problem of inferring the governing stochastic equations from datasets is important. Here, we present an equation discovery methodology that takes time series data of state variables as input and outputs a stochastic differential equation. We achieve this by combining traditional approaches from stochastic calculus with equation-discovery techniques. We demonstrate the generality of the method via several applications. First, we deliberately choose various stochastic models with fundamentally different governing equations that nevertheless produce nearly identical steady-state distributions. We show that we can recover the correct underlying equations, and thus infer the structure of their stability, accurately from the analysis of time series data alone. We demonstrate our method on two real-world datasets -- fish schooling and single-cell migration -- which have vastly different spatiotemporal scales and dynamics. We illustrate various limitations and potential pitfalls of the method and how to overcome them via diagnostic measures. Finally, we provide our open-source codes via a package named PyDaDDy (Python library for Data Driven Dynamics).

Translated: 2024-11-09 15:46:48 Published: 2024-09-22
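To make the idea of recovering a stochastic differential equation from a time series concrete, here is a minimal standard-library sketch. It does not use PyDaDDy and is not the authors' pipeline: it assumes a scalar process with linear drift and constant diffusion, estimates the drift by least squares on the scaled increments, and estimates the diffusion from the squared residual increments. The simulated Ornstein-Uhlenbeck data and all function names are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple


def simulate_ou(n_steps: int, dt: float, theta: float, sigma: float,
                seed: int = 0) -> List[float]:
    """Euler-Maruyama simulation of dx = -theta * x dt + sigma dW."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n_steps):
        prev = x[-1]
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x.append(prev - theta * prev * dt + noise)
    return x


def fit_linear_drift(x: List[float], dt: float) -> Tuple[float, float]:
    """Least-squares fit of (x[i+1] - x[i]) / dt to a + b * x[i]."""
    xs = x[:-1]
    ys = [(x[i + 1] - x[i]) / dt for i in range(len(x) - 1)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_x) ** 2 for u in xs)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_diffusion(x: List[float], dt: float, a: float, b: float) -> float:
    """Estimate sigma**2 as the mean squared residual increment over dt."""
    residuals = [
        (x[i + 1] - x[i]) - (a + b * x[i]) * dt for i in range(len(x) - 1)
    ]
    return sum(r * r for r in residuals) / (len(residuals) * dt)
```

For data simulated with `theta = 1.0` and `sigma = 0.5`, the fitted slope should land near -1 and the diffusion estimate near 0.25, up to sampling error. Real data would typically need richer function libraries and the diagnostic checks the abstract mentions.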

# Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels ( http://arxiv.org/abs/2209.13476v6 )

License: see the linked page

Chenyu You, Weicheng Dai, Fenglin Liu, Yifei Min, Nicha C. Dvornek, Xiaoxiao Li, David A. Clifton, Lawrence Staib, James S. Duncan

Recent studies on contrastive learning have achieved remarkable performance solely by leveraging few labels in the context of medical image segmentation. Existing methods mainly focus on instance discrimination and invariant mapping. However, they face three common pitfalls: (1) tailness: medical image data usually follows an implicit long-tail class distribution. Blindly leveraging all pixels in training can hence lead to data imbalance issues and degrade performance; (2) consistency: it remains unclear whether a segmentation model has learned meaningful and yet consistent anatomical features due to the intra-class variations between different anatomical features; and (3) diversity: the intra-slice correlations within the entire dataset have received significantly less attention. This motivates us to seek a principled approach for strategically making use of the dataset itself to discover similar yet distinct samples from different anatomical views. In this paper, we introduce a novel semi-supervised 2D medical image segmentation framework termed Mine yOur owN Anatomy (MONA), and make three contributions. First, prior work argues that every pixel equally matters to the model training; we observe empirically that this alone is unlikely to define meaningful anatomical features, mainly because the supervision signal is lacking. We show two simple solutions towards learning invariances - through the use of stronger data augmentations and nearest neighbors. Second, we construct a set of objectives that encourage the model to be capable of decomposing medical images into a collection of anatomical features in an unsupervised manner. Lastly, we demonstrate, both empirically and theoretically, the efficacy of MONA on three benchmark datasets with different labeled settings, achieving new state-of-the-art under different labeled semi-supervised settings.

Translated: 2024-11-09 15:46:48 Published: 2024-09-22

# FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations ( http://arxiv.org/abs/2209.14399v3 )

License: see the linked page

Marie Siew, Shikhar Sharma, Zekai Li, Kun Guo, Chao Xu, Tania Lorido-Botran, Tony Q. S. Quek, Carlee Joe-Wong

In edge computing, users' service profiles are migrated due to user mobility. Reinforcement learning (RL) frameworks have been proposed to do so, often trained on simulated data. However, existing RL frameworks overlook occasional server failures, which, although rare, impact latency-sensitive applications like autonomous driving and real-time obstacle detection. Nevertheless, these failures (rare events), being not adequately represented in historical training data, pose a challenge for data-driven RL algorithms. As it is impractical to adjust failure frequency in real-world applications for training, we introduce FIRE, a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. FIRE considers delay, migration, failure, and backup placement costs across individual and shared service profiles. We prove ImRE's boundedness and convergence to optimality. Next, we introduce novel deep Q-learning (ImDQL) and actor-critic (ImACRE) versions of our algorithm to enhance scalability. We extend our framework to accommodate users with varying risk tolerances. Through trace-driven experiments, we show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.

Translated: 2024-11-09 15:35:37 Published: 2024-09-22
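The phrase "samples rare events proportionally to their impact" can be illustrated with plain importance sampling on a one-step cost. The sketch below is not ImRE and involves no Q-learning: it only shows how drawing failures more often than they really occur, then reweighting each sample by the ratio of true to sampling probability, keeps the cost estimate unbiased. The function names and the numbers in the usage note are hypothetical.

```python
import random


def impact_proportional_q(p_fail: float, cost_ok: float,
                          cost_fail: float) -> float:
    """Sampling probability of failure proportional to probability * cost."""
    fail_impact = p_fail * cost_fail
    ok_impact = (1.0 - p_fail) * cost_ok
    return fail_impact / (fail_impact + ok_impact)


def estimate_expected_cost(n_samples: int, p_fail: float, q_fail: float,
                           cost_ok: float, cost_fail: float,
                           seed: int = 0) -> float:
    """Importance-sampling estimate of the expected cost under p_fail.

    Failures are drawn with probability q_fail instead of p_fail, and each
    sample is reweighted by the likelihood ratio of the two distributions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        if rng.random() < q_fail:
            weight = p_fail / q_fail
            cost = cost_fail
        else:
            weight = (1.0 - p_fail) / (1.0 - q_fail)
            cost = cost_ok
        total += weight * cost
    return total / n_samples
```

For example, with a true failure probability of 0.001, a normal cost of 1 and a failure cost of 1000, the true expected cost is 1.999. Sampling naively would rarely see a failure at all, whereas the impact-proportional sampling probability (about 0.5 here) makes every weighted sample equal to the true expectation.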

# Superpixel perception graph neural network for intelligent defect detection of aero-engine blade ( http://arxiv.org/abs/2210.07539v2 )

License: see the linked page

Hongbing Shang, Qixiu Yang, Chuang Sun, Xuefeng Chen, Ruqiang Yan

The aero-engine is the core component of aircraft and other spacecraft. The high-speed rotating blades provide power by sucking in air and fully combusting, and various defects will inevitably occur, threatening the operational safety of the aero-engine. Therefore, regular inspections are essential for such a complex system. However, the existing traditional technology, borescope inspection, is labor-intensive, time-consuming, and experience-dependent. To endow this technology with intelligence, a novel superpixel perception graph neural network (SPGNN) is proposed by utilizing a multi-stage graph convolutional network (MSGCN) for feature extraction and a superpixel perception region proposal network (SPRPN) for region proposal. First, to capture complex and irregular textures, the images are transformed into a series of patches to obtain their graph representations. Then, the MSGCN, composed of several GCN blocks, extracts graph structure features and performs graph information processing at the graph level. Last but not least, the SPRPN is proposed to generate perceptual bounding boxes by fusing graph representation features and superpixel perception features. Therefore, the proposed SPGNN always implements feature extraction and information transmission at the graph level in the whole SPGNN pipeline, to alleviate the reduction of receptive field and information loss. To verify the effectiveness of SPGNN, we construct a simulated blade dataset with 3000 images. A public aluminum dataset is also used to validate the performance of different methods. The experimental results demonstrate that the proposed SPGNN has superior performance compared with the state-of-the-art methods.

Translated: 2024-11-09 15:35:37 Published: 2024-09-22

# Learning New Tasks from a Few Examples with Soft-Label Prototypes ( http://arxiv.org/abs/2210.17437v4 )

License: see the linked page

Avyav Kumar Singh, Ekaterina Shutova, Helen Yannakoudakis

Existing approaches to few-shot learning in NLP rely on large language models (LLMs) and/or fine-tuning of these to generalise on out-of-distribution data. In this work, we propose a novel few-shot learning approach based on soft-label prototypes (SLPs) designed to collectively capture the distribution of different classes across the input domain space. We focus on learning previously unseen NLP tasks from very few examples (4, 8, 16) per class and experimentally demonstrate that our approach achieves superior performance on the majority of tested tasks in this data-lean setting while being highly parameter efficient. We also show that our few-shot adaptation method can be integrated into more generalised learning settings, primarily meta-learning, to yield superior performance against strong baselines.

Translated: 2024-11-09 15:35:37 Published: 2024-09-22
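The notion of a soft-label prototype can be made concrete with a toy classifier. The abstract does not describe how prototypes are built or queried, so the decision rule below (sum the prototypes' class distributions, weighted by inverse distance to the query) is an assumption chosen for illustration, as are the names and the one-dimensional example. It shows the key property that two prototypes can jointly represent three classes.

```python
import math
from typing import List, Sequence, Tuple

# A prototype is a point in input space plus a distribution over classes.
Prototype = Tuple[Sequence[float], Sequence[float]]


def classify(query: Sequence[float], prototypes: List[Prototype],
             eps: float = 1e-12) -> int:
    """Return the class with the largest distance-weighted soft-label mass."""
    n_classes = len(prototypes[0][1])
    scores = [0.0] * n_classes
    for location, soft_label in prototypes:
        distance = math.sqrt(
            sum((q - p) ** 2 for q, p in zip(query, location))
        )
        weight = 1.0 / (distance + eps)  # closer prototypes count more
        for c in range(n_classes):
            scores[c] += weight * soft_label[c]
    return max(range(n_classes), key=lambda c: scores[c])


# Two hypothetical prototypes on a line that together encode three classes.
PROTOTYPES: List[Prototype] = [
    ([0.0], [0.6, 0.4, 0.0]),
    ([1.0], [0.0, 0.4, 0.6]),
]
```

Queries near 0 fall in the first class, queries near 1 in the third, and queries around the midpoint in the second, even though no prototype is dedicated to the second class. In the paper the inputs would be learned text representations rather than points on a line.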

# An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture ( http://arxiv.org/abs/2211.06860v2 )

License: see the linked page

C G Krishnanunni, Tan Bui-Thanh

This work presents a two-stage adaptive framework for progressively developing deep neural network (DNN) architectures that generalize well for a given training data set. In the first stage, a layerwise training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers. We impose desirable structures on the DNN by employing manifold regularization, sparsity regularization, and physics-informed terms. We introduce an epsilon-delta stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields an epsilon-delta stability-promoting algorithm. Further, we also derive the necessary conditions for the trainability of a newly added layer and investigate the training saturation problem. In the second stage of the algorithm (post-processing), a sequence of shallow networks is employed to extract information from the residual produced in the first stage, thereby improving the prediction accuracy. Numerical investigations on prototype regression and classification problems demonstrate that the proposed approach can outperform fully connected DNNs of the same size. Moreover, by equipping the physics-informed neural network (PINN) with the proposed adaptive architecture strategy to solve partial differential equations, we numerically show that adaptive PINNs not only are superior to standard PINNs but also produce interpretable hidden layers with provable stability. We also apply our architecture design strategy to solve inverse problems governed by elliptic partial differential equations.

Translated: 2024-11-09 15:35:37 Published: 2024-09-22

# Z-SSMNet: Zonal-aware Self-supervised Mesh Network for Prostate Cancer Detection and Diagnosis with Bi-parametric MRI ( http://arxiv.org/abs/2212.05808v2 )

License: see the linked page

Yuan Yuan, Euijoon Ahn, Dagan Feng, Mohamad Khadra, Jinman Kim

Bi-parametric magnetic resonance imaging (bpMRI) has become a pivotal modality in the detection and diagnosis of clinically significant prostate cancer (csPCa). Developing AI-based systems to identify csPCa using bpMRI can transform PCa management by improving efficiency and cost-effectiveness. However, current state-of-the-art methods using convolutional neural networks (CNNs) are limited in learning in-plane and three-dimensional spatial information from anisotropic images. Their performance also depends on the availability of large, diverse, and well-annotated bpMRI datasets. We propose a Zonal-aware Self-supervised Mesh Network (Z-SSMNet) that adaptively integrates multi-dimensional (2D/2.5D/3D) convolutions to learn dense intra-slice information and sparse inter-slice information of the anisotropic bpMRI in a balanced manner. A self-supervised learning (SSL) technique is proposed to pre-train our network using large-scale unlabeled data to learn the appearance, texture, and structure semantics of bpMRI. It aims to capture both intra-slice and inter-slice information during the pre-training stage. Furthermore, we constrained our network to focus on the zonal anatomical regions to further improve the detection and diagnosis capability of csPCa. We conducted extensive experiments on the PI-CAI dataset comprising 10,000+ multi-center and multi-scanner data. Z-SSMNet excelled in both lesion-level detection (AP score of 0.633) and patient-level diagnosis (AUROC score of 0.881), securing the top position in the Open Development Phase of the PI-CAI challenge. It maintained strong performance in the Closed Testing Phase, achieving an AP score of 0.690 and an AUROC score of 0.909 and securing the second-place ranking.

Translated: 2024-11-09 15:35:37 Published: 2024-09-22

# Nested Dirichlet models for unsupervised attack pattern detection in honeypot data ( http://arxiv.org/abs/2301.02505v3 )

License: see the linked page

Francesco Sanna Passino, Anastasia Mantziou, Daniyar Ghani, Philip Thiede, Ross Bevington, Nicholas A. Heard

Cyber-systems are under near-constant threat from intrusion attempts. Attack types vary, but each attempt typically has a specific underlying intent, and the perpetrators are typically groups of individuals with similar objectives. Clustering attacks appearing to share a common intent is very valuable to threat-hunting experts. This article explores Dirichlet distribution topic models for clustering terminal session commands collected from honeypots, which are special network hosts designed to entice malicious attackers. The main practical implications of clustering the sessions are two-fold: finding similar groups of attacks, and identifying outliers. A range of statistical models are considered, adapted to the structures of command-line syntax. In particular, concepts of primary and secondary topics, and then session-level and command-level topics, are introduced into the models to improve interpretability. The proposed methods are further extended in a Bayesian nonparametric fashion to allow unboundedness in the vocabulary size and the number of latent intents. The methods are shown to discover an unusual MIRAI variant which attempts to take over existing cryptocurrency coin-mining infrastructure, not detected by traditional topic-modelling approaches.

Translated: 2024-11-09 15:24:36 Published: 2024-09-22

# Cesno: 新しいプログラミング言語の初期設計

Cesno: The Initial Design of a New Programming Language ( http://arxiv.org/abs/2303.15750v4 )

ライセンス: Link先を確認

Ozelot Vanilla, Jingxiang Yu, Hemn Barzan Abdalla,

(参考訳) プログラミング言語は非常に多彩で、開発者は個々の要件に合ったアプリケーションやプログラムを作成できます。この記事では、高度でユーザフレンドリで使いやすいプログラミング環境を提供するためにゼロから設計された、Cesnoという新しい言語を紹介します。 Cesnoの構文は他の人気のある言語と似ているため、学習と作業が簡単になる。構文シュガー、組み込みライブラリ、関数型プログラミングのサポート、オブジェクト指向プログラミング、動的型付け、型システム、さまざまな関数パラメータと制約など、他の言語の機能が含まれている。この記事では、Cesnoの文法の設計について検討し、Cesnoがどのようにコードを処理し、コンパイルするかを概観し、Cesnoのコードがどのようなもので、どのように開発に役立てるかを検証します。

Programming languages are incredibly versatile, enabling developers to create applications and programs that suit their individual requirements. This article introduces a new language called Cesno, designed from the ground up to offer an advanced, user-friendly, and easy-to-use programming environment. Cesno's syntax is similar to other popular languages, making it simple to learn and work with. It incorporates features from other languages, such as syntactic sugar, a built-in library, support for functional programming, object-oriented programming, dynamic typing, a type system, and a variety of function parameters and restrictions. This article will explore the design of Cesno's grammar, provide a brief overview of how Cesno processes and compiles code, and provide examples of what Cesno's code looks like and how it can aid in development.

翻訳日:2024-11-09 15:24:36 公開日:2024-09-22

# 確率的エージェントドロップアウト下におけるマルチエージェントMDPのモデル自由学習と最適ポリシー設計

Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout ( http://arxiv.org/abs/2304.12458v2 )

ライセンス: Link先を確認

Carmel Fiscko, Soummya Kar, Bruno Sinopoli,

(参考訳) 本研究では,エージェントドロップアウトを行うマルチエージェントマルコフ決定プロセス(MDP)と,事前ドロップアウトシステムの制御とサンプリングに基づくポストドロップアウトシステムのポリシーの計算について検討する。中央プランナーの目的は、エージェントのドロップアウト確率の事前知識が与えられた場合、期待されるシステムの価値を最大化する最適なポリシーを見つけることである。特定の遷移独立性と報酬分離性構造を持つMDPに対して、システムからエージェントを取り除くことは、新しい状態と行動空間を持つ残りのエージェントと、除去されたエージェントを周辺化する遷移ダイナミクスと、除去されたエージェントとは独立な報酬からなる新しいMDPを形成すると仮定する。まず、これらの仮定の下で、ドロップアウト後の期待システムの価値が単一のMDPで表現できることを示す。この「ロバストMDP」により、システムの$2^N$通りの実現をすべて評価する必要がなくなる（$N$はエージェント数を表す）。さらに、モデルフリーの文脈では、ロバストなMDP値を事前ドロップアウトシステムによって生成されたサンプルで推定できることが示され、つまり、ドロップアウトが起こる前にロバストなポリシーを見つけることができる。この事実は、ドロップアウトシナリオに対するポリシー評価を行うためのポリシー重要度サンプリング(IS)ルーチンの提案に利用され、既存のシステムを適切な事前ドロップアウトポリシーで制御する。ポリシーISルーチンは、ロバストMDPと特定のポストドロップアウトシステムの実現の両方に対して値推定を生成し、指数的信頼境界で正当化される。最後に、このアプローチの有用性をシミュレーションで検証し、エージェントのドロップアウトの構造的特性が、ドロップアウトが起こる前にコントローラが優れたポストドロップアウトポリシーを見つけるのにどう役立つかを示す。

This work studies a multi-agent Markov decision process (MDP) that can undergo agent dropout and the computation of policies for the post-dropout system based on control and sampling of the pre-dropout system. The central planner's objective is to find an optimal policy that maximizes the value of the expected system given a priori knowledge of the agents' dropout probabilities. For MDPs with a certain transition independence and reward separability structure, we assume that removing agents from the system forms a new MDP comprised of the remaining agents with new state and action spaces, transition dynamics that marginalize the removed agents, and rewards that are independent of the removed agents. We first show that under these assumptions, the value of the expected post-dropout system can be represented by a single MDP; this "robust MDP" eliminates the need to evaluate all $2^N$ realizations of the system, where N denotes the number of agents. More significantly, in a model-free context, it is shown that the robust MDP value can be estimated with samples generated by the pre-dropout system, meaning that robust policies can be found before dropout occurs. This fact is used to propose a policy importance sampling (IS) routine that performs policy evaluation for dropout scenarios while controlling the existing system with good pre-dropout policies. The policy IS routine produces value estimates for both the robust MDP and specific post-dropout system realizations and is justified with exponential confidence bounds. Finally, the utility of this approach is verified in simulation, showing how structural properties of agent dropout can help a controller find good post-dropout policies before dropout occurs.
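
To make the $2^N$ count in the abstract concrete, the following minimal Python sketch enumerates every dropout realization of a small system. It assumes, purely for illustration, that agents drop out independently with known probabilities; the abstract only states that dropout probabilities are known a priori. The function name `dropout_realizations` and the example probabilities are hypothetical and do not come from the paper.

```python
from itertools import product


def dropout_realizations(drop_probs):
    """Enumerate all 2^N dropout patterns together with their probabilities.

    drop_probs[i] is the assumed probability that agent i drops out.
    Independence across agents is an assumption of this sketch.
    Returns a list of (pattern, probability) pairs, where pattern[i] is
    True if agent i remains in the system.
    """
    realizations = []
    for pattern in product([True, False], repeat=len(drop_probs)):
        prob = 1.0
        for remains, p_drop in zip(pattern, drop_probs):
            prob *= (1.0 - p_drop) if remains else p_drop
        realizations.append((pattern, prob))
    return realizations


# Hypothetical three-agent example: the list has 2**3 entries.
example = dropout_realizations([0.1, 0.2, 0.5])
```

The list length doubles with every added agent, which is the cost that evaluating a single "robust MDP" is said to avoid.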

翻訳日:2024-11-09 15:13:22 公開日:2024-09-22

# CADGE: グラフ構造化知識集約による文脈認識対話生成

CADGE: Context-Aware Dialogue Generation Enhanced with Graph-Structured Knowledge Aggregation ( http://arxiv.org/abs/2305.06294v4 )

ライセンス: Link先を確認

Hongbo Zhang, Chen Tang, Tyler Loakman, Bohao Yang, Stefan Goetze, Chenghua Lin,

(参考訳) 常識知識は多くの自然言語処理タスクに不可欠である。既存の研究は通常、グラフ知識を従来のグラフニューラルネットワーク(GNN)に組み込む。しかし、この区画化は、これらの2種類の入力知識間の文脈的相互作用を完全に活用するわけではない。本稿では,文脈対応グラフアテンションモデル (Context-aware graph-attention model) を提案する。具体的には、フラットなグラフ知識とテキストデータとを融合させることにより、不均一な特徴を調和させる表現学習に革新的なアプローチを採用する。コンテクスト情報によって補完される連結部分グラフにおけるグラフ知識集約の階層的適用により、コモンセンス駆動対話の生成を促進する。実験により,本フレームワークは従来のGNNベース言語モデルよりも性能が優れていることが示された。自動評価と人的評価の両面から,提案モデルのフローベースラインに対する性能向上が確認できた。

Commonsense knowledge is crucial to many natural language processing tasks. Existing works usually incorporate graph knowledge with conventional graph neural networks (GNNs), resulting in a sequential pipeline that compartmentalizes the encoding processes for textual and graph-based knowledge. However, this compartmentalization does not fully exploit the contextual interplay between these two types of input knowledge. In this paper, a novel context-aware graph-attention model (Context-aware GAT) is proposed, designed to effectively assimilate global features from relevant knowledge graphs through a context-enhanced knowledge aggregation mechanism. Specifically, the proposed framework employs an innovative approach to representation learning that harmonizes heterogeneous features by amalgamating flattened graph knowledge with text data. The hierarchical application of graph knowledge aggregation within connected subgraphs, complemented by contextual information, to bolster the generation of commonsense-driven dialogues is analyzed. Empirical results demonstrate that our framework outperforms conventional GNN-based language models. Both automated and human evaluations affirm the significant performance enhancements achieved by our proposed model over the concept flow baseline.

翻訳日:2024-11-09 15:13:22 公開日:2024-09-22

# 脳腫瘍セグメンテーション(BraTS)課題 : 塗布による健康な脳組織の局所的合成

The Brain Tumor Segmentation (BraTS) Challenge: Local Synthesis of Healthy Brain Tissue via Inpainting ( http://arxiv.org/abs/2305.08992v3 )

ライセンス: Link先を確認

Florian Kofler, Felix Meissen, Felix Steinbauer, Robert Graf, Stefan K Ehrlich, Annika Reinke, Eva Oswald, Diana Waldmannstetter, Florian Hoelzl, Izabela Horvath, Oezguen Turgut, Suprosanna Shit, Christina Bukas, Kaiyuan Yang, Johannes C. Paetzold, Ezequiel de da Rosa, Isra Mekki, Shankeeth Vinayahalingam, Hasan Kassem, Juexin Zhang, Ke Chen, Ying Weng, Alicia Durrer, Philippe C. Cattin, Julia Wolleb, M. S. Sadique, M. M. Rahman, W. Farzana, A. Temtam, K. M. Iftekharuddin, Maruf Adewole, Syed Muhammad Anwar, Ujjwal Baid, Anastasia Janas, Anahita Fathi Kazerooni, Dominic LaBella, Hongwei Bran Li, Ahmed W Moawad, Gian-Marco Conte, Keyvan Farahani, James Eddy, Micah Sheller, Sarthak Pati, Alexandros Karagyris, Alejandro Aristizabal, Timothy Bergquist, Verena Chung, Russell Takeshi Shinohara, Farouk Dako, Walter Wiggins, Zachary Reitman, Chunhao Wang, Xinyang Liu, Zhifan Jiang, Elaine Johanson, Zeke Meier, Ariana Familiar, Christos Davatzikos, John Freymann, Justin Kirby, Michel Bilello, Hassan M Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Rivka R Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko, Arash Nazeri, Marc-André Weber, Abhishek Mahajan, Suyash Mohan, John Mongan, Christopher Hess, Soonmee Cha, Javier Villanueva-Meyer, Errol Colak, Priscila Crivellaro, Andras Jakab, Abiodun Fatade, Olubukola Omidiji, Rachel Akinola Lagos, O O Olatunji, Goldey Khanna, John Kirkpatrick, Michelle Alonso-Basanta, Arif Rashid, Miriam Bornhorst, Ali Nabavizadeh, Natasha Lepore, Joshua Palmer, Antonio Porras, Jake Albrecht, Udunna Anazodo, Mariam Aboian, Evan Calabrese, Jeffrey David Rudie, Marius George Linguraru, Juan Eugenio Iglesias, Koen Van Leemput, Spyridon Bakas, Benedikt Wiestler, Ivan Ezhov, Marie Piraud, Bjoern H Menze,

(参考訳) 脳MR画像の自動解析のための無数のアルゴリズムが、臨床医の意思決定を支援するために利用可能である。脳腫瘍患者の場合、画像取得の時系列は通常、すでに病理的なスキャンから始まる。多くのアルゴリズムは健康な脳を解析し、病変を特徴とする画像の保証を提供しない。例えば、脳解剖学のパーセレーション、組織セグメンテーション、脳抽出のアルゴリズムがある。このジレンマを解決するために,BraTS塗装の課題を紹介する。そこで参加者は、損傷した脳から健康な脳スキャンを合成するための塗装技術を探る。下記の原稿にはタスクの定式化、データセット、提出手順が含まれている。その後、課題の調査結果をまとめるために更新される。この挑戦はASNR-BraTS MICCAIチャレンジの一部として組織されている。

A myriad of algorithms for the automatic analysis of brain MR images is available to support clinicians in their decision-making. For brain tumor patients, the image acquisition time series typically starts with an already pathological scan. This poses problems, as many algorithms are designed to analyze healthy brains and provide no guarantee for images featuring lesions. Examples include, but are not limited to, algorithms for brain anatomy parcellation, tissue segmentation, and brain extraction. To solve this dilemma, we introduce the BraTS inpainting challenge. Here, the participants explore inpainting techniques to synthesize healthy brain scans from lesioned ones. The following manuscript contains the task formulation, dataset, and submission procedure. Later, it will be updated to summarize the findings of the challenge. The challenge is organized as part of the ASNR-BraTS MICCAI challenge.

翻訳日:2024-11-09 15:13:22 公開日:2024-09-22

# フェデレーション学習のための視覚変換器の連続的適応

Continual Adaptation of Vision Transformers for Federated Learning ( http://arxiv.org/abs/2306.09970v2 )

ライセンス: Link先を確認

Shaunak Halbe, James Seale Smith, Junjiao Tian, Zsolt Kira,

(参考訳) 本稿では,サーバがクライアントの集合と通信し,データを共有したり保存したりすることなく,新たな概念を段階的に学習する,継続的フェデレーション学習(Continual Federated Learning, CFL)という重要だが研究の進んでいない課題に焦点を当てる。この問題の複雑さは、継続学習とフェデレート学習の両方の観点からの課題によって増している。具体的には、CFLセットアップでトレーニングされたモデルは、クライアント間のデータの異質性によって悪化する破滅的忘却に悩まされる。この問題に対する既存の試みは、クライアントや通信チャネルに大きなオーバーヘッドを課す傾向にあり、あるいは保存されたデータにアクセスする必要があるため、プライバシの観点から実際の使用には適さない。本稿では,保存データへのアクセスを必要とせず,オーバーヘッドコストを最小限に抑えながら,忘却と不均一性に取り組む。本研究では,視覚変換器の文脈でこの問題を考察し,忘却を最小限に抑えながら動的分布に適応するパラメータ効率のよいアプローチを検討する。我々は、プロンプトベースのアプローチ(プロンプトとクラシファイアヘッドのみを通信すればよい)を活用し、サーバにおいてクライアントモデルを統合するための、新しくて軽量な生成と蒸留方式を提案する。我々は、画像分類についてこの問題を定式化し、比較のための強力なベースラインを確立し、CIFAR-100に加えて、ImageNet-RやDomainNetのような困難な大規模データセットで実験を行う。提案手法は,通信コストとクライアントレベルの計算コストを大幅に削減しつつ,既存手法と独自のベースラインを最大7%上回る。コードは https://github.com/shaunak27/hepco-fed で公開されている。

In this paper, we focus on the important yet understudied problem of Continual Federated Learning (CFL), where a server communicates with a set of clients to incrementally learn new concepts over time without sharing or storing any data. The complexity of this problem is compounded by challenges from both the Continual and Federated Learning perspectives. Specifically, models trained in a CFL setup suffer from catastrophic forgetting, which is exacerbated by data heterogeneity across clients. Existing attempts at this problem tend to impose large overheads on clients and communication channels or require access to stored data, which renders them unsuitable for real-world use due to privacy. In this paper, we attempt to tackle forgetting and heterogeneity while minimizing overhead costs and without requiring access to any stored data. We study this problem in the context of Vision Transformers and explore parameter-efficient approaches to adapt to dynamic distributions while minimizing forgetting. We achieve this by leveraging a prompting-based approach (such that only prompts and classifier heads have to be communicated) and proposing a novel and lightweight generation and distillation scheme to consolidate client models at the server. We formulate this problem for image classification, establish strong baselines for comparison, and conduct experiments on CIFAR-100 as well as challenging, large-scale datasets like ImageNet-R and DomainNet. Our approach outperforms both existing methods and our own baselines by as much as 7% while significantly reducing communication and client-level computation costs. Code available at https://github.com/shaunak27/hepco-fed.

翻訳日:2024-11-09 14:51:04 公開日:2024-09-22

# 恒常的ホモロジーランク関数を用いた推論の安定性

Stability for Inference with Persistent Homology Rank Functions ( http://arxiv.org/abs/2307.02904v2 )

ライセンス: Link先を確認

Qiquan Wang, Inés García-Redondo, Pierre Faugère, Gregory Henselman-Petrusek, Anthea Monod,

(参考訳) 永続ホモロジーバーコードとダイアグラムは、点雲、ネットワーク、関数など、幅広い複雑なデータ構造の「形」を捉えたトポロジ的データ解析の基盤である。しかし、その複雑な幾何学的構造のため、統計的な設定での使用は困難である。本稿では,統計と機械学習のツールとして,バーコードと永続化図に数学的に等価な永続的ホモロジーランク関数を再検討する。ランク関数は、関数であり、関数の形でデータに適合する統計の領域である、機能データ分析(FDA)の統計理論の直接的な適用を可能にする。しかし、実際にバーコードに対して提示される重要な課題は、安定性の欠如である。データの忠実な表現としての使用を検証する上で重要な特性であり、したがって実行可能な要約統計量である。本稿では,FDA 統合のための適切な基準の下で,永続的ホモロジーランク関数に対する2つの安定性結果を導出することにより,このギャップを埋める。次に、機能的推論統計学および機械学習におけるランク関数の性能を、単パラメータおよび多パラメータの永続的ホモロジーの両方において、実データアプリケーション上で研究する。階数関数によって捕捉される永続的ホモロジーの使用は、既存の非永続的アプローチよりも明らかな改善をもたらす。

Persistent homology barcodes and diagrams are a cornerstone of topological data analysis that capture the "shape" of a wide range of complex data structures, such as point clouds, networks, and functions. However, their use in statistical settings is challenging due to their complex geometric structure. In this paper, we revisit the persistent homology rank function, which is mathematically equivalent to a barcode and persistence diagram, as a tool for statistics and machine learning. Rank functions, being functions, enable the direct application of the statistical theory of functional data analysis (FDA), a domain of statistics adapted for data in the form of functions. A key challenge they present over barcodes in practice, however, is their lack of stability, a property that is crucial to validate their use as a faithful representation of the data and therefore a viable summary statistic. In this paper, we fill this gap by deriving two stability results for persistent homology rank functions under a suitable metric for FDA integration. We then study the performance of rank functions in functional inferential statistics and machine learning on real data applications, in both single and multiparameter persistent homology. We find that the use of persistent homology captured by rank functions offers a clear improvement over existing non-persistence-based approaches.
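
The equivalence between a barcode and its rank function can be illustrated with a short Python sketch for the single-parameter case. It assumes the common convention that each bar is a half-open interval [birth, death), so the rank at a pair a ≤ b counts the bars that are born by a and still alive at b. The function name `rank_function` and the example barcode are hypothetical, and the sketch is not the authors' implementation.

```python
def rank_function(barcode, a, b):
    """Evaluate the rank function of a barcode at the pair (a, b), a <= b.

    barcode is a list of (birth, death) pairs, read as half-open
    intervals [birth, death); this convention is an assumption of the
    sketch. The value is the number of bars containing [a, b].
    """
    if a > b:
        raise ValueError("the rank function is defined for a <= b")
    return sum(1 for birth, death in barcode if birth <= a and b < death)


# Hypothetical barcode with one bar that never dies.
bars = [(0.0, 3.0), (1.0, 2.0), (0.5, float("inf"))]
value = rank_function(bars, 0.5, 2.0)
```

Evaluating this function on a grid of (a, b) pairs with a ≤ b gives a discretized function, which is the kind of object functional data analysis operates on.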

翻訳日:2024-11-09 14:51:04 公開日:2024-09-22

# コードLLMのための高リソースから低リソースプログラミング言語への知識伝達

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs ( http://arxiv.org/abs/2308.09895v6 )

ライセンス: Link先を確認

Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha,

(参考訳) ここ数年、Large Language Models of Code (Code LLMs) はプログラミングの実践に大きな影響を与え始めています。プログラミング言語やソフトウェア工学の研究のためのビルディングブロックとして、コードLLMが登場している。しかし、Code LLMはトレーニングデータ(例えば、Java、Python、JavaScript)でよく表現されているが、トレーニングデータに制限のある低リソースの言語では苦労しているプログラミング言語に対して印象的な結果をもたらす。低リソース言語にはOCaml、Racket、その他いくつかのものがある。本稿では,半合成データを用いた低リソース言語上でのコードLLMの性能向上に有効な手法を提案する。我々のアプローチであるMultiPL-Tは、ハイソース言語からのトレーニングデータを、以下の方法で低リソース言語のトレーニングデータに変換する。 1) Code LLMを使用して、高ソース言語からのコメント付きコードのテストの合成を行い、欠陥のあるテストとテストカバレッジの低いコードをフィルタリングします。 2) コードLLMを使用してPythonコードをターゲットとする低リソース言語に翻訳し,テストを使用して翻訳を検証する。このアプローチを適用して,Julia,Lua,OCaml,R,Racketの各トレーニング項目を数万個生成する。さらに、オープンなトレーニングデータ(The Stack)を備えたオープンモデル(StarCoderBase)を使用することで、ベンチマークの削除や、ライセンスに違反することなくモデルをトレーニングし、それ以外の方法では不可能な実験を実行することが可能になります。 MultiPL-T 生成データを用いて,Julia,Lua,OCaml,R,Racket 用の StarCoderBase と Code Llama の微調整版を提示する。確立されたベンチマーク(MultiPL-E)では、これらのモデルは他のオープンコードLLMよりも優れている。 MultiPL-Tアプローチは、新しい言語に簡単に適用でき、トレーニングのような代替手段よりもはるかに効率的で効果的である。

Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available. Low-resource languages include OCaml, Racket, and several others. This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, MultiPL-T, translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize tests for commented code from a high-resource language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate Python code to a target low-resource language, and use tests to validate the translation. We apply this approach to generate tens of thousands of validated training items for Julia, Lua, OCaml, R, and Racket. Furthermore, we use an open model (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. With MultiPL-T-generated data, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket. On established benchmarks (MultiPL-E), these models outperform other open Code LLMs. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.

翻訳日:2024-11-09 14:40:04 公開日:2024-09-22

# 量子擬似ランダムスクランブラ

Quantum Pseudorandom Scramblers ( http://arxiv.org/abs/2309.08941v2 )

ライセンス: Link先を確認

Chuhan Lu, Minglong Qin, Fang Song, Penghui Yao, Mingnan Zhao,

(参考訳) 量子擬似ランダム状態発生器(PRSG)は近年、エキサイティングな発展を促している。固定初期(例えば全ゼロ)状態上のPRSGは、Haarランダム状態と計算的に区別できない出力状態を生成する。しかし、出力状態の擬似ランダム性は他の初期状態では保証されない。実際、既知のPRSG構成はいくつかの初期状態で証明可能な形で失敗する。本研究では、任意の初期状態上で擬似乱数状態を生成する量子擬似乱数状態スクランブラ(PRSS)を提案し、構築する。情報理論的な設定では、任意の初期状態を全変動距離におけるハールランダムに近い量子状態の分布にマッピングするスクランブラを得る。その結果,スクランブラは分散特性を示す。大まかに言えば、状態空間の$\epsilon$-netにまたがることができる。標準PRSGは、平均出力状態がハールランダム状態を近似する限り状態空間の小さな領域のみに集中しうるため、このことは標準PRSGが誘導できるものを大幅に強化する。我々のPRSS構成は有名なKacの歩行を並列に拡張したものであり、標準のKacの歩行よりも指数関数的に高速に混合することを示す。これは我々の証明の核となる。 PRSSの応用についても述べる。 PRSSの構成は、耐量子一方向関数を仮定するが、PRSSはより弱いプリミティブである可能性があり、標準PRSGと同様に相対化された世界で一方向関数から分離することができる。

Quantum pseudorandom state generators (PRSGs) have stimulated exciting developments in recent years. A PRSG, on a fixed initial (e.g., all-zero) state, produces an output state that is computationally indistinguishable from a Haar random state. However, pseudorandomness of the output state is not guaranteed on other initial states. In fact, known PRSG constructions provably fail on some initial states. In this work, we propose and construct quantum Pseudorandom State Scramblers (PRSSs), which can produce a pseudorandom state on an arbitrary initial state. In the information-theoretical setting, we obtain a scrambler which maps an arbitrary initial state to a distribution of quantum states that is close to Haar random in total variation distance. As a result, our scrambler exhibits a dispersing property. Loosely, it can span an $\epsilon$-net of the state space. This significantly strengthens what standard PRSGs can induce, as they may only concentrate on a small region of the state space provided that the average output state approximates a Haar random state. Our PRSS construction develops a parallel extension of the famous Kac's walk, and we show that it mixes exponentially faster than the standard Kac's walk. This constitutes the core of our proof. We also describe a few applications of PRSSs. While our PRSS construction assumes a post-quantum one-way function, PRSSs are potentially a weaker primitive and can be separated from one-way functions in a relativized world similar to standard PRSGs.
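
For readers unfamiliar with Kac's walk, the following Python sketch shows the classical, sequential version on a real unit vector: each step picks two distinct coordinates uniformly at random and rotates them by a uniformly random angle. This is only the standard walk that the abstract compares against, not the parallel extension or the PRSS construction from the paper. The function names `kac_step` and `kac_walk`, the seed, and the example dimension are hypothetical.

```python
import math
import random


def kac_step(v, rng):
    """Apply one step of the classical Kac's walk to the list v in place.

    Two distinct coordinates are chosen uniformly at random and rotated
    by an angle drawn uniformly from [0, 2*pi). A planar rotation
    preserves the Euclidean norm of v.
    """
    i, j = rng.sample(range(len(v)), 2)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    vi, vj = v[i], v[j]
    v[i] = c * vi - s * vj
    v[j] = s * vi + c * vj


def kac_walk(v, steps, seed=0):
    """Run the walk for a number of steps and return the new vector."""
    rng = random.Random(seed)
    out = list(v)
    for _ in range(steps):
        kac_step(out, rng)
    return out


# Start from a basis vector in a hypothetical 8-dimensional space.
start = [1.0] + [0.0] * 7
end = kac_walk(start, steps=1000)
```

Because every step touches only two coordinates, many steps are needed before the vector looks uniformly spread over the sphere, which is why a faster-mixing parallel variant is of interest.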

翻訳日:2024-11-09 14:28:50 公開日:2024-09-22

# 知的な側弯症スクリーニングと診断 : サーベイ

Intelligent Scoliosis Screening and Diagnosis: A Survey ( http://arxiv.org/abs/2310.08756v2 )

ライセンス: Link先を確認

Zhenlin Zhang, Lixin Pu, Ang Li, Jun Zhang, Xianjie Li, Jipeng Fan,

(参考訳) 側弯症(scoliosis)は3次元の脊椎変形であり、胸郭変形や骨盤傾斜などの異常な形態を引き起こす可能性がある。重度の患者は神経損傷や尿路異常に悩まされることがある。現在、中国では小学校・中学校の側弯症患者が500万人を超えており、発生率は約3%から5%で、毎年増加している。したがって、側弯症の研究は重要な臨床的価値を持っている。本稿では,コンピュータ支援による側弯症のスクリーニングと診断を体系的に紹介し,この分野におけるさまざまなアルゴリズムモデルの利点と限界を分析する。また、この分野での現在の開発ボトルネックについても論じ、今後の開発動向を展望する。

Scoliosis is a three-dimensional spinal deformity, which may lead to abnormal morphologies such as thoracic deformity and pelvic tilt. Patients with severe scoliosis may suffer from nerve damage and urinary abnormalities. At present, the number of scoliosis patients in primary and secondary schools in China has exceeded five million, and the incidence rate is about 3% to 5% and growing every year. The research on scoliosis, therefore, has important clinical value. This paper systematically introduces computer-assisted scoliosis screening and diagnosis, and analyzes the advantages and limitations of different algorithm models in this field. Moreover, the paper discusses the current development bottlenecks in this field and looks ahead to future development trends.

翻訳日:2024-11-09 10:01:09 公開日:2024-09-22

# ドメイン横断点雲の分割にSAMを適応させる学習

Learning to Adapt SAM for Segmenting Cross-domain Point Clouds ( http://arxiv.org/abs/2310.08820v4 )

ライセンス: Link先を確認

Xidong Peng, Runnan Chen, Feng Qiao, Lingdong Kong, Youquan Liu, Yujing Sun, Tai Wang, Xinge Zhu, Yuexin Ma,

(参考訳) 3Dセグメンテーションタスクにおける非教師なしドメイン適応(UDA)は、主にポイントクラウドデータの希薄で非秩序な性質から生じる、恐ろしい挑戦である。特にLiDARの点雲では、様々な撮影シーン、変動する気象条件、使用中の様々なLiDARデバイス間でドメインの差が明らかになる。従来のUDA手法では、ソースとターゲットのドメイン間の特徴を整列させることで、このギャップを緩和しようと試みてきたが、ドメインのかなりの変動により、3Dセグメンテーションに適用した場合、このアプローチは不十分である。イメージセグメンテーションの領域において,視覚基盤モデル SAM が示す顕著な一般化能力に着想を得て,SAM 内に埋め込まれた豊富な一般知識を活用し,多様な3次元領域にまたがる特徴表現を統一し,さらに3次元領域適応問題を解く。具体的には,3次元特徴空間とSAMの特徴空間との整合性を大幅に向上させ,シーンレベルとインスタンスレベルの両方で動作する,革新的なハイブリッド機能拡張手法を提案する。提案手法は,広く認識されている多くのデータセットで評価され,最先端の性能を実現する。

Unsupervised domain adaptation (UDA) in 3D segmentation tasks presents a formidable challenge, primarily stemming from the sparse and unordered nature of point cloud data. Especially for LiDAR point clouds, the domain discrepancy becomes obvious across varying capture scenes, fluctuating weather conditions, and the diverse array of LiDAR devices in use. While previous UDA methodologies have often sought to mitigate this gap by aligning features between source and target domains, this approach falls short when applied to 3D segmentation due to the substantial domain variations. Inspired by the remarkable generalization capabilities exhibited by the vision foundation model, SAM, in the realm of image segmentation, our approach leverages the wealth of general knowledge embedded within SAM to unify feature representations across diverse 3D domains and further solves the 3D domain adaptation problem. Specifically, we harness the corresponding images associated with point clouds to facilitate knowledge transfer and propose an innovative hybrid feature augmentation methodology, which significantly enhances the alignment between the 3D feature space and SAM's feature space, operating at both the scene and instance levels. Our method is evaluated on many widely-recognized datasets and achieves state-of-the-art performance.

翻訳日:2024-11-09 10:01:09 公開日:2024-09-22

# ソノルミネッセンス:時間依存アナログ系における光子生成

Sonoluminescence: Photon production in time dependent analog system ( http://arxiv.org/abs/2311.03305v2 )

ライセンス: Link先を確認

Rajesh Karmakar, Debaprasad Maity,

(参考訳) ソノルミネッセンス(英: Sonoluminescence)は、適切な環境で振動するガスの泡が周期的に可視域の光を放出する、よく知られた実験室現象である。本稿では,アナログ重力の枠組みにおいてこの系を考察する。アナログ幾何学の観点から振動する気泡をモデル化し,電磁場と幾何学との非最小結合の処方を提案する。この幾何学は、振動する時間依存のアナログ背景として振る舞い、量子真空からのパラメトリック共鳴を通じて、光子の繰り返し束が広い周波数範囲で生成される。数値的な制限のため、周波数は$\sim 10^5 ~\mbox{m}^{-1}$までしか到達できなかった。しかし、観測される周波数範囲である$\sim 10^7 ~\mbox{m}^{-1}$付近を含めて、スペクトルを多項式形式で数値的にフィットした。現在の分析は、アナログ背景におけるパラメトリック共鳴が、量子場理論の枠組みにおいてそのような現象を説明する上で、基本的な役割を担っている可能性を示唆している。

Sonoluminescence is a well-known laboratory phenomenon where an oscillating gas bubble in the appropriate environment periodically emits a flash of light in the visible frequency range. In this submission, we study the system in the framework of analog gravity. We model the oscillating bubble in terms of analog geometry and propose a non-minimal coupling prescription of the electromagnetic field with the geometry. The geometry behaves as an analogous oscillating time-dependent background in which a repeated flux of photons is produced in a wide frequency range through parametric resonance from the quantum vacuum. Owing to numerical limitations, we could only reach frequencies up to $\sim 10^5 ~\mbox{m}^{-1}$. However, we numerically fit the spectrum in a polynomial form including the observed frequency range around $\sim 10^7 ~\mbox{m}^{-1}$. Our current analysis seems to suggest that parametric resonance in analog background may play a fundamental role in explaining such phenomena in the quantum field theory framework.

翻訳日:2024-11-09 09:50:02 公開日:2024-09-22

# 非監督的遠隔生理計測のための自己相似事前蒸留法

Self-similarity Prior Distillation for Unsupervised Remote Physiological Measurement ( http://arxiv.org/abs/2311.05100v2 )

ライセンス: Link先を確認

Xinyu Zhang, Weiyu Sun, Hao Lu, Ying Chen, Yun Ge, Xiaolin Huang, Jie Yuan, Yingcong Chen,

(参考訳) 遠隔光胸腺造影法(remote Photoplethysmography, RPPG)は、心臓活動による血液量の変化による顔画素の微妙な変化を捉えることを目的とした非侵襲的手法である。既存のrPPGタスクの教師なし手法のほとんどは、生理的信号の前の自己相似性を無視しながら、サンプル間の対照的な学習に焦点を当てている。本稿では,心活動の本質的な自己相似性に着目した,教師なしrPPG推定のための自己相似事前蒸留(SSPD)フレームワークを提案する。具体的には、まず、様々な種類のノイズの影響を軽減するために、物理的に適切な組込み拡張手法を導入する。そして、より信頼性の高い自己相似的生理的特徴を抽出するために、自己相似性認識ネットワークを調整する。最後に,顔画像から自己相似的な生理的パターンを引き離すネットワークを支援するために,階層的な自己蒸留パラダイムを開発する。包括的実験により、教師なしのSSPDフレームワークは、最先端の教師付き手法と比較して、同等またはそれ以上のパフォーマンスを達成することが示された。一方、SSPDは、エンドツーエンドモデルの中で最も低い推論時間と計算コストを維持している。

Remote photoplethysmography (rPPG) is a noninvasive technique that aims to capture subtle variations in facial pixels caused by changes in blood volume resulting from cardiac activities. Most existing unsupervised methods for rPPG tasks focus on the contrastive learning between samples while neglecting the inherent self-similar prior in physiological signals. In this paper, we propose a Self-Similarity Prior Distillation (SSPD) framework for unsupervised rPPG estimation, which capitalizes on the intrinsic self-similarity of cardiac activities. Specifically, we first introduce a physical-prior embedded augmentation technique to mitigate the effect of various types of noise. Then, we tailor a self-similarity-aware network to extract more reliable self-similar physiological features. Finally, we develop a hierarchical self-distillation paradigm to assist the network in disentangling self-similar physiological patterns from facial videos. Comprehensive experiments demonstrate that the unsupervised SSPD framework achieves comparable or even superior performance compared to the state-of-the-art supervised methods. Meanwhile, SSPD maintains the lowest inference time and computation cost among end-to-end models.

翻訳日:2024-11-09 09:50:02 公開日:2024-09-22

# 量子過程の実現のための絡み合いコスト

Entanglement cost of realizing quantum processes ( http://arxiv.org/abs/2311.10649v2 )

ライセンス: Link先を確認

Xin Wang, Mingrui Jing, Chengkai Zhu,

(参考訳) 量子絡み合い(quantum entanglement)は、量子コンピューティングやセキュアな通信といった強力な技術を支える粒子間の特異な接続である。しかし、量子状態を作成し、量子プロセスを実装するのに必要な最小の絡み合いを定量化することは重要な課題である。我々は、特定の物理原理を尊重する任意の量子過程を実現するのに必要な絡み合いの量を確実に推定する効率的な計算可能なツールを開発する。我々のツールは、従来の方法の制限を超越して、漸近的状態における幅広い量子状態の準備に必要な絡み合いに適用する。また、量子演算のクラスを考えるために一度消費された絡み合いが、漸近的にさえ完全に回復できないことも確認する。この不可逆な振る舞いは、部分的転置の正当性を完全に保存する量子演算でさえも、フルランクの絡み合った状態と実質的に関連する振幅減衰チャネルに対して明らかである。本稿では,SWAPチャネルの両分極化を実現するための絡み合い条件の推定や,熱相互作用下でのハミルトンシミュレーションの解法などの例を通して,我々のアプローチのパワーを実証する。我々の研究は、一般的な状態と量子力学の絡み合い要求をベンチマークするための実用的なツールキットを提供し、量子技術の性能を評価し、最適化する方法を開拓する。

Quantum entanglement, a peculiar connection between particles, underpins powerful technologies such as quantum computing and secure communication. However, quantifying the minimum entanglement required to prepare quantum states and implement quantum processes remains a significant challenge. We develop an efficiently computable tool that reliably estimates the amount of entanglement needed for realizing arbitrary quantum processes respecting certain physical principles. Our tool applies to the entanglement required to prepare a broad range of quantum states in the asymptotic regime, surpassing previous methods' limitations. We also confirm that entanglement, once consumed to realize the considered class of quantum operations, cannot be fully recovered, even asymptotically. This irreversible behavior is evident for full-rank entangled states and practically relevant amplitude damping channels, even under quantum operations that completely preserve the positivity of partial transpose. We showcase our approach's power through examples such as estimating entanglement requirements for realizing bipartite dephasing SWAP channels and solving Hamiltonian simulations under thermal interaction, highlighting its advantages over existing techniques. Our work provides a practical toolkit for benchmarking entanglement requirements for generic states and quantum dynamics, paving the way for assessing and optimizing the performances of quantum technologies.

翻訳日:2024-11-09 09:38:58 公開日:2024-09-22

# Event Camera Data Dense 事前トレーニング

Event Camera Data Dense Pre-training ( http://arxiv.org/abs/2311.11533v2 )

ライセンス: Link先を確認

Yan Yang, Liyuan Pan, Liu Liu,

(参考訳) 本稿では,イベントカメラデータを用いた高密度予測タスクに適したニューラルネットワークの事前学習を目的とした,自己教師付き学習フレームワークを提案する。我々のアプローチでは,トレーニングにイベントデータのみを活用する。高密度RGB事前学習の成果をイベントカメラデータにそのまま転用すると、性能は不十分なものとなる。これは、多くのピクセルが情報を含まないイベント画像(イベントデータから変換される)に固有の空間的スパース性に起因する。このスパース性の問題を緩和するために、イベント画像をイベントパッチ特徴にエンコードし、パッチ間のコンテキスト的類似性関係を自動的にマイニングし、パッチ特徴を個別のコンテキストにグループ化し、コンテキスト間の類似性を強制することで、識別的なイベント特徴を学習する。フレームワークをトレーニングするために、さまざまなシーンと動きパターンを特徴とする合成イベントカメラデータセットをキュレートする。下流の高密度予測タスクにおける転移学習性能は,最先端手法に対する提案手法の優位性を示している。

This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data. Our approach utilizes solely event data for training. Transferring achievements from dense RGB pre-training directly to event camera data yields subpar performance. This is attributed to the spatial sparsity inherent in an event image (converted from event data), where many pixels do not contain information. To mitigate this sparsity issue, we encode an event image into event patch features, automatically mine contextual similarity relationships among patches, group the patch features into distinctive contexts, and enforce context-to-context similarities to learn discriminative event features. For training our framework, we curate a synthetic event camera dataset featuring diverse scene and motion patterns. Transfer learning performance on downstream dense prediction tasks illustrates the superiority of our method over state-of-the-art approaches.

翻訳日:2024-11-09 09:38:57 公開日:2024-09-22

# 分配的厚生に基づく政策学習

Policy Learning with Distributional Welfare ( http://arxiv.org/abs/2311.15878v3 )

ライセンス: Link先を確認

Yifan Cui, Sukjin Han,

(参考訳) 本稿では,分配福祉を対象とする最適治療配分政策について検討する。治療選択に関する文献の多くは、条件付き平均治療効果(ATE)に基づく実用的福祉を考察している。平均的な福祉は直感的であるが、特に個人が異質な(例えば、外れ値を持つ)場合、望ましくない割り当てをもたらす可能性がある。本研究の動機は,個別処理効果の条件量子化(QoTE)に基づいて治療を割り当てる最適政策を提案することである。量的確率の選択によっては、この基準は慎重または無神経な政策立案者に対応することができる。 QoTEを特定することの課題は、実験データにおいても回復が困難である対実的な結果の共分散に関する知識の要求にある。したがって、不確実性をモデル化する上で堅牢なミニマックスポリシーを導入する。仮定を特定できる範囲は、より情報的なポリシーを生み出すのに利用できる。確率的・決定論的両政策については,提案された政策の実施を後悔することによる漸近的境界を確立する。シミュレーションと2つの経験的応用において、QoTEに基づく最適決定と他の基準に基づく決定を比較した。この枠組みは、福祉が潜在的な成果の共役分布の関数として定義されるあらゆる状況に一般化することができる。

In this paper, we explore optimal treatment allocation policies that target distributional welfare. Most literature on treatment choice has considered utilitarian welfare based on the conditional average treatment effect (ATE). While average welfare is intuitive, it may yield undesirable allocations especially when individuals are heterogeneous (e.g., with outliers) - the very reason individualized treatments were introduced in the first place. This observation motivates us to propose an optimal policy that allocates the treatment based on the conditional quantile of individual treatment effects (QoTE). Depending on the choice of the quantile probability, this criterion can accommodate a policymaker who is either prudent or negligent. The challenge of identifying the QoTE lies in its requirement for knowledge of the joint distribution of the counterfactual outcomes, which is generally hard to recover even with experimental data. Therefore, we introduce minimax policies that are robust to model uncertainty. A range of identifying assumptions can be used to yield more informative policies. For both stochastic and deterministic policies, we establish the asymptotic bound on the regret of implementing the proposed policies. In simulations and two empirical applications, we compare optimal decisions based on the QoTE with decisions based on other criteria. The framework can be generalized to any setting where welfare is defined as a functional of the joint distribution of the potential outcomes.

翻訳日:2024-11-09 09:38:57 公開日:2024-09-22

# ネットワーク別Uniswap日次取引指標のデータセット

A Dataset of Uniswap daily transaction indices by network ( http://arxiv.org/abs/2312.02660v2 )

ライセンス: Link先を確認

Nir Chemaya, Lin William Cong, Emma Jorgensen, Dingyue Liu, Luyao Zhang,

(参考訳) 分散型金融(DeFi)は、仲介者なしの直接取引を可能にすることで従来の金融を再構築しつつあり、オープンな金融データの豊富な源を生み出している。レイヤ2(L2)ソリューションは、レイヤ1(L1)システムを上回る形でDeFiエコシステムのスケーラビリティと効率を高めるものとして登場しつつある。しかし、L2ソリューションの影響は、主に経済分析のための包括的な取引データ指標が欠如していることから、いまだ十分に研究されていない。本研究は、主要な分散型取引所であるUniswapのL1ネットワークとL2ネットワークの両方にわたる5000万件以上の取引を分析することで、このギャップを埋める。我々はEthereum、Optimism、Arbitrum、Polygonのブロックチェーンデータから日次指標のセットを作成し、DeFiの普及、スケーラビリティ、分散化、富の分配に関する洞察を提供する。さらに、分散化指標を計算するためのオープンソースのPythonフレームワークを開発し、このデータセットを高度な機械学習研究に非常に有用なものにした。本研究はデータサイエンティストに貴重なリソースを提供し、インテリジェントなWeb3エコシステムの成長に貢献する。

Decentralized Finance (DeFi) is reshaping traditional finance by enabling direct transactions without intermediaries, creating a rich source of open financial data. Layer 2 (L2) solutions are emerging to enhance the scalability and efficiency of the DeFi ecosystem, surpassing Layer 1 (L1) systems. However, the impact of L2 solutions is still underexplored, mainly due to the lack of comprehensive transaction data indices for economic analysis. This study bridges that gap by analyzing over 50 million transactions from Uniswap, a major decentralized exchange, across both L1 and L2 networks. We created a set of daily indices from blockchain data on Ethereum, Optimism, Arbitrum, and Polygon, offering insights into DeFi adoption, scalability, decentralization, and wealth distribution. Additionally, we developed an open-source Python framework for calculating decentralization indices, making this dataset highly useful for advanced machine learning research. Our work provides valuable resources for data scientists and contributes to the growth of the intelligent Web3 ecosystem.
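To make the idea of a decentralization index concrete, here is a minimal sketch of two common concentration measures, the Gini coefficient and the Herfindahl-Hirschman index (HHI). The abstract does not say which indices the authors' framework computes, so these two are assumptions chosen for illustration, and the addresses and volumes below are made up.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = equal, close to 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Formula on ascending data: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def hhi(values):
    """Herfindahl-Hirschman index: sum of squared shares (1/n = equal, 1 = one actor)."""
    total = sum(values)
    if total == 0:
        return 0.0
    return sum((v / total) ** 2 for v in values)


# Hypothetical daily trading volume per address on one network.
daily_volume_by_address = {"0xaaa": 50.0, "0xbbb": 30.0, "0xccc": 15.0, "0xddd": 5.0}
volumes = list(daily_volume_by_address.values())
print("gini:", round(gini(volumes), 3), "hhi:", round(hhi(volumes), 3))
```

Computing such a measure once per day and per network would give a daily index series of the kind the abstract describes.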

翻訳日:2024-11-09 09:27:53 公開日:2024-09-22

# SkyScenes: 航空シーン理解のための合成データセット

SkyScenes: A Synthetic Dataset for Aerial Scene Understanding ( http://arxiv.org/abs/2312.06719v2 )

ライセンス: Link先を確認

Sahil Khose, Anisha Pal, Aayushi Agarwal, Deepanshi, Judy Hoffman, Prithvijit Chattopadhyay,

(参考訳) 実世界の空撮シーン理解は、多様な条件下で収集された高密度アノテーション付き画像を含むデータセットの不足によって制限されている。制御された実世界環境でそのような画像を得ることには固有の難しさがあるため、無人航空機(UAV)の視点から撮影された高密度アノテーション付き空撮画像の合成データセットSkyScenesを提示する。我々はCARLAからSkyScenesの画像を注意深く収集し、レイアウト(都市部と農村部のマップ)、気象条件、時間帯、ピッチ角、高度にわたる多様性を、対応するセマンティック、インスタンス、深度のアノテーションとともに包括的に捉える。SkyScenesを用いた実験により、(1)SkyScenesで訓練されたモデルがさまざまな実世界のシナリオによく汎化すること、(2)実画像での訓練をSkyScenesデータで拡張することで実世界での性能が向上しうること、(3)SkyScenesの制御されたバリエーションが、視点条件(高度とピッチ)、天候、時間帯の変化にモデルがどう反応するかについての洞察を与えうること、(4)追加のセンサモダリティ(深度)を組み込むことで空撮シーン理解が向上しうること、を示す。データセットと関連する生成コードは https://hoffman-group.github.io/SkyScenes/ で公開されている。

Real-world aerial scene understanding is limited by a lack of datasets that contain densely annotated images curated under a diverse set of conditions. Due to inherent challenges in obtaining such images in controlled real-world settings, we present SkyScenes, a synthetic dataset of densely annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives. We carefully curate SkyScenes images from CARLA to comprehensively capture diversity across layouts (urban and rural maps), weather conditions, times of day, pitch angles and altitudes with corresponding semantic, instance and depth annotations. Through our experiments using SkyScenes, we show that (1) models trained on SkyScenes generalize well to different real-world scenarios, (2) augmenting training on real images with SkyScenes data can improve real-world performance, (3) controlled variations in SkyScenes can offer insights into how models respond to changes in viewpoint conditions (height and pitch), weather and time of day, and (4) incorporating additional sensor modalities (depth) can improve aerial scene understanding. Our dataset and associated generation code are publicly available at: https://hoffman-group.github.io/SkyScenes/

翻訳日:2024-11-09 09:16:50 公開日:2024-09-22

# オンライン連続手話認識と翻訳に向けて

Towards Online Continuous Sign Language Recognition and Translation ( http://arxiv.org/abs/2401.05336v2 )

ライセンス: Link先を確認

Ronglai Zuo, Fangyun Wei, Brian Mak,

(参考訳) 連続手話認識(CSLR)の研究は、ろう者と聴者の間のコミュニケーションギャップを埋めるために不可欠である。過去の多くの研究は、コネクショニスト時間分類(CTC)損失を用いてモデルを訓練してきた。推論時、これらのCTCベースのモデルは一般に、予測を行うために手話動画全体を入力として必要とする。これはオフライン認識と呼ばれるプロセスであり、高いレイテンシと大きなメモリ使用量という問題がある。本研究では,オンラインCSLRに向けた第一歩を踏み出す。我々のアプローチは3つの段階からなる。1)手話辞書を作成する、2)その辞書上で孤立手話認識モデルを訓練する、3)入力の手話系列にスライディングウィンドウ方式を適用し、各手話クリップを最適化済みのモデルに入力してオンライン認識を行う。さらに、我々のオンライン認識モデルは、グロスからテキストへのネットワークを統合することでオンライン翻訳をサポートするよう拡張でき、任意のオフラインモデルの性能を向上させることもできる。これらの拡張により、我々のオンラインアプローチは、さまざまなタスク設定にわたる3つの代表的なベンチマークで新たな最先端性能を達成する。コードとモデルは https://github.com/FangyunWei/SLRT で公開されている。

Research on continuous sign language recognition (CSLR) is essential to bridge the communication gap between deaf and hearing individuals. Numerous previous studies have trained their models using the connectionist temporal classification (CTC) loss. During inference, these CTC-based models generally require the entire sign video as input to make predictions, a process known as offline recognition, which suffers from high latency and substantial memory usage. In this work, we take the first step towards online CSLR. Our approach consists of three phases: 1) developing a sign dictionary; 2) training an isolated sign language recognition model on the dictionary; and 3) employing a sliding window approach on the input sign sequence, feeding each sign clip to the optimized model for online recognition. Additionally, our online recognition model can be extended to support online translation by integrating a gloss-to-text network and can enhance the performance of any offline model. With these extensions, our online approach achieves new state-of-the-art performance on three popular benchmarks across various task settings. Code and models are available at https://github.com/FangyunWei/SLRT.
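The third phase, sliding a window over the incoming sign sequence and classifying each clip, can be sketched as follows. This is a simplified illustration, not the authors' implementation: the window and stride values, the `<blank>` label, the rule for merging repeated predictions, and the majority-vote stand-in for the isolated sign recognition model are all assumptions made here.

```python
from collections import Counter

BLANK = "<blank>"  # hypothetical label meaning "no sign in this clip"


def sliding_windows(num_frames, window, stride):
    """Yield (start, end) frame indices for each full clip."""
    start = 0
    while start + window <= num_frames:
        yield start, start + window
        start += stride


def online_recognize(frames, classify_clip, window=16, stride=8):
    """Classify each clip as it arrives and emit a gloss when the prediction changes."""
    glosses = []
    previous = BLANK
    for start, end in sliding_windows(len(frames), window, stride):
        label = classify_clip(frames[start:end])
        # Skip blanks and repeats of the previous clip's prediction.
        if label != BLANK and label != previous:
            glosses.append(label)
        previous = label
    return glosses


def toy_classifier(clip):
    """Stand-in for a trained isolated-sign model: majority label in the clip."""
    return Counter(clip).most_common(1)[0][0]


# Toy "video" in which each frame is already tagged with the sign being performed.
frames = ["HELLO"] * 20 + [BLANK] * 16 + ["WORLD"] * 28
print(online_recognize(frames, toy_classifier))
```

Because each clip is processed independently, only the current window has to be kept in memory, which is the source of the latency and memory advantage over offline recognition mentioned in the abstract.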

翻訳日:2024-11-09 05:28:28 公開日:2024-09-22

# フォレスト剪定における理論的・実証的進展

Theoretical and Empirical Advances in Forest Pruning ( http://arxiv.org/abs/2401.05535v3 )

ライセンス: Link先を確認

Albert Dorador,

(参考訳) 登場から数十年を経た現在も、回帰フォレストは最先端の精度を提供し続けており、この点において回帰木やニューラルネットワークといった他の機械学習モデルを上回っている。しかし、アンサンブル手法であるがゆえに、回帰フォレストが回帰木に比べて著しく劣りがちな側面が解釈可能性である。本研究では,回帰フォレストの精度と回帰木の解釈可能性という両者の長所を兼ね備えることを目指すアプローチである、フォレスト剪定を再考する。この試みは、その基礎がランダムフォレスト理論の核心にあり、実証研究において大きな成功を収めてきた。本稿では,これらの実証的知見を裏付け、かつ限定する理論的結果を示す。すなわち,極めて弱い仮定の下で、Lassoで剪定したフォレストが剪定前のフォレストに対して漸近的に優位であることを証明し,さらに主要な手法で剪定された回帰フォレストに対する高確率の有限サンプル汎化バウンドを証明して、これをシミュレーションにより検証する。次に,19の異なるデータセット(合成16、実データ3)上で,剪定された回帰フォレストの精度を剪定前のものと比較検証する。テストしたシナリオの大多数において、ごく一部の木だけを用いながら、元の完全なフォレストと(期待値において)同等以上の精度をもたらすフォレスト剪定手法が少なくとも1つ存在することがわかった。また、場合によってはフォレストのサイズの削減が非常に劇的であるため、得られたサブフォレストを意味のある形で1本の木に統合でき、ブラックボックスのままである元の回帰フォレストよりも質的に優れた解釈可能性が得られることを示す。

Decades after their inception, regression forests continue to provide state-of-the-art accuracy, outperforming in this respect alternative machine learning models such as regression trees or even neural networks. However, being an ensemble method, the one aspect where regression forests tend to severely underperform regression trees is interpretability. In the present work, we revisit forest pruning, an approach that aims to have the best of both worlds: the accuracy of regression forests and the interpretability of regression trees. This pursuit, whose foundation lies at the core of random forest theory, has seen vast success in empirical studies. In this paper, we contribute theoretical results that support and qualify those empirical findings; namely, we prove the asymptotic advantage of a Lasso-pruned forest over its unpruned counterpart under extremely weak assumptions, as well as high-probability finite-sample generalization bounds for regression forests pruned according to the main methods, which we then validate by way of simulation. Then, we test the accuracy of pruned regression forests against their unpruned counterparts on 19 different datasets (16 synthetic, 3 real). We find that in the vast majority of scenarios tested, there is at least one forest-pruning method that yields equal or better accuracy than the original full forest (in expectation), while just using a small fraction of the trees. We show that, in some cases, the reduction in the size of the forest is so dramatic that the resulting sub-forest can be meaningfully merged into a single tree, obtaining a level of interpretability that is qualitatively superior to that of the original regression forest, which remains a black box.
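One way to picture Lasso pruning is to regress the target on the per-tree predictions with an L1 penalty and keep only the trees that receive a nonzero weight. The sketch below does this with a plain coordinate-descent Lasso in pure Python. It is an illustration under assumptions: the abstract does not specify the exact estimator (intercept, constraints, tuning of the penalty), and the three "trees" are toy prediction vectors rather than fitted trees.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_prune(tree_preds, y, lam, n_iters=100):
    """Coordinate descent for (1/2n)||y - sum_j w_j t_j||^2 + lam * ||w||_1.

    tree_preds[j][i] is the prediction of tree j on sample i.
    Trees whose weight ends at exactly 0.0 are considered pruned.
    """
    n = len(y)
    weights = [0.0] * len(tree_preds)
    residual = list(y)  # equals y - sum_j w_j * t_j, with all weights at zero
    for _ in range(n_iters):
        for j, col in enumerate(tree_preds):
            z = sum(p * p for p in col) / n
            if z == 0.0:
                continue
            # Correlation of tree j with the residual that excludes tree j itself.
            rho = sum(p * (r + weights[j] * p) for p, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, lam) / z
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * p for p, r in zip(col, residual)]
                weights[j] = new_w
    return weights


# Toy example: tree 0 matches the target, trees 1 and 2 are uninformative.
y = [1.0, -1.0, 1.0, -1.0]
tree_preds = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [1.0, 1.0, 1.0, 1.0],
]
weights = lasso_prune(tree_preds, y, lam=0.1)
kept = [j for j, w in enumerate(weights) if w != 0.0]
print("weights:", weights, "kept trees:", kept)
```

The pruned forest then predicts with the weighted sum of the kept trees; a larger penalty `lam` removes more trees at the cost of more shrinkage.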

翻訳日:2024-11-09 05:28:28 公開日:2024-09-22

# 幻覚を芽のうちに摘むための知識検証

Knowledge Verification to Nip Hallucination in the Bud ( http://arxiv.org/abs/2401.10768v5 )

ライセンス: Link先を確認

Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi,

(参考訳) 大規模言語モデル(LLM)は、人間とのアライメントを経てさまざまなタスクで卓越した性能を示してきたが、それでも、もっともらしく聞こえるが事実知識と矛盾する応答を生成することがあり、この現象は幻覚として知られている。本稿では、アライメントデータに含まれる外部知識と、基盤LLMに内在する知識との不整合を検証し最小化することで、幻覚を緩和できることを示す。具体的には,知識一貫性アライメント(KCA, Knowledge Consistent Alignment)と呼ばれる新しいアプローチを提案する。これは、十分にアライメントされたLLMを用いて外部知識に基づく評価問題を自動的に作成し,基盤LLMの知識境界を評価するものである。アライメントデータにおける知識の不整合に対処するため、KCAはこれらのデータインスタンスを扱うためのいくつかの具体的な戦略を実装している。バックボーンと規模の異なる基盤LLMを用いて、6つのベンチマークにわたり、幻覚の低減におけるKCAの優れた有効性を実証する。これは、知識の不整合を減らすことで幻覚を緩和することの有効性を裏付ける。コード、モデルの重み、データは \url{https://github.com/fanqiwan/KCA} で公開されている。

While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination. In this paper, we demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs. Specifically, we propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge to evaluate the knowledge boundaries of foundation LLMs. To address knowledge inconsistencies in the alignment data, KCA implements several specific strategies to deal with these data instances. We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales. This confirms the effectiveness of mitigating hallucinations by reducing knowledge inconsistency. Our code, model weights, and data are openly accessible at \url{https://github.com/fanqiwan/KCA}.

翻訳日:2024-11-09 05:17:11 公開日:2024-09-22

# Sum-Product Networksを用いた尤もらしい反事実の生成

Generating Likely Counterfactuals Using Sum-Product Networks ( http://arxiv.org/abs/2401.14086v3 )

ライセンス: Link先を確認

Jiri Nemecek, Tomas Pevny, Jakub Marecek,

(参考訳) AIシステムによる決定の説明可能性は、最近の規制とユーザの要求の両方によって求められている。これらの決定はしばしば、事後的に(\emph{post hoc})しか説明できない。反事実的説明においては、何が最良の反事実的説明を構成するのかを問うことができる。「サンプルからの距離」は重要な基準であるが、明らかに、複数の基準を考慮しなければならない。反事実の妥当性(もっともらしさ)を考慮する最近の手法は、この本来の目的を犠牲にしているように見える。本稿では,サンプルに近くスパースでありながら、同時に尤度の高い説明を提供するシステムを提示する。反事実的説明に対する多くの一般的な要件を満たす、最も尤もらしい説明の探索が、混合整数最適化(MIO)を用いてモデル化できることを示す。その過程で,Sum-Product Network(SPN)のMIO定式化を提案し,SPNを用いて反事実の尤度を推定する。これはそれ自体としても興味深いものでありうる。

Explainability of decisions made by AI systems is driven by both recent regulation and user demand. These decisions are often explainable only \emph{post hoc}, after the fact. In counterfactual explanations, one may ask what constitutes the best counterfactual explanation. Clearly, multiple criteria must be taken into account, although "distance from the sample" is a key criterion. Recent methods that consider the plausibility of a counterfactual seem to sacrifice this original objective. Here, we present a system that provides high-likelihood explanations that are, at the same time, close and sparse. We show that the search for the most likely explanations satisfying many common desiderata for counterfactual explanations can be modeled using mixed-integer optimization (MIO). In the process, we propose an MIO formulation of a Sum-Product Network (SPN) and use the SPN to estimate the likelihood of a counterfactual, which can be of independent interest.

翻訳日:2024-11-09 05:17:11 公開日:2024-09-22

# 鋼ロープの光学的非破壊損傷検出のための新手法

A new method for optical steel rope non-destructive damage detection ( http://arxiv.org/abs/2402.03843v4 )

ライセンス: Link先を確認

Yunqing Bao, Bin Hu,

(参考訳) 本稿では,高高度(空中ロープウェイ)における鋼ロープの非破壊損傷検出のための新しいアルゴリズムを提案する。まず、RGBD-UNetという名前のセグメンテーションモデルは、複雑な背景から鋼のロープを正確に抽出するように設計されている。このモデルは、提案したCMAモジュールを通して色と深度情報を処理・結合する機能を備えている。第2に、VovNetV3.5と呼ばれる検出モデルは、通常の鋼ロープと異常鋼ロープを区別するために開発された。 VovNetアーキテクチャとDBBモジュールを統合してパフォーマンスを向上させる。また,セグメンテーションモデルの一般化能力を高めるために,新たなバックグラウンド拡張手法を提案する。セグメンテーションと検出モデルの両方のトレーニングとテストのために、異なるシナリオでスチールロープの画像を含むデータセットを作成する。実験はベースラインモデルよりも大幅に改善された。提案したデータセットでは,検出モデルにより達成された最高精度が0.975に達し,セグメンテーションモデルにより達成された最大F値が0.948に達した。

This paper presents a novel algorithm for non-destructive damage detection for steel ropes in high-altitude environments (aerial ropeway). The algorithm comprises two key components: First, a segmentation model named RGBD-UNet is designed to accurately extract steel ropes from complex backgrounds. This model is equipped with the capability to process and combine color and depth information through the proposed CMA module. Second, a detection model named VovNetV3.5 is developed to differentiate between normal and abnormal steel ropes. It integrates the VovNet architecture with a DBB module to enhance performance. Besides, a novel background augmentation method is proposed to enhance the generalization ability of the segmentation model. Datasets containing images of steel ropes in different scenarios are created for the training and testing of both the segmentation and detection models. Experiments demonstrate a significant improvement over baseline models. On the proposed dataset, the highest accuracy achieved by the detection model reached 0.975, and the maximum F-measure achieved by the segmentation model reached 0.948.

翻訳日:2024-11-09 04:54:55 公開日:2024-09-22

# 正則フレアー理論 I:有限次元および無限次元における指数積分

Holomorphic Floer theory I: exponential integrals in finite and infinite dimensions ( http://arxiv.org/abs/2402.07343v2 )

ライセンス: Link先を確認

Maxim Kontsevich, Yan Soibelman,

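(参考訳) 我々のプロジェクト「正則フレアー理論(Holomorphic Floer Theory)」に関する一連の論文の第一部として、指数積分と関連する壁越え構造(wall-crossing structure)について論じる。この主題について、変形量子化の考えに基づく視点と、フレアー理論の考えに基づく視点という2つの視点を強調する。それらの等価性は、我々の一般化されたリーマン・ヒルベルト対応の系である。指数積分の場合、これはド・ラーム・コホモロジーとベッチ・コホモロジーの局所版および大域版の間のいくつかの比較同型に相当する。我々は対応する理論を展開し、特にモース・ノビコフ理論を正則な場合へ一般化する。また、生じる壁越え構造が解析的であることを証明する。その系として、指数積分の摂動展開はリサージェントである。有限次元の指数積分の注意深い研究に基づいて、無限次元の指数積分への予想的なアプローチを提案する。このアプローチを、正則ラグランジアン境界条件をもつファインマン経路積分の場合と、複素化されたチャーン・サイモンズ理論の場合について例示する。生じる無限階数の偏屈層と、対応する「チャーン・サイモンズ壁越え構造」の解析性について論じる。量子波動関数の一般理論を展開し、チャーン・サイモンズ理論の場合には、それが一般化されたナーム和の概念に基づくチャーン・サイモンズ壁越え構造の別の記述を与えることを示す。対応する摂動級数の解析性とリサージェンスに関するいくつかの予想を提案する。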

In the first of the series of papers devoted to our project ``Holomorphic Floer Theory" we discuss exponential integrals and related wall-crossing structures. We emphasize two points of view on the subject: the one based on the ideas of deformation quantization and the one based on the ideas of Floer theory. Their equivalence is a corollary of our generalized Riemann-Hilbert correspondence. In the case of exponential integrals this amounts to several comparison isomorphisms between local and global versions of de Rham and Betti cohomology. We develop the corresponding theories in particular generalizing Morse-Novikov theory to the holomorphic case. We prove that arising wall-crossing structures are analytic. As a corollary, perturbative expansions of exponential integrals are resurgent. Based on a careful study of finite-dimensional exponential integrals we propose a conjectural approach to infinite-dimensional exponential integrals. We illustrate this approach in the case of Feynman path integral with holomorphic Lagrangian boundary conditions as well as in the case of the complexified Chern-Simons theory. We discuss the arising perverse sheaf of infinite rank as well as analyticity of the corresponding ``Chern-Simons wall-crossing structure". We develop a general theory of quantum wave functions and show that in the case of Chern-Simons theory it gives an alternative description of the Chern-Simons wall-crossing structure based on the notion of generalized Nahm sum. We propose several conjectures about analyticity and resurgence of the corresponding perturbative series.

翻訳日:2024-11-09 04:43:41 公開日:2024-09-22

# 話題そらしに注目する:オンライン談話におけるWhataboutism検出のための語用論的ニュアンスのマイニング

Paying Attention to Deflections: Mining Pragmatic Nuances for Whataboutism Detection in Online Discourse ( http://arxiv.org/abs/2402.09934v2 )

ライセンス: Link先を確認

Khiem Phi, Noushin Salek Faramarzi, Chenlu Wang, Ritwik Banerjee,

(参考訳) 物語をかく乱し不信の種をまく強力な手段であるWhataboutism(「そっちこそどうなんだ」論法)は、定量的なNLP研究においていまだ十分に研究されていない。また、過去の研究は、誤情報やプロパガンダの戦略としての使用と、語用論的・意味論的なフレーミングの道具としての使用とを区別してこなかった。我々は、TwitterとYouTubeから収集した新しいデータセットを導入し、Whataboutism、プロパガンダ、tu quoqueの誤謬の間の重なりと違いを明らかにする。さらに、言語意味論における最近の研究に基づき、「what about」という語彙的構文とWhataboutismとを区別する。我々の実験は、その正確な検出に特有の課題があることを明らかにし、これを受けて、負例マイニングに注意重みを用いる新しい手法を導入する。TwitterとYouTubeのコレクションにおいて、従来の最先端手法に対してそれぞれ4%と10%の大幅な改善を報告する。

Whataboutism, a potent tool for disrupting narratives and sowing distrust, remains under-explored in quantitative NLP research. Moreover, past work has not distinguished its use as a strategy for misinformation and propaganda from its use as a tool for pragmatic and semantic framing. We introduce new datasets from Twitter and YouTube, revealing overlaps as well as distinctions between whataboutism, propaganda, and the tu quoque fallacy. Furthermore, drawing on recent work in linguistic semantics, we differentiate the `what about' lexical construct from whataboutism. Our experiments bring to light unique challenges in its accurate detection, prompting the introduction of a novel method using attention weights for negative sample mining. We report significant improvements of 4% and 10% over previous state-of-the-art methods in our Twitter and YouTube collections, respectively.

翻訳日:2024-11-09 04:43:41 公開日:2024-09-22

# 鏡よ鏡、量子誤り訂正符号を総合的にベンチマークするには？

Magic Mirror on the Wall, How to Benchmark Quantum Error Correction Codes, Overall ? ( http://arxiv.org/abs/2402.11105v4 )

ライセンス: Link先を確認

Avimita Chatterjee, Swaroop Ghosh,

(参考訳) 量子誤り訂正符号(Quantum Error Correction Codes、QECC)は、ノイズやエラーの悪影響から量子状態を保護することで、量子コンピューティングの進歩において極めて重要な役割を果たす。新たな符号や既存符号の改良を含むさまざまなQECCが開発されているため、特定の条件に合った適切なQECCを選択することが重要である。QECCの分野では大きな進歩があったにもかかわらず、それらを一貫した基準で評価するための統一的な方法論はいまだ確立されていない。このギャップに対処するため,本論文では,普遍的なパラメータのセットを導入し、QECCのための最初のベンチマークフレームワークを提示する。8つの代表的なQECCを評価することで,その分析のための8つのパラメータからなる包括的なスイートを提案する。提案手法は普遍的なベンチマーク手法を確立するとともに、量子誤り訂正の複雑さを浮き彫りにし,QECCの選択が各シナリオ固有の要件と制約に依存することを示す。さらに、与えられたシナリオの具体的な要件に適応するQECCを選択するための体系的な戦略を開発し、量子誤り訂正に対する個別に調整されたアプローチを可能にする。加えて,ユーザが与えたシナリオの特性を評価する新しいQECC推薦ツールを導入する。このツールは、各符号で達成可能な最大距離とともに、最も適したものから最も適さないものまで、一連のQECCを推薦する。このツールは適応可能なように設計されており、最小限の労力で新しいQECCを追加したりパラメータを修正したりできるため、進化し続ける量子コンピューティングの状況においても有用性を保てる。

Quantum Error Correction Codes (QECCs) are pivotal in advancing quantum computing by protecting quantum states against the adverse effects of noise and errors. With a variety of QECCs developed, including new developments and modifications of existing ones, selecting an appropriate QECC tailored to specific conditions is crucial. Despite significant improvements in the field of QECCs, a unified methodology for evaluating them on a consistent basis has remained elusive. Addressing this gap, this paper presents the first benchmarking framework for QECCs, introducing a set of universal parameters. By evaluating eight prominent QECCs, we propose a comprehensive suite of eight parameters for their analysis. Our methodology establishes a universal benchmarking approach and highlights the complexity of quantum error correction, indicating that the choice of a QECC depends on the unique requirements and limitations of each scenario. Furthermore, we develop a systematic strategy for selecting QECCs that adapts to the specific requirements of a given scenario, facilitating a tailored approach to quantum error correction. Additionally, we introduce a novel QECC recommendation tool that assesses the characteristics of a given scenario provided by the user, subsequently recommending a spectrum of QECCs from most to least suitable, along with the maximum achievable distance for each code. This tool is designed to be adaptable, allowing for the inclusion of new QECCs and the modification of their parameters with minimal effort, ensuring its relevance in the evolving landscape of quantum computing.

翻訳日:2024-11-09 04:43:41 公開日:2024-09-22

# 超伝導ジョセフソン接合による第5の力のゲージボソンの検出

Detecting a Fifth-Force Gauge Boson via Superconducting Josephson Junctions ( http://arxiv.org/abs/2402.14514v2 )

ライセンス: Link先を確認

Yu Cheng, Jie Sheng, Tsutomu T. Yanagida,

(参考訳) $B-L$電荷を持つ粒子間に働く新しい第5の力は、標準模型の興味深い$U(1)_{B-L}$拡張によって十分に動機付けられる。ゲージボソンの媒介粒子であるF\'eetonは、ダークマター候補としても機能する。本稿では,超伝導ジョセフソン接合を用いて、この第5の力が引き起こす量子位相差を検出する新しい実験設計を提案する。この実験は、ゲージボソンの質量が$0.01\,$eVから$10\,$eVの範囲にあるときにゲージ結合に対して最も感度が高く、これはF\'eetonダークマターにとって興味深い質量領域である。これは、ミリメートル以下の小さなスケールで新しい物理を測定するための新たな道を開く。

A new fifth force between particles carrying $B-L$ charges is well-motivated by the intriguing $U(1)_{B-L}$ extension of the standard model. The gauge boson mediator, F\'eeton, also serves as a dark matter candidate. In this letter, we propose a novel experimental design to detect the quantum phase difference caused by this fifth force using a superconducting Josephson junction. We find that the experiment has the best sensitivity to the gauge coupling when the gauge boson is within the mass range of $0.01\,$eV to $10\,$eV, which is an interesting mass region for the F\'eeton dark matter. This opens up a new avenue for the measurement of new physics at small scale below millimeter.

翻訳日:2024-11-09 04:32:42 公開日:2024-09-22

# 推論を止めろ！ Chain-of-Thought推論を用いるマルチモーダルLLMが敵対的画像に出会うとき

Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image ( http://arxiv.org/abs/2402.14899v3 )

ライセンス: Link先を確認

Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu,

(参考訳) テキストと画像の優れた理解能力を持つマルチモーダルLLM(MLLM)が大きな注目を集めている。MLLMでより優れた推論を実現するために、Chain-of-Thought(CoT)推論が広く研究されており、これは中間的な推論ステップを示すことでMLLMの説明可能性をさらに高める。MLLMがマルチモーダル推論で強力な能力を示す一方で、最近の研究は、MLLMが依然として敵対的画像に弱いことを示している。このことから、次の未解決の問いが生じる。CoTはMLLMの敵対的頑健性も高めるのか。敵対的攻撃の下で、CoTの中間的な推論ステップは何を意味するのか。これらの問いに答えるため、我々はまず、2つの主要な構成要素、すなわち根拠(rationale)と答えを攻撃することで、既存の攻撃をCoTベースの推論へと一般化する。その結果、CoTは多段階の推論過程を活用することで既存の攻撃手法に対するMLLMの敵対的頑健性を確かに向上させるが、その向上は大きくないことがわかった。この知見に基づき、CoT推論過程を迂回しながらモデルを攻撃する、stop-reasoning攻撃と呼ぶ新たな攻撃手法をさらに提案する。3つのMLLMと2つの視覚的推論データセットでの実験により,提案手法の有効性を検証する。stop-reasoning攻撃は予測を誤らせることができ、ベースライン攻撃を大幅に上回ることを示す。

Multimodal LLMs (MLLMs) with a great ability of text and image understanding have received great attention. To achieve better reasoning with MLLMs, Chain-of-Thought (CoT) reasoning has been widely explored, which further promotes MLLMs' explainability by giving intermediate reasoning steps. Despite the strong power demonstrated by MLLMs in multimodal reasoning, recent studies show that MLLMs still suffer from adversarial images. This raises the following open questions: Does CoT also enhance the adversarial robustness of MLLMs? What do the intermediate reasoning steps of CoT entail under adversarial attacks? To answer these questions, we first generalize existing attacks to CoT-based inferences by attacking the two main components, i.e., rationale and answer. We find that CoT indeed improves MLLMs' adversarial robustness against the existing attack methods by leveraging the multi-step reasoning process, but not substantially. Based on our findings, we further propose a novel attack method, termed as stop-reasoning attack, that attacks the model while bypassing the CoT reasoning process. Experiments on three MLLMs and two visual reasoning datasets verify the effectiveness of our proposed method. We show that stop-reasoning attack can result in misled predictions and outperform baseline attacks by a significant margin.

翻訳日:2024-11-09 04:32:42 公開日:2024-09-22

# SAFDNet: 完全スパース3Dオブジェクト検出のためのシンプルで効果的なネットワーク

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection ( http://arxiv.org/abs/2403.05817v3 )

ライセンス: Link先を確認

Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, Xiaolin Hu,

(参考訳) LiDARベースの3Dオブジェクト検出は、自動運転において不可欠な役割を果たす。既存の高性能な3Dオブジェクト検出器は通常、バックボーンネットワークと予測ヘッドで密な特徴マップを構築する。しかし、密な特徴マップによる計算コストは知覚範囲が広がるにつれて2次的に増大するため、これらのモデルを長距離検出にスケールアップすることは難しい。最近のいくつかの研究は、この問題を解決するために完全スパースな検出器の構築を試みているが、得られたモデルは複雑な多段パイプラインに依存するか、性能が劣るかのいずれかである。本研究では,完全スパースな3Dオブジェクト検出に特化した、単純かつ非常に効果的なアーキテクチャであるSAFDNetを提案する。SAFDNetでは、中心特徴の欠落問題に対処するために適応的特徴拡散戦略を設計している。Waymo Open、nuScenes、Argoverse2の各データセットで広範な実験を行った。SAFDNetは、最初の2つのデータセットでは従来のSOTAをわずかに上回り、長距離検出を特徴とする最後のデータセットでは大きく上回った。これは、長距離検出が必要なシナリオにおけるSAFDNetの有効性を裏付けている。特にArgoverse2では、SAFDNetは従来最高のハイブリッド検出器であるHEDNetを2.1倍高速でありながらmAPで2.6%上回り、従来最高のスパース検出器であるFSDv2に対しては1.3倍高速でありながらmAPで2.1%の向上を達成した。コードは https://github.com/zhanggang001/HEDNet で公開される予定である。

LiDAR-based 3D object detection plays an essential role in autonomous driving. Existing high-performing 3D object detectors usually build dense feature maps in the backbone network and prediction head. However, the computational costs introduced by the dense feature maps grow quadratically as the perception range increases, making these models hard to scale up to long-range detection. Some recent works have attempted to construct fully sparse detectors to solve this issue; nevertheless, the resulting models either rely on a complex multi-stage pipeline or exhibit inferior performance. In this work, we propose SAFDNet, a straightforward yet highly effective architecture, tailored for fully sparse 3D object detection. In SAFDNet, an adaptive feature diffusion strategy is designed to address the center feature missing problem. We conducted extensive experiments on Waymo Open, nuScenes, and Argoverse2 datasets. SAFDNet performed slightly better than the previous SOTA on the first two datasets but much better on the last dataset, which features long-range detection, verifying the efficacy of SAFDNet in scenarios where long-range detection is required. Notably, on Argoverse2, SAFDNet surpassed the previous best hybrid detector HEDNet by 2.6% mAP while being 2.1x faster, and yielded 2.1% mAP gains over the previous best sparse detector FSDv2 while being 1.3x faster. The code will be available at https://github.com/zhanggang001/HEDNet.

翻訳日:2024-11-09 04:10:35 公開日:2024-09-22

# 違いにこそ強みがある！多様なユーザシミュレーションによる非協調対話の戦略計画の改善

Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation ( http://arxiv.org/abs/2403.06769v3 )

ライセンス: Link先を確認

Tong Zhang, Chen Huang, Yang Deng, Hongru Liang, Jia Liu, Zujie Wen, Wenqiang Lei, Tat-Seng Chua,

(参考訳) 我々は,システムの目標に有利に傾いた相互合意を確保するために、多様なユーザと戦略的な会話を行うことが期待される非協調対話エージェントについて検討する。これは、既存の対話エージェントに2つの主要な課題をもたらす。1)ユーザ固有の特性を戦略計画に組み込めないこと、2)多様なユーザに汎化できる戦略プランナーの訓練が難しいこと、である。これらの課題に対処するため,我々は,ユーザに合わせた戦略計画の能力を高めるTripを提案する。Tripは、ユーザを考慮した戦略計画モジュールと集団ベースの訓練パラダイムを取り入れている。非協調対話タスクのベンチマーク実験を通じて,多様なユーザに対応するうえでのTripの有効性を実証する。

We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users, for securing a mutual agreement that leans favorably towards the system's objectives. This poses two main challenges for existing dialogue agents: 1) The inability to integrate user-specific characteristics into the strategic planning, and 2) The difficulty of training strategic planners that can be generalized to diverse users. To address these challenges, we propose Trip to enhance the capability in tailored strategic planning, incorporating a user-aware strategic planning module and a population-based training paradigm. Through experiments on benchmark non-collaborative dialogue tasks, we demonstrate the effectiveness of Trip in catering to diverse users.

翻訳日:2024-11-09 04:10:35 公開日:2024-09-22

# PoIFusion:関心点での融合によるマルチモーダル3Dオブジェクト検出

PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest ( http://arxiv.org/abs/2403.09212v2 )

ライセンス: Link先を確認

Jiajun Deng, Sha Zhang, Feras Dayoub, Wanli Ouyang, Yanyong Zhang, Ian Reid,

(参考訳) 本稿では,RGB画像とLiDAR点群の情報を関心点(PoI)で融合する,概念的に単純でありながら効果的なマルチモーダル3Dオブジェクト検出フレームワークPoIFusionを提案する。マルチセンサデータを統一的なビューに変換したり、グローバルアテンション機構を利用して融合を促進したりする、これまでで最も高精度な手法とは異なり,本手法は各モダリティのビューを維持し,計算負荷の小さい投影と補間によってマルチモーダル特徴を得る。特に、PoIFusionはクエリベースのオブジェクト検出のパラダイムに従い、オブジェクトクエリを動的な3Dボックスとして定式化し、各クエリボックスに基づいてPoIの集合を生成する。PoIは3Dオブジェクトを表すキーポイントとして機能し、マルチモーダル融合における基本単位の役割を果たす。具体的には、PoIを各モダリティのビューに投影して対応する特徴をサンプリングし、動的融合ブロックによって各PoIでマルチモーダル特徴を統合する。さらに、同じクエリボックスに由来するPoIの特徴を集約してクエリ特徴を更新する。本手法は、ビュー変換による情報損失を防ぎ、計算量の大きいグローバルアテンションを不要にすることで、マルチモーダル3Dオブジェクト検出器をより実用的なものにする。nuScenesとArgoverse2のデータセットで広範な実験を行い、本手法を評価した。注目すべきことに、提案手法は特別な工夫を加えることなく両方のデータセットで最先端の結果を達成した。すなわち、nuScenesで74.9\% NDSおよび73.4\% mAP、Argoverse2で31.6\% CDSおよび40.6\% mAPである。コードは \url{https://djiajunustc.github.io/projects/poifusion} で公開される予定である。

In this work, we present PoIFusion, a conceptually simple yet effective multi-modal 3D object detection framework to fuse the information of RGB images and LiDAR point clouds at the points of interest (PoIs). Different from the most accurate methods to date that transform multi-sensor data into a unified view or leverage the global attention mechanism to facilitate fusion, our approach maintains the view of each modality and obtains multi-modal features by computation-friendly projection and interpolation. In particular, our PoIFusion follows the paradigm of query-based object detection, formulating object queries as dynamic 3D boxes and generating a set of PoIs based on each query box. The PoIs serve as the keypoints to represent a 3D object and play the role of the basic units in multi-modal fusion. Specifically, we project PoIs into the view of each modality to sample the corresponding feature and integrate the multi-modal features at each PoI through a dynamic fusion block. Furthermore, the features of PoIs derived from the same query box are aggregated together to update the query feature. Our approach prevents information loss caused by view transformation and eliminates the computation-intensive global attention, making the multi-modal 3D object detector more applicable. We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach. Remarkably, the proposed approach achieves state-of-the-art results on both datasets without any bells and whistles, \emph{i.e.}, 74.9\% NDS and 73.4\% mAP on nuScenes, and 31.6\% CDS and 40.6\% mAP on Argoverse2. The code will be made available at \url{https://djiajunustc.github.io/projects/poifusion}.
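The "projection and interpolation" step can be pictured with a small sketch: a 3D point of interest is projected into the image with a pinhole camera model, and a feature value is read at the resulting sub-pixel location by bilinear interpolation. This is a generic illustration, not the paper's code. The pinhole model, the camera intrinsics `fx, fy, cx, cy`, and the scalar feature map are assumptions made here; real feature maps have many channels, and the dynamic fusion block is not shown.

```python
import math


def project_point(point_xyz, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in camera coordinates to pixel (u, v)."""
    x, y, z = point_xyz
    if z <= 0:
        return None  # the point is behind the camera
    return fx * x / z + cx, fy * y / z + cy


def bilinear_sample(feature_map, u, v):
    """Interpolate feature_map[row][col] at column u and row v; None if outside."""
    height, width = len(feature_map), len(feature_map[0])
    if not (0 <= u <= width - 1 and 0 <= v <= height - 1):
        return None
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, width - 1), min(v0 + 1, height - 1)
    du, dv = u - u0, v - v0
    top = (1 - du) * feature_map[v0][u0] + du * feature_map[v0][u1]
    bottom = (1 - du) * feature_map[v1][u0] + du * feature_map[v1][u1]
    return (1 - dv) * top + dv * bottom


# Hypothetical 2x2 image feature map and one point of interest.
feature_map = [[0.0, 1.0], [2.0, 3.0]]
pixel = project_point((0.5, 0.5, 1.0), fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print("pixel:", pixel, "feature:", bilinear_sample(feature_map, *pixel))
```

Sampling only at a handful of projected points per query is what keeps this cheaper than transforming a whole view or attending over every location.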

翻訳日:2024-11-09 04:10:35 公開日:2024-09-22

# 憎しみの解読:憎しみのあるミームとそのターゲットを識別する

Deciphering Hate: Identifying Hateful Memes and Their Targets ( http://arxiv.org/abs/2403.10829v2 )

ライセンス: Link先を確認

Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque, Sarah M. Preum,

(参考訳) インターネットミームは、個人がソーシャルメディア上で感情、考え、視点を表現するための強力な手段となっている。ユーモアや娯楽の源と見なされることが多いが、ミームは個人やコミュニティを標的とした憎悪的なコンテンツを拡散することもある。既存研究の多くは高リソース言語におけるミームの負の側面に焦点を当てており、ベンガル語(バングラ語としても知られる)のような低リソース言語に特有の課題を見過ごしている。さらに、ベンガル語のミームに関するこれまでの研究は憎悪的なミームの検出に焦点を当ててきたが、その標的となる対象を検出する研究は行われていない。このギャップを埋め、この分野の研究を促進するために、ベンガル語向けの新しいマルチモーダルデータセットBHM(Bengali Hateful Memes)を導入する。データセットは、ベンガル語およびコードミックスのキャプションを持つ7,148件のミームからなり、2つのタスクに合わせて作られている。(i)憎悪的なミームを検出すること、(ii)それらが標的とする社会的主体(個人、組織、コミュニティ、社会)を検出すること、である。これらのタスクを解くために,ミームから重要なモダリティ特徴を体系的に抽出し,文脈をよりよく理解するためにそれらをモダリティ固有の特徴と共同で評価するマルチモーダル深層ニューラルネットワークであるDORA(Dual cO attention fRAmework)を提案する。我々の実験は、DORAが他の低リソースの憎悪的ミームデータセットにも汎化可能であり、競合するいくつかの最先端ベースラインを上回ることを示している。

Internet memes have become a powerful means for individuals to express emotions, thoughts, and perspectives on social media. While often considered as a source of humor and entertainment, memes can also disseminate hateful content targeting individuals or communities. Most existing research focuses on the negative aspects of memes in high-resource languages, overlooking the distinctive challenges associated with low-resource languages like Bengali (also known as Bangla). Furthermore, while previous work on Bengali memes has focused on detecting hateful memes, there has been no work on detecting their targeted entities. To bridge this gap and facilitate research in this arena, we introduce a novel multimodal dataset for Bengali, BHM (Bengali Hateful Memes). The dataset consists of 7,148 memes with Bengali as well as code-mixed captions, tailored for two tasks: (i) detecting hateful memes, and (ii) detecting the social entities they target (i.e., Individual, Organization, Community, and Society). To solve these tasks, we propose DORA (Dual cO attention fRAmework), a multimodal deep neural network that systematically extracts the significant modality features from the memes and jointly evaluates them with the modality-specific features to understand the context better. Our experiments show that DORA is generalizable on other low-resource hateful meme datasets and outperforms several state-of-the-art rivaling baselines.

翻訳日:2024-11-09 03:59:25 公開日:2024-09-22

# m&m's: マルチステップマルチモーダルタスクのためのツール利用評価ベンチマーク

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks ( http://arxiv.org/abs/2403.11085v4 )

ライセンス: Link先を確認

Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna,

(参考訳) 実世界のマルチモーダル問題は、単一の機械学習モデルで解決されることはほとんどなく、しばしば複数のモデルをつなぎ合わせる多段階の計算計画を必要とする。ツール拡張 LLM は、そのような計算計画の自動生成に非常に有望である。しかし、マルチステップマルチモーダルタスクのプランナーとしてLLMを評価するための標準ベンチマークが欠如していることは、プランナー設計決定の体系的な研究を妨げている。 LLMは、ひとつのショットで完全なプランを生成するべきか、ステップバイステップで生成すべきか? ツールを直接Pythonコードで呼び出すべきか、JSONのような構造化データフォーマットを介して呼び出すべきか? フィードバックは計画を改善するか? これらの疑問に答えるために、m&m'sを導入する。これは、マルチモーダルモデル、(無料)パブリックAPI、画像処理モジュールを含む33のツールを用いる4K以上のマルチステップマルチモーダルタスクを含むベンチマークである。これら各タスククエリに対して、この現実的なツールセットを使用して自動生成されたプランを提供する。我々はさらに,人間によって検証され正しく実行可能な,1,565のタスクプランの高品質なサブセットを提供する。 m&m'sでは,2つの計画戦略(マルチステップ対ステップバイステッププランニング),2つの計画形式(JSON対コード),3種類のフィードバック(パーシング/検証/実行)を用いて10の一般的なLLMを評価した。最後に、我々の広範な実験の要点を要約する。私たちのデータセットとコードは、HuggingFace (https://huggingface.co/datasets/zixianma/mnms)とGitHub (https://github.com/RAIVNLab/mnms)で利用可能です。

Real-world multi-modal problems are rarely solved by a single machine learning model, and often require multi-step computational plans that involve stitching several models. Tool-augmented LLMs hold tremendous promise for automating the generation of such computational plans. However, the lack of standardized benchmarks for evaluating LLMs as planners for multi-step multi-modal tasks has prevented a systematic study of planner design decisions. Should LLMs generate a full plan in a single shot or step-by-step? Should they invoke tools directly with Python code or through structured data formats like JSON? Does feedback improve planning? To answer these questions and more, we introduce m&m's: a benchmark containing 4K+ multi-step multi-modal tasks involving 33 tools that include multi-modal models, (free) public APIs, and image processing modules. For each of these task queries, we provide automatically generated plans using this realistic toolset. We further provide a high-quality subset of 1,565 task plans that are human-verified and correctly executable. With m&m's, we evaluate 10 popular LLMs with 2 planning strategies (multi-step vs. step-by-step planning), 2 plan formats (JSON vs. code), and 3 types of feedback (parsing/verification/execution). Finally, we summarize takeaways from our extensive experiments. Our dataset and code are available on HuggingFace (https://huggingface.co/datasets/zixianma/mnms) and GitHub (https://github.com/RAIVNLab/mnms).
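To make the JSON plan format and the parsing and verification feedback more concrete, here is a minimal sketch. It is not the benchmark's own code: the tool names, the argument schema, and the `check_plan` helper are all assumptions made for illustration, and execution feedback is not covered.

```python
import json

# Hypothetical tool registry: tool name -> required argument names.
# These names are illustrative only and are not taken from the m&m's toolset.
TOOLS = {
    "image_captioning": {"image"},
    "text_summarization": {"text"},
}

def check_plan(raw):
    """Return a list of feedback messages; an empty list means the plan passed."""
    try:
        plan = json.loads(raw)  # parsing feedback: is the plan valid JSON?
    except json.JSONDecodeError as err:
        return [f"parsing error: {err.msg}"]
    if not isinstance(plan, list):
        return ["parsing error: expected a list of steps"]

    feedback = []
    for i, step in enumerate(plan):  # verification feedback: known tools and arguments
        name = step.get("name") if isinstance(step, dict) else None
        if name not in TOOLS:
            feedback.append(f"step {i}: unknown tool {name!r}")
            continue
        missing = TOOLS[name] - set(step.get("args", {}))
        if missing:
            feedback.append(f"step {i}: missing arguments {sorted(missing)}")
    return feedback

plan = json.dumps([
    {"name": "image_captioning", "args": {"image": "photo.jpg"}},
    {"name": "text_summarization", "args": {"text": "<output of step 0>"}},
])
print(check_plan(plan))
```

In a feedback loop of this kind, the returned messages would be appended to the prompt so the planner can revise its plan.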

翻訳日:2024-11-09 03:59:25 公開日:2024-09-22

# B-LoRAを用いた暗黙的なスタイル・コンテンツ分離

Implicit Style-Content Separation using B-LoRA ( http://arxiv.org/abs/2403.14572v2 )

ライセンス: Link先を確認

Yarden Frenkel, Yael Vinker, Ariel Shamir, Daniel Cohen-Or,

(参考訳) イメージスタイリングは、画像の視覚的な外観とテクスチャ(スタイル)を操作しつつ、その基盤となるオブジェクト、構造、概念(コンテンツ)を保存することを含む。スタイルと内容の分離は、画像のスタイルをその内容から独立して操作するために不可欠であり、調和し、視覚的に喜ぶ結果を保証する。この分離を実現するには、画像の視覚的特徴と意味的特徴の両方を深く理解する必要がある。本稿では,LoRA(Low-Rank Adaptation)を利用して,画像のスタイルとコンテンツコンポーネントを暗黙的に分離し,画像スタイリング作業を容易にする手法であるB-LoRAを紹介する。 SDXLのアーキテクチャをLoRAと組み合わせて解析することにより、B-LoRAと呼ばれる2つのブロックのLoRA重みを共同で学習することで、各B-LoRAを個別に訓練することでは達成できないスタイル-コンテンツ分離を実現する。トレーニングを2ブロックに集約し、スタイルとコンテンツを分離することで、スタイル操作を大幅に改善し、モデル微調整に関連する過度な問題を克服できます。トレーニングが完了すると、2つのB-LoRAは独立したコンポーネントとして使用でき、画像スタイルの転送、テキストベースの画像スタイリング、一貫したスタイル生成、スタイル内容の混合など、様々な画像スタイリングタスクが可能である。

Image stylization involves manipulating the visual appearance and texture (style) of an image while preserving its underlying objects, structures, and concepts (content). The separation of style and content is essential for manipulating the image's style independently from its content, ensuring a harmonious and visually pleasing result. Achieving this separation requires a deep understanding of both the visual and semantic characteristics of images, often necessitating the training of specialized models or employing heavy optimization. In this paper, we introduce B-LoRA, a method that leverages LoRA (Low-Rank Adaptation) to implicitly separate the style and content components of a single image, facilitating various image stylization tasks. By analyzing the architecture of SDXL combined with LoRA, we find that jointly learning the LoRA weights of two specific blocks (referred to as B-LoRAs) achieves style-content separation that cannot be achieved by training each B-LoRA independently. Consolidating the training into only two blocks and separating style and content allows for significantly improving style manipulation and overcoming overfitting issues often associated with model fine-tuning. Once trained, the two B-LoRAs can be used as independent components to allow various image stylization tasks, including image style transfer, text-based image stylization, consistent style generation, and style-content mixing.
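The abstract does not spell out B-LoRA's training procedure, so the following is only a sketch of the generic LoRA update it builds on: a frozen weight matrix W is adapted as W + (alpha / r) · B · A, where B and A are small low-rank factors. The matrix sizes and values are toy assumptions; which SDXL blocks receive such factors is not modeled here.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, b, a, alpha, rank):
    """Return the adapted weight W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Toy example: a frozen 3x3 weight and rank-1 factors B (3x1) and A (1x3).
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = [[1.0], [0.0], [2.0]]
a = [[0.5, 0.0, -0.5]]

adapted = lora_weight(w, b, a, alpha=1.0, rank=1)
for row in adapted:
    print(row)
```

Only B and A are trained (6 values here instead of 9); the saving grows quickly for large layers, which is what makes it practical to train and swap adapters per block.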

翻訳日:2024-11-09 03:48:22 公開日:2024-09-22

# GeNet: グラフニューラルネットワークに基づく耐ノイズ・タスク指向セマンティック通信パラダイム

GeNet: A Graph Neural Network-based Anti-noise Task-Oriented Semantic Communication Paradigm ( http://arxiv.org/abs/2403.18296v3 )

ライセンス: Link先を確認

Chunhang Zheng, Kechao Cai,

(参考訳) 意味コミュニケーションタスクに対する従来のアプローチは、チャネルノイズを軽減するためにSNR(Signal-to-Noise ratio)の知識に依存していた。さらに、これらの手法は特定のSNR条件下でのトレーニングを必要とし、かなりの時間と計算資源を必要とする。本稿では,ノイズ対策を目的とした意味コミュニケーションのためのグラフニューラルネットワーク(GNN)に基づくパラダイムであるGeNetを提案し,タスク指向通信(TOC)を容易にする。入力データイメージをグラフ構造に変換する新しい手法を提案する。そして、GNNベースのエンコーダを利用して、ソースデータから意味情報を抽出する。この抽出された意味情報はチャネルを介して送信される。受信側の最後には、GNNベースのデコーダを使用して、TOCのソースデータから関連する意味情報を再構成する。実験により,SNR依存性を疎結合化しながら,アンチノイズTOCにおけるGeNetの有効性を示す。さらに,ノード数を変えてGeNetの性能を評価し,その汎用性を意味コミュニケーションの新しいパラダイムとして明らかにした。さらに,GeNetの幾何変換に対する頑健さを,データ拡張に頼ることなく,異なる回転角度でテストすることで示す。

Traditional approaches to semantic communication tasks rely on the knowledge of the signal-to-noise ratio (SNR) to mitigate channel noise. Moreover, these methods necessitate training under specific SNR conditions, entailing considerable time and computational resources. In this paper, we propose GeNet, a Graph Neural Network (GNN)-based paradigm for semantic communication aimed at combating noise, thereby facilitating Task-Oriented Communication (TOC). We propose a novel approach where we first transform the input data image into graph structures. Then we leverage a GNN-based encoder to extract semantic information from the source data. This extracted semantic information is then transmitted through the channel. At the receiver's end, a GNN-based decoder is utilized to reconstruct the relevant semantic information from the source data for TOC. Through experimental evaluation, we show GeNet's effectiveness in anti-noise TOC while decoupling the SNR dependency. We further evaluate GeNet's performance by varying the number of nodes, revealing its versatility as a new paradigm for semantic communication. Additionally, we show GeNet's robustness to geometric transformations by testing it with different rotation angles, without resorting to data augmentation.
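The abstract says the input image is first transformed into a graph but does not say how. One common construction, shown here purely as an assumption, is to split the image into patches, use each patch as a node, and connect neighbouring patches. The function name and the mean-intensity node feature are illustrative choices, not GeNet's actual pipeline.

```python
def image_to_graph(image, patch):
    """Split a 2D grayscale image into patch x patch blocks.

    Returns (features, edges): one mean-intensity feature per block and
    undirected edges between horizontally and vertically adjacent blocks.
    """
    rows, cols = len(image) // patch, len(image[0]) // patch
    features = []
    for r in range(rows):
        for c in range(cols):
            block = [image[r * patch + i][c * patch + j]
                     for i in range(patch) for j in range(patch)]
            features.append(sum(block) / len(block))

    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                edges.append((node, node + 1))     # right neighbour
            if r + 1 < rows:
                edges.append((node, node + cols))  # bottom neighbour
    return features, edges

image = [[0, 0, 4, 4],
         [0, 0, 4, 4],
         [8, 8, 2, 2],
         [8, 8, 2, 2]]
features, edges = image_to_graph(image, patch=2)
print(features)
print(edges)
```

Changing `patch` changes the number of nodes, which is one simple way to run the kind of node-count study the abstract mentions.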

翻訳日:2024-11-09 03:37:10 公開日:2024-09-22

# SugarcaneNet2024: サトウキビ病分類のためのLASSO正則化事前学習モデルの最適化重み付き平均アンサンブルアプローチ

SugarcaneNet2024: An Optimized Weighted Average Ensemble Approach of LASSO Regularized Pre-trained Models for Sugarcane Disease Classification ( http://arxiv.org/abs/2403.18870v2 )

ライセンス: Link先を確認

Md. Simul Hasan Talukder, Sharmin Akter, Abdullah Hafez Nur, Mohammad Aljaidi, Rejwan Bin Sulaiman,

(参考訳) 世界の砂糖産業にとって重要な作物であるシュガーカインは、その収量と品質の両方にかなりの悪影響を及ぼすいくつかの病気の傾向にある。予防イニシアチブを効果的に管理し、実施するには、疾患を迅速かつ正確に検出する必要がある。本研究では,サトウキビ病を自動的にかつ迅速に検出するための従来の手法よりも優れたサトウキビNet2024というユニークなモデルを提案する。 InceptionV3、InceptionResNetV2、DenseNet201、DenseNet169、Xception、ResNet152V2の7つのカスタマイズおよびLASSO正規化事前学習モデルの最適化された平均アンサンブルを集約した。当初、0.0001 LASSO正則化、30%のドロップアウト層、3つのバッチ正規化を加えた。この添加によりサトウキビ葉病分類の精度が大幅に向上した。その後、平均アンサンブルと個々のモデルの比較研究を行い、アンサンブルの手法がより良くなったことを示唆した。すべての改良された事前訓練されたモデルの平均アンサンブルは、それぞれ100%、99%、99%、99.45%のスコア、精度、リコール、精度で優れた結果をもたらした。グリッドサーチを組み込んだ最適化された平均アンサンブル手法の実装により、さらに性能が向上した。この最適化されたサトウキビNet2024モデルは、精度、精度、リコール、F1スコアの99.67%、100%、100%、100%を達成し、サトウキビ病の診断に最善を尽くした。

Sugarcane, a key crop for the world's sugar industry, is prone to several diseases that have a substantial negative influence on both its yield and quality. To effectively manage and implement preventative initiatives, diseases must be detected promptly and accurately. In this study, we present a unique model called sugarcaneNet2024 that outperforms previous methods for automatically and quickly detecting sugarcane disease through leaf image processing. Our proposed model consolidates an optimized weighted average ensemble of seven customized and LASSO-regularized pre-trained models, particularly InceptionV3, InceptionResNetV2, DenseNet201, DenseNet169, Xception, and ResNet152V2. Initially, we added three more dense layers with 0.0001 LASSO regularization, three 30% dropout layers, and three batch normalizations with renorm enabled at the bottom of these pre-trained models to improve the performance. The accuracy of sugarcane leaf disease classification was greatly increased by this addition. Following this, several comparative studies between the average ensemble and individual models were carried out, indicating that the ensemble technique performed better. The average ensemble of all modified pre-trained models produced outstanding outcomes: 100%, 99%, 99%, and 99.45% for F1 score, precision, recall, and accuracy, respectively. Performance was further enhanced by the implementation of an optimized weighted average ensemble technique incorporated with grid search. This optimized sugarcaneNet2024 model performed the best for detecting sugarcane diseases, having achieved accuracy, precision, recall, and F1 score of 99.67%, 100%, 100%, and 100%, respectively.
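The idea of a weighted average ensemble tuned by grid search can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: the two toy "models", the weight grid, and accuracy as the selection metric are all made up for the example, and in practice the weights would be chosen on a held-out validation set.

```python
from itertools import product

def ensemble_predict(probs_per_model, weights):
    """Weighted average of per-model class probabilities, then argmax per sample."""
    total = sum(weights)
    preds = []
    for sample in zip(*probs_per_model):  # per-model probability vectors for one sample
        n_classes = len(sample[0])
        avg = [sum(w * p[k] for w, p in zip(weights, sample)) / total
               for k in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds

def grid_search_weights(probs_per_model, labels, grid=(0.0, 0.5, 1.0)):
    """Try every weight combination on the grid and keep the most accurate one."""
    best_w, best_acc = None, -1.0
    for weights in product(grid, repeat=len(probs_per_model)):
        if sum(weights) == 0:
            continue  # an all-zero weighting is undefined
        preds = ensemble_predict(probs_per_model, weights)
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_w, best_acc = weights, acc
    return best_w, best_acc

# Toy validation data: 2 models, 3 samples, 2 classes.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4]]
model_b = [[0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]
labels = [0, 1, 1]

best_w, best_acc = grid_search_weights([model_a, model_b], labels)
print(best_w, best_acc)
```

An exhaustive grid grows exponentially with the number of models, so with many base models a coarse grid or a different search strategy would be needed.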

翻訳日:2024-11-09 03:37:10 公開日:2024-09-22

# ハンドオブジェクト接触セマンティックマッピングによるクラッタ環境における多指ロボットハンドグラッピング

Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping ( http://arxiv.org/abs/2404.08844v2 )

ライセンス: Link先を確認

Lei Zhang, Kaixin Bai, Guowen Huang, Zhenshan Bing, Zhaopeng Chen, Alois Knoll, Jianwei Zhang,

(参考訳) 深層学習モデルには,多指ハンドグリップのための巧妙な操作技術が著しく進歩している。しかし, 乱雑な環境下での接触情報誘導の把握は, いまだに過小評価されている。このギャップに対処するため,接触セマンティックマップを用いて,乱雑な環境下でのマルチフィンガーハンドグリップサンプルを生成する手法を開発した。オブジェクトポイントクラウドから総合的な接触セマンティックマップを作成するための接触セマンティック条件変分オートエンコーダネットワーク(CoSe-CVAE)を導入する。接触セマンティックマップから手つかみポーズを推定するために把握検出法を利用する。最後に, 把握品質と衝突確率を定量的に評価する統合的把握評価モデルを構築し, 散在シナリオにおける最適把握の信頼性を著しく向上する。実世界の単一物体環境における把握成功率の平均は81.0%、散在するシーンでの把握成功率は75.3%である。また,マルチモーダルなマルチフィンガーグリップデータセット生成手法を提案する。マルチフィンガーハンドグルーピングデータセットは、シーンの多様性、モダリティの多様性において、過去のデータセットよりも優れています。データセット、コード、補足資料はhttps://sites.google.com/view/ffh-cluttered-graspingで見ることができる。

Deep learning models have significantly advanced dexterous manipulation techniques for multi-fingered hand grasping. However, contact information-guided grasping in cluttered environments remains largely underexplored. To address this gap, we have developed a method for generating multi-fingered hand grasp samples in cluttered settings through a contact semantic map. We introduce a contact semantic conditional variational autoencoder network (CoSe-CVAE) for creating a comprehensive contact semantic map from an object point cloud. We utilize a grasp detection method to estimate hand grasp poses from the contact semantic map. Finally, a unified grasp evaluation model is designed to assess grasp quality and collision probability, substantially improving the reliability of identifying optimal grasps in cluttered scenarios. Our grasp generation method has demonstrated remarkable success, outperforming state-of-the-art methods by at least 4.65% with an 81.0% average grasping success rate in real-world single-object environments and a 75.3% grasping success rate in cluttered scenes. We also propose a multi-modal multi-fingered grasping dataset generation method. Our multi-fingered hand grasping dataset outperforms previous datasets in scene diversity and modality diversity. The dataset, code and supplementary materials can be found at https://sites.google.com/view/ffh-cluttered-grasping.

翻訳日:2024-11-09 03:14:33 公開日:2024-09-22

# 拡散モデルを用いた頑健な深度推定のためのコントラスト学習

Digging into contrastive learning for robust depth estimation with diffusion models ( http://arxiv.org/abs/2404.09831v4 )

ライセンス: Link先を確認

Jiyuan Wang, Chunyu Lin, Lang Nie, Kang Liao, Shuwei Shao, Yao Zhao,

(参考訳) 近年, 拡散型深度推定法は, エレガントなデノイジングパターンと有望な性能により, 広く注目を集めている。しかし、雨や雪などの現実のシナリオでよく見られる悪条件下では、信頼できないのが普通である。本稿では,複雑な環境における性能劣化を軽減するために,拡散モデルに適した独自のコントラスト学習モードを備えた,D4RDと呼ばれる新しい頑健な深度推定手法を提案する。具体的には、知識蒸留の強みを対照的な学習に統合し、「トリニティ(trinity)」対照スキームを構築する。このスキームは前方拡散過程のサンプルノイズを自然参照として利用し、様々な場面で予測されたノイズをより安定かつ正確な最適化に向けて導く。さらに、ノイズレベルのトリニティを、より汎用的な特徴レベルと画像レベルにまで拡張し、マルチレベルコントラストを確立し、ネットワーク全体にわたって頑健な知覚の負担を分散する。複雑なシナリオに対処する前に、3つの単純かつ効果的な改善によりベースライン拡散モデルの安定性を高め、収束を容易にし、深度の外れ値を除去する。大規模な実験により、D4RDは、合成的な劣化データセットや現実世界の気象条件において既存の最先端のソリューションを上回ることが示された。ソースコードとデータは https://github.com/wangjiyuan9/D4RD で公開されている。

Recently, diffusion-based depth estimation methods have drawn widespread attention due to their elegant denoising patterns and promising performance. However, they are typically unreliable under adverse conditions prevalent in real-world scenarios, such as rain and snow. In this paper, we propose a novel robust depth estimation method called D4RD, featuring a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments. Concretely, we integrate the strength of knowledge distillation into contrastive learning, building the 'trinity' contrastive scheme. This scheme utilizes the sampled noise of the forward diffusion process as a natural reference, guiding the predicted noise in diverse scenes toward a more stable and precise optimum. Moreover, we extend noise-level trinity to encompass more generic feature and image levels, establishing a multi-level contrast to distribute the burden of robust perception across the overall network. Before addressing complex scenarios, we enhance the stability of the baseline diffusion model with three straightforward yet effective improvements, which facilitate convergence and remove depth outliers. Extensive experiments demonstrate that D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions. Source code and data are available at https://github.com/wangjiyuan9/D4RD.

翻訳日:2024-11-09 03:14:33 公開日:2024-09-22

# ViViDex: 人間のビデオから視覚に基づく器用な操作を学習する

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos ( http://arxiv.org/abs/2404.15709v2 )

ライセンス: Link先を確認

Zerui Chen, Shizhe Chen, Etienne Arlaud, Ivan Laptev, Cordelia Schmid,

(参考訳) 本研究は,多指ロボットによる多様なポーズでさまざまな物体を操作するための統一的な視覚ベースのポリシーを学習することを目的としている。以前の研究は、政策学習にヒューマンビデオを使うことの利点を示してきたが、推定軌跡のノイズによって性能向上は制限されてきた。さらに、接地木オブジェクトのような特権オブジェクト情報への依存は、現実的なシナリオにおける適用性をさらに制限する。これらの制約に対処するため、人間のビデオから視覚に基づくポリシー学習を改善するための新しいフレームワークViViDexを提案する。最初は、強化学習と軌道誘導報酬を使って、各ビデオのステートベースのポリシーを訓練し、ビデオから視覚的に自然と身体的にもっともらしい軌跡の両方を得る。次に、州ベースのポリシーから成功したエピソードをロールアウトし、特権情報を使用しずに統一された視覚ポリシーをトレーニングします。本稿では,視覚的視点のクラウド表現をさらに強化するコーディネート変換を提案し,視覚的ポリシートレーニングのための行動クローニングと拡散ポリシーを比較した。シミュレーションと実際のロボットの両方の実験では、ViViDexは3つの巧妙な操作タスクにおける最先端のアプローチよりも優れていることが示されている。

In this work, we aim to learn a unified vision-based policy for multi-fingered robot hands to manipulate a variety of objects in diverse poses. Though prior work has shown benefits of using human videos for policy learning, performance gains have been limited by the noise in estimated trajectories. Moreover, reliance on privileged object information such as ground-truth object states further limits the applicability in realistic scenarios. To address these limitations, we propose a new framework ViViDex to improve vision-based policy learning from human videos. It first uses reinforcement learning with trajectory-guided rewards to train state-based policies for each video, obtaining both visually natural and physically plausible trajectories from the video. We then roll out successful episodes from state-based policies and train a unified visual policy without using any privileged information. We propose coordinate transformation to further enhance the visual point cloud representation, and compare behavior cloning and diffusion policy for the visual policy training. Experiments both in simulation and on the real robot demonstrate that ViViDex outperforms state-of-the-art approaches on three dexterous manipulation tasks.

翻訳日:2024-11-09 03:03:34 公開日:2024-09-22

# 半分のデータと400分の1の計算量による高性能網膜基礎モデルの訓練

Training a high-performance retinal foundation model with half-the-data and 400 times less compute ( http://arxiv.org/abs/2405.00117v2 )

ライセンス: Link先を確認

Justin Engelmann, Miguel O. Bernabeu,

(参考訳) 医学における人工知能は、伝統的に大規模なトレーニングデータセットの欠如によって制限されてきた。ファンデーションモデルは、小さなデータセットで下流タスクに適応できる事前訓練されたモデルであり、この問題を軽減する可能性がある。ムーアフィールズ眼科病院(MEH)の研究者たちは、病院の非公開データを含む90万枚の画像でトレーニングされた網膜基盤モデルであるRETFound-MEHを提案した。最近、わずか15万枚の公開画像でトレーニングされながら同等の性能を示す、データ効率のよいDERETFoundが提案された。しかし、これら2つのモデルは、最初のトレーニングに非常に大きなリソースを必要とし、下流での使用においてもリソースを多く消費する。本稿では,新しいToken Reconstruction目的関数を提案し,これを用いて,75,000枚の公開画像のみと400分の1の計算量でトレーニングされた網膜基盤モデルであるRETFound-Greenを訓練する。RETFound-MEH と DERETFound のトレーニング費用は、それぞれ10,000ドルと14,000ドルと見積もられる。 RETFound-Greenは100ドル未満でトレーニングでき、環境への影響も同様に低減される。 RETFound-Greenは下流での使用においてもはるかに効率的であり、ダウンロードは14倍速く、ベクトル埋め込みの計算は2.7倍速く、その埋め込みに必要なストレージ容量は2.6分の1である。それにもかかわらず、RETFound-Greenの性能が体系的に劣ることはない。実際、ブラジル、インド、中国の3つの下流データセットのさまざまなタスクにおいて、119の比較のうち68で最良の性能を示し、DERETFoundは21、RETFound-MEHは13である。以上の結果から,RETFound-Greenは非常に効率的で高性能な網膜基盤モデルであることが示唆された。われわれは、Token Reconstructionの目的関数を、さらに高い性能のためにスケールアップし、網膜画像以外の他の領域にも適用できることを期待している。

Artificial Intelligence in medicine is traditionally limited by the lack of massive training datasets. Foundation models, pre-trained models that can be adapted to downstream tasks with small datasets, could alleviate this problem. Researchers at Moorfields Eye Hospital (MEH) proposed RETFound-MEH, a retinal foundation model trained on 900,000 images, including private hospital data. Recently, data-efficient DERETFound was proposed providing comparable performance while being trained on only 150,000 publicly available images. However, both these models required very substantial resources to train initially and are resource-intensive in downstream use. We propose a novel Token Reconstruction objective that we use to train RETFound-Green, a retinal foundation model trained using only 75,000 publicly available images and 400 times less compute. We estimate the cost of training RETFound-MEH and DERETFound at $10,000 and $14,000, respectively. RETFound-Green could be trained for less than $100, with equally reduced environmental impact. RETFound-Green is also far more efficient in downstream use: it can be downloaded 14 times faster and computes vector embeddings 2.7 times faster, which then require 2.6 times less storage space. Despite this, RETFound-Green does not perform systematically worse. In fact, on various tasks across three downstream datasets from Brazil, India and China, it performs best in 68 out of 119 comparisons, versus 21 for DERETFound and 13 for RETFound-MEH. Our results suggest that RETFound-Green is a very efficient, high-performance retinal foundation model. We anticipate that our Token Reconstruction objective could be scaled up for even higher performance and be applied to other domains beyond retinal imaging.

翻訳日:2024-11-09 02:52:30 公開日:2024-09-22

# UniGen: ゼロショットデータセット生成による感情分類のためのユニバーサルドメイン一般化

UniGen: Universal Domain Generalization for Sentiment Classification via Zero-shot Dataset Generation ( http://arxiv.org/abs/2405.01022v3 )

ライセンス: Link先を確認

Juhwan Choi, Yeonghwa Kim, Seunguk Yu, JungMin Yun, YoungBin Kim,

(参考訳) 事前学習された言語モデルは、プロンプトベースの数発の学習で非常に柔軟性と汎用性を示してきたが、広いパラメータサイズと推論の適用性に悩まされている。近年の研究では、PLMをデータセットジェネレータとして使用し、効率的な推論を実現するために、タスク固有の小さなモデルを訓練することが示唆されている。しかし、ドメイン固有のデータセットを生成する傾向があるため、さまざまなドメインへの適用性は制限されている。本研究では,対象領域によらずデータセットを生成する普遍的領域一般化に対する新しいアプローチを提案する。これにより、ラベル空間を共有する任意のドメインに小さなタスクモデルを一般化することができ、データセット生成パラダイムの現実的な適用性を高めることができる。提案手法は, PLM よりも桁違いの小さいパラメータ集合を用いて, 各領域にまたがる一般化性を実現する。

Although pre-trained language models have exhibited great flexibility and versatility with prompt-based few-shot learning, they suffer from the extensive parameter size and limited applicability for inference. Recent studies have suggested that PLMs be used as dataset generators and a tiny task-specific model be trained to achieve efficient inference. However, their applicability to various domains is limited because they tend to generate domain-specific datasets. In this work, we propose a novel approach to universal domain generalization that generates a dataset regardless of the target domain. This allows for generalization of the tiny task model to any domain that shares the label space, thus enhancing the real-world applicability of the dataset generation paradigm. Our experiments indicate that the proposed method accomplishes generalizability across various domains while using a parameter set that is orders of magnitude smaller than PLMs.

翻訳日:2024-11-09 02:52:29 公開日:2024-09-22

# EconLogicQA: 経済シーケンス推論における大規模言語モデル評価のための質問応答ベンチマーク

EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning ( http://arxiv.org/abs/2405.07938v2 )

ライセンス: Link先を確認

Yinzhu Quan, Zefang Liu,

(参考訳) 本稿では,経済,ビジネス,サプライチェーン管理の複雑な領域において,大規模言語モデル(LLM)の逐次推論能力を評価するための厳密なベンチマークであるEconLogicQAを紹介する。 EconLogicQAは、後続のイベントを個別に予測する従来のベンチマークとは違い、複数の相互接続されたイベントを識別してシーケンスする必要があるため、経済論理の複雑さを捉える必要がある。 EconLogicQAは、時間的および論理的事象の関係に関する洞察に富んだ理解を必要とする、経済的な記事から派生した多段階シナリオで構成されている。 EconLogicQAは、包括的な評価を通じて、経済的な文脈に固有のシーケンシャルな複雑さをナビゲートするLLMの習熟度を効果的に評価することを示した。本稿では,EconLogicQAデータセットの詳細な説明と,各種先進LLMのベンチマーク評価結果について述べる。ベンチマークデータセットはhttps://huggingface.co/datasets/yinzhu-quan/econ_logic_qaで公開されています。

In this paper, we introduce EconLogicQA, a rigorous benchmark designed to assess the sequential reasoning capabilities of large language models (LLMs) within the intricate realms of economics, business, and supply chain management. Diverging from traditional benchmarks that predict subsequent events individually, EconLogicQA poses a more challenging task: it requires models to discern and sequence multiple interconnected events, capturing the complexity of economic logic. EconLogicQA comprises an array of multi-event scenarios derived from economic articles, which necessitate an insightful understanding of both temporal and logical event relationships. Through comprehensive evaluations, we show that EconLogicQA effectively gauges an LLM's proficiency in navigating the sequential complexities inherent in economic contexts. We provide a detailed description of the EconLogicQA dataset and show the outcomes from evaluating the benchmark across various leading-edge LLMs, thereby offering a thorough perspective on their sequential reasoning potential in economic contexts. Our benchmark dataset is available at https://huggingface.co/datasets/yinzhu-quan/econ_logic_qa.
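Since the task is to put several events into the right order, a natural way to score a model is to compare its predicted ordering with the reference ordering. The sketch below is an assumption about how such scoring could look, not the benchmark's official metric: it shows strict exact match plus a softer pairwise agreement score, with events identified by made-up letter labels.

```python
from itertools import combinations

def exact_match_accuracy(predictions, references):
    """Fraction of questions whose predicted ordering equals the reference exactly."""
    hits = sum(pred == ref for pred, ref in zip(predictions, references))
    return hits / len(references)

def pairwise_agreement(pred, ref):
    """Fraction of event pairs that appear in the same relative order in both lists."""
    position = {event: i for i, event in enumerate(pred)}
    pairs = list(combinations(ref, 2))  # (earlier, later) according to the reference
    agree = sum(position[first] < position[second] for first, second in pairs)
    return agree / len(pairs)

references = [["C", "A", "D", "B"], ["B", "A", "C"]]
predictions = [["C", "A", "B", "D"], ["B", "A", "C"]]

print(exact_match_accuracy(predictions, references))
print(pairwise_agreement(predictions[0], references[0]))
```

Exact match gives no credit for a nearly correct sequence, whereas the pairwise score rewards partial orderings; reporting both makes it easier to see how close a model's reasoning is.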

翻訳日:2024-11-09 02:30:11 公開日:2024-09-22

# 参照映像オブジェクトセグメンテーションのための時間認識適応による視覚言語事前学習モデルの活用

Harnessing Vision-Language Pretrained Models with Temporal-Aware Adaptation for Referring Video Object Segmentation ( http://arxiv.org/abs/2405.10610v2 )

ライセンス: Link先を確認

Zikun Zhou, Wentao Xiong, Li Zhou, Xin Li, Zhenyu He, Yaowei Wang,

(参考訳) Referring Video Object Segmentation (RVOS) の要点は、抽象言語概念とピクセルレベルでの動的視覚的内容とを関連付けるために、密集したテキストとビデオの関係をモデル化することにある。現在のRVOSメソッドは一般的に、バックボーンとして独立して事前訓練された視覚と言語モデルを使用する。画像とテキストは結合しない特徴空間にマッピングされるため、視覚・言語関係モデリング(VL)をスクラッチから学習する難しい課題に直面します。 VLP(Vision-Language Pretrained)モデルの成功に気付き、協調したVL特徴空間に基づいてRVOSの関連モデリングを学ぶことを提案する。それでも、VLPモデルをRVOSに転送するのは、事前訓練タスク(静的画像/領域レベルの予測)とRVOSタスク(動的ピクセルレベルの予測)の間にかなりのギャップがあるため、非常に難しい作業である。この移行問題に対処するため,時相適応によりRVOSのVLPモデルを利用するVLP-RVOSというフレームワークを導入する。まず、画素レベルの予測のために事前訓練された表現を適応させるだけでなく、視覚エンコーダを時間文脈のモデル化に活用する時間適応型プロンプトチューニング手法を提案する。さらに、頑健な時空間推論のための立方体フレームアテンション機構をカスタマイズする。さらに,包括的VL理解のための特徴抽出における多段階VL関係モデリングを提案する。大規模な実験により,本手法は最先端のアルゴリズムに対して良好に動作し,強力な一般化能力を示すことが示された。

The crux of Referring Video Object Segmentation (RVOS) lies in modeling dense text-video relations to associate abstract linguistic concepts with dynamic visual contents at the pixel level. Current RVOS methods typically use vision and language models pretrained independently as backbones. As images and texts are mapped to uncoupled feature spaces, they face the arduous task of learning Vision-Language (VL) relation modeling from scratch. Witnessing the success of Vision-Language Pretrained (VLP) models, we propose to learn relation modeling for RVOS based on their aligned VL feature space. Nevertheless, transferring VLP models to RVOS is a deceptively challenging task due to the substantial gap between the pretraining task (static image/region-level prediction) and the RVOS task (dynamic pixel-level prediction). To address this transfer challenge, we introduce a framework named VLP-RVOS which harnesses VLP models for RVOS through temporal-aware adaptation. We first propose a temporal-aware prompt-tuning method, which not only adapts pretrained representations for pixel-level prediction but also empowers the vision encoder to model temporal contexts. We further customize a cube-frame attention mechanism for robust spatial-temporal reasoning. Besides, we propose to perform multi-stage VL relation modeling during and after feature extraction for comprehensive VL understanding. Extensive experiments demonstrate that our method performs favorably against state-of-the-art algorithms and exhibits strong generalization abilities.

翻訳日:2024-11-09 02:30:11 公開日:2024-09-22

# データアロケーションによる選択的アノテーション:これらのデータはモデルではなくアノテーションのために専門家にトリアージされるべきである

Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model ( http://arxiv.org/abs/2405.12081v2 )

ライセンス: Link先を確認

Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Ido Dagan,

(参考訳) 限られた予算下で高品質なアノテーションを得るために、半自動アノテーション法が一般的に用いられ、データの一部を専門家によって注釈付けされ、残りのデータに対するアノテーションを完成させるためにモデルが訓練される。しかしながら、これらの手法は主に、モデル予測能力(トリアージ・トゥ・ヒューマン・データ)を改善するために専門家アノテーションのための情報的データを選択することに焦点を当て、残りのデータはモデルアノテーション(トリアージ・トゥ・モデル・データ)に無差別に割り当てられている。これはアノテーションの予算配分の非効率につながる可能性がある。モデルが正確にアノテートできる簡単なデータは専門家に不要に割り当てられる可能性があるし、ハードデータはモデルによって誤って分類される可能性があるからだ。その結果、全体的なアノテーションの品質が損なわれる可能性がある。この問題に対処するため、我々はSANTと呼ばれる選択的なアノテーションフレームワークを提案する。提案した誤り認識トリアージと二重み付け機構により、トリアージ・ツー・ヒューマンデータとトリアージ・ツー・モデルデータの両方を効果的に活用する。そのため、情報的あるいはハードなデータは専門家にアノテーションとして割り当てられ、簡単なデータはモデルによって処理される。実験の結果、SANTは他のベースラインを一貫して上回り、専門家とモデルワーカーの両方にデータの適切な割り当てを通じて高品質なアノテーションをもたらすことが示された。我々は、予算制約の中でデータアノテーションに関する先駆的な研究を行い、将来のトリアージベースのアノテーション研究のランドマークを確立します。

To obtain high-quality annotations under a limited budget, semi-automatic annotation methods are commonly used, where a portion of the data is annotated by experts and a model is then trained to complete the annotations for the remaining data. However, these methods mainly focus on selecting informative data for expert annotations to improve the model's predictive ability (i.e., triage-to-human data), while the rest of the data is indiscriminately assigned to model annotation (i.e., triage-to-model data). This may lead to inefficiencies in budget allocation for annotations, as easy data that the model could accurately annotate may be unnecessarily assigned to the expert, and hard data may be misclassified by the model. As a result, the overall annotation quality may be compromised. To address this issue, we propose a selective annotation framework called SANT. It effectively takes advantage of both the triage-to-human and triage-to-model data through the proposed error-aware triage and bi-weighting mechanisms. As such, informative or hard data is assigned to the expert for annotation, while easy data is handled by the model. Experimental results show that SANT consistently outperforms other baselines, leading to higher-quality annotation through its proper allocation of data to both expert and model workers. We provide pioneering work on data annotation within budget constraints, establishing a landmark for future triage-based annotation studies.
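The allocation idea, hard items to experts and easy items to the model under a fixed budget, can be sketched as below. The abstract does not describe how SANT's error-aware triage or bi-weighting actually work, so this is a deliberately simple stand-in: it assumes each item already has an estimated probability that the model would mislabel it, and sends the highest-scoring items to experts. The function name and scores are hypothetical.

```python
def triage(error_scores, budget):
    """Split item indices into (to_expert, to_model).

    error_scores[i] is an assumed estimate of how likely the model is to
    mislabel item i; the `budget` items with the highest scores go to experts.
    """
    ranked = sorted(range(len(error_scores)),
                    key=lambda i: error_scores[i], reverse=True)
    to_expert = sorted(ranked[:budget])
    to_model = sorted(ranked[budget:])
    return to_expert, to_model

scores = [0.05, 0.80, 0.30, 0.65, 0.10]
to_expert, to_model = triage(scores, budget=2)
print(to_expert, to_model)
```

The quality of such a split depends entirely on how well the error estimates reflect the model's real mistakes, which is the part a learned triage mechanism is meant to provide.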

翻訳日:2024-11-09 02:30:11 公開日:2024-09-22

# テキストフリーマルチドメイングラフ事前学習:グラフ基礎モデルに向けて

Text-Free Multi-domain Graph Pre-training: Toward Graph Foundation Models ( http://arxiv.org/abs/2405.13934v4 )

ライセンス: Link先を確認

Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhang,

(参考訳) さまざまな領域にまたがる幅広いグラフデータに基づいてグラフ基盤モデルをトレーニングすることは可能ですか? この目標への大きなハードルは、異なる領域のグラフがしばしば非常に異なる特性を示すという事実にある。事前トレーニングのためのマルチドメイングラフの統合には、最初はいくつかの取り組みがあったが、主にグラフを整列させるためにテキスト記述に依存しており、そのアプリケーションはテキスト対応グラフに制限されている。さらに、異なるソースドメインが互いに衝突したり干渉したりし、ターゲットドメインとの関係は著しく変化する。これらの問題に対処するため,MDGPTというテキストフリーなマルチドメイングラフ事前学習・適応フレームワークを提案する。まず、シナジスティックな事前学習のために、ソースドメインにまたがる機能を調整するために、一連のドメイントークンを提案する。第2に、統一的なプロンプトと混合プロンプトからなる二重プロンプトを提案し、統合されたマルチドメイン知識とドメイン固有の知識の調整された混合により、ターゲットドメインをさらに適応させる。最後に、6つの公開データセットによる広範な実験を行い、MDGPTを評価し分析する。

Given the ubiquity of graph data, it is intriguing to ask: Is it possible to train a graph foundation model on a broad range of graph data across diverse domains? A major hurdle toward this goal lies in the fact that graphs from different domains often exhibit profoundly divergent characteristics. Although there have been some initial efforts in integrating multi-domain graphs for pre-training, they primarily rely on textual descriptions to align the graphs, limiting their application to text-attributed graphs. Moreover, different source domains may conflict or interfere with each other, and their relevance to the target domain can vary significantly. To address these issues, we propose MDGPT, a text-free Multi-Domain Graph Pre-Training and adaptation framework designed to exploit multi-domain knowledge for graph learning. First, we propose a set of domain tokens to align features across source domains for synergistic pre-training. Second, we propose dual prompts, consisting of a unifying prompt and a mixing prompt, to further adapt the target domain with unified multi-domain knowledge and a tailored mixture of domain-specific knowledge. Finally, we conduct extensive experiments involving six public datasets to evaluate and analyze MDGPT, which outperforms prior art by up to 37.9%.

翻訳日:2024-11-09 02:18:45 公開日:2024-09-22

# SF-DQN: 深層強化学習のための後継特徴を用いた証明可能な知識転移

SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning ( http://arxiv.org/abs/2405.15920v2 )

ライセンス: Link先を確認

Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang,

(参考訳) 本稿では、複数のRL問題が異なる報酬関数を持つが、基礎となる遷移ダイナミクスを共有する転移強化学習(RL)問題を考察する。この設定では、各RL問題(タスク)のQ関数を後継特徴(SF)と報酬マッピングに分解することができる(前者は遷移ダイナミクスを、後者はタスク固有の報酬関数を特徴付ける)。このQ関数分解は、一般化方策改善(GPI)と呼ばれる方策改善演算子と組み合わせることで、最適なQ関数を見つける際のサンプル複雑性を低減し、SF & GPIフレームワークは、Q学習のような従来のRL手法と比較して有望な経験的性能を示す。しかし、その理論的基盤は、特に深層ニューラルネットワークを用いて後継特徴を学習する場合(SF-DQN)には、ほとんど確立されていない。本稿では,転移RL問題におけるSF-DQNを用いた証明可能な知識転移について検討する。 GPIを用いたSF-DQNについて、証明可能な一般化保証を伴う最初の収束解析を確立する。この理論は、GPI を用いた SF-DQN が、より高速な収束率とより優れた一般化の両面で、ディープQネットワークのような従来の RL アプローチより優れていることを明らかにしている。実および合成RLタスクの数値実験は, SF-DQN & GPIの優れた性能を裏付け、理論的知見と一致している。

This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when learning the successor features using deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.
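The decomposition described above can be written as Q(s, a) = ψ(s, a)ᵀ w, where ψ is the successor feature and w is the reward mapping, and GPI then acts greedily over the best value offered by any previously learned policy. The sketch below illustrates this with toy numbers for a single state; the feature vectors and reward weights are made up, and no neural network or training is involved.

```python
def q_value(psi, w):
    """Q(s, a) = psi(s, a) . w for one state-action pair."""
    return sum(p * wi for p, wi in zip(psi, w))

def gpi_action(psi_per_policy, w):
    """Generalized policy improvement for one state.

    psi_per_policy[i][a] is the successor feature vector of previously
    learned policy i for action a. The action maximizing
    max_i psi_i(s, a) . w is returned.
    """
    n_actions = len(psi_per_policy[0])
    best_q = [max(q_value(psi[a], w) for psi in psi_per_policy)
              for a in range(n_actions)]
    return max(range(n_actions), key=best_q.__getitem__)

# Toy successor features for two source policies, two actions, 2-d features.
psi_1 = [[1.0, 0.0], [0.0, 1.0]]
psi_2 = [[0.5, 0.5], [0.0, 2.0]]

# Two new tasks that differ only in their reward mapping w.
print(gpi_action([psi_1, psi_2], w=[1.0, 0.25]))
print(gpi_action([psi_1, psi_2], w=[0.0, 1.0]))
```

Because the transition dynamics are shared, only w changes between tasks, so the stored successor features can be reused to pick a reasonable action on a new task before any further learning.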

翻訳日:2024-11-09 02:07:29 公開日:2024-09-22

# XRec: 説明可能な推奨のための大規模言語モデル

XRec: Large Language Models for Explainable Recommendation ( http://arxiv.org/abs/2406.02377v2 )

ライセンス: Link先を確認

Qiyao Ma, Xubin Ren, Chao Huang,

(参考訳) レコメンダシステムは、ユーザの好みに合わせてパーソナライズされたレコメンデーションを提供することで、ユーザが情報の過負荷に対処するのを助ける。協調フィルタリング(CF)は広く採用されているアプローチであり、グラフニューラルネットワーク(GNN)や自己教師付き学習(SSL)といった高度な技術によって、より良いユーザ表現を得るようCFモデルが強化されてきたが、これらは推奨項目の説明を提供する能力に欠けることが多い。説明可能なレコメンデーションは、レコメンデーションの意思決定プロセスに対する透明性と洞察を提供し、ユーザの理解を深めることによって、このギャップに対処することを目的としている。本研究は、大規模言語モデル(LLM)の言語能力を活用して、説明可能なレコメンダシステムの限界を押し広げる。我々は、LLMがレコメンダシステムにおけるユーザの振る舞いを包括的に説明できるようにする、XRecというモデルに依存しないフレームワークを紹介する。協調的な信号の統合と軽量な協調的アダプタの設計により、このフレームワークはLLMがユーザとアイテムのインタラクションにおける複雑なパターンを理解し、ユーザの好みをより深く理解できるようにする。広範な実験によりXRecの有効性を実証し、説明可能なレコメンダシステムにおけるベースラインアプローチよりも優れた、包括的で意味のある説明を生成する能力を示した。モデル実装は https://github.com/HKUDS/XRec でオープンソース化している。

Recommender systems help users navigate information overload by providing personalized recommendations aligned with their preferences. Collaborative Filtering (CF) is a widely adopted approach, but while advanced techniques like graph neural networks (GNNs) and self-supervised learning (SSL) have enhanced CF models for better user representations, they often lack the ability to provide explanations for the recommended items. Explainable recommendations aim to address this gap by offering transparency and insights into the recommendation decision-making process, enhancing users' understanding. This work leverages the language capabilities of Large Language Models (LLMs) to push the boundaries of explainable recommender systems. We introduce a model-agnostic framework called XRec, which enables LLMs to provide comprehensive explanations for user behaviors in recommender systems. By integrating collaborative signals and designing a lightweight collaborative adaptor, the framework empowers LLMs to understand complex patterns in user-item interactions and gain a deeper understanding of user preferences. Our extensive experiments demonstrate the effectiveness of XRec, showcasing its ability to generate comprehensive and meaningful explanations that outperform baseline approaches in explainable recommender systems. We open-source our model implementation at https://github.com/HKUDS/XRec.

翻訳日:2024-11-09 01:56:09 公開日:2024-09-22

# 任意下流予測タスクのためのフェアネス最適化合成EHR生成

Fairness-Optimized Synthetic EHR Generation for Arbitrary Downstream Predictive Tasks ( http://arxiv.org/abs/2406.02510v2 )

ライセンス: Link先を確認

Mirza Farhan Bin Tarek, Raphael Poulain, Rahmatollah Beheshti,

(参考訳) 医療アプリケーションのためのAIツールの責任ある設計を保証するためのさまざまな側面の中で、公平性に関する懸念に対処することは、重要な焦点となっている。具体的には、電子健康記録(EHR)データの普及と、幅広い臨床的意思決定支援タスクに情報を提供する大きな可能性を考慮すると、このカテゴリの医療AIツールの公平性を向上させることが重要である。このような広範な問題(EHRベースのAIモデルにおける不公平性の緩和)は、様々な手法を用いて取り組まれてきたが、タスクやモデルに依存しない手法は顕著に稀である。本研究では、このギャップに対処するため、実際のEHRデータと整合する(忠実な)だけでなく、実データと組み合わせることで下流タスクにおける公平性(エンドユーザが定義する)の懸念を軽減できる合成EHRデータを生成する新しいパイプラインを提示する。様々な下流タスクと2つの異なるEHRデータセットにわたって、提案するパイプラインの有効性を実証する。提案するパイプラインは、下流モデルの設計を変更する手法など、医療AIアプリケーションにおける公平性に対処する既存の手法のツールボックスに、広く適用可能な補完的ツールを加えることができる。プロジェクトのコードベースは https://github.com/healthylaife/FairSynth で公開されている。

Among various aspects of ensuring the responsible design of AI tools for healthcare applications, addressing fairness concerns has been a key focus area. Specifically, given the wide spread of electronic health record (EHR) data and their huge potential to inform a wide range of clinical decision support tasks, improving fairness in this category of health AI tools is of key importance. While such a broad problem (mitigating unfairness in EHR-based AI models) has been tackled using various methods, task- and model-agnostic methods are noticeably rare. In this study, we aimed to target this gap by presenting a new pipeline that generates synthetic EHR data, which is not only consistent with (faithful to) the real EHR data but also can reduce the fairness concerns (defined by the end-user) in the downstream tasks, when combined with the real data. We demonstrate the effectiveness of our proposed pipeline across various downstream tasks and two different EHR datasets. Our proposed pipeline can add a widely applicable and complementary tool to the existing toolbox of methods to address fairness in health AI applications, such as those modifying the design of a downstream model. The codebase for our project is available at https://github.com/healthylaife/FairSynth.

翻訳日:2024-11-09 01:56:09 公開日:2024-09-22

# DICE:数学推論のためのLLMの微調整段階における分布内汚染の検出

DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning ( http://arxiv.org/abs/2406.04197v2 )

ライセンス: Link先を確認

Shangqing Tu, Kejian Zhu, Yushi Bai, Zijun Yao, Lei Hou, Juanzi Li,

(参考訳) 大規模言語モデル(LLM)の進歩は、公開ベンチマークによる評価に依存しているが、データ汚染は性能の過大評価をもたらす可能性がある。従来の研究は、トレーニング中にモデルが全く同じデータを見たかどうかを判定することで汚染を検出することに重点を置いていた。加えて、先行研究では、ベンチマークデータに類似したデータに対するトレーニングでさえ性能を膨らませること、すなわち分布内汚染(in-distribution contamination)がすでに示されている。本研究では、分布内汚染がOODベンチマークにおける性能低下につながりうることを論じる。分布内汚染を効果的に検出するために、LLMの内部状態を利用して汚染を特定してから検出する(locate-then-detect)新しい手法であるDICEを提案する。DICEはまず汚染に対して最も敏感な層を特定し、その層の内部状態に基づいて分類器を訓練する。実験により、DICEは様々なLLMと数学推論データセットにわたって、分布内汚染を高い精度で検出することが示された。また、訓練されたDICE検出器の一般化能力を示し、類似した分布を持つ複数のベンチマークにわたって汚染を検出できることを示す。さらに、DICEの予測は、我々または他の組織によって微調整されたLLMの性能と相関しており、決定係数($R^2$)は0.61から0.75である。コードとデータは https://github.com/THU-KEG/DICE で公開されている。

The advancement of large language models (LLMs) relies on evaluation using public benchmarks, but data contamination can lead to overestimated performance. Previous research focuses on detecting contamination by determining whether the model has seen the exact same data during training. In addition, prior work has already shown that even training on data similar to benchmark data inflates performance, a phenomenon known as in-distribution contamination. In this work, we argue that in-distribution contamination can lead to a performance drop on OOD benchmarks. To effectively detect in-distribution contamination, we propose DICE, a novel method that leverages the internal states of LLMs to locate-then-detect the contamination. DICE first identifies the most sensitive layer to contamination, then trains a classifier based on the internal states of that layer. Experiments reveal DICE's high accuracy in detecting in-distribution contamination across various LLMs and math reasoning datasets. We also show the generalization capability of the trained DICE detector, which is able to detect contamination across multiple benchmarks with similar distributions. Additionally, we find that DICE's predictions correlate with the performance of LLMs fine-tuned by either us or other organizations, achieving a coefficient of determination ($R^2$) between 0.61 and 0.75. The code and data are available at https://github.com/THU-KEG/DICE.

翻訳日:2024-11-09 01:44:51 公開日:2024-09-22

# 時系列モデルに対する会員推測攻撃

Membership Inference Attacks Against Time-Series Models ( http://arxiv.org/abs/2407.02870v2 )

ライセンス: Link先を確認

Noam Koren, Abigail Goldsteen, Guy Amit, Ariel Farkash,

(参考訳) 個人情報、特に医療分野での時系列データを分析すると、深刻なプライバシー上の懸念が浮かび上がっている。患者からの敏感な健康データは、診断と継続的なケアのための機械学習モデルのトレーニングにしばしば使用される。このようなモデルのプライバシリスクを評価することは、プロダクションでモデルを使用するか、サードパーティと共有するかに関して、知識に富んだ決定を行う上で極めて重要です。メンバーシップ推論攻撃(MIA)はこの種の評価の鍵となる手法であるが、時系列予測モデルは、この文脈では十分に研究されていない。時系列モデルにおける既存のMIA技術について検討し、データの季節性やトレンドに焦点をあてた新機能を紹介する。季節性は多変量フーリエ変換を用いて推定され、低次多項式を用いて傾向を近似する。健康領域のデータセットを用いて,これらの手法を各種時系列モデルに適用した。以上の結果から,これらの新機能はMIAの識別における有効性を高め,医療データアプリケーションにおけるプライバシリスクの理解を向上させることが示唆された。

Analyzing time-series data that contains personal information, particularly in the medical field, presents serious privacy concerns. Sensitive health data from patients is often used to train machine learning models for diagnostics and ongoing care. Assessing the privacy risk of such models is crucial to making informed decisions on whether to use a model in production or share it with third parties. Membership Inference Attacks (MIA) are a key method for this kind of evaluation; however, time-series prediction models have not been thoroughly studied in this context. We explore existing MIA techniques on time-series models, and introduce new features, focusing on the seasonality and trend components of the data. Seasonality is estimated using a multivariate Fourier transform, and a low-degree polynomial is used to approximate trends. We applied these techniques to various types of time-series models, using datasets from the health domain. Our results demonstrate that these new features enhance the effectiveness of MIAs in identifying membership, improving the understanding of privacy risks in medical data applications.
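
The trend and seasonality features mentioned in this abstract can be sketched roughly as follows. This is a simplified illustration rather than the authors' feature extractor: it assumes a single univariate series (the abstract refers to a multivariate Fourier transform), fits the trend with `numpy.polyfit`, and reads seasonality from the magnitude spectrum of the detrended series. The function name and its parameters `degree` and `n_freqs` are hypothetical.

```python
import numpy as np


def trend_and_seasonality_features(series, degree=2, n_freqs=3):
    """Return (trend coefficients, dominant frequency bins, their magnitudes).

    series:  1-D sequence of observations at evenly spaced time steps.
    degree:  degree of the polynomial used to approximate the trend.
    n_freqs: number of strongest frequency bins to report.
    """
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y))

    # Low-degree polynomial trend (coefficients ordered highest power first).
    coeffs = np.polyfit(t, y, degree)
    detrended = y - np.polyval(coeffs, t)

    # Magnitude spectrum of the detrended series; bin 0 (the mean) is ignored.
    spectrum = np.abs(np.fft.rfft(detrended))
    spectrum[0] = 0.0
    top = np.argsort(spectrum)[::-1][:n_freqs]
    return coeffs, top, spectrum[top]


if __name__ == "__main__":
    t = np.arange(120)
    y = 0.05 * t + np.sin(2 * np.pi * t / 12)   # linear trend plus a period-12 cycle
    coeffs, bins, mags = trend_and_seasonality_features(y, degree=1)
    print(coeffs, bins, mags)
```

In a membership inference setting, features like these would be computed on both the true series and the model's predictions and then passed to an attack classifier; that step is omitted here.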

翻訳日:2024-11-09 00:59:29 公開日:2024-09-22

# ゲノム言語モデル:機会と課題

Genomic Language Models: Opportunities and Challenges ( http://arxiv.org/abs/2407.11435v2 )

ライセンス: Link先を確認

Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S. Song,

(参考訳) 大規模言語モデル(LLM)は、幅広い科学分野、特に生物医学分野において、変革的な影響を及ぼしている。自然言語処理の目的が単語の列を理解することにあるように、生物学の主要な目的は生物学的列を理解することである。ゲノム言語モデル(gLM)は、DNA配列に基づいて訓練されたLLMであり、ゲノムの理解を深め、様々なスケールのDNA要素がどのように相互作用して複雑な機能を引き起こすかを示す可能性がある。この可能性を示すために,機能制約予測,シーケンス設計,伝達学習など,gLMの重要応用を強調した。しかし、最近の顕著な進歩にもかかわらず、効率的かつ効率的なgLMの開発は、特に大型で複雑なゲノムを持つ種に対して多くの課題を呈している。本稿では,gLMの開発と評価について論じる。

Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to significantly advance our understanding of genomes and how DNA elements at various scales interact to give rise to complex functions. To showcase this potential, we highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning. Despite notable recent progress, however, developing effective and efficient gLMs presents numerous challenges, especially for species with large, complex genomes. Here, we discuss major considerations for developing and evaluating gLMs.

翻訳日:2024-11-08 21:10:26 公開日:2024-09-22

# 拡散モデルで概念をブレンドする方法

How to Blend Concepts in Diffusion Models ( http://arxiv.org/abs/2407.14280v2 )

ライセンス: Link先を確認

Lorenzo Olearo, Giorgio Longari, Simone Melzi, Alessandro Raganato, Rafael Peñaloza,

(参考訳) 過去10年間、多次元の(潜在)空間を使って概念を表現しようとする動きがあったが、これらの概念をどう操作し、それらを用いてどう推論するかは、依然としてほとんど明らかになっていない。最近の手法には複数の潜在表現とその関連性を利用するものがあり、この研究課題をさらに複雑にしている。我々の目標は、潜在空間における操作が根底にある概念にどのように影響するかを理解することである。そこで本研究では、拡散モデルを用いた概念ブレンディングの課題について検討する。拡散モデルは、テキストプロンプトの潜在表現と、画像の再構成と生成を可能にする潜在空間との間の接続に基づいている。このタスクにより、異なるテキストベースの組み合わせ戦略を試すことができ、視覚的な分析によって容易に評価できる。我々の結論は、空間操作による概念のブレンドは可能であるが、最良の戦略はブレンドの文脈に依存するというものである。

For the last decade, there has been a push to use multi-dimensional (latent) spaces to represent concepts; and yet how to manipulate these concepts or reason with them remains largely unclear. Some recent methods exploit multiple latent representations and their connection, making this research question even more entangled. Our goal is to understand how operations in the latent space affect the underlying concepts. To that end, we explore the task of concept blending through diffusion models. Diffusion models are based on a connection between a latent representation of textual prompts and a latent space that enables image reconstruction and generation. This task allows us to try different text-based combination strategies, and evaluate easily through a visual analysis. Our conclusion is that concept blending through space manipulation is possible, although the best strategy depends on the context of the blend.

翻訳日:2024-11-08 19:38:31 公開日:2024-09-22

# グローバルStructure-from-Motionの再考

Global Structure-from-Motion Revisited ( http://arxiv.org/abs/2407.20219v2 )

ライセンス: Link先を確認

Linfei Pan, Dániel Baráth, Marc Pollefeys, Johannes L. Schönberger,

(参考訳) 画像から3D構造とカメラの動きを復元することは、コンピュータビジョン研究の長年の焦点であり、Structure-from-Motion (SfM)として知られている。この問題に対する解決策は、インクリメンタルなアプローチとグローバルなアプローチに分類される。これまでのところ、最もポピュラーなシステムは精度と堅牢性に優れるインクリメンタルなパラダイムを踏襲しているが、グローバルなアプローチははるかにスケーラブルで効率的である。本研究では、グローバルSfMの問題を再考し、グローバルSfMにおける最先端技術を上回る新しい汎用システムとしてGLOMAPを提案する。精度とロバスト性の観点では、最も広く使われているインクリメンタルSfMであるCOLMAPと同等以上の結果を達成しつつ、桁違いに高速である。本システムは https://github.com/colmap/glomap でオープンソース実装として公開している。

Recovering 3D structure and camera motion from images has been a long-standing focus of computer vision research and is known as Structure-from-Motion (SfM). Solutions to this problem are categorized into incremental and global approaches. Until now, the most popular systems follow the incremental paradigm due to its superior accuracy and robustness, while global approaches are drastically more scalable and efficient. With this work, we revisit the problem of global SfM and propose GLOMAP as a new general-purpose system that outperforms the state of the art in global SfM. In terms of accuracy and robustness, we achieve results on par with or superior to COLMAP, the most widely used incremental SfM, while being orders of magnitude faster. We share our system as an open-source implementation at https://github.com/colmap/glomap.

翻訳日:2024-11-08 14:05:01 公開日:2024-09-22

# 機能的MRI理解のための階層型量子制御ゲート

Hierarchical Quantum Control Gates for Functional MRI Understanding ( http://arxiv.org/abs/2408.03596v3 )

ライセンス: Link先を確認

Xuan-Bac Nguyen, Hoang-Quan Nguyen, Hugh Churchill, Samee U. Khan, Khoa Luu,

(参考訳) 量子コンピューティングは、古典的コンピュータ、特に暗号、最適化、ニューロコンピューティングといった一般的な分野において、難解な複雑な問題を解決する強力なツールとして登場した。本稿では,fMRI(Functional Magnetic Resonance Imaging)データを効率的に理解するために,HQCG(Hierarchical Quantum Control Gates)法という新しい量子ベース手法を提案する。このアプローチには、それぞれfMRI信号の局所的特徴とグローバルな特徴を抽出するために設計されたローカル量子制御ゲート(LQCG)とグローバル量子制御ゲート(GQCG)の2つの新しいモジュールが含まれている。提案手法は,量子マシン上でエンドツーエンドで動作し,量子力学を利用して,古典コンピュータの課題である30,000サンプルなどの超高次元fMRI信号のパターンを学習する。実験結果から,本手法は古典的手法よりも有意に優れていることが示された。さらに、提案した量子モデルは古典的手法よりも安定性が高く、過度に適合する傾向が低いことが判明した。

Quantum computing has emerged as a powerful tool for solving complex problems intractable for classical computers, particularly in popular fields such as cryptography, optimization, and neurocomputing. In this paper, we present a new quantum-based approach named the Hierarchical Quantum Control Gates (HQCG) method for efficient understanding of Functional Magnetic Resonance Imaging (fMRI) data. This approach includes two novel modules: the Local Quantum Control Gate (LQCG) and the Global Quantum Control Gate (GQCG), which are designed to extract local and global features of fMRI signals, respectively. Our method operates end-to-end on a quantum machine, leveraging quantum mechanics to learn patterns within extremely high-dimensional fMRI signals, such as 30,000 samples, which is a challenge for classical computers. Empirical results demonstrate that our approach significantly outperforms classical methods. Additionally, we found that the proposed quantum model is more stable and less prone to overfitting than the classical methods.

翻訳日:2024-11-08 12:33:46 公開日:2024-09-22

# 境界駆動量子系の定常状態:いくつかの正確な結果

Stationary states of boundary driven quantum systems: some exact results ( http://arxiv.org/abs/2408.06887v2 )

ライセンス: Link先を確認

Eric A. Carlen, David A. Huse, Joel L. Lebowitz,

(参考訳) 密度行列がリンドブラディアン $\dot{\rho}=-i[H,\rho]+{\mathcal D}\rho$ によって発展する有限次元開量子系について検討する。ここで、$H$ は孤立系のハミルトニアンであり、${\mathcal D}$ は散逸子である。系が「境界」$A$ と「バルク」$B$ の2つの部分からなり、${\mathcal D}$ が $A$ にのみ作用する場合、すなわち ${\mathcal D}={\mathcal D}_A\otimes{\mathcal I}_B$ の場合を考える。ここで、${\mathcal D}_A$ は部分 $A$ にのみ作用し、${\mathcal I}_B$ は部分 $B$ 上の恒等超演算子である。${\mathcal D}_A$ がエルゴード的である、すなわち ${\mathcal D}_A\hat{\rho}_A=0$ がただ1つの密度行列 $\hat{\rho}_A$ に対してのみ成り立つとする。全系上の定常密度行列 $\bar{\rho}$ が $H$ と可換であるならば、ある $\rho_B$ に対して $\bar{\rho}=\hat{\rho}_A\otimes\rho_B$ の積形式でなければならないことを示す。これにより、部分 $A$ と $B$ の間に相互作用がない場合を除き、$\beta\neq 0$ のギブス測度 $\rho_\beta\sim e^{-\beta H}$ を定常状態として持つ ${\mathcal D}_A$ は存在しないことがわかる。$A$ と $B$ の間に相互作用を持つ系に対して、定常状態 $\bar{\rho}$ の一意性の基準を与える。非エルゴード的な場合の関連する結果についても論じる。

We study finite-dimensional open quantum systems whose density matrix evolves via a Lindbladian, $\dot{\rho}=-i[H,\rho]+{\mathcal D}\rho$. Here $H$ is the Hamiltonian of the isolated system and ${\mathcal D}$ is the dissipator. We consider the case where the system consists of two parts, the ``boundary'' $A$ and the ``bulk'' $B$, and ${\mathcal D}$ acts only on $A$, so ${\mathcal D}={\mathcal D}_A\otimes{\mathcal I}_B$, where ${\mathcal D}_A$ acts only on part $A$, while ${\mathcal I}_B$ is the identity superoperator on part $B$. Let ${\mathcal D}_A$ be ergodic, so ${\mathcal D}_A\hat{\rho}_A=0$ only for one unique density matrix $\hat{\rho}_A$. We show that any stationary density matrix $\bar{\rho}$ on the full system which commutes with $H$ must be of the product form $\bar{\rho}=\hat{\rho}_A\otimes\rho_B$ for some $\rho_B$. This rules out finding any ${\mathcal D}_A$ that has the Gibbs measure $\rho_\beta\sim e^{-\beta H}$ as a stationary state with $\beta\neq 0$, unless there is no interaction between parts $A$ and $B$. We give criteria for the uniqueness of the stationary state $\bar{\rho}$ for systems with interactions between $A$ and $B$. Related results for non-ergodic cases are also discussed.

翻訳日:2024-11-08 07:53:35 公開日:2024-09-22

# 大規模言語モデルによるソースコードの品質評価 : 比較研究

Evaluating Source Code Quality with Large Language Models: a comparative study ( http://arxiv.org/abs/2408.07082v2 )

ライセンス: Link先を確認

Igor Regis da Silva Simões, Elaine Venson,

(参考訳) コード品質は、複雑さ、可読性、テスト容易性、相互運用性、再利用可能性、良いプラクティスや悪いプラクティスの使用など、さまざまなメトリクスで構成される属性である。静的コード解析ツールは、コード品質を評価するために一連の属性を測定することを目的としている。しかしながら、一部の品質属性は、コードレビュー活動において人間によってのみ測定可能であり、可読性はその一例である。自然言語のテキスト処理能力を考えると、大規模言語モデル(LLM)は、現在自動化できない属性を含めてコードの品質を評価できると仮定する。本稿では、LLMを静的解析ツールとして使用し、コード全体の品質を評価して得られた結果を記述し、分析することを目的とする。2つのオープンソースソフトウェア(OSS)Javaプロジェクト(一方は保守性評価A、他方はB)について、LLMの結果をSonarQubeとその保守性メトリクスで得られた結果と比較した。合計1,641のクラスを解析し、GPT 3.5 Turbo と GPT 4o の2つのバージョンのモデルで結果を比較した。GPT 3.5 Turbo LLMにはコード品質を評価する能力があり、Sonarのメトリクスと相関があることを実証した。しかし、LLMが測定する内容にはSonarQubeと異なる具体的な側面がある。GPT 4o版は同じ結果を示さず、低品質と評価されたコードに高い分類を割り当てることで、以前のモデルおよびSonarから乖離した。本研究は、LLMによるコード品質評価の可能性を示している。しかし、LLMのコスト、出力のばらつきといった制約を調査し、従来の静的解析ツールでは測定されない品質特性を探索するために、さらなる研究が必要である。

Code quality is an attribute composed of various metrics, such as complexity, readability, testability, interoperability, reusability, and the use of good or bad practices, among others. Static code analysis tools aim to measure a set of attributes to assess code quality. However, some quality attributes can only be measured by humans in code review activities, readability being an example. Given their natural language text processing capability, we hypothesize that a Large Language Model (LLM) could evaluate the quality of code, including attributes currently not automatable. This paper aims to describe and analyze the results obtained using LLMs as a static analysis tool, evaluating the overall quality of code. We compared the LLM with the results obtained with the SonarQube software and its Maintainability metric for two Open Source Software (OSS) Java projects, one with Maintainability Rating A and the other B. A total of 1,641 classes were analyzed, comparing the results in two versions of models: GPT 3.5 Turbo and GPT 4o. We demonstrated that the GPT 3.5 Turbo LLM has the ability to evaluate code quality, showing a correlation with Sonar's metrics. However, there are specific aspects that differ in what the LLM measures compared to SonarQube. The GPT 4o version did not present the same results, diverging from the previous model and Sonar by assigning a high classification to codes that were assessed as lower quality. This study demonstrates the potential of LLMs in evaluating code quality. However, further research is necessary to investigate limitations such as LLM's cost, variability of outputs and explore quality characteristics not measured by traditional static analysis tools.

翻訳日:2024-11-08 07:53:35 公開日:2024-09-22

# ソリューション設計による自動プログラム修復の強化

Enhancing Automated Program Repair with Solution Design ( http://arxiv.org/abs/2408.12056v2 )

ライセンス: Link先を確認

Jiuang Zhao, Donghao Yang, Li Zhang, Xiaoli Lian, Zitian Yang, Fang Liu,

(参考訳) 自動プログラム修復(APR)は、特定のプロジェクト内の問題を自律的に修正する試みであり、一般にバグ解決、新機能開発、機能強化の3つのカテゴリのタスクを含む。様々な方法論を提案する広範な研究にもかかわらず、実際の問題に対処する上でのそれらの有効性は依然として不十分である。注目すべきは、エンジニアは通常、コードのパッチ作成を始める前に、設計根拠(DR)、すなわち計画した解決策とその基礎となる一連の理由を持っていることである。オープンソースプロジェクトでは、これらのDRはJiraのようなプロジェクト管理ツールを通じて、イシューログに記録されることが多い。ここから、イシューログに散在するDRを活用して、APRを効率的に強化するにはどうすればよいか、という興味深い問いが生じる。この前提を検証するため、DRをプロンプト命令に組み込むことでGPT-4-TurboのAPR能力を強化する手法であるDRCodePilotを紹介する。さらに、GPT-4がより広いプロジェクトコンテキストを十分に把握する上での制約や、正確な識別子を生成する上での時折の欠点を考慮し、フィードバックに基づく自己反省フレームワークを考案し、提供されたパッチと提案された識別子を参照して、GPT-4に出力を再検討し改善するよう促す。GitHubとJiraにホストされている2つのオープンソースリポジトリから得た938のイシューとパッチのペアからなるベンチマークを構築した。実験の結果、DRCodePilotはGPT-4を直接利用する場合よりも4.7倍高いフルマッチ比を達成した。さらに、CodeBLEUスコアも有望な向上を示している。加えて、DRを単独で適用するだけでも、本ベンチマークにおいてCodeLlama、GPT-3.5、GPT-4のフルマッチ比が向上しうることが明らかになった。我々は、DRCodePilotの取り組みが、APRの分野を前進させる新たなhuman-in-the-loopの道を切り開くものと考えている。

Automatic Program Repair (APR) endeavors to autonomously rectify issues within specific projects, which generally encompasses three categories of tasks: bug resolution, new feature development, and feature enhancement. Despite extensive research proposing various methodologies, their efficacy in addressing real issues remains unsatisfactory. It's worth noting that, typically, engineers have design rationales (DR), that is, planned solutions and a set of underlying reasons, before they start patching code. In open-source projects, these DRs are frequently captured in issue logs through project management tools like Jira. This raises a compelling question: How can we leverage DR scattered across the issue logs to efficiently enhance APR? To investigate this premise, we introduce DRCodePilot, an approach designed to augment GPT-4-Turbo's APR capabilities by incorporating DR into the prompt instruction. Furthermore, given GPT-4's constraints in fully grasping the broader project context and occasional shortcomings in generating precise identifiers, we have devised a feedback-based self-reflective framework, in which we prompt GPT-4 to reconsider and refine its outputs by referencing a provided patch and suggested identifiers. We have established a benchmark comprising 938 issue-patch pairs sourced from two open-source repositories hosted on GitHub and Jira. Our experimental results are impressive: DRCodePilot achieves a full-match ratio that is a remarkable 4.7x higher than when GPT-4 is utilized directly. Additionally, the CodeBLEU scores also exhibit promising enhancements. Moreover, our findings reveal that the standalone application of DR can yield a promising increase in the full-match ratio across CodeLlama, GPT-3.5, and GPT-4 within our benchmark suite. We believe that our DRCodePilot initiative heralds a novel human-in-the-loop avenue for advancing the field of APR.

翻訳日:2024-11-08 05:49:00 公開日:2024-09-22

# パラメタライズドシンボリック抽象グラフによるワンショット映像の模倣

One-shot Video Imitation via Parameterized Symbolic Abstraction Graphs ( http://arxiv.org/abs/2408.12674v2 )

ライセンス: Link先を確認

Jianren Wang, Kangni Liu, Dingkun Guo, Xian Zhou, Christopher G Atkeson,

(参考訳) 動的で変形可能な物体の操作を単一のデモビデオから学習することは、スケーラビリティの面で大きな可能性を秘めている。これまでのアプローチは、主に物体間の関係またはアクターの軌跡のいずれかを再生することに焦点を当ててきた。前者は多様なタスクへの一般化に苦労することが多く、後者はデータ効率の悪さに悩まされる。さらに、どちらの手法も、力などの目に見えない物理的属性を捉える上で課題に直面している。本稿では、ノードが物体を表し、エッジが物体間の関係を表すパラメータ化シンボリック抽象グラフ(PSAG)を用いて、ビデオデモを解釈することを提案する。さらに、シミュレーションを通じて幾何学的制約をグラウンディングし、非幾何学的で視覚的に知覚できない属性を推定する。拡張されたPSAGは、実際のロボット実験に適用される。我々のアプローチは、Cutting Avocado、Cutting Vegetable、Pouring Liquid、Rolling Dough、Slicing Pizzaといった様々なタスクで検証されている。視覚的・物理的特性の異なる新しい物体への一般化に成功することを示す。

Learning to manipulate dynamic and deformable objects from a single demonstration video holds great promise in terms of scalability. Previous approaches have predominantly focused on either replaying object relationships or actor trajectories. The former often struggles to generalize across diverse tasks, while the latter suffers from data inefficiency. Moreover, both methodologies encounter challenges in capturing invisible physical attributes, such as forces. In this paper, we propose to interpret video demonstrations through Parameterized Symbolic Abstraction Graphs (PSAG), where nodes represent objects and edges denote relationships between objects. We further ground geometric constraints through simulation to estimate non-geometric, visually imperceptible attributes. The augmented PSAG is then applied in real robot experiments. Our approach has been validated across a range of tasks, such as Cutting Avocado, Cutting Vegetable, Pouring Liquid, Rolling Dough, and Slicing Pizza. We demonstrate successful generalization to novel objects with distinct visual and physical properties.

翻訳日:2024-11-08 05:37:29 公開日:2024-09-22

# ADRS-CNet:DNAストレージクラスタリングアルゴリズムのための適応次元削減選択と分類ネットワーク

ADRS-CNet: An adaptive dimensionality reduction selection and classification network for DNA storage clustering algorithms ( http://arxiv.org/abs/2408.12751v2 )

ライセンス: Link先を確認

Bowen Liu, Jiankun Li,

(参考訳) DNAストレージ技術は、高いストレージ密度、長期保存、低いメンテナンスコスト、コンパクトサイズのために、大量のデータストレージに対処する新たな可能性を提供します。記憶されている情報の信頼性を向上させるために、ベースエラーと不足するストレージシーケンスは、直面するべき課題である。現在、元のシーケンス情報を可能な限り回復するために、シーケンスシーケンスのクラスタリングと比較が採用されている。それでも、異なる長さのDNA配列を特徴として抽出すると、寸法の呪いが生じる。これを解決するために、PCA、UMAP、t-SNEといった技術は、高次元の特徴を低次元空間に投影するために一般的に用いられる。そこで本研究では,これらの手法が,異なるデータセットを扱う場合の次元削減に様々な効果を示すことを考慮し,入力DNA配列の特徴を分類するための多層パーセプトロンモデルを訓練し,その後のクラスタリング結果を高めるために最適な次元縮小法を適応的に選択することを提案する。オープンソースデータセットのテストや,さまざまなベースライン手法との比較を通じて,本モデルが優れた分類性能を示し,クラスタリング結果を大幅に改善することを示す実験結果が得られた。これにより,クラスタリングモデルに対する次元の呪いの影響を効果的に軽減できることを示す。

DNA storage technology offers new possibilities for addressing massive data storage due to its high storage density, long-term preservation, low maintenance cost, and compact size. To improve the reliability of stored information, base errors and missing storage sequences are challenges that must be faced. Currently, clustering and comparison of sequenced sequences are employed to recover the original sequence information as much as possible. Nonetheless, extracting DNA sequences of different lengths as features leads to the curse of dimensionality, which needs to be overcome. To address this, techniques like PCA, UMAP, and t-SNE are commonly employed to project high-dimensional features into low-dimensional space. Considering that these methods exhibit varying effectiveness in dimensionality reduction when dealing with different datasets, this paper proposes training a multilayer perceptron model to classify input DNA sequence features and adaptively select the most suitable dimensionality reduction method to enhance subsequent clustering results. Through testing on open-source datasets and comparing our approach with various baseline methods, experimental results demonstrate that our model exhibits superior classification performance and significantly improves clustering outcomes. This displays that our approach effectively mitigates the impact of the curse of dimensionality on clustering models.

翻訳日:2024-11-08 05:37:29 公開日:2024-09-22

# Zeoformer: OSDA-Zeolite親和性予測のための粗粒周期グラフ変換器

Zeoformer: Coarse-Grained Periodic Graph Transformer for OSDA-Zeolite Affinity Prediction ( http://arxiv.org/abs/2408.12984v3 )

ライセンス: Link先を確認

Xiangxiang Shen, Zheng Wan, Lingfeng Wen, Licheng Sun, Ou Yang Ming Jie, Xuan Tang, Xian Zeng, Mingsong Chen, Xiao He, Xian Wei,

(参考訳) 国際ゼオライト協会構造委員会(IZA-SC)はこれまでに255の異なるゼオライト構造をカタログ化しており、数百万もの理論上可能な構造がまだ発見されていない。特定のゼオライトの合成は、主にOSDAとゼオライトの親和性によって決定されるため、有機構造誘導剤(OSDA)の使用を必要とする。したがって、最も親和性が高いOSDA-ゼオライトペアが標的ゼオライトの合成の鍵となる。しかし、OSDA-ゼオライト対はしばしば複雑な幾何学構造、すなわち多数の原子によって形成される複雑な結晶構造を示す。既存の機械学習手法では結晶の周期性を表現できるが、局所的な可変性を持つ結晶構造を正確に表現することはできない。この問題に対処するため,Zeoformerという,粗粒度結晶周期性と粒度局所変動性を効果的に表現する手法を提案する。ゼオフォーマーは各原子を中心に単位細胞を再構成し、この中心原子と他の原子との対距離を再構成された単位細胞内に符号化する。再構成ユニットセル内の対距離の導入は、ユニットセルの全体構造と異なるユニットセルの違いをより効果的に表現し、OSDA-ゼオライト対と一般的な結晶構造の性質をより正確に効率的に予測することができる。総合評価により,OSDA-ゼオライトペアデータセットと2種類の結晶材料データセットで最高の性能を示す。

To date, the International Zeolite Association Structure Commission (IZA-SC) has cataloged merely 255 distinct zeolite structures, with millions of theoretically possible structures yet to be discovered. The synthesis of a specific zeolite typically necessitates the use of an organic structure-directing agent (OSDA), since the selectivity for a particular zeolite is largely determined by the affinity between the OSDA and the zeolite. Therefore, finding the best affinity OSDA-zeolite pair is the key to the synthesis of targeted zeolite. However, OSDA-zeolite pairs frequently exhibit complex geometric structures, i.e., a complex crystal structure formed by a large number of atoms. Although some existing machine learning methods can represent the periodicity of crystals, they cannot accurately represent crystal structures with local variability. To address this issue, we propose a novel approach called Zeoformer, which can effectively represent coarse-grained crystal periodicity and fine-grained local variability. Zeoformer reconstructs the unit cell centered around each atom and encodes the pairwise distances between this central atom and other atoms within the reconstructed unit cell. The introduction of pairwise distances within the reconstructed unit cell more effectively represents the overall structure of the unit cell and the differences between different unit cells, enabling the model to more accurately and efficiently predict the properties of OSDA-zeolite pairs and general crystal structures. Through comprehensive evaluation, our Zeoformer model demonstrates the best performance on OSDA-zeolite pair datasets and two types of crystal material datasets.

翻訳日:2024-11-08 05:26:28 公開日:2024-09-22

# CorefUDにおける多言語共参照解析を改善するための複数戦略の探索

Exploring Multiple Strategies to Improve Multilingual Coreference Resolution in CorefUD ( http://arxiv.org/abs/2408.16893v2 )

ライセンス: Link先を確認

Ondřej Pražák, Miloslav Konopík,

(参考訳) 共参照解析は、テキスト中の同じエンティティを参照する表現を特定するタスクであり、さまざまな自然言語処理(NLP)アプリケーションにおいて重要なコンポーネントである。本稿では、12言語にわたる17のデータセットからなるCorefUD 1.1データセットを利用した、エンドツーエンドのニューラル共参照解析システムを提示する。我々のモデルは、エンドツーエンドのニューラル共参照解析システムに基づいている。まず、単言語および言語横断のバリエーションを含む強力なベースラインモデルを構築し、次に、多様な言語的文脈における性能向上のためのいくつかの拡張を提案する。これらの拡張には、言語横断の学習、構文情報の取り込み、主辞(ヘッドワード)予測を最適化するSpan2Headモデル、高度なシングルトンモデリングが含まれる。また、主辞によるスパン表現や、重なり合うセグメントによる長文書モデリングについても実験を行った。提案した拡張、特にヘッドのみのアプローチ、シングルトンモデリング、長文書予測は、ほとんどのデータセットで性能を大幅に改善した。また、ゼロショットの言語横断実験を行い、共参照解析における言語横断転移の可能性と限界を明らかにした。本研究の知見は、多言語共参照解析のための堅牢でスケーラブルな共参照システムの開発に寄与する。最後に、CorefUD 1.1テストセットでモデルを評価し、CRAC 2023共有タスクにおける同等サイズの最良モデルを大きな差で上回った。我々のモデルはGitHubで利用可能である: https://github.com/ondfa/coref-multiling

Coreference resolution, the task of identifying expressions in text that refer to the same entity, is a critical component in various natural language processing (NLP) applications. This paper presents our end-to-end neural coreference resolution system, utilizing the CorefUD 1.1 dataset, which spans 17 datasets across 12 languages. Our model is based on the end-to-end neural coreference resolution system. We first establish strong baseline models, including monolingual and cross-lingual variations, and then propose several extensions to enhance performance across diverse linguistic contexts. These extensions include cross-lingual training, incorporation of syntactic information, a Span2Head model for optimized headword prediction, and advanced singleton modeling. We also experiment with headword span representation and long-document modeling through overlapping segments. The proposed extensions, particularly the heads-only approach, singleton modeling, and long document prediction, significantly improve performance across most datasets. We also perform zero-shot cross-lingual experiments, highlighting the potential and limitations of cross-lingual transfer in coreference resolution. Our findings contribute to the development of robust and scalable coreference systems for multilingual coreference resolution. Finally, we evaluate our model on the CorefUD 1.1 test set and surpass the best model from the CRAC 2023 shared task of comparable size by a large margin. Our model is available on GitHub: https://github.com/ondfa/coref-multiling

翻訳日:2024-11-08 04:08:49 公開日:2024-09-22

# ICの認証・追跡のための永続的階層型ブルームフィルタベースフレームワーク

A Persistent Hierarchical Bloom Filter-based Framework for Authentication and Tracking of ICs ( http://arxiv.org/abs/2408.16950v2 )

ライセンス: Link先を確認

Fairuz Shadmani Shishir, Md Mashfiq Rizvee, Tanvir Hossain, Tamzidul Hoque, Domenic Forte, Sumaiya Shomaji,

(参考訳) 信頼性の低いサプライチェーンにおいて偽造集積回路(IC)を検出するには、堅牢な追跡と認証が必要である。Physical Unclonable Function (PUF) は固有のIC識別子を提供するが、ノイズがその有用性を損なう。本研究では,PHBF(Persistent Hierarchical Bloom Filter)フレームワークを導入し,PUFが生成する署名にノイズが含まれる場合でも,サプライチェーン全体で100%の精度で高速かつ正確なIC認証を実現する。

Detecting counterfeit integrated circuits (ICs) in unreliable supply chains demands robust tracking and authentication. Physical Unclonable Functions (PUFs) offer unique IC identifiers, but noise undermines their utility. This study introduces the Persistent Hierarchical Bloom Filter (PHBF) framework, ensuring swift and accurate IC authentication with an accuracy rate of 100% across the supply chain even with noisy PUF-generated signatures.

翻訳日:2024-11-08 04:08:49 公開日:2024-09-22

# マルコフ連鎖の分散推定: 確率近似アプローチ

Markov Chain Variance Estimation: A Stochastic Approximation Approach ( http://arxiv.org/abs/2409.05733v2 )

ライセンス: Link先を確認

Shubhada Agrawal, Prashanth L. A., Siva Theja Maguluri,

(参考訳) マルコフ連鎖上で定義される関数の漸近的分散を推定する問題は、定常平均の統計的推測の重要なステップである。我々は各ステップで$O(1)$計算を必要とする新しい再帰的推定器を設計し、履歴サンプルやラン長に関する事前知識を保存する必要がなく、証明可能な有限標本保証付き平均二乗誤差(MSE)に対する最適$O(\frac{1}{n})$収束率を有する。ここで、$n$は生成されたサンプルの総数を指す。我々の推定子は、ポアソン方程式の解の項による漸近分散の等価な定式化の線形確率近似に基づいている。我々は,ベクトル値関数の共分散行列の推定,マルコフ鎖の定常分散の推定,および基礎となるマルコフ鎖の状態空間が大きくなるような条件下での漸近分散の推定など,いくつかの方向の近似器を一般化する。また, 平均報酬強化学習(RL)における推定器の応用について述べる。この文脈でポリシー評価に適した時間差型アルゴリズムを設計する。表型および線形関数近似の設定について検討する。我々の研究は、分散制約付きRLのためのアクター・クリティカルなスタイルのアルゴリズムを開発するための道を開いた。

We consider the problem of estimating the asymptotic variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean. We design a novel recursive estimator that requires $O(1)$ computation at each step, does not require storing any historical samples or any prior knowledge of run-length, and has optimal $O(\frac{1}{n})$ rate of convergence for the mean-squared error (MSE) with provable finite sample guarantees. Here, $n$ refers to the total number of samples generated. Our estimator is based on linear stochastic approximation of an equivalent formulation of the asymptotic variance in terms of the solution of the Poisson equation. We generalize our estimator in several directions, including estimating the covariance matrix for vector-valued functions, estimating the stationary variance of a Markov chain, and approximately estimating the asymptotic variance in settings where the state space of the underlying Markov chain is large. We also show applications of our estimator in average reward reinforcement learning (RL), where we work with asymptotic variance as a risk measure to model safety-critical applications. We design a temporal-difference type algorithm tailored for policy evaluation in this context. We consider both the tabular and linear function approximation settings. Our work paves the way for developing actor-critic style algorithms for variance-constrained RL.
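
The abstract rests on rewriting the asymptotic variance through the solution of the Poisson equation. The sketch below checks that identity numerically on a small finite chain; it is not the paper's recursive estimator. It assumes the standard form $\sigma^2 = \mathbb{E}_\pi[2\tilde f V - \tilde f^2]$, where $\tilde f = f - \pi(f)$ and $V$ solves $V - PV = \tilde f$. The function names and the two-state example are illustrative assumptions.

```python
def stationary_distribution(P, iters=2000):
    """Power iteration for the stationary row vector pi satisfying pi = pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def asymptotic_variance(P, f, iters=2000):
    """Asymptotic variance of f on an ergodic finite chain with transition matrix P."""
    n = len(P)
    pi = stationary_distribution(P, iters)
    mean = sum(pi[i] * f[i] for i in range(n))
    f_tilde = [f[i] - mean for i in range(n)]  # centered function
    # Solve the Poisson equation V - P V = f_tilde by fixed-point iteration,
    # i.e. V = sum_k P^k f_tilde, which converges for an ergodic chain.
    V = [0.0] * n
    for _ in range(iters):
        V = [f_tilde[i] + sum(P[i][j] * V[j] for j in range(n)) for i in range(n)]
    # Assumed identity: sigma^2 = E_pi[2 * f_tilde * V - f_tilde^2].
    return sum(pi[i] * (2.0 * f_tilde[i] * V[i] - f_tilde[i] ** 2) for i in range(n))


# Two-state chain: leave state 0 with probability a, leave state 1 with probability b.
a, b = 0.3, 0.1
P = [[1 - a, a], [b, 1 - b]]
sigma2 = asymptotic_variance(P, [0.0, 1.0])
```

For this two-state chain the closed form is $\pi_0 \pi_1 (2 - a - b)/(a + b) = 0.75$, which the numerical value should match up to floating-point error. A streaming estimator in the spirit of the abstract would replace the exact solve with stochastic-approximation updates along a single trajectory.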

翻訳日:2024-11-07 22:27:40 公開日:2024-09-22

# 離散音声ユニットの完全性の推定

Estimating the Completeness of Discrete Speech Units ( http://arxiv.org/abs/2409.06109v2 )

ライセンス: Link先を確認

Sung-Lin Yeh, Hao Tang,

(参考訳) 離散単位による音声表現は音声コーデックや音声生成に広く用いられている。しかし、k-meansによって音韻情報と話者情報が分離される、あるいはk-means後に情報が失われるといった、自己教師あり離散単位に関する未検証の主張がいくつかある。本研究では,情報理論の観点から,残差ベクトル量子化の前後で,どれだけの情報が存在するか(情報完全性)と,どれだけの情報にアクセスできるか(情報アクセシビリティ)を明らかにする。情報完全性の下界を示し,残差ベクトル量子化後の離散化HuBERT表現の完全性を推定する。HuBERT離散単位には話者情報が十分に存在し,残差には音韻情報が十分に存在することがわかり,ベクトル量子化が分離を達成していないことが示される。この結果は離散単位の選択に関する包括的な評価を提供し,残差に含まれる情報は捨てるのではなく,より積極的に活用すべきであることを示唆している。

Representing speech with discrete units has been widely used in speech codec and speech generation. However, there are several unverified claims about self-supervised discrete units, such as disentangling phonetic and speaker information with k-means, or assuming information loss after k-means. In this work, we take an information-theoretic perspective to answer how much information is present (information completeness) and how much information is accessible (information accessibility), before and after residual vector quantization. We show a lower bound for information completeness and estimate completeness on discretized HuBERT representations after residual vector quantization. We find that speaker information is sufficiently present in HuBERT discrete units, and that phonetic information is sufficiently present in the residual, showing that vector quantization does not achieve disentanglement. Our results offer a comprehensive assessment on the choice of discrete units, and suggest that a lot more information in the residual should be mined rather than discarded.
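
Residual vector quantization is central to the analysis above, so a minimal sketch may help. Each stage quantizes whatever the previous stages left over, and the final residual is the part the discrete codes do not capture. The codebooks and the input vector are made-up toy values, not HuBERT features, and the function names are assumptions.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)))


def rvq_encode(x, codebooks):
    """Quantize x stage by stage; each stage encodes the residual of the previous ones."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        codes.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes, residual


def rvq_decode(codes, codebooks):
    """Sum the selected codewords to rebuild the quantized approximation."""
    out = [0.0] * len(codebooks[0][0])
    for k, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[k])]
    return out


# Toy two-stage codebooks (hypothetical values).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
x = [1.2, 0.8]
codes, residual = rvq_encode(x, codebooks)
approx = rvq_decode(codes, codebooks)
```

By construction, `approx` plus `residual` reproduces `x`. The abstract's point is that this leftover residual can still carry a substantial amount of phonetic information.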

翻訳日:2024-11-07 22:16:23 公開日:2024-09-22

# 変換同変性を用いたチャネル拡張による半教師あり3次元物体検出

Semi-Supervised 3D Object Detection with Channel Augmentation using Transformation Equivariance ( http://arxiv.org/abs/2409.06583v2 )

ライセンス: Link先を確認

Minju Kang, Taehun Kong, Tae-Kyun Kim,

(参考訳) 正確な3Dオブジェクト検出は、自動運転車やロボットにとって、安全かつ効果的に環境をナビゲートし、対話する上で不可欠である。一方、3D検出器の性能は高価であるデータサイズとアノテーションに依存している。その結果,ラベル付きデータによるトレーニングの需要が高まっている。本稿では,3次元半教師対象検出のためのチャネル拡張を用いた新しい教師学生フレームワークについて検討する。教師の学生SSLは、教師と生徒にそれぞれ弱い増補と強い増補を採用するのが一般的である。本研究では、変換等分散検出器(TED)を用いて、両方のネットワークに多重チャネル拡張を適用する。 TEDにより、点雲上の拡張の異なる組み合わせを探索し、マルチチャネル変換等式を効率的に集約することができる。原則として、教師ネットワークに固定チャネル拡張を適用することにより、学生は信頼できる擬似ラベルで安定的に訓練することができる。強力なチャネル拡張を採用することで、データの多様性を強化し、変換に対する堅牢性を高め、学生ネットワークの一般化性能を向上させることができる。我々はSOTA階層的監視をベースラインとして使用し、その二重閾値をTEDに適応させ、これはチャネルIoU整合性と呼ばれる。提案手法をKITTIデータセットを用いて評価し,SOTA3D半教師付き物体検出モデルを上回る性能向上を実現した。

Accurate 3D object detection is crucial for autonomous vehicles and robots to navigate and interact with the environment safely and effectively. Meanwhile, the performance of a 3D detector relies on the data size and on annotation, which is expensive. Consequently, the demand for training with limited labeled data is growing. We explore a novel teacher-student framework employing channel augmentation for 3D semi-supervised object detection. Teacher-student SSL typically applies a weak augmentation to the teacher and a strong augmentation to the student. In this work, we apply multiple channel augmentations to both networks using the transformation equivariance detector (TED). The TED allows us to explore different combinations of augmentation on point clouds and efficiently aggregates multi-channel transformation equivariance features. In principle, by adopting fixed channel augmentations for the teacher network, the student can train stably on reliable pseudo-labels. Adopting strong channel augmentations can enrich the diversity of data, fostering robustness to transformations and enhancing generalization performance of the student network. We use SOTA hierarchical supervision as a baseline and adapt its dual-threshold to TED, which is called channel IoU consistency. We evaluate our method on the KITTI dataset and achieve a significant performance leap, surpassing SOTA 3D semi-supervised object detection models.

翻訳日:2024-11-07 22:05:05 公開日:2024-09-22

# コンピュータビジョンにおける倫理的課題: 公開データセットにおけるプライバシの確保とバイアスの緩和

Ethical Challenges in Computer Vision: Ensuring Privacy and Mitigating Bias in Publicly Available Datasets ( http://arxiv.org/abs/2409.10533v3 )

ライセンス: Link先を確認

Ghalib Ahmed Tahir,

(参考訳) 本稿では,コンピュータビジョン技術の創造と展開に関する倫理的問題,特に公開データセットの利用に関して,光を当てることを目的としている。機械学習と人工知能の急速な成長により、コンピュータビジョンは医療、セキュリティシステム、貿易など多くの産業において重要なツールとなっている。しかし、その影響についての情報的な議論により、同意なく収集されることが多い視覚的データの広範な使用は、プライバシーと偏見に関する重大な懸念を提起する。また、コンピュータビジョンモデルのトレーニングに通常使用されるCOCO、LFW、ImageNet、CelebA、PASCAL VOCなどの一般的なデータセットを分析して、これらの問題についても検討する。我々は、個人の権利の保護、バイアスの最小化、開放性と責任に関するこれらの課題に対処する包括的な倫理的枠組みを提供する。我々は、社会的な価値と倫理的基準を考慮に入れたAI開発を奨励し、公共の害を避けることを目指している。

This paper aims to shed light on the ethical problems of creating and deploying computer vision tech, particularly in using publicly available datasets. Due to the rapid growth of machine learning and artificial intelligence, computer vision has become a vital tool in many industries, including medical care, security systems, and trade. However, extensive use of visual data that is often collected without consent due to an informed discussion of its ramifications raises significant concerns about privacy and bias. The paper also examines these issues by analyzing popular datasets such as COCO, LFW, ImageNet, CelebA, PASCAL VOC, etc., that are usually used for training computer vision models. We offer a comprehensive ethical framework that addresses these challenges regarding the protection of individual rights, minimization of bias as well as openness and responsibility. We aim to encourage AI development that will take into account societal values as well as ethical standards to avoid any public harm.

翻訳日:2024-11-07 20:35:12 公開日:2024-09-22

# Green Federated Learning: Green Aware AIの新しい時代

Green Federated Learning: A new era of Green Aware AI ( http://arxiv.org/abs/2409.12626v2 )

ライセンス: Link先を確認

Dipanwita Thakur, Antonella Guzzo, Giancarlo Fortino, Francesco Piccialli,

(参考訳) AIアプリケーション、特に大規模無線ネットワークにおける開発は、使用されるアーキテクチャのサイズと複雑さとともに指数関数的に増加している。特に、機械学習は、今日の最もエネルギー集約的な計算応用の1つとして認められており、次世代インテリジェントシステムの環境持続可能性に重大な課題を提起している。環境の持続可能性を達成するには、すべてのAIアルゴリズムが持続可能性を考慮して設計され、アーキテクチャフェーズ以降の緑の考慮事項を統合する必要がある。最近、フェデレートラーニング(FL)は、その分散した性質から、このニーズに対処する新たな機会を提示している。したがって、最近のFLの進歩と持続可能性への影響から生じる可能性と課題を解明することが不可欠である。さらに、グリーンアウェアなAIアルゴリズムの既存の取り組みとギャップをナビゲートし、理解するためのロードマップを研究者、ステークホルダ、関心のある関係者に提供することが重要です。この調査は主に、100を超えるFL作品を特定し分析し、持続可能な環境のためのグリーンアウェアな人工知能への貢献を評価し、IoT研究に特に焦点をあてることによって、この目標を達成することを目的としている。エネルギー効率の観点から、グリーンフェデレーション学習の現在の課題を掘り下げ、グリーンIoTアプリケーション研究の潜在的な課題と今後の展望について論じている。

The development of AI applications, especially in large-scale wireless networks, is growing exponentially, alongside the size and complexity of the architectures used. Particularly, machine learning is acknowledged as one of today's most energy-intensive computational applications, posing a significant challenge to the environmental sustainability of next-generation intelligent systems. Achieving environmental sustainability entails ensuring that every AI algorithm is designed with sustainability in mind, integrating green considerations from the architectural phase onwards. Recently, Federated Learning (FL), with its distributed nature, presents new opportunities to address this need. Hence, it's imperative to elucidate the potential and challenges stemming from recent FL advancements and their implications for sustainability. Moreover, it's crucial to furnish researchers, stakeholders, and interested parties with a roadmap to navigate and understand existing efforts and gaps in green-aware AI algorithms. This survey primarily aims to achieve this objective by identifying and analyzing over a hundred FL works, assessing their contributions to green-aware artificial intelligence for sustainable environments, with a specific focus on IoT research. It delves into current issues in green federated learning from an energy-efficient standpoint, discussing potential challenges and future prospects for green IoT application research.

翻訳日:2024-11-07 14:08:12 公開日:2024-09-22

# 自動運転における半教師ありセマンティックセグメンテーションのための少数クラス擬似ラベルの活用

Exploiting Minority Pseudo-Labels for Semi-Supervised Semantic Segmentation in Autonomous Driving ( http://arxiv.org/abs/2409.12680v2 )

ライセンス: Link先を確認

Yuting Hong, Hui Xiao, Huazheng Hao, Xiaojie Qiu, Baochen Yao, Chengbin Peng,

(参考訳) 自律運転の進歩により、セマンティックセグメンテーションは目覚ましい進歩を遂げた。このようなネットワークのトレーニングは画像アノテーションに大きく依存する。半教師付き学習は、擬似ラベルの助けを借りてラベル付きデータと未ラベル付きデータの両方を利用することができる。しかし、クラスがバランスの取れない現実のシナリオでは、多数派クラスがトレーニングにおいて支配的な役割を果たすことが多く、マイノリティクラスの学習品質が損なわれることがある。この制限を克服するために、マイノリティクラス学習を強化する専門的なトレーニングモジュールと、より包括的な意味情報を学ぶための一般的なトレーニングモジュールを含む、シナジスティックなトレーニングフレームワークを提案する。画素選択戦略に基づいて、互いから反復的に学習し、エラーの蓄積と結合を低減する。さらに、より明確な決定境界を保証するために、アンカーを用いた二重コントラスト学習を提案する。実験では,ベンチマークデータセットの最先端手法と比較して優れた性能を示す。

With the advancement of autonomous driving, semantic segmentation has achieved remarkable progress. The training of such networks heavily relies on image annotations, which are very expensive to obtain. Semi-supervised learning can utilize both labeled data and unlabeled data with the help of pseudo-labels. However, in many real-world scenarios where classes are imbalanced, majority classes often play a dominant role during training and the learning quality of minority classes can be undermined. To overcome this limitation, we propose a synergistic training framework, including a professional training module to enhance minority class learning and a general training module to learn more comprehensive semantic information. Based on a pixel selection strategy, they can iteratively learn from each other to reduce error accumulation and coupling. In addition, a dual contrastive learning with anchors is proposed to guarantee more distinct decision boundaries. In experiments, our framework demonstrates superior performance compared to state-of-the-art methods on benchmark datasets.

翻訳日:2024-11-07 13:56:59 公開日:2024-09-22

# 量子リザバー計算にはコヒーレンス流入が不可欠である

Coherence influx is indispensable for quantum reservoir computing ( http://arxiv.org/abs/2409.12693v2 )

ライセンス: Link先を確認

Shumpei Kobayashi, Quoc Hoan Tran, Kohei Nakajima,

(参考訳) Echo状態特性(ESP)は、入力駆動の動的システムが情報処理タスクを実行できる基本的な特性である。近年、非定常系やサブシステム、すなわち非定常ESPやサブサブスペースESPへのESPの拡張が提案されている。本稿では,非定常ESPとサブセット/サブスペース非定常ESPを満たすために,量子システムに必要な十分かつ必要な条件を理論的に数値解析する。パウリ変換行列 (PTM) 形式を広く用いた結果,(1)$\textit{coherence influx}$ と呼ばれる量子コヒーレントな環境との相互作用は非定常ESPの実現には不可欠であり,(2)PTMのスペクトル半径は量子貯水池計算 (QRC) のフェードメモリ特性を特徴づけることができることがわかった。スピングラス/マニーボディの局在相を含むハミルトニアン系を含む数値解析実験により, PTMのスペクトル半径は, そのような系に固有の動的相転移を記述できることが判明した。 QRCのESP下でのメカニズムを包括的に理解するために,1次元乗算入力を持つ貯水池計算(RC)システムである簡易モデルである乗算貯水池計算(mRC)を提案する。理論的,数値的には,mRCにおけるスペクトル半径とコヒーレンス流入に対応するパラメータは,その線形記憶容量(MC)と直接相関することを示す。 QRC と mRC に関する知見は PTM の理論的側面と QRC の入力乗算率をもたらす。その結果、オープン量子システムにおけるQRCと情報処理の理解を深めることになる。

Echo state property (ESP) is a fundamental property that allows an input-driven dynamical system to perform information processing tasks. Recently, extensions of ESP to potentially nonstationary systems and subsystems, that is, nonstationary ESP and subset/subspace ESP, have been proposed. In this paper, we theoretically and numerically analyze the sufficient and necessary conditions for a quantum system to satisfy nonstationary ESP and subset/subspace nonstationary ESP. Based on extensive usage of the Pauli transfer matrix (PTM) form, we find that (1) the interaction with a quantum-coherent environment, termed $\textit{coherence influx}$, is indispensable in realizing nonstationary ESP, and (2) the spectral radius of PTM can characterize the fading memory property of quantum reservoir computing (QRC). Our numerical experiment, involving a system with a Hamiltonian that entails a spin-glass/many-body localization phase, reveals that the spectral radius of PTM can describe the dynamical phase transition intrinsic to such a system. To comprehensively understand the mechanisms underlying ESP of QRC, we propose a simplified model, multiplicative reservoir computing (mRC), which is a reservoir computing (RC) system with a one-dimensional multiplicative input. Theoretically and numerically, we show that the parameters corresponding to the spectral radius and coherence influx in mRC directly correlate with its linear memory capacity (MC). Our findings about QRC and mRC will provide a theoretical aspect of PTM and the input multiplicativity of QRC. The results will lead to a better understanding of QRC and information processing in open quantum systems.

翻訳日:2024-11-07 13:56:58 公開日:2024-09-22

# 対話的で学習可能な協調運転自動化を目指して--大規模言語モデル駆動意思決定フレームワーク

Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework ( http://arxiv.org/abs/2409.12812v2 )

ライセンス: Link先を確認

Shiyu Fang, Jiaqi Liu, Mingyu Ding, Yiming Cui, Chen Lv, Peng Hang, Jian Sun,

(参考訳) 現在、コネクテッド・オートモービルズ(CAV)は世界中の道路試験を開始したが、複雑なシナリオにおける安全性と効率性はまだ不十分である。協調運転は、CAVの接続能力を活用して、複雑なシナリオにおいてCAVの性能を改善するための有望なアプローチとなる。しかしながら、インタラクションと継続的学習能力の欠如は、現在の協調運転を単一シナリオアプリケーションと特定の協調運転自動化(CDA)に制限する。これらの課題に対処するため,本研究では,対話型かつ学習可能なLLM駆動協調運転フレームワークであるCoDrivingLLMを提案し,全シナリオと全CDAを実現する。まず,Large Language Models (LLMs) は数学的計算に適さないため,セマンティックな決定に基づく車両位置の更新を行う環境モジュールを導入し,車両位置のLLM制御による潜在的なエラーを回避する。第2に、SAE J3216規格で定義された4段階のCDAに基づいて、状態認識、意図共有、交渉、意思決定を含むChain-of-Thought(COT)ベースの推論モジュールを提案し、多段階推論タスクにおけるLCMの安定性を向上させる。中央集権的な紛争解決は、推論プロセスのコンフリクトコーディネータを通じて管理される。最後に、メモリモジュールを導入し、検索拡張世代を採用することで、CAVには過去の経験から学ぶ能力が与えられている。提案したCoDrivingLLMは,交渉モジュール上でのアブレーション実験,ショット経験の相違による推論,および他の協調運転法との比較により検証した。

At present, Connected Autonomous Vehicles (CAVs) have begun open-road testing around the world, but their safety and efficiency performance in complex scenarios is still not satisfactory. Cooperative driving leverages the connectivity ability of CAVs to achieve synergies greater than the sum of their parts, making it a promising approach to improving CAV performance in complex scenarios. However, the lack of interaction and continuous learning ability limits current cooperative driving to single-scenario applications and specific Cooperative Driving Automation (CDA). To address these challenges, this paper proposes CoDrivingLLM, an interactive and learnable LLM-driven cooperative driving framework, to achieve all-scenario and all-CDA. First, since Large Language Models (LLMs) are not adept at handling mathematical calculations, an environment module is introduced to update vehicle positions based on semantic decisions, thus avoiding potential errors from direct LLM control of vehicle positions. Second, based on the four levels of CDA defined by the SAE J3216 standard, we propose a Chain-of-Thought (COT) based reasoning module that includes state perception, intent sharing, negotiation, and decision-making, enhancing the stability of LLMs in multi-step reasoning tasks. Centralized conflict resolution is then managed through a conflict coordinator in the reasoning process. Finally, by introducing a memory module and employing retrieval-augmented generation, CAVs are endowed with the ability to learn from their past experiences. We validate the proposed CoDrivingLLM through ablation experiments on the negotiation module, reasoning with different shots experience, and comparison with other cooperative driving methods.

翻訳日:2024-11-07 13:23:33 公開日:2024-09-22

# ビザンチン攻撃下での分散型マルチエージェント方策評価の困難性について

On the Hardness of Decentralized Multi-Agent Policy Evaluation under Byzantine Attacks ( http://arxiv.org/abs/2409.12882v2 )

ライセンス: Link先を確認

Hairi, Minghong Fang, Zifan Zhang, Alvaro Velasquez, Jia Liu,

(参考訳) 本稿では,協調型マルチエージェント強化学習において重要なサブプロブレムである完全分散型マルチエージェント政策評価問題について,最大$f$の障害エージェントの存在下で検討する。特に、モデル中毒設定を伴ういわゆるビザンツの欠陥モデルに焦点を当てる。一般に、政策評価は、任意の政策の価値関数を評価することである。協調型マルチエージェントシステムでは、システム全体の報酬は通常、すべてのエージェントからの報酬の均一平均としてモデル化される。ビザンチン系エージェントの存在下でのマルチエージェント政策評価問題,特に異種局所報酬の設定について検討する。理想的には、エージェントの目標は、与えられたポリシーに対する通常のエージェントの報酬の均一な平均である、蓄積されたシステム全体の報酬を評価することである。これは、すべてのエージェントが共通値(コンセンサス部)に合意し、さらにコンセンサス値が値関数(収束部)であることを意味する。しかし、我々はこの目標が達成できないことを証明している。代わりに、エージェントの目標は蓄積されたシステム全体の報酬を評価することであり、通常のエージェントの適切な重み付けされた平均報酬である。さらに、正の重みの総数が $|\mathcal{N}|-f $ を超えることを保証できる正のアルゴリズムが存在しないことを証明している。最後に、スカラー関数近似の下で漸近的コンセンサスを保証するビザンチン耐性の分散時間差分法を提案する。次に,提案アルゴリズムの有効性を実証的に検証する。

In this paper, we study a fully-decentralized multi-agent policy evaluation problem, which is an important sub-problem in cooperative multi-agent reinforcement learning, in the presence of up to $f$ faulty agents. In particular, we focus on the so-called Byzantine faulty model with model poisoning setting. In general, policy evaluation is to evaluate the value function of any given policy. In cooperative multi-agent systems, the system-wide rewards are usually modeled as the uniform average of rewards from all agents. We investigate the multi-agent policy evaluation problem in the presence of Byzantine agents, particularly in the setting of heterogeneous local rewards. Ideally, the goal of the agents is to evaluate the accumulated system-wide rewards, which are the uniform average of rewards of the normal agents for a given policy. It means that all agents agree upon common values (the consensus part) and furthermore, the consensus values are the value functions (the convergence part). However, we prove that this goal is not achievable. Instead, we consider a relaxed version of the problem, where the goal of the agents is to evaluate the accumulated system-wide reward, which is an appropriately weighted average reward of the normal agents. We further prove that there is no correct algorithm that can guarantee that the total number of positive weights exceeds $|\mathcal{N}|-f $, where $|\mathcal{N}|$ is the number of normal agents. Finally, we propose a Byzantine-tolerant decentralized temporal difference algorithm that can guarantee asymptotic consensus under scalar function approximation. We then empirically test the effectiveness of the proposed algorithm.

翻訳日:2024-11-07 12:59:09 公開日:2024-09-22

# VLMはアクションロールプレイングゲームをプレイできるか? Black Myth: Wukongを事例として

Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case ( http://arxiv.org/abs/2409.12889v2 )

ライセンス: Link先を確認

Peng Chen, Pi Bu, Jun Song, Yuan Gao, Bo Zheng,

(参考訳) 近年,大規模言語モデル(LLM)に基づくエージェントは,様々な分野において大きな進歩を遂げている。最も人気のある研究分野の1つは、これらのエージェントをビデオゲームに適用することである。伝統的に、これらの手法はゲーム内の環境および行動データにアクセスするためにゲームAPIに依存してきた。しかし、このアプローチはAPIの可用性によって制限されており、人間がゲームをする方法を反映していない。視覚言語モデル(VLM)の出現により、エージェントは視覚的理解能力を強化し、視覚入力のみを使用してゲームと対話できるようになった。これらの進歩にもかかわらず、現在のアプローチはアクション指向のタスク、特に強化学習法が一般的だが一般化が不十分で広範な訓練を必要とするアクションロールプレイングゲーム(ARPG)において、依然として課題に直面している。これらの制限に対処するため、視覚のみの入力と複雑なアクション出力を必要とするシナリオにおいて、既存のVLMの機能境界を探索する研究プラットフォームとして、ARPGの ``Black Myth: Wukong'' を選択する。ゲーム内の12のタスクを定義し、75%が戦闘に焦点を当て、いくつかの最先端のVLMをこのベンチマークに組み込む。さらに、記録されたゲームプレイビデオとマウスとキーボードアクションを含む操作ログを含む人間の操作データセットをリリースする。さらに,行動計画システムと視覚軌道システムからなるVARP(Vision Action Role-Playing)エージェントフレームワークを提案する。我々のフレームワークは、基本的なタスクを実行し、簡単かつ中程度の戦闘シナリオの90%を成功させる能力を示している。本研究の目的は,複雑なアクションゲーム環境にマルチモーダルエージェントを適用するための新たな洞察と方向性を提供することである。コードとデータセットはhttps://varp-agent.github.io/で公開される。

Recently, large language model (LLM)-based agents have made significant advances across various fields. One of the most popular research areas involves applying these agents to video games. Traditionally, these methods have relied on game APIs to access in-game environmental and action data. However, this approach is limited by the availability of APIs and does not reflect how humans play games. With the advent of vision language models (VLMs), agents now have enhanced visual understanding capabilities, enabling them to interact with games using only visual inputs. Despite these advances, current approaches still face challenges in action-oriented tasks, particularly in action role-playing games (ARPGs), where reinforcement learning methods are prevalent but suffer from poor generalization and require extensive training. To address these limitations, we select an ARPG, "Black Myth: Wukong", as a research platform to explore the capability boundaries of existing VLMs in scenarios requiring visual-only input and complex action output. We define 12 tasks within the game, with 75% focusing on combat, and incorporate several state-of-the-art VLMs into this benchmark. Additionally, we will release a human operation dataset containing recorded gameplay videos and operation logs, including mouse and keyboard actions. Moreover, we propose a novel VARP (Vision Action Role-Playing) agent framework, consisting of an action planning system and a visual trajectory system. Our framework demonstrates the ability to perform basic tasks and succeed in 90% of easy and medium-level combat scenarios. This research aims to provide new insights and directions for applying multimodal agents in complex action game environments. The code and datasets will be made available at https://varp-agent.github.io/.

翻訳日:2024-11-07 12:59:09 公開日:2024-09-22

# オープンワールドにおけるLiDARパノプティックセグメンテーション

Lidar Panoptic Segmentation in an Open World ( http://arxiv.org/abs/2409.14273v1 )

ライセンス: Link先を確認

Anirudh S Chakravarthy, Meghana Reddy Ganesina, Peiyun Hu, Laura Leal-Taixe, Shu Kong, Deva Ramanan, Aljosa Osep,

(参考訳) ライダー・パノプティクス・セグメンテーション(LPS)への対処は、自動運転車の安全な配備に不可欠である。 LPSは、可算オブジェクト(例えば歩行者や車両)のモノクラスや、非定型領域(例えば、植生や道路)のモノクラスを含む、セマンティッククラスの事前に定義された語彙を認識・セグメントすることを目的としている。重要なのは、LPSは個々のインスタンス(例えば、すべての車両)をセグメント化する必要があることだ。現在のLPS法は、意味クラス語彙が実際のオープンな世界で固定されているという非現実的な仮定をしているが、実際には、クラスオントロジは通常、事前に定義されたクラス語彙のように未知であると考えられる新しいクラスのインスタンスに遭遇するにつれて、時間とともに進化する。この非現実的な仮定に対処するため、我々はOpen World (LiPSOW): 定義済みのセマンティッククラスボキャブラリを持つデータセット上でモデルをトレーニングし、それらの一般化を、モノやモノの新たなインスタンスが現れるような大きなデータセットに研究する。この実験的な設定は興味深い結論をもたらす。先行技術訓練では、クラス固有のインスタンスセグメンテーション法と、既知のクラスにおける最先端の結果を得るが、クラスに依存しないボトムアップグルーピング法は、初期クラス語彙以外のクラス(すなわち未知クラス)で好意的に機能する。残念ながら、これらのメソッドは、既知のクラスで完全にデータ駆動のメソッドと同等に動作しない。分類に依存しない点クラスタリングを行い、階層的な方法で入力クラウドを過剰に分離し、次に領域提案ネットワークのようにバイナリポイントセグメントの分類を行う。我々は、意味分類とは独立に、点セグメントの重み付き階層木におけるカットを計算することで、最終点雲のセグメンテーションを得る。注目すべきは、この統一されたアプローチは、既知のクラスと未知のクラスの両方で強力なパフォーマンスをもたらすことだ。

Addressing Lidar Panoptic Segmentation (LPS) is crucial for safe deployment of autonomous vehicles. LPS aims to recognize and segment lidar points w.r.t. a pre-defined vocabulary of semantic classes, including thing classes of countable objects (e.g., pedestrians and vehicles) and stuff classes of amorphous regions (e.g., vegetation and road). Importantly, LPS requires segmenting individual thing instances (e.g., every single vehicle). Current LPS methods make an unrealistic assumption that the semantic class vocabulary is fixed in the real open world, but in fact, class ontologies usually evolve over time as robots encounter instances of novel classes that are considered to be unknowns w.r.t. the pre-defined class vocabulary. To address this unrealistic assumption, we study LPS in the Open World (LiPSOW): we train models on a dataset with a pre-defined semantic class vocabulary and study their generalization to a larger dataset where novel instances of thing and stuff classes can appear. This experimental setting leads to interesting conclusions. While prior art trains class-specific instance segmentation methods and obtains state-of-the-art results on known classes, methods based on class-agnostic bottom-up grouping perform favorably on classes outside of the initial class vocabulary (i.e., unknown classes). Unfortunately, these methods do not perform on par with fully data-driven methods on known classes. Our work suggests a middle ground: we perform class-agnostic point clustering and over-segment the input cloud in a hierarchical fashion, followed by binary point segment classification, akin to Region Proposal Network [1]. We obtain the final point cloud segmentation by computing a cut in the weighted hierarchical tree of point segments, independently of semantic classification. Remarkably, this unified approach leads to strong performance on both known and unknown classes.
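
The final step described above, computing a cut in a weighted hierarchical tree of point segments, can be illustrated with a small dynamic program. This is a sketch under stated assumptions rather than the paper's procedure. Each node is assumed to carry a point count and a hypothetical objectness score from a binary segment classifier. The cut is assumed to maximize the point-weighted sum of scores. The `Segment` class and all values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    name: str
    num_points: int
    score: float                       # hypothetical objectness score in [0, 1]
    children: list = field(default_factory=list)


def best_cut(node):
    """Return (value, segment names) of the best cut in the subtree rooted at node.

    A node is either kept whole, contributing num_points * score, or replaced
    by the best cuts of its children, whichever gives the larger total.
    """
    own_value = node.num_points * node.score
    if not node.children:
        return own_value, [node.name]
    child_value, child_names = 0.0, []
    for child in node.children:
        value, names = best_cut(child)
        child_value += value
        child_names += names
    if own_value >= child_value:
        return own_value, [node.name]
    return child_value, child_names


# Toy hierarchy: the over-segmented car is merged back, the mixed blob is split.
tree = Segment("scene", 100, 0.1, [
    Segment("car", 60, 0.9, [
        Segment("car_front", 30, 0.4),
        Segment("car_back", 30, 0.5),
    ]),
    Segment("blob", 40, 0.2, [
        Segment("pedestrian", 15, 0.8),
        Segment("pole", 25, 0.6),
    ]),
])
value, segments = best_cut(tree)
```

Because the selection depends only on segment scores, it is independent of the semantic labels, which matches the abstract's remark that the cut is computed independently of semantic classification.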

翻訳日:2024-11-06 23:26:16 公開日:2024-09-22

# 大規模言語モデルによる証明自動化

Proof Automation with Large Language Models ( http://arxiv.org/abs/2409.14274v1 )

ライセンス: Link先を確認

Minghai Lu, Benjamin Delaware, Tianyi Zhang,

(参考訳) Coqのような対話型定理証明器は、ソフトウェアの正しさを形式的に保証する強力なツールである。しかし、これらのツールを使用するには、かなりの手作業と専門知識が必要である。大規模言語モデル(LLM)は、自然言語による非形式的な証明の自動生成において有望な結果を示しているが、対話型定理証明器における形式的証明の生成ではそれほど効果的ではない。本稿では,LLMが形式的証明を生成する際に犯す一般的な誤りを特定するための形成的調査を行う。GPT-3.5による520個の証明生成誤りを分析した結果、GPT-3.5は証明の正しい高レベル構造をしばしば特定するが、低レベルの詳細を正しく扱うことに苦労していることがわかった。この知見に基づいて,まずLLMに初期証明を生成させ,次に対象を絞った記号的手法を用いて低レベルの問題を反復的に修復する,新しい生成・修復手法PALMを提案する。10K以上の定理を含む大規模データセット上でPALMを評価する。その結果、PALMは他の最先端の手法を大幅に上回り、76.6%から180.4%多くの定理を証明できた。さらに、PALMは既存手法では証明できなかった1270の定理を証明している。また,異なるLLM間でのPALMの一般化可能性を示す。

Interactive theorem provers such as Coq are powerful tools to formally guarantee the correctness of software. However, using these tools requires significant manual effort and expertise. While Large Language Models (LLMs) have shown promise in automatically generating informal proofs in natural language, they are less effective at generating formal proofs in interactive theorem provers. In this paper, we conduct a formative study to identify common mistakes made by LLMs when asked to generate formal proofs. By analyzing 520 proof generation errors made by GPT-3.5, we found that GPT-3.5 often identified the correct high-level structure of a proof, but struggled to get the lower-level details correct. Based on this insight, we propose PALM, a novel generate-then-repair approach that first prompts an LLM to generate an initial proof and then leverages targeted symbolic methods to iteratively repair low-level problems. We evaluate PALM on a large dataset that includes more than 10K theorems. Our results show that PALM significantly outperforms other state-of-the-art approaches, successfully proving 76.6% to 180.4% more theorems. Moreover, PALM proves 1270 theorems beyond the reach of existing approaches. We also demonstrate the generalizability of PALM across different LLMs.
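
The generate-then-repair idea can be sketched as a control loop: check a candidate proof, locate the first failing step, apply a targeted fix, and repeat. The sketch shows control flow only. The checker, the error kinds, and the repair rules are hypothetical stand-ins; they do not call Coq and are not PALM's actual repair mechanisms.

```python
def generate_then_repair(initial_proof, check, repairs, max_rounds=10):
    """Check a candidate proof repeatedly, repairing the first failing step each round."""
    proof = list(initial_proof)
    for _ in range(max_rounds):
        error = check(proof)            # None means the checker accepts the proof
        if error is None:
            return proof
        index, kind = error
        fix = repairs.get(kind)
        if fix is None:
            return None                 # no repair rule for this kind of error
        proof[index:index + 1] = fix(proof[index])
    return None


# Hypothetical toy checker: accepts a fixed set of steps, flags everything else.
ACCEPTED_STEPS = {"intros", "induction n", "simpl", "reflexivity", "auto", "rewrite IHn"}


def toy_check(proof):
    for i, step in enumerate(proof):
        if step in ACCEPTED_STEPS:
            continue
        kind = "unknown_lemma" if step.startswith("rewrite ") else "unknown_step"
        return i, kind
    return None


# Hypothetical repair rules: map an error kind to replacement steps.
toy_repairs = {
    "unknown_lemma": lambda step: ["auto"],   # fall back to automation
    "unknown_step": lambda step: [],          # drop the offending step
}

draft = ["intros", "induction n", "rewrite plus_zero_magic", "reflexivity"]
repaired = generate_then_repair(draft, toy_check, toy_repairs)
```

In a real system the draft would come from an LLM, the checker would be the proof assistant itself, and the repairs would be symbolic procedures keyed to the reported error. The loop structure is the part this sketch is meant to convey.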

翻訳日:2024-11-06 23:26:16 公開日:2024-09-22

# 動的散乱チャンネルを用いたマルチユーザ画像暗号化

Dynamic Scattering-channel-based Approach for Multiuser Image Encryption ( http://arxiv.org/abs/2409.14275v1 )

ライセンス: Link先を確認

Mohammadrasoul Taghavi, Edwin A. Marengo,

(参考訳) 全てのユーザが使用する静的な複雑な媒体に基づいて動作する従来の散乱ベースの暗号化システムは、暗号文とプレーンテキストのペアを利用して散乱媒体の応答をモデル化しリバースエンジニアリングする学習ベースの攻撃に対して脆弱であり、物理媒体を使わずに不正な復号を可能にする。本研究では,マルチユーザ画像暗号化のための動的散乱チャネルに基づく新しい手法を開発した。確立されたアプローチは、複数の散乱ナノ粒子の調整可能な集合体としてモデル化された可変な動的散乱媒質を用いる。提案システムでは,異なる時間ブロックに対する散乱行列の異なる組み合わせと,ユーザ固有の複素値係数を組み合わせることで,ユーザごとに一意な暗号鍵を作成できるようにすることにより,複数のユーザを支援する。本手法は,分散メディアを暗号化機構として用いたマルチユーザセキュア通信およびストレージチャネルの実現可能性を高める。

Conventional scattering-based encryption systems that operate based on a static complex medium which is used by all users are vulnerable to learning-based attacks that exploit ciphertext-plaintext pairs to model and reverse-engineer the scattering medium's response, enabling unauthorized decryption without the physical medium. In this contribution, a new dynamic scattering-channel-based technique for multiuser image encryption is developed. The established approach employs variable, dynamic scattering media which are modeled as tunable aggregates of multiple scattering nanoparticles. The proposed system supports multiple users by allowing distinct combinations of scattering matrices for different time blocks, each combined with user-specific complex-valued coefficients, enabling the creation of unique, hard-to-guess encryption keys for each user. The derived methodology enhances the practical feasibility of multiuser secure communication and storage channels employing scattering media as the encryption mechanism.
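
The per-user key construction can be illustrated with a toy linear model for a single time block. Each user's key is assumed to be a linear combination of shared scattering matrices with user-specific complex coefficients. The 2x2 matrices, the coefficients, and the function names are invented for illustration and carry no physical meaning.

```python
def combine(matrices, coeffs):
    """User key: the linear combination sum_k c_k * S_k of 2x2 scattering matrices."""
    return [[sum(c * S[i][j] for c, S in zip(coeffs, matrices)) for j in range(2)]
            for i in range(2)]


def transmit(T, x):
    """Apply a 2x2 complex matrix T to the vector x."""
    return [T[0][0] * x[0] + T[0][1] * x[1],
            T[1][0] * x[0] + T[1][1] * x[1]]


def invert(T):
    """Inverse of a 2x2 complex matrix, assuming a nonzero determinant."""
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    return [[T[1][1] / det, -T[0][1] / det],
            [-T[1][0] / det, T[0][0] / det]]


# Shared scattering matrices for one time block (made-up values).
S1 = [[1, 1j], [1j, 1]]
S2 = [[0.5, -0.2j], [0.3, 1]]

# User-specific complex coefficients (hypothetical).
alice_key = combine([S1, S2], [1 + 1j, 0.5])
bob_key = combine([S1, S2], [0.2, 1 - 1j])

plain = [0.3, 0.9]                                  # two pixel intensities
cipher = transmit(alice_key, plain)                 # encrypt with Alice's channel
recovered = transmit(invert(alice_key), cipher)     # correct key recovers the pixels
wrong = transmit(invert(bob_key), cipher)           # another user's key does not
```

A fixed linear map like this one is what known-plaintext learning attacks exploit. The abstract's proposal is to vary the combination of scattering matrices across time blocks, which this single-block toy does not model.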

翻訳日:2024-11-06 23:26:16 公開日:2024-09-22

# Can-Do! 大規模マルチモーダルモデルを用いた身体的計画のためのデータセットとニューロシンボリック接地フレームワーク

Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models ( http://arxiv.org/abs/2409.14277v1 )

ライセンス: Link先を確認

Yew Ken Chia, Qi Sun, Lidong Bing, Soujanya Poria,

(参考訳) 大規模マルチモーダルモデルは、視覚や言語のタスクにおいて目覚ましい問題解決能力を示し、幅広い世界知識をエンコードする可能性を持っている。しかし、これらのモデルが現実的な環境で知覚、推論、計画、行動を行うことは、依然として未解決の課題である。本研究では,従来のデータセットよりも多様で複雑なシナリオを通じて,身体性を伴う計画能力を評価するために設計されたベンチマークデータセットであるCan-Doを紹介する。私たちのデータセットには400のマルチモーダルサンプルが含まれており、それぞれが自然言語のユーザ指示、環境を描写した視覚イメージ、状態変化、対応するアクションプランで構成されている。データは、常識知識、物理的理解、安全意識の様々な側面を含んでいる。我々の詳細な分析により、GPT-4Vを含む最先端のモデルが、視覚知覚、理解、推論能力のボトルネックに直面していることが明らかになった。これらの課題に対処するために,ニューロシンボリックなフレームワークであるNeuroGroundを提案する。このフレームワークは,まず認識された環境状態にプラン生成を接地し,次にシンボリックな計画エンジンを活用してモデルが生成した計画を強化する。実験により,強いベースラインと比較して,フレームワークの有効性が示された。私たちのコードとデータセットは https://embodied-planning.github.io で公開されている。

Large multimodal models have demonstrated impressive problem-solving abilities in vision and language tasks, and have the potential to encode extensive world knowledge. However, it remains an open challenge for these models to perceive, reason, plan, and act in realistic environments. In this work, we introduce Can-Do, a benchmark dataset designed to evaluate embodied planning abilities through more diverse and complex scenarios than previous datasets. Our dataset includes 400 multimodal samples, each consisting of natural language user instructions, visual images depicting the environment, state changes, and corresponding action plans. The data encompasses diverse aspects of commonsense knowledge, physical understanding, and safety awareness. Our fine-grained analysis reveals that state-of-the-art models, including GPT-4V, face bottlenecks in visual perception, comprehension, and reasoning abilities. To address these challenges, we propose NeuroGround, a neurosymbolic framework that first grounds the plan generation in the perceived environment states and then leverages symbolic planning engines to augment the model-generated plans. Experimental results demonstrate the effectiveness of our framework compared to strong baselines. Our code and dataset are available at https://embodied-planning.github.io.

翻訳日:2024-11-06 23:26:16 公開日:2024-09-22

# 加速確率的エクストラグラディエント:分散およびフェデレート学習における通信量の低減を目的としたヘッセ行列類似性と勾配類似性の混合

Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning ( http://arxiv.org/abs/2409.14280v1 )

ライセンス: Link先を確認

Dmitry Bylinkin, Kirill Degtyarev, Aleksandr Beznosikov,

(参考訳) 学習をめぐる現代の状況と傾向は、モデルにますます高い一般化能力を求めており、その結果、モデルとトレーニングサンプルの両方の規模が増大している。このようなタスクを単一デバイスで解決することは、すでに困難である。これが、分散学習アプローチとフェデレーション学習アプローチが日々人気を増している理由です。分散コンピューティングはデバイス間の通信を伴うため、効率性とプライバシという2つの重要な問題を解決する必要がある。通信コストを抑えるための最もよく知られたアプローチの1つは、ローカルデータの類似性を活用することである。ヘッセ行列の類似性と勾配の均質性はいずれも文献で研究されているが、別々に扱われてきた。本稿では、データ類似性とクライアントサンプリングのアイデアを取り入れた新しい手法を解析する上で、これら2つの仮定を組み合わせる。さらに, プライバシー問題に対処するため, 付加雑音の手法を適用し, 提案手法の収束への影響を解析する。この理論は、実際のデータセットでのトレーニングによって確認される。

Modern realities and trends in learning demand ever greater generalization ability from models, which leads to an increase in both model size and training sample size. It is already difficult to solve such tasks on a single device. This is why distributed and federated learning approaches are becoming more popular every day. Distributed computing involves communication between devices, which requires solving two key problems: efficiency and privacy. One of the best-known approaches to combating communication costs is to exploit the similarity of local data. Both Hessian similarity and homogeneous gradients have been studied in the literature, but separately. In this paper, we combine both of these assumptions in analyzing a new method that incorporates the ideas of data similarity and client sampling. Moreover, to address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method. The theory is confirmed by training on real datasets.

翻訳日:2024-11-06 23:26:16 公開日:2024-09-22

# Flag Proxy Networks: 量子LDPCコードのアーキテクチャ、スケジューリング、デコード

Flag Proxy Networks: Tackling the Architectural, Scheduling, and Decoding Obstacles of Quantum LDPC codes ( http://arxiv.org/abs/2409.14283v1 )

ライセンス: Link先を確認

Suhas Vittal, Ali Javadi-Abhari, Andrew W. Cross, Lev S. Bishop, Moinuddin Qureshi,

(参考訳) 重要なアプリケーションにおいて指数的スピードアップを達成するためには、量子エラー補正が必要である。平面表面符号は比較的単純であるため、過去20年間で最も研究されている誤り訂正符号である。しかし、平面表面符号で単一の論理量子ビットを符号化するには、符号距離 ($d$) の2乗に比例する数の物理量子ビットが必要であり、有望なアプリケーションに必要な大距離符号では空間効率が悪い。そこで、平面表面符号の代替として量子低密度パリティ検査 (QLDPC) 符号が登場したが、これらはより高い接続性を必要とする。さらに,これらの符号についてはフォールトトレラントなシンドローム抽出と復号の問題が十分に研究されておらず,利用上の障害として残っている。本稿では,研究の進んでいない2種類のQLDPC符号、すなわち双曲表面符号と双曲カラー符号について考察する。上記の3つの課題に次のように取り組む。第1に,フラグ量子ビットとプロキシ量子ビットによって低接続性を実現する、量子符号向けの一般化可能なアーキテクチャであるフラグ・プロキシ・ネットワーク(FPN)を提案する。第2に,一般の量子符号に対する貪欲なシンドローム抽出スケジューリングアルゴリズムを提案し,さらにこのアルゴリズムをFPN上のフォールトトレラントなシンドローム抽出に用いる。第3に,フラグ測定を利用して双曲符号を正確に復号する2つの復号器を提案する。我々の研究は、双曲表面符号と双曲カラー符号の次数4のFPNが、$d = 5$ の平面表面符号よりもそれぞれ $2.9\times$、$5.5\times$ 空間効率が高く、より大きな距離を考慮するとさらに空間効率が良くなることを示した。双曲符号は、平面符号に匹敵するエラー率も持つ。

Quantum error correction is necessary for achieving exponential speedups on important applications. The planar surface code has remained the most studied error-correcting code for the last two decades because of its relative simplicity. However, encoding a single logical qubit with the planar surface code requires a number of physical qubits quadratic in the code distance ($d$), making it space-inefficient for the large-distance codes necessary for promising applications. Thus, Quantum Low-Density Parity-Check (QLDPC) codes have emerged as an alternative to the planar surface code but require a higher degree of connectivity. Furthermore, the problems of fault-tolerant syndrome extraction and decoding are understudied for these codes and also remain obstacles to their usage. In this paper, we consider two understudied families of QLDPC codes: hyperbolic surface codes and hyperbolic color codes. We tackle the three challenges mentioned above as follows. First, we propose Flag-Proxy Networks (FPNs), a generalizable architecture for quantum codes that achieves low connectivity through flag and proxy qubits. Second, we propose a greedy syndrome extraction scheduling algorithm for general quantum codes and further use this algorithm for fault-tolerant syndrome extraction on FPNs. Third, we present two decoders that leverage flag measurements to decode the hyperbolic codes accurately. Our work finds that degree-4 FPNs of the hyperbolic surface and color codes are respectively $2.9\times$ and $5.5\times$ more space-efficient than the $d = 5$ planar surface code, and become even more space-efficient when considering higher distances. The hyperbolic codes also have error rates comparable to their planar counterparts.

翻訳日:2024-11-06 23:26:16 公開日:2024-09-22

# ESPERANTO:テキスト生成のためのAI検出におけるロバスト性を高めるための合成句の評価

ESPERANTO: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination ( http://arxiv.org/abs/2409.14285v1 )

ライセンス: Link先を確認

Navid Ayoobi, Lily Knab, Wen Cheng, David Pantoja, Hamidreza Alikhani, Sylvain Flamant, Jin Kim, Arjun Mukherjee,

(参考訳) 大規模言語モデル(LLM)は、様々な領域で重要な有用性を示す一方で、学術的不正行為や誤情報の拡散など、非倫理的な目的に悪用されやすい。その結果,対抗策としてAI生成テキストの検出システムが登場した。しかし、これらの検出メカニズムは回避技術に対して脆弱であり、テキスト操作に対する堅牢性を欠いている。本稿では,検出回避のための新しい手法として逆翻訳(バックトランスレーション)を導入し,現行の検出システムのロバスト性を高める必要性を浮き彫りにする。提案手法では、AI生成テキストを複数の言語に翻訳し、その後に英語へ逆翻訳する。本稿では、これらの逆翻訳されたテキストを組み合わせて、オリジナルのAI生成テキストの操作されたバージョンを生成するモデルを提案する。その結果,操作したテキストは元の意味を保ちつつ,既存の検出手法の真陽性率(TPR)を著しく低下させることがわかった。我々は,この手法を,6つのオープンソースシステムと3つのプロプライエタリなシステムを含む9つのAI検出器上で評価し,逆翻訳による操作に対する脆弱性を明らかにした。既存のAIテキスト検出器で明らかになった欠点に対処するため,この形態の操作に対する堅牢性を改善する対策を提案する。提案手法のTPRは, 逆翻訳操作後, 1.85%しか低下しないことがわかった。さらに,8つの異なる LLM を用いて 720k テキストの大規模なデータセットを構築した。本データセットは,提案手法と既存の検出器の性能を評価するために,様々なドメインと文体の人間によるテキストとLLMによるテキストの両方を含む。このデータセットは、研究コミュニティの利益のために公開されています。

While large language models (LLMs) exhibit significant utility across various domains, they simultaneously are susceptible to exploitation for unethical purposes, including academic misconduct and dissemination of misinformation. Consequently, AI-generated text detection systems have emerged as a countermeasure. However, these detection mechanisms demonstrate vulnerability to evasion techniques and lack robustness against textual manipulations. This paper introduces back-translation as a novel technique for evading detection, underscoring the need to enhance the robustness of current detection systems. The proposed method involves translating AI-generated text through multiple languages before back-translating to English. We present a model that combines these back-translated texts to produce a manipulated version of the original AI-generated text. Our findings demonstrate that the manipulated text retains the original semantics while significantly reducing the true positive rate (TPR) of existing detection methods. We evaluate this technique on nine AI detectors, including six open-source and three proprietary systems, revealing their susceptibility to back-translation manipulation. In response to the identified shortcomings of existing AI text detectors, we present a countermeasure to improve the robustness against this form of manipulation. Our results indicate that the TPR of the proposed method declines by only 1.85% after back-translation manipulation. Furthermore, we build a large dataset of 720k texts using eight different LLMs. Our dataset contains both human-authored and LLM-generated texts in various domains and writing styles to assess the performance of our method and existing detectors. This dataset is publicly shared for the benefit of the research community.

翻訳日:2024-11-06 23:26:16 公開日:2024-09-22

# 環境工学のためのオフショア風力エネルギーのオピニオンマイニング

Opinion Mining on Offshore Wind Energy for Environmental Engineering ( http://arxiv.org/abs/2409.14292v1 )

ライセンス: Link先を確認

Isabele Bittencourt, Aparna S. Varde, Pankaj Lal,

(参考訳) 本稿では,ソーシャルメディアデータに対する感情分析を行い,洋上風力エネルギーに関する大衆の意見を調査する。各モデルが提供する機能が異なることから,TextBlob, VADER, SentiWordNetという3つの機械学習モデルを適用する。 TextBlobは、主観性分析と極性分類を提供する。 VADERは累積的な感情スコアを提供する。 SentiWordNetは、感情を文脈を参照して考慮し、それに従って分類を行う。 NLPの手法は、ソーシャルメディアのテキストデータから意味を抽出するために利用される。データ視覚化ツールは、全体的な結果を表示するために適宜用いられる。この研究は、大衆の意見を取り入れて意思決定支援を導くという点で、市民科学やスマートガバナンスと密接に結びついている。本研究は、この分野における機械学習とNLPの役割を例示する。

In this paper, we conduct sentiment analysis on social media data to study mass opinion about offshore wind energy. We adapt three machine learning models, namely, TextBlob, VADER, and SentiWordNet because different functions are provided by each model. TextBlob provides subjectivity analysis as well as polarity classification. VADER offers cumulative sentiment scores. SentiWordNet considers sentiments with reference to context and performs classification accordingly. Techniques in NLP are harnessed to gather meaning from the textual data in social media. Data visualization tools are suitably deployed to display the overall results. This work is much in line with citizen science and smart governance via involvement of mass opinion to guide decision support. It exemplifies the role of Machine Learning and NLP here.

翻訳日:2024-11-06 23:26:16 公開日:2024-09-22

# HM3D-OVON:オープン語彙オブジェクトゴールナビゲーションのためのデータセットとベンチマーク

HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation ( http://arxiv.org/abs/2409.14296v1 )

ライセンス: Link先を確認

Naoki Yokoyama, Ram Ramrakhya, Abhishek Das, Dhruv Batra, Sehoon Ha,

(参考訳) 本稿では,従来のObject Goal Navigation(ObjectNav)ベンチマークの範囲と意味範囲を広げる大規模ベンチマークであるHabitat-Matterport 3D Open Vocabulary Object Goal Navigation dataset (HM3D-OVON)を提案する。 HM3DSemデータセットを活用することで、HM3D-OVONは、実世界環境のフォトリアリスティックな3Dスキャンから得られた、379の異なるカテゴリにわたる15k以上の注釈付き家庭用オブジェクトのインスタンスを組み込む。目標オブジェクトを6-20カテゴリの事前定義されたセットに制限する以前のObjectNavデータセットとは対照的に、HM3D-OVONはテスト時にフリーフォーム言語で定義された目標のオープンセットでモデルのトレーニングと評価を容易にする。このオープン語彙の定式化を通じて、HM3D-OVONは、オープン語彙的な方法でテキストによって指定された任意のオブジェクトを検索できるビジュオ・セマンティックなナビゲーション行動の学習を促進する。さらに,HM3D-OVON上で複数の異なる種類のアプローチを系統的に評価し,比較した。我々は,HM3D-OVONを用いて訓練したオープン語彙のObjectNavエージェントが,最先端のObjectNavアプローチよりも高い性能を達成し,かつローカライゼーションやアクチュエーションのノイズに対して頑健であることを確認した。われわれのベンチマークとベースラインの結果が、現実世界の空間をナビゲートしてフリーフォーム言語で指定された家庭用オブジェクトを見つけることのできるエンボディドエージェントの開発への関心を喚起し、より柔軟で人間らしいセマンティックなビジュアルナビゲーションへの一歩となることを期待する。コードとビデオは naoki.io/ovon で入手できる。

We present the Habitat-Matterport 3D Open Vocabulary Object Goal Navigation dataset (HM3D-OVON), a large-scale benchmark that broadens the scope and semantic range of prior Object Goal Navigation (ObjectNav) benchmarks. Leveraging the HM3DSem dataset, HM3D-OVON incorporates over 15k annotated instances of household objects across 379 distinct categories, derived from photo-realistic 3D scans of real-world environments. In contrast to earlier ObjectNav datasets, which limit goal objects to a predefined set of 6-20 categories, HM3D-OVON facilitates the training and evaluation of models with an open set of goals defined through free-form language at test time. Through this open-vocabulary formulation, HM3D-OVON encourages progress towards learning visuo-semantic navigation behaviors that are capable of searching for any object specified by text in an open-vocabulary manner. Additionally, we systematically evaluate and compare several different types of approaches on HM3D-OVON. We find that HM3D-OVON can be used to train an open-vocabulary ObjectNav agent that both achieves higher performance and is more robust to localization and actuation noise than the state-of-the-art ObjectNav approach. We hope that our benchmark and baseline results will drive interest in developing embodied agents that can navigate real-world spaces to find household objects specified through free-form language, taking a step towards more flexible and human-like semantic visual navigation. Code and videos are available at: naoki.io/ovon.

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22

# DBSCANアルゴリズムのニューロモルフィックな実装

A Neuromorphic Implementation of the DBSCAN Algorithm ( http://arxiv.org/abs/2409.14298v1 )

ライセンス: Link先を確認

Charles P. Rizzo, James S. Plank,

(参考訳) DBSCANはノイズの存在下でクラスタリングを行うアルゴリズムである。本稿では、スパイクニューラルネットワークを用いて、DBSCANをニューロモルフィックに実装するための2つの構成法を提案する。最初の構成は「フラット」と呼ばれ、結果として大きなスパイクニューラルネットワークが高速にアルゴリズムを計算し、5つのステップで計算する。さらに、ネットワークはパイプライン化が可能であり、新しいDBSCAN計算をタイムステップ毎に実行することができる。 2番目の構成は"systolic"と呼ばれ、より小さなネットワークを生成するが、列ごとに複数のタイムステップで入力をスパイクする必要がある。構築の正確な仕様を提供し、実用的なニューロモルフィック・コンピューティング・セッティングで解析する。オープンソース実装も提供しています。

DBSCAN is an algorithm that performs clustering in the presence of noise. In this paper, we provide two constructions that allow DBSCAN to be implemented neuromorphically, using spiking neural networks. The first construction is termed "flat," resulting in large spiking neural networks that compute the algorithm quickly, in five timesteps. Moreover, the networks allow pipelining, so that a new DBSCAN calculation may be performed every timestep. The second construction is termed "systolic", and generates much smaller networks, but requires the inputs to be spiked in over several timesteps, column by column. We provide precise specifications of the constructions and analyze them in practical neuromorphic computing settings. We also provide an open-source implementation.

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
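
For readers unfamiliar with the algorithm that the entry above maps onto spiking neural networks, the following is a minimal sketch of conventional (non-neuromorphic) DBSCAN in plain Python. It is not the authors' "flat" or "systolic" construction. The names `dbscan`, `eps`, `min_pts`, and `NOISE` are illustrative choices, and Euclidean distance with a brute-force neighbor search is assumed for simplicity.

```python
from math import dist

NOISE = -1  # label used here for points that belong to no cluster

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels
```

Calling `dbscan(points, eps=1.5, min_pts=3)` on two tight groups of three points plus one distant point assigns the groups to clusters 0 and 1 and marks the distant point as noise.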

# 条件付きガウスアンサンブルKalmanフィルタを用いたディープラーニング強化データ同化の競争ベースライン

A competitive baseline for deep learning enhanced data assimilation using conditional Gaussian ensemble Kalman filtering ( http://arxiv.org/abs/2409.14300v1 )

ライセンス: Link先を確認

Zachariah Malik, Romit Maulik,

(参考訳) Ensemble Kalman Filtering (EnKF) はデータ同化の一般的な手法であり、幅広い応用がある。しかし、摂動が非線形であるとき、バニラ EnKF フレームワークは十分に定義されていない。本研究では,条件付きガウス EnKF (CG-EnKF) と正規スコア EnKF (NS-EnKF) と呼ばれるバニラ EnKF の2つの非線形拡張について検討する。これらは,従来の更新式の代わりに「条件付きガウス」更新式でカルマンゲイン行列を構成することにより,線形性の仮定を回避する。次に、これらのモデルを、スコアフィルタ(SF)と呼ばれる最先端のディープラーニングに基づく粒子フィルタと比較する。このモデルは、密度を推定するために高価なスコア拡散モデルを使用し、有効性のために摂動作用素に強い仮定を必要とする。比較の結果, CG-EnKF と NS-EnKF は, ローレンツ96 システムによって与えられる高次元マルチスケールデータ同化の標準的な問題において, SF を劇的に上回っていることがわかった。解析の結果,CG-EnKFとNS-EnKFは強く非ガウス的な加法的雑音摂動を処理でき,通常は後者が前者を上回ることもわかった。

Ensemble Kalman Filtering (EnKF) is a popular technique for data assimilation, with wide-ranging applications. However, the vanilla EnKF framework is not well-defined when perturbations are nonlinear. We study two nonlinear extensions of the vanilla EnKF, dubbed the conditional-Gaussian EnKF (CG-EnKF) and the normal score EnKF (NS-EnKF), which sidestep assumptions of linearity by constructing the Kalman gain matrix with the 'conditional Gaussian' update formula in place of the traditional one. We then compare these models against a state-of-the-art deep learning based particle filter called the score filter (SF). This model uses an expensive score diffusion model for estimating densities and also requires a strong assumption on the perturbation operator for validity. In our comparison, we find that CG-EnKF and NS-EnKF dramatically outperform SF for a canonical problem in high-dimensional multiscale data assimilation given by the Lorenz-96 system. Our analysis also demonstrates that the CG-EnKF and NS-EnKF can handle highly non-Gaussian additive noise perturbations, with the latter typically outperforming the former.

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
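
As background for the entry above, here is a minimal sketch of the vanilla stochastic (perturbed-observation) EnKF analysis step that the CG-EnKF and NS-EnKF variants modify. It is not the paper's method: it assumes a scalar state observed directly (observation operator H = 1) with Gaussian observation noise, and the names `enkf_update`, `obs_var`, and `rng` are hypothetical.

```python
import random
from statistics import mean, variance

def enkf_update(ensemble, y, obs_var, rng=None):
    """One stochastic EnKF analysis step for a scalar state with H = 1."""
    rng = rng or random.Random(0)
    prior_var = variance(ensemble)            # sample variance of the forecast ensemble
    gain = prior_var / (prior_var + obs_var)  # scalar Kalman gain K = C / (C + R)
    # Each member assimilates its own noise-perturbed copy of the observation.
    return [x + gain * (y + rng.gauss(0.0, obs_var ** 0.5) - x) for x in ensemble]
```

The gain is built from the ensemble's sample variance rather than from a propagated covariance matrix, which is what distinguishes the ensemble filter from the classical Kalman filter. The update is linear in the innovation, which is the linearity assumption the abstract says the two extensions sidestep.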

# UU-Mamba: 心血管セグメンテーションのための不確かさを意識したU-Mamba

UU-Mamba: Uncertainty-aware U-Mamba for Cardiovascular Segmentation ( http://arxiv.org/abs/2409.14305v1 )

ライセンス: Link先を確認

Ting Yu Tsai, Li Lin, Shu Hu, Connie W. Tsao, Xin Li, Ming-Ching Chang, Hongtu Zhu, Xin Wang,

(参考訳) 心血管構造のセグメンテーションにおけるディープラーニングモデルの成功に基づいて、特に小さな注釈付きデータセットにおいて、一般化と堅牢性の改善に注目が集まっている。最近の進歩にもかかわらず、現在のアプローチは、大きなデータセットや狭い最適化技術に依存しているため、過学習や精度の制限といった課題に直面していることが多い。本稿では,これらの課題に心臓と血管の両方のセグメンテーションで対処するために設計された,U-Mambaアーキテクチャの拡張であるUU-Mambaモデルを紹介する。 Sharpness-Aware Minimization (SAM) を取り入れることで、モデルは損失ランドスケープにおけるより平坦な極小点を目指し、一般化を高める。さらに、領域ベース、分布ベース、画素ベースのコンポーネントを組み合わせた不確実性認識損失関数を提案し、局所的特徴とグローバル的特徴の両方をキャプチャすることでセグメンテーション精度を向上させる。 UU-Mambaモデルはすでに優れた性能を示しているが、その一般化とロバスト性を完全に評価するにはさらなるテストが必要である。 ImageCAS(冠状動脈)とAorta(大動脈枝とゾーン)のデータセットを新たに試行することで評価を拡大し、これまでの研究で用いたACDCデータセット(左および右心室)よりも複雑なセグメンテーション課題を提示し、モデルの適応性とレジリエンスを示す。 UU-Mamba は TransUNet, Swin-Unet, nnUNet, nnFormer などの先行モデルよりも優れた性能を示している。さらに,より広範な実験で示すように,モデルの堅牢性とセグメント化の精度をより包括的に評価する。

Building on the success of deep learning models in cardiovascular structure segmentation, increasing attention has been focused on improving generalization and robustness, particularly in small, annotated datasets. Despite recent advancements, current approaches often face challenges such as overfitting and accuracy limitations, largely due to their reliance on large datasets and narrow optimization techniques. This paper introduces the UU-Mamba model, an extension of the U-Mamba architecture, designed to address these challenges in both cardiac and vascular segmentation. By incorporating Sharpness-Aware Minimization (SAM), the model enhances generalization by targeting flatter minima in the loss landscape. Additionally, we propose an uncertainty-aware loss function that combines region-based, distribution-based, and pixel-based components to improve segmentation accuracy by capturing both local and global features. While the UU-Mamba model has already demonstrated great performance, further testing is required to fully assess its generalization and robustness. We expand our evaluation by conducting new trials on the ImageCAS (coronary artery) and Aorta (aortic branches and zones) datasets, which present more complex segmentation challenges than the ACDC dataset (left and right ventricles) used in our previous work, showcasing the model's adaptability and resilience. We confirm UU-Mamba's superior performance over leading models such as TransUNet, Swin-Unet, nnUNet, and nnFormer. Moreover, we provide a more comprehensive evaluation of the model's robustness and segmentation accuracy, as demonstrated by extensive experiments.

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
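
The entry above relies on Sharpness-Aware Minimization (SAM) to steer training toward flatter minima. The following minimal sketch shows one generic SAM update in plain Python, assuming plain gradient descent as the base optimizer and an L2 perturbation ball. It does not reproduce the UU-Mamba training setup, and `sam_step`, `grad_fn`, `lr`, and `rho` are illustrative names.

```python
from math import sqrt

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step on a list of parameters w."""
    g = grad_fn(w)
    norm = sqrt(sum(gi * gi for gi in g)) or 1.0  # avoid division by zero at a stationary point
    # Ascent step: move to the approximate worst-case point within an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient taken at the perturbed point to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]
```

Each step costs two gradient evaluations instead of one. In exchange, the update follows the gradient at the locally worst point of the neighborhood, which penalizes sharp minima.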

# LLM は One-Shot URL 分類器と Explainer である

LLMs are One-Shot URL Classifiers and Explainers ( http://arxiv.org/abs/2409.14306v1 )

ライセンス: Link先を確認

Fariza Rashid, Nishavi Ranaweera, Ben Doyle, Suranga Seneviratne,

(参考訳) 悪意のあるURL分類は、サイバーセキュリティの重要な側面である。既存の作業には、多数の機械学習とディープラーニングベースのURL分類モデルが含まれているが、その多くは、代表的なトレーニングデータセットの欠如に起因する一般化とドメイン適応の問題に悩まされている。さらに、これらのモデルは、与えられたURLの分類について自然言語による説明を提供できない。本研究では,この問題に対するLarge Language Models (LLM) の使用について検討し,その実例を示す。具体的には、所与のURLが良性であるかフィッシングであるかを予測するために、Chain-of-Thought(CoT)推論を用いるLLMベースのワンショット学習フレームワークを提案する。 3つのURLデータセットと5つの最先端LLMを使用してフレームワークを評価し、ワンショットのLLMプロンプトが実際に教師付きモデルに近いパフォーマンスを提供し、GPT 4-Turboが最高のモデルであり、Claude 3 Opusが続くことを示した。我々は, LLMの説明を定量的に分析し, LLMによる説明のほとんどは, 教師付き分類器のポストホックな説明と一致し, 高い可読性, 一貫性, 情報性を有することを示す。

Malicious URL classification represents a crucial aspect of cyber security. Although existing work comprises numerous machine learning and deep learning-based URL classification models, most suffer from generalisation and domain-adaptation issues arising from the lack of representative training datasets. Furthermore, these models fail to provide explanations for a given URL classification in natural human language. In this work, we investigate and demonstrate the use of Large Language Models (LLMs) to address this issue. Specifically, we propose an LLM-based one-shot learning framework that uses Chain-of-Thought (CoT) reasoning to predict whether a given URL is benign or phishing. We evaluate our framework using three URL datasets and five state-of-the-art LLMs and show that one-shot LLM prompting indeed provides performances close to supervised models, with GPT 4-Turbo being the best model, followed by Claude 3 Opus. We conduct a quantitative analysis of the LLM explanations and show that most of the explanations provided by LLMs align with the post-hoc explanations of the supervised classifiers, and the explanations have high readability, coherency, and informativeness.

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22

# Zeiger の NP 完全性と物理的ゼロ知識証明

NP-Completeness and Physical Zero-Knowledge Proofs for Zeiger ( http://arxiv.org/abs/2409.14308v1 )

ライセンス: Link先を確認

Suthee Ruangwises,

(参考訳) Zeiger は長方形の格子からなるペンシルパズルで、各セルには水平方向または垂直方向を指す矢印がある。一部のセルには正の整数も書かれている。このパズルの目的は、数字の書かれていないすべてのセルに正の整数を記入し、各セルの整数が、そのセルの矢印が指す方向に沿ったすべてのセルに現れる異なる整数の個数に等しくなるようにすることである。本稿では,与えられた Zeiger パズルの可解性判定が NP 完全であることを,not-all-equal positive 3SAT (NAE3SAT+) 問題からの還元によって証明する。また,Zeiger に対するカードベースの物理的ゼロ知識証明プロトコルを構成する。これにより,証明者は解を明かすことなく,パズルの解が存在することを検証者に物理的に示すことができる。

Zeiger is a pencil puzzle consisting of a rectangular grid, with each cell having an arrow pointing in a horizontal or vertical direction. Some cells also contain a positive integer. The objective of this puzzle is to fill a positive integer into every unnumbered cell such that the integer in each cell is equal to the number of different integers in all cells along the direction in which the arrow in that cell points. In this paper, we prove that deciding solvability of a given Zeiger puzzle is NP-complete via a reduction from the not-all-equal positive 3SAT (NAE3SAT+) problem. We also construct a card-based physical zero-knowledge proof protocol for Zeiger, which enables a prover to physically show a verifier the existence of the puzzle's solution without revealing it.

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
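
The Zeiger rule stated in the entry above can be checked mechanically, which is the easy half of an NP-completeness argument (a proposed solution is verifiable in polynomial time). The sketch below is an assumption-laden reading of the abstract, not code from the paper: arrows are encoded as the letters `U`, `D`, `L`, `R`, the cells counted are those strictly ahead of the arrow up to the grid edge (excluding the cell itself), and `is_valid_zeiger` is a hypothetical name. It checks only the counting rule on a fully filled grid, not agreement with pre-filled clues.

```python
# Unit steps (row delta, column delta) for the four arrow directions.
STEP = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def is_valid_zeiger(arrows, numbers):
    """Check a fully filled grid against the Zeiger rule from the abstract."""
    rows, cols = len(arrows), len(arrows[0])
    for r in range(rows):
        for c in range(cols):
            dr, dc = STEP[arrows[r][c]]
            seen = set()  # distinct integers in the cells the arrow points at
            rr, cc = r + dr, c + dc
            while 0 <= rr < rows and 0 <= cc < cols:
                seen.add(numbers[rr][cc])
                rr, cc = rr + dr, cc + dc
            # Every entry must be a positive integer equal to that count.
            if numbers[r][c] < 1 or numbers[r][c] != len(seen):
                return False
    return True
```

On the one-row grid with arrows `R R L`, both `2 1 2` and `1 1 1` pass the check, while `1 2 1` fails because the middle cell sees only one cell and therefore only one distinct integer.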

# Sketch-and-Solve:ランダム化数値線形代数を用いた過決定最小二乗の最適化

Sketch-and-Solve: Optimized Overdetermined Least-Squares Using Randomized Numerical Linear Algebra ( http://arxiv.org/abs/2409.14309v1 )

ライセンス: Link先を確認

Alex Lavaee,

(参考訳) スケッチ・アンド・ソルブ (Sketch-and-solve) は、スケッチ行列を用いて次元を削減することで、大規模計算問題に取り組むための強力なパラダイムである。本稿では, 機械学習や信号処理, 数値最適化など, 様々な領域で基礎となる過決定最小二乗問題を効率的に解くために, スケッチ・アンド・ソルブのアルゴリズムを適用することに焦点を当てる。本稿では、スケッチ・アンド・ソルブのパラダイムを概観し、密な変種と疎な変種を含む様々なスケッチ演算子を解析する。ランダム化数値線形代数の手法を用いて近似解を効率的に計算するSketch-and-Apply (SAA-SAS) アルゴリズムを提案する。大規模最小二乗問題に対する広範な実験により,提案手法は従来のLeast-Squares QR (LSQR) アルゴリズムを実行時間の点で大幅に上回りつつ,同等の精度を維持することを示す。本結果は,大規模数値線形代数問題を効率的に扱う上でのスケッチ・アンド・ソルブ法の可能性を強調した。

Sketch-and-solve is a powerful paradigm for tackling large-scale computational problems by reducing their dimensionality using sketching matrices. This paper focuses on applying sketch-and-solve algorithms to efficiently solve the overdetermined least squares problem, which is fundamental in various domains such as machine learning, signal processing, and numerical optimization. We provide a comprehensive overview of the sketch-and-solve paradigm and analyze different sketching operators, including dense and sparse variants. We introduce the Sketch-and-Apply (SAA-SAS) algorithm, which leverages randomized numerical linear algebra techniques to compute approximate solutions efficiently. Through extensive experiments on large-scale least squares problems, we demonstrate that our proposed approach significantly outperforms the traditional Least-Squares QR (LSQR) algorithm in terms of runtime while maintaining comparable accuracy. Our results highlight the potential of sketch-and-solve techniques in efficiently handling large-scale numerical linear algebra problems.

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22
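
To make the sketch-and-solve idea in the entry above concrete, here is a minimal standard-library sketch that compresses an overdetermined m x 2 least-squares problem with a dense Gaussian sketching matrix and then solves the small problem. It is not the paper's SAA-SAS algorithm. The function name `sketch_and_solve`, the restriction to two unknowns, and the use of 2 x 2 normal equations (rather than the QR factorization a production solver would prefer) are simplifying assumptions.

```python
import random

def sketch_and_solve(A, b, k, seed=0):
    """Approximate argmin_x ||Ax - b|| for an m x 2 matrix A via a k x m Gaussian sketch."""
    rng = random.Random(seed)
    m = len(A)
    # Dense Gaussian sketching matrix S (k x m), scaled so that E[S^T S] = I.
    S = [[rng.gauss(0.0, 1.0) / k ** 0.5 for _ in range(m)] for _ in range(k)]
    # Compress the problem: SA is k x 2 and Sb has length k, with k much smaller than m.
    SA = [[sum(S[i][r] * A[r][j] for r in range(m)) for j in range(2)] for i in range(k)]
    Sb = [sum(S[i][r] * b[r] for r in range(m)) for i in range(k)]
    # Solve the small problem through its 2 x 2 normal equations (Cramer's rule).
    g00 = sum(row[0] * row[0] for row in SA)
    g01 = sum(row[0] * row[1] for row in SA)
    g11 = sum(row[1] * row[1] for row in SA)
    h0 = sum(row[0] * v for row, v in zip(SA, Sb))
    h1 = sum(row[1] * v for row, v in zip(SA, Sb))
    det = g00 * g11 - g01 * g01
    return [(h0 * g11 - h1 * g01) / det, (g00 * h1 - g01 * h0) / det]
```

When `b` lies exactly in the column space of `A`, the sketched problem has the same solution as the original one. With noisy data the result is only an approximation whose quality improves as the sketch size `k` grows.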

# ヘラルド単一光子の全繊維源のフルキャラクタリゼーション

Full characterization of an all fiber source of heralded single photons ( http://arxiv.org/abs/2409.14310v1 )

ライセンス: Link先を確認

Yunxiao Zhang, Liang Cui, Xueshi Guo, Wen Zhao, Xiaoying Li, Z. Y. Ou,

(参考訳) 本研究では,市販の分散シフトファイバ中でのパルス励起自発4光波混合により生成される光子対に基づく,ヘラルド単一光子源を実証する。 1550nm の通信波長帯の単一光子源は、光子計数法とホモダイン検出法の両方で特徴付けられる。ヘラルディング効率とモード純度は光子計数により測定でき、真空寄与部はホモダイン検出により求めることができる。

We demonstrate a heralded single-photon source based on the photon pairs generated from pulse-pumped spontaneous four-wave mixing in a piece of commercially available dispersion-shifted fiber. The single-photon source in the 1550 nm telecom band is characterized with both the photon counting technique and the homodyne detection method. The heralding efficiency and mode purity can be measured by photon counting, while the vacuum contribution can be found by homodyne detection.

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22

# 不均衡画像分類のための異方性拡散確率モデル

Anisotropic Diffusion Probabilistic Model for Imbalanced Image Classification ( http://arxiv.org/abs/2409.14313v1 )

ライセンス: Link先を確認

Jingyu Kong, Yuan Guo, Yu Wang, Yuping Duan,

(参考訳) 現実世界のデータはしばしばロングテール分布を持ち、テールクラスのサンプルの不足はモデルの一般化能力を著しく制限する。 Denoising Diffusion Probabilistic Models (DDPM) は確率微分方程式理論に基づく生成モデルであり、画像分類タスクにおいて顕著な性能を示している。しかし、既存の拡散確率モデルは、テールクラスの分類において満足な性能を示さない。本研究では,不均衡な画像分類問題に対するAnisotropic Diffusion Probabilistic Model (ADPM)を提案する。我々は,データ分布を利用して,順過程における異なるクラスのサンプルの拡散速度を制御し,逆過程におけるデノイザの分類精度を効果的に向上させる。具体的には,不均衡な分類問題に対処するために,誤差解析理論に基づいて拡散過程における各カテゴリのノイズレベルを選択する理論的戦略を提示する。さらに,順過程に大域的および局所的な画像事前情報を組み込んで空間次元におけるモデルの識別能力を高めるとともに,逆過程に意味レベルの文脈情報を組み込んで,モデルの識別力と堅牢性を高める。 4つの医用ベンチマークデータセットにおける最先端手法との比較により,ロングテールデータの扱いにおける提案手法の有効性を検証した。その結果, 異方性拡散モデルにより, ヘッドクラスの精度を維持しつつ, 希少クラスの分類精度が著しく向上することが確認された。皮膚病変データセットであるPAD-UFESとHAM10000では,元の拡散確率モデルと比較してF1スコアがそれぞれ4%,3%改善した。

Real-world data often has a long-tailed distribution, where the scarcity of tail samples significantly limits the model's generalization ability. Denoising Diffusion Probabilistic Models (DDPM) are generative models based on stochastic differential equation theory and have demonstrated impressive performance in image classification tasks. However, existing diffusion probabilistic models do not perform satisfactorily in classifying tail classes. In this work, we propose the Anisotropic Diffusion Probabilistic Model (ADPM) for imbalanced image classification problems. We utilize the data distribution to control the diffusion speed of different class samples during the forward process, effectively improving the classification accuracy of the denoiser in the reverse process. Specifically, we provide a theoretical strategy for selecting noise levels for different categories in the diffusion process based on error analysis theory to address the imbalanced classification problem. Furthermore, we integrate global and local image priors in the forward process to enhance the model's discriminative ability in the spatial dimension, while incorporating semantic-level contextual information in the reverse process to boost the model's discriminative power and robustness. Through comparisons with state-of-the-art methods on four medical benchmark datasets, we validate the effectiveness of the proposed method in handling long-tail data. Our results confirm that the anisotropic diffusion model significantly improves the classification accuracy of rare classes while maintaining the accuracy of head classes. On the skin lesion datasets, PAD-UFES and HAM10000, the F1-scores of our method improved by 4% and 3%, respectively, compared to the original diffusion probabilistic model.

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22

# MVPGS: スパースな入力ビューからガウススプラッティングのためのマルチビュー事前情報を発掘する

MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views ( http://arxiv.org/abs/2409.14316v1 )

ライセンス: Link先を確認

Wangze Xu, Huachen Gao, Shihe Shen, Rui Peng, Jianbo Jiao, Ronggang Wang,

(参考訳) 近年,Neural Radiance Field (NeRF) の進歩により,3Dビジョン応用における重要な課題である少数ショットの新規視点合成 (Novel View Synthesis, NVS) が進展している。 NeRFの高密度な入力要求を減らそうとする試みは数多くあるが、それでも時間のかかるトレーニングとレンダリングのプロセスに悩まされている。最近では、3D Gaussian Splatting (3DGS) が、明示的な点ベース表現でリアルタイムな高品質なレンダリングを実現している。しかし、NeRFと同様に、制約の欠如のために訓練ビューに過剰適合する傾向がある。本稿では,3次元ガウススプラッティングに基づいてマルチビュー事前情報を発掘する少数ショットNVS法であるMVPGSを提案する。我々は3DGSの幾何学的初期化の質を高めるために,最近の学習ベースマルチビューステレオ(MVS)を活用している。過剰適合を緩和するため、計算された幾何学に基づいて、シーンに整合する追加の外観制約を与えるフォワードウォーピング手法を提案する。さらに、適切な最適化収束を促進するためにガウスパラメータに対するビュー一貫性のある幾何制約を導入し、補完として単眼深度正則化を利用する。実験により,提案手法はリアルタイムレンダリング速度で最先端の性能を実現することを示す。プロジェクトページ:https://zezeaaa.github.io/projects/MVPGS/

Recently, advances in Neural Radiance Fields (NeRF) have facilitated few-shot Novel View Synthesis (NVS), which is a significant challenge in 3D vision applications. Despite numerous attempts to reduce the dense input requirement in NeRF, it still suffers from time-consuming training and rendering processes. More recently, 3D Gaussian Splatting (3DGS) achieves real-time high-quality rendering with an explicit point-based representation. However, similar to NeRF, it tends to overfit the training views for lack of constraints. In this paper, we propose MVPGS, a few-shot NVS method that excavates the multi-view priors based on 3D Gaussian Splatting. We leverage the recent learning-based Multi-view Stereo (MVS) to enhance the quality of geometric initialization for 3DGS. To mitigate overfitting, we propose a forward-warping method for additional appearance constraints conforming to scenes based on the computed geometry. Furthermore, we introduce a view-consistent geometry constraint for Gaussian parameters to facilitate proper optimization convergence and utilize a monocular depth regularization as compensation. Experiments show that the proposed method achieves state-of-the-art performance with real-time rendering speed. Project page: https://zezeaaa.github.io/projects/MVPGS/

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22

# テキストベースのビデオ質問応答のためのシーンテキストグラウンディング

Scene-Text Grounding for Text-Based Video Question Answering ( http://arxiv.org/abs/2409.14319v1 )

ライセンス: Link先を確認

Sheng Zhou, Junbin Xiao, Xun Yang, Peipei Song, Dan Guo, Angela Yao, Meng Wang, Tat-Seng Chua,

(参考訳) テキストベースのビデオ質問応答(TextVideoQA)の既存の取り組みは、不透明な意思決定とシーンテキスト認識への過度な依存で批判されている。本稿では,質問に答えるとともに関連するシーンテキスト領域を時空間的にローカライズすることをモデルに課すことで,QAをシーンテキスト認識から切り離し,解釈可能なQAに向けた研究を促進する Grounded TextVideoQA を研究することを提案する。このタスクには3つの意義がある。第1に、回答の予測において、他のショートカットではなくシーンテキストの証拠に基づくことを促す。第2に、シーンテキスト領域を直接視覚的回答として受け入れるため、厳格な文字列マッチングによる回答評価が有効に機能しないという問題を回避できる。第3に、ビデオQAとシーンテキスト認識にそれぞれ内在する課題を分離する。これにより、予測失敗の根本原因(例えば、QAの誤りか、シーンテキスト認識の誤りか)の診断が可能になる。 Grounded TextVideoQA を実現するために、弱教師付きシーンテキストグラウンディングと Grounded TextVideoQA のための、分離された時間から空間への対照学習戦略を特徴とするT2S-QAモデルを提案する。評価を容易にするために,2Kの質問と729の動画に関連する2.2Kの時間セグメント内に52Kのシーンテキスト境界ボックスを持つ新しいデータセットViTXT-GQAを構築した。ViTXT-GQAを用いて広範な実験を行い,Grounded TextVideoQA における既存手法の深刻な限界を実証する。 T2S-QAは優れた結果を達成したが、人間の性能との大きなギャップは改善の余地が十分にあることを示している。オラクルのシーンテキスト入力を用いたさらなる分析は、シーンテキスト認識が主要な課題であることを示唆している。 Grounded TextVideoQAの研究を進めるために、我々のデータセットとコードは https://github.com/zhousheng97/ViTXT-GQA.git で公開している。

Existing efforts in text-based video question answering (TextVideoQA) are criticized for their opaque decision-making and heavy reliance on scene-text recognition. In this paper, we propose to study Grounded TextVideoQA by forcing models to answer questions and spatio-temporally localize the relevant scene-text regions, thus decoupling QA from scene-text recognition and promoting research towards interpretable QA. The task has three-fold significance. First, it encourages scene-text evidence versus other short-cuts for answer predictions. Second, it directly accepts scene-text regions as visual answers, thus circumventing the problem of ineffective answer evaluation by stringent string matching. Third, it isolates the challenges inherent in VideoQA and scene-text recognition. This enables the diagnosis of the root causes for failure predictions (e.g., wrong QA or wrong scene-text recognition). To achieve Grounded TextVideoQA, we propose the T2S-QA model that highlights a disentangled temporal-to-spatial contrastive learning strategy for weakly-supervised scene-text grounding and grounded TextVideoQA. To facilitate evaluation, we construct a new dataset ViTXT-GQA which features 52K scene-text bounding boxes within 2.2K temporal segments related to 2K questions and 729 videos. With ViTXT-GQA, we perform extensive experiments and demonstrate the severe limitations of existing techniques in Grounded TextVideoQA. While T2S-QA achieves superior results, the large performance gap with human performance leaves ample space for improvement. Our further analysis of oracle scene-text inputs posits that the major challenge is scene-text recognition. To advance the research of Grounded TextVideoQA, our dataset and code are at https://github.com/zhousheng97/ViTXT-GQA.git

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22

# 映画のあらすじに含まれるトロープを用いた大規模言語モデルの物語的推論の限界の解明

Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses ( http://arxiv.org/abs/2409.14324v1 )

ライセンス: Link先を確認

Hung-Ting Su, Ya-Ching Hsu, Xudong Lin, Xiang-Qian Shi, Yulei Niu, Han-Yuan Hsu, Hung-yi Lee, Winston H. Hsu,

(参考訳) Chain-of-Thought (CoT) プロンプトを用いた大規模言語モデル (LLM) は、数学、常識、論理などの事実に基づくコンテンツにおいて、顕著な多段階推論能力を示している。しかし、より高い抽象化能力を必要とする物語的推論における性能は、まだ解明されていない。本研究は,映画のあらすじに含まれるトロープを利用して,最先端のLLMの抽象的推論能力を評価し,その性能の低さを明らかにする。本稿では,これらの課題に対処するトロープ単位のクエリ手法を導入し,F1スコアを11.8ポイント向上させる。さらに, 先行研究は, CoTが多段階推論を強化することを示唆する一方で, 本研究は, CoTが物語コンテンツにおいて幻覚を引き起こし, GPT-4の性能を低下させうることを示す。また, 明示的なトロープを含まない映画のあらすじにトロープ関連のテキストトークンを埋め込む敵対的インジェクション手法を導入し, そのようなインジェクションに対するCoTの感度の高さを明らかにした。我々の総合的な分析は将来の研究の方向性についての洞察を提供する。

Large language models (LLMs) equipped with chain-of-thoughts (CoT) prompting have shown significant multi-step reasoning capabilities in factual content like mathematics, commonsense, and logic. However, their performance in narrative reasoning, which demands greater abstraction capabilities, remains unexplored. This study utilizes tropes in movie synopses to assess the abstract reasoning abilities of state-of-the-art LLMs and uncovers their low performance. We introduce a trope-wise querying approach to address these challenges and boost the F1 score by 11.8 points. Moreover, while prior studies suggest that CoT enhances multi-step reasoning, this study shows CoT can cause hallucinations in narrative content, reducing GPT-4's performance. We also introduce an Adversarial Injection method to embed trope-related text tokens into movie synopses without explicit tropes, revealing CoT's heightened sensitivity to such injections. Our comprehensive analysis provides insights for future research directions.

翻訳日:2024-11-06 23:15:03 公開日:2024-09-22

# ISC4DGF: LLM駆動初期種子コーパス生成による直接グレーボックスファジリングの強化

ISC4DGF: Enhancing Directed Grey-box Fuzzing with LLM-Driven Initial Seed Corpus Generation ( http://arxiv.org/abs/2409.14329v1 )

ライセンス: Link先を確認

Yijiang Xu, Hongrui Jia, Liguo Chen, Xin Wang, Zhengran Zeng, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang, Zhonghai Wu,

(参考訳) ファズテストはソフトウェア脆弱性の特定に不可欠であり、AFLやAngoraのようなカバレッジガイド付きグレーボックスファズーは広範な検出に優れています。しかし、ターゲット検出の必要性が高まるにつれて、特定の脆弱性に焦点を当てたディレクトグレーボックスファジング(DGF)が不可欠になっている。最初のシードコーパスは、ファザーが出発点として使用する、慎重に選択された入力サンプルで構成され、ファザーが探索する経路を決定するのに基本的なものである。十分に設計されたシードコーパスは、ファッザをより効果的にコードの重要な領域へ誘導し、ファッザリングプロセスの効率と成功を改善することができる。その重要性にもかかわらず、多くの研究は、初期種子コーパスの最適化に注意を払わずに指導機構の精錬に集中している。本稿では,Large Language Models (LLMs) を用いた DGF の初期シードコーパス生成手法である ISC4DGF を紹介する。 LLMの深いソフトウェア理解と洗練されたユーザー入力を活用することで、ISC4DGFは特定の脆弱性を効率的に引き起こす正確なシードコーパスを生成する。 AFLに実装され、AFLGo、FairFuzz、Entropicといった最先端のファジターに対してMagmaベンチマークを用いてテストを行い、ISC4DGFは35.63倍のスピードアップと616.10倍の目標到達を達成した。さらに、ISC4DGFは、より効果的にターゲットの脆弱性を検知し、コードカバレッジを減らし操作しながら効率を向上させることに重点を置いている。

Fuzz testing is crucial for identifying software vulnerabilities, with coverage-guided grey-box fuzzers like AFL and Angora excelling in broad detection. However, as the need for targeted detection grows, directed grey-box fuzzing (DGF) has become essential, focusing on specific vulnerabilities. The initial seed corpus, which consists of carefully selected input samples that the fuzzer uses as a starting point, is fundamental in determining the paths that the fuzzer explores. A well-designed seed corpus can guide the fuzzer more effectively towards critical areas of the code, improving the efficiency and success of the fuzzing process. Despite its importance, many works concentrate on refining guidance mechanisms while paying less attention to optimizing the initial seed corpus. In this paper, we introduce ISC4DGF, a novel approach to generating an optimized initial seed corpus for DGF using Large Language Models (LLMs). By leveraging LLMs' deep software understanding and refined user inputs, ISC4DGF creates a precise seed corpus that efficiently triggers specific vulnerabilities. Implemented on AFL and tested against state-of-the-art fuzzers like AFLGo, FairFuzz, and Entropic using the Magma benchmark, ISC4DGF achieved a 35.63x speedup and 616.10x fewer target reaches. Moreover, ISC4DGF focused on more effectively detecting target vulnerabilities, enhancing efficiency while operating with reduced code coverage.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22

# 粒度を考える:多粒度曲線による画像超解像の動的量子化

Thinking in Granularity: Dynamic Quantization for Image Super-Resolution by Intriguing Multi-Granularity Clues ( http://arxiv.org/abs/2409.14330v1 )

ライセンス: Link先を確認

Mingshen Wang, Zhao Zhang, Feng Li, Ke Xu, Kang Miao, Meng Wang,

(参考訳) ダイナミック量子化は、画像超解像(SR)において、競争性能を維持しながら、重いSRモデルのモバイルデバイスへの可能性を拡張することで注目を集めている。既存の手法では、各レイヤとパッチにビットを適応的に割り当て、各ローカル領域のレイヤ間構成を探索する。この利点にもかかわらず、SRの精度と量子化効率のトレードオフにはまだ不足している。これとは別に、各層に対して個別に量子化レベルを適用することは、元の層間関係を乱す可能性があるため、量子化モデルの表現能力は低下する。本研究では,画像の固有特性を生かしたグラニュラーDQを提案する。グラニュラーDQは、局所パッチの多粒度解析を行い、その情報密度をさらに探求し、固有のパッチワイドおよび層不変な動的量子化パラダイムを実現する。具体的には、Granular-DQは、異なるパッチの粗い粒度の表現を識別する粒度ビットコントローラ(GBC)を開発し、画像全体への比例的な寄与を一致させて適切なビット幅割り当てを決定する。本研究では,ビット幅と情報密度の関係を考察し,高ビットパッチのよりきめ細かな動的ビット適応を実現するエントロピー・ト・ビット(E2B)機構を考案する。広範囲な実験により、様々なSRモデルにおける最近の最先端手法よりもグラニュラーDQの優位性と一般化能力が検証された。コードは \url{https://github.com/MmmingS/Granular-DQ.git} で入手できる。

Dynamic quantization has attracted rising attention in image super-resolution (SR) as it expands the potential of heavy SR models onto mobile devices while preserving competitive performance. Existing methods explore layer-to-bit configuration upon varying local regions, adaptively allocating the bit to each layer and patch. Despite the benefits, they still fall short in the trade-off of SR accuracy and quantization efficiency. Apart from this, adapting the quantization level for each layer individually can disturb the original inter-layer relationships, thus diminishing the representation capability of quantized models. In this work, we propose Granular-DQ, which capitalizes on the intrinsic characteristics of images while dispensing with the previous consideration for layer sensitivity in quantization. Granular-DQ conducts a multi-granularity analysis of local patches with further exploration of their information densities, achieving a distinctive patch-wise and layer-invariant dynamic quantization paradigm. Specifically, Granular-DQ initiates by developing a granularity-bit controller (GBC) to apprehend the coarse-to-fine granular representations of different patches, matching their proportional contribution to the entire image to determine the proper bit-width allocation. On this premise, we investigate the relation between bit-width and information density, devising an entropy-to-bit (E2B) mechanism that enables further fine-grained dynamic bit adaption of high-bit patches. Extensive experiments validate the superiority and generalization ability of Granular-DQ over recent state-of-the-art methods on various SR models. Code will be available at \url{https://github.com/MmmingS/Granular-DQ.git}.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
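
The entropy-to-bit (E2B) idea in the Granular-DQ abstract above, mapping a patch's information density to a bit-width, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function names, the use of a plain intensity histogram as the information measure, and the thresholds and candidate bit-widths are hypothetical and are not taken from the paper or its released code.

```python
import math
from bisect import bisect_right
from collections import Counter


def patch_entropy(pixels):
    """Shannon entropy (in bits) of the intensity values in one flattened patch."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_to_bit(entropy, thresholds=(2.0, 5.0), bit_widths=(4, 6, 8)):
    """Pick a bit-width for a patch from its entropy.

    The thresholds and bit-widths are illustrative assumptions: patches with
    higher entropy (more information) are given more bits.
    """
    return bit_widths[bisect_right(thresholds, entropy)]


flat_patch = [0] * 64                 # a constant patch carries no information
textured_patch = list(range(256))     # every 8-bit level appears exactly once
print(entropy_to_bit(patch_entropy(flat_patch)))      # low-entropy patch
print(entropy_to_bit(patch_entropy(textured_patch)))  # high-entropy patch
```

A lookup over sorted thresholds keeps the mapping monotone, which matches the abstract's intuition that information-dense patches deserve finer quantization.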

# PISR: ポーラリメトリック・ニューラルインシシット表面再構成

PISR: Polarimetric Neural Implicit Surface Reconstruction for Textureless and Specular Objects ( http://arxiv.org/abs/2409.14331v1 )

ライセンス: Link先を確認

Guangcheng Chen, Yicheng He, Li He, Hong Zhang,

(参考訳) 神経性暗黙表面再構成は近年顕著な進歩を遂げている。複雑な放射率モデリングを頼りにしているにもかかわらず、最先端の手法はテクスチャレスやスペキュラーな表面といまだに苦労している。 RGB画像と異なり、偏光画像は表面正規の方位角に直接的な制約を与えることができる。本稿では,幾何学的に正確な偏光損失を利用して形状を洗練させる新しい手法であるPISRを提案する。さらに、PISRは画像空間の表面の正規化を円滑にし、厳密な形状の歪みを排除し、ハッシュグリッドベースのニューラルサイン距離関数を活用して再構成を加速する。実験の結果、PISRは0.5mmのL1チャンファー距離と1mmのFスコアが99.5%であり、従来の偏光面再構成法よりも4～30倍高速であることがわかった。

Neural implicit surface reconstruction has achieved remarkable progress recently. Despite resorting to complex radiance modeling, state-of-the-art methods still struggle with textureless and specular surfaces. Different from RGB images, polarization images can provide direct constraints on the azimuth angles of the surface normals. In this paper, we present PISR, a novel method that utilizes a geometrically accurate polarimetric loss to refine shape independently of appearance. In addition, PISR smooths surface normals in image space to eliminate severe shape distortions and leverages the hash-grid-based neural signed distance function to accelerate the reconstruction. Experimental results demonstrate that PISR achieves higher accuracy and robustness, with an L1 Chamfer distance of 0.5 mm and an F-score of 99.5% at 1 mm, while converging 4~30 times faster than previous polarimetric surface reconstruction methods.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
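
The PISR abstract above reports an L1 Chamfer distance and an F-score at a 1 mm threshold. As a rough sketch of what such metrics measure, the code below computes both between two small point sets. The exact convention used in the paper (averaging, sampling density, units) is not stated in the abstract, so the symmetric-mean Chamfer definition and the function names here are assumptions.

```python
import math


def nearest_dists(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_l1(a, b):
    """Symmetric mean nearest-neighbour distance (one common Chamfer convention)."""
    d_ab = nearest_dists(a, b)
    d_ba = nearest_dists(b, a)
    return 0.5 * (sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba))


def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = sum(d <= tau for d in nearest_dists(pred, gt)) / len(pred)
    recall = sum(d <= tau for d in nearest_dists(gt, pred)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy point sets in millimetres (made-up values for illustration only).
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstruction = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.0)]
print(chamfer_l1(reconstruction, ground_truth))
print(f_score(reconstruction, ground_truth, tau=1.0))
```

The brute-force nearest-neighbour search is quadratic in the number of points; a real evaluation on dense meshes would normally use a spatial index instead.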

# 量子カオスにおける情報取得、スクランブル、およびエラーに対する感度

Information acquisition, scrambling, and sensitivity to errors in quantum chaos ( http://arxiv.org/abs/2409.14332v1 )

ライセンス: Link先を確認

Sreeram PG, Abinash Sahu, Naga Dileep Varikuti, Bishal Kumar Das, Sourav Manna, Vaibhav Madhok,

(参考訳) カオスのシグナチャは、古典的なものがカオスである量子系を研究することによって理解することができる。しかし、可積分性、非可積分性、カオスの概念は古典的な類似を持たないシステムにまで拡張される。ここでは、秩序からカオスへの古典的なルートを最初にレビューする。自然は基本的に量子であるため、量子領域におけるカオスがどのように現れるかについて議論する。半古典的手法を簡潔に記述し、量子情報処理におけるカオスの結果について議論する。我々は、時間外順序相関器(OTOC)、コルモゴロフ-シナイ(KS)エントロピーと誤りに対する感度によって定量化されたリアプノフ指数の量子バージョンをレビューする。次に、量子トモグラフィーを用いた量子カオスのシグネチャの研究をレビューする。古典的には、ダイナミクスを正確に知っていれば、軌道の粗い追跡を一定に保ちながら、初期状態に関する指数関数的にきめ細かな情報を得る。量子環境では、測定記録を固定信号対雑音で追跡すると、初期状態に関する情報が増大する。この過程で我々は、量子状態再構成を伴うクリロフ部分空間に広がる作用素の新しい量子化を与えた。これらのシグネチャの研究は理論的な関心だけでなく、実際的な重要性も持っている。

Signatures of chaos can be understood by studying quantum systems whose classical counterpart is chaotic. However, the concepts of integrability, non-integrability and chaos extend to systems without a classical analogue. Here, we first review the classical route from order into chaos. Since nature is fundamentally quantum, we discuss how chaos manifests in the quantum domain. We briefly describe semi-classical methods, and discuss the consequences of chaos in quantum information processing. We review the quantum version of Lyapunov exponents, as quantified by the out-of-time ordered correlators (OTOC), Kolmogorov-Sinai (KS) entropy and sensitivity to errors. We then review the study of signatures of quantum chaos using quantum tomography. Classically, if we know the dynamics exactly, as we maintain a constant coarse-grained tracking of the trajectory, we gain exponentially fine-grained information about the initial condition. In the quantum setting, as we track the measurement record with fixed signal-to-noise, we gain increasing information about the initial condition. In the process, we have given a new quantification of operator spreading in Krylov subspaces with quantum state reconstruction. The study of these signatures is not only of theoretical interest but also of practical importance.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
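
For readers unfamiliar with the quantities named in the quantum chaos abstract above, the textbook definitions of the classical Lyapunov exponent and the out-of-time-ordered correlator (OTOC) can be written as follows. These are the standard forms, given here as background; the symbols $W$, $V$, $\lambda$ and $\lambda_Q$ are notation chosen for this note, and the review itself may use different conventions or normalizations.

```latex
% Classical sensitivity to initial conditions: nearby trajectories
% separate exponentially at a rate set by the Lyapunov exponent \lambda.
\[
  \lvert \delta x(t) \rvert \approx \lvert \delta x(0) \rvert \, e^{\lambda t}
\]

% Quantum analogue: the squared commutator of two initially commuting
% operators, with W(t) = U^\dagger(t)\, W\, U(t) evolved in the Heisenberg picture.
\[
  C(t) = \bigl\langle [W(t), V(0)]^{\dagger} \, [W(t), V(0)] \bigr\rangle
\]

% For systems with a chaotic semiclassical limit, C(t) typically grows
% exponentially at early times, which defines a quantum Lyapunov rate \lambda_Q.
\[
  C(t) \sim e^{2 \lambda_Q t}
\]
```

The exponential growth holds only in an intermediate time window; at late times $C(t)$ saturates because the Hilbert space is finite.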

# MQM-APE:LLM翻訳評価器における自動後編集による高品質エラーアノテーション予測

MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators ( http://arxiv.org/abs/2409.14335v1 )

ライセンス: Link先を確認

Qingyu Lu, Liang Ding, Kanjian Zhang, Jinxia Zhang, Dacheng Tao,

(参考訳) 大規模言語モデル(LLM)は、機械翻訳(MT)の品質評価の裁判官として大きな可能性を示し、スコアときめ細かいフィードバックを提供する。 GEMBA-MQMのような手法は、基準のない評価においてSOTAの性能を示すが、予測誤差は人間によって注釈付けされたものとうまく一致せず、フィードバック信号としての解釈可能性を制限する。 LLM評価器によって予測されるエラーアノテーションの品質を高めるために、各エラーに基づいて原文の翻訳を自動ポスト編集(APE)することで非インパクトエラーをフィルタリングし、品質改善に寄与するエラーのみを残すというアイデアに基づいて、普遍的でトレーニング不要なフレームワークである$\textbf{MQM-APE}$を導入する。具体的には LLM が機能するように促します 1) $\textit{evaluator}$ エラーアノテーションを提供する。 2) $\textit{post-editor}$ エラーが品質改善や品質改善に影響を及ぼすかどうかを決定する。 3) $\textit{pairwise quality verifier}$ as the error filter。 GEMBA-MQMに対する誤りの信頼性と品質は,高リソース言語と低リソース言語の両方において8つのLLMにわたって一貫して改善されている。 MQM-APEは、訓練されたアプローチと直交し、T Towerのような翻訳固有の評価器を補完し、その適用性を強調している。さらに,各モジュールの有効性を検証し,評価器の設計とLLMの選択に関する貴重な知見を提供する。コードはコミュニティを促進するためにリリースされます。

Large Language Models (LLMs) have shown significant potential as judges for Machine Translation (MT) quality assessment, providing both scores and fine-grained feedback. Although approaches such as GEMBA-MQM have shown SOTA performance on reference-free evaluation, the predicted errors do not align well with those annotated by humans, limiting their interpretability as feedback signals. To enhance the quality of error annotations predicted by LLM evaluators, we introduce a universal and training-free framework, MQM-APE, based on the idea of filtering out non-impactful errors by Automatically Post-Editing (APE) the original translation based on each error, leaving only those errors that contribute to quality improvement. Specifically, we prompt the LLM to act as 1) an evaluator to provide error annotations, 2) a post-editor to determine whether errors impact quality improvement, and 3) a pairwise quality verifier as the error filter. Experiments show that our approach consistently improves both the reliability and quality of error spans against GEMBA-MQM, across eight LLMs in both high- and low-resource languages. Orthogonal to trained approaches, MQM-APE complements translation-specific evaluators such as Tower, highlighting its broad applicability. Further analysis confirms the effectiveness of each module and offers valuable insights into evaluator design and LLM selection. The code will be released to facilitate the community.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
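
The three roles described in the MQM-APE abstract above (evaluator, post-editor, pairwise quality verifier) suggest a simple filtering loop. The sketch below is one possible reading of that loop, not the authors' implementation: the function `mqm_ape_filter` and its three callable parameters are hypothetical stand-ins for LLM prompts, and the toy stubs exist only to make the example run.

```python
from typing import Callable, List


def mqm_ape_filter(
    source: str,
    translation: str,
    evaluate: Callable[[str, str], List[str]],
    post_edit: Callable[[str, str, str], str],
    is_better: Callable[[str, str, str], bool],
) -> List[str]:
    """Keep only the predicted errors whose correction improves the translation.

    evaluate  : returns error annotations for (source, translation)
    post_edit : returns a translation edited to fix one given error
    is_better : pairwise check, True if the first candidate beats the second
    """
    kept = []
    for error in evaluate(source, translation):
        edited = post_edit(source, translation, error)
        if is_better(source, edited, translation):
            kept.append(error)
    return kept


# Toy stand-ins for the three LLM roles (purely illustrative).
def toy_evaluate(src, hyp):
    return ["mistranslation: 'cat'", "style: comma"]


def toy_post_edit(src, hyp, err):
    return hyp.replace("cat", "dog") if "cat" in err else hyp


def toy_is_better(src, new, old):
    return new != old and "dog" in new


print(mqm_ape_filter("Der Hund schläft.", "The cat sleeps.",
                     toy_evaluate, toy_post_edit, toy_is_better))
```

Passing the roles in as callables keeps the filter independent of any particular model or prompt, which fits the abstract's description of the framework as training-free.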

# デュアルビジュアルテキストアライメントを用いたゼロショット骨格に基づく行動認識

Zero-Shot Skeleton-based Action Recognition with Dual Visual-Text Alignment ( http://arxiv.org/abs/2409.14336v1 )

ライセンス: Link先を確認

Jidong Kuang, Hongsong Wang, Chaolei Han, Jie Gui,

(参考訳) ゼロショットアクション認識(ゼロショットアクション認識)は、アクション認識におけるスケーラビリティと一般化の問題に対処し、新しいアクションや見えないアクションに動的に適応できるようにする。ゼロショットアクション認識の鍵は、視覚的特徴とアクションカテゴリを表す意味ベクトルの整合にある。既存のほとんどの手法は、視覚的特徴を直接テキストカテゴリのセマンティック空間に投影するか、2つのモード間の共有埋め込み空間を学習する。しかし、直接投影は2つのモダリティを正確に整合させることはできず、視覚的表現とテキスト表現の間の堅牢で差別的な埋め込み空間を学習することはしばしば困難である。これらの問題に対処するために、骨格に基づくゼロショット動作認識のためのデュアルビジュアルテキストアライメント(DVTA)を導入する。 DVTAは2つのアライメントモジュール、DA(Direct Alignment)とAugmented Alignment(Augmented Alignment)で構成され、SDE(Semantic Description Enhancement)が設計されている。 DAモジュールは、特別に設計された視覚プロジェクタを通して、骨格の特徴を意味空間にマッピングし、SDEは、スケルトンとテキストの接続を強化するために、相互アテンションに基づいて、モダリティ間のギャップを減らす。 AAモジュールは、深いメートル法学習を利用して埋め込み空間の学習を強化し、骨格とテキストの類似性を学ぶ。提案手法は、一般的なゼロショットスケルトンに基づく動作認識ベンチマークにおいて、最先端のパフォーマンスを実現する。

Zero-shot action recognition, which addresses the issue of scalability and generalization in action recognition and allows the models to adapt to new and unseen actions dynamically, is an important research topic in computer vision communities. The key to zero-shot action recognition lies in aligning visual features with semantic vectors representing action categories. Most existing methods either directly project visual features onto the semantic space of text category or learn a shared embedding space between the two modalities. However, a direct projection cannot accurately align the two modalities, and learning a robust and discriminative embedding space between visual and text representations is often difficult. To address these issues, we introduce Dual Visual-Text Alignment (DVTA) for skeleton-based zero-shot action recognition. The DVTA consists of two alignment modules, Direct Alignment (DA) and Augmented Alignment (AA), along with a designed Semantic Description Enhancement (SDE). The DA module maps the skeleton features to the semantic space through a specially designed visual projector, followed by the SDE, which is based on cross-attention to enhance the connection between skeleton and text, thereby reducing the gap between modalities. The AA module further strengthens the learning of the embedding space by utilizing deep metric learning to learn the similarity between skeleton and text. Our approach achieves state-of-the-art performance on several popular zero-shot skeleton-based action recognition benchmarks.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
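
The core inference step implied by the DVTA abstract above, matching a projected skeleton feature against text embeddings of unseen action classes, can be sketched as nearest-neighbour search under cosine similarity. This is a generic zero-shot recipe shown for illustration; the function names and the two-dimensional toy embeddings are assumptions, and the paper's projector, SDE and metric-learning modules are not modelled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(skeleton_embedding, class_text_embeddings):
    """Return the class whose text embedding is most similar to the skeleton embedding."""
    return max(
        class_text_embeddings,
        key=lambda name: cosine(skeleton_embedding, class_text_embeddings[name]),
    )


# Made-up embeddings, assumed to already live in a shared semantic space.
unseen_classes = {"wave": (1.0, 0.0), "kick": (0.0, 1.0)}
print(zero_shot_predict((0.9, 0.2), unseen_classes))
```

Because prediction only needs text embeddings of the class names, new action categories can be added without retraining the skeleton encoder, which is the point of the zero-shot setting.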

# セルフ・スーパービジョンオーディオ・ビジュアル・サウンドスケープ・スティライゼーション

Self-Supervised Audio-Visual Soundscape Stylization ( http://arxiv.org/abs/2409.14340v1 )

ライセンス: Link先を確認

Tingle Li, Renhao Wang, Po-Yao Huang, Andrew Owens, Gopala Anumanchipalli,

(参考訳) 音声はシーンに関する情報を多く伝達し、残響から追加の環境音まで様々な効果をもたらす。本稿では、そのシーンから録音された音声-視覚条件の例から、入力音声を異なるシーンで録音されたかのように操作する。本モデルは,自然映像が繰り返し発生する音のイベントやテクスチャを含むという事実を活かして,自己監督を通じて学習する。ビデオから音声クリップを抽出し、音声強調を行う。次に、ビデオ内の他の場所から撮影した別の音声映像クリップを条件付きヒントとして、潜時拡散モデルを訓練し、元の音声を復元する。このプロセスを通じて、モデルは条件付きサンプルの音響特性を入力音声に転送することを学ぶ。提案手法は,未ラベル・イン・ザ・ワイルドビデオによるトレーニングが成功し,付加的な視覚信号による予測能力の向上が期待できることを示す。ビデオの結果については、プロジェクトのWebページをご覧ください。

Speech sounds convey a great deal of information about the scenes, resulting in a variety of effects ranging from reverberation to additional ambient sounds. In this paper, we manipulate input speech to sound as though it was recorded within a different scene, given an audio-visual conditional example recorded from that scene. Our model learns through self-supervision, taking advantage of the fact that natural video contains recurring sound events and textures. We extract an audio clip from a video and apply speech enhancement. We then train a latent diffusion model to recover the original speech, using another audio-visual clip taken from elsewhere in the video as a conditional hint. Through this process, the model learns to transfer the conditional example's sound properties to the input speech. We show that our model can be successfully trained using unlabeled, in-the-wild videos, and that an additional visual signal can improve its sound prediction abilities. Please see our project webpage for video results: https://tinglok.netlify.app/files/avsoundscape/

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22

# メモリマッチングは不十分:ビデオオブジェクトセグメンテーションのためのメモリマッチングとデコーディングを共同で改善

Memory Matching is not Enough: Jointly Improving Memory Matching and Decoding for Video Object Segmentation ( http://arxiv.org/abs/2409.14343v1 )

ライセンス: Link先を確認

Jintu Zheng, Yun Liang, Yuqing Zhang, Wanchao Su,

(参考訳) メモリベースビデオオブジェクトセグメンテーション手法は、メモリバンクを確立することで、時間空間と空間空間の長い複数のオブジェクトをモデル化する。しかし、彼らは偽のマッチングを克服するのに苦労し、重要な情報を失う傾向にあり、異なるオブジェクト間で混乱する。本稿では、マッチングとデコーディングの段階を協調的に改善し、偽マッチング問題を緩和する効果的な手法を提案する。メモリマッチングの段階では、短期記憶のわずかな誤差を抑えるコスト認識機構と、広範囲のオブジェクトスケールのマッチング空間を確立する長期記憶用シャッタート・クロススケールマッチングを提案する。読み出し復号の段階では、マッチング段階で欠落している重要な情報を回復することを目的とした補償機構を実装した。 DAVIS 2016&2017 Val (92.4%&88.1%) と DAVIS 2017 Test (83.9%) は、YouTubeVOS 2018&2019 Valで84.8%&84.6%を達成している。

Memory-based video object segmentation methods model multiple objects over long temporal-spatial spans by establishing a memory bank, and achieve remarkable performance. However, they struggle to overcome false matching and are prone to losing critical information, resulting in confusion among different objects. In this paper, we propose an effective approach that jointly improves the matching and decoding stages to alleviate the false matching issue. For the memory matching stage, we present a cost-aware mechanism that suppresses slight errors in short-term memory, and a shunted cross-scale matching for long-term memory that establishes wide-field matching spaces for various object scales. For the readout decoding stage, we implement a compensatory mechanism that aims to recover essential information missed at the matching stage. Our approach achieves outstanding performance on several popular benchmarks (i.e., DAVIS 2016&2017 Val (92.4%&88.1%) and DAVIS 2017 Test (83.9%)), and achieves 84.8%&84.6% on YouTubeVOS 2018&2019 Val.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22

# 絶対分離状態とPT状態の集合の極点について

On the extreme points of sets of absolutely separable and PPT states ( http://arxiv.org/abs/2409.14347v1 )

ライセンス: Link先を確認

Zhiwei Song, Lin Chen,

(参考訳) 絶対分離可能状態 (resp. PPT) は、任意の大域的ユニタリ演算の下では分離可能状態 (resp. positive partial transpose) のままである。絶対分離状態と PPT 状態の集合における極点のコンパクトな形式を 2 量子および qubit-qudit 系で提示する。結果は、各極点が少なくとも3つの異なる固有値を持つことを示している。可解線型方程式として表される2量子および四量子系における絶対PPT状態の集合の極点を決定する必要十分条件を確立する。また、qutrit-qudit系における任意の極点が、少なくとも7つの異なる固有値を持つことを示す。非絶対分離性(nonabsolute separability)の概念を導入する。状態が他の状態と混同する必要がある最小の量を定量化し、全体状態が完全に分離可能である。このロバスト性は、単項変換、単調性、凸性の下での正則性、不変性、不変性を満たすことを示し、非絶対分離性(英語版)の資源理論における良い測度である。この測度の分析式は、任意の系における純粋状態と、2量子系における階数と階数と混合状態に対して与えられる。

The absolutely separable (resp. PPT) states remain separable (resp. positive partial transpose) under any global unitary operation. We present a compact form of the extreme points in the sets of absolutely separable states and PPT states in two-qubit and qubit-qudit systems. The results imply that each extreme point has at most three distinct eigenvalues. We establish a necessary and sufficient condition for determining extreme points of the set of absolutely PPT states in two-qutrit and qutrit-qudit systems, expressed as solvable linear equations. We also demonstrate that any extreme point in a qutrit-qudit system has at most seven distinct eigenvalues. We introduce the concept of robustness of nonabsolute separability. It quantifies the minimal amount by which a state needs to be mixed with other states such that the overall state is absolutely separable. We show that the robustness satisfies positivity, invariance under unitary transformation, monotonicity and convexity, so it is a good measure within the resource theory of nonabsolute separability. Analytical expressions for this measure are given for pure states in arbitrary systems and rank-two mixed states in two-qubit systems.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
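
Absolute separability depends only on a state's eigenvalues, which makes it easy to test numerically in the two-qubit case. The sketch below uses the known two-qubit criterion from the earlier literature (Verstraete, Audenaert and De Moor): with eigenvalues sorted as λ1 ≥ λ2 ≥ λ3 ≥ λ4, the state is absolutely separable if and only if λ1 ≤ λ3 + 2√(λ2 λ4). This is background to the abstract above, not a result taken from it, and the function name and tolerance are choices made for this example.

```python
import math


def is_absolutely_separable_two_qubit(eigenvalues, tol=1e-12):
    """Check the two-qubit absolute separability criterion on a spectrum.

    eigenvalues: the four eigenvalues of a two-qubit density matrix, in any order.
    """
    lam = sorted(eigenvalues, reverse=True)
    if len(lam) != 4 or abs(sum(lam) - 1.0) > 1e-9 or lam[-1] < -tol:
        raise ValueError("expected four non-negative eigenvalues summing to 1")
    lam = [max(x, 0.0) for x in lam]  # clip tiny negative rounding errors
    return lam[0] - lam[2] - 2.0 * math.sqrt(lam[1] * lam[3]) <= tol


print(is_absolutely_separable_two_qubit([0.25, 0.25, 0.25, 0.25]))  # maximally mixed state
print(is_absolutely_separable_two_qubit([1.0, 0.0, 0.0, 0.0]))      # any pure state
```

A pure state fails the test because some global unitary maps it to a Bell state, while the maximally mixed state is unchanged by every unitary and so passes.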

# 1D-CNNを用いた文字・口語タミル音声分類のための特徴工学的アプローチ

A Feature Engineering Approach for Literary and Colloquial Tamil Speech Classification using 1D-CNN ( http://arxiv.org/abs/2409.14348v1 )

ライセンス: Link先を確認

M. Nanmalar, S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan,

(参考訳) 理想的なヒューマンコンピュータインタラクション(HCI)では、日常会話で使用される形式であるため、言語の口語形式がほとんどのユーザーに好まれる。しかし、形式的な文体を維持する必要もない。新しいものを受け入れ、古いものを保存することで、共通の人へのサービス(実践性)と言語自体へのサービス(保存性)の両方をレンダリングすることができる。したがって、コンピュータが必要に応じて両方の言語形式で受け入れ、処理し、会話する能力を持つことは理想的である。この問題に対処するためには、まず入力音声の形式を特定することが必要である。このようなフロントエンドシステムは、音声信号の基本となるパターンを捉えることができるいくつかの効果的な特徴に基づいて訓練された、シンプルで効果的で軽量な分類器でなければならない。これを実現するために、時間をかけて特徴の包絡を学習する1次元畳み込みニューラルネットワーク(1D-CNN)を提案する。このネットワークは、最初は特定の手作りの特徴に基づいて訓練され、その後Mel周波数ケプストラム係数(MFCC)を用いて比較を行う。音声のスペクトル特性や時間特性,韻律,声質など,音声の様々な側面に対処するために,手作りの特徴が選択された。特徴は、まず10の並行発話を考慮し、時間に関する各特徴の傾向を観察することによって分析される。提案された1D-CNNは手作りの特徴を使って訓練され、F1スコアは0.9803、MFCCで訓練されたF1スコアは0.9895である。これを踏まえて、特徴アブレーションと特徴の組み合わせを探索する。最高の手工芸品がMFCCと組み合わせられる場合、F1スコアは0.9946である。

In ideal human computer interaction (HCI), the colloquial form of a language would be preferred by most users, since it is the form used in their day-to-day conversations. However, there is also an undeniable necessity to preserve the formal literary form. By embracing the new and preserving the old, both service to the common man (practicality) and service to the language itself (conservation) can be rendered. Hence, it is ideal for computers to have the ability to accept, process, and converse in both forms of the language, as required. To address this, it is first necessary to identify the form of the input speech, which in the current work is between literary and colloquial Tamil speech. Such a front-end system must consist of a simple, effective, and lightweight classifier that is trained on a few effective features that are capable of capturing the underlying patterns of the speech signal. To accomplish this, a one-dimensional convolutional neural network (1D-CNN) that learns the envelope of features across time, is proposed. The network is trained on a select number of handcrafted features initially, and then on Mel frequency cepstral coefficients (MFCC) for comparison. The handcrafted features were selected to address various aspects of speech such as the spectral and temporal characteristics, prosody, and voice quality. The features are initially analyzed by considering ten parallel utterances and observing the trend of each feature with respect to time. The proposed 1D-CNN, trained using the handcrafted features, offers an F1 score of 0.9803, while that trained on the MFCC offers an F1 score of 0.9895. In light of this, feature ablation and feature combination are explored. When the best ranked handcrafted features, from the feature ablation study, are combined with the MFCC, they offer the best results with an F1 score of 0.9946.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
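
The Tamil speech classification abstract above compares feature sets by F1 score. For reference, the sketch below computes a binary F1 score from predicted and true labels. The label strings and the toy predictions are invented for illustration, and the abstract does not say whether the reported scores are binary, macro or weighted averages, so treating "colloquial" as the positive class is an assumption.

```python
def f1_score(y_true, y_pred, positive):
    """Binary F1 score for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Made-up labels for five utterances.
truth = ["colloquial", "colloquial", "colloquial", "literary", "literary"]
preds = ["colloquial", "colloquial", "literary", "colloquial", "literary"]
print(f1_score(truth, preds, positive="colloquial"))
```

F1 balances precision and recall, so it stays informative even when one speech form is much rarer than the other in the evaluation set.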

# 自然言語処理によるテキスト分類によるバーンアウトの表示:オンラインデータから実世界データへ

Using Natural Language Processing to find Indication for Burnout with Text Classification: From Online Data to Real-World Data ( http://arxiv.org/abs/2409.14357v1 )

ライセンス: Link先を確認

Mascha Kurpicz-Briki, Ghofrane Merhbene, Alexandre Puttick, Souhir Ben Souissi, Jannic Bieri, Thomas Jörg Müller, Christoph Golz,

(参考訳) ICD-11の症候群に分類されるバーンアウトは、効果的に管理されていない慢性的な職場ストレスから生じる。疲労、シニシズム、職業効果の低下を特徴とし、測定方法の不整合によりその有病率は著しく異なる。自然言語処理(NLP)と機械学習の最近の進歩は、テキストデータ分析を通じてバーンアウトを検出するための有望なツールを提供する。本稿では,ドイツ語テキストのバーンアウト検出に寄与する。 (a)自由文回答とOldenburg Burnout Inventory(OLBI)応答を含む匿名現実世界データセットの収集 b) オンラインデータに基づいて訓練されたジャーマンバートに基づく分類器の限界を示すこと (c) 実世界のアプリケーションでよく機能するモデルを生成する、キュレートされたBurnoutExpressionsデータセットの2つのバージョンを提示します。 (d)バーンアウト検出に使用されるAIモデルの解釈可能性に関する学際的焦点グループからの質的な洞察を提供する。我々の発見は、燃え尽き症候群の検出モデルを洗練するために、AI研究者と臨床専門家とのより深いコラボレーションの必要性を強調した。さらに、NLP研究で開発された現在のAIメソッドの有効性を検証し、強化するためには、より現実的なデータが不可欠である。これは、AIツールが実用的なアプリケーションに適していることを保証するために不可欠である。

Burnout, classified as a syndrome in the ICD-11, arises from chronic workplace stress that has not been effectively managed. It is characterized by exhaustion, cynicism, and reduced professional efficacy, and estimates of its prevalence vary significantly due to inconsistent measurement methods. Recent advancements in Natural Language Processing (NLP) and machine learning offer promising tools for detecting burnout through textual data analysis, with studies demonstrating high predictive accuracy. This paper contributes to burnout detection in German texts by: (a) collecting an anonymous real-world dataset including free-text answers and Oldenburg Burnout Inventory (OLBI) responses; (b) demonstrating the limitations of a GermanBERT-based classifier trained on online data; (c) presenting two versions of a curated BurnoutExpressions dataset, which yielded models that perform well in real-world applications; and (d) providing qualitative insights from an interdisciplinary focus group on the interpretability of AI models used for burnout detection. Our findings emphasize the need for greater collaboration between AI researchers and clinical experts to refine burnout detection models. Additionally, more real-world data is essential to validate and enhance the effectiveness of current AI methods developed in NLP research, which are often based on data automatically scraped from online sources and not evaluated in a real-world context. This is essential for ensuring AI tools are well suited for practical applications.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22

# MANTA -- 拡張可能なモデルアダプタネイティブ世代

MANTA -- Model Adapter Native generations that's Affordable ( http://arxiv.org/abs/2409.14363v1 )

ライセンス: Link先を確認

Ansh Chaurasia,

(参考訳) 現在主流のモデル生成アルゴリズムは、パーソナライズされた結果を提供するために、単純で柔軟性のないアダプタ選択に依存している。本稿では,実用的なハードウェアとコストの制約を考慮した,過去の研究の一般化としてモデル・アダプタ合成問題を提案し,その新しいアプローチとしてMANTAを導入する。 COCO 2014バリデーションの実験では、MANTAはアライメントのわずかな低下と引き換えに、画像タスクの多様性と品質において優れていることが示されている。本システムは,既知の最良のシステムに対して,タスクの多様性で94%,タスク品質で80%の勝率を達成し,合成データ生成や創造的アートの領域での直接利用に大きな可能性を示す。

The presiding model generation algorithms rely on simple, inflexible adapter selection to provide personalized results. We propose the model-adapter composition problem as a generalized problem to past work factoring in practical hardware and affordability constraints, and introduce MANTA as a new approach to the problem. Experiments on COCO 2014 validation show MANTA to be superior in image task diversity and quality at the cost of a modest drop in alignment. Our system achieves a 94% win rate in task diversity and an 80% task quality win rate versus the best known system, and demonstrates strong potential for direct use in synthetic data generation and the creative art domains.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22

# 初心者プログラマのための大規模言語モデルによるコードコメントの品質評価

Evaluating the Quality of Code Comments Generated by Large Language Models for Novice Programmers ( http://arxiv.org/abs/2409.14368v1 )

ライセンス: Link先を確認

Aysa Xuemo Fan, Arun Balajiee Lekshmi Narayanan, Mohammad Hassany, Jiaze Ke,

(参考訳) 大規模言語モデル (LLM) は初心者プログラマ向けのコードコメント生成において有望であるが、その教育的有効性は十分に評価されていない。本研究は, GPT-4, GPT-3.5-Turbo, Llama2が生成したコードコメントの指導的品質を、専門家が作成したコメントと比較して評価する。 LeetCodeの「easy」レベルのJavaソリューションのデータセットを分析した結果、GPT-4は、明快さ、初心者へのわかりやすさ、概念の解説、ステップバイステップのガイダンスなど、初心者にとって重要な側面において、専門家のコメントに匹敵する品質を示した。 GPT-4は複雑性の議論においてLlama2よりも優れており (chi-square = 11.40, p = 0.001)、GPT-3.5やLlama2よりも初心者にとって有意に支援的であると評価された (Mann-Whitney U統計量 = 300.5, 322.5, p = 0.0017, 0.0003)。この研究は、初心者プログラマに適したコードコメントを生成するLLMの可能性を強調している。

Large Language Models (LLMs) show promise in generating code comments for novice programmers, but their educational effectiveness remains under-evaluated. This study assesses the instructional quality of code comments produced by GPT-4, GPT-3.5-Turbo, and Llama2, compared to expert-developed comments, focusing on their suitability for novices. Analyzing a dataset of "easy" level Java solutions from LeetCode, we find that GPT-4 exhibits comparable quality to expert comments in aspects critical for beginners, such as clarity, beginner-friendliness, concept elucidation, and step-by-step guidance. GPT-4 outperforms Llama2 in discussing complexity (chi-square = 11.40, p = 0.001) and is perceived as significantly more supportive for beginners than GPT-3.5 and Llama2 (Mann-Whitney U-statistics = 300.5 and 322.5, p = 0.0017 and 0.0003, respectively). This study highlights the potential of LLMs for generating code comments tailored to novice programmers.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
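
The code comment study above reports Mann-Whitney U statistics when comparing ratings between models. As a reminder of what that statistic counts, the sketch below computes U directly from its pairwise definition. The ratings are invented, the function name is a choice made for this example, and the abstract does not say which of the two U values (or which software) the authors used, so this only illustrates the definition.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples.

    U_x counts the pairs (a, b) with a from x and b from y where a > b,
    with ties counting one half; U_x + U_y always equals len(x) * len(y).
    """
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return u_x, len(x) * len(y) - u_x


# Hypothetical 1-to-5 ratings for comments from two models.
ratings_model_a = [3, 4, 5]
ratings_model_b = [1, 2, 3]
print(mann_whitney_u(ratings_model_a, ratings_model_b))
```

The test is rank-based, which suits ordinal rating scales where the gap between adjacent scores has no fixed meaning.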

# オープンエンド要求に対するエージェント応答における制約満足度評価のための大規模言語モデルの能力

The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests ( http://arxiv.org/abs/2409.14371v1 )

ライセンス: Link先を確認

Lior Madmoni, Amir Zait, Ilia Labzovsky, Danny Karmon,

(参考訳) 生成AIエージェントは、NORA(No One Right Answer)を持つ複雑なユーザリクエストに応答することがしばしば期待されている。このようなリクエストは、エージェントが従うべき一連の制約を伴います。 NORAシナリオのエージェントをうまく開発するには、正確な自動評価フレームワークが不可欠であり、具体的には、エージェントの応答における制約の満足度を検証することができる。近年,大規模な言語モデル (LLM) が多くのNORAタスクに対して多元的評価法として採用されているが,その制約満足度を評価する能力は未だ不明である。そこで本研究では,ACS(Arithmetic Constraint-Satisfaction)ベンチマークデータセットの開発とリリースを行う。データセットは、対応する制約のある複雑なユーザリクエスト、エージェント応答、応答における各制約の満足度を示すヒューマンラベルで構成されている。このデータセットのユニークな特性は、その制約の多くを検証するには、レスポンス全体をレビューする必要があることである(単一の独立した項目の検証を必要とする他の多くのベンチマークとは対照的に)。さらに、推論、文脈内データ抽出、算術演算、計数を行う際のLCMを評価する。次に、制約満足度の評価にオープンとプロプライエタリの両方のLSMをベンチマークし、ほとんどのモデルにまだ改善のための重要なヘッドルームがあることを示し、エラーは主に推論の問題に起因する。さらに、ほとんどのモデルは歪んだ制約満足度予測パターンを示し、接地構造ラベルが「満足」された場合の精度が高い。最後に,本研究モデルの多くは,導入時に性能が低下していることから,タスクのシュートプロンプトは極めて困難であることが判明した。

Generative AI agents are often expected to respond to complex user requests that have No One Right Answer (NORA), e.g., "design a vegetarian meal plan below 1800 calories". Such requests may entail a set of constraints that the agent should adhere to. To successfully develop agents for NORA scenarios, an accurate automatic evaluation framework is essential, specifically one capable of validating the satisfaction of constraints in the agent's response. Recently, large language models (LLMs) have been adopted as versatile evaluators for many NORA tasks, but their ability to evaluate constraint-satisfaction in generated text remains unclear. To study this, we develop and release a novel Arithmetic Constraint-Satisfaction (ACS) benchmarking dataset. The dataset consists of complex user requests with corresponding constraints, agent responses and human labels indicating each constraint's satisfaction level in the response. A unique property of this dataset is that validating many of its constraints requires reviewing the response as a whole (in contrast to many other benchmarks that require the validation of a single independent item). Moreover, it assesses LLMs in performing reasoning, in-context data extraction, arithmetic calculations, and counting. We then benchmark both open and proprietary LLMs on evaluating constraint-satisfaction, and show that most models still have significant headroom for improvement, and that errors primarily stem from reasoning issues. In addition, most models exhibit a skewed constraint-satisfaction prediction pattern, with higher accuracy where the ground-truth label is "satisfied". Lastly, few-shot prompting for our task proved to be rather challenging, since many of the studied models showed a degradation in performance when it was introduced.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22
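
The meal plan request quoted in the abstract above shows why some constraints can only be checked against the response as a whole: the calorie limit needs a sum over every item, not a look at any single one. The sketch below checks both constraints on a structured response. The data layout, field names and calorie values are invented for illustration and do not come from the ACS dataset, where an evaluator would first have to extract such values from free text.

```python
def check_meal_plan(meals, max_calories=1800):
    """Evaluate two constraints on a meal plan given as a list of dicts.

    Each meal is assumed (for this sketch) to carry 'calories' and 'vegetarian' fields.
    """
    total = sum(meal["calories"] for meal in meals)
    return {
        "vegetarian": all(meal["vegetarian"] for meal in meals),
        "below_calorie_limit": total < max_calories,
        "total_calories": total,
    }


# A made-up agent response, already parsed into structured form.
plan = [
    {"name": "oatmeal with berries", "calories": 350, "vegetarian": True},
    {"name": "lentil soup", "calories": 600, "vegetarian": True},
    {"name": "vegetable stir-fry", "calories": 700, "vegetarian": True},
]
print(check_meal_plan(plan))
```

Each individual meal looks acceptable in isolation, so only the aggregate reveals whether the plan respects the limit, which is the counting and arithmetic ability the benchmark is designed to probe.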

# AIシステムへの適切な信頼性を実現するための介入としてデバッグする

To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems ( http://arxiv.org/abs/2409.14377v1 )

ライセンス: Link先を確認

Gaole He, Abri Bharos, Ujwal Gadiraju,

(参考訳) 強力な予測AIシステムは、人間の意思決定を増強する大きな可能性を証明している。最近の実証研究は、最適な人間とAIのコラボレーションのビジョンは、AIシステムに対する人間の「適切な依存」を必要とすると主張している。しかし、特にAIシステムに関連するパフォーマンスフィードバックがない場合には、インスタンスレベルでAIアドバイスの信頼性を正確に見積もることは非常に難しい。実際には、アウト・オブ・ディストリビューションデータにおける機械学習モデルのパフォーマンス格差は、データセット固有のパフォーマンスフィードバックを、人間とAIのコラボレーションでは信頼できないものにしている。批判的思考と批判的マインドセットに関する既存の文献にヒントを得て、適切な信頼を育むための介入として、AIシステムをデバッグすることを提案する。本稿では,デバッギング設定におけるAI性能の批判的評価が,ユーザのAIシステム評価を校正し,より適切な信頼性をもたらすかどうかを検討する。定量的実証研究(N = 234)により,提案するデバッグ介入は,適切な依存を促す上では期待通りに機能しないことがわかった。その代わりに、介入後のAIシステムへの依存の減少を観察します。我々は、不適切な信頼パターンの発生を説明するために、異なるパフォーマンスレベルを持つグループ間でのユーザ信頼度とAI信頼度の推定のダイナミクスについて検討する。本研究は、適切な信頼とより良い人間とAIのコラボレーションを促進するために効果的な介入を設計する上で重要な意味を持つ。

Powerful predictive AI systems have demonstrated great potential in augmenting human decision making. Recent empirical work has argued that the vision for optimal human-AI collaboration requires 'appropriate reliance' of humans on AI systems. However, accurately estimating the trustworthiness of AI advice at the instance level is quite challenging, especially in the absence of performance feedback pertaining to the AI system. In practice, the performance disparity of machine learning models on out-of-distribution data makes the dataset-specific performance feedback unreliable in human-AI collaboration. Inspired by existing literature on critical thinking and a critical mindset, we propose the use of debugging an AI system as an intervention to foster appropriate reliance. In this paper, we explore whether a critical evaluation of AI performance within a debugging setting can better calibrate users' assessment of an AI system and lead to more appropriate reliance. Through a quantitative empirical study (N = 234), we found that our proposed debugging intervention does not work as expected in facilitating appropriate reliance. Instead, we observe a decrease in reliance on the AI system after the intervention -- potentially resulting from an early exposure to the AI system's weakness. We explore the dynamics of user confidence and user estimation of AI trustworthiness across groups with different performance levels to help explain how inappropriate reliance patterns occur. Our findings have important implications for designing effective interventions to facilitate appropriate reliance and better human-AI collaboration.

翻訳日:2024-11-06 23:04:03 公開日:2024-09-22

# GroupDiff: 拡散に基づくグループポートレート編集

GroupDiff: Diffusion-based Group Portrait Editing ( http://arxiv.org/abs/2409.14379v1 )

ライセンス: Link先を確認

Yuming Jiang, Nanxuan Zhao, Qing Liu, Krishna Kumar Singh, Shuai Yang, Chen Change Loy, Ziwei Liu,

(参考訳) グループ肖像画編集は、ユーザーが常に人を追加したり、削除したり、既存の人を操ったりすることを望んでいるため、非常に望ましい。また、人間同士の相互作用の複雑なダイナミクスや多様なジェスチャーによっても困難である。本稿では,グループ写真編集の先駆的取り組みであるGroupDiffを紹介する。 1) データエンジン:グループ写真編集のためのラベル付きデータがないため、トレーニング用のペアデータを生成するデータエンジンを作成します。トレーニングデータエンジンは、グループ肖像画編集の多様なニーズをカバーしている。 2) 外観保存: 編集後の外観の整合性を維持するため, グループ写真からの人物像を注目モジュールに注入し, 骨格を用いて人体内指導を行う。 3)制御フレキシビリティ:各人物の位置を示す境界ボックスを用いて注意行列を重み付けし、各人物の特徴を正しい場所に注入する。この対人的指導は、操作の柔軟な方法を提供する。大規模な実験では、GroupDiffは既存の方法と比較して最先端のパフォーマンスを示している。 GroupDiffは、オリジナルの写真の忠実さを編集し、維持するためのコントロール機能を提供する。

Group portrait editing is highly desirable since users constantly want to add a person, delete a person, or manipulate existing persons. It is also challenging due to the intricate dynamics of human interactions and the diverse gestures. In this work, we present GroupDiff, a pioneering effort to tackle group photo editing with three dedicated contributions: 1) Data Engine: Since there is no labeled data for group photo editing, we create a data engine to generate paired data for training. The training data engine covers the diverse needs of group portrait editing. 2) Appearance Preservation: To keep the appearance consistent after editing, we inject the images of persons from the group photo into the attention modules and employ skeletons to provide intra-person guidance. 3) Control Flexibility: Bounding boxes indicating the locations of each person are used to reweight the attention matrix so that the features of each person can be injected into the correct places. This inter-person guidance provides flexible manners for manipulation. Extensive experiments demonstrate that GroupDiff exhibits state-of-the-art performance compared to existing methods. GroupDiff offers controllability for editing and maintains the fidelity of the original photos.
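
The bounding-box reweighting in contribution 3 can be pictured with a small sketch. The abstract does not say how the attention matrix is reweighted, so everything below is an assumption for illustration: the function names, the multiplicative `boost` factor, and the renormalization step are hypothetical and are not GroupDiff's actual implementation.

```python
def inside(box, x, y):
    """Return True if pixel (x, y) lies in box = (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = box
    return x0 <= x < x1 and y0 <= y < y1


def reweight_attention(attn, boxes, x, y, boost=4.0):
    """Reweight one pixel's attention over per-person reference features.

    attn[j]  : attention the pixel at (x, y) pays to person j (sums to 1).
    boxes[j] : bounding box of person j in the edited image.
    boost    : hypothetical factor applied when the pixel is inside box j.
    """
    scaled = [a * (boost if inside(b, x, y) else 1.0) for a, b in zip(attn, boxes)]
    total = sum(scaled)
    # Renormalize so the reweighted attention is again a distribution.
    return [s / total for s in scaled]


# Two people side by side; a pixel inside the first person's box.
boxes = [(0, 0, 10, 10), (10, 0, 20, 10)]
print(reweight_attention([0.5, 0.5], boxes, x=3, y=3))
```

A pixel that falls in no box keeps its original attention, so the sketch only redirects features where a box says a person should be.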

翻訳日:2024-11-06 22:52:53 公開日:2024-09-22

# 大規模言語モデルにおける層の重要性の調査

Investigating Layer Importance in Large Language Models ( http://arxiv.org/abs/2409.14381v1 )

ライセンス: Link先を確認

Yang Zhang, Yanfei Dong, Kenji Kawaguchi,

(参考訳) 大規模言語モデル (LLM) は、テキストを理解し処理する優れた能力により、注目を集めている。しかし、LLMはいまだに不透明である。 LLMの理解の欠如は、安全クリティカルなシナリオへの展開を妨げ、より良いモデルの開発を妨げる。本研究では,LLMにおける個々の層の重要性を調査し,LLMの理解を深める。本稿では,特徴属性とデータ評価に広く用いられている説明フレームワークであるShapley値を用いて,レイヤの重要性を忠実に評価する効率的なサンプリング手法を提案する。さらに,特定の層を排除して生じる性能劣化を評価するために,層アブレーション実験を実施している。以上の結果から,コーナーストーン層の存在が明らかとなり,特定の初期層が他の層に対して支配的な寄与を示しうることが示唆された。 1つのコーナーストーン層を除去すると、モデルの性能が劇的に低下し、しばしばランダムな推測に還元される。逆に、非コーナーストーン層を除去しても、性能の変化はわずかである。本研究は, LLMのコーナーストーン層を同定し, 今後の研究におけるその重要な役割を浮き彫りにする。

Large language models (LLMs) have gained increasing attention due to their prominent ability to understand and process texts. Nevertheless, LLMs largely remain opaque. The lack of understanding of LLMs has obstructed the deployment in safety-critical scenarios and hindered the development of better models. In this study, we advance the understanding of LLM by investigating the significance of individual layers in LLMs. We propose an efficient sampling method to faithfully evaluate the importance of layers using Shapley values, a widely used explanation framework in feature attribution and data valuation. In addition, we conduct layer ablation experiments to assess the performance degradation resulting from the exclusion of specific layers. Our findings reveal the existence of cornerstone layers, wherein certain early layers can exhibit a dominant contribution over others. Removing one cornerstone layer leads to a drastic collapse of the model performance, often reducing it to random guessing. Conversely, removing non-cornerstone layers results in only marginal performance changes. This study identifies cornerstone layers in LLMs and underscores their critical role for future research.
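
To make the Shapley-based layer scoring concrete, here is a minimal sketch using generic permutation sampling. This is a standard Monte Carlo estimator, not necessarily the efficient sampling method the authors propose. The value function `toy_accuracy` is a hypothetical stand-in for "evaluate the model with only these layers kept", in which layer 0 is assumed to be a cornerstone layer.

```python
import random


def shapley_by_permutation_sampling(players, value_fn, num_samples=200, seed=0):
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            totals[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur
    return {p: totals[p] / num_samples for p in players}


def toy_accuracy(kept_layers):
    """Hypothetical value function: without layer 0 the model is at chance (0.25)."""
    if 0 not in kept_layers:
        return 0.25
    return 0.25 + 0.15 * len(kept_layers)


scores = shapley_by_permutation_sampling([0, 1, 2, 3], toy_accuracy)
print(scores)
```

In this toy setup the estimate for layer 0 dominates the others, mirroring the cornerstone behavior described in the abstract. The estimates also sum to the gap between the full model and the empty one, which is the efficiency property of Shapley values.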

翻訳日:2024-11-06 22:52:53 公開日:2024-09-22

# 顔超解像のための事前知識蒸留ネットワーク

Prior Knowledge Distillation Network for Face Super-Resolution ( http://arxiv.org/abs/2409.14385v1 )

ライセンス: Link先を確認

Qiu Yang, Xiao Sun, Xin-yu Li, Feng-Qi Cui, Yu-Tong Guo, Shuang-Zhen Hu, Ping Luo, Si-Ying Li,

(参考訳) 顔超解像(FSR)の目的は、低分解能(LR)入力から高分解能(HR)顔画像を再構成することである。ディープラーニング技術の継続的な進歩により、現代の先進的なFSR法は、当初は顔の事前を推定し、この情報を用いて超解像再構成プロセスを支援する。しかし、事前推定の正確性を保証することは依然として困難であり、単純なカスケードと畳み込み操作は、しばしば事前の知識を十分に活用できない。不正確な、あるいは不十分に利用された事前情報は、必然的にFSR性能を低下させる。この問題に対処するため,教師ネットワークから学生ネットワークに事前情報を転送するFSRのための事前知識蒸留ネットワーク(PKDN)を提案する。このアプローチにより、ネットワークは、テスト段階では低解像度の顔画像のみを頼りながら、トレーニング段階での事前学習を可能にし、事前推定の不正確さによる悪影響を軽減することができる。さらに,事前情報を効果的に活用する解析マップ融合ブロックの設計に,ロバストな注意機構を取り入れた。特徴の喪失を防止するため,特徴抽出段階ではマルチスケールの特徴を保ち,その後の超解像再構成プロセスで採用する。ベンチマークデータセットによる実験結果から,我々のPKDNアプローチは,高品質な顔画像を生成する上で,既存のFSR手法を超越していることが示された。

The purpose of face super-resolution (FSR) is to reconstruct high-resolution (HR) face images from low-resolution (LR) inputs. With the continuous advancement of deep learning technologies, contemporary prior-guided FSR methods initially estimate facial priors and then use this information to assist in the super-resolution reconstruction process. However, ensuring the accuracy of prior estimation remains challenging, and straightforward cascading and convolutional operations often fail to fully leverage prior knowledge. Inaccurate or insufficiently utilized prior information inevitably degrades FSR performance. To address this issue, we propose a prior knowledge distillation network (PKDN) for FSR, which involves transferring prior information from the teacher network to the student network. This approach enables the network to learn priors during the training stage while relying solely on low-resolution facial images during the testing stage, thus mitigating the adverse effects of prior estimation inaccuracies. Additionally, we incorporate robust attention mechanisms to design a parsing map fusion block that effectively utilizes prior information. To prevent feature loss, we retain multi-scale features during the feature extraction stage and employ them in the subsequent super-resolution reconstruction process. Experimental results on benchmark datasets demonstrate that our PKDN approach surpasses existing FSR methods in generating high-quality face images.
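
The teacher-to-student transfer can be sketched as a combined training loss. The abstract does not give PKDN's loss terms, so the L1 reconstruction term, the mean-squared feature term, and the `distill_weight` value below are all assumptions for illustration. Inputs are plain lists of floats to keep the example self-contained.

```python
def mean_abs_error(pred, target):
    """Mean absolute (L1) error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mean_squared_error(pred, target):
    """Mean squared (L2) error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def student_loss(sr_pixels, hr_pixels, student_feats, teacher_feats, distill_weight=0.1):
    """Reconstruction loss plus a feature-distillation penalty (hypothetical form).

    teacher_feats come from a network that sees facial priors during training;
    the student only ever sees the low-resolution image.
    """
    reconstruction = mean_abs_error(sr_pixels, hr_pixels)
    distillation = mean_squared_error(student_feats, teacher_feats)
    return reconstruction + distill_weight * distillation


print(student_loss([0.5, 0.5], [1.0, 0.0], [1.0, 2.0], [1.0, 4.0]))
```

At test time only the student is run, so the distillation term and the teacher disappear, consistent with the abstract's point that priors are not needed during inference.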

翻訳日:2024-11-06 22:52:53 公開日:2024-09-22

# 非エルミートハミルトニアンによるTE波とTM波の散乱と量子力学

Scattering of TE and TM waves and quantum dynamics generated by non-Hermitian Hamiltonians ( http://arxiv.org/abs/2409.14386v1 )

ライセンス: Link先を確認

Farhang Loran, Ali Mostafazadeh,

(参考訳) 平面対称性を持つ線形等方性媒質による電磁波の散乱の研究は、TEおよびTMモードの散乱に還元することができる。媒体が平行な均一なスラブで構成されている状況では、標準転送行列技術を用いてこれらのモードの散乱問題に対処することができる。本手法は, TEおよびTM波の散乱の動的定式化を提案し, 有効非単項量子系の進化演算子を用いて媒体の遷移行列を与えることにより, 不均一誘電率および透過性プロファイルに拡張する。これにより、反射振幅と透過振幅の力学方程式の系が導かれる。これらの方程式を分離することにより、TEモードとTMモードの散乱問題の解を、リカティ方程式の初期値問題に還元する。 TE波やTM波を反射しない媒体を所定の波数と入射角で同定する上で,この観測の適用について論じる。

The study of the scattering of electromagnetic waves by a linear isotropic medium with planar symmetry can be reduced to that of their TE and TM modes. For situations where the medium consists of parallel homogeneous slabs, one may use the standard transfer matrix technique to address the scattering problem for these modes. We extend the utility of this technique to inhomogeneous permittivity and permeability profiles by proposing a dynamical formulation of the scattering of TE and TM waves in which the transfer matrix for the medium is given in terms of the evolution operator for an effective non-unitary quantum system. This leads to a system of dynamical equations for the reflection and transmission amplitudes. Decoupling these equations we reduce the solution of the scattering problem for TE and TM modes to that of an initial-value problem for a Riccati equation. We discuss the application of this observation in identifying media that do not reflect TE or TM waves with given wavenumber and incidence angle.
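
As a textbook illustration of how a second-order wave equation reduces to a first-order Riccati equation, consider the simplest case of a TE wave in a non-magnetic planar medium. This is not the specific Riccati equation derived in the paper, which also covers inhomogeneous permeability and works with the reflection and transmission amplitudes. The symbols $\varepsilon(x)$, $k$, $\theta$, and $R(x)$ are notation chosen here.

```latex
% TE wave in a non-magnetic planar medium with relative permittivity \varepsilon(x),
% wavenumber k, and incidence angle \theta:
\[
  \psi''(x) + k^2\left[\varepsilon(x) - \sin^2\theta\right]\psi(x) = 0 .
\]
% Introduce the logarithmic derivative R(x) := \psi'(x)/\psi(x). Differentiating,
\[
  R'(x) = \frac{\psi''(x)}{\psi(x)} - \left(\frac{\psi'(x)}{\psi(x)}\right)^{2}
        = -k^2\left[\varepsilon(x) - \sin^2\theta\right] - R(x)^2 ,
\]
% which is a first-order Riccati equation that can be integrated as an
% initial-value problem.
```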

翻訳日:2024-11-06 22:52:53 公開日:2024-09-22

# 新しい視点の定義:エンタープライズ情報ガバナンス

Defining a new perspective: Enterprise Information Governance ( http://arxiv.org/abs/2409.14388v1 )

ライセンス: Link先を確認

Alastair McCullough,

(参考訳) 本稿では,組織における情報及びデータ資産に関する意思決定権の管理において,説明責任を保証するための制御機構を通じて機能する戦略的な枠組みとして,規制エンタープライズ情報ガバナンスの新たな定義を提唱する。この新たな実践的定義は、実践者と学者の両方の視点を捉えている。この定義は、新しい、より明確に規制されたアプローチを取り入れ、そのようなガバナンスのための新しい定義を合成するために、以前の定義に基づいて構築されている。この論文は学術的な考察とさらなる研究を支援する。情報とデータの定義、情報とデータに関する戦略、データ管理に関する戦略、エンタープライズアーキテクチャの戦略、ガバナンスの戦略的な取り組みとしてのガバナンス、そしてそのようなガバナンスの基礎となる戦略的および戦術的政策と標準の性質について考察する。

This paper adduces a novel definition of regulatory enterprise information governance as a strategic framework that acts through control mechanisms designed to assure accountability in managing decision rights over information and data assets in organizations. This new pragmatic definition takes the perspectives of both the practitioner and of the scholar. It builds upon earlier definitions to take a novel and more clearly regulatory approach and to synthesize a new definition for such governance; to build out a view of it as a scalable regulatory framework for large or complex organizations that sees governance from this new perspective as a business architecture or target operating model in this increasingly critical domain. The paper supports and enables scholarly consideration and further research. It looks at definitions of information and data; of strategy in relation to information and data; of data management; of enterprise architecture; of governance, and governance as a type of strategic endeavor, and of the nature of strategic and tactical policies and standards that form the basis for such governance.

翻訳日:2024-11-06 22:52:53 公開日:2024-09-22

# MaskedMimic: Masked Motion Inpaintingによる統一された物理ベースの文字制御

MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting ( http://arxiv.org/abs/2409.14393v1 )

ライセンス: Link先を確認

Chen Tessler, Yunrong Guo, Ofir Nabati, Gal Chechik, Xue Bin Peng,

(参考訳) さまざまなシナリオにまたがって、対話的なキャラクターに人生を吹き込むことのできる、多用途な物理ベースのコントローラーを作れば、キャラクターアニメーションのエキサイティングなフロンティアになる。理想的なコントローラは、スパースターゲットキーフレーム、テキスト命令、シーン情報など、多様な制御モダリティをサポートする必要がある。以前の研究では、物理的にシミュレートされたシーン認識制御モデルが提案されていたが、これらのシステムは、それぞれのタスクの狭いセットと制御モダリティを専門とするコントローラの開発に主に焦点を絞っている。この研究はMaskedMimicという、物理に基づく文字制御を一般的な運動インペイント問題として定式化する新しいアプローチを提示している。私たちの重要な洞察は、マスクされたキーフレーム、オブジェクト、テキスト記述、あるいはそれらの組み合わせのような部分的な(マスキングされた)モーション記述から、モーションを合成するための単一の統一モデルをトレーニングすることです。これは、動作追跡データを活用するとともに、多様な動作記述を効果的に活用してコヒーレントなアニメーションを生成するスケーラブルなトレーニング手法を設計することで実現される。このプロセスを通じて,興味のあるすべての動作に対して面倒な報酬工学を必要とせず,直感的な制御インタフェースを提供する物理ベースのコントローラを学習する。コントローラは幅広い制御モードをサポートし、異なるタスク間のシームレスな遷移を可能にする。 MaskedMimicは、モーションインペイントによる文字制御を統一することにより、多目的な仮想文字を生成する。これらのキャラクターは複雑なシーンに動的に適応し、必要に応じて多様な動きを構成でき、よりインタラクティブで没入的な体験を可能にする。

Crafting a single, versatile physics-based controller that can breathe life into interactive characters across a wide spectrum of scenarios represents an exciting frontier in character animation. An ideal controller should support diverse control modalities, such as sparse target keyframes, text instructions, and scene information. While previous works have proposed physically simulated, scene-aware control models, these systems have predominantly focused on developing controllers that each specialize in a narrow set of tasks and control modalities. This work presents MaskedMimic, a novel approach that formulates physics-based character control as a general motion inpainting problem. Our key insight is to train a single unified model to synthesize motions from partial (masked) motion descriptions, such as masked keyframes, objects, text descriptions, or any combination thereof. This is achieved by leveraging motion tracking data and designing a scalable training method that can effectively utilize diverse motion descriptions to produce coherent animations. Through this process, our approach learns a physics-based controller that provides an intuitive control interface without requiring tedious reward engineering for all behaviors of interest. The resulting controller supports a wide range of control modalities and enables seamless transitions between disparate tasks. By unifying character control through motion inpainting, MaskedMimic creates versatile virtual characters. These characters can dynamically adapt to complex scenes and compose diverse motions on demand, enabling more interactive and immersive experiences.

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# スパースビュートモグラフィ再構成のための周波数規則化ニューラル表現法

Frequency-regularized Neural Representation Method for Sparse-view Tomographic Reconstruction ( http://arxiv.org/abs/2409.14394v1 )

ライセンス: Link先を確認

Jingmou Xian, Jian Zhu, Haolin Liao, Si Li,

(参考訳) スパース・ビュー・トモグラフィーは放射線線量削減と臨床応用性向上のための重要な方向である。多くの研究がスパース2次元投影からのトモグラフィ画像の再構成を提案しているが、既存のモデルはスパース入力画像内の低周波成分を見落としながら、過度に高周波情報に集中する傾向にある。高周波情報に対するこのバイアスは、しばしば過度に適合し、特に再建されたスライスにおけるエッジとバウンダリで強められる。本稿では,周波数正規化ニューラル減衰/活動場(Freq-NAF)を自己教師付きスパース・ビュー・トモグラフィーの再構成に適用する。 Freq-NAFは、ニューラルネットワーク入力の可視周波数帯域を直接制御し、周波数正規化を導入することで過度な適合を緩和する。このアプローチは、高周波と低周波の情報を効果的にバランスさせる。 CBCTおよびSPECTデータセットの数値実験を行い,その精度を実証した。

Sparse-view tomographic reconstruction is a pivotal direction for reducing radiation dose and augmenting clinical applicability. While many research works have proposed the reconstruction of tomographic images from sparse 2D projections, existing models tend to excessively focus on high-frequency information while overlooking low-frequency components within the sparse input images. This bias towards high-frequency information often leads to overfitting, particularly intense at edges and boundaries in the reconstructed slices. In this paper, we introduce the Frequency Regularized Neural Attenuation/Activity Field (Freq-NAF) for self-supervised sparse-view tomographic reconstruction. Freq-NAF mitigates overfitting by incorporating frequency regularization, directly controlling the visible frequency bands in the neural network input. This approach effectively balances high-frequency and low-frequency information. We conducted numerical experiments on CBCT and SPECT datasets, and our method demonstrates state-of-the-art accuracy.
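
One common way to control the visible frequency bands of a network input is to mask a sinusoidal positional encoding and open the mask gradually during training. The sketch below assumes that form. The linear schedule, the function names, and the choice to start with all bands hidden are illustrative assumptions, not details taken from Freq-NAF.

```python
import math


def positional_encoding(x, num_bands):
    """Sin/cos features of a scalar coordinate x at frequencies 2^k * pi."""
    feats = []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi
        feats.append(math.sin(freq * x))
        feats.append(math.cos(freq * x))
    return feats


def frequency_mask(num_bands, step, total_steps):
    """Mask that reveals bands from low to high frequency as training proceeds."""
    visible = num_bands * min(1.0, step / total_steps)
    mask = []
    for k in range(num_bands):
        m = min(1.0, max(0.0, visible - k))  # 1 = fully visible, 0 = hidden
        mask.extend([m, m])                  # same weight for the sin and cos pair
    return mask


def masked_encoding(x, num_bands, step, total_steps):
    """Positional encoding with high-frequency bands suppressed early in training."""
    enc = positional_encoding(x, num_bands)
    mask = frequency_mask(num_bands, step, total_steps)
    return [e * m for e, m in zip(enc, mask)]


print(frequency_mask(num_bands=4, step=50, total_steps=100))
```

Halfway through the schedule only the two lowest bands reach the network, which is the sense in which such a mask limits early overfitting to high-frequency detail.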

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# 大規模言語モデルを用いたターゲット非依存情報からのユーザスタンス予測

Predicting User Stances from Target-Agnostic Information using Large Language Models ( http://arxiv.org/abs/2409.14395v1 )

ライセンス: Link先を確認

Siyuan Brandon Loh, Liang Ze Wong, Prasanta Bhattacharya, Joseph Simons, Wei Gao, Hong Zhang,

(参考訳) 本研究では,ターゲットを意識しないソーシャルメディア投稿(ユーザレベルのスタンス予測)の収集から,ターゲットに対するユーザのスタンスを予測できるLarge Language Models(LLMs)能力について検討する。 LLMがこのタスクをこなせることを示す初期の証拠を示す一方で、モデル全体の性能にかなりのばらつきが浮かび上がっている。 (i)スタンスターゲットの種類 (二)予測戦略及び予測戦略 (三)対象不明の官職の個数ポストホック分析は、表面レベル(例えば、ターゲット関連キーワード)とユーザレベル機能(例えば、ユーザーの道徳的価値をエンコードする)の両方の存在を通して、LLMに関連情報を提供するターゲット非依存ポストの有用性をさらに示唆している。以上の結果から,LLMは歴史的・目標非依存のデータに基づいて,新たなトピックに対する公衆のスタンスを決定するための有効な方法である可能性が示唆された。同時に、姿勢予測タスクにおけるLCMの強みと、その効果がタスクコンテキストによってどのように変化するかをよりよく理解するために、さらなる研究も求めている。

We investigate Large Language Models' (LLMs) ability to predict a user's stance on a target given a collection of his/her target-agnostic social media posts (i.e., user-level stance prediction). While we show early evidence that LLMs are capable of this task, we highlight considerable variability in the performance of the model across (i) the type of stance target, (ii) the prediction strategy and (iii) the number of target-agnostic posts supplied. Post-hoc analyses further hint at the usefulness of target-agnostic posts in providing relevant information to LLMs through the presence of both surface-level (e.g., target-relevant keywords) and user-level features (e.g., encoding users' moral values). Overall, our findings suggest that LLMs might offer a viable method for determining public stances towards new topics based on historical and target-agnostic data. At the same time, we also call for further research to better understand LLMs' strong performance on the stance prediction task and how their effectiveness varies across task contexts.

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# Flat-LoRA: 平坦な損失ランドスケープ上の低ランク適応

Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape ( http://arxiv.org/abs/2409.14396v1 )

ライセンス: Link先を確認

Tao Li, Zhengbao He, Yujun Li, Yasheng Wang, Lifeng Shang, Xiaolin Huang,

(参考訳) 微調整された大規模事前訓練モデルは、計算とメモリコストの点で極めて高価である。 Low-Rank Adaptation (LoRA) はパラメータ効率の良いファインチューニング(PEFT)法であり、低ランク行列のみを最適化することで、モデルを微調整する効率的な方法を提供する。 LoRAの性能改善の最近の進歩にもかかわらず、LoRA最適化空間と元の完全なパラメータ空間との接続はしばしば見過ごされる。ロラ空間に平坦に見える解は、全パラメータ空間に鋭い方向が存在し、一般化性能を損なう可能性がある。本稿では、フルパラメータ空間の平坦な領域に位置する低ランク適応を求める効率的なアプローチであるFlat-LoRAを提案し、計算量やメモリ負荷を著しく低減できる、確立されたシャープネス認識最小化アプローチに頼る代わりに、ベイズ予測損失目標によるランダムな重量摂動を利用してトレーニング効率の維持と改良された摂動生成戦略の設計を行う。自然言語処理と様々なアーキテクチャを用いた画像分類タスクの実験により,提案手法の有効性が示された。

Fine-tuning large-scale pre-trained models is prohibitively expensive in terms of computational and memory costs. Low-Rank Adaptation (LoRA), a popular Parameter-Efficient Fine-Tuning (PEFT) method, provides an efficient way to fine-tune models by optimizing only a low-rank matrix. Despite recent progress made in improving LoRA's performance, the connection between the LoRA optimization space and the original full parameter space is often overlooked. A solution that appears flat in the LoRA space may still have sharp directions in the full parameter space, potentially harming generalization performance. In this paper, we propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space. Instead of relying on the well-established sharpness-aware minimization approach, which can incur significant computational and memory burdens, we utilize random weight perturbation with a Bayesian expectation loss objective to maintain training efficiency and design a refined perturbation generation strategy for improved performance. Experiments on natural language processing and image classification tasks with various architectures demonstrate the effectiveness of our approach.
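
The idea of scoring flatness in the full parameter space can be sketched as a Monte Carlo estimate of the expected loss when Gaussian noise is added to the merged weight `W + B A`. The isotropic noise, the toy quadratic losses, and all names below are assumptions for illustration. The paper's refined perturbation generation strategy is not reproduced here.

```python
import random


def matmul(left, right):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(left[i][k] * right[k][j] for k in range(len(right)))
             for j in range(len(right[0]))] for i in range(len(left))]


def merged_weight(w, b, a):
    """Full-space weight W + B A for a low-rank update (B: d x r, A: r x d)."""
    ba = matmul(b, a)
    return [[w[i][j] + ba[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def expected_perturbed_loss(loss_fn, weight, sigma, num_samples=500, seed=0):
    """Average loss over Gaussian perturbations of every entry of `weight`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        noisy = [[x + rng.gauss(0.0, sigma) for x in row] for row in weight]
        total += loss_fn(noisy)
    return total / num_samples


def quadratic_loss(curvature, center):
    """Toy loss with its minimum at `center`; larger curvature = sharper minimum."""
    def loss(weight):
        return curvature * sum((x - c) ** 2
                               for row, crow in zip(weight, center)
                               for x, c in zip(row, crow))
    return loss


w = [[1.0, 0.0], [0.0, 1.0]]
b, a = [[1.0], [2.0]], [[0.5, 0.0]]
m = merged_weight(w, b, a)
print(expected_perturbed_loss(quadratic_loss(0.5, m), m, sigma=0.1))   # flat minimum
print(expected_perturbed_loss(quadratic_loss(50.0, m), m, sigma=0.1))  # sharp minimum
```

Both toy losses are zero at the unperturbed solution, yet the sharp one is penalized far more under perturbation, which is why an expected-loss objective favors flat regions.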

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# ハードサンプルが精度に与える影響の調査から明らかになるクラス内データ不均衡

Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance ( http://arxiv.org/abs/2409.14401v1 )

ライセンス: Link先を確認

Pawel Pukowski, Haiping Lu,

(参考訳) AutoMLドメインでは、テスト精度がモデルの有効性を評価するための重要な指標として認識され、ニューラルアーキテクチャサーチからハイパーパラメータ最適化まで幅広いアプリケーションを支える。しかし、実験精度の信頼性は、特にラベルノイズがいかに最先端モデルの真のランキングを曖昧にするかを明らかにする研究によって疑問視されている。データセット内のハードサンプルの存在が、テスト精度だけで推測される一般化能力にさらに疑念を抱くという、別の視点に沿って、私たちはさらに先進的です。本研究は, トレーニングセットとテストセット間のハードサンプルの分布が, それらの集合の難易度に影響を及ぼし, モデルの一般化能力に影響を及ぼすことを明らかにした。そこで本研究では,バランスモデル評価の複雑さを浮き彫りにして,より容易かつハードな2つの一般化経路を明らかにした。最後に,この領域におけるよりニュアンスなアプローチの進展を促進するため,ハードサンプル識別法の比較のためのベンチマーク手法を提案する。我々の第一の目的は、決定的な解決策を提案することではなく、クラス内のデータ不均衡問題を導入することで、バランスの取れたデータセットを扱う場合でも、評価基準としてテスト精度に主に依存する制限を強調することです。そこで我々は,研究コミュニティにおける批判的な議論を刺激し,モデル評価基準の範囲を広く検討する研究のための新たな道を開くことを目的としている。匿名のコードは https://github.com/PawPuk/CurvBIM でGPL-3.0 ライセンスの下で利用可能である。

In the AutoML domain, test accuracy is heralded as the quintessential metric for evaluating model efficacy, underpinning a wide array of applications from neural architecture search to hyperparameter optimization. However, the reliability of test accuracy as the primary performance metric has been called into question, notably through research highlighting how label noise can obscure the true ranking of state-of-the-art models. We go further and take another perspective, in which the existence of hard samples within datasets casts further doubt on the generalization capabilities inferred from test accuracy alone. Our investigation reveals that the distribution of hard samples between training and test sets affects the difficulty levels of those sets, thereby influencing the perceived generalization capability of models. We unveil two distinct generalization pathways (toward easy samples and toward hard samples), highlighting the complexity of achieving balanced model evaluation. Finally, we propose a benchmarking procedure for comparing hard sample identification methods, facilitating the advancement of more nuanced approaches in this area. Our primary goal is not to propose a definitive solution but to highlight the limitations of relying primarily on test accuracy as an evaluation metric, even when working with balanced datasets, by introducing the in-class data imbalance problem. By doing so, we aim to stimulate a critical discussion within the research community and open new avenues for research that consider a broader spectrum of model evaluation criteria. The anonymous code is available at https://github.com/PawPuk/CurvBIM under the GPL-3.0 license.

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# GraspMamba: 階層的特徴学習を備えたMambaベースの言語駆動型把持検出フレームワーク

GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning ( http://arxiv.org/abs/2409.14403v1 )

ライセンス: Link先を確認

Huy Hoang Nguyen, An Vuong, Anh Nguyen, Ian Reid, Minh Nhat Vu,

(参考訳) 把持検出は、多くの産業アプリケーションの成功に欠かせない基本的なロボット作業である。しかしながら、このタスクの現在の言語駆動モデルは、乱雑な画像、長いテキスト記述、遅い推論速度に悩まされることが多い。この課題に対処するために,Mambaビジョンと階層的特徴融合を用いた新しい言語駆動型把持検出手法であるGraspMambaを紹介する。本手法は,Mambaベースのバックボーンのリッチな視覚的特徴とテキスト情報を活用することにより,マルチモーダルな特徴の融合を効果的に促進する。 GraspMambaは、複数のスケールで視覚と言語の特徴を抽出し、堅牢なパフォーマンスと高速な推論時間を提供する、最初のMambaベースの把持検出モデルである。集中的な実験により、GraspMambaは最近の手法よりも明確なマージンで優れていることが示された。実際のロボット実験を通じて、我々のアプローチを検証し、その高速な推論速度を強調する。

Grasp detection is a fundamental robotic task critical to the success of many industrial applications. However, current language-driven models for this task often struggle with cluttered images, lengthy textual descriptions, or slow inference speed. We introduce GraspMamba, a new language-driven grasp detection method that employs hierarchical feature fusion with Mamba vision to tackle these challenges. By leveraging rich visual features of the Mamba-based backbone alongside textual information, our approach effectively enhances the fusion of multimodal features. GraspMamba represents the first Mamba-based grasp detection model to extract vision and language features at multiple scales, delivering robust performance and rapid inference time. Intensive experiments show that GraspMamba outperforms recent methods by a clear margin. We validate our approach through real-world robotic experiments, highlighting its fast inference speed.

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# COSBO:保守的なオフラインシミュレーションに基づく政策最適化

COSBO: Conservative Offline Simulation-Based Policy Optimization ( http://arxiv.org/abs/2409.14412v1 )

ライセンス: Link先を確認

Eshagh Kargar, Ville Kyrki,

(参考訳) オフライン強化学習は、ライブデプロイメントのデータに関する強化学習モデルのトレーニングを可能にする。しかし、トレーニングデータに存在する行動の最良の組み合わせを選択することは限られている。対照的に、ライブ環境を再現しようとするシミュレーション環境は、ライブデータの代わりに利用できるが、この手法はシミュレーションと現実のギャップによって制限され、バイアスをもたらす。両世界を最大限に活用するために,不完全なシミュレーション環境と対象環境のデータを組み合わせてオフラインの強化学習ポリシーを訓練する手法を提案する。実験により,提案手法はCQL,MOPO,COMBO,特に多種多様かつ挑戦的な動的シナリオにおいて,最先端の手法よりも優れており,様々な実験条件において頑健な動作を示す。その結果,シミュレータ生成データを用いることで,実世界との直接のインタラクションが不可能な場合,シミュレートと現実のギャップにもかかわらず,オフラインポリシ学習を効果的に向上させることができることがわかった。

Offline reinforcement learning allows training reinforcement learning models on data from live deployments. However, it is limited to choosing the best combination of behaviors present in the training data. In contrast, simulation environments attempting to replicate the live environment can be used instead of the live data, yet this approach is limited by the simulation-to-reality gap, resulting in a bias. In an attempt to get the best of both worlds, we propose a method that combines an imperfect simulation environment with data from the target environment, to train an offline reinforcement learning policy. Our experiments demonstrate that the proposed method outperforms state-of-the-art approaches CQL, MOPO, and COMBO, especially in scenarios with diverse and challenging dynamics, and demonstrates robust behavior across a variety of experimental conditions. The results highlight that using simulator-generated data can effectively enhance offline policy learning despite the sim-to-real gap, when direct interaction with the real-world is not possible.

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# EDK2ファームウェア欠陥の発見:コード監査ツールからの洞察

Uncovering EDK2 Firmware Flaws: Insights from Code Audit Tools ( http://arxiv.org/abs/2409.14416v1 )

ライセンス: Link先を確認

Mahsa Farahani, Ghazal Shenavar, Ali Hosseinghorban, Alireza Ejlali,

(参考訳) ファームウェアは現代のコンピュータの基盤となるソフトウェア層として機能し、最小限のオペレーティングシステムと同様に、プラットフォームハードウェア上で最初に実行されるコードとして開始される。オペレーティングシステムとプラットフォームファームウェアの間のソフトウェアインターフェースとして定義された統一拡張ファームウェアインタフェース(UEFI)は、システムの初期化と管理を標準化する。 EFI Development Kit II (EDK2) は、ファームウェアアーキテクチャを形成する上で重要な役割を担っている。広く採用されているにもかかわらず、アーキテクチャは、初期段階のシステムリソースの制限や、標準的なセキュリティ機能の欠如といった課題に直面している。さらに、ファームウェア分析用に特別に設計されたオープンソースツールの不足は、適応的で革新的なソリューションの必要性を強調している。本稿では,汎用コード監査ツールのファームウェアへの適用について検討し,EDK2に着目した。これらのツールはもともとファームウェア分析のために設計されたものではないが、ファームウェアのセキュリティを強化する重要な領域を特定するのに有効であることが証明されている。 EDK2にキー監査ツールを配置した結果,これらのツールを方法論に基づいて分類し,ユニークなファームウェア属性を明らかにする能力を示すとともに,ファームウェアセキュリティの理解と改善に大きく貢献した。

Firmware serves as a foundational software layer in modern computers, initiating as the first code executed on platform hardware, similar in function to a minimal operating system. Defined as a software interface between an operating system and platform firmware, the Unified Extensible Firmware Interface (UEFI) standardizes system initialization and management. A prominent open-source implementation of UEFI, the EFI Development Kit II (EDK2), plays a crucial role in shaping firmware architecture. Despite its widespread adoption, the architecture faces challenges such as limited system resources at early stages and a lack of standard security features. Furthermore, the scarcity of open-source tools specifically designed for firmware analysis emphasizes the need for adaptable, innovative solutions. In this paper, we explore the application of general code audit tools to firmware, with a particular focus on EDK2. Although these tools were not originally designed for firmware analysis, they have proven effective in identifying critical areas for enhancement in firmware security. Our findings, derived from deploying key audit tools on EDK2, categorize these tools based on their methodologies and illustrate their capability to uncover unique firmware attributes, significantly contributing to the understanding and improvement of firmware security.

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# Dormant: ポーズ駆動型人物画像アニメーションに対する防御

Dormant: Defending against Pose-driven Human Image Animation ( http://arxiv.org/abs/2409.14424v1 )

ライセンス: Link先を確認

Jiachen Zhou, Mingsi Wang, Tianlin Li, Guozhu Meng, Kai Chen,

(参考訳) ポーズ駆動の人間の画像アニメーションは、非常に進歩し、1枚の写真から鮮明でリアルな人間のビデオを生成することができる。しかし、これは逆に画像誤用のリスクを悪化させ、攻撃者は1つの画像を使って政治、暴力、その他の違法コンテンツを含むビデオを作成することができる。この脅威に対処するため,ポーズ駆動型ヒューマンイメージアニメーション技術に対する防御に適した新しい保護手法であるDormantを提案する。 Dormantは、人間の1つのイメージに保護的摂動を適用し、オリジナルと視覚的類似性を保ちながら、品質の悪いビデオ生成をもたらす。保護摂動は、画像から外観特徴の誤抽出を誘発し、生成された映像フレーム間に不整合を生じさせるように最適化される。 8つのアニメーション手法と4つのデータセットにまたがる広範囲な評価は、6つのベースライン保護手法よりもDormantの方が優れていることを示す。さらに、Dormantは、完全なブラックボックスアクセスであっても、6つの現実世界の商用サービスで有効性を示す。

Pose-driven human image animation has achieved tremendous progress, enabling the generation of vivid and realistic human videos from just one single photo. However, it conversely exacerbates the risk of image misuse, as attackers may use one available image to create videos involving politics, violence and other illegal content. To counter this threat, we propose Dormant, a novel protection approach tailored to defend against pose-driven human image animation techniques. Dormant applies protective perturbation to one human image, preserving the visual similarity to the original but resulting in poor-quality video generation. The protective perturbation is optimized to induce misextraction of appearance features from the image and create incoherence among the generated video frames. Our extensive evaluation across 8 animation methods and 4 datasets demonstrates the superiority of Dormant over 6 baseline protection methods, leading to misaligned identities, visual distortions, noticeable artifacts, and inconsistent frames in the generated videos. Moreover, Dormant shows effectiveness on 6 real-world commercial services, even with fully black-box access.

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# Kerr修飾キャビティマグノメカニクスにおける双安定性とリミットサイクルの量子シグネチャ

Quantum signatures of bistability and limit cycle in Kerr-modified cavity magnomechanics ( http://arxiv.org/abs/2409.14427v1 )

ライセンス: Link先を確認

Pooja Kumari Gupta, Subhadeep Chakraborty, Sampreet Kalita, Amarendra K. Sarma,

(参考訳) 双安定領域に着目して、Kerr修飾キャビティマグノメカニクス系を研究する。双安定性が現れる明確なパラメータ条件を特定し、そこでは2つの安定な枝と、その中間に1つの不安定な枝が存在する。興味深いことに、本研究では、十分に強い駆動下で上枝が安定性を失い、リミットサイクル振動が生じるという特異な遷移を明らかにした。その結果、双安定解と周期解の両方からなる豊富な相図を報告し、それらの周りの量子相関について研究する。双安定領域では絡み合いは異なる定常値に達するが、不安定領域では絡み合いは時間とともに振動する。この研究は、Kerr修飾キャビティマグノメカニクス系で生じる、異なる安定点および不安定点における量子絡み合いを理解する上で特に重要である。

We study a Kerr-modified cavity magnomechanical system with a focus on its bistable regime. We identify a distinct parametric condition under which bistability appears, featuring two stable branches and one unstable branch in the middle. Interestingly, our study reveals a unique transition where the upper branch loses its stability under a sufficiently strong drive, giving rise to limit cycle oscillation. Consequently, we report a rich phase diagram consisting of both bistable and periodic solutions and study quantum correlations around them. In the bistable regime, we find that the entanglement reaches different steady-state values, whereas in the unstable regime it oscillates in time. This study is especially important in understanding quantum entanglement at different stable and unstable points arising in a Kerr-modified cavity magnomechanical system.

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# 性能-解釈可能性トレードオフへの挑戦:解釈可能な機械学習モデルの評価

Challenging the Performance-Interpretability Trade-off: An Evaluation of Interpretable Machine Learning Models ( http://arxiv.org/abs/2409.14429v1 )

ライセンス: Link先を確認

Sven Kruschel, Nico Hambauer, Sven Weinzierl, Sandra Zilker, Mathias Kraus, Patrick Zschech,

(参考訳) 機械学習は、データ駆動決定サポートを促進するために、すべての認識可能なドメインに浸透している。性能上の利点が想定されるため、高度なブラックボックスモデルに焦点が当てられることが多いが、解釈可能なモデルは、しばしば劣った予測品質に関連付けられている。しかし、近年では、完全解釈可能なまま複雑で非線形なパターンをキャプチャするための有望な特性を提供するGAM(Generalized Additive Model)が提案されている。これらのモデルの利点と限界を明らかにするため、20の表付きベンチマークデータセットの集合に基づく7つの一般的な機械学習モデルと比較して、7つの異なるGAMの予測性能について検討した。公平かつロバストなモデル比較を保証するため、クロスバリデーションと組み合わせた広範囲なハイパーパラメータ探索を行い、68,500回のモデル実行を実現した。さらに,本研究では,モデルの視覚的出力を質的に検討し,解釈可能性のレベルを評価する。これらの結果から,グラフデータに対する予測性能とモデル解釈性の間に厳密なトレードオフがないことを示すことによって,ブラックボックスモデルのみが高い精度を達成できるという誤解を解消する。さらに,情報システム分野における強力な解釈可能なモデルとしてのGAMの重要性を論じ,社会技術的観点からの今後の研究への示唆を導出する。

Machine learning is permeating every conceivable domain to promote data-driven decision support. The focus is often on advanced black-box models due to their assumed performance advantages, whereas interpretable models are often associated with inferior predictive qualities. More recently, however, a new generation of generalized additive models (GAMs) has been proposed that offer promising properties for capturing complex, non-linear patterns while remaining fully interpretable. To uncover the merits and limitations of these models, this study examines the predictive performance of seven different GAMs in comparison to seven commonly used machine learning models based on a collection of twenty tabular benchmark datasets. To ensure a fair and robust model comparison, an extensive hyperparameter search combined with cross-validation was performed, resulting in 68,500 model runs. In addition, this study qualitatively examines the visual output of the models to assess their level of interpretability. Based on these results, the paper dispels the misconception that only black-box models can achieve high accuracy by demonstrating that there is no strict trade-off between predictive performance and model interpretability for tabular data. Furthermore, the paper discusses the importance of GAMs as powerful interpretable models for the field of information systems and derives implications for future work from a socio-technical perspective.
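
The reason a GAM stays fully interpretable is structural: the prediction is an intercept plus a sum of one-dimensional shape functions, so each feature's contribution can be read off directly. The sketch below uses hand-written shape functions and made-up feature names purely for illustration. Real GAMs learn these functions from data, and none of the values here come from the study.

```python
import math

# Hypothetical shape functions, one per feature (a trained GAM would learn these).
SHAPE_FUNCTIONS = {
    "age": lambda v: 0.02 * (v - 40.0),
    "income": lambda v: math.log1p(v / 1000.0),
    "tenure": lambda v: -0.5 if v < 1.0 else 0.25,
}
INTERCEPT = 0.1


def gam_contributions(row):
    """Per-feature additive contributions for one input row (a dict of floats)."""
    return {name: f(row[name]) for name, f in SHAPE_FUNCTIONS.items()}


def gam_predict(row):
    """Prediction = intercept + sum of the per-feature contributions."""
    return INTERCEPT + sum(gam_contributions(row).values())


row = {"age": 50.0, "income": 0.0, "tenure": 2.0}
print(gam_contributions(row))
print(gam_predict(row))
```

Because there are no interaction terms in this form, plotting each shape function over its feature range gives a complete picture of the model, which is the kind of visual output the study inspects qualitatively.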

翻訳日:2024-11-06 22:52:52 公開日:2024-09-22

# Pomo3D: 3D対応のポートレートへのアクセサリ装着とその先

Pomo3D: 3D-Aware Portrait Accessorizing and More ( http://arxiv.org/abs/2409.14430v1 )

License: see the linked page

Tzu-Chieh Liu, Chih-Ting Liu, Shao-Yi Chien,

We propose Pomo3D, a 3D portrait manipulation framework that allows free accessorizing by decomposing and recomposing portraits and accessories. It enables the avatars to attain out-of-distribution (OOD) appearances of simultaneously wearing multiple accessories. Existing methods still struggle to offer such explicit and fine-grained editing; they either fail to generate additional objects on given portraits or cause alterations to portraits (e.g., identity shift) when generating accessories. This restriction presents a noteworthy obstacle as people typically seek to create charming appearances with diverse and fashionable accessories in the virtual universe. Our approach provides an effective solution to this less-addressed issue. We further introduce the Scribble2Accessories module, enabling Pomo3D to create 3D accessories from user-drawn accessory scribble maps. Moreover, we design a bias-conscious mapper to mitigate biased associations present in real-world datasets. In addition to the object-level manipulation above, Pomo3D also offers extensive editing options on portraits, including global or local editing of geometry and texture and avatar stylization, elevating 3D editing of neural portraits to a more comprehensive level.