Fugu-MT: Translations of arXiv Papers

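This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, or CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.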

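The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).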

These are the papers with a publication date of 2024-02-27.

Title	Authors	Abstract	Publication date / translation date
# Machine Learning Driven Global Optimisation Framework for Analog Circuit Design ( http://arxiv.org/abs/2404.02911v1 ) License: see the linked page	Ria Rashid, Komala Krishna, Clint Pazhayidam George, Nandakumar Nambath	We propose a machine learning-driven optimisation framework for analog circuit design in this paper. The primary objective is to determine the device sizes for the optimal performance of analog circuits for a given set of specifications. Our methodology employs machine learning models and SPICE simulations to direct the optimisation algorithm towards the optimal design for analog circuits. Machine learning-based global offline surrogate models, with the circuit design parameters as the input, are built in the design space for the analog circuits under study and are used to guide the optimisation algorithm, resulting in faster convergence and a reduced number of SPICE simulations. Multi-layer perceptron and random forest regressors are employed to predict the required design specifications of the analog circuit. Since the saturation condition of transistors is vital to the proper working of analog circuits, multi-layer perceptron classifiers are used to predict the saturation condition of each transistor in the circuit. The feasibility of the candidate solutions is verified using machine learning models before invoking SPICE simulations. We validate the proposed framework using three circuit topologies: a bandgap reference, a folded cascode operational amplifier, and a two-stage operational amplifier. The simulation results show better optimum values and lower standard deviations for fitness functions after convergence. Incorporating the proposed machine learning-based predictions in the optimisation method reduced the number of SPICE calls by 56%, 59%, and 83% compared with standard approaches in the three test cases considered in the study.	Translated: 2024-07-01 12:08:31 Published: 2024-02-27
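The screening idea in this abstract (check a candidate with a cheap model before paying for a simulation) can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in: `expensive_simulation` plays the role of a SPICE call, `surrogate_is_feasible` plays the role of the trained classifiers, and plain random search replaces the paper's optimisation algorithm.

```python
import random

def expensive_simulation(w):
    """Hypothetical stand-in for a SPICE run: returns (fitness, feasible)."""
    fitness = (w - 3.0) ** 2          # toy objective, minimum at w = 3
    feasible = 1.0 <= w <= 5.0        # toy "all transistors saturated" region
    return fitness, feasible

def surrogate_is_feasible(w):
    """Hypothetical cheap model; here a slightly conservative hand-made rule."""
    return 1.2 <= w <= 4.8

def optimise(n_candidates, use_surrogate, seed=0):
    """Random search that counts how often the expensive simulator is called."""
    rng = random.Random(seed)
    best, sim_calls = None, 0
    for _ in range(n_candidates):
        w = rng.uniform(0.0, 10.0)
        if use_surrogate and not surrogate_is_feasible(w):
            continue                  # rejected without a simulator call
        sim_calls += 1
        fitness, feasible = expensive_simulation(w)
        if feasible and (best is None or fitness < best):
            best = fitness
    return best, sim_calls

baseline = optimise(200, use_surrogate=False)
screened = optimise(200, use_surrogate=True)
print("without screening (best fitness, simulator calls):", baseline)
print("with screening    (best fitness, simulator calls):", screened)
```

The saving in this toy depends entirely on how much of the search space the surrogate rejects, so it says nothing about the percentages reported in the abstract.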
# Enhancing Credit Card Fraud Detection A Neural Network and SMOTE Integrated Approach ( http://arxiv.org/abs/2405.00026v1 ) License: see the linked page	Mengran Zhu, Ye Zhang, Yulu Gong, Changxin Xu, Yafei Xiang	Credit card fraud detection is a critical challenge in the financial sector, demanding sophisticated approaches to accurately identify fraudulent transactions. This research proposes an innovative methodology combining Neural Networks (NN) and the Synthetic Minority Over-sampling Technique (SMOTE) to enhance detection performance. The study addresses the inherent imbalance in credit card transaction data, focusing on technical advancements for robust and precise fraud detection. Results demonstrate that the integration of NN and SMOTE exhibits superior precision, recall, and F1-score compared to traditional models, highlighting its potential as an advanced solution for handling imbalanced datasets in credit card fraud detection scenarios. This research contributes to the ongoing efforts to develop effective and efficient mechanisms for safeguarding financial transactions from fraudulent activities.	Translated: 2024-07-01 11:29:30 Published: 2024-02-27
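For readers unfamiliar with SMOTE, the core step is interpolation: each synthetic minority sample is placed on the line segment between a real minority sample and one of its nearest minority neighbours. The sketch below shows only that step on made-up two-dimensional data. It is not the authors' pipeline, and the neural network classifier is omitted.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()            # position along the segment x -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Made-up data: 4 "fraud" points (minority) against 12 "legitimate" points.
fraud = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
legit = [(float(i), float(i % 3) + 5.0) for i in range(12)]

new_fraud = smote(fraud, n_new=len(legit) - len(fraud))
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud), len(legit))
```

Oversampling is normally applied to the training split only, so that synthetic points never leak into the evaluation data.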
# Multidimensional Compressed Sensing for Spectral Light Field Imaging ( http://arxiv.org/abs/2405.00027v1 ) License: see the linked page	Wen Cao, Ehsan Miandji, Jonas Unger	This paper considers a compressive multi-spectral light field camera model that utilizes a one-hot spectral-coded mask and a microlens array to capture spatial, angular, and spectral information using a single monochrome sensor. We propose a model that employs compressed sensing techniques to reconstruct the complete multi-spectral light field from undersampled measurements. Unlike previous work where a light field is vectorized to a 1D signal, our method employs a 5D basis and a novel 5D measurement model, hence matching the intrinsic dimensionality of multi-spectral light fields. We show mathematically and empirically that the 5D and 1D sensing models are equivalent and, most importantly, that the 5D framework achieves orders of magnitude faster reconstruction while requiring a small fraction of the memory. Moreover, our new multidimensional sensing model opens new research directions for designing efficient visual data acquisition algorithms and hardware.	Translated: 2024-07-01 11:29:30 Published: 2024-02-27
# Doppler sensitivity and resonant tuning of Rydberg atom-based antennas ( http://arxiv.org/abs/2405.07993v1 ) License: see the linked page	Peter B. Weichman	Radio frequency antennas based on Rydberg atom vapor cells can in principle reach sensitivities beyond those of any wire antenna, especially at lower frequencies where long wires are needed to accommodate a growing wavelength. They also have other desirable features such as nonmetallic, lower-profile elements. This paper presents a detailed theoretical investigation of Rydberg antenna sensitivity, elucidating parameter regimes that could cumulatively lead to a sensitivity increase of 2 to 3 orders of magnitude beyond that of currently tested configurations. The key insight is to optimally combine the advantages of two well-studied approaches: (i) three-laser "2D star configuration" setups that, enhanced also with increased laser power, help compensate for atom motion-induced Doppler broadening, and (ii) resonant coupling between a pair of near-degenerate Rydberg levels, tuned via a local oscillator to the incident signal. The advantage of the star setup is subtle because it only restores overall sensitivity to the expected Doppler-limited value, compensating for additional off-resonance reductions where differently moving atom populations destructively interfere with each other in the net signal. The additional unique advantage of the local oscillator tuning is that it leads to vastly narrower line widths, as low as ~10 kHz set by the intrinsic Rydberg state lifetimes, rather than the typical ~10 MHz set by the core state lifetimes. Intuitively, with this setup the two Rydberg states may be tuned to act as an independent high-Q cavity, a point of view supported through a study of the frequency dependence of the antenna resonant response. A number of practical experimental advances, especially larger ~1 cm laser beam widths, are required to suppress various extrinsic line broadening effects and to fully exploit this cavity response.	Translated: 2024-07-01 08:49:26 Published: 2024-02-27
# A Synergistic Approach to Wildfire Prevention and Management Using AI, ML, and 5G Technology in the United States ( http://arxiv.org/abs/2403.14657v1 ) License: see the linked page	Stanley Chinedu Okoro, Alexander Lopez, Austine Unuriode	Over the past few years, wildfires have become a worldwide environmental emergency, resulting in substantial harm to natural habitats and playing a part in the acceleration of climate change. Wildfire management methods involve prevention, response, and recovery efforts. Despite improvements in detection techniques, the rising occurrence of wildfires demands creative solutions for prompt identification and effective control. This research investigates proactive methods for detecting and handling wildfires in the United States, utilizing Artificial Intelligence (AI), Machine Learning (ML), and 5G technology. The specific objectives of this research cover proactive detection and prevention of wildfires using advanced technology; active monitoring and mapping with remote sensing and signaling that leverage 5G technology; and advanced response mechanisms to wildfires using drones and IoT devices. This study was based on secondary data collected from government databases and analyzed using descriptive statistics. In addition, past publications were reviewed through content analysis, and narrative synthesis was used to present the observations from various studies. The results showed that developing new technology presents an opportunity to detect and manage wildfires proactively. Utilizing advanced technology could save lives and prevent significant economic losses caused by wildfires. Various methods, such as AI-enabled remote sensing and 5G-based active monitoring, can enhance proactive wildfire detection and management. In addition, super intelligent drones and IoT devices can be used for safer responses to wildfires. This forms the core of the recommendation to fire management agencies and the government.	Translated: 2024-04-01 03:43:10 Published: 2024-02-27
# Identifying Potential Inlets of Man in the Artificial Intelligence Development Process ( http://arxiv.org/abs/2403.14658v1 ) License: see the linked page	Deja Workman, Christopher L. Dancy	In this paper we hope to identify how the typical or standard artificial intelligence development process encourages or facilitates the creation of racialized technologies. We begin by understanding Sylvia Wynter's definition of the biocentric Man genre and its exclusion of Blackness from humanness. We follow this by outlining what we consider to be the typical steps for developing an AI-based technology, which we have broken down into 6 stages: identifying a problem, development process and management tool selection, dataset development and data processing, model development, deployment and risk assessment, and integration and monitoring. The goal of this paper is to better understand how Wynter's biocentric Man is being represented and reinforced by the technologies we are producing in the AI lifecycle and by the lifecycle itself; we hope to identify ways in which the distinction of Blackness from the "ideal" human leads to perpetual punishment at the hands of these technologies. By deconstructing this development process, we can potentially identify ways in which humans in general have not been prioritized and how those effects disproportionately affect marginalized people. We hope to offer solutions that will encourage changes in the AI development cycle.	Translated: 2024-04-01 03:43:10 Published: 2024-02-27
# Enhancing Efficiency in Sparse Models with Sparser Selection ( http://arxiv.org/abs/2403.18926v1 ) License: see the linked page	Yuanhang Yang, Shiyi Qi, Wenchao Gu, Chaozheng Wang, Cuiyun Gao, Zenglin Xu	Sparse models, including sparse Mixture-of-Experts (MoE) models, have emerged as an effective approach for scaling Transformer models. However, they often suffer from computational inefficiency since a significant number of parameters are unnecessarily involved in computations that multiply values by zero or by low activation values. To address this issue, we present XMoE, a novel MoE designed to enhance both the efficacy and efficiency of sparse MoE models. XMoE leverages small experts and a threshold-based router to enable tokens to selectively engage only essential parameters. Our extensive experiments on language modeling and machine translation tasks demonstrate that XMoE can enhance model performance while decreasing the computation load at MoE layers by over 50%. Furthermore, we demonstrate the versatility of XMoE by applying it to dense models, enabling sparse computation during inference. We provide a comprehensive analysis and make our code available at https://anonymous.4open.science/r/XMoE.	Translated: 2024-04-01 02:25:04 Published: 2024-02-27
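The abstract does not define its threshold-based router, so the following Python sketch shows only one plausible reading, stated here as an assumption: experts are taken in order of gate probability until their cumulative probability reaches a threshold. Under that reading, a token with a confident gate uses few experts and an uncertain token uses more. The function names and logit values are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def threshold_route(gate_logits, threshold):
    """Return expert indices, highest gate probability first, stopping once
    their cumulative probability reaches the threshold."""
    probs = softmax(gate_logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= threshold:
            break
    return chosen

confident_token = [6.0, 0.5, 0.2, 0.1]   # one expert clearly dominates
uncertain_token = [1.0, 0.9, 0.8, 0.7]   # gate is nearly uniform

print(threshold_route(confident_token, 0.9))
print(threshold_route(uncertain_token, 0.9))
```

In contrast to a fixed top-k router, the number of experts selected here varies per token, which is where any compute saving would come from.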
# OpenAPI Specification Extended Security Scheme: A method to reduce the prevalence of Broken Object Level Authorization ( http://arxiv.org/abs/2212.06606v2 ) License: see the linked page	Rami Haddad, Rim El Malki	APIs have become the prominent technology of choice for achieving inter-service communications. The growth of API deployments has driven the urgency in addressing its lack of security standards. API security is a topic of concern given the absence of standardized authorization in the OpenAPI standard; improper authorization opens the possibility for known and unknown vulnerabilities, which in past years have been exploited by malicious actors, resulting in data loss. This paper examines the number one vulnerability in API security, Broken Object Level Authorization (BOLA), and proposes methods and tools to reduce the prevalence of this vulnerability. BOLA affects various API frameworks; our scope is limited to the OpenAPI Specification (OAS). The OAS is a standard for describing and implementing APIs; popular OAS implementations are FastAPI, Connexion (Flask), and many more. These implementations carry the pros and cons that are associated with the OAS's knowledge of API properties. The OpenAPI Specification's security properties do not address object authorization and provide no standardized approach to define such object properties. This leaves object-level security at the mercy of developers, which presents an increased risk of unintentionally creating attack vectors. Our aim is to tackle this void by introducing 1) the OAS ESS (OpenAPI Specification Extended Security Scheme), which includes declarative security controls for objects in OAS (design-based approach), and 2) an authorization module that can be imported into API services (Flask/FastAPI) to enforce authorization checks at the object level (development-based approach). When building an API service, a developer can start with the API design (specification) or its code. In both cases, a set of mechanisms is introduced to help developers mitigate and reduce the prevalence of BOLA.	Translated: 2024-03-19 08:01:36 Published: 2024-02-27
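To make the BOLA problem concrete, here is a minimal, framework-free Python sketch of an object-level check. It is not the authors' authorization module. The store `INVOICES`, the decorator `require_owner`, and the handler `get_invoice` are hypothetical names chosen for illustration. The point is that being authenticated is not enough: each request must also be checked against the specific object it targets.

```python
class Forbidden(Exception):
    """Raised when the caller is not allowed to access the object."""

# Hypothetical in-memory store: object id -> record with an owner field.
INVOICES = {
    101: {"owner": "alice", "total": 40},
    102: {"owner": "bob", "total": 75},
}

def require_owner(store, owner_field="owner"):
    """Decorator that enforces an ownership check before the handler runs."""
    def wrap(handler):
        def checked(user, object_id):
            record = store.get(object_id)
            if record is None or record[owner_field] != user:
                # Same error for "missing" and "not yours" avoids leaking ids.
                raise Forbidden(f"{user} may not access object {object_id}")
            return handler(user, object_id)
        return checked
    return wrap

@require_owner(INVOICES)
def get_invoice(user, object_id):
    return INVOICES[object_id]

print(get_invoice("alice", 101))
try:
    get_invoice("alice", 102)         # authenticated, but not the owner
except Forbidden as err:
    print("denied:", err)
```

Declaring the rule once in a decorator, rather than repeating it inside each handler, is in the same spirit as the declarative controls the abstract describes, although the paper's actual scheme lives in the OpenAPI document.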
# BarraCUDA: GPUs do Leak DNN Weights ( http://arxiv.org/abs/2312.07783v2 ) License: see the linked page	Peter Horvath, Lukasz Chmielewski, Leo Weissbart, Lejla Batina, Yuval Yarom	Over the last decade, applications of neural networks (NNs) have spread to various aspects of our lives. A large number of companies base their businesses on building products that use neural networks for tasks such as face recognition, machine translation, and self-driving cars. Much of the intellectual property underpinning these products is encoded in the exact parameters of the neural networks. Consequently, protecting these is of utmost priority to businesses. At the same time, many of these products need to operate under a strong threat model, in which the adversary has unfettered physical control of the product. In this work, we present BarraCUDA, a novel attack on general-purpose Graphics Processing Units (GPUs) that can extract parameters of neural networks running on the popular Nvidia Jetson Nano device. BarraCUDA uses correlation electromagnetic analysis to recover parameters of real-world convolutional neural networks.	Translated: 2024-03-18 12:26:52 Published: 2024-02-27
# Elliptic Curve Pairing Stealth Address Protocols ( http://arxiv.org/abs/2312.12131v2 ) License: see the linked page	Marija Mikic, Mihajlo Srbakoski	The protection of transaction privacy is extremely important for the user. With stealth address protocols (SAP), users can receive assets on stealth addresses that are not linked to their stealth meta-addresses. SAP can be generated using various cryptographic approaches. DKSAP uses elliptic curve multiplication and hashing of the resulting shared secret. Another approach is to use a bilinear mapping. The paper presents two SA protocols that use elliptic curve pairing as a cryptographic solution. ECPDKSAP is a pairing-based protocol that includes a viewing key and a spending key, while ECPSKSAP is a pairing-based protocol that uses a single key from which the spending and viewing keys are derived. We find that ECPDKSAP has better results than DKSAP with the view tag. ECPSKSAP is significantly slower, but it represents an interesting theoretical result, because it uses only one private key.	Translated: 2024-03-18 11:47:54 Published: 2024-02-27
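The dual-key idea behind DKSAP ("multiplication and hashing of the resulting shared secret") can be shown as a toy Python sketch. One substitution is made and should be read loudly: instead of an elliptic curve, the code uses modular exponentiation in the multiplicative group modulo a prime, purely so that it runs with the standard library. The parameters are illustrative and not secure, and the pairing-based protocols proposed in the paper are not shown.

```python
import hashlib
import secrets

P = 2**127 - 1      # a Mersenne prime; toy group Z_P^* (NOT an elliptic curve)
G = 3               # fixed base element
ORDER = P - 1       # exponents can be reduced modulo P - 1 (Fermat)

def h(value):
    """Hash a group element to an exponent."""
    digest = hashlib.sha256(value.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % ORDER

def keypair():
    """Return (private exponent, public group element)."""
    private = secrets.randbelow(ORDER - 1) + 1
    return private, pow(G, private, P)

# Recipient publishes a meta-address made of two public keys.
spend_priv, spend_pub = keypair()
view_priv, view_pub = keypair()

# Sender: fresh ephemeral key, shared secret from the recipient's viewing key.
eph_priv, eph_pub = keypair()
s_sender = h(pow(view_pub, eph_priv, P))
stealth_pub = (spend_pub * pow(G, s_sender, P)) % P

# Recipient: recompute the secret from the published ephemeral public key.
s_recipient = h(pow(eph_pub, view_priv, P))
stealth_priv = (spend_priv + s_recipient) % ORDER

print(pow(G, stealth_priv, P) == stealth_pub)
```

The viewing key alone is enough to recognise incoming payments, since it reproduces the shared secret, while spending additionally requires the spending key. That separation is what the abstract's single-key variant gives up in exchange for managing one private key.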
# Leverage Staking with Liquid Staking Derivatives (LSDs): Opportunities and Risks ( http://arxiv.org/abs/2401.08610v2 ) License: see the linked page	Xihan Xiong, Zhipeng Wang, Xi Chen, William Knottenbelt, Michael Huth	Lido, the leading Liquid Staking Derivative (LSD) provider on Ethereum, allows users to stake an arbitrary amount of ETH to receive stETH, which can be integrated with Decentralized Finance (DeFi) protocols such as Aave. The composability between Lido and Aave enables a novel strategy called "leverage staking", where users stake ETH on Lido to acquire stETH, utilize stETH as collateral on Aave to borrow ETH, and then restake the borrowed ETH on Lido. Users can iteratively execute this process to optimize potential returns based on their risk profile. This paper systematically studies the opportunities and risks associated with leverage staking. We are the first to formalize the leverage staking strategy within the Lido-Aave ecosystem. Our empirical study identifies 262 leverage staking positions on Ethereum, with an aggregated staking amount of 295,243 ETH (482M USD). We discover that 90.13% of leverage staking positions have achieved higher returns than conventional staking. Furthermore, we perform stress tests to evaluate the risk introduced by leverage staking under extreme conditions. We find that leverage staking significantly amplifies the risk of cascading liquidations. We hope this paper can inform and encourage the development of robust risk management approaches to protect the Lido-Aave LSD ecosystem.	Translated: 2024-03-18 08:46:40 Published: 2024-02-27
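The stake, borrow, restake loop described in this abstract is a geometric series, which a short Python sketch makes explicit. The numbers are hypothetical: a 1:1 stETH/ETH price, a constant loan-to-value ratio of 0.7, and made-up staking and borrowing rates. None of them are taken from the paper.

```python
def leverage_staking(initial_eth, ltv, loops):
    """Simulate repeated stake -> borrow -> restake; returns (staked, debt)."""
    staked, debt, to_stake = 0.0, 0.0, initial_eth
    for i in range(loops + 1):        # pass 0 is the plain, unleveraged stake
        staked += to_stake            # stake ETH, receive stETH (assumed 1:1)
        if i < loops:
            borrowed = to_stake * ltv # borrow ETH against the new stETH
            debt += borrowed
            to_stake = borrowed       # restake what was borrowed
    return staked, debt

def net_apr(staked, debt, initial_eth, staking_apr, borrow_apr):
    """Yearly return on the user's own capital, ignoring price moves and fees."""
    return (staked * staking_apr - debt * borrow_apr) / initial_eth

staked, debt = leverage_staking(initial_eth=10.0, ltv=0.7, loops=3)
print("total staked:", round(staked, 4), "ETH, total debt:", round(debt, 4), "ETH")
print("plain staking APR:", 0.04)
print("leveraged net APR:", round(net_apr(staked, debt, 10.0, 0.04, 0.025), 5))
```

After n loops the staked amount is initial_eth * (1 - ltv^(n+1)) / (1 - ltv), so exposure grows quickly as the ratio approaches 1. The same sketch shows the downside: if the borrowing rate rises above the staking rate, the leveraged return falls below plain staking, before even considering liquidation.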
# Hybrid Online Certificate Status Protocol with Certificate Revocation List for Smart Grid Public Key Infrastructure ( http://arxiv.org/abs/2401.10787v4 ) License: see the linked page	Hong-Sheng Huang, Zhe-Yi Jiang, Hsuan-Tung Chen, Hung-Min Sun	Hsu et al. (2022) proposed a cryptographic scheme within the public key infrastructure to bolster the security of smart grid meters. Their proposal involved developing the Certificate Management over CMS mechanism to establish the Simple Certificate Enrollment Protocol and the Enrollment over Secure Transport protocol. Additionally, they implemented Online Certificate Status Protocol (OCSP) services to independently query the status of certificates. However, their implementation featured a single OCSP server handling all query requests. Considering the typical scenario in smart grid PKI environments with tens of thousands of end-meters or more, we introduce a Hybrid Online Certificate Status Protocol mechanism. This approach reduces the query resources that clients demand from OCSP servers by combining OCSP with Certificate Revocation Lists. Our simulations, which mimic meter behavior, demonstrate increased efficiency and yield a more robust architecture tailored to the smart grid meter landscape.	Translated: 2024-03-18 08:36:55 Published: 2024-02-27
# A Pioneering Study and An Innovative Information Theory-based Approach to Enhance The Transparency in Phishing Detection ( http://arxiv.org/abs/2402.17092v1 ) License: see the linked page	Van Nguyen, Tingmin Wu, Xingliang Yuan, Marthie Grobler, Surya Nepal, Carsten Rudolph	Phishing attacks have become a serious and challenging issue for detection, explanation, and defense. Despite more than a decade of research on phishing, encompassing both technical and non-technical remedies, phishing continues to be a serious problem. Nowadays, AI-based phishing detection stands out as one of the most effective solutions for defending against phishing attacks by providing vulnerability (i.e., phishing or benign) predictions for the data. However, it lacks explainability in terms of providing comprehensive interpretations for the predictions, such as identifying the specific information that causes the data to be classified as phishing. To this end, we propose an innovative deep learning-based approach for phishing attack localization in email (the most common phishing channel). Our method can not only predict the vulnerability of the email data but also automatically identify and highlight the most important and phishing-relevant information (i.e., sentences) in each phishing email. The selected information provides useful explanations for the vulnerability of the phishing email data. Rigorous experiments on seven real-world email datasets show the effectiveness and advancement of our proposed method in providing comprehensive explanations (by successfully identifying the most important and phishing-relevant information in phishing emails) for the vulnerability of the corresponding phishing data, with improvements of roughly 1% to 3% and 1% to 4% on the two main measures, Label-Accuracy and Cognitive-True-Positive, respectively, over state-of-the-art baselines.	Translated: 2024-03-18 07:09:00 Published: 2024-02-27
# Blockchain for Finance: A Survey ( http://arxiv.org/abs/2402.17219v1 ) License: see the linked page	Hanjie Wu, Qian Yao, Zhenguang Liu, Butian Huang, Yuan Zhuang, Huayun Tang, Erwu Liu	As an innovative technology for enhancing authenticity, security, and risk management, blockchain is being widely adopted in trade and finance systems. The unique capabilities of blockchain, such as immutability and transparency, enable new business models of distributed data storage, point-to-point transactions, and decentralized autonomous organizations. In this paper, we focus on blockchain-based securities trading, in which blockchain technology plays a vital role in financial services because it raises trust and removes the need for third-party verification by using consensus-based verification. We investigate the 12 most popular blockchain platforms and elaborate on 6 platforms that are related to finance, seeking to provide a panorama of securities trading practices. Meanwhile, this survey provides a comprehensive summary of blockchain-based securities trading applications. We gather numerous practical applications of blockchain-based securities trading and group them into four categories. For each category, we introduce a typical example and explain how blockchain contributes to solving the key problems faced by FinTech companies and researchers. Finally, we provide interesting observations ranging from mainstream blockchain-based financial institutions to security issues of decentralized finance applications, aiming to depict the current blockchain ecosystem in finance.	Translated: 2024-03-18 07:09:00 Published: 2024-02-27
# Time-Restricted Double-Spending Attack on PoW-based Blockchains ( http://arxiv.org/abs/2402.17223v1 ) License: see linked page	Yiming Jiang, Jiangfan Zhang	Numerous blockchain applications are designed with tasks that naturally have finite durations, and hence a double-spending attack (DSA) on such applications tends to be conducted within a finite timeframe, specifically before the completion of their tasks. Furthermore, existing research suggests that practical attackers typically favor executing a DSA within a finite timeframe due to their limited computational resources. These observations serve as the impetus for this paper to investigate a time-restricted DSA (TR-DSA) model on Proof-of-Work based blockchains. In this TR-DSA model, an attacker only mines its branch within a finite timeframe, and the TR-DSA is considered unsuccessful if the attacker's branch fails to surpass the honest miners' branch when the honest miners' branch has grown by a specific number of blocks. First, we develop a general closed-form expression for the success probability of a TR-DSA. This probability can not only assist in evaluating the risk of a DSA on blockchain applications with timely tasks, but also enable practical attackers with limited computational resources to assess the feasibility and expected reward of launching a TR-DSA. In addition, we provide a rigorous proof that the success probability of a TR-DSA is no greater than that of a time-unrestricted DSA where the attacker indefinitely mines its branch. This result implies that blockchain applications with timely tasks are less vulnerable to DSAs than blockchain applications that provide attackers with an unlimited timeframe for their attacks. Furthermore, we show that the success probability of a TR-DSA is always smaller than one even though the attacker controls more than half of the hash rate in the network. This result alerts attackers that there is still a risk of failure in launching a TR-DSA even if they amass a majority of the hash rate in the network.	Translated: 2024-03-18 07:09:00 Published: 2024-02-27
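The race described in this abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's closed-form expression. It assumes a simplified model in which each new block belongs to the attacker with probability `q`, the merchant waits for `confirmations` honest blocks, and the attacker gives up once the honest branch has grown by `max_honest_blocks`. The function name and all parameters are hypothetical.

```python
import random


def simulate_tr_dsa(q, confirmations, max_honest_blocks, trials=20000, seed=0):
    """Estimate the success rate of a toy time-restricted double-spending race.

    q                 -- attacker's share of the hash rate (assumed per-block win probability)
    confirmations     -- honest blocks the merchant waits for (assumption)
    max_honest_blocks -- the attack is abandoned once the honest branch has this many blocks
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must satisfy 0 <= q < 1")
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        attacker = honest = 0
        while honest < max_honest_blocks:
            # The next block is mined by the attacker with probability q.
            if rng.random() < q:
                attacker += 1
            else:
                honest += 1
            # Success: payment is confirmed and the attacker's branch is strictly longer.
            if honest >= confirmations and attacker > honest:
                successes += 1
                break
    return successes / trials


if __name__ == "__main__":
    for limit in (4, 10, 40):
        print(limit, simulate_tr_dsa(q=0.3, confirmations=2, max_honest_blocks=limit))
```

Under these assumptions, a looser restriction (larger `max_honest_blocks`) tends to raise the estimate, and the estimate stays below one even for `q > 0.5`, which matches the qualitative claims in the abstract.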
# Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities ( http://arxiv.org/abs/2402.17230v1 ) License: see linked page	Yu Nong, Mohammed Aldeen, Long Cheng, Hongxin Hu, Feng Chen, Haipeng Cai	Security vulnerabilities are increasingly prevalent in modern software and they are widely consequential to our society. Various approaches to defending against these vulnerabilities have been proposed, among which those leveraging deep learning (DL) avoid major barriers with other techniques, hence attracting more attention in recent years. However, DL-based approaches face critical challenges including the lack of sizable and quality-labeled task-specific datasets and their inability to generalize well to unseen, real-world scenarios. Lately, large language models (LLMs) have demonstrated impressive potential in various domains by overcoming those challenges, especially through chain-of-thought (CoT) prompting. In this paper, we explore how to leverage LLMs and CoT to address three key software vulnerability analysis tasks: identifying a given type of vulnerability, discovering vulnerabilities of any type, and patching detected vulnerabilities. We instantiate the general CoT methodology in the context of these tasks through VSP, our unified, vulnerability-semantics-guided prompting approach, and conduct extensive experiments assessing VSP versus five baselines for the three tasks against three LLMs and two datasets. Results show substantial superiority of our CoT-inspired prompting (553.3%, 36.5%, and 30.8% higher F1 accuracy for vulnerability identification, discovery, and patching, respectively, on CVE datasets) over the baselines. Through in-depth case studies analyzing VSP failures, we also reveal current gaps in LLM/CoT for challenging vulnerability cases, while proposing and validating respective improvements.	Translated: 2024-03-18 06:59:15 Published: 2024-02-27
# A Scalable Multi-Layered Blockchain Architecture for Enhanced EHR Sharing and Drug Supply Chain Management ( http://arxiv.org/abs/2402.17342v1 ) License: see linked page	Reza Javan, Mehrzad Mohammadi, Mohammad Beheshti-Atashgah, Mohammad Reza Aref	In recent years, the healthcare sector's shift to online platforms has spotlighted challenges concerning data security, privacy, and scalability. Blockchain technology, known for its decentralized, secure, and immutable nature, emerges as a viable solution for these pressing issues. This article presents an innovative Electronic Health Records (EHR) sharing and drug supply chain management framework tailored to address scalability, security, data integrity, traceability, and secure data sharing. The framework introduces five layers and transactions, prioritizing patient-centric healthcare by granting patients comprehensive access control over their health information. This access facilitates smoother processes, such as insurance claims, while maintaining robust security measures. Notably, our implementation of parallelism significantly bolsters scalability and transaction throughput while minimizing network traffic. Performance evaluations conducted through the Caliper benchmark indicate a slight increase in processor consumption during specific transactions, mitigated effectively by parallelization. RAM requirements remain largely stable. Additionally, our approach notably reduces network traffic while tripling transaction throughput. The framework ensures patient privacy, data integrity, access control, and interoperability, aligning with traditional healthcare systems. Moreover, it provides transparency and real-time drug supply monitoring, empowering decision-makers with actionable insights. As healthcare evolves, our framework sets a crucial precedent for innovative, scalable, and secure systems. Future enhancements could focus on scalability, real-world deployment, standardized data formats, reinforced security protocols, privacy preservation, and IoT integration to comply with regulations and meet evolving industry needs.	Translated: 2024-03-18 06:59:15 Published: 2024-02-27
# PureLottery: Fair and Bias-Resistant Leader Election with a Novel Single-Elimination Tournament Algorithm ( http://arxiv.org/abs/2402.17459v1 ) License: see linked page	Jonas Ballweg	Leader Election (LE) is crucial in distributed systems and blockchain technology, ensuring one participant acts as the leader. Traditional LE methods often depend on distributed random number generation (RNG), facing issues like vulnerability to manipulation, lack of fairness, and the need for complex procedures such as verifiable delay functions (VDFs) and publicly-verifiable secret sharing (PVSS). This Bachelor's thesis presents a novel approach to randomized LE, leveraging a game-theoretic assumption that participants, aiming to be chosen as leaders, will naturally avoid actions that diminish their chances. This perspective simplifies LE by eliminating the need for decentralized RNG. Introducing PureLottery, inspired by single-elimination sports tournaments, this method offers a fair, bias-resistant, and efficient LE solution for blockchain environments. It operates on the principle of two participants competing in each match, rendering collusion efforts useless. PureLottery stands out for its low computational and communication complexity, suitable for smart contract implementation. It provides strong game-theoretic incentives for honesty and is robust against adversaries, ensuring no increase in election chances through dishonesty. The protocol guarantees that each honest player has at least a 1/n chance of winning, irrespective of adversary manipulation among the other n-1 participants. PureLottery can also address related problems like participant ranking, electing multiple leaders, and leader aversion, showcasing its versatility across various applications, including lotteries and blockchain protocols. An open-source implementation is made available for public use.	Translated: 2024-03-18 06:59:15 Published: 2024-02-27
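The single-elimination idea in this abstract can be sketched as follows. The abstract does not specify how a match is decided, so this sketch assumes a standard commit-reveal coin flip between the two players (each commits to a bit, and the XOR of the revealed bits picks the winner) and assumes the number of players is a power of two. All function names are hypothetical, and this is not the thesis's reference implementation.

```python
import hashlib
import secrets


def commit(bit: int, nonce: bytes) -> str:
    """Hash commitment to a single bit (hypothetical helper)."""
    return hashlib.sha256(nonce + bytes([bit])).hexdigest()


def play_match(player_a, player_b):
    """Decide one match with a simulated commit-reveal coin flip."""
    # Each player privately picks a bit and a nonce, then publishes a commitment.
    bit_a, nonce_a = secrets.randbelow(2), secrets.token_bytes(16)
    bit_b, nonce_b = secrets.randbelow(2), secrets.token_bytes(16)
    commitment_a, commitment_b = commit(bit_a, nonce_a), commit(bit_b, nonce_b)

    # Reveal phase: a reveal that does not match its commitment forfeits the match.
    # (Always consistent here, because both players are simulated honestly.)
    if commit(bit_a, nonce_a) != commitment_a:
        return player_b
    if commit(bit_b, nonce_b) != commitment_b:
        return player_a

    # The XOR is uniform as long as at least one of the two bits is uniform.
    return player_a if bit_a ^ bit_b == 0 else player_b


def run_tournament(players):
    """Run a single-elimination bracket and return the elected leader."""
    n = len(players)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two number of players")
    current_round = list(players)
    while len(current_round) > 1:
        # Pair neighbours; each match eliminates exactly one player.
        current_round = [
            play_match(current_round[i], current_round[i + 1])
            for i in range(0, len(current_round), 2)
        ]
    return current_round[0]


if __name__ == "__main__":
    print(run_tournament(["alice", "bob", "carol", "dave"]))
```

With n players and log2(n) rounds, an honest player who picks uniform bits wins each match with probability 1/2 in this model, giving an overall chance of 1/n.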
# A Holistic Approach for Bitcoin Confirmation Times & Optimal Fee Selection ( http://arxiv.org/abs/2402.17474v1 ) License: see linked page	Rowel Gündlach, Ivo V. Stoepker, Stella Kapodistria, Jacques A. C. Resing	Bitcoin is currently subject to a significant pay-for-speed trade-off. This is caused by lengthy and highly variable transaction confirmation times, especially during times of congestion. Users can reduce their transaction confirmation times by increasing their transaction fee. In this paper, based on the inner workings of Bitcoin, we propose a model-based approach (based on the Cramér-Lundberg model) that can be used to determine the optimal fee, via, for example, the mean or quantiles, and that accurately models the confirmation time distribution for a given fee. The proposed model is highly suitable as it arises as the limiting model for the mempool process (which tracks the unconfirmed transactions), which we rigorously show via a fluid limit, and we extend this to the diffusion limit (an approximation of the Cramér-Lundberg model for fast computations in highly congested instances). We also propose methods (incorporating real-time data) to estimate the model parameters, thereby combining model and data-driven approaches. The model-based approach is validated on real-world data and the resulting transaction fees outperform, in most instances, the data-driven ones.	Translated: 2024-03-18 06:59:15 Published: 2024-02-27
# Complexity Assessment of Analog Security Primitives Using the Disentropy of Autocorrelation ( http://arxiv.org/abs/2402.17488v1 ) License: see linked page	Paul Jimenez, Raphael Cardoso, Maurìcio Gomes de Queiroz, Mohab Abdalla, Cédric Marchand, Xavier Letartre, Fabio Pavanello	The study of regularity in signals can be of great importance, typically in medicine to analyse electrocardiogram (ECG) or electromyography (EMG) signals, but also in climate studies, finance or security. In this work we focus on security primitives such as Physical Unclonable Functions (PUFs) or Pseudo-Random Number Generators (PRNGs). Such primitives must have a high level of complexity or entropy in their responses to guarantee enough security for their applications. There are several ways of assessing the complexity of their responses, especially in the binary domain. With the development of analog PUFs such as optical (photonic) PUFs, it would be useful to be able to assess their complexity in the analog domain when designing them, for example, before converting analog signals into binary. In this numerical study, we explore the potential of the disentropy of autocorrelation as a measure of complexity for security primitives such as PUFs or PRNGs with analog outputs or responses. We compare this metric to others used to assess regularities in analog signals such as Approximate Entropy (ApEn) and Fuzzy Entropy (FuzEn). We show that the disentropy of autocorrelation is able to differentiate between well-known PRNGs and non-optimised or bad PRNGs in the analog and binary domains with a better contrast than ApEn and FuzEn. Next, we show that the disentropy of autocorrelation is able to detect small patterns injected into PUF responses, and we then apply it to photonic PUF simulations.	Translated: 2024-03-18 06:59:15 Published: 2024-02-27
# Securing OPEN-RAN Equipment Using Blockchain-Based Supply Chain Verification ( http://arxiv.org/abs/2402.17632v1 ) License: see linked page	Ali Mehrban, Mostafa Jani	The disaggregated and multi-vendor nature of OPEN-RAN networks introduces new supply chain security risks, making equipment authenticity and integrity crucial challenges. Robust solutions are needed to mitigate vulnerabilities in manufacturing and integration. This paper puts forth a novel blockchain-based approach to secure OPEN-RAN equipment through its lifecycle. By combining firmware authentication codes, a permissioned blockchain ledger, and equipment node validators, we architect a tamper-resistant ecosystem to track provenance. The outlined design, while conceptual, establishes a foundation and roadmap for future realization. Through careful implementation planning, development of core components like signed firmware hashes and smart contracts, and rigorous performance evaluation, this paper can evolve from concept to practice. There is clear potential to make OPEN-RAN supply chains secure end to end, igniting further research and real-world deployment.	Translated: 2024-03-18 06:59:15 Published: 2024-02-27
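The combination of firmware hashes, a ledger, and validators mentioned in this abstract can be sketched in a few lines. The paper's design is conceptual and gives no code, so everything below is an illustrative assumption: `ToyLedger` is a hypothetical in-memory hash chain standing in for a permissioned blockchain, and `validate_device` is a hypothetical validator check that compares a device's firmware digest against the registered one.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder hash preceding the first block


class ToyLedger:
    """In-memory hash chain standing in for a permissioned ledger (assumption)."""

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _block_hash(prev_hash, record):
        payload = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

    def append(self, record):
        """Register a record such as {'device_id': ..., 'firmware_sha256': ...}."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS
        self.blocks.append(
            {"prev": prev_hash, "record": record, "hash": self._block_hash(prev_hash, record)}
        )

    def is_intact(self):
        """Recompute every link; any edited record breaks the chain."""
        prev_hash = GENESIS
        for block in self.blocks:
            if block["prev"] != prev_hash:
                return False
            if block["hash"] != self._block_hash(prev_hash, block["record"]):
                return False
            prev_hash = block["hash"]
        return True

    def lookup(self, device_id):
        """Return the most recently registered firmware digest for a device."""
        for block in reversed(self.blocks):
            if block["record"]["device_id"] == device_id:
                return block["record"]["firmware_sha256"]
        return None


def firmware_digest(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()


def validate_device(ledger, device_id, image: bytes) -> bool:
    """Accept the device only if the ledger is intact and the digests match."""
    expected = ledger.lookup(device_id)
    if expected is None or not ledger.is_intact():
        return False
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(expected, firmware_digest(image))
```

A real deployment would also need signatures over each record and a consensus mechanism among validators, both of which this sketch leaves out.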
# SoK: Cryptocurrency Wallets -- A Security Review and Classification based on Authentication Factors ( http://arxiv.org/abs/2402.17659v1 ) License: see linked page	Ivan Homoliak, Martin Perešíni	In this work, we review existing cryptocurrency wallet solutions with regard to authentication methods and factors from the user's point of view. In particular, we distinguish between authentication factors that are verified against the blockchain and the ones verified locally (or against a centralized party). With this in mind, we define notions for $k-factor$ authentication against the blockchain and $k-factor$ authentication against the authentication factors. Based on these notions, we propose a classification of authentication schemes. We extend our classification to accommodate threshold signatures and the signing of transactions by centralized parties (such as exchanges or co-signing services). Finally, we apply our classification to existing wallet solutions, which we compare based on various security and key-management features.	Translated: 2024-03-18 06:59:15 Published: 2024-02-27
# On Central Primitives for Quantum Cryptography with Classical Communication ( http://arxiv.org/abs/2402.17715v1 ) License: see linked page	Kai-Min Chung, Eli Goldin, Matthew Gray	Recent work has introduced the "Quantum-Computation Classical-Communication" (QCCC) (Chung et al.) setting for cryptography. There has been some evidence that One Way Puzzles (OWPuzz) are the natural central cryptographic primitive for this setting (Khurana and Tomer). For a primitive to be considered central it should have several characteristics. It should be well behaved (which for this paper we will think of as having amplification, combiners, and universal constructions); it should be implied by a wide variety of other primitives; and it should be equivalent to some class of useful primitives. We present combiners, correctness and security amplification, and a universal construction for OWPuzz. Our proof of security amplification uses a new and cleaner construction of EFI from OWPuzz (in comparison to the result of Khurana and Tomer) that generalizes to weak OWPuzz and is the most technically involved section of the paper. It was previously known that OWPuzz are implied by other primitives of interest including commitments, symmetric key encryption, one way state generators (OWSG), and therefore pseudorandom states (PRS). However, we are able to rule out OWPuzz's equivalence to many of these primitives by showing a black box separation between general OWPuzz and a restricted class of OWPuzz (those with efficient verification, which we call EV-OWPuzz). We then show that EV-OWPuzz are also implied by most of these primitives, which separates them from OWPuzz as well. This separation also separates extending PRS from highly compressing PRS, answering an open question of Ananth et al.	Translated: 2024-03-18 06:59:15 Published: 2024-02-27
# Exploring the Market Dynamics of Liquid Staking Derivatives (LSDs) ( http://arxiv.org/abs/2402.17748v1 ) License: see linked page	Xihan Xiong, Zhipeng Wang, Qin Wang	Staking has emerged as a crucial concept following Ethereum's transition to Proof-of-Stake consensus. The introduction of Liquid Staking Derivatives (LSDs) has effectively addressed the illiquidity issue associated with solo staking, gaining significant market attention. This paper analyzes the LSD market dynamics from the perspectives of both liquidity takers (LTs) and liquidity providers (LPs). We first quantify the price discrepancy between the LSD primary and secondary markets. Then we investigate and empirically measure how LTs can leverage such discrepancy to exploit arbitrage opportunities, unveiling the potential barriers to LSD arbitrage. In addition, we evaluate the financial profits and losses experienced by LPs who supply LSDs for liquidity provision. Our findings reveal that 66% of LSD liquidity provision positions yield an Annual Percentage Rate (APR) lower than simply holding the corresponding LSDs.	Translated: 2024-03-18 06:59:15 Published: 2024-02-27
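The two quantities this abstract compares, the primary/secondary price discrepancy and the APR of a position, can be made concrete with a short arithmetic sketch. The paper's exact definitions are not given in the abstract, so the formulas below are assumptions: a relative price gap and a simple (non-compounded) annualized return. All function names and the sample numbers are hypothetical.

```python
def price_discrepancy(secondary_price: float, primary_rate: float) -> float:
    """Relative gap between the secondary-market price and the primary-market rate."""
    if primary_rate <= 0:
        raise ValueError("primary_rate must be positive")
    return (secondary_price - primary_rate) / primary_rate


def simple_apr(initial_value: float, final_value: float, days_held: float) -> float:
    """Simple annualized return of a position (assumed, non-compounded definition)."""
    if initial_value <= 0 or days_held <= 0:
        raise ValueError("initial_value and days_held must be positive")
    return (final_value / initial_value - 1.0) * 365.0 / days_held


def lp_underperforms_holding(lp_apr: float, hold_apr: float) -> bool:
    """True when providing liquidity earned less than simply holding the LSD."""
    return lp_apr < hold_apr


if __name__ == "__main__":
    # Hypothetical numbers: the LSD trades at 0.99 against a primary rate of 1.00.
    print(price_discrepancy(0.99, 1.00))
    lp = simple_apr(100.0, 100.6, 73)
    hold = simple_apr(100.0, 101.0, 73)
    print(lp, hold, lp_underperforms_holding(lp, hold))
```

A negative discrepancy means the LSD trades at a discount on the secondary market, which is the situation an arbitrageur would look for in this simplified view.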
# Understanding Subjectivity through the Lens of Motivational Context in Model-Generated Image Satisfaction ( http://arxiv.org/abs/2403.05576v1 ) License: see linked page	Senjuti Dutta, Sherol Chen, Sunny Mak, Amnah Ahmad, Katherine Collins, Alena Butryna, Deepak Ramachandran, Krishnamurthy Dvijotham, Ellie Pavlick, Ravi Rajakumar	Image generation models are poised to become ubiquitous in a range of applications. These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard, failing to consider the subjectivity of such tasks. To investigate how to quantify subjectivity, and the scale of its impact, we measure how assessments differ among human annotators across different use cases. Simulating the effects of ordinarily latent elements of annotators' subjectivity, we contrive a set of motivations (t-shirt graphics, presentation visuals, and phone background images) to contextualize a set of crowdsourcing tasks. Our results show that human evaluations of images vary within individual contexts and across combinations of contexts. Three key factors affecting this subjectivity are image appearance, image alignment with text, and representation of objects mentioned in the text. Our study highlights the importance of taking individual users and contexts into account, both when building and evaluating generative models.	Translated: 2024-03-18 06:10:13 Published: 2024-02-27
# Multi-Task Media-Bias Analysis Generalization for Pre-Trained Identification of Expressions ( http://arxiv.org/abs/2403.07910v1 ) License: see linked page	Tomáš Horych, Martin Wessel, Jan Philip Wahle, Terry Ruas, Jerome Waßmuth, André Greiner-Petter, Akiko Aizawa, Bela Gipp, Timo Spinde	Media bias detection poses a complex, multifaceted problem traditionally tackled using single-task models and small in-domain datasets, consequently lacking generalizability. To address this, we introduce MAGPIE, the first large-scale multi-task pre-training approach explicitly tailored for media bias detection. To enable pre-training at scale, we present Large Bias Mixture (LBM), a compilation of 59 bias-related tasks. MAGPIE outperforms previous approaches in media bias detection on the Bias Annotation By Experts (BABE) dataset, with a relative improvement of 3.3% F1-score. MAGPIE also performs better than previous models on 5 out of 8 tasks in the Media Bias Identification Benchmark (MBIB). Using a RoBERTa encoder, MAGPIE needs only 15% of finetuning steps compared to single-task approaches. Our evaluation shows, for instance, that tasks like sentiment and emotionality boost all learning, all tasks enhance fake news detection, and scaling tasks leads to the best results. MAGPIE confirms that MTL is a promising approach for addressing media bias detection, enhancing the accuracy and efficiency of existing models. Furthermore, LBM is the first available resource collection focused on media bias MTL.	Translated: 2024-03-18 06:00:28 Published: 2024-02-27
# Standing on FURM ground -- A framework for evaluating Fair, Useful, and Reliable AI Models in healthcare systems ( http://arxiv.org/abs/2403.07911v1 ) License: see linked page	Alison Callahan, Duncan McElfresh, Juan M. Banda, Gabrielle Bunney, Danton Char, Jonathan Chen, Conor K. Corbin, Debadutta Dash, Norman L. Downing, Srikar Nallan, Sneha S. Jain, Nikesh Kotecha, Jonathan Masterson, Michelle M. Mello, Keith Morse, Abby Pandya, Anurang Revri, Aditya Sharma, Christopher Sharp, Rahul Thapa, Michael Wornow, Alaa Youssef, Michael A. Pfeffer, Nigam H. Shah	The impact of using artificial intelligence (AI) to guide patient care or operational processes is an interplay of the AI model's output, the decision-making protocol based on that output, and the capacity of the stakeholders involved to take the necessary subsequent action. Estimating the effects of this interplay before deployment, and studying it in real time afterwards, are essential to bridge the chasm between AI model development and achievable benefit. To accomplish this, the Data Science team at Stanford Health Care has developed a mechanism to identify fair, useful and reliable AI models (FURM) by conducting an ethical review to identify potential value mismatches, simulations to estimate usefulness, financial projections to assess sustainability, as well as analyses to determine IT feasibility, design a deployment strategy, and recommend a prospective monitoring and evaluation plan. We report on FURM assessments done to evaluate six AI guided solutions for potential adoption, spanning clinical and operational settings, each with the potential to impact from several dozen to tens of thousands of patients each year. We describe the assessment process, summarize the six assessments, and share our framework to enable others to conduct similar assessments. Of the six solutions we assessed, two have moved into a planning and implementation phase. Our novel contributions - usefulness estimates by simulation, financial projections to quantify sustainability, and a process to do ethical assessments - as well as their underlying methods and open source tools, are available for other healthcare systems to conduct actionable evaluations of candidate AI solutions.	Translated: 2024-03-18 05:50:41 Published: 2024-02-27
# HandGCAT: Occlusion-Robust 3D Hand Mesh Reconstruction from Monocular Images ( http://arxiv.org/abs/2403.07912v1 ) License: see linked page	Shuaibing Wang, Shunli Wang, Dingkang Yang, Mingcheng Li, Ziyun Qian, Liuzhen Su, Lihua Zhang	We propose a robust and accurate method for reconstructing 3D hand meshes from monocular images. This is a very challenging problem, as hands are often severely occluded by objects. Previous works have often disregarded 2D hand pose information, which contains hand prior knowledge that is strongly correlated with occluded regions. Thus, in this work, we propose a novel 3D hand mesh reconstruction network, HandGCAT, that can fully exploit the hand prior as compensation information to enhance occluded region features. Specifically, we design the Knowledge-Guided Graph Convolution (KGC) module and the Cross-Attention Transformer (CAT) module. KGC extracts hand prior information from the 2D hand pose by graph convolution. CAT fuses the hand prior into occluded regions by considering their high correlation. Extensive experiments on popular datasets with challenging hand-object occlusions, such as HO3D v2, HO3D v3, and DexYCB, demonstrate that our HandGCAT reaches state-of-the-art performance. The code is available at https://github.com/heartStrive/HandGCAT.	Translated: 2024-03-18 05:50:41 Published: 2024-02-27
# ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking ( http://arxiv.org/abs/2403.07914v1 ) License: see linked page	Yushan Han, Kaer Huang	Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT). Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked. Moreover, training trackers from scratch or fine-tuning large pre-trained models requires more time and memory. In this paper, we present ACTrack, a new tracking framework with additive spatio-temporal conditions. It preserves the quality and capabilities of the pre-trained Transformer backbone by freezing its parameters, and adds a trainable lightweight additive net to model spatio-temporal relations in tracking. We design an additive siamese convolutional network to ensure the integrity of spatial features and perform temporal sequence modeling to simplify the tracking pipeline. Experimental results on several benchmarks show that ACTrack can balance training efficiency and tracking performance.	Translated: 2024-03-18 05:50:41 Published: 2024-02-27
# Advancing Investment Frontiers: Industry-grade Deep Reinforcement Learning for Portfolio Optimization ( http://arxiv.org/abs/2403.07916v1 ) License: see linked page	Philip Ndikum, Serge Ndikum	This research paper delves into the application of Deep Reinforcement Learning (DRL) in asset-class agnostic portfolio optimization, integrating industry-grade methodologies with quantitative finance. At the heart of this integration is our robust framework that not only merges advanced DRL algorithms with modern computational techniques but also emphasizes stringent statistical analysis, software engineering and regulatory compliance. To the best of our knowledge, this is the first study integrating financial Reinforcement Learning with sim-to-real methodologies from robotics and mathematical physics, thus enriching our frameworks and arguments with this unique perspective. Our research culminates with the introduction of AlphaOptimizerNet, a proprietary Reinforcement Learning agent (and corresponding library). Developed from a synthesis of state-of-the-art (SOTA) literature and our unique interdisciplinary methodology, AlphaOptimizerNet demonstrates encouraging risk-return optimization across various asset classes with realistic constraints. These preliminary results underscore the practical efficacy of our frameworks. As the finance sector increasingly gravitates towards advanced algorithmic solutions, our study bridges theoretical advancements with real-world applicability, offering a template for ensuring safety and robust standards in this technologically driven future.	Translated: 2024-03-18 05:50:41 Published: 2024-02-27
# オープンファンデーションモデルの社会的影響について On the Societal Impact of Open Foundation Models ( http://arxiv.org/abs/2403.07918v1 ) ライセンス: Link先を確認	Sayash Kapoor, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Aspen Hopkins, Kevin Bankston, Stella Biderman, Miranda Bogen, Rumman Chowdhury, Alex Engler, Peter Henderson, Yacine Jernite, Seth Lazar, Stefano Maffulli, Alondra Nelson, Joelle Pineau, Aviya Skowron, Dawn Song, Victor Storchan, Daniel Zhang, Daniel E. Ho, Percy Liang, Arvind Narayanan,	(参考訳) ファンデーションモデルは強力な技術であり、どのように公開され、その社会的影響を直接形作るかである。本稿では,広範に利用可能なモデルウェイト(例えばLlama 2, 安定拡散XL)として定義したオープンファンデーションモデルに焦点を当てる。オープンファンデーションモデルの5つの特徴(例えば、カスタマイズ性の向上、監視の低さなど)を識別し、その利点とリスクを両立させます。オープンファンデーションモデルは、イノベーション、競争、意思決定力の分散、透明性にまたがるいくつかの注意点を含む、大きなメリットを示します。誤用リスクを理解するため,リスク評価フレームワークを設計し,その限界リスクを分析する。いくつかの誤用ベクター(例えばサイバー攻撃、バイオ兵器)において、既存の技術と比較してオープンファンデーションモデルの限界リスクを効果的に特徴づけるには、現在の研究は不十分であることがわかった。このフレームワークは、なぜ限界リスクが低いのかを説明するのに役立ち、過去の作業が異なる仮定でフレームワークの異なるサブセットに焦点を当てていることを明らかにすることで、誤用リスクに関する不一致を明確にし、より建設的な議論を進めるための道筋を明確にする。全体として、我々の研究は、理論上の利点とリスクを実証的に検証するためにどんな研究が必要なのかを概説することで、オープンファンデーションモデルの社会的影響のより根底的な評価を支援するのに役立ちます。 Foundation models are powerful technologies: how they are released publicly directly shapes their societal impact. In this position paper, we focus on open foundation models, defined here as those with broadly available model weights (e.g. Llama 2, Stable Diffusion XL). We identify five distinctive properties (e.g. greater customizability, poor monitoring) of open foundation models that lead to both their benefits and risks. Open foundation models present significant benefits, with some caveats, that span innovation, competition, the distribution of decision-making power, and transparency. To understand their risks of misuse, we design a risk assessment framework for analyzing their marginal risk. Across several misuse vectors (e.g. 
cyberattacks, bioweapons), we find that current research is insufficient to effectively characterize the marginal risk of open foundation models relative to pre-existing technologies. The framework helps explain why the marginal risk is low in some cases, clarifies disagreements about misuse risks by revealing that past work has focused on different subsets of the framework with different assumptions, and articulates a way forward for more constructive debate. Overall, our work helps support a more grounded assessment of the societal impact of open foundation models by outlining what research is needed to empirically validate their theoretical benefits and risks.	翻訳日:2024-03-18 05:50:41 公開日:2024-02-27
# 情報が保存されない測定問題と古典化問題に対する客観的崩壊解釈 Una interpretacion de colapso objetivo para los problemas de la medida y la clasicalizacion donde no se conserva la informacion ( http://arxiv.org/abs/2403.01584v1 ) ライセンス: Link先を確認	Eduardo Franco Sotelo Bazán	(参考訳) 本研究では、古典化の問題と測定の問題を研究し、それらを崩壊の問題という一つの問題に統合した。この目的のために、本研究で提案する客観的崩壊解釈の中で作業プログラム(古典化プログラム)を構築し、公式な解釈の限界を克服する。このプログラムは、エネルギーの再分配によりユニタリ発展と崩壊が交互に起こることで古典化が生じ、古典力学が良い近似として現れることを説明する。一方、深いカオスは、マクロ系が初期条件に非常に敏感であるために量子力学の崩壊と非決定性の影響を受ける領域として定義され、不可逆性を伴う熱力学の出現を可能にする。このレベルでは、(古典的な)測定装置が量子もつれを通じてその崩壊をミクロ系に拡張するため、観測が生じる。さらに、古典化プログラムをブラックホールに拡張し、その熱力学における崩壊の役割を(まだ発展途上の段階で)説明するとともに、量子デコヒーレンスプログラムを批判し、発展中の物理情報の理論を提示した。 This work studies the problems of classicalization and measurement, unifying them into a single problem: the problem of collapse. To this end, a working program (the classicalization program) is developed within an objective-collapse interpretation proposed for this purpose, which overcomes the limitations of the official interpretation. 
This program explains that classicalization arises from alternating unitary evolution and collapses, due to the redistribution of energy, so that classical mechanics emerges as a good approximation. Deep chaos, however, is defined as the regime in which macrosystems are so sensitive to initial conditions that they are affected by the collapses and the indeterminism of quantum mechanics, allowing thermodynamics to emerge with its irreversibility; at this level, observation occurs because the (classical) measurement device extends its collapses to the microsystem through quantum entanglement. This work also extends the classicalization program to black holes and explains, at a level still in development, the role of the collapses in their thermodynamics; in addition, the quantum decoherence program is criticized and a theory of physical information, still under development, is presented.	翻訳日:2024-03-10 23:51:11 公開日:2024-02-27
# テキスト分類における正規化手法の有効性の比較:データ不足状況における単純で複雑なモデル Comparing effectiveness of regularization methods on text classification: Simple and complex model in data shortage situation ( http://arxiv.org/abs/2403.00825v1 ) ライセンス: Link先を確認	Jongga Lee, Jaeseung Yim, Seohee Park, Changwon Lim	(参考訳) テキスト分類は、あらかじめ定義されたクラスに文書を割り当てるタスクである。しかし、十分なラベル付き文書の取得やラベル付けは高価である。本稿では,いくつかのラベル付きデータのみ利用可能な分類モデルに対する正規化手法の効果について検討する。簡単な単語埋め込みモデルと複雑なモデル(CNN, BiLSTM)を比較した。教師付き学習では、敵対的トレーニングはモデルをさらに規則化することができる。ラベルなしデータセットが利用可能であれば、Piモデルや仮想敵トレーニングのような半教師付き学習手法を用いてモデルを正規化することができる。 4つのテキスト分類データセット(AG News, DBpedia, Yahoo! Answers, Yelp Polarity)の正規化効果を、ラベル付きトレーニング文書の0.1%から0.5%のみを用いて評価する。単純なモデルは、完全な教師付き学習において比較的よく機能するが、敵対的なトレーニングと半教師付き学習の助けを借りて、単純で複雑なモデルの両方を正規化し、複雑なモデルに対してより良い結果を与えることができる。単純なモデルは過剰適合に対して堅牢であるが、十分に設計された事前信念を持つ複雑なモデルは過適合に対しても堅牢である。 Text classification is the task of assigning a document to a predefined class. However, it is expensive to acquire enough labeled documents or to label them. In this paper, we study the regularization methods' effects on various classification models when only a few labeled data are available. We compare a simple word embedding-based model, which is simple but effective, with complex models (CNN and BiLSTM). In supervised learning, adversarial training can further regularize the model. When an unlabeled dataset is available, we can regularize the model using semi-supervised learning methods such as the Pi model and virtual adversarial training. We evaluate the regularization effects on four text classification datasets (AG news, DBpedia, Yahoo! Answers, Yelp Polarity), using only 0.1% to 0.5% of the original labeled training documents. The simple model performs relatively well in fully supervised learning, but with the help of adversarial training and semi-supervised learning, both simple and complex models can be regularized, showing better results for complex models. 
Although the simple model is robust to overfitting, a complex model with well-designed prior beliefs can be also robust to overfitting.	翻訳日:2024-03-10 23:50:17 公開日:2024-02-27
# GIN-SD:位置エンコーディングと注意融合による不完全ノードグラフのソース検出 GIN-SD: Source Detection in Graphs with Incomplete Nodes via Positional Encoding and Attentive Fusion ( http://arxiv.org/abs/2403.00014v1 ) ライセンス: Link先を確認	Le Cheng, Peican Zhu, Keke Tang, Chao Gao, Zhen Wang	(参考訳) グラフにおけるソース検出は、噂ソース識別の領域において堅牢な有効性を示している。近年のソリューションでは、ディープニューラルネットワークを活用することでパフォーマンスが向上しているが、完全なユーザデータを必要とすることが多い。本稿では,不完全なユーザデータを用いた噂ソース検出という,より困難な課題に対処し,この課題に対処するための新しいフレームワーク,すなわち,位置エンコーディングとアテンティブフュージョン(GIN-SD)による不完全なノード付きグラフのソース検出を提案する。具体的には,不完全なノードを識別するために位置埋め込みモジュールを使用し,情報伝達能力の高いノードに注目するセルフアテンション機構を採用している。また,ソースノード数と非ソースノード数の差による予測バイアスを軽減するために,クラスバランス機構を導入する。 GIN-SDの有効性と最先端手法に対する優位性を検証する。 Source detection in graphs has demonstrated robust efficacy in the domain of rumor source identification. Although recent solutions have enhanced performance by leveraging deep neural networks, they often require complete user data. In this paper, we address a more challenging task, rumor source detection with incomplete user data, and propose a novel framework, i.e., Source Detection in Graphs with Incomplete Nodes via Positional Encoding and Attentive Fusion (GIN-SD), to tackle this challenge. Specifically, our approach utilizes a positional embedding module to distinguish nodes that are incomplete and employs a self-attention mechanism to focus on nodes with greater information transmission capacity. To mitigate the prediction bias caused by the significant disparity between the numbers of source and non-source nodes, we also introduce a class-balancing mechanism. Extensive experiments validate the effectiveness of GIN-SD and its superiority to state-of-the-art methods.	翻訳日:2024-03-05 23:17:10 公開日:2024-02-27
# 雑音データからの深層学習のための情報的特徴と例の優先順位付け Prioritizing Informative Features and Examples for Deep Learning from Noisy Data ( http://arxiv.org/abs/2403.00013v1 ) ライセンス: Link先を確認	Dongmin Park	(参考訳) 本稿では,開発プロセスの各段階を強化するために,情報的特徴や事例を優先するシステムフレームワークを提案する。具体的には,情報的特徴と例を優先し,特徴学習,データラベリング,データ選択の性能を向上させる。まず, 補助的分散データを用いて, 対象課題の解決に固有の情報的特徴のみを抽出する手法を提案する。分布外データを用いて,対象分布の雑音特性を非活性化する。次に、能動学習のラベル付けコストを削減するために、ラベルなしノイズデータから情報的サンプルを優先する手法を提案する。情報化事例の選択を試み,ノイズの多い例の選定を誘導する純度情報ジレンマを解決するために,純度と情報化のバランスを最も良くするメタモデルを提案する。最後に,ラベル付きノイズデータから有意な例を優先し,データ選択性能を維持する手法を提案する。ラベル付き画像ノイズデータに対しては,近隣サンプルの信頼度を考慮したデータ選択手法を提案する。そこで,ラベル付きテキスト雑音データに対して,指示の質を評価・評価するための多様性を考慮した指示選択手法を提案する。全体として、我々の統一フレームワークは、ノイズの多いデータに頑健なディープラーニング開発プロセスを誘導し、現実世界のアプリケーションでノイズの多い機能や例を効果的に緩和します。 In this dissertation, we propose a systemic framework that prioritizes informative features and examples to enhance each stage of the development process. Specifically, we prioritize informative features and examples and improve the performance of feature learning, data labeling, and data selection. We first propose an approach to extract only informative features that are inherent to solving a target task by using auxiliary out-of-distribution data. We deactivate the noise features in the target distribution by using that in the out-of-distribution data. Next, we introduce an approach that prioritizes informative examples from unlabeled noisy data in order to reduce the labeling cost of active learning. In order to solve the purity-information dilemma, where an attempt to select informative examples induces the selection of many noisy examples, we propose a meta-model that finds the best balance between purity and informativeness. Lastly, we suggest an approach that prioritizes informative examples from labeled noisy data to preserve the performance of data selection. 
For labeled image noise data, we propose a data selection method that considers the confidence of neighboring samples to maintain the performance of the state-of-the-art Re-labeling models. For labeled text noise data, we present an instruction selection method that takes diversity into account for ranking the quality of instructions with prompting, thereby enhancing the performance of aligned large language models. Overall, our unified framework induces the deep learning development process robust to noisy data, thereby effectively mitigating noisy features and examples in real-world applications.	翻訳日:2024-03-05 23:16:54 公開日:2024-02-27
# 順序保存分割によるタイミング予測のためのpreroutgnn:グローバル回路事前学習、局所遅延学習、注意セルモデリング PreRoutGNN for Timing Prediction with Order Preserving Partition: Global Circuit Pre-training, Local Delay Learning and Attentional Cell Modeling ( http://arxiv.org/abs/2403.00012v1 ) ライセンス: Link先を確認	Ruizhe Zhong, Junjie Ye, Zhentao Tang, Shixiong Kai, Mingxuan Yuan, Jianye Hao, Junchi Yan	(参考訳) チップ設計における候補セル配置の品質評価のために, プレルーティングタイミング予測が研究されている。ピンレベル(スラック、スルー)とエッジレベル(ネット遅延、セル遅延)の両方のタイミングメトリクスを、時間を要するルーティングなしで直接推定する。しかし、大規模産業回路における長いタイミングパスのため、信号の減衰やエラーの蓄積に苦しむことが多い。これらの課題に対処するために,我々は二段階アプローチを提案する。まず、回路網リストからグローバルグラフ埋め込みを学習するグラフオートエンコーダを事前学習するためのグローバル回路トレーニングを提案する。次に,学習グラフ埋め込みと回路グラフのトポロジカルソートシーケンスに従って,gcn上のメッセージパッシングのためのノード更新方式を提案する。このスキームは、更新シーケンス内の隣接する2つのピン間の局所時間遅延を残留的にモデル化し、新しい注意機構を介して各セル内のルックアップテーブル情報を抽出する。大規模回路を効率的に処理するために,トポロジ依存を維持しながらメモリ消費を削減する順序保存分割方式を導入する。 21個の実世界の回路の実験では、スラック予測のための新しいSOTA R2が0.93となる。コードはhttps://github.com/thinklab-sjtu/eda-ai。 Pre-routing timing prediction has been recently studied for evaluating the quality of a candidate cell placement in chip design. It involves directly estimating the timing metrics for both pin-level (slack, slew) and edge-level (net delay, cell delay), without time-consuming routing. However, it often suffers from signal decay and error accumulation due to the long timing paths in large-scale industrial circuits. To address these challenges, we propose a two-stage approach. First, we propose global circuit training to pre-train a graph auto-encoder that learns the global graph embedding from circuit netlist. Second, we use a novel node updating scheme for message passing on GCN, following the topological sorting sequence of the learned graph embedding and circuit graph. This scheme residually models the local time delay between two adjacent pins in the updating sequence, and extracts the lookup table information inside each cell via a new attention mechanism. 
To handle large-scale circuits efficiently, we introduce an order preserving partition scheme that reduces memory consumption while maintaining the topological dependencies. Experiments on 21 real world circuits achieve a new SOTA R2 of 0.93 for slack prediction, significantly surpassing the 0.59 of the previous SOTA method. Code will be available at: https://github.com/Thinklab-SJTU/EDA-AI.	翻訳日:2024-03-05 23:16:27 公開日:2024-02-27
# 大規模言語モデルが生成した科学的コンテンツの深層学習による検出法 Deep Learning Detection Method for Large Language Models-Generated Scientific Content ( http://arxiv.org/abs/2403.00828v1 ) ライセンス: Link先を確認	Bushra Alhijawi, Rawan Jarrar, Aseel AbuAlRub, and Arwa Bader	(参考訳) GPT-3 や BERT のような大規模言語モデル (LLM) は、テキストの書き方や伝え方を再定義する。これらのモデルは、人間が書いたものと区別できない科学的コンテンツを生成する可能性がある。したがって、LLMは出版物の完全性と信頼性に依存する科学界に深刻な結果をもたらす。本稿では,ChatGPTが生成した科学テキストを検出する新しい手法であるAI-Catcherを提案する。 AI-Catcherは、2つのディープラーニングモデル、多層パーセプトロン(MLP)と畳み込みニューラルネットワーク(CNN)を統合する。 MLPは言語的特徴と統計的特徴の特徴表現を学習する。 CNNは、テキストコンテンツからシーケンシャルパターンの高レベル表現を抽出する。 AI-Catcherは、MLPとCNNから派生した隠れパターンを融合するマルチモーダルモデルである。さらに、AI生成テキスト検出ツールを強化するために、新たなChatGPT生成科学テキストデータセットAIGTxtが収集される。 AIGTxtには10のドメインにまたがる学術論文から収集された3000のレコードが含まれており、人間が書いたテキスト、ChatGPT生成テキスト、混合テキストの3つのクラスに分けられている。 AI-Catcherの性能を評価するために,いくつかの実験を行った。比較結果は、AI-Catcherが人間が書いた科学的テキストとChatGPTが生成した科学的テキストを、代替手法よりも正確に区別する能力を示している。 AI-Catcherの精度は平均37.4%向上した。 Large Language Models (LLMs), such as GPT-3 and BERT, reshape how textual content is written and communicated. These models have the potential to generate scientific content that is indistinguishable from that written by humans. Hence, LLMs carry severe consequences for the scientific community, which relies on the integrity and reliability of publications. This research paper presents a novel ChatGPT-generated scientific text detection method, AI-Catcher. AI-Catcher integrates two deep learning models, multilayer perceptron (MLP) and convolutional neural networks (CNN). The MLP learns the feature representations of the linguistic and statistical features. The CNN extracts high-level representations of the sequential patterns from the textual content. AI-Catcher is a multimodal model that fuses hidden patterns derived from MLP and CNN. In addition, a new ChatGPT-Generated scientific text dataset is collected to enhance AI-generated text detection tools, AIGTxt. 
AIGTxt contains 3000 records collected from published academic articles across ten domains and divided into three classes: Human-written, ChatGPT-generated, and Mixed text. Several experiments are conducted to evaluate the performance of AI-Catcher. The comparative results demonstrate the capability of AI-Catcher to distinguish between human-written and ChatGPT-generated scientific text more accurately than alternative methods. On average, AI-Catcher improved accuracy by 37.4%.	翻訳日:2024-03-05 23:07:36 公開日:2024-02-27
# 外部プロキシメトリクスフィードバックによる言語モデルの自己洗練 Self-Refinement of Language Models from External Proxy Metrics Feedback ( http://arxiv.org/abs/2403.00827v1 ) ライセンス: Link先を確認	Keshav Ramji, Young-Suk Lee, Ramón Fernandez Astudillo, Md Arafat Sultan, Tahira Naseem, Asim Munawar, Radu Florian, Salim Roukos	(参考訳) 大規模言語モデル(LLM)では、応答を提供する際に複数の目的を満たすことが望ましいことが多い。例えば、文書接地応答生成では、エージェント応答は、与えられた文書に接地されていると同時に、ユーザのクエリに関連することが期待される。本稿では,Proxy Metric-based Self-Refinement (ProMiSe)を導入し,外部メトリクスフィードバックによって導かれる品質の重要次元に沿ってLLMが独自の初期応答を洗練し,全体的な最終応答を向上する。 ProMiSeは原則固有のプロキシメトリクスを通じて応答品質に対するフィードバックを活用し、一度に1つの原則ずつ応答を反復的に洗練する。本稿では,オープンソースの言語モデルであるFlan-T5-XXLとLlama-2-13B-ChatにProMiSeを適用し,文書接地質問応答データセットであるMultiDoc2DialとQuACでその性能を評価し,自己洗練が応答品質を向上させることを示す。さらに,ProMiSeが生成する合成対話データでLlama-2-13B-Chatを微調整することにより,ゼロショットベースラインおよび人手アノテーションデータで教師あり微調整したモデルよりも大幅に性能が向上することを示す。 It is often desirable for Large Language Models (LLMs) to capture multiple objectives when providing a response. In document-grounded response generation, for example, agent responses are expected to be relevant to a user's query while also being grounded in a given document. In this paper, we introduce Proxy Metric-based Self-Refinement (ProMiSe), which enables an LLM to refine its own initial response along key dimensions of quality guided by external metrics feedback, yielding an overall better final response. ProMiSe leverages feedback on response quality through principle-specific proxy metrics, and iteratively refines its response one principle at a time. We apply ProMiSe to open source language models Flan-T5-XXL and Llama-2-13B-Chat, to evaluate its performance on document-grounded question answering datasets, MultiDoc2Dial and QuAC, demonstrating that self-refinement improves response quality. We further show that fine-tuning Llama-2-13B-Chat on the synthetic dialogue data generated by ProMiSe yields significant performance improvements over the zero-shot baseline as well as a supervised fine-tuned model on human annotated data.	翻訳日:2024-03-05 23:07:17 公開日:2024-02-27
# LLMGuard: 安全でないLLM動作に対するガード LLMGuard: Guarding Against Unsafe LLM Behavior ( http://arxiv.org/abs/2403.00826v1 ) ライセンス: Link先を確認	Shubh Goyal, Medha Hira, Shubham Mishra, Sukriti Goyal, Arnav Goel, Niharika Dadu, Kirushikesh DB, Sameep Mehta, Nishtha Madaan	(参考訳) エンタープライズ環境でのLarge Language Models(LLM)の台頭は、新たな機会と能力をもたらすが、規制に違反し、法的懸念を持つ可能性のある、不適切な、偏見のある、誤解を招くコンテンツを生成するリスクも生じる。 LLMアプリケーションとのユーザインタラクションを監視し、特定の動作や会話トピックに対してコンテンツをフラグするツールである"LLMGuard"を提案する。 LLMGuardは検出器のアンサンブルを使っている。 Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can have legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content against specific behaviours or conversation topics. To do this robustly, LLMGuard employs an ensemble of detectors.	翻訳日:2024-03-05 23:06:55 公開日:2024-02-27
# 情報フロー経路:大規模言語モデルの自動解釈 Information Flow Routes: Automatically Interpreting Language Models at Scale ( http://arxiv.org/abs/2403.00824v1 ) ライセンス: Link先を確認	Javier Ferrando and Elena Voita	(参考訳) 情報はモデルに実装されたメカニズムを介してネットワーク内のルートによって流れる。これらのルートは、ノードがネットワーク内の操作にトークン表現とエッジに対応するグラフとして表現できる。予測毎に最も重要なノードとエッジのみを残すように、これらのグラフをトップダウンで自動的に構築します。アクティベーションパッチを頼りにしている既存のワークフローとは対照的に、私たちは属性を通じてこれを実行します。予測テンプレートを慎重に設計する人間は必要ありませんし、任意の予測のための情報フロールート(許可されたテンプレートに含まれるものだけでなく)を抽出することも可能です。結果として、特定の種類の予測、または異なるドメインに対して、モデル行動全般について話すことができる。 Llama 2 を用いて実験した結果,従来のトークンヘッドやサブワードマージヘッドなど,注意頭の役割が全体的に重要であることがわかった。次に、同じ部分のトークンを扱う場合、llama 2の動作に類似性を見出す。最後に、いくつかのモデルコンポーネントは、コーディングや多言語テキストなどのドメインに特化できることを示す。 Information flows by routes inside the network via mechanisms implemented in the model. These routes can be represented as graphs where nodes correspond to token representations and edges to operations inside the network. We automatically build these graphs in a top-down manner, for each prediction leaving only the most important nodes and edges. In contrast to the existing workflows relying on activation patching, we do this through attribution: this allows us to efficiently uncover existing circuits with just a single forward pass. Additionally, the applicability of our method is far beyond patching: we do not need a human to carefully design prediction templates, and we can extract information flow routes for any prediction (not just the ones among the allowed templates). As a result, we can talk about model behavior in general, for specific types of predictions, or different domains. We experiment with Llama 2 and show that the role of some attention heads is overall important, e.g. previous token heads and subword merging heads. Next, we find similarities in Llama 2 behavior when handling tokens of the same part of speech. Finally, we show that some model components can be specialized on domains such as coding or multilingual texts.	翻訳日:2024-03-05 23:06:39 公開日:2024-02-27
# Meta-Tasks: メタラーニング正則化の代替的考え方 Meta-Tasks: An alternative view on Meta-Learning Regularization ( http://arxiv.org/abs/2402.18599v1 ) ライセンス: Link先を確認	Mohammad Rostami, Atik Faysal, Huaxia Wang, Avimanyu Sahoo and Ryan Antle	(参考訳) ラベル付きデータの不足により、FSL(Few-shot Learning)は難しい機械学習問題である。新規タスクとトレーニングタスクの両方で効果的に一般化する能力は、FSLにとって重要な障壁である。本稿では,未ラベルのサンプルを生かしながら,トレーニングと新規タスクの両方に一般化できる新しいソリューションを提案する。この手法は,外ループを更新する前に、教師なし手法を``meta-tasks''として用いて埋め込みモデルを洗練する。実験の結果,提案手法は, より高速かつ優れた収束, より低い汎化誤差と標準偏差誤差を伴い, 新規タスクおよびトレーニングタスクにおいて良好に動作し, FSLにおける実用的応用の可能性を示している。実験の結果,提案手法はプロトタイプネットワークを3.9%上回る性能を示した。 Few-shot learning (FSL) is a challenging machine learning problem due to a scarcity of labeled data. The ability to generalize effectively on both novel and training tasks is a significant barrier to FSL. This paper proposes a novel solution that can generalize to both training and novel tasks while also utilizing unlabeled samples. The method refines the embedding model before updating the outer loop using unsupervised techniques as ``meta-tasks''. The experimental results show that our proposed method performs well on novel and training tasks, with faster and better convergence and lower generalization and standard-deviation error, indicating its potential for practical applications in FSL. The experimental results show that the proposed method outperforms prototypical networks by 3.9%.	翻訳日:2024-03-01 17:12:27 公開日:2024-02-27
# 注:進化ゲーム理論焦点情報健康:不完全な情報と繰り返しジレンマの期待値を用いたess探索法に基づく人狼ゲームによるカクテルパーティ効果 Note: Evolutionary Game Theory Focus Informational Health: The Cocktail Party Effect Through Werewolfgame under Incomplete Information and ESS Search Method Using Expected Gains of Repeated Dilemmas ( http://arxiv.org/abs/2402.18598v1 ) ライセンス: Link先を確認	Yasuko Kawahata	(参考訳) 我々は,カクテルパーティー効果による情報破壊の実態を,不完全な情報ゲームや,複数のオオカミを持つエボリューティブゲームという枠組みの中で検討する。特に,偽ニュースの公害リスクが反復的ジレンマの文脈でランダムに割当てられるという仮定の下で,各戦略選択の利得と進化安定戦略(ess)の形成過程に及ぼす効果を数学的にモデル化し分析する。我々は、ゲイン行列の構築から始まって、レプリケータ方程式を用いて進化のダイナミクスをモデル化し、essを同定し、計算過程を詳細に展開する。さらに、異なる初期条件とパラメータ設定の下でシステムの挙動を観察するために数値シミュレーションを行い、偽ニュースの拡散が戦略進化に与える影響をよりよく理解する。この研究は、情報の真正性に関する現代社会の複雑な問題に対する理論的洞察を提供し、進化ゲーム理論の応用範囲を広げる。 We explore the state of information disruption caused by the cocktail party effect within the framework of non-perfect information games and evolutive games with multiple werewolves. In particular, we mathematically model and analyze the effects on the gain of each strategy choice and the formation process of evolutionary stable strategies (ESS) under the assumption that the pollution risk of fake news is randomly assigned in the context of repeated dilemmas. We will develop the computational process in detail, starting with the construction of the gain matrix, modeling the evolutionary dynamics using the replicator equation, and identifying the ESS. In addition, numerical simulations will be performed to observe system behavior under different initial conditions and parameter settings to better understand the impact of the spread of fake news on strategy evolution. This research will provide theoretical insights into the complex issues of contemporary society regarding the authenticity of information and expand the range of applications of evolutionary game theory.	翻訳日:2024-03-01 17:12:15 公開日:2024-02-27
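The abstract above mentions modeling the evolutionary dynamics with the replicator equation and then identifying the ESS. As a rough illustration of that step only, here is a minimal sketch of replicator dynamics integrated with Euler steps. The payoff matrix is a hypothetical prisoner's-dilemma example chosen for illustration; it is not the gain matrix from the paper, and the function names are assumptions.

```python
def replicator_step(x, payoff, dt=0.01):
    """One Euler step of the replicator equation dx_i/dt = x_i * (f_i - f_avg)."""
    n = len(x)
    # Fitness of each strategy against the current population mix.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    avg = sum(x[i] * fitness[i] for i in range(n))
    new_x = [x[i] + dt * x[i] * (fitness[i] - avg) for i in range(n)]
    total = sum(new_x)  # renormalise to guard against floating-point drift
    return [v / total for v in new_x]


def simulate(x0, payoff, steps=5000, dt=0.01):
    """Iterate the replicator step from the initial population state x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x


# Hypothetical payoffs (row = own strategy, column = opponent's strategy).
PAYOFF = [
    [3.0, 0.0],  # cooperate
    [5.0, 1.0],  # defect
]

final_state = simulate([0.9, 0.1], PAYOFF)
```

With these illustrative payoffs, defection strictly dominates, so the population share of the second strategy grows toward 1 from any interior starting point, which is what one would expect of an evolutionarily stable strategy in this toy game.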
# nisqデバイス上のマルコフおよび非マルコフ単一量子ポーリチャネルの凸混合のディジタルシミュレーション Digital simulation of convex mixtures of Markovian and non-Markovian single qubit Pauli channels on NISQ devices ( http://arxiv.org/abs/2108.11343v3 ) ライセンス: Link先を確認	I J David, I Sinayskiy, and F Petruccione	(参考訳) 量子システムをシミュレートする量子アルゴリズムは、フォールトトレラント設定における古典的アルゴリズムよりも明確で証明可能な優位性を提供する。また、ノイズ中間スケール量子(NISQ)設定における量子アルゴリズムとその実装にも関心がある。これらの設定では、実験を行う際に様々なノイズ源とエラーを考慮しなければならない。近年,NISQ デバイスはオープン量子系をシミュレートするための汎用的なテストベッドとして検証され,単純な量子チャネルのシミュレートに利用されている。我々の目標は、NISQデバイス上の単一キュービットパウリチャネルの凸混合をシミュレーションするより複雑な問題を解決することである。非マルコフチャネル(m+m=nm)を生じるマルコフチャネルの混合物と、マルコフチャネル(nm+nm=m)を生じる非マルコフチャネルの混合物である。第1のケースでは、マルコフ単量子パウリチャネルの混合を考えるが、第2のケースでは、単量子パウリチャネルの特別な場合である非マルコフ単量子分離チャネルの混合を考える。現在利用可能なデバイスのトポロジと現在のデコヒーレンスレベルを考慮に入れた効率的な回路は、我々の回路で使用されるCNOTゲートの数を減らすヒューリスティックなアプローチで構築可能であることを示す。また,プロセスマトリックスを正規化して,プロセストモグラフィーが完全に正かつトレース保存(CPTP)チャネルを生成する方法を提案する。 Quantum algorithms for simulating quantum systems provide a clear and provable advantage over classical algorithms in fault-tolerant settings. There is also interest in quantum algorithms and their implementation in Noisy Intermediate Scale Quantum (NISQ) settings. In these settings, various noise sources and errors must be accounted for when executing any experiments. Recently, NISQ devices have been verified as versatile testbeds for simulating open quantum systems and have been used to simulate simple quantum channels. Our goal is to solve the more complicated problem of simulating convex mixtures of single qubit Pauli channels on NISQ devices. We consider two specific cases: mixtures of Markovian channels that result in a non-Markovian channel (M+M=nM) and mixtures of non-Markovian channels that result in a Markovian channel (nM+nM=M). For the first case, we consider mixtures of Markovian single-qubit Pauli channels; for the second case, we consider mixtures of Non-Markovian single-qubit depolarising channels, which is a special case of the single-qubit Pauli channel. 
We show that efficient circuits, which account for the topology of currently available devices and current levels of decoherence, can be constructed by heuristic approaches that reduce the number of CNOT gates used in our circuit. We also present a strategy for regularising the process matrix so that the process tomography yields a completely positive and trace-preserving (CPTP) channel.	翻訳日:2024-02-29 19:41:38 公開日:2024-02-27
# 人工手指制御におけるEMGのマルチモーダル融合と人間のグラフインテント推論のためのビジョン Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control ( http://arxiv.org/abs/2104.03893v5 ) ライセンス: Link先を確認	Mehrshad Zandigohar, Mo Han, Mohammadreza Sharif, Sezen Yagmur Gunay, Mariusz P. Furmanek, Mathew Yarossi, Paolo Bonato, Cagdas Onal, Taskin Padir, Deniz Erdogmus, Gunar Schirner	(参考訳) 目的: トランスラジアル・アンプテアの場合、ロボット義手は日常生活活動を行う能力を取り戻すことを約束する。筋電図(EMG)などの生理的信号に基づく現在の制御手法は、運動アーチファクトや筋肉疲労などによる推論結果の低下を引き起こす傾向にある。視覚センサーは環境状態に関する主要な情報源であり、実現可能で意図されたジェスチャーを推測する上で重要な役割を果たす。しかし、視覚証拠は、しばしば物体の閉塞や照明の変化などにより、自身の人工物にも影響を受けやすい。生理的および視覚的センサ計測を用いたマルチモーダルエビデンス融合は、これらのモダリティの相補的な強度による自然なアプローチである。方法:本論文では,ニューラルネットワークモデルにより処理された前腕の視線映像,眼球運動,筋電図を用いた意図推定のためのベイズ証拠融合フレームワークを提案する。我々は、手が物体に近づくと、時間関数として個人と融合のパフォーマンスを分析する。この目的のために、ニューラルネットワークコンポーネントをトレーニングするための新しいデータ処理および拡張技術を開発した。結果:本研究の結果から,核融合は,emg (81.64%非融合) と視覚的証拠 (80.5%非融合) に対して, 到達段階では13.66%, 14.8%の瞬間的把握型分類精度が向上し, 総合的核融合精度は95.3%となった。結論: 実験データ解析の結果,emgと視覚的なエビデンスは相補的な強みを示し,その結果,マルチモーダルなエビデンスの融合は,任意の時点において個々のエビデンスモダリティを上回る可能性がある。 Objective: For transradial amputees, robotic prosthetic hands promise to regain the capability to perform daily living activities. Current control methods based on physiological signals such as electromyography (EMG) are prone to yielding poor inference outcomes due to motion artifacts, muscle fatigue, and many more. Vision sensors are a major source of information about the environment state and can play a vital role in inferring feasible and intended gestures. However, visual evidence is also susceptible to its own artifacts, most often due to object occlusion, lighting changes, etc. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities. 
Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye-gaze, and EMG from the forearm processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train neural network components. Results: Our results indicate that, on average, fusion improves the instantaneous upcoming grasp type classification accuracy while in the reaching phase by 13.66% and 14.8%, relative to EMG (81.64% non-fused) and visual evidence (80.5% non-fused) individually, resulting in an overall fusion accuracy of 95.3%. Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.	翻訳日:2024-02-29 19:40:47 公開日:2024-02-27
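The abstract above describes Bayesian fusion of EMG and visual evidence for grasp-type inference. One common way to combine two per-class posteriors, assuming the modalities are conditionally independent given the grasp type, is to multiply them and divide by the class prior. The sketch below shows only that generic rule; the grasp labels and probabilities are hypothetical, and this is not claimed to be the authors' exact fusion model.

```python
def fuse_posteriors(p_emg, p_vision, prior):
    """Fuse two class posteriors under a conditional-independence assumption.

    p(c | emg, vision) is proportional to p(c | emg) * p(c | vision) / p(c).
    """
    unnormalised = {c: p_emg[c] * p_vision[c] / prior[c] for c in prior}
    z = sum(unnormalised.values())
    return {c: v / z for c, v in unnormalised.items()}


# Hypothetical grasp types and classifier outputs, for illustration only.
prior = {"power": 1 / 3, "pinch": 1 / 3, "tripod": 1 / 3}
p_emg = {"power": 0.5, "pinch": 0.3, "tripod": 0.2}
p_vision = {"power": 0.6, "pinch": 0.1, "tripod": 0.3}

fused = fuse_posteriors(p_emg, p_vision, prior)
```

When both modalities lean toward the same class, the fused posterior for that class is sharper than either input, which is the intuition behind fusion outperforming each modality alone.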
# 機械学習の自動化 - 原則から実践へ Automated Machine Learning: From Principles to Practices ( http://arxiv.org/abs/1810.13306v5 ) ライセンス: Link先を確認	Zhenqian Shen, Yongqi Zhang, Lanning Wei, Huan Zhao, Quanming Yao	(参考訳) 機械学習(ML)メソッドは急速に発展してきたが、望ましいパフォーマンスを達成するための適切なメソッドの設定と選択はますます困難で面倒である。この課題に対処するため、自動機械学習(AutoML)が登場し、データ駆動方式で与えられたタスクに対して満足いくML構成を生成することを目指している。本稿では,本トピックに関する包括的調査を行う。まずAutoMLの形式的定義から始め,二段階学習目標,学習戦略,理論的解釈などの原則を導入します。そこで我々は,検索空間,探索アルゴリズム,評価戦略という3つの主要な要因に基づいて,既存の作品の分類を設定することでAutoMLの実践を要約する。各カテゴリは代表的な手法で説明される。次に、MLパイプラインの設定、ワンショットニューラルアーキテクチャサーチ、基礎モデルとの統合など、模範的なアプリケーションによる原則とプラクティスについて説明する。最後に、AutoMLの今後の方向性を強調し、調査を締めくくる。 Machine learning (ML) methods have been developing rapidly, but configuring and selecting proper methods to achieve a desired performance is increasingly difficult and tedious. To address this challenge, automated machine learning (AutoML) has emerged, which aims to generate satisfactory ML configurations for given tasks in a data-driven way. In this paper, we provide a comprehensive survey on this topic. We begin with the formal definition of AutoML and then introduce its principles, including the bi-level learning objective, the learning strategy, and the theoretical interpretation. Then, we summarize the AutoML practices by setting up the taxonomy of existing works based on three main factors: the search space, the search algorithm, and the evaluation strategy. Each category is also explained with the representative methods. Then, we illustrate the principles and practices with exemplary applications from configuring ML pipeline, one-shot neural architecture search, and integration with foundation models. Finally, we highlight the emerging directions of AutoML and conclude the survey.	翻訳日:2024-02-29 19:40:02 公開日:2024-02-27
# 予測付きオンライン検索:パレート最適化アルゴリズムとエネルギー市場への応用 Online Search with Predictions: Pareto-optimal Algorithm and its Applications in Energy Markets ( http://arxiv.org/abs/2211.06567v2 ) ライセンス: Link先を確認	Russell Lee, Bo Sun, Mohammad Hajiesmaili, John C.S. Lui	(参考訳) 本稿では,揮発性電力市場におけるエネルギー取引の学習型アルゴリズムを開発した。基本的な問題は、不確実な時間変動価格の下で、最大の収益(最小のコスト)で$k$単位のエネルギーを売る(または買う)ことであり、競争分析の文献における古典的なオンライン検索問題とみなすことができる。最先端のアルゴリズムは、各タイムスロットで取引決定を行う際に、将来の市場価格に関する知識を前提とせず、最悪の価格シーケンスのパフォーマンスを保証することを目的としている。しかし実際には、機械学習を活用することで、将来の価格の予測が一般的になる。本稿では,オンライン検索問題に対する競合アルゴリズムの設計に機械学習による予測を取り入れることを目的とする。アルゴリズムの重要な特性は、予測が正確である場合(すなわち一貫性)に後知恵のオフラインアルゴリズムと競合する性能を達成し、予測が任意に間違っている場合(すなわち堅牢性)にも最悪の場合の保証を提供することである。提案手法は一貫性と頑健性の間のパレート最適トレードオフを実現しており、オンライン検索の他のアルゴリズムでは与えられた頑健性に対する一貫性を改善できない。さらに,電力市場における蓄電支援エネルギー取引を捉えることのできる,より一般的な在庫管理環境にオンライン検索の基本問題を拡張する。実世界のアプリケーションからのトレースを用いた経験的評価では、学習によるアルゴリズムはベンチマークアルゴリズムと比較して平均的な経験的パフォーマンスを改善しつつ、最悪の場合のパフォーマンスも改善しています。 This paper develops learning-augmented algorithms for energy trading in volatile electricity markets. The basic problem is to sell (or buy) $k$ units of energy for the highest revenue (lowest cost) over uncertain time-varying prices, which can be framed as a classic online search problem in the literature of competitive analysis. State-of-the-art algorithms assume no knowledge about future market prices when they make trading decisions in each time slot, and aim for guaranteeing the performance for the worst-case price sequence. In practice, however, predictions about future prices become commonly available by leveraging machine learning. This paper aims to incorporate machine-learned predictions to design competitive algorithms for online search problems. An important property of our algorithms is that they achieve performances competitive with the offline algorithm in hindsight when the predictions are accurate (i.e., consistency) and also provide worst-case guarantees when the predictions are arbitrarily wrong (i.e., robustness). 
The proposed algorithms achieve the Pareto-optimal trade-off between consistency and robustness, where no other algorithms for online search can improve on the consistency for a given robustness. Further, we extend the basic online search problem to a more general inventory management setting that can capture storage-assisted energy trading in electricity markets. In empirical evaluations using traces from real-world applications, our learning-augmented algorithms improve the average empirical performance compared to benchmark algorithms, while also providing improved worst-case performance.	翻訳日:2024-02-29 19:34:21 公開日:2024-02-27
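As a point of reference for the prediction-free setting that the abstract above contrasts against, the following is a minimal sketch of the classical reservation-price policy for selling a single unit when prices are known to lie in a range `[p_min, p_max]`. It is not the learning-augmented algorithm of the paper; the function names and the single-unit simplification are assumptions made for illustration only.

```python
import math


def reservation_price(p_min: float, p_max: float) -> float:
    """Classical threshold sqrt(p_min * p_max) for one-unit online selling."""
    return math.sqrt(p_min * p_max)


def sell_one_unit(prices, p_min, p_max):
    """Sell at the first price that meets the threshold.

    If no price meets the threshold, the unit is sold at the last
    observed price (a forced sale at the end of the horizon).
    Returns the (time index, price) of the sale.
    """
    threshold = reservation_price(p_min, p_max)
    for t, price in enumerate(prices):
        if price >= threshold:
            return t, price
    last = len(prices) - 1
    return last, prices[last]


if __name__ == "__main__":
    # Hypothetical price trace with known bounds 10 and 40.
    print(sell_one_unit([12.0, 18.0, 25.0, 40.0], 10.0, 40.0))
```

The threshold ignores any forecast, which is why its guarantee only covers the worst case. A learning-augmented method would additionally use a price prediction while retaining a worst-case bound.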
# データ拡張の善、悪、悪の側面:暗黙のスペクトル正規化の観点から The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective ( http://arxiv.org/abs/2210.05021v3 ) ライセンス: Link先を確認	Chi-Heng Lin, Chiraag Kaushik, Eva L. Dyer, Vidya Muthukumar	(参考訳) データ拡張(da)は、現代の機械学習のパフォーマンスを高める強力なワークホースである。コンピュータビジョンにおける翻訳やスケーリングのような特定の拡張は、伝統的に同じ分布から新しい(人工)データを生成することによって一般化を改善すると考えられている。しかし、この伝統的な視点は、トレーニングデータ分布を大きく変える現代の機械学習(ランダム化マスキング、カットアウト、ミックスアップなど)における一般的な拡張の成功を説明できない。本研究では,DAの一般クラスが過度パラメータ化および過度パラメータ化線形モデル一般化に与える影響を特徴付ける新しい理論フレームワークを開発する。 daは2つの異なる効果の組み合わせによって暗黙のスペクトル正規化を誘導する。 a)データ共分散行列の固有値の相対比率を訓練データに依存して操作すること b) リッジ回帰によるデータ共分散行列のスペクトル全体を均一に増加させる。これらの効果は、一般的な拡張に適用すると、過小パラメータと過小パラメータのレジームの一般化における不一致や、回帰と分類のタスクの違いなど、幅広い現象をもたらす。本フレームワークは,DAの一般化に対する微妙な影響と,時として驚くべき影響を強調し,新しい拡張設計のためのテストベッドとして機能する。 Data augmentation (DA) is a powerful workhorse for bolstering performance in modern machine learning. Specific augmentations like translations and scaling in computer vision are traditionally believed to improve generalization by generating new (artificial) data from the same distribution. However, this traditional viewpoint does not explain the success of prevalent augmentations in modern machine learning (e.g. randomized masking, cutout, mixup), that greatly alter the training data distribution. In this work, we develop a new theoretical framework to characterize the impact of a general class of DA on underparameterized and overparameterized linear model generalization. Our framework reveals that DA induces implicit spectral regularization through a combination of two distinct effects: a) manipulating the relative proportion of eigenvalues of the data covariance matrix in a training-data-dependent manner, and b) uniformly boosting the entire spectrum of the data covariance matrix through ridge regression. 
These effects, when applied to popular augmentations, give rise to a wide variety of phenomena, including discrepancies in generalization between over-parameterized and under-parameterized regimes and differences between regression and classification tasks. Our framework highlights the nuanced and sometimes surprising impacts of DA on generalization, and serves as a testbed for novel augmentation design.	翻訳日:2024-02-29 19:33:11 公開日:2024-02-27
# nora:高連結ハミルトニアンの体積則エンタングル平衡状態に対するテンソルネットワーク ansatz NoRA: A Tensor Network Ansatz for Volume-Law Entangled Equilibrium States of Highly Connected Hamiltonians ( http://arxiv.org/abs/2303.16946v4 ) ライセンス: Link先を確認	Valérie Bettaque, Brian Swingle	(参考訳) 平均場量子スピングラスモデルやSachdev-Ye-Kitaev(SYK)モデルのような全対全相互作用を持つ量子モデルの基底状態構造により、体積法則の絡み合いと大きな基底状態の縮退を緩和できるテンソルネットワークアーキテクチャを提案する。このアーキテクチャを非局所再正規化 ansatz (nora) と呼ぶのは、mera、dmera、分岐 meraネットワークの一般化であり、空間的局所性の制約を取り除いているからである。アーキテクチャはSYKモデルの接地空間の絡み合いや複雑さを捉えるのに十分な表現性を持っているため、適切な変分アンザッツとなるが、SYKの詳細な研究は今後の研究に任せる。さらに、テンソルがランダムクリフォードゲートである特別な場合のアーキテクチャについても検討する。ここで、アーキテクチャはランダム安定化コードのエンコーディングマップと見なすことができる。我々はSYKモデルにインスパイアされた一連の符号を導入し、高重量安定器のコストで一定速度と線形距離を選択できることを示した。また、この符号族とSYK基底空間から形成される近似符号との潜在的な類似点についてもコメントする。 Motivated by the ground state structure of quantum models with all-to-all interactions such as mean-field quantum spin glass models and the Sachdev-Ye-Kitaev (SYK) model, we propose a tensor network architecture which can accommodate volume law entanglement and a large ground state degeneracy. We call this architecture the non-local renormalization ansatz (NoRA) because it can be viewed as a generalization of MERA, DMERA, and branching MERA networks with the constraints of spatial locality removed. We argue that the architecture is potentially expressive enough to capture the entanglement and complexity of the ground space of the SYK model, thus making it a suitable variational ansatz, but we leave a detailed study of SYK to future work. We further explore the architecture in the special case in which the tensors are random Clifford gates. Here the architecture can be viewed as the encoding map of a random stabilizer code. We introduce a family of codes inspired by the SYK model which can be chosen to have constant rate and linear distance at the cost of some high weight stabilizers. We also comment on potential similarities between this code family and the approximate code formed from the SYK ground space.	翻訳日:2024-02-29 19:23:05 公開日:2024-02-27
# プライバシー保護データ生成のための差分プライベートなニューラルタンジェントカーネル Differentially Private Neural Tangent Kernels for Privacy-Preserving Data Generation ( http://arxiv.org/abs/2303.01687v2 ) ライセンス: Link先を確認	Yilin Yang, Kamil Adamczewski, Danica J. Sutherland, Xiaoxiao Li, Mijung Park	(参考訳) 差分的にプライベートなデータ生成において、最大平均差(mmd)は特に有用な距離メトリックである: 有限次元の機能で使用される場合、データの分散を一度に要約し、民営化することができる。このフレームワークにおける重要な質問は、実際のデータ分布と合成データ分布を区別するのに有用な機能と、それが高品質な合成データを生成することができるかどうかである。この研究は、$\textit{neural tangent kernels (NTKs)}$、より正確には$\textit{empirical}$ NTKs (e-NTKs) の機能の使用を検討する。おそらく驚くべきことに、トレーニングされていないe-NTK機能の表現力は、公開データを使って事前トレーニングされた知覚機能から得られる機能と同等である。その結果、いくつかの表や画像のベンチマークデータセットで示されるように、公開データに頼ることなく、他の最先端手法と比較してプライバシーと精度のトレードオフを改善することができる。 Maximum mean discrepancy (MMD) is a particularly useful distance metric for differentially private data generation: when used with finite-dimensional features it allows us to summarize and privatize the data distribution once, which we can repeatedly use during generator training without further privacy loss. An important question in this framework is, then, what features are useful to distinguish between real and synthetic data distributions, and whether those enable us to generate quality synthetic data. This work considers using the features of $\textit{neural tangent kernels (NTKs)}$, more precisely $\textit{empirical}$ NTKs (e-NTKs). We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of the features taken from pre-trained perceptual features using public data. As a result, our method improves the privacy-accuracy trade-off compared to other state-of-the-art methods, without relying on any public data, as demonstrated on several tabular and image benchmark datasets.	翻訳日:2024-02-29 19:22:12 公開日:2024-02-27
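The "summarize and privatize once" idea in the abstract above can be sketched as follows. With a finite-dimensional feature map, the squared MMD is the squared distance between mean feature vectors, so the data-side mean can be perturbed a single time and then reused. This sketch substitutes simple cosine/sine features for e-NTK features, and `noise_std` is an uncalibrated placeholder. Both are assumptions for illustration, not the paper's construction or a privacy guarantee.

```python
import math
import random


def feature_map(x, freqs):
    """Finite-dimensional features (cos and sin per frequency); a stand-in for e-NTK features."""
    feats = []
    for w in freqs:
        feats.append(math.cos(w * x))
        feats.append(math.sin(w * x))
    return feats


def mean_embedding(samples, freqs):
    """Average feature vector of a sample set."""
    total = [0.0] * (2 * len(freqs))
    for x in samples:
        for i, v in enumerate(feature_map(x, freqs)):
            total[i] += v
    return [v / len(samples) for v in total]


def privatize_once(embedding, noise_std, rng):
    """Add Gaussian noise to the data embedding a single time.

    noise_std is an assumed input; calibrating it to a privacy
    budget is outside the scope of this sketch.
    """
    return [v + rng.gauss(0.0, noise_std) for v in embedding]


def mmd_squared(emb_p, emb_q):
    """Squared MMD as the squared Euclidean distance between mean embeddings."""
    return sum((a - b) ** 2 for a, b in zip(emb_p, emb_q))


if __name__ == "__main__":
    rng = random.Random(0)
    freqs = [0.5, 1.0, 2.0]
    real = [rng.gauss(0.0, 1.0) for _ in range(200)]
    synthetic = [rng.gauss(0.5, 1.0) for _ in range(200)]
    noisy_real = privatize_once(mean_embedding(real, freqs), 0.01, rng)
    # The noisy embedding can be compared against many synthetic batches.
    print(mmd_squared(noisy_real, mean_embedding(synthetic, freqs)))
```

Because only the noisy mean embedding is touched during generator training, repeated comparisons do not access the private data again.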
# 情報理論による量子学習のための最適下界 Optimal lower bounds for Quantum Learning via Information Theory ( http://arxiv.org/abs/2301.02227v3 ) ライセンス: Link先を確認	Shima Bab Hadiashar, Ashwin Nayak, Pulkit Sinha	(参考訳) Arunachalam and de Wolf (JMLR, 2018) は、量子学習者は量子PACとAgnostic学習モデルにおける古典的なものよりも漸近的に効率が良くないことを証明した。彼らは量子状態の同定とフーリエ解析によってサンプルの複雑さの低い境界を確立した。本稿では,PACモデルと非依存モデルの両方において,情報理論的手法を用いて量子サンプル複雑性の最適下界を導出する。証明は間違いなく単純であり、同じアイデアは量子学習理論における他の問題の最適境界を導出するためにも使用できる。次に、確率論の古典的な問題であるクーポンコレクタ問題(英語版)の量子アナログ(英語版)に目を向け、pac学習の研究においても重要である。 Arunachalam, Belovs, Childs, Kothari, Rosmanis, de Wolf (TQC, 2020) は、この問題の量子サンプルの複雑さを一定要素まで特徴づけた。まず,上述した情報理論のアプローチが最適下界を導出しないことを示す。副産物として、任意の高次元の純粋な状態の自然なアンサンブルが得られ、それは(同時に)容易に区別できない。第二に、情報理論的なアプローチは問題の近似変種に対する漸近的最適境界をもたらすことを発見した。最後に,量子クーポンコレクタ問題に対するよりシャープな下界を,アンサンブルの識別性に基づく一般化されたホレボ・カルランダー境界を通じて導出する。量子クーポンコレクター問題のすべての側面は、関連するグラマー行列のスペクトルの性質に残っており、これは独立な関心を持つかもしれない。 Although a concept class may be learnt more efficiently using quantum samples as compared with classical samples in certain scenarios, Arunachalam and de Wolf (JMLR, 2018) proved that quantum learners are asymptotically no more efficient than classical ones in the quantum PAC and Agnostic learning models. They established lower bounds on sample complexity via quantum state identification and Fourier analysis. In this paper, we derive optimal lower bounds for quantum sample complexity in both the PAC and agnostic models via an information-theoretic approach. The proofs are arguably simpler, and the same ideas can potentially be used to derive optimal bounds for other problems in quantum learning theory. We then turn to a quantum analogue of the Coupon Collector problem, a classic problem from probability theory also of importance in the study of PAC learning. Arunachalam, Belovs, Childs, Kothari, Rosmanis, and de Wolf (TQC, 2020) characterized the quantum sample complexity of this problem up to constant factors. 
First, we show that the information-theoretic approach mentioned above provably does not yield the optimal lower bound. As a by-product, we get a natural ensemble of pure states in arbitrarily high dimensions which are not easily (simultaneously) distinguishable, while the ensemble has close to maximal Holevo information. Second, we discover that the information-theoretic approach yields an asymptotically optimal bound for an approximation variant of the problem. Finally, we derive a sharper lower bound for the Quantum Coupon Collector problem, via the generalized Holevo-Curlander bounds on the distinguishability of an ensemble. All the aspects of the Quantum Coupon Collector problem we study rest on properties of the spectrum of the associated Gram matrix, which may be of independent interest.	翻訳日:2024-02-29 19:20:53 公開日:2024-02-27
# コード言語モデルが学んだことを理解する Towards Understanding What Code Language Models Learned ( http://arxiv.org/abs/2306.11943v2 ) ライセンス: Link先を確認	Toufique Ahmed, Dian Yu, Chengxuan Huang, Cathy Wang, Prem Devanbu, Kenji Sagae	(参考訳) 事前学習された言語モデルは、様々な自然言語タスクにおいて有効であるが、その能力は、言語の意味や理解を完全に学習するものではないと論じられている。言語モデルがどのような意味を学べるかを理解するために、表面周波数や共起を超越したコードの意味を捉える能力について検討する。言語的特徴の探索モデルに関するこれまでの研究とは対照的に,事前学習されたモデルについて,モデルの意味論を学習する能力の客観的かつ分かりやすい評価を可能にする設定で検討する。本稿では,そのようなモデルがコードの意味を正確に定式化しているかどうかを検討する。コードフラグメントの操作に関する実験を通じて、事前学習されたコードのモデルが、フォームの表層的特徴を超えた、コードの計算的意味論の堅牢な表現を学ぶことを示す。 Pre-trained language models are effective in a variety of natural language tasks, but it has been argued their capabilities fall short of fully learning meaning or understanding language. To understand the extent to which language models can learn some form of meaning, we investigate their ability to capture semantics of code beyond superficial frequency and co-occurrence. In contrast to previous research on probing models for linguistic features, we study pre-trained models in a setting that allows for objective and straightforward evaluation of a model's ability to learn semantics. In this paper, we examine whether such models capture the semantics of code, which is precisely and formally defined. Through experiments involving the manipulation of code fragments, we show that pre-trained models of code learn a robust representation of the computational semantics of code that goes beyond superficial features of form alone.	翻訳日:2024-02-29 19:02:14 公開日:2024-02-27
# 区間境界伝搬による認定訓練の理解 Understanding Certified Training with Interval Bound Propagation ( http://arxiv.org/abs/2306.10426v2 ) ライセンス: Link先を確認	Yuhao Mao, Mark Niklas Müller, Marc Fischer, Martin Vechev	(参考訳) 堅牢性検証の手法がより正確になるにつれて、堅牢性のあるニューラルネットワークのトレーニングがますます重要になっている。この目的のために、認定トレーニングメソッドは、堅牢性仕様よりも最悪のケース損失の上限を計算し、最適化する。皮肉なことに、不正確な間隔境界伝播(IBP)に基づく訓練法は、より正確なバウンディング法を利用する方法よりも一貫して優れている。しかし、我々はIBPを成功させるメカニズムについて理解していない。本研究は,IBP境界の密度を計測する新しい測定基準を利用して,これらのメカニズムを徹底的に検討する。まず, ディープリニアモデルでは, 初期化時の幅と深さでタイトネスが減少するが, ネットワーク幅が十分であればIBPトレーニングにより改善することを示す。そして,IBP境界の重量行列に関する十分かつ必要な条件を導出し,これらが厳密な正則化を課していることを示し,認定トレーニングにおける堅牢性と精度のトレードオフを実証的に検証した。広範囲な実験により,ReLUネットワークの理論的予測が検証され,ネットワークの性能が向上し,最先端の結果が得られた。興味深いことに、全てのIBPベースのトレーニング手法は、高い厳密性をもたらすが、高い認証性を達成するには不十分であり、必要ではない。このことは、厳密なIBP境界に必要な強い正規化を誘発しない新たなトレーニング方法の存在を示唆しており、堅牢性と標準精度の向上につながっている。 As robustness verification methods are becoming more precise, training certifiably robust neural networks is becoming ever more relevant. To this end, certified training methods compute and then optimize an upper bound on the worst-case loss over a robustness specification. Curiously, training methods based on the imprecise interval bound propagation (IBP) consistently outperform those leveraging more precise bounding methods. Still, we lack an understanding of the mechanisms making IBP so successful. In this work, we thoroughly investigate these mechanisms by leveraging a novel metric measuring the tightness of IBP bounds. We first show theoretically that, for deep linear models, tightness decreases with width and depth at initialization, but improves with IBP training, given sufficient network width. We then derive sufficient and necessary conditions on weight matrices for IBP bounds to become exact and demonstrate that these impose strong regularization, explaining the empirically observed trade-off between robustness and accuracy in certified training.
Our extensive experimental evaluation validates our theoretical predictions for ReLU networks, including that wider networks improve performance, yielding state-of-the-art results. Interestingly, we observe that while all IBP-based training methods lead to high tightness, this is neither sufficient nor necessary to achieve high certifiable robustness. This hints at the existence of new training methods that do not induce the strong regularization required for tight IBP bounds, leading to improved robustness and standard accuracy.	翻訳日:2024-02-29 19:01:36 公開日:2024-02-27
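To make the notion of IBP imprecision concrete, here is a minimal sketch of interval bound propagation through affine and ReLU layers, followed by a small two-layer linear example where the layer-by-layer box is looser than the exact output range. The helper names and the toy weights are assumptions for illustration. The ratio printed at the end is only in the spirit of a tightness measure and is not claimed to match the paper's exact definition.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate an axis-aligned box through y = W x + b.

    Positive weights take the matching end of the input interval,
    negative weights take the opposite end.
    """
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to both ends of the box."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


if __name__ == "__main__":
    # Input box [-1, 1]^2 and a toy two-layer linear network (hypothetical weights).
    w1, b1 = [[1, -1], [1, 1]], [0, 0]
    w2, b2 = [[1, 1]], [0]
    l1, u1 = affine_bounds([-1, -1], [1, 1], w1, b1)
    l2, u2 = affine_bounds(l1, u1, w2, b2)
    # The composed map is W2 W1 = [[2, 0]], whose exact range is [-2, 2].
    exact_l, exact_u = affine_bounds([-1, -1], [1, 1], [[2, 0]], [0])
    ratio = (exact_u[0] - exact_l[0]) / (u2[0] - l2[0])
    print("layer-wise box:", l2, u2, "exact box:", exact_l, exact_u, "ratio:", ratio)
```

The layer-wise box loses the correlation between the two hidden units, which is the source of the looseness that training can reduce by shaping the weights.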
# 行列積状態における有限絡み合いスケーリングからの創発的等角境界 Emergent conformal boundaries from finite-entanglement scaling in matrix product states ( http://arxiv.org/abs/2306.08163v2 ) ライセンス: Link先を確認	Rui-Zhen Huang, Long Zhang, Andreas M. Läuchli, Jutho Haegeman, Frank Verstraete, and Laurens Vanderstraeten	(参考訳) 行列積状態(mps)を用いた有限絡み合いスケーリングは、1+1次元臨界格子理論、特に創発的共形対称性を研究する上で重要な道具となっている。有限絡み合いは、臨界理論に関連する変形をもたらすと主張する。結果として、MPSから定義される二部交絡ハミルトニアンは、物理的および絡み合い境界を持つ境界共形場理論として理解することができる。物理共形境界条件を設計するためにMPSの対称性特性を利用することができる。一方、絡み合い境界はコンクリート格子モデルと関連しており、この関連する摂動の下では不変である。 Ising, Potts, and free compact boson CFTs によって記述された臨界格子モデルを用いて、交絡スペクトルの対称性と関連する変形が共形境界に与える影響を説明する。 The use of finite entanglement scaling with matrix product states (MPS) has become a crucial tool for studying 1+1d critical lattice theories, especially those with emergent conformal symmetry. We argue that finite entanglement introduces a relevant deformation in the critical theory. As a result, the bipartite entanglement Hamiltonian defined from the MPS can be understood as a boundary conformal field theory with a physical and an entanglement boundary. We are able to exploit the symmetry properties of the MPS to engineer the physical conformal boundary condition. The entanglement boundary, on the other hand, is related to the concrete lattice model and remains invariant under this relevant perturbation. Using critical lattice models described by the Ising, Potts, and free compact boson CFTs, we illustrate the influence of the symmetry and the relevant deformation on the conformal boundaries in the entanglement spectrum.	翻訳日:2024-02-29 19:00:34 公開日:2024-02-27
# プログラムセマンティックス学習のためのコード対称性の展開 Exploiting Code Symmetries for Learning Program Semantics ( http://arxiv.org/abs/2308.03312v6 ) ライセンス: Link先を確認	Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, Suman Jana	(参考訳) 本稿では,Large Language Models (LLM) にコードセマンティクスを教えることの課題に,モデルアーキテクチャにコード対称性を組み込むことで対処する。我々は,コードの対称性を意味論保存変換(semantics-preserving transformations)として定義するグループ理論的フレームワークを導入する。提案手法であるsymcは,プログラム依存グラフ上で定義される置換群からコード対称性に同値な,新しい自己着脱型を開発した。 SymCは5つのプログラム解析タスクにおいて優れた性能を示し、事前トレーニングなしでGPT-4を含む最先端のコードモデルより優れている。この結果から,コード対称性群を経由したコード構造を符号化するコードLLMが,より高速に一般化されることが示唆された。 This paper tackles the challenge of teaching code semantics to Large Language Models (LLMs) for program analysis by incorporating code symmetries into the model architecture. We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning of code semantics. Our solution, SymC, develops a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph. SymC obtains superior performance on five program analysis tasks, outperforming state-of-the-art code models, including GPT-4, without any pre-training. Our results suggest that code LLMs that encode the code structural prior via the code symmetry group generalize better and faster.	翻訳日:2024-02-29 18:50:56 公開日:2024-02-27
# 確率的勾配降下の異なる性質について On the different regimes of Stochastic Gradient Descent ( http://arxiv.org/abs/2309.10688v4 ) ライセンス: Link先を確認	Antonio Sclocchi and Matthieu Wyart	(参考訳) 現代のディープネットワークは、各ステップまたはバッチサイズで考慮されるデータ数、ステップサイズまたは学習レートが$\eta$である確率勾配降下(SGD)を用いて訓練されている。小さい$B$と大きな$\eta$の場合、SGDはパラメータの確率的進化に対応し、そのノイズ振幅は'温度'の$T\equiv \eta/B$で制御される。しかし、この記述は、十分に大きなバッチに対して$B\geq B^*$で分解するか、温度が十分に小さい場合には勾配降下(GD)を単純化する。これらのクロスオーバーの場所を理解することは、依然として中心的な課題である。本稿では,教師が指導するパーセプトロン分類モデルに対して,これらの疑問を解き,その鍵となる予測が深層ネットワークにも応用できることを示す。具体的には、3つの動的位相を分離する$b$-$\eta$ 平面の位相図を得る。 (i)温度が支配する騒音支配SGD。 (ii)大第1段支配SGD及び (iii)gd。これらの異なる位相はまた、一般化誤差の異なる状態に対応する。興味深いことに、我々の分析はバッチサイズが$B^*$分離レギュレータであることを明らかにする。 (i)および (ii) 分類問題の難しさを特徴付ける指数を用いて、トレーニングセットのサイズが$p$であるスケール。 Modern deep networks are trained with stochastic gradient descent (SGD) whose key hyperparameters are the number of data considered at each step or batch size $B$, and the step size or learning rate $\eta$. For small $B$ and large $\eta$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the ''temperature'' $T\equiv \eta/B$. Yet this description is observed to break down for sufficiently large batches $B\geq B^*$, or simplifies to gradient descent (GD) when the temperature is sufficiently small. Understanding where these cross-overs take place remains a central challenge. Here, we resolve these questions for a teacher-student perceptron classification model and show empirically that our key predictions still apply to deep networks. Specifically, we obtain a phase diagram in the $B$-$\eta$ plane that separates three dynamical phases: (i) a noise-dominated SGD governed by temperature, (ii) a large-first-step-dominated SGD and (iii) GD. These different phases also correspond to different regimes of generalization error. 
Remarkably, our analysis reveals that the batch size $B^*$ separating regimes (i) and (ii) scales with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.	翻訳日:2024-02-29 18:40:24 公開日:2024-02-27
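The role of the temperature $T \equiv \eta/B$ can be illustrated on a toy problem that is much simpler than the teacher-student perceptron studied in the paper. Assume a one-dimensional quadratic loss with unit-variance data, so that an SGD step reads `w <- (1 - eta) * w + eta * m`, where `m` is a minibatch mean with variance `1/B`. Solving the fixed-point equation for the variance gives `eta / (B * (2 - eta))`, which approaches `T / 2` for small `eta`. The sketch below only evaluates this closed form. The model and function names are assumptions for illustration.

```python
def sgd_temperature(learning_rate: float, batch_size: int) -> float:
    """Temperature T = eta / B as defined in the abstract."""
    return learning_rate / batch_size


def stationary_variance(learning_rate: float, batch_size: int) -> float:
    """Stationary variance of w in the toy 1-D quadratic model.

    Derived from V = (1 - eta)^2 * V + eta^2 / B, which assumes
    independent minibatches of unit-variance data.
    """
    eta = learning_rate
    return eta / (batch_size * (2.0 - eta))


if __name__ == "__main__":
    # Two hypothetical settings that share the same temperature.
    for eta, batch in [(0.01, 10), (0.02, 20)]:
        t = sgd_temperature(eta, batch)
        print(eta, batch, t, stationary_variance(eta, batch), t / 2.0)
```

In this toy model, settings with equal `eta / B` give nearly the same noise level as long as `eta` is small, which matches the noise-dominated picture. The breakdown at large batch sizes described in the abstract is not captured by this sketch.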
# ロングテール探索:論理ルールガイド探索によるロングテール推論知識の体系的生成 In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search ( http://arxiv.org/abs/2311.07237v2 ) ライセンス: Link先を確認	Huihan Li, Yuting Ning, Zeyi Liao, Siyuan Wang, Xiang Lorraine Li, Ximing Lu, Wenting Zhao, Faeze Brahman, Yejin Choi, Xiang Ren	(参考訳) 最先端のllmは自然言語推論のような推論タスクで人間を上回っている。 LLMを評価する最近の研究は、低確率分布、すなわち、ロングテールからの入力データに対する顕著な性能低下に注目している。そこで我々は,LLMを推論空間でより効果的に評価するために,長い尾の推論知識を含む文を体系的に生成することに焦点を当てた。まず,シンボリックルールテンプレートに基づく事実的正確かつロングテールな知識文を生成する,新しいフレームワーク論理誘導知識検索(link)を提案する。linkは,ゼロショットトリガーllmが到達できないロングテール分布のデータを効果的に生成し,事実的正確性においてゼロショットgpt4を5%上回る。さらに、LINKが生成したデータを用いて、LINT(Logic-induced-Long-Tail)というデータセットを構築し、LINTには4つのドメインにまたがる108Kの知識文が含まれている。我々は,lintを用いて詳細な分類タスクでllmをテストした結果,モデル性能が頭部分布に比べて最大5%低下することを確認した。本研究は,ロングテール分布におけるモデル評価の有用性を示し,ロングテール分布における評価データ生成に関するさらなる研究を求めるものである。 State-of-the-art LLMs outperform humans on reasoning tasks such as Natural Language Inference. Recent works evaluating LLMs note a marked performance drop on input data from the low-probability distribution, i.e., the long-tail. Therefore, we focus on systematically generating statements involving long-tail inferential knowledge for more effective evaluation of LLMs in the reasoning space. We first propose a novel framework Logic-Induced-Knowledge-Search (LINK) that generates factually correct and long-tail knowledge statements grounded on symbolic rule templates; LINK effectively generates data in the long-tail distribution that zero-shot prompted LLMs are unable to reach, and outperforms zero-shot GPT4 on factual correctness by 5%. We further use the data generated by LINK to construct a dataset Logic-Induced-Long-Tail (LINT) that can be used to evaluate downstream models on the long-tail distribution; LINT contains 108K knowledge statements spanning four domains. 
We use LINT to test LLMs on an entailment classification task and find that model performances drop by as much as 5% in the long-tail distribution compared to head distribution. Our work shows the utility of evaluating models in the long-tail distribution, and calls for more research on generating evaluation data in the long-tail distribution.	翻訳日:2024-02-29 18:34:09 公開日:2024-02-27
# ニューラル暗黙表現における単眼カメラの連続ポーズ Continuous Pose for Monocular Cameras in Neural Implicit Representation ( http://arxiv.org/abs/2311.17119v2 ) ライセンス: Link先を確認	Qi Ma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool	(参考訳) 本稿では,時間的連続的な機能として単眼カメラポーズの最適化の有効性を示す。カメラポーズは、所定の時刻を対応するカメラポーズにマッピングする暗黙のニューラル関数を使用して表現される。マッピングされたカメラポーズは、ジョイントカメラポーズ最適化が必要な下流タスクに使用される。その際、暗黙的にカメラポーズを表すネットワークパラメータが最適化される。提案手法は,(1)ノイズのあるポーズからのNeRF,(2)非同期イベントからのNeRF,(3)視覚的局所化とマッピング(vSLAM),(4)VSLAMとIMUの4つの異なる実験環境において有効である。これら4つの設定において,提案手法は比較したベースラインや最先端手法よりも性能が優れている。さらに、連続運動の仮定を用いて、ポーズの変化は実際には6度以下の自由度(DOF)を持つ多様体に存在することができる。我々はこの低DOF動作表現を \emph{intrinsic motion} と呼び、vSLAM設定でこのアプローチを使用し、カメラ追跡性能を高く評価した。 In this paper, we showcase the effectiveness of optimizing monocular camera poses as a continuous function of time. The camera poses are represented using an implicit neural function which maps the given time to the corresponding camera pose. The mapped camera poses are then used for the downstream tasks where joint camera pose optimization is also required. While doing so, the network parameters -- that implicitly represent camera poses -- are optimized. We exploit the proposed method in four diverse experimental settings, namely, (1) NeRF from noisy poses; (2) NeRF from asynchronous Events; (3) Visual Simultaneous Localization and Mapping (vSLAM); and (4) vSLAM with IMUs. In all four settings, the proposed method performs significantly better than the compared baselines and the state-of-the-art methods. Additionally, under the assumption of continuous motion, we observe that changes in pose may actually live on a manifold with fewer than 6 degrees of freedom (DOF). We call this low-DOF motion representation the \emph{intrinsic motion} and use the approach in vSLAM settings, showing impressive camera tracking performance.	翻訳日:2024-02-29 18:24:58 公開日:2024-02-27
# 合成データを用いた学習のための品質多様性生成サンプリング Quality-Diversity Generative Sampling for Learning with Synthetic Data ( http://arxiv.org/abs/2312.14369v2 ) ライセンス: Link先を確認	Allen Chang, Matthew C. Fontaine, Serena Booth, Maja J. Matarić, Stefanos Nikolaidis	(参考訳) 生成モデルは、合成トレーニングデータセットを作成することによって、実際のデータソースのサロゲートとして機能することができる。合成トレーニングデータセットを生成する際の品質と多様性の保護に注力する。バイアス発生器から得られるデータにもかかわらず、ユーザ定義測度空間を均一にサンプリングするフレームワークである品質多様性生成サンプリング(QDGS)を提案する。 QDGSはモデルに依存しないフレームワークで、生成モデルを微調整することなく、合成によって生成されたデータの多様性の尺度で品質目標を最適化する。 QDGSが生成するバランスのとれた合成データセットを用いて,まず,カラーバイアス形状データセットで学習した識別器を概念実証としてデバイアスする。顔データ合成にQDGSを適用することで、肌の色調や年齢といった所望の意味概念を駆使して、視覚特徴のブレンドを組み合わせた交叉データセットを作成する。このバランスの取れたデータを分類器のトレーニングに利用することで、顔認識ベンチマークの精度を維持しながら公平性が向上する。コードはhttps://github.com/cylumn/qd-generative-sampling。 Generative models can serve as surrogates for some real data sources by creating synthetic training datasets, but in doing so they may transfer biases to downstream tasks. We focus on protecting quality and diversity when generating synthetic training datasets. We propose quality-diversity generative sampling (QDGS), a framework for sampling data uniformly across a user-defined measure space, despite the data coming from a biased generator. QDGS is a model-agnostic framework that uses prompt guidance to optimize a quality objective across measures of diversity for synthetically generated data, without fine-tuning the generative model. Using balanced synthetic datasets generated by QDGS, we first debias classifiers trained on color-biased shape datasets as a proof-of-concept. By applying QDGS to facial data synthesis, we prompt for desired semantic concepts, such as skin tone and age, to create an intersectional dataset with a combined blend of visual features. Leveraging this balanced data for training classifiers improves fairness while maintaining accuracy on facial recognition benchmarks. Code available at: https://github.com/Cylumn/qd-generative-sampling.	翻訳日:2024-02-29 18:15:08 公開日:2024-02-27
# スケーラブルで高速なシミュレーションベース推論のための一貫性モデル Consistency Models for Scalable and Fast Simulation-Based Inference ( http://arxiv.org/abs/2312.05440v2 ) ライセンス: Link先を確認	Marvin Schmitt, Valentin Pratz, Ullrich Köthe, Paul-Christian Bürkner, Stefan T Radev	(参考訳) シミュレーションベース推論(sbi)は、ノイズデータから複雑なモデルのパラメータを正確に推測するために、より表現力のあるアルゴリズムを常に探している。本稿では,ニューラル事後推定(CMPE)のための一貫性モデルを提案する。 CMPEは、フローとフローマッチングメソッドを単一の生成アーキテクチャに正規化することの利点を組み合わせる: 本質的には、連続的な確率フローを蒸留し、推定問題の構造に合わせた制約のないアーキテクチャで、短時間の少数ショット推論を可能にする。以上の結果から,CMPEは3つの難易度低次元問題に対する最先端アルゴリズムに勝るだけでなく,高次元ベイズ雑音発生実験や多次元腫瘍スフェロイド増殖モデルにおける競合性能も有することが示された。 Simulation-based inference (SBI) is constantly in search of more expressive algorithms for accurately inferring the parameters of complex models from noisy data. We present consistency models for neural posterior estimation (CMPE), a new free-form conditional sampler for scalable, fast, and amortized SBI with generative neural networks. CMPE combines the advantages of normalizing flows and flow matching methods into a single generative architecture: It essentially distills a continuous probability flow and enables rapid few-shot inference with an unconstrained architecture that can be tailored to the structure of the estimation problem. Our empirical evaluation demonstrates that CMPE not only outperforms current state-of-the-art algorithms on three hard low-dimensional problems but also achieves competitive performance in a high-dimensional Bayesian denoising experiment and in estimating a computationally demanding multi-scale model of tumor spheroid growth.	翻訳日:2024-02-29 18:11:36 公開日:2024-02-27
# アダマール門は普遍量子計算における資源状態に置き換えられない The Hadamard gate cannot be replaced by a resource state in universal quantum computation ( http://arxiv.org/abs/2312.03515v3 ) ライセンス: Link先を確認	Benjamin D. M. Jones, Noah Linden and Paul Skrzypczyk	(参考訳) 固定資源の量子状態上で実行される演算を含む量子計算のモデルを考える。このパラダイムに適合する例としては、マジックステートインジェクションと測定ベースのアプローチがある。これらのケースを両方組み込んだフレームワークを導入し、アダマール門の例に示すように、この文脈におけるコヒーレンス(あるいは重ね合わせ)の役割に焦点を当てる。不整合ユニタリ(CNOT、対角ゲートなど計算基底状態から重ね合わせを生成できないもの)、古典的制御、計算基底測定、および任意の資源的な補助状態(任意の次元の)へのアクセスが与えられた場合、コヒーレントユニタリ(例えばアダマール)を非ゼロ確率で正確に実装することは不可能である。また、上記の演算と$n$ hadamardゲートの間の誘導トレース距離の下限を提供することにより、近似の場合を考える。この結果の安定性を示すために、$k$ Hadamard gatesを使用して$n>k$ Hadamard gatesを正確に実装する場合、同様のno-go結果に拡張する。 We consider models of quantum computation that involve operations performed on some fixed resourceful quantum state. Examples that fit this paradigm include magic state injection and measurement-based approaches. We introduce a framework that incorporates both of these cases and focus on the role of coherence (or superposition) in this context, as exemplified through the Hadamard gate. We prove that given access to incoherent unitaries (those that are unable to generate superposition from computational basis states, e.g. CNOT, diagonal gates), classical control, computational basis measurements, and any resourceful ancillary state (of arbitrary dimension), it is not possible to implement any coherent unitary (e.g. Hadamard) exactly with non-zero probability. We also consider the approximate case by providing lower bounds for the induced trace distance between the above operations and $n$ Hadamard gates. To demonstrate the stability of this result, this is then extended to a similar no-go result for the case of using $k$ Hadamard gates to exactly implement $n>k$ Hadamard gates.	翻訳日:2024-02-29 18:11:15 公開日:2024-02-27
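The distinction between incoherent and coherent unitaries used in the abstract above can be checked mechanically for small matrices. A unitary is incoherent, in the sense described there, when it maps each computational basis state to a single basis state up to a phase, which means every column has exactly one non-zero entry. The sketch below tests this property for CNOT, a diagonal T gate, and the Hadamard gate. The helper name and tolerance are assumptions for illustration.

```python
import cmath
import math


def is_incoherent(unitary, tol=1e-12):
    """Return True if every column has exactly one non-negligible entry.

    Such a matrix sends each computational basis state to one basis
    state (possibly with a phase), so it creates no superposition.
    """
    n = len(unitary)
    for col in range(n):
        nonzero = sum(1 for row in range(n) if abs(unitary[row][col]) > tol)
        if nonzero != 1:
            return False
    return True


S = 1.0 / math.sqrt(2.0)
HADAMARD = [[S, S], [S, -S]]
T_GATE = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    for name, gate in [("CNOT", CNOT), ("T", T_GATE), ("Hadamard", HADAMARD)]:
        print(name, "incoherent:", is_incoherent(gate))
```

The check only classifies gates. The no-go result itself, that no resource state lets incoherent operations implement a Hadamard exactly, is a statement proved in the paper and is not demonstrated by this sketch.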
# 量子非局所性:自然はどのように行うのか? Quantum Nonlocality: how does Nature do it? ( http://arxiv.org/abs/2402.00725v2 ) ライセンス: Link先を確認	Marian Kupczynski	(参考訳) science誌のニコラス・ギシンは、量子相関は外部の時空から生じると主張した。それらは時空対称性によるものであると説明する。本稿は,最近の多くの論文に見られるメタフィジカルな結論の批判的レビューである。文脈性、アインシュタイン因果性、世界対称性の重要性を主張する。ベル検定は局所的な隠れ変数モデルによって与えられる確率的結合を拒絶するだけを許すが、量子非局所性や互いに知っている物体について、たとえ大きな距離で分離されたとしても、メタフィジカルな推測を正当化しない。物理学や認知科学におけるベルの不平等の違反は、ボーアの文脈性の概念を用いて説明できる。様々な実験的な文脈を記述する文脈変数が確率モデルに正しく組み込まれている場合、ベルとchshの不等式は証明できず、非局所相関は直感的に説明できる。我々はまた、統計的独立の仮定の意味を誤って「自由選択」、「測定独立」、「陰謀なし」と解き明かす。相関は因果関係を示唆しないので、統計的独立の違反は文脈性と呼ばれ、実験者の選択の自由を制限するものではない。したがって、信念に反して、選択ループの自由を閉じることは文脈的ループを閉じるものではない。 In his article in Science, Nicolas Gisin claimed that quantum correlations emerge from outside space time. We explain that they are due to space time symmetries. This paper is a critical review of metaphysical conclusions found in many recent articles. It advocates the importance of contextuality, Einstein causality and global symmetries. Bell tests allow only rejecting probabilistic coupling provided by a local hidden variable model, but they do not justify metaphysical speculations about quantum nonlocality and objects which know about each other's state, even when separated by large distances. The violation of Bell inequalities in physics and in cognitive science can be explained using the notion of Bohr contextuality. If contextual variables, describing varying experimental contexts, are correctly incorporated into a probabilistic model, then the Bell and CHSH inequalities cannot be proven and nonlocal correlations may be explained in an intuitive way. We also elucidate the meaning of statistical independence assumption incorrectly called free choice, measurement independence or no conspiracy. Since correlation does not imply causation, the violation of statistical independence should be called contextuality and it does not restrict the experimenter's freedom of choice.
Therefore, contrary to what is believed, closing the freedom of choice loophole does not close the contextuality loophole.	翻訳日:2024-02-29 18:04:57 公開日:2024-02-27
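For readers who want the numbers behind the CHSH inequality mentioned above, the following sketch evaluates the standard CHSH combination using the textbook singlet-state correlation `E(a, b) = -cos(a - b)` for measurement angles in a plane. It reproduces the well-known quantum value of magnitude `2 * sqrt(2)`, compared with the bound of 2 that holds for local hidden variable models. The angle choices are the usual textbook ones. The sketch takes no position on the interpretive questions discussed in the paper.

```python
import math


def singlet_correlation(angle_a: float, angle_b: float) -> float:
    """Textbook correlation for spin measurements on a singlet state."""
    return -math.cos(angle_a - angle_b)


def chsh_value(a: float, a_prime: float, b: float, b_prime: float) -> float:
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return (
        singlet_correlation(a, b)
        - singlet_correlation(a, b_prime)
        + singlet_correlation(a_prime, b)
        + singlet_correlation(a_prime, b_prime)
    )


if __name__ == "__main__":
    s = chsh_value(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
    print("abs(S) =", abs(s), "local bound = 2")
```

The bound of 2 relies on a single joint probability space for all four settings, which is the assumption the paper argues fails once setting-dependent contextual variables are included.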
# 空洞支援量子メモリにおける光子の高速貯蔵 Fast storage of photons in cavity-assisted quantum memories ( http://arxiv.org/abs/2401.17394v2 ) ライセンス: Link先を確認	Johann S. Kollath-Bönig, Luca Dellantonio, Luigi Giannelli, Tom Schmit, Giovanna Morigi and Anders S. Sørensen	(参考訳) 理想的なフォトニック量子メモリは、任意の光パルスを単位効率で保存することができる。これは、パルスがメモリの帯域幅よりも長い時間を持つ断熱的な状態で動作する必要がある。短パルスの非断熱的な状態においては、記憶は不完全であり、情報は常に失われる。光キャビティ内に閉じ込められた個々の原子、またはそのアンサンブルに基づくセットアップの帯域制限を理論的に検討する。パルスの持続時間によらず,記憶・検索プロセスの効率を最適化するための効果的な戦略を明らかにする。本プロトコルは, ほぼ完全に解析的に導出され, 数値最適化により得られたプロトコルよりも効率が良い。さらに,本研究は,いくつかのレシエーションにおける量子メモリの性能に関する理解を深めた。無限の時間間隔で定義されるパルスを考えるとき、その形は漸近的な振る舞いによって2つのカテゴリに分けられる。パルスの強度が指数関数よりも遅くなり、あるいは指数関数として増加すると、記憶効率はパルス幅によってのみ制限される。一方、有限間隔で定義されたパルスに対して、効率は記憶の開始時の形状、または検索プロセスの終了時の形状によって決定される。 Ideal photonic quantum memories can store arbitrary pulses of light with unit efficiency. This requires operating in the adiabatic regime, where pulses have a duration much longer than the bandwidth of the memory. In the non-adiabatic regime of short pulses, memories are therefore imperfect, and information is always lost. We theoretically investigate the bandwidth limitations for setups based on individual atoms, or ensembles thereof, confined inside optical cavities. We identify an effective strategy for optimizing the efficiencies of the storage and retrieval process regardless of the duration of the pulses. Our protocol is derived almost completely analytically and attains efficiencies better than or comparable to those obtained by numerical optimization. Furthermore, our results provide an improved understanding of the performance of quantum memories in several regimes. When considering pulses defined on an infinite time interval, the shapes can be divided into two categories, depending on their asymptotic behaviours. If the intensity of the pulse increases with time slower than or as an exponential function, then the storage efficiency is only limited by the pulse width.
For pulses defined on a finite interval, on the other hand, the efficiency is determined by the shape at the beginning of the storage or, correspondingly, at the end of the retrieval process.	翻訳日:2024-02-29 18:04:25 公開日:2024-02-27
# Emergent Dominance Hierarchies in Reinforcement Learning Agents ( http://arxiv.org/abs/2401.12258v4 ) License: see linked page	Ram Rachum, Yonatan Nakar, Bill Tomlinson, Nitay Alon, Reuth Mirsky	Modern Reinforcement Learning (RL) algorithms are able to outperform humans in a wide variety of tasks. Multi-agent reinforcement learning (MARL) settings present additional challenges, and successful cooperation in mixed-motive groups of agents depends on a delicate balancing act between individual and group objectives. Social conventions and norms, often inspired by human institutions, are used as tools for striking this balance. In this paper, we examine a fundamental, well-studied social convention that underlies cooperation in both animal and human societies: dominance hierarchies. We adapt the ethological theory of dominance hierarchies to artificial agents, borrowing the established terminology and definitions with as few amendments as possible. We demonstrate that populations of RL agents, operating without explicit programming or intrinsic rewards, can invent, learn, enforce, and transmit a dominance hierarchy to new populations. The dominance hierarchies that emerge have a similar structure to those studied in chickens, mice, fish, and other species.	Translated: 2024-02-29 18:03:04 Published: 2024-02-27
# A Dynamical View of the Question of Why ( http://arxiv.org/abs/2402.10240v2 ) License: see linked page	Mehdi Fatemi and Sindhu Gowda	We address causal reasoning in multivariate time series data generated by stochastic processes. Existing approaches are largely restricted to static settings, ignoring the continuity and emission of variations across time. In contrast, we propose a learning paradigm that directly establishes causation between events in the course of time. We present two key lemmas to compute causal contributions and frame them as reinforcement learning problems. Our approach offers formal and computational tools for uncovering and quantifying causal relationships in diffusion processes, subsuming various important settings such as discrete-time Markov decision processes. Finally, in fairly intricate experiments and through sheer learning, our framework reveals and quantifies causal links, which otherwise seem inexplicable.	Translated: 2024-02-29 17:54:43 Published: 2024-02-27
# Lattice Hamiltonians and Stray Interactions Within Quantum Processors ( http://arxiv.org/abs/2402.09145v2 ) License: see linked page	Xuexin Xu, Manabputra, Chloé Vignes, Mohammad H. Ansari and John Martinis	Unintended interactions between qubits, known as stray couplings, negatively impact gate operations, leading to errors. This study highlights the significance of incorporating the lattice Hamiltonian into quantum circuit design. By comparing the intensity of three-body versus two-body stray couplings, we identify non-trivial circuit parameter domains that help to enhance fidelity of two-qubit gates. Additionally, we demonstrate instances where three-body ZZZ interactions surpass two-body ZZ interactions within the parameter space relevant to quantum computing, indicating the potential use of lattice Hamiltonian for designing novel multi-qubit gates essential for advancing quantum computing technologies.	Translated: 2024-02-29 17:53:43 Published: 2024-02-27
# We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline ( http://arxiv.org/abs/2402.00868v3 ) License: see linked page	Simar Kareer, Vivek Vijaykumar, Harsh Maheshwari, Prithvijit Chattopadhyay, Judy Hoffman, Viraj Prabhu	There has been abundant work in unsupervised domain adaptation for semantic segmentation (DAS) seeking to adapt a model trained on images from a labeled source domain to an unlabeled target domain. While the vast majority of prior work has studied this as a frame-level Image-DAS problem, a few Video-DAS works have sought to additionally leverage the temporal signal present in adjacent frames. However, Video-DAS works have historically studied a distinct set of benchmarks from Image-DAS, with minimal cross-benchmarking. In this work, we address this gap. Surprisingly, we find that (1) even after carefully controlling for data and model architecture, state-of-the-art Image-DAS methods (HRDA and HRDA+MIC) outperform Video-DAS methods on established Video-DAS benchmarks (+14.5 mIoU on Viper$\rightarrow$CityscapesSeq, +19.0 mIoU on Synthia$\rightarrow$CityscapesSeq), and (2) naive combinations of Image-DAS and Video-DAS techniques only lead to marginal improvements across datasets. To avoid siloed progress between Image-DAS and Video-DAS, we open-source our codebase with support for a comprehensive set of Video-DAS and Image-DAS methods on a common benchmark. Code available at https://github.com/SimarKareer/UnifiedVideoDA	Translated: 2024-02-29 17:50:49 Published: 2024-02-27
# Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment ( http://arxiv.org/abs/2402.14968v2 ) License: see linked page	Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, Patrick McDaniel, Muhao Chen, Bo Li, Chaowei Xiao	Despite the general capabilities of Large Language Models (LLMs) like GPT-4 and Llama-2, these models still request fine-tuning or adaptation with customized data when it comes to meeting the specific business demands and intricacies of tailored use cases. However, this process inevitably introduces new safety threats, particularly against the Fine-tuning based Jailbreak Attack (FJAttack), where incorporating just a few harmful examples into the fine-tuning dataset can significantly compromise the model safety. Though potential defenses have been proposed by incorporating safety examples into the fine-tuning dataset to reduce the safety issues, such approaches require incorporating a substantial amount of safety examples, making it inefficient. To effectively defend against the FJAttack with limited safety examples, we propose a Backdoor Enhanced Safety Alignment method inspired by an analogy with the concept of backdoor attacks. In particular, we construct prefixed safety examples by integrating a secret prompt, acting as a "backdoor trigger", that is prefixed to safety examples. Our comprehensive experiments demonstrate that through the Backdoor Enhanced Safety Alignment with adding as few as 11 prefixed safety examples, the maliciously fine-tuned LLMs will achieve similar safety performance as the original aligned models. Furthermore, we also explore the effectiveness of our method in a more practical setting where the fine-tuning data consists of both FJAttack examples and the fine-tuning task data. Our method shows great efficacy in defending against FJAttack without harming the performance of fine-tuning tasks.	Translated: 2024-02-29 17:45:01 Published: 2024-02-27
# Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic ( http://arxiv.org/abs/2402.14798v2 ) License: see linked page	Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme	Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by a long-standing lack of a clear protocol for determining what valid compositional entailment is. This absence causes noisy datasets and limited performance gains by modern neuro-symbolic engines. To address these problems, we formulate a consistent and theoretically grounded approach to annotating decompositional entailment datasets, and evaluate its impact on LLM-based textual inference. We find that our resulting dataset, RDTE (Recognizing Decompositional Textual Entailment), has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets, suggesting that RDTE is a significant step forward in the long-standing problem of forming a clear protocol for discerning entailment. We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in a modern neuro-symbolic reasoning engine significantly improves results (both accuracy and proof quality) over other entailment classifier baselines, illustrating the practical benefit of this advance for textual inference.	Translated: 2024-02-29 17:44:29 Published: 2024-02-27
# Learning dynamic representations of the functional connectome in neurobiological networks ( http://arxiv.org/abs/2402.14102v2 ) License: see linked page	Luciano Dyballa, Samuel Lang, Alexandra Haslund-Gourley, Eviatar Yemini, Steven W. Zucker	The static synaptic connectivity of neuronal circuits stands in direct contrast to the dynamics of their function. As in changing community interactions, different neurons can participate actively in various combinations to effect behaviors at different times. We introduce an unsupervised approach to learn the dynamic affinities between neurons in live, behaving animals, and to reveal which communities form among neurons at different times. The inference occurs in two major steps. First, pairwise non-linear affinities between neuronal traces from brain-wide calcium activity are organized by non-negative tensor factorization (NTF). Each factor specifies which groups of neurons are most likely interacting for an inferred interval in time, and for which animals. Finally, a generative model that allows for weighted community detection is applied to the functional motifs produced by NTF to reveal a dynamic functional connectome. Since time codes the different experimental variables (e.g., application of chemical stimuli), this provides an atlas of neural motifs active during separate stages of an experiment (e.g., stimulus application or spontaneous behaviors). Results from our analysis are experimentally validated, confirming that our method is able to robustly predict causal interactions between neurons to generate behavior. Code is available at https://github.com/dyballa/dynamic-connectomes.	Translated: 2024-02-29 17:43:40 Published: 2024-02-27
# Zero-shot generalization across architectures for visual classification ( http://arxiv.org/abs/2402.14095v2 ) License: see linked page	Evan Gerritz, Luciano Dyballa, Steven W. Zucker	Generalization to unseen data is a key desideratum for deep networks, but its relation to classification accuracy is unclear. Using a minimalist vision dataset and a measure of generalizability, we show that popular networks, from deep convolutional networks (CNNs) to transformers, vary in their power to extrapolate to unseen classes both across layers and across architectures. Accuracy is not a good predictor of generalizability, and generalization varies non-monotonically with layer depth. Code is available at https://github.com/dyballa/zero-shot-generalization.	Translated: 2024-02-29 17:43:18 Published: 2024-02-27
# Reinforcement learning-assisted quantum architecture search for variational quantum algorithms ( http://arxiv.org/abs/2402.13754v2 ) License: see linked page	Akash Kundu	A significant hurdle in the noisy intermediate-scale quantum (NISQ) era is identifying functional quantum circuits. These circuits must also adhere to the constraints imposed by current quantum hardware limitations. Variational quantum algorithms (VQAs), a class of quantum-classical optimization algorithms, were developed to address these challenges in the currently available quantum devices. However, the overall performance of VQAs depends on the initialization strategy of the variational circuit, the structure of the circuit (also known as ansatz), and the configuration of the cost function. Focusing on the structure of the circuit, in this thesis, we improve the performance of VQAs by automating the search for an optimal structure for the variational circuits using reinforcement learning (RL). Within the thesis, the optimality of a circuit is determined by evaluating its depth, the overall count of gates and parameters, and its accuracy in solving the given problem. The task of automating the search for optimal quantum circuits is known as quantum architecture search (QAS). The majority of research in QAS is primarily focused on a noiseless scenario. Yet, the impact of noise on the QAS remains inadequately explored. In this thesis, we tackle the issue by introducing a tensor-based quantum circuit encoding, restrictions on environment dynamics to explore the search space of possible circuits efficiently, an episode halting scheme to steer the agent to find shorter circuits, a double deep Q-network (DDQN) with an $\epsilon$-greedy policy for better stability. The numerical experiments on noiseless and noisy quantum hardware show that in dealing with various VQAs, our RL-based QAS outperforms existing QAS. Meanwhile, the methods we propose in the thesis can be readily adapted to address a wide range of other VQAs.	Translated: 2024-02-29 17:43:06 Published: 2024-02-27
# Optimal control transport of neutral atoms in optical tweezers at finite temperature ( http://arxiv.org/abs/2402.17831v1 ) License: see linked page	Alice Pagano, Daniel Jaschke, Werner Weiss, and Simone Montangero	The transport of neutral atoms in Rydberg quantum computers is a crucial step of the initial arrangement of the grid as well as to the dynamic connectivity, recently successfully demonstrated. We study the application of optimal control and the quantum speed limit for the transport of neutral atoms in optical tweezers at finite temperatures and analyze how laser noise affects transport fidelity. Open-loop optimal control significantly enhances transport fidelity, achieving an improvement up to $89\%$ for the lowest analyzed temperature of $1\,\mu$K for a distance of three micrometers. Furthermore, we simulate how the transport fidelity behaves in release-and-capture measurements, which are realizable in the experiment to estimate transport efficiency and implement closed-loop optimal control.	Translated: 2024-02-29 17:25:53 Published: 2024-02-27
# Prediction-Powered Ranking of Large Language Models ( http://arxiv.org/abs/2402.17826v1 ) License: see linked page	Ivi Chatzi, Eleni Straitouri, Suhas Thejaswi, Manuel Gomez Rodriguez	Large language models are often ranked according to their level of alignment with human preferences -- a model is better than other models if its outputs are more frequently preferred by humans. One of the most popular ways to elicit human preferences utilizes pairwise comparisons between the outputs provided by different models to the same inputs. However, since gathering pairwise comparisons by humans is costly and time-consuming, it has become a very common practice to gather pairwise comparisons by a strong large language model -- a model strongly aligned with human preferences. Surprisingly, practitioners cannot currently measure the uncertainty that any mismatch between human and model preferences may introduce in the constructed rankings. In this work, we develop a statistical framework to bridge this gap. Given a small set of pairwise comparisons by humans and a large set of pairwise comparisons by a model, our framework provides a rank-set -- a set of possible ranking positions -- for each of the models under comparison. Moreover, it guarantees that, with a probability greater than or equal to a user-specified value, the rank-sets cover the true ranking consistent with (the distribution of) human pairwise preferences. Our framework is computationally efficient, easy to use, and does not make any assumption about the distribution of human preferences nor about the degree of alignment between the pairwise comparisons by the humans and the strong large language model.	Translated: 2024-02-29 17:25:40 Published: 2024-02-27
# Particle detectors under chronological hazard ( http://arxiv.org/abs/2402.17825v1 ) License: see linked page	Ana Alonso-Serrano, Erickson Tjoa, Luis J. Garay, Eduardo Martín-Martínez	We analyze how the presence of closed timelike curves (CTCs) characterizing a time machine can be discerned by placing a local particle detector in a region of spacetime which is causally disconnected from the CTCs. Our study shows that not only can the detector tell if there are CTCs, but also that the detector can separate topological from geometrical information and distinguish periodic spacetimes without CTCs (like the Einstein cylinder), curvature, and spacetimes with topological identifications that enable time-machines.	Translated: 2024-02-29 17:25:16 Published: 2024-02-27
# DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation ( http://arxiv.org/abs/2402.17812v1 ) License: see linked page	Sunghyeon Woo, Baeseong Park, Byeongwook Kim, Minjung Jo, Sejung Kwon, Dongsuk Jeon, and Dongsoo Lee	Training deep neural networks typically involves substantial computational costs during both forward and backward propagation. The conventional layer dropping techniques drop certain layers during training for reducing the computations burden. However, dropping layers during forward propagation adversely affects the training process by degrading accuracy. In this paper, we propose Dropping Backward Propagation (DropBP), a novel approach designed to reduce computational costs while maintaining accuracy. DropBP randomly drops layers during the backward propagation, which does not deviate forward propagation. Moreover, DropBP calculates the sensitivity of each layer to assign appropriate drop rate, thereby stabilizing the training process. DropBP is designed to enhance the efficiency of the training process with backpropagation, thereby enabling the acceleration of both full fine-tuning and parameter-efficient fine-tuning using backpropagation. Specifically, utilizing DropBP in QLoRA reduces training time by 44%, increases the convergence speed to the identical loss level by 1.5$\times$, and enables training with a 6.2$\times$ larger sequence length on a single NVIDIA-A100 80GiB GPU in LLaMA2-70B. The code is available at https://github.com/WooSunghyeon/dropbp.	Translated: 2024-02-29 17:25:05 Published: 2024-02-27
# TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space ( http://arxiv.org/abs/2402.17811v1 ) License: see linked page	Shaolei Zhang, Tian Yu, Yang Feng	Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks. However, they sometimes suffer from producing hallucinations, particularly in cases where they may generate untruthful responses despite possessing the correct knowledge. In this paper, we propose TruthX, an inference-time method to elicit the truthfulness of LLMs by editing their internal representations in truthful space. TruthX employs an auto-encoder to map LLM's representations into semantic and truthful latent spaces respectively, and applies contrastive learning to identify a truthful editing direction within the truthful space. During inference, by editing LLM's internal representations in truthful space, TruthX effectively enhances the truthfulness of LLMs. Experiments show that TruthX effectively improves the truthfulness of 13 advanced LLMs by an average of 20% on TruthfulQA benchmark. Further analyses suggest that the truthful space acquired by TruthX plays a pivotal role in controlling LLM to produce truthful or hallucinatory responses.	Translated: 2024-02-29 17:24:41 Published: 2024-02-27
# BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning ( http://arxiv.org/abs/2402.17810v1 ) License: see linked page	Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, Rui Yan	Recent research trends in computational biology have increasingly focused on integrating text and bio-entity modeling, especially in the context of molecules and proteins. However, previous efforts like BioT5 faced challenges in generalizing across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC). This paper introduces BioT5+, an extension of the BioT5 framework, tailored to enhance biological research and drug discovery. BioT5+ incorporates several novel features: integration of IUPAC names for molecular understanding, inclusion of extensive bio-text and molecule data from sources like bioRxiv and PubChem, the multi-task instruction tuning for generality across tasks, and a novel numerical tokenization technique for improved processing of numerical data. These enhancements allow BioT5+ to bridge the gap between molecular representations and their textual descriptions, providing a more holistic understanding of biological entities, and largely improving the grounded reasoning of bio-text and bio-sequences. The model is pre-trained and fine-tuned with a large number of experiments, including 3 types of problems (classification, regression, generation), 15 kinds of tasks, and 21 total benchmark datasets, demonstrating the remarkable performance and state-of-the-art results in most cases. BioT5+ stands out for its ability to capture intricate relationships in biological data, thereby contributing significantly to bioinformatics and computational biology. Our code is available at https://github.com/QizhiPei/BioT5.	Translated: 2024-02-29 17:24:22 Published: 2024-02-27
# An ICA-ensemble learning approach for prediction of UWB NLOS signals data classification ( http://arxiv.org/abs/2402.17808v1 ) License: see linked page	Jiya A. Enoch, Ilesanmi B. Oluwafemi, Francis A. Ibikunle and Olulope K. Paul	Trapped human detection in search and rescue (SAR) scenarios poses a significant challenge in pervasive computing. This study addresses this issue by leveraging machine learning techniques, given their high accuracy. However, accurate identification of trapped individuals is hindered by the curse of dimensionality and noisy data. Particularly in non-line-of-sight (NLOS) situations during catastrophic events, the curse of dimensionality may lead to blind spots due to noise and uncorrelated values in detections. This research focuses on harmonizing information through wireless communication and identifying individuals in NLOS scenarios using ultra-wideband (UWB) radar signals. Employing independent component analysis (ICA) for feature extraction, the study evaluates classification performance using ensemble algorithms on both static and dynamic datasets. The experimental results demonstrate categorization accuracies of 88.37% for static data and 87.20% for dynamic data, highlighting the effectiveness of the proposed approach. Finally, this work can help scientists and engineers make instant decisions during SAR operations.	Translated: 2024-02-29 17:23:52 Published: 2024-02-27
# Exploring Gene Regulatory Interaction Networks and predicting therapeutic molecules for Hypopharyngeal Cancer and EGFR-mutated lung adenocarcinoma ( http://arxiv.org/abs/2402.17807v1 ) License: see linked page	Abanti Bhattacharjya, Md Manowarul Islam, Md Ashraf Uddin, Md. Alamin Talukder, AKM Azad, Sunil Aryal, Bikash Kumar Paul, Wahia Tasnim, Muhammad Ali Abdulllah Almoyad, Mohammad Ali Moni	With the advent of Information technology, the Bioinformatics research field is becoming increasingly attractive to researchers and academicians. The recent development of various Bioinformatics toolkits has facilitated the rapid processing and analysis of vast quantities of biological data for human perception. Most studies focus on locating two connected diseases and making some observations to construct diverse gene regulatory interaction networks, a forerunner to general drug design for curing illness. For instance, Hypopharyngeal cancer is a disease that is associated with EGFR-mutated lung adenocarcinoma. In this study, we select EGFR-mutated lung adenocarcinoma and Hypopharyngeal cancer by finding the Lung metastases in hypopharyngeal cancer. To conduct this study, we collect Microarray datasets from GEO (Gene Expression Omnibus), an online database controlled by NCBI. Differentially expressed genes, common genes, and hub genes between the selected two diseases are detected for the succeeding move. Our research findings have suggested common therapeutic molecules for the selected diseases based on 10 hub genes with the highest interactions according to the degree topology method and the maximum clique centrality (MCC). Our suggested therapeutic molecules will be fruitful for patients with those two diseases simultaneously.	Translated: 2024-02-29 17:23:37 Published: 2024-02-27
# VAE-Regressionを用いた多モード優先材料設計 Material Microstructure Design Using VAE-Regression with Multimodal Prior ( http://arxiv.org/abs/2402.17806v1 ) ライセンス: Link先を確認	Avadhut Sardeshmukh, Sreedhar Reddy, BP Gautham, Pushpak Bhattacharyya	(参考訳) 本稿では, 計算材料科学における最重要課題である, 前方および逆構造固有結合を構築するための変分オートエンコーダ(VAE)に基づくモデルを提案する。我々のモデルはVAEと回帰を体系的に組み合わせ、回帰変数上で2段階の事前条件で2つのモデルをリンクする。回帰損失は、変分オートエンコーダの再構成損失、特性予測と再構成に関連する学習マイクロ構造特徴と合わせて最適化される。得られたモデルは, 先行予測と逆予測の両方, すなわち, 与えられた微細構造の性質の予測と, 与えられた特性を得るのに必要な微細構造の予測に使用できる。逆問題(一対多)は不適切であるため、対象特性集合に対して複数の微細構造を推定できる前に、マルチモーダルガウス混合を用いて目的関数を導出する。先行予測では,最先端のフォワードオンリーモデルと同等の精度を示す。さらに,本手法は直接逆推論を可能にする。本モデルを用いて推定した微細構造は, 適正に所望の特性を達成でき, コストのかかる最適化ループの必要性を回避できることを示した。 We propose a variational autoencoder (VAE)-based model for building forward and inverse structure-property linkages, a problem of paramount importance in computational materials science. Our model systematically combines VAE with regression, linking the two models through a two-level prior conditioned on the regression variables. The regression loss is optimized jointly with the reconstruction loss of the variational autoencoder, learning microstructure features relevant for property prediction and reconstruction. The resultant model can be used for both forward and inverse prediction i.e., for predicting the properties of a given microstructure as well as for predicting the microstructure required to obtain given properties. Since the inverse problem is ill-posed (one-to-many), we derive the objective function using a multi-modal Gaussian mixture prior enabling the model to infer multiple microstructures for a target set of properties. We show that for forward prediction, our model is as accurate as state-of-the-art forward-only models. Additionally, our method enables direct inverse inference. We show that the microstructures inferred using our model achieve desired properties reasonably accurately, avoiding the need for expensive optimization loops.	翻訳日:2024-02-29 17:23:18 公開日:2024-02-27
# グラフニューラルネットワークと演算回路 Graph Neural Networks and Arithmetic Circuits ( http://arxiv.org/abs/2402.17805v1 ) ライセンス: Link先を確認	Timon Barlag, Vivian Holzapfel, Laura Strieker, Jonni Virtema, Heribert Vollmer	(参考訳) グラフニューラルネットワーク(GNN)アーキテクチャに従うニューラルネットワークの計算能力は,集約結合型GNNや他の特定のタイプに限定されない。実数上での多様なアクティベーション関数と演算回路を用いて,GNNの表現率の正確な対応性を確立する。その結果,ネットワークの活性化関数は回路内のゲート型となる。この結果は、全ての共通アクティベーション関数に対して、一様かつ非一様に、一定深度回路とネットワークの族に対して成り立つ。 We characterize the computational power of neural networks that follow the graph neural network (GNN) architecture, not restricted to aggregate-combine GNNs or other particular types. We establish an exact correspondence between the expressivity of GNNs using diverse activation functions and arithmetic circuits over real numbers. In our results the activation function of the network becomes a gate type in the circuit. Our result holds for families of constant depth circuits and networks, both uniformly and non-uniformly, for all common activation functions.	翻訳日:2024-02-29 17:22:59 公開日:2024-02-27
# 多変量時系列による機械故障の予測--産業ケーススタディ Predicting machine failures from multivariate time series: an industrial case study ( http://arxiv.org/abs/2402.17804v1 ) ライセンス: Link先を確認	Nicolò Oreste Pinciroli Vago, Francesca Forbicini, Piero Fraternali	(参考訳) 非神経機械学習(ML)とディープラーニング(DL)モデルは、産業保守の文脈でシステム障害を予測するためにしばしば使用される。しかし、予測を行うのに使われた過去のデータ量と、予測の将来的な拡張の効果を共同で評価する研究はごくわずかである。本研究は,(1)個別に作業する産業用包装機,(2)連続的に作業する産業用血液冷凍機,(3)連続的に作業する窒素発生器の運用に関する3つのデータセットにおける故障予測に訓練されたモデルの性能に及ぼす読み取りウィンドウの大きさと予測ウィンドウの影響を評価する。この問題は、この区間で発生する障害の確率に基づいて予測ウィンドウに正のラベルを割り当てる二分分類タスクとして定式化される。 6つのアルゴリズム(ロジスティック回帰、ランダムフォレスト、サポートベクターマシン、LSTM、ConvLSTM、トランスフォーマー)を多変量テレメトリ時系列を用いて比較した。その結果,予測ウィンドウの次元が重要な役割を担い,障害に先行する多様な時間依存パターンによるデータの分類におけるDLアプローチの有効性と,障害に先行する類似パターンと反復パターンの分類におけるMLアプローチの有効性が示唆された。 Non-neural Machine Learning (ML) and Deep Learning (DL) models are often used to predict system failures in the context of industrial maintenance. However, only a few studies jointly assess the effect of varying the amount of past data used to make a prediction and the extension in the future of the forecast. This study evaluates the impact of the size of the reading window and of the prediction window on the performance of models trained to forecast failures in three data sets concerning the operation of (1) an industrial wrapping machine working in discrete sessions, (2) an industrial blood refrigerator working continuously, and (3) a nitrogen generator working continuously. The problem is formulated as a binary classification task that assigns the positive label to the prediction window based on the probability of a failure to occur in such an interval. Six algorithms (logistic regression, random forest, support vector machine, LSTM, ConvLSTM, and Transformers) are compared using multivariate telemetry time series. 
The results indicate that, in the considered scenarios, the dimension of the prediction windows plays a crucial role and highlight the effectiveness of DL approaches at classifying data with diverse time-dependent patterns preceding a failure and the effectiveness of ML approaches at classifying similar and repetitive patterns preceding a failure.	翻訳日:2024-02-29 17:22:51 公開日:2024-02-27
# 圧縮機を用いた機械の時系列解析:サーベイ Time Series Analysis in Compressor-Based Machines: A Survey ( http://arxiv.org/abs/2402.17802v1 ) ライセンス: Link先を確認	Francesca Forbicini, Nicolò Oreste Pinciroli Vago, Piero Fraternali	(参考訳) 産業と住宅の両面では、冷凍機、HVACシステム、ヒートポンプ、冷却機といった圧縮機ベースの機械は、生産と消費者のニーズを満たすために不可欠である。センサとIoT接続の拡散は、障害の検出と予測、行動シフトの識別、マシンとそのコンポーネントの運用状態の予測を可能にする監視システムの開発を支援する。本稿では, コンプレッサをベースとする機械の動作を特徴付ける多変量時系列に適用する故障検出, 故障予測, 予測, 変化点検出などのタスクに関する最近の研究について検討する。具体的には、故障検出は障害を検出し診断し、故障予測はそのような発生を予測し、予測(フォーキャスティング)はマシンの特性変数の将来的な値を予測し、変化点検出は動作状態の変化など機器の挙動の大きな変動を特定する。上記の課題に対するアプローチを特定し,分類し,採用するアルゴリズムを比較し,現在の技術水準のギャップを強調し,この分野で最も有望な研究方向について論じる。 In both industrial and residential contexts, compressor-based machines, such as refrigerators, HVAC systems, heat pumps and chillers, are essential to fulfil production and consumers' needs. The diffusion of sensors and IoT connectivity supports the development of monitoring systems able to detect and predict faults, identify behavioural shifts and forecast the operational status of machines and of their components. The focus of this paper is to survey the recent research on such tasks as Fault Detection, Fault Prediction, Forecasting and Change Point Detection applied to multivariate time series characterizing the operations of compressor-based machines. Specifically, Fault Detection detects and diagnoses faults, Fault Prediction predicts such occurrences, Forecasting anticipates the future value of characteristic variables of machines and Change Point Detection identifies significant variations in the behaviour of the appliances, such as a change in the working regime. We identify and classify the approaches to the above-mentioned tasks, compare the algorithms employed, highlight the gaps in the current state of the art and discuss the most promising future research directions in the field.	翻訳日:2024-02-29 17:22:27 公開日:2024-02-27
# ジェネレーティブAIと著作権:ダイナミックな視点 Generative AI and Copyright: A Dynamic Perspective ( http://arxiv.org/abs/2402.17801v1 ) ライセンス: Link先を確認	S. Alex Yang and Angela Huyue Zhang	(参考訳) 生成AIの急速な進歩は、クリエイティブ産業を混乱させる可能性がある。この新技術に対する大きな興奮の中で、創造産業における将来の開発と応用は、2つの著作権問題に大きく依存している。 1) 生成AIモデルの訓練に使用されてきたコンテンツの制作者に対する補償(フェアユース標準) 2) AI生成コンテンツの著作権保護の適格性(AIコピーライトビリティ) どちらの問題も学者や実践者の間で激しい議論を巻き起こしてきたが、ほとんどの分析は既存の著作権ドクトリンに対する彼らの挑戦に焦点を当てている。本稿では、これらの2つの規制問題とその相互作用の経済的影響をよりよく理解することを目的とする。内生的なコンテンツ生成とAIモデル開発を備えた動的モデルを構築することで、AI開発、AI企業の利益、クリエーターの収入、消費者福祉に対するフェアユース標準とAIコピーライトビリティの影響、そしてこれらの影響が様々な経済的および運用上の要因にどのように影響されるかを明らかにする。例えば、寛大なフェアユース(クリエイターに補償することなくデータをAIトレーニングに使用する)は、豊富なトレーニングデータが存在する場合、すべての関係者に恩恵を与えるが、そのようなデータが不足している場合には、クリエイターや消費者を傷つける可能性がある。同様に、より強力なAIコピーライトビリティ(AIコンテンツはより著作権保護を享受している)は、AI開発を妨げ、社会福祉を減らす可能性がある。私たちの分析では、これらの2つの著作権問題の間の複雑な相互作用も強調しています。例えば、既存のトレーニングデータが不足している場合、寛大なフェアユースはAIコピーライトビリティが弱い場合にのみ望ましい。我々の調査結果は、政策立案者が規制決定にダイナミックで状況に応じたアプローチを取り入れ、グローバルな規制環境の複雑さをナビゲートするビジネスリーダーに洞察を提供する必要性を浮き彫りにした。 The rapid advancement of generative AI is poised to disrupt the creative industry. Amidst the immense excitement for this new technology, its future development and applications in the creative industry hinge crucially upon two copyright issues: 1) the compensation to creators whose content has been used to train generative AI models (the fair use standard); and 2) the eligibility of AI-generated content for copyright protection (AI-copyrightability). While both issues have ignited heated debates among academics and practitioners, most analysis has focused on their challenges posed to existing copyright doctrines. In this paper, we aim to better understand the economic implications of these two regulatory issues and their interactions. 
By constructing a dynamic model with endogenous content creation and AI model development, we unravel the impacts of the fair use standard and AI-copyrightability on AI development, AI company profit, creators income, and consumer welfare, and how these impacts are influenced by various economic and operational factors. For example, while generous fair use (use data for AI training without compensating the creator) benefits all parties when abundant training data exists, it can hurt creators and consumers when such data is scarce. Similarly, stronger AI-copyrightability (AI content enjoys more copyright protection) could hinder AI development and reduce social welfare. Our analysis also highlights the complex interplay between these two copyright issues. For instance, when existing training data is scarce, generous fair use may be preferred only when AI-copyrightability is weak. Our findings underscore the need for policymakers to embrace a dynamic, context-specific approach in making regulatory decisions and provide insights for business leaders navigating the complexities of the global regulatory environment.	翻訳日:2024-02-29 17:22:08 公開日:2024-02-27
# 非対数凹分布のゼロ次サンプリング法:デノイジング拡散による準安定性の軽減 Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion ( http://arxiv.org/abs/2402.17886v1 ) ライセンス: Link先を確認	Ye He, Kevin Rojas, Molei Tao	(参考訳) 本稿では,非正規化密度の問合せに基づいて,非対数凹分布からのサンプリング問題を考察する。最初に、汎用的なモンテカルロ推定器によって近似されたスコア関数を持つデノイジング拡散過程のシミュレーションに基づいて、拡散モンテカルロ(DMC)という枠組みを記述する。 DMCはオラクルベースのメタアルゴリズムであり、オラクルはモンテカルロスコア推定器を生成するサンプルへのアクセスを想定している。次に、このオラクルの実装を拒絶サンプリングに基づいて提供し、DMCをZOD-MC(Zeroth-Order Diffusion Monte Carlo)と呼ばれる真のアルゴリズムに変換する。対象分布が対数凹である、あるいは何らかの等周不等式を満たすと仮定することなく、まず汎用フレームワーク、すなわちDMCの性能保証を構築することにより収束解析を行う。そして、ZOD-MCが所望のサンプリング精度に逆多項式依存であることを証明する。ただし、次元の呪いの影響は依然として受ける。その結果、低次元分布では、ZOD-MCは非常に効率的なサンプリング装置であり、同じくデノイジング拡散に基づくRDMCやRS-DMCを含む最新のサンプリング器よりも性能が高い。最後に,モード間の障壁が高くなる場合や非凸ポテンシャルに不連続性がある場合でも,ZOD-MCが影響を受けにくいことを実験的に示す。 This paper considers the problem of sampling from non-logconcave distribution, based on queries of its unnormalized density. It first describes a framework, Diffusion Monte Carlo (DMC), based on the simulation of a denoising diffusion process with its score function approximated by a generic Monte Carlo estimator. DMC is an oracle-based meta-algorithm, where its oracle is the assumed access to samples that generate a Monte Carlo score estimator. Then we provide an implementation of this oracle, based on rejection sampling, and this turns DMC into a true algorithm, termed Zeroth-Order Diffusion Monte Carlo (ZOD-MC). We provide convergence analyses by first constructing a general framework, i.e. a performance guarantee for DMC, without assuming the target distribution to be log-concave or satisfying any isoperimetric inequality. Then we prove that ZOD-MC admits an inverse polynomial dependence on the desired sampling accuracy, albeit still suffering from the curse of dimensionality. Consequently, for low dimensional distributions, ZOD-MC is a very efficient sampler, with performance exceeding latest samplers, including also-denoising-diffusion-based RDMC and RS-DMC. 
Last, we experimentally demonstrate the insensitivity of ZOD-MC to increasingly higher barriers between modes or discontinuity in non-convex potential.	翻訳日:2024-02-29 17:16:33 公開日:2024-02-27
# 制約付きマルコフポテンシャルゲームにおける独立学習 Independent Learning in Constrained Markov Potential Games ( http://arxiv.org/abs/2402.17885v1 ) ライセンス: Link先を確認	Philip Jordan, Anas Barakat, Niao He	(参考訳) 制約付きマルコフゲームは、エージェントの動作が制約を受けるマルチエージェント強化学習問題をモデル化するための公式な数学的枠組みを提供する。本研究では,最近導入された制約付きマルコフポテンシャルゲームに注目する。このような制約付きゲームを解くために集中型アルゴリズムが提案されているが、制約付き設定に合わせた独立した学習アルゴリズムを収束させる設計は未解決のままである。本研究では,制約付き近似ナッシュ平衡を学習するための独立ポリシー勾配アルゴリズムを提案する。各エージェントは、共有状態とともに、それぞれのアクションと報酬を観察する。最適化文献に触発された本アルゴリズムは,正規化制約セットを付加した近接点法に類似した更新を行う。各近接ステップは確率的スイッチング勾配アルゴリズムを用いて不正確に解く。特に,ターンベースのエージェント更新を必要とする集中型コーディネーション機構を必要とせずに,アルゴリズムを独立に実装できる。いくつかの技術的な制約想定条件の下では、制約付き近似ナッシュ平衡に対する収束保証を確立する。我々はその結果を説明するためにシミュレーションを行う。 Constrained Markov games offer a formal mathematical framework for modeling multi-agent reinforcement learning problems where the behavior of the agents is subject to constraints. In this work, we focus on the recently introduced class of constrained Markov Potential Games. While centralized algorithms have been proposed for solving such constrained games, the design of converging independent learning algorithms tailored for the constrained setting remains an open question. We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria: Each agent observes their own actions and rewards, along with a shared state. Inspired by the optimization literature, our algorithm performs proximal-point-like updates augmented with a regularized constraint set. Each proximal step is solved inexactly using a stochastic switching gradient algorithm. Notably, our algorithm can be implemented independently without a centralized coordination mechanism requiring turn-based agent updates. Under some technical constraint qualification conditions, we establish convergence guarantees towards constrained approximate Nash equilibria. We perform simulations to illustrate our results.	翻訳日:2024-02-29 17:16:07 公開日:2024-02-27
# BlendSQL:リレーショナル代数におけるハイブリッド質問応答を統一するスケーラブルな方言 BlendSQL: A Scalable Dialect for Unifying Hybrid Question Answering in Relational Algebra ( http://arxiv.org/abs/2402.17882v1 ) ライセンス: Link先を確認	Parker Glenn, Parag Pravin Dakle, Liang Wang, Preethi Raghavan	(参考訳) ハイブリッドな質問応答タスクのための既存のエンドツーエンドシステムの多くは、最終的な結果を得るために使用される中間的推論ステップに対してユーザが限られた制御と洞察しか持たない"prompt-and-pray"パラダイムに帰着することが多い。加えて、多くのトランスフォーマーベースのLLMのコンテキストサイズ制限のため、構造化および非構造化のコンテキスト全体がゼロショット設定で与えられたプロンプトに収まると期待するのは妥当でないことが多く、数ショット設定ではなおさらである。我々は、SQLiteのスーパーセットであるBlendSQLを紹介し、非構造化データと構造化データの両方で推論をオーケストレーションするための統合方言として機能する。マルチホップ推論を含むハイブリッドな質問応答タスクでは、分解された推論ロードマップ全体を単一の解釈可能なBlendSQLクエリにエンコードします。特に、BlendSQLは、トークンを35%減らしながら、大量のデータセットにスケールし、エンドツーエンドシステムのパフォーマンスを向上させることができることを示す。私たちのコードはhttps://github.com/parkervg/blendsqlでパッケージとしてインストールできます。 Many existing end-to-end systems for hybrid question answering tasks can often be boiled down to a "prompt-and-pray" paradigm, where the user has limited control and insight into the intermediate reasoning steps used to achieve the final result. Additionally, due to the context size limitation of many transformer-based LLMs, it is often not reasonable to expect that the full structured and unstructured context will fit into a given prompt in a zero-shot setting, let alone a few-shot setting. We introduce BlendSQL, a superset of SQLite to act as a unified dialect for orchestrating reasoning across both unstructured and structured data. For hybrid question answering tasks involving multi-hop reasoning, we encode the full decomposed reasoning roadmap into a single interpretable BlendSQL query. Notably, we show that BlendSQL can scale to massive datasets and improve the performance of end-to-end systems while using 35% fewer tokens. Our code is available and installable as a package at https://github.com/parkervg/blendsql.	翻訳日:2024-02-29 17:15:52 公開日:2024-02-27
# Jaynes-Cummings から異方性 Rabi モデルへの超対称性の旅 A supersymmetry journey from the Jaynes-Cummings to the anisotropic Rabi model ( http://arxiv.org/abs/2402.17881v1 ) ライセンス: Link先を確認	A. Kafuri and F. H. Maldonado-Villamizar and A. Moroz and B. M. Rodríguez-Lara	(参考訳) リー理論のレンズを通してJaynes-Cummingsおよびanti-Jaynes-Cummingsモデルを再検討し、対角化に対するオペレーターベースのアプローチの有効性を強調する。 $u(1 \vert 1)$ superalgebra によって提供される抽象超対称性から、実験フレーム内の具体的な固有状態とエネルギーへと、ステップを明確に並べることに集中する。さらに、$osp(2 \vert 2)$ superalgebraによって提供される下層の超対称性を持つ異方性Rabiモデルを、圧縮された参照フレームで探索し、有効なJaynes-Cummingsモデルによりそのスペクトル特性を近似することができる。最後に、一意な基底状態エネルギーを持つ等間隔の二重縮退エネルギースペクトルを示す、因子化可能な異方性Rabiモデルのためのレジームを同定する。我々の研究は、数学的物理学と実用的な量子光学を融合することを目的としており、リー理論の重要な役割を強調している。 We revisit the Jaynes-Cummings and anti-Jaynes-Cummings model through the lens of Lie theory, aiming to highlight the efficacy of an operator-based approach for diagonalization. We focus on explicitly delineating the steps from an underlying abstract supersymmetry, provided by the $u(1 \vert 1)$ superalgebra, into concrete proper states and energies in the laboratory frame. Additionally, we explore the anisotropic Rabi model possessing an underlying supersymmetry, provided by the $osp(2 \vert 2)$ superalgebra, in a squeezed reference frame, where it is possible to approximate its spectral characteristics by an effective Jaynes-Cummings model. Finally, we identify a regime for a factorizable anisotropic Rabi model, exhibiting an equally spaced, double degenerate energy spectrum with a unique ground state energy. Our work aims to merge mathematical physics with practical quantum optics, underscoring the critical role of Lie theory.	翻訳日:2024-02-29 17:15:33 公開日:2024-02-27
# 言語モデルを用いた統計モデルの自動発見 Automated Statistical Model Discovery with Language Models ( http://arxiv.org/abs/2402.17879v1 ) ライセンス: Link先を確認	Michael Y. Li, Emily B. Fox, Noah D. Goodman	(参考訳) 統計モデルの発見は、ドメイン固有のモデリング制約に従う膨大なモデル空間の探索を伴う。この領域を効率的に探索するには、モデリングと問題領域に関する人間の専門知識が必要である。大規模言語モデル(LM)のドメイン知識とプログラミング能力に動機付けられ,言語モデルによる自動統計モデル発見のための手法を提案する。 lmは、確率的プログラムとして表現される統計モデルの提案と、モデラーとして振る舞うこと、そしてそれらのモデルを批判し、ドメインエキスパートとして振る舞うことの間を繰り返す。 lmsを活用することで、モデルのドメイン固有言語を定義したり、従来のシステムの主要な制約である手作りの検索手順を設計する必要はなくなります。確率的モデリングでは,制約されたモデルの空間内を探索し,オープンな空間を探索し,自然言語制約下での古典的モデルを改善する(例えば,このモデルは生態学者に解釈できる)。提案手法は,従来のシステムの性能と一致し,人間の専門家設計モデルに匹敵するモデルを特定し,古典的モデルを解釈可能な方法で拡張する。その結果,lm駆動モデル発見の期待が浮き彫りになった。 Statistical model discovery involves a challenging search over a vast space of models subject to domain-specific modeling constraints. Efficiently searching over this space requires human expertise in modeling and the problem domain. Motivated by the domain knowledge and programming capabilities of large language models (LMs), we introduce a method for language model driven automated statistical model discovery. We cast our automated procedure within the framework of Box's Loop: the LM iterates between proposing statistical models represented as probabilistic programs, acting as a modeler, and critiquing those models, acting as a domain expert. By leveraging LMs, we do not have to define a domain-specific language of models or design a handcrafted search procedure, key restrictions of previous systems. We evaluate our method in three common settings in probabilistic modeling: searching within a restricted space of models, searching over an open-ended space, and improving classic models under natural language constraints (e.g., this model should be interpretable to an ecologist). Our method matches the performance of previous systems, identifies models on par with human expert designed models, and extends classic models in interpretable ways. Our results highlight the promise of LM driven model discovery.	
翻訳日:2024-02-29 17:15:14 公開日:2024-02-27
# 予測最大化のためのバイアスMCMCによる確率近似 Stochastic Approximation with Biased MCMC for Expectation Maximization ( http://arxiv.org/abs/2402.17870v1 ) ライセンス: Link先を確認	Samuel Gruffaz, Kyurae Kim, Alain Oliviero Durmus, Jacob R. Gardner	(参考訳) 期待最大化(EM)アルゴリズムは経験的ベイズ推論の広範な手法であるが、期待段階(Eステップ)はしばしば難解である。マルコフ連鎖モンテカルロ(MCMC)による確率近似スキームを用いることでこの問題を回避することができ、MCMC-SAEMと呼ばれるアルゴリズムが作られる。 MCMC-SAEMの理論的保証はこれまで確立されてきたが、これらの結果は漸近的に偏りのないMCMCアルゴリズムを用いる場合に限られている。実際には、MCMC-SAEMはしばしば漸近的に偏りのあるMCMCで実行される。本研究では,SAEMの漸近と非漸近をMCMCステップ,特にバイアスの影響で解析することにより,このギャップを埋める。また、漸近的に偏りのないメトロポリス調整ランゲヴィンアルゴリズム(MALA)と、漸近的に偏りのある非調整ランゲヴィンアルゴリズム(ULA)を合成データセットと実データセットで比較した数値実験を行った。実験の結果、ULAはランゲヴィンの段階的な選択に関してより安定であり、時にはより速い収束をもたらすことが示されている。 The expectation maximization (EM) algorithm is a widespread method for empirical Bayesian inference, but its expectation step (E-step) is often intractable. Employing a stochastic approximation scheme with Markov chain Monte Carlo (MCMC) can circumvent this issue, resulting in an algorithm known as MCMC-SAEM. While theoretical guarantees for MCMC-SAEM have previously been established, these results are restricted to the case where asymptotically unbiased MCMC algorithms are used. In practice, MCMC-SAEM is often run with asymptotically biased MCMC, for which the consequences are theoretically less understood. In this work, we fill this gap by analyzing the asymptotics and non-asymptotics of SAEM with biased MCMC steps, particularly the effect of bias. We also provide numerical experiments comparing the Metropolis-adjusted Langevin algorithm (MALA), which is asymptotically unbiased, and the unadjusted Langevin algorithm (ULA), which is asymptotically biased, on synthetic and real datasets. Experimental results show that ULA is more stable with respect to the choice of Langevin stepsize and can sometimes result in faster convergence.	翻訳日:2024-02-29 17:14:54 公開日:2024-02-27
# 近接力近似を超えたカシミール物理学:微分展開 Casimir Physics beyond the Proximity Force Approximation: The Derivative Expansion ( http://arxiv.org/abs/2402.17864v1 ) ライセンス: Link先を確認	César D. Fosco, Fernando C. Lombardo, Francisco D. Mazzitelli	(参考訳) 本稿では、近接力近似(PFA)を拡張するアプローチである、カシミール物理学における微分展開(DE)法について概説する。カシミール効果以外の文脈でDEを導入し動機付けした後、その領域に対応する異なる例を示す。我々は、異なる特定の幾何形状、境界条件、場の種類、量子および熱揺らぎに焦点を当てる。この方法が適用できる様々な例を提供するのに加えて、DE が適用できない具体的な例、すなわち 2 + 1 次元の完全ノイマン条件の場合について論じる。同じ例では、より現実的な境界条件が問題を回避していることを示す。また, 粒子-表面相互作用のより広い視点を提供するカシミール-ポルダー相互作用に対するDEの適用についても考察する。 We review the derivative expansion (DE) method in Casimir physics, an approach which extends the proximity force approximation (PFA). After introducing and motivating the DE in contexts other than the Casimir effect, we present different examples which correspond to that realm. We focus on different particular geometries, boundary conditions, types of fields, and quantum and thermal fluctuations. Besides providing various examples where the method can be applied, we discuss a concrete example for which the DE cannot be applied; namely, the case of perfect Neumann conditions in 2 + 1 dimensions. By the same example, we show how a more realistic type of boundary condition circumvents the problem. We also comment on the application of the DE to the Casimir-Polder interaction which provides a broader perspective on particle-surface interactions.	翻訳日:2024-02-29 17:14:30 公開日:2024-02-27
# 自然言語意味論を用いた視覚トランスフォーマー Vision Transformers with Natural Language Semantics ( http://arxiv.org/abs/2402.17863v1 ) ライセンス: Link先を確認	Young Kyung Kim, J. Matías Di Martino, Guillermo Sapiro	(参考訳) ViT(Vision Transformers)内のトークンやパッチには、自然言語処理(NLP)と異なり、基本的な意味情報がない。通常、ViTトークンは、特定の意味的コンテキストを持たない長方形のイメージパッチと関連付けられ、解釈が難しく、情報を効果的にカプセル化できない。本稿では,セグメンテーションモデルの最近の進歩を利用して新しいトークン化戦略を設計する,新しいトランスフォーマモデルSemantic Vision Transformers(sViT)を提案する。 sViTはセマンティック情報を有効に活用し、畳み込みニューラルネットワークを思わせる帰納的バイアスを生成し、トランスフォーマーの特徴である画像内のグローバルな依存関係とコンテキスト情報をキャプチャする。実際のデータセットを使用した検証を通じて、sViTはViTよりも優れており、類似や優れたパフォーマンスを維持しながら、トレーニングデータが少なくなる。さらに、sViTは、そのスケール不変なセマンティック特性により、分布外一般化と自然分布シフトに対するロバスト性において大きな優位性を示す。特にセマンティクストークンの使用はモデルの解釈性を大幅に向上させる。最後に、提案されたパラダイムはトークン(あるいはセグメント)レベルで新しい強力な拡張技術の導入を促進し、トレーニングデータの多様性と一般化能力を高める。文が単語でできているように、画像は意味オブジェクトによって形成され、提案手法はオブジェクトセグメンテーションの最近の進歩を活用し、解釈可能で堅牢な視覚変換器への重要な自然な一歩を踏み出す。 Tokens or patches within Vision Transformers (ViT) lack essential semantic information, unlike their counterparts in natural language processing (NLP). Typically, ViT tokens are associated with rectangular image patches that lack specific semantic context, making interpretation difficult and failing to effectively encapsulate information. We introduce a novel transformer model, Semantic Vision Transformers (sViT), which leverages recent progress on segmentation models to design novel tokenizer strategies. sViT effectively harnesses semantic information, creating an inductive bias reminiscent of convolutional neural networks while capturing global dependencies and contextual information within images that are characteristic of transformers. Through validation using real datasets, sViT demonstrates superiority over ViT, requiring less training data while maintaining similar or superior performance. Furthermore, sViT demonstrates significant superiority in out-of-distribution generalization and robustness to natural distribution shifts, attributed to its scale invariance semantic characteristic. 
Notably, the use of semantic tokens significantly enhances the model's interpretability. Lastly, the proposed paradigm facilitates the introduction of new and powerful augmentation techniques at the token (or segment) level, increasing training data diversity and generalization capabilities. Just as sentences are made of words, images are formed by semantic objects; our proposed methodology leverages recent progress in object segmentation and takes an important and natural step toward interpretable and robust vision transformers.	翻訳日:2024-02-29 17:14:15 公開日:2024-02-27
# REPrune:カーネル代表選考によるチャンネルのプルーニング REPrune: Channel Pruning via Kernel Representative Selection ( http://arxiv.org/abs/2402.17862v1 ) ライセンス: Link先を確認	Mincheol Park, Dongjin Kim, Cheonjun Park, Yuna Park, Gyeong Eun Gong, Won Woo Ro, Suhyun Kim	(参考訳) チャネルプルーニングは現代の畳み込みニューラルネットワーク(CNN)を加速するために広く受け入れられている。結果として得られたプルーニング済みモデルは、汎用ソフトウェアとハードウェアリソースへの即時デプロイから恩恵を受ける。しかし、特に畳み込みフィルタの単位という大きなプルーニング粒度は、CNNにスパース性を導入する方法や場所を決定する柔軟性がないため、望ましくない精度低下に繋がることが多い。本稿では,カーネルプルーニングをエミュレートし,より細かいが構造化された粒度を最大限に活用する新しいチャネルプルーニング手法であるREPruneを提案する。 REPruneは凝集クラスタリングを使用して各チャネル内の類似のカーネルを識別する。そして、最大クラスタカバレッジ問題を最適化しつつ、カーネル代表者の取り込みを最大化するフィルタを選択する。同時にトレーニング・プルーニングのパラダイムを統合することで、REPruneは従来の訓練・プルーニング・微調整の手順を避けつつ、CNNのトレーニング全体を通じて効率的でプログレッシブなプルーニングを促進する。実験結果から、REPruneは既存の手法よりもコンピュータビジョンタスクにおいて優れており、加速比と性能保持のバランスを効果的に達成できることがわかった。 Channel pruning is widely accepted to accelerate modern convolutional neural networks (CNNs). The resulting pruned model benefits from its immediate deployment on general-purpose software and hardware resources. However, its large pruning granularity, specifically at the unit of a convolution filter, often leads to undesirable accuracy drops due to the inflexibility of deciding how and where to introduce sparsity to the CNNs. In this paper, we propose REPrune, a novel channel pruning technique that emulates kernel pruning, fully exploiting the finer but structured granularity. REPrune identifies similar kernels within each channel using agglomerative clustering. Then, it selects filters that maximize the incorporation of kernel representatives while optimizing the maximum cluster coverage problem. By integrating with a simultaneous training-pruning paradigm, REPrune promotes efficient, progressive pruning throughout training CNNs, avoiding the conventional train-prune-finetune sequence. Experimental results highlight that REPrune performs better in computer vision tasks than existing methods, effectively achieving a balance between acceleration ratio and performance retention.	翻訳日:2024-02-29 17:13:47 公開日:2024-02-27
# AIアカウンタビリティインフラストラクチャを目指す - AI監査ツールのギャップと機会 Towards AI Accountability Infrastructure: Gaps and Opportunities in AI Audit Tooling ( http://arxiv.org/abs/2402.17861v1 ) ライセンス: Link先を確認	Victor Ojewale, Ryan Steed, Briana Vecchione, Abeba Birhane, and Inioluwa Deborah Raji	(参考訳) 監査は、デプロイされた人工知能(AI)システムのリスクと限界を特定するための重要なメカニズムである。しかし、AI監査の効果的な実行は、依然として信じられないほど難しい。結果として、実践者は様々なツールを使って努力を支援します。 35人のAI監査実践者とのインタビューと390のツールのランドスケープ分析に基づいて、利用可能なAI監査ツールの現在のエコシステムをマップします。標準の設定やaiシステムの評価を支援するツールが数多く存在するが、これらのツールは、実際にai監査の説明責任の目標をサポートするのに不足していることが多い。したがって、私たちは、発見から擁護まで、評価以外の将来のツール開発分野を強調し、AI監査ツールを使用する上で実践者が直面した課題を概説する。我々は、多くのAI監査実践者のニーズの全範囲を適切にサポートするリソースが不足していると結論付け、現場は単に評価のためのツールを超えて、AI説明責任のためのより包括的なインフラへと移行することを推奨する。 Audits are critical mechanisms for identifying the risks and limitations of deployed artificial intelligence (AI) systems. However, the effective execution of AI audits remains incredibly difficult. As a result, practitioners make use of various tools to support their efforts. Drawing on interviews with 35 AI audit practitioners and a landscape analysis of 390 tools, we map the current ecosystem of available AI audit tools. While there are many tools designed to assist practitioners with setting standards and evaluating AI systems, these tools often fell short of supporting the accountability goals of AI auditing in practice. We thus highlight areas for future tool development beyond evaluation -- from harms discovery to advocacy -- and outline challenges practitioners faced in their efforts to use AI audit tools. We conclude that resources are lacking to adequately support the full scope of needs for many AI audit practitioners and recommend that the field move beyond tools for just evaluation, towards more comprehensive infrastructure for AI accountability.	翻訳日:2024-02-29 17:13:27 公開日:2024-02-27
# Latent Neural PDE Solver: 偏微分方程式のための低次モデリングフレームワーク Latent Neural PDE Solver: a reduced-order modelling framework for partial differential equations ( http://arxiv.org/abs/2402.17853v1 ) ライセンス: Link先を確認	Zijie Li, Saurabh Patil, Francis Ogoke, Dule Shu, Wilson Zhen, Michael Schneier, John R. Buchanan, Jr., Amir Barati Farimani	(参考訳) ニューラルネットワークは偏微分方程式(PDE)が支配する系の数値シミュレーションを加速する可能性を示している。高次元の離散化フィールドで動作する多くの既存のニューラルネットワークサロゲートとは異なり、より粗い離散化を伴う潜在空間におけるシステムのダイナミクスを学習することを提案する。提案するフレームワーク - Latent Neural PDE Solver (LNS) において、非線形オートエンコーダは、まず、システムのフルオーダー表現をメッシュ縮約空間に投影するように訓練され、その後、このメッシュ縮約空間の将来の状態を予測するために時間モデルが訓練される。この縮約プロセスは、細かい離散化に伴う計算コストを大幅に削減することにより、時間モデルのトレーニングを簡略化する。システムパラメータの異なる単相・多相流を含む様々な種類のシステムにおいて,提案するフレームワークと他の一般的なニューラルPDEソルバの性能について検討した。フルオーダー空間で動作するニューラルPDEソルバと比較して, 競争力のある精度と効率を持つことを示す。 Neural networks have shown promising potential in accelerating the numerical simulation of systems governed by partial differential equations (PDEs). Different from many existing neural network surrogates operating on high-dimensional discretized fields, we propose to learn the dynamics of the system in the latent space with much coarser discretizations. In our proposed framework - Latent Neural PDE Solver (LNS), a non-linear autoencoder is first trained to project the full-order representation of the system onto the mesh-reduced space, then a temporal model is trained to predict the future state in this mesh-reduced space. This reduction process simplifies the training of the temporal model by greatly reducing the computational cost accompanying a fine discretization. We study the capability of the proposed framework and several other popular neural PDE solvers on various types of systems including single-phase and multi-phase flows along with varying system parameters. We showcase that it has competitive accuracy and efficiency compared to the neural PDE solver that operates on full-order space.	翻訳日:2024-02-29 17:13:09 公開日:2024-02-27
# 連続セルオートマトンにおける相境界の複雑さの探索 Looking for Complexity at Phase Boundaries in Continuous Cellular Automata ( http://arxiv.org/abs/2402.17848v1 ) ライセンス: Link先を確認	Vassilis Papadopoulos, Guilhem Doat, Arthur Renard, Clément Hongler	(参考訳) 人工生命の重要な課題の1つは、複雑な行動の出現を示すシステムを設計することである。そのような系の多くは高次元のパラメータ空間に依存しており、その小さな部分集合だけが興味深いダイナミクスを示す。連続系の場合に着目し,二相間の境界に位置するパラメータを効率的に生成できる「相遷移ファインダ(PTF)アルゴリズム」を提案する。これらの点は複雑な振る舞いを示す可能性がより高いと主張し、PTFをLeniaに適用することで、大規模な探索に十分な効率を維持しつつ、興味深い行動の頻度を2倍以上に増大させられることを示して、これを確認する。 One key challenge in Artificial Life is designing systems that display an emergence of complex behaviors. Many such systems depend on a high-dimensional parameter space, only a small subset of which displays interesting dynamics. Focusing on the case of continuous systems, we introduce the 'Phase Transition Finder' (PTF) algorithm, which can be used to efficiently generate parameters lying at the border between two phases. We argue that such points are more likely to display complex behaviors, and confirm this by applying PTF to Lenia showing it can increase the frequency of interesting behaviors more than two-fold, while remaining efficient enough for large-scale searches.	翻訳日:2024-02-29 17:12:51 公開日:2024-02-27
# 私の指示に従い、秘密を漏らせ: 検索拡張生成システムからのスケーラブルなデータ抽出 Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems ( http://arxiv.org/abs/2402.17840v1 ) ライセンス: Link先を確認	Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, Himabindu Lakkaraju	(参考訳) Retrieval-Augmented Generation (RAG)は、テスト時に外部知識を組み込むことで事前訓練されたモデルを改善し、カスタマイズされた適応を可能にする。 Retrieval-In-Context RAG Language Models (LM) におけるデータストアリークのリスクについて検討する。本稿では,攻撃者がLMの命令追従能力を悪用し,プロンプトインジェクションによって,命令調整されたLMで構築されたRAGシステムのデータストアからテキストデータを逐語的に容易に抽出できることを示す。この脆弱性は、Llama2、Mistral/Mixtral、Vicuna、SOLAR、WizardLM、Qwen1.5、Platypus2にまたがる幅広い現代のLMに存在し、モデルのサイズが大きくなるにつれて悪用されやすさが増す。我々は,本研究を実運用のRAGモデルであるGPTsに拡張し,ランダムに選択された25個のカスタムGPTsに対して,最大2つのクエリで100%の成功率でデータストアリークを引き起こす攻撃を設計する。さらに,GPTs自身が生成したわずか100個のクエリでプロンプトすることで,77,000語の書籍から41%,1,569,000語のコーパスから3%の割合でテキストデータを逐語的に抽出する。 Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context RAG Language Models (LMs). We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with instruction-tuned LMs via prompt injection. The vulnerability exists for a wide range of modern LMs that span Llama2, Mistral/Mixtral, Vicuna, SOLAR, WizardLM, Qwen1.5, and Platypus2, and the exploitability exacerbates as the model size scales up. Extending our study to production RAG models GPTs, we design an attack that can cause datastore leakage with a 100% success rate on 25 randomly selected customized GPTs with at most 2 queries, and we extract text data verbatim at a rate of 41% from a book of 77,000 words and 3% from a corpus of 1,569,000 words by prompting the GPTs with only 100 queries generated by themselves.	翻訳日:2024-02-29 17:12:38 公開日:2024-02-27
# Stable LM 2 1.6B 技術報告 Stable LM 2 1.6B Technical Report ( http://arxiv.org/abs/2402.17834v1 ) ライセンス: Link先を確認	Marco Bellagente, Jonathan Tow, Dakota Mahan, Duy Phung, Maksym Zhuravinskyi, Reshinth Adithyan, James Baicoianu, Ben Brooks, Nathan Cooper, Ashish Datta, Meng Lee, Emad Mostaque, Michael Pieler, Nikhil Pinnaparju, Paulo Rocha, Harry Saini, Hannah Teufel, Niccolo Zanichelli, Carlos Riquelme	(参考訳) 我々は,新世代の言語モデルシリーズの最初のモデルであるStableLM 2 1.6Bを紹介する。本技術報告では,StableLM 2 1.6Bのベース版および命令調整版に至るデータとトレーニング手順の詳細について述べる。両方のモデルの重みはHugging Faceを通じて公開されており、誰でもダウンロードして利用できる。レポートには、ゼロショットおよび少数ショットのベンチマーク、多言語ベンチマーク、マルチターン対話に焦点を当てたMTベンチマークなど、これらのモデルの徹底した評価が含まれている。本報告の公開時点では、StableLM 2 1.6Bは2Bパラメータ未満のオープンモデルとして、大差で最先端であった。小型であることから、多くのエッジデバイスでのスループットの測定結果も提供する。さらに、いくつかの量子化されたチェックポイントをオープンソース化し、元のモデルと比較したパフォーマンス指標を提供する。 We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including zero- and few-shot benchmarks, multilingual benchmarks, and the MT benchmark focusing on multi-turn dialogues. At the time of publishing this report, StableLM 2 1.6B was the state-of-the-art open model under 2B parameters by a significant margin. Given its appealing small size, we also provide throughput measurements on a number of edge devices. In addition, we open source several quantized checkpoints and provide their performance metrics compared to the original model.	翻訳日:2024-02-29 17:12:14 公開日:2024-02-27
# 動的回路による量子コンピューティングのスケーリング Scaling quantum computing with dynamic circuits ( http://arxiv.org/abs/2402.17833v1 ) ライセンス: Link先を確認	Almudena Carrera Vazquez, Caroline Tornow, Diego Riste, Stefan Woerner, Maika Takita, Daniel J. Egger	(参考訳) 量子コンピュータは情報を量子力学の法則で処理する。現在の量子ハードウェアはノイズが多く、情報は短時間しか保存できず、数量子ビット、すなわち平板接続で配列された量子ビットに制限されている。しかし、量子コンピューティングの多くの応用は、単一量子処理ユニット(qpu)よりも多くの量子ビットでハードウェアが提供する平面格子よりも多くの接続を必要とする。ここでは,複数のQPUにまたがる最大142キュービットの周期接続を必要とする量子状態を生成するために,エラー低減動的回路と回路切断を用いてこれらの制限を克服する。動的回路では、量子ゲートは、実行時間、すなわち量子ビットのコヒーレンス時間のごく一部で、中間回路の測定結果によって古典的に制御できる。我々のリアルタイム古典リンクは、量子ハードウェアのモジュラースケーリングを可能にする別のQPUの測定結果に基づいて、あるQPUに量子ゲートを適用することができる。さらに、誤差軽減制御フローにより、量子ビット接続とハードウェアの命令セットが向上し、量子コンピュータの汎用性が向上する。したがって、動的回路と量子モジュラリティは量子コンピュータをスケールして有用にするための鍵となる。 Quantum computers process information with the laws of quantum mechanics. Current quantum hardware is noisy, can only store information for a short time, and is limited to a few quantum bits, i.e., qubits, typically arranged in a planar connectivity. However, many applications of quantum computing require more connectivity than the planar lattice offered by the hardware on more qubits than is available on a single quantum processing unit (QPU). Here we overcome these limitations with error mitigated dynamic circuits and circuit-cutting to create quantum states requiring a periodic connectivity employing up to 142 qubits spanning multiple QPUs connected in real-time with a classical link. In a dynamic circuit, quantum gates can be classically controlled by the outcomes of mid-circuit measurements within run-time, i.e., within a fraction of the coherence time of the qubits. Our real-time classical link allows us to apply a quantum gate on one QPU conditioned on the outcome of a measurement on another QPU which enables a modular scaling of quantum hardware. Furthermore, the error mitigated control-flow enhances qubit connectivity and the instruction set of the hardware thus increasing the versatility of our quantum computers. 
Dynamic circuits and quantum modularity are thus key to scaling quantum computers and making them useful.	翻訳日:2024-02-29 17:12:04 公開日:2024-02-27
# 非圧縮計算流体力学におけるAIライブラリの利用 Using AI libraries for Incompressible Computational Fluid Dynamics ( http://arxiv.org/abs/2402.17913v1 ) ライセンス: Link先を確認	Boyang Chen, Claire E. Heaney and Christopher C. Pain	(参考訳) 近年、さまざまなコンピュータアーキテクチャ(cpu、gpu、新しいaiプロセッサなど)で人工知能(ai)に関連する計算を実行するための、高度に効率的なオープンソースライブラリの開発に力を入れている。これにより、これらのライブラリをベースとしたアルゴリズムは、異なるアーキテクチャ間で高度に効率的かつポータブルになるだけでなく、AIを使ったメソッド開発への参入障壁を大幅に単純化した。本稿では,convolutional neural networks(cnns)などのai手法を,偏微分方程式(pdes)の数値解の分野において必要とされる標準演算として再提案することにより,aiソフトウェアとハードウェアの両方のパワーを数値モデリングの分野に持ち込む新しい手法を提案する。本研究の目的は、PDEの数値解の分野に高性能、アーキテクチャ非依存、使いやすさをもたらすことである。提案手法を用いて, 対流拡散方程式, 非線型バーガース方程式, ブラフ体を過ぎる非圧縮性流れを解く。後者の場合、畳み込みニューラルネットワークは、非圧縮性制約を強制するためにマルチグリッドソルバとして使用される。提案手法は,これらの問題をAIライブラリを用いて効率的に解くことができ,暗黙的手法を用いたPDEと計算流体力学の解法開発における新たな手法を提案する。 Recently, there has been a huge effort focused on developing highly efficient open source libraries to perform Artificial Intelligence (AI) related computations on different computer architectures (for example, CPUs, GPUs and new AI processors). This has not only made the algorithms based on these libraries highly efficient and portable between different architectures, but also has substantially simplified the entry barrier to develop methods using AI. Here, we present a novel methodology to bring the power of both AI software and hardware into the field of numerical modelling by repurposing AI methods, such as Convolutional Neural Networks (CNNs), for the standard operations required in the field of the numerical solution of Partial Differential Equations (PDEs). The aim of this work is to bring the high performance, architecture agnosticism and ease of use into the field of the numerical solution of PDEs. We use the proposed methodology to solve the advection-diffusion equation, the non-linear Burgers equation and incompressible flow past a bluff body. For the latter, a convolutional neural network is used as a multigrid solver in order to enforce the incompressibility constraint. 
We show that the presented methodology can solve all these problems using repurposed AI libraries in an efficient way, and presents a new avenue to explore in the development of methods to solve PDEs and Computational Fluid Dynamics problems with implicit methods.	翻訳日:2024-02-29 17:08:13 公開日:2024-02-27
# 浅層シャドウを用いたロバストかつ効率的な量子特性学習の実証 Demonstration of Robust and Efficient Quantum Property Learning with Shallow Shadows ( http://arxiv.org/abs/2402.17911v1 ) ライセンス: Link先を確認	Hong-Ye Hu, Andi Gu, Swarnadeep Majumder, Hang Ren, Yipei Zhang, Derek S. Wang, Yi-Zhuang You, Zlatko Minev, Susanne F. Yelin, Alireza Seif	(参考訳) 量子システムから効率的に情報を抽出することは、量子情報処理タスクの主要なコンポーネントである。ランダム化測定(古典シャドウ)は、任意の量子状態の多くの特性を少ない測定回数で予測できる。ランダムな単一量子ビット測定は実験的に扱いやすく、低重みのPauli可観測量の学習に適しているが、非局所な可観測量では性能が低い。測定の前に浅いランダム量子回路を置くと、この実験的な扱いやすさを保ちつつ、高重みのPauliや忠実度のような大域的な低ランク特性を含む、低重みのPauli以外の可観測量に対しても好ましいサンプル複雑性が得られる。しかし、現実的なシナリオでは、浅い回路の層が増えるごとに量子ノイズが蓄積され、結果が偏る。これらの課題に対処するため,我々はロバスト浅層シャドウ(robust shallow shadows)プロトコルを提案する。提案プロトコルはベイズ推定を用いて実験に即したノイズモデルを学習し,後処理においてそれを緩和する。この緩和はバイアスと分散のトレードオフをもたらし、ノイズ誘起バイアスの補正は推定量の分散の増大という代償を伴う。このような分散の増大にもかかわらず、超伝導量子プロセッサ上で示すように、我々のプロトコルは、ランダムな単一量子ビット測定方式に比べて低いサンプル複雑性を維持しながら、期待値、忠実度、エンタングルメントエントロピーなどの状態特性を正確に回復する。また, ノイズがサンプル複雑性に与える影響を理論的に解析し, 浅層シャドウの深さの最適な選択がノイズ強度によってどう変化するかを示す。この理論と実験を組み合わせた分析は、ロバスト浅層シャドウプロトコルを、現在の量子コンピューティングプラットフォーム上で量子状態を特徴付けるスケーラブルでロバストかつサンプル効率の良いプロトコルとして位置づけている。 Extracting information efficiently from quantum systems is a major component of quantum information processing tasks. Randomized measurements, or classical shadows, enable predicting many properties of arbitrary quantum states using few measurements. While random single qubit measurements are experimentally friendly and suitable for learning low-weight Pauli observables, they perform poorly for nonlocal observables. Prepending a shallow random quantum circuit before measurements maintains this experimental friendliness, but also has favorable sample complexities for observables beyond low-weight Paulis, including high-weight Paulis and global low-rank properties such as fidelity. However, in realistic scenarios, quantum noise accumulated with each additional layer of the shallow circuit biases the results. To address these challenges, we propose the robust shallow shadows protocol. 
Our protocol uses Bayesian inference to learn the experimentally relevant noise model and mitigate it in postprocessing. This mitigation introduces a bias-variance trade-off: correcting for noise-induced bias comes at the cost of a larger estimator variance. Despite this increased variance, as we demonstrate on a superconducting quantum processor, our protocol correctly recovers state properties such as expectation values, fidelity, and entanglement entropy, while maintaining a lower sample complexity compared to the random single qubit measurement scheme. We also theoretically analyze the effects of noise on sample complexity and show how the optimal choice of the shallow shadow depth varies with noise strength. This combined theoretical and experimental analysis positions the robust shallow shadow protocol as a scalable, robust, and sample-efficient protocol for characterizing quantum states on current quantum computing platforms.	翻訳日:2024-02-29 17:07:50 公開日:2024-02-27
# Box It to Bind it:T2I拡散モデルにおける統一レイアウト制御と属性結合 Box It to Bind It: Unified Layout Control and Attribute Binding in T2I Diffusion Models ( http://arxiv.org/abs/2402.17910v1 ) ライセンス: Link先を確認	Ashkan Taghipour, Morteza Ghahremani, Mohammed Bennamoun, Aref Miri Rekavandi, Hamid Laga, and Farid Boussaid	(参考訳) 潜在拡散モデル(LDMs)は想像的画像を作成するのに優れているが、それらはしばしば意味的忠実さとオブジェクトが生成される場所の空間的制御の精度に欠ける。これらの欠陥に対処するために,テキスト・トゥ・イメージ(T2I)拡散モデルにおける空間制御と意味的精度を改善するための新しいトレーニング不要アプローチであるBox-it-to-Bind-it(B2B)モジュールを導入する。 B2Bは、破滅的な無視、属性バインディング、レイアウトガイダンスの3つの主要な課題をターゲットにしている。プロセスには2つの主要なステップが含まれます。一潜在符号化を調整して、オブジェクト生成を保証し、特定境界ボックス内に指示するオブジェクト生成及び ii) 属性バインディングは、生成されたオブジェクトがプロンプトで指定された属性に従属することを保証します。 B2Bは既存のT2Iモデルのプラグイン・アンド・プレイモジュールとして設計されており、重要な課題に対処する上で、モデル性能を著しく向上させる。確立されたCompBenchおよびTIFAスコアベンチマークを用いて,本手法の評価を行い,既存手法と比較して大幅な性能向上を示した。ソースコードはhttps://github.com/nextaistudio/BoxIt2BindItで公開されている。 While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module - a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three key challenges in T2I: catastrophic neglect, attribute binding, and layout guidance. The process encompasses two main steps: i) Object generation, which adjusts the latent encoding to guarantee object generation and directs it within specified bounding boxes, and ii) attribute binding, guaranteeing that generated objects adhere to their specified attributes in the prompt. B2B is designed as a compatible plug-and-play module for existing T2I models, markedly enhancing model performance in addressing the key challenges. We evaluate our technique using the established CompBench and TIFA score benchmarks, demonstrating significant performance improvements compared to existing methods. 
The source code will be made publicly available at https://github.com/nextaistudio/BoxIt2BindIt.	翻訳日:2024-02-29 17:07:16 公開日:2024-02-27
# 量子観測器を分割する方法 How to Partition a Quantum Observable ( http://arxiv.org/abs/2402.17908v1 ) ライセンス: Link先を確認	Caleb M. Webb and Charles A. Stafford	(参考訳) 我々は、基礎となるヒルベルト空間または構成空間の分割から継承される開量子系における量子可観測性の分割を示す。この分割は、一般の非局所可観測性に対する不均質連続性方程式の定義をもたらすことが示されている。この形式主義は、平衡から外れた独立量子粒子系のフォン・ノイマンエントロピーの局所進化を記述するために用いられる。重要なことは、エントロピーの局所的なゆらぎはエントロピー電流演算子によって支配され、エントロピーの絡み合いの生成はこの分割エントロピーによって測定されないことを意味する。平衡から線形に摂動する系では、このエントロピー電流は熱電流と同値であることが示され、系-保存結合は対称に分配される。最後に、カップリングの他の分割はフォン・ノイマンのエントロピーの発散に直接繋がることを示す。したがって、ヒルベルト空間分割は、熱力学の法則と一致するフォン・ノイマンエントロピーの唯一の分割であると結論付ける。 We present a partition of quantum observables in an open quantum system which is inherited from the division of the underlying Hilbert space or configuration space. It is shown that this partition leads to the definition of an inhomogeneous continuity equation for generic, non-local observables. This formalism is employed to describe the local evolution of the von Neumann entropy of a system of independent quantum particles out of equilibrium. Crucially, we find that all local fluctuations in the entropy are governed by an entropy current operator, implying that the production of entanglement entropy is not measured by this partitioned entropy. For systems linearly perturbed from equilibrium, it is shown that this entropy current is equivalent to a heat current, provided that the system-reservoir coupling is partitioned symmetrically. Finally, we show that any other partition of the coupling leads directly to a divergence of the von Neumann entropy. Thus, we conclude that Hilbert-space partitioning is the only partition of the von Neumann entropy which is consistent with the Laws of Thermodynamics.	翻訳日:2024-02-29 17:06:53 公開日:2024-02-27
# 多重グラフにおける表現学習 : 情報の融合の方法と方法 Representation learning in multiplex graphs: Where and how to fuse information? ( http://arxiv.org/abs/2402.17906v1 ) ライセンス: Link先を確認	Piotr Bielak, Tomasz Kajdanowicz	(参考訳) 近年,教師なし,自己教師なしのグラフ表現学習が研究コミュニティで人気を集めている。しかし、ほとんどの提案手法は均質なネットワークに焦点を当てているが、実世界のグラフは複数のノードとエッジタイプを含むことが多い。ヘテロジニアスグラフの特殊なタイプである多重グラフは、よりリッチな情報を持ち、より良いモデリング機能を提供し、潜在的に異なるソースからより詳細なデータを統合する。多重グラフにおける多様なエッジタイプは、表現学習の基盤となるプロセスに関するコンテキストと洞察を提供する。本稿では,マルチプレックスネットワークにおけるノードの表現を教師なしあるいは自己管理的に学習する問題に対処する。そこで我々は,グラフ処理パイプラインの様々なレベルで実行される多様な情報融合方式について検討する。様々なシナリオの詳細な分析と実験的評価により、多重グラフを扱うGNNアーキテクチャの構築方法の改善が提案された。 In recent years, unsupervised and self-supervised graph representation learning has gained popularity in the research community. However, most proposed methods are focused on homogeneous networks, whereas real-world graphs often contain multiple node and edge types. Multiplex graphs, a special type of heterogeneous graphs, possess richer information, provide better modeling capabilities and integrate more detailed data from potentially different sources. The diverse edge types in multiplex graphs provide more context and insights into the underlying processes of representation learning. In this paper, we tackle the problem of learning representations for nodes in multiplex networks in an unsupervised or self-supervised manner. To that end, we explore diverse information fusion schemes performed at different levels of the graph processing pipeline. The detailed analysis and experimental evaluation of various scenarios inspired us to propose improvements in how to construct GNN architectures that deal with multiplex graphs.	翻訳日:2024-02-29 17:06:34 公開日:2024-02-27
# グラフニューラルネットワークによる地域文化の予測 Using Graph Neural Networks to Predict Local Culture ( http://arxiv.org/abs/2402.17905v1 ) ライセンス: Link先を確認	Thiago H Silva and Daniel Silver	(参考訳) 都市研究は長い間、近隣地区が動的で関係的であると認識してきた。しかし、データ、方法論、コンピュータ処理能力の不足が、近隣地区の関係ダイナミクスの形式的で定量的な検証を妨げてきた。この問題を前進させるため、本研究は、近隣地区の内部特性、過去の特性、地区間の集団の流れに関する複数の情報源を組み合わせて評価でき、予測モデルの表現力を高めうるグラフニューラルネットワーク(GNN)アプローチを提案する。 Yelpの公開大規模データセットを調べることで,近隣属性の予測、特に地域文化の予測において,構造的接続性を考慮する本アプローチの可能性を示す。結果は、実質的観点と方法論的観点の両方から有望である。実質的には、地域情報(地域の人口統計など)またはグループプロファイル(Yelpレビュアーの嗜好)のいずれかが地域文化の予測で最良の結果を与え、調査したすべてのケースで両者はほぼ同等であることがわかった。方法論的には、グループプロファイルはさまざまな形式のオンラインデータから自動的に抽出できるため、特定地域のローカル情報を得ることが難しい場合に、その活用が有用な代替手段となりうる。これにより、研究者や政策立案者は、他の地域情報が不足している場合にも、さまざまなデータソースを利用できるようになる。 Urban research has long recognized that neighbourhoods are dynamic and relational. However, lack of data, methodologies, and computer processing power have hampered a formal quantitative examination of neighbourhood relational dynamics. To make progress on this issue, this study proposes a graph neural network (GNN) approach that permits combining and evaluating multiple sources of information about internal characteristics of neighbourhoods, their past characteristics, and flows of groups among them, potentially providing greater expressive power in predictive models. By exploring a public large-scale dataset from Yelp, we show the potential of our approach for considering structural connectedness in predicting neighbourhood attributes, specifically to predict local culture. Results are promising from a substantive and methodological point of view. Substantively, we find that either local area information (e.g. area demographics) or group profiles (tastes of Yelp reviewers) give the best results in predicting local culture, and they are nearly equivalent in all studied cases. Methodologically, exploring group profiles could be a helpful alternative where finding local information for specific areas is challenging, since they can be extracted automatically from many forms of online data. 
Thus, our approach could empower researchers and policy-makers to use a range of data sources when other local area information is lacking.	翻訳日:2024-02-29 17:06:20 公開日:2024-02-27
# Surgment: セグメンテーション対応セマンティック検索と視覚質問作成とビデオベースの手術学習支援へのフィードバック Surgment: Segmentation-enabled Semantic Search and Creation of Visual Question and Feedback to Support Video-Based Surgery Learning ( http://arxiv.org/abs/2402.17903v1 ) ライセンス: Link先を確認	Jingying Wang, Haoran Tang, Taylor Kantor, Tandis Soltani, Vitaliy Popov and Xu Wang	(参考訳) ビデオは、手術室(OR)に入る前の手術研修医の準備に広く使われる学習教材である。本研究では,ビデオベースの手術学習体験を充実させる技術を探究する。 Surgmentは、熟練外科医が手術記録に基づいてフィードバック付きの演習を作成するのを支援するシステムである。 Surgmentは、数ショットの学習ベースのパイプライン(SegGPT+SAM)を使用して手術シーンを分割し、精度92%を達成する。このセグメンテーションパイプラインにより、形成的研究で外科医が求めた視覚的な質問やフィードバックを作成する機能が実現される。 Surgmentにより外科医は、1)スケッチを通して関心のあるフレームを検索し、2)特定の解剖学的構造を対象とし視覚的フィードバックを提供する演習を設計できる。 11名の外科医による評価研究において、参加者は関心のあるフレームを特定するためのスケッチ検索アプローチを高く評価し、得られた画像ベースの質問とフィードバックには高い教育的価値があると判断した。 Videos are prominent learning materials to prepare surgical trainees before they enter the operating room (OR). In this work, we explore techniques to enrich the video-based surgery learning experience. We propose Surgment, a system that helps expert surgeons create exercises with feedback based on surgery recordings. Surgment is powered by a few-shot-learning-based pipeline (SegGPT+SAM) to segment surgery scenes, achieving an accuracy of 92%. The segmentation pipeline enables functionalities to create visual questions and feedback desired by surgeons from a formative study. Surgment enables surgeons to 1) retrieve frames of interest through sketches, and 2) design exercises that target specific anatomical components and offer visual feedback. In an evaluation study with 11 surgeons, participants applauded the search-by-sketch approach for identifying frames of interest and found the resulting image-based questions and feedback to be of high educational value.	翻訳日:2024-02-29 17:05:59 公開日:2024-02-27
# SequentialAttention++ for Block Sparsification: Differentiable Pruning with Combinatorial Optimization SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization ( http://arxiv.org/abs/2402.17902v1 ) ライセンス: Link先を確認	Taisuke Yasuda, Kyriakos Axiotis, Gang Fu, MohammadHossein Bateni, Vahab Mirrokni	(参考訳) ニューラルネットワークのプルーニングは、大規模でスケーラブルで、解釈可能で、一般化可能なモデルを構築するための重要なテクニックである。先行研究は,(1)パラメータの重要度を効率的に正確に評価するための微分可能プルーニング,(2)スパースモデルの空間を効率的に探索するための組合せ最適化,の2つの直交方向に沿って発展してきた。この2つのアプローチを理論的にも経験的にも統合し、構造的ニューラルネットワークのプルーニングのためのコヒーレントなフレームワークを作り、差別化可能なプルーニングが組合せ最適化アルゴリズムを導いて、最も重要なスパースパラメータセットを選択する。理論的には、既存の微分可能プルーニング手法を群スパース最適化の非凸正則化と解釈し、非凸正則化の広いクラスにおいて、大域的最適化は一意的であり、群スパースであり、スパース凸最適化問題に対する近似解となることを証明できる。提案するアルゴリズムであるシーケンシャルattention++は,imagenetおよびcriteoデータセット上での大規模ニューラルネットワークのブロックワイズプルーニングタスクにおける最先端技術である。 Neural network pruning is a key technique towards engineering large yet scalable, interpretable, and generalizable models. Prior work on the subject has developed largely along two orthogonal directions: (1) differentiable pruning for efficiently and accurately scoring the importance of parameters, and (2) combinatorial optimization for efficiently searching over the space of sparse models. We unite the two approaches, both theoretically and empirically, to produce a coherent framework for structured neural network pruning in which differentiable pruning guides combinatorial optimization algorithms to select the most important sparse set of parameters. Theoretically, we show how many existing differentiable pruning techniques can be understood as nonconvex regularization for group sparse optimization, and prove that for a wide class of nonconvex regularizers, the global optimum is unique, group-sparse, and provably yields an approximate solution to a sparse convex optimization problem. 
The resulting algorithm that we propose, SequentialAttention++, advances the state of the art in large-scale neural network block-wise pruning tasks on the ImageNet and Criteo datasets.	翻訳日:2024-02-29 17:05:39 公開日:2024-02-27
# 多惑星系における太陽系外惑星の予測と人工知能による惑星・主星パラメータ間の相関の決定 Exoplanets Prediction in Multi-Planetary Systems and Determining the Correlation Between the Parameters of Planets and Host Stars Using Artificial Intelligence ( http://arxiv.org/abs/2402.17898v1 ) ライセンス: Link先を確認	Mahdiyar Mousavi-Sadr	(参考訳) 発見された太陽系外惑星の数は増えており、これまでに5千以上の太陽系外惑星が確認されている。現在では、惑星系を支配する法則の妥当性を検証し、惑星と恒星の物理的パラメータの関係を発見するための一歩を踏み出す機会がある。まず、確認済みの惑星を少なくとも3つ含む229の多惑星系において、ティティウス・ボーデ(TB)関係として知られる太陽系の惑星間の対数間隔を用いて、追加の太陽系外惑星を探索した結果を示す。これらの系のうち$\sim53\%$では、惑星が太陽系の惑星よりも非常によく対数間隔関係に従うことがわかった。 426個の追加の太陽系外惑星の存在を予測し、そのうち47個は居住可能領域(HZ)内にあり、47個のうち5つは最大質量の上限が0.1-2$M_{\oplus}$、最大半径が1.25$R_{\oplus}$未満である。次に, 効率的な機械学習手法を用いて, 762個の確認済み太陽系外惑星と8個の太陽系惑星からなるデータセットを解析し, その基本量を特徴付ける。データは、$R_{p}=8.13R_{\oplus}$と$M_{p}=52.48M_{\oplus}$をカットオフ値として、「小型」惑星と「巨大」惑星の2つの主要なクラスに分類する。巨大惑星は密度が低く、H-He質量比が高いことが示唆される一方、小型惑星はより密度が高く、主に重い元素で構成されている。我々は、惑星の質量、軌道周期、恒星質量が太陽系外惑星の半径の予測に重要な役割を果たすことを強調する。巨大惑星では、惑星の半径と主星の質量の間に強い相関が観察され、巨大惑星の形成と恒星の特徴の関係について興味深い洞察が得られる可能性がある。 The number of extrasolar planets discovered is increasing, so that more than five thousand exoplanets have been confirmed to date. Now we have an opportunity to test the validity of the laws governing planetary systems and take steps to discover the relationships between the physical parameters of planets and stars. Firstly, we present the results of a search for additional exoplanets in 229 multi-planetary systems that house at least three or more confirmed planets, employing a logarithmic spacing between planets in our Solar System known as the Titius-Bode (TB) relation. We find that the planets in $\sim53\%$ of these systems adhere to a logarithmic spacing relation remarkably better than the Solar System planets. We predict the presence of 426 additional exoplanets, 47 of which are located within the habitable zone (HZ), and five of the 47 planets have a maximum mass limit of 0.1-2$M_{\oplus}$ and a maximum radius lower than 1.25$R_{\oplus}$. 
Secondly, we employ efficient machine learning approaches to analyze a dataset comprising 762 confirmed exoplanets and eight Solar System planets, aiming to characterize their fundamental quantities. We classify the data into two main classes: 'small' and 'giant' planets, with cut-off values at $R_{p}=8.13R_{\oplus}$ and $M_{p}=52.48M_{\oplus}$. Giant planets have lower densities, suggesting higher H-He mass fractions, while small planets are denser, composed mainly of heavier elements. We highlight that planetary mass, orbital period, and stellar mass play crucial roles in predicting exoplanet radius. Notably, for giant planets we observe a strong correlation between planetary radius and the mass of their host stars, which might provide intriguing insights into the relationship between giant planet formation and stellar characteristics.	翻訳日:2024-02-29 17:05:15 公開日:2024-02-27
# オントロジーにおける新しい概念配置のための言語モデルに基づくフレームワーク A Language Model based Framework for New Concept Placement in Ontologies ( http://arxiv.org/abs/2402.17897v1 ) ライセンス: Link先を確認	Hang Dong, Jiaoyan Chen, Yuan He, Yongsheng Gao, Ian Horrocks	(参考訳) 言語モデルを用いて,テキストから抽出した新たな概念をオントロジーに挿入する作業について検討する。エッジ探索(edge search)は、挿入する候補位置(つまり概念間の仮定)を見つけること、エッジ形成とエンリッチメント(edge formation and enrichment)は、オントロジ構造を利用してエッジ候補を生成して拡張すること、エッジを最終的に配置するエッジ選択(edge selection)である。あらゆるステップにおいて、我々は、エッジサーチにBERTのような埋め込みベースの手法や、事前学習された言語モデル(PLM)を応用し、GPTシリーズ、FLAN-T5、Llama 2などの大規模言語モデル(LLM)とBERTファインタニングベースのマルチラベルエッジ-クロスエンコーダを適応するニューラルネットワーク手法を提案する。 SNOMED CTオントロジーとMedMentionsエンティティリンクベンチマークを用いて,最近のデータセットの手法を評価する。私たちのフレームワークの最良の設定は、検索にplmを微調整し、選択にマルチラベルクロスエンコーダを使用します。 LLMのゼロショットプロンプトは、まだそのタスクには不十分であり、性能向上のための説明可能なLLMのインストラクションチューニングを提案する。本研究はPLMの利点を示し,今後の研究を動機づけるPLMの促進性能を強調した。 We investigate the task of inserting new concepts extracted from texts into an ontology using language models. We explore an approach with three steps: edge search which is to find a set of candidate locations to insert (i.e., subsumptions between concepts), edge formation and enrichment which leverages the ontological structure to produce and enhance the edge candidates, and edge selection which eventually locates the edge to be placed into. In all steps, we propose to leverage neural methods, where we apply embedding-based methods and contrastive learning with Pre-trained Language Models (PLMs) such as BERT for edge search, and adapt a BERT fine-tuning-based multi-label Edge-Cross-encoder, and Large Language Models (LLMs) such as GPT series, FLAN-T5, and Llama 2, for edge selection. We evaluate the methods on recent datasets created using the SNOMED CT ontology and the MedMentions entity linking benchmark. The best settings in our framework use fine-tuned PLM for search and a multi-label Cross-encoder for selection. 
Zero-shot prompting of LLMs is still not adequate for the task, and we proposed explainable instruction tuning of LLMs for improved performance. Our study shows the advantages of PLMs and highlights the encouraging performance of LLMs that motivates future studies.	翻訳日:2024-02-29 17:04:39 公開日:2024-02-27
# Researchy Questions: LLM Webエージェントのための多視点・分解型質問のデータセット Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents ( http://arxiv.org/abs/2402.17896v1 ) ライセンス: Link先を確認	Corby Rosset, Ho-Lam Chung, Guanghui Qin, Ethan C. Chau, Zhuo Feng, Ahmed Awadallah, Jennifer Neville, Nikhil Rao	(参考訳) 既存の質問応答(QA)データセットは、最も強力な大規模言語モデル(LLM)にとってもはや困難ではない。 TriviaQA、NaturalQuestions、ELI5、HotpotQAといった従来のQAベンチマークは、主に「既知の未知」、すなわち何の情報が欠けており、質問に答えるためにそれをどう見つけるかが明確に示された問題を扱っている。したがって、これらのベンチマークでの優れたパフォーマンスは、誤った安心感をもたらす。 NLPコミュニティでまだ満たされていないニーズは、不明確な情報要求を多く含む、すなわち「未知の未知」を伴う、非ファクトイドで多視点の質問の集合である。私たちは、そのような質問は検索エンジンのログで見つけることができると主張する。質問意図を持つクエリの大半は実際にはファクトイドであるため、これは意外なことである。本稿では,非ファクトイドで「分解的」かつ多視点となるよう入念にフィルタリングした検索エンジンクエリのデータセットであるResearchy Questionsを提案する。クリックやセッション長などの信号からみて、ユーザがこれらの質問に多くの「労力」を費やしており、GPT-4にとっても難しいことを示す。また、サブ質問への分解など「スロー思考」の回答手法は、直接回答するよりも有益であることを示す。クリックされたClueweb22のURLとともに、約10万件のResearchy Questionsを公開する。 Existing question answering (QA) datasets are no longer challenging to most powerful Large Language Models (LLMs). Traditional QA benchmarks like TriviaQA, NaturalQuestions, ELI5 and HotpotQA mainly study "known unknowns" with clear indications of both what information is missing, and how to find it to answer the question. Hence, good performance on these benchmarks provides a false sense of security. A yet unmet need of the NLP community is a bank of non-factoid, multi-perspective questions involving a great deal of unclear information needs, i.e. "unknown unknowns". We claim we can find such questions in search engine logs, which is surprising because most question-intent queries are indeed factoid. We present Researchy Questions, a dataset of search engine queries tediously filtered to be non-factoid, "decompositional" and multi-perspective. We show that users spend a lot of "effort" on these questions in terms of signals like clicks and session length, and that they are also challenging for GPT-4. 
We also show that "slow thinking" answering techniques, like decomposition into sub-questions, show benefit over answering directly. We release ~100k Researchy Questions, along with the Clueweb22 URLs that were clicked.	翻訳日:2024-02-29 17:04:12 公開日:2024-02-27
# セマンティックセグメンテーションのためのスワッピングアサインメントを用いた弱教師付き協調訓練 Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation ( http://arxiv.org/abs/2402.17891v1 ) ライセンス: Link先を確認	Xinyu Yang, Hossein Rahmani, Sue Black, Bryan M. Williams	(参考訳) クラスアクティベーションマップ(CAM)は通常、擬似ラベルを生成するために弱教師付きセマンティックセグメンテーション(WSSS)で使用される。不完全または過度なクラスアクティベーションのため、既存の研究ではオフラインでのCAM改良に頼ることが多く、追加ステージの導入やオフラインモジュールの提案が行われている。これにより、単段法の最適化が難しくなり、一般化が制限される。本研究では,改良プロセスへの依存を軽減するため,観測されたCAMの不整合と誤りを低減することを目的とする。我々は、ガイド付きCAMを組み込んだエンドツーエンドWSSSモデルを提案し、CAMをオンラインで同時最適化しながらセグメンテーションモデルを訓練する。提案手法は,スワッピングアサインメントを用いた協調学習 (CoSA) であり,一方のサブネットワークが他方が生成するスワップアサインメントから学習するデュアルストリームフレームワークを利用する。 3つの手法を導入する。i) 不確実な領域にペナルティを与えるソフトなパープレキシティに基づく正則化、ii) 信頼度しきい値を動的に修正するしきい値探索アプローチ、iii) 共存問題に対処するための対照的分離である。 CoSAは優れた性能を示し、VOCとCOCOの検証データセットでそれぞれ76.2%と51.0%のmIoUを達成し、既存のベースラインを大幅に上回った。特に、CoSAは、追加の教師信号を用いるものを含む既存のマルチステージ手法をすべて上回る、最初のシングルステージアプローチである。コードは https://github.com/youshyee/CoSA で入手できる。 Class activation maps (CAMs) are commonly employed in weakly supervised semantic segmentation (WSSS) to produce pseudo-labels. Due to incomplete or excessive class activation, existing studies often resort to offline CAM refinement, introducing additional stages or proposing offline modules. This can cause optimization difficulties for single-stage methods and limit generalizability. In this study, we aim to reduce the observed CAM inconsistency and error to mitigate reliance on refinement processes. We propose an end-to-end WSSS model incorporating guided CAMs, wherein our segmentation model is trained while concurrently optimizing CAMs online. Our method, Co-training with Swapping Assignments (CoSA), leverages a dual-stream framework, where one sub-network learns from the swapped assignments generated by the other. 
We introduce three techniques: i) soft perplexity-based regularization to penalize uncertain regions; ii) a threshold-searching approach to dynamically revise the confidence threshold; and iii) contrastive separation to address the coexistence problem. CoSA demonstrates exceptional performance, achieving mIoU of 76.2% and 51.0% on VOC and COCO validation datasets, respectively, surpassing existing baselines by a substantial margin. Notably, CoSA is the first single-stage approach to outperform all existing multi-stage methods including those with additional supervision. Code is available at https://github.com/youshyee/CoSA.	翻訳日:2024-02-29 17:03:48 公開日:2024-02-27
# 逆最適化から実現可能性、そしてERMへ From Inverse Optimization to Feasibility to ERM ( http://arxiv.org/abs/2402.17890v1 ) ライセンス: Link先を確認	Saurabh Mishra, Anant Raj, Sharan Vaswani	(参考訳) 逆最適化は、既知の解から最適化問題の未知のパラメータを推定することを含み、輸送、電力システム、医療などの分野で広く利用されている。追加の文脈情報を利用して未知の問題パラメータをより正確に予測する文脈逆最適化設定について検討する。我々は、文脈逆線形プログラミング(CILP)に注目し、LPの非微分性に起因する課題に対処する。線形予測モデルでは、CILPを凸実現可能性問題に還元し、交互射影のような標準アルゴリズムを使用できるようにする。結果として得られるCILPのアルゴリズムは、退化や補間のような追加の仮定なしに線形収束保証を備える。次に,Polyak-Lojasiewicz条件を満たす滑らかな凸損失に対して,CILPを経験的リスク最小化(ERM)に還元する。この還元により、拡張性のある一階最適化手法を用いることで、凸設定における理論的保証を維持しながら、大きな非凸問題の解決が可能になる。最後に,合成問題と実世界の問題の両方に対するアプローチを実験的に検証し,既存手法と比較して性能が向上したことを示す。 Inverse optimization involves inferring unknown parameters of an optimization problem from known solutions, and is widely used in fields such as transportation, power systems and healthcare. We study the contextual inverse optimization setting that utilizes additional contextual information to better predict the unknown problem parameters. We focus on contextual inverse linear programming (CILP), addressing the challenges posed by the non-differentiable nature of LPs. For a linear prediction model, we reduce CILP to a convex feasibility problem allowing the use of standard algorithms such as alternating projections. The resulting algorithm for CILP is equipped with a linear convergence guarantee without additional assumptions such as degeneracy or interpolation. Next, we reduce CILP to empirical risk minimization (ERM) on a smooth, convex loss that satisfies the Polyak-Lojasiewicz condition. This reduction enables the use of scalable first-order optimization methods to solve large non-convex problems, while maintaining theoretical guarantees in the convex setting. Finally, we experimentally validate our approach on both synthetic and real-world problems, and demonstrate improved performance compared to existing methods.	翻訳日:2024-02-29 17:03:20 公開日:2024-02-27
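The abstract above reduces CILP to a convex feasibility problem that standard alternating projections can solve. The sketch below illustrates only the generic alternating-projections idea on two toy convex sets, a ball and a halfspace. The sets, the starting point, and all function names are assumptions chosen for illustration; this is not the paper's CILP construction.

```python
import math

def project_ball(x, center, radius):
    """Euclidean projection of x onto the ball {y : ||y - center|| <= radius}."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return list(x)
    scale = radius / norm
    return [ci + scale * d for ci, d in zip(center, diff)]

def project_halfspace(x, a, b):
    """Euclidean projection of x onto the halfspace {y : a.y <= b}."""
    violation = sum(ai * xi for ai, xi in zip(a, x)) - b
    if violation <= 0:
        return list(x)
    step = violation / sum(ai * ai for ai in a)
    return [xi - step * ai for xi, ai in zip(x, a)]

def alternating_projections(x0, center, radius, a, b, iters=200):
    """Alternate between the two projections; for convex sets with a
    non-empty intersection the iterates approach a point in both sets."""
    x = list(x0)
    for _ in range(iters):
        x = project_halfspace(project_ball(x, center, radius), a, b)
    return x

# Toy instance (hypothetical values): ball of radius 2 and halfspace x + y <= 1.
point = alternating_projections([5.0, 5.0], center=[0.0, 0.0], radius=2.0,
                                a=[1.0, 1.0], b=1.0)
print(point)
```

Each projection here has a closed form, which is what makes the scheme cheap per iteration; in other feasibility problems the projections may themselves need a solver.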
# ConjNorm: 分布外検出のためのトラクタブル密度推定 ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection ( http://arxiv.org/abs/2402.17888v1 ) ライセンス: Link先を確認	Bo Peng, Yadan Luo, Yonggang Zhang, Yixuan Li, Zhen Fang	(参考訳) ポストホックアウト・オブ・ディストリビューション(OOD)検出は、信頼性の高い機械学習において大きな注目を集めている。ログ、距離、厳密なデータ分布の仮定に基づいてスコア関数を導出し、低スコアのOODサンプルを識別する。それでもこれらの推定値は、真のデータ密度を正確に反映したり、非現実的な制約を課すことに失敗する可能性がある。密度に基づくスコア設計の統一的な視点を提供するため, 分布の指数関数族を包含する分布の考察を拡張し, ブレグマン・ダイバージェンスに基づく新しい理論的枠組みを提案する。定理で明らかにされた共役制約を活用して、与えられたデータセットに対して最適なノルム係数$p$の探索として密度関数設計をフレーミングする「textsc{ConjNorm} 法」を導入する。正規化の計算課題を考慮し,モンテカルロを用いた重要サンプリング手法を用いて,分割関数の非バイアスで解析的に抽出可能な推定器を考案した。 OOD検出ベンチマークの広範な実験により、提案した \textsc{ConjNorm} が様々な OOD 検出設定において新しい最先端技術を確立し、CIFAR-100 と ImageNet-1K でそれぞれ 13.25$\%$ と 28.19$\%$ (FPR95) をそれぞれ上回ったことが実証された。 Post-hoc out-of-distribution (OOD) detection has garnered intensive attention in reliable machine learning. Many efforts have been dedicated to deriving score functions based on logits, distances, or rigorous data distribution assumptions to identify low-scoring OOD samples. Nevertheless, these estimate scores may fail to accurately reflect the true data density or impose impractical constraints. To provide a unified perspective on density-based score design, we propose a novel theoretical framework grounded in Bregman divergence, which extends distribution considerations to encompass an exponential family of distributions. Leveraging the conjugation constraint revealed in our theorem, we introduce a \textsc{ConjNorm} method, reframing density function design as a search for the optimal norm coefficient $p$ against the given dataset. In light of the computational challenges of normalization, we devise an unbiased and analytically tractable estimator of the partition function using the Monte Carlo-based importance sampling technique. 
Extensive experiments across OOD detection benchmarks empirically demonstrate that our proposed \textsc{ConjNorm} has established a new state-of-the-art in a variety of OOD detection setups, outperforming the current best method by up to 13.25$\%$ and 28.19$\%$ (FPR95) on CIFAR-100 and ImageNet-1K, respectively.	翻訳日:2024-02-29 17:03:04 公開日:2024-02-27
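The ConjNorm abstract mentions estimating a partition function with Monte Carlo importance sampling. As a minimal sketch of that general technique, the code below estimates the normalizer of a toy one-dimensional density proportional to exp(-|x|^p). The toy density, the Laplace proposal, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import math
import random

def estimate_partition(p, n_samples=50_000, seed=0):
    """Importance-sampling estimate of Z(p) = integral of exp(-|x|^p) over the reals.

    Proposal: standard Laplace density q(x) = 0.5 * exp(-|x|). Its tails are
    heavy enough that the weights f(x) / q(x) stay bounded for any p >= 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        magnitude = rng.expovariate(1.0)  # |x| for a Laplace-distributed x
        # The integrand is even in x, so the sign of x never enters the weight.
        weight = math.exp(-(magnitude ** p)) / (0.5 * math.exp(-magnitude))
        total += weight
    return total / n_samples

def exact_partition(p):
    """Closed form for the toy integral: 2 * Gamma(1 + 1/p)."""
    return 2.0 * math.gamma(1.0 + 1.0 / p)

for p in (1.0, 2.0, 3.0):
    print(p, estimate_partition(p), exact_partition(p))
```

The estimator is unbiased because the expectation of f(x)/q(x) under q equals the integral of f; the closed form is included only so the estimate can be checked.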
# JMLR: 推論と専門的質問応答能力向上のための共同医療LLMと検索訓練 JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability ( http://arxiv.org/abs/2402.17887v1 ) ライセンス: Link先を確認	Junda Wang, Zhichao Yang, Zonghai Yao, Hong Yu	(参考訳) 医療データの爆発的な成長と人工知能技術の急速な発展により、精密医療は医療サービスの質と効率を高める鍵となった。この文脈では、大規模言語モデル(llm)は医学的知識獲得と質問応答システムにおいてますます重要な役割を担っている。医療領域におけるこれらのシステムの性能をさらに向上させるために,情報検索(ir)システムとllmを協調して微調整段階で訓練する革新的な手法を提案する。 JMLR(Joint Medical LLM and Retrieval Training)と呼ばれるこのアプローチは、医療質問応答タスクの処理において従来のモデルが直面する課題を克服するために設計されている。同期トレーニング機構を利用することで、JMLRは計算リソースの需要を減らし、推論や回答のための医療知識を活用するモデルの能力を高める。 JMLR-13B (81.2%, MedQAは61.3%, MedQAは61.3%, AMBOSSは76.4%, MedQAは60.3%) は従来の事前学習および微調整によるモデルより優れていた。同じ7Bスケールのモデルでは、JMLR-7B(68.7%、MedQAは51.7%)は、他の公開モデル(Meditron-7B: 50.1%、47.9%)よりも優れており、コスト(トレーニング時間:37時間、伝統的な手法:144時間)、効率、医療質問応答タスクにおける効率、有効性を証明している。本研究は,医療情報検索と質問応答システムにIRとLLMトレーニングを統合する大きな可能性を示す,医療のための新しい,効率的な知識向上ツールを提供する。 With the explosive growth of medical data and the rapid development of artificial intelligence technology, precision medicine has emerged as a key to enhancing the quality and efficiency of healthcare services. In this context, Large Language Models (LLMs) play an increasingly vital role in medical knowledge acquisition and question-answering systems. To further improve the performance of these systems in the medical domain, we introduce an innovative method that jointly trains an Information Retrieval (IR) system and an LLM during the fine-tuning phase. This approach, which we call Joint Medical LLM and Retrieval Training (JMLR), is designed to overcome the challenges faced by traditional models in handling medical question-answering tasks. By employing a synchronized training mechanism, JMLR reduces the demand for computational resources and enhances the model's ability to leverage medical knowledge for reasoning and answering questions. 
Our experimental results demonstrate that JMLR-13B (81.2% on AMBOSS, 61.3% on MedQA) outperforms models using conventional pre-training and fine-tuning, such as Meditron-70B (76.4% on AMBOSS, 60.3% on MedQA). For models of the same 7B scale, JMLR-7B (68.7% on AMBOSS, 51.7% on MedQA) significantly outperforms other public models (Meditron-7B: 50.1%, 47.9%), showing its superiority in terms of cost (our training time: 37 hours, traditional method: 144 hours), efficiency, and effectiveness in medical question-answering tasks. Through this work, we provide a new and efficient knowledge enhancement tool for healthcare, demonstrating the great potential of integrating IR and LLM training in precision medical information retrieval and question-answering systems.	翻訳日:2024-02-29 17:02:32 公開日:2024-02-27
# 表データによる大規模言語モデル -- 調査 Large Language Models on Tabular Data -- A Survey ( http://arxiv.org/abs/2402.17944v1 ) ライセンス: Link先を確認	Xi Fang, Weijie Xu, Fiona Anting Tan, Jiani Zhang, Ziqing Hu, Yanjun Qi, Scott Nickleach, Diego Socolinsky, Srinivasan Sengamedu, Christos Faloutsos	(参考訳) 大規模言語モデリングにおける近年のブレークスルーは、予測、表データ合成、質問応答、テーブル理解など、表データモデリングに関連する様々なタスクにおいて、彼らのアプリケーションの厳密な探索を促進する。各タスクは固有の課題と機会を提供する。しかし、現在、この研究領域における重要な技術、メトリクス、データセット、モデル、最適化アプローチを要約し比較する包括的なレビューが欠けている。この調査は、これらの領域における最近の進歩を集約し、使用するデータセット、メトリクス、方法論の詳細な調査と分類を提供することによって、このギャップに対処することを目的としている。既存の文献における強み、限界、未開拓領域、ギャップを識別し、このバイタルで急速に進化する分野における今後の研究方向についての洞察を提供する。関連するコードやデータセットの参照も提供する。この総合的なレビューを通じて、興味のある読者に関連する参照と洞察に富んだ視点を提供し、この分野の一般的な課題を効果的にナビゲートし解決するために必要なツールと知識を彼らに与えたいと思っています。 Recent breakthroughs in large language modeling have facilitated rigorous exploration of their application in diverse tasks related to tabular data modeling, such as prediction, tabular data synthesis, question answering, and table understanding. Each task presents unique challenges and opportunities. However, there is currently a lack of comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain. This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized. It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field. It also provides relevant code and datasets references. Through this comprehensive review, we hope to provide interested readers with pertinent references and insightful perspectives, empowering them with the necessary tools and knowledge to effectively navigate and address the prevailing challenges in the field.	翻訳日:2024-02-29 16:57:30 公開日:2024-02-27
# SoS密度推定と$\alpha$-divergencesを用いた逐次輸送マップ Sequential transport maps using SoS density estimation and $\alpha$-divergences ( http://arxiv.org/abs/2402.17943v1 ) ライセンス: Link先を確認	Benjamin Zanger, Tiangang Cui, Martin Schreiber, Olivier Zahm	(参考訳) 輸送ベース密度推定法は, 近似密度からサンプルを効率よく生成できるため, 関心が高まりつつある。 arXiv:2106.04170 および arXiv:2303.02554 で提案された逐次輸送マップの枠組みをさらに検討する。この枠組みは、合成されたKnothe-Rosenblatt (KR) マップの列に基づいている。これらのマップは、まず中程度の複雑さの中間密度を推定し、次に基準密度から予め計算された近似密度まで正確なKRマップを計算することによって構築される。本研究では,中間密度の近似に Sum-of-Squares (SoS) 密度と$\alpha$-divergences を用いることを検討した。 SoS密度と$\alpha$-divergenceを組み合わせることで、半定値プログラミングで効率的に解ける凸最適化問題が得られる。 $\alpha$-divergencesの主な利点は、非正規化密度での作業を可能にすることであり、これは数値的にも理論的にも利点をもたらす。特に、逐次輸送写像の2つの新しい収束解析(三角形的不等式に基づくものと、非正規化密度に対する$\alpha$-divergencesの情報幾何学的性質に基づくもの)を提供する。中間密度の選択は, 方法の効率性にも重要である。テンパー密度(またはアニール密度)は最先端の密度であるが,サンプルからのみ得られる密度を近似できる拡散型中間密度を導入する。このような中間密度は、生成モデリングのための機械学習においてよく確立されている。最後に,高次元問題を扱うための異なる低次元写像(あるいは遅延写像)を提案し,その手法をベイズ推論問題や教師なし学習タスクなど,いくつかのベンチマークで数値的に示す。 Transport-based density estimation methods are receiving growing interest because of their ability to efficiently generate samples from the approximated density. We further investigate the sequential transport maps framework proposed in arXiv:2106.04170 and arXiv:2303.02554, which builds on a sequence of composed Knothe-Rosenblatt (KR) maps. Each of those maps is built by first estimating an intermediate density of moderate complexity, and then by computing the exact KR map from a reference density to the precomputed approximate density. In our work, we explore the use of Sum-of-Squares (SoS) densities and $\alpha$-divergences for approximating the intermediate densities. Combining SoS densities with $\alpha$-divergences yields convex optimization problems that can be solved efficiently using semidefinite programming. The main advantage of $\alpha$-divergences is to enable working with unnormalized densities, which provides benefits both numerically and theoretically. 
In particular, we provide two new convergence analyses of the sequential transport maps: one based on a triangle-like inequality and the second on information geometric properties of $\alpha$-divergences for unnormalized densities. The choice of intermediate densities is also crucial for the efficiency of the method. While tempered (or annealed) densities are the state-of-the-art, we introduce diffusion-based intermediate densities, which make it possible to approximate densities known from samples only. Such intermediate densities are well-established in machine learning for generative modeling. Finally, we propose and try different low-dimensional maps (or lazy maps) for dealing with high-dimensional problems and numerically demonstrate our methods on several benchmarks, including Bayesian inference problems and unsupervised learning tasks.	翻訳日:2024-02-29 16:57:10 公開日:2024-02-27
# EmMark: 組み込み量子化大規模言語モデルのIP保護のためのロバストな透かし EmMark: Robust Watermarks for IP Protection of Embedded Quantized Large Language Models ( http://arxiv.org/abs/2402.17938v1 ) ライセンス: Link先を確認	Ruisi Zhang, Farinaz Koushanfar	(参考訳) 本稿では,リソース制約されたエッジデバイス上に展開された組み込み大言語モデルの知的財産権(IP)を保護するための新しい透かしフレームワークであるEmMarkを紹介する。悪意のあるエンドユーザによるIP盗難リスクに対処するため、EmMarkでは、ウォーターマーク付きモデル重みをクエリし、挿入されたシグネチャにマッチさせることで、所有者が所有権を認証できるようにする。 EmMarkの斬新さは、戦略的なウォーターマーク重みパラメータの選択にあり、これにより堅牢性を確保しつつモデル品質を維持する。 OPTおよびLLaMA-2ファミリーのモデルの広範な概念実証評価は、EmMarkの忠実性を示し、モデル性能を保持したままウォーターマーク抽出に100%成功している。 EmMarkは、透かし除去と偽造攻撃に対するレジリエンスも示した。 This paper introduces EmMark, a novel watermarking framework for protecting the intellectual property (IP) of embedded large language models deployed on resource-constrained edge devices. To address the IP theft risks posed by malicious end-users, EmMark enables proprietors to authenticate ownership by querying the watermarked model weights and matching the inserted signatures. EmMark's novelty lies in its strategic selection of watermark weight parameters, ensuring robustness and maintaining model quality. Extensive proof-of-concept evaluations of models from OPT and LLaMA-2 families demonstrate EmMark's fidelity, achieving 100% success in watermark extraction with model performance preservation. EmMark also demonstrates resilience against watermark removal and forging attacks.	翻訳日:2024-02-29 16:56:33 公開日:2024-02-27
# マルチモーダル入力から言語知識を得る Acquiring Linguistic Knowledge from Multimodal Input ( http://arxiv.org/abs/2402.17936v1 ) ライセンス: Link先を確認	Theodor Amariucai, Alex Warstadt	(参考訳) 子どもとは対照的に、言語モデル(LM)は言語習得時のデータ効率が著しく劣っている。本稿では,BabyLM Challenge (Warstadt et al., 2023) への投稿において,このデータ効率ギャップは,典型的な言語モデルの学習環境におけるマルチモーダル入力の欠如と基礎化に起因するという仮説を検証した。これまでの研究では、マルチモーダルトレーニングは言語のみのパフォーマンスを損なう可能性があるが、キャプションデータの微調整によって複雑な言語を壊滅的に忘れてしまうことに起因していると推測されている。本仮説を検証するために,FLAVA (Singh et al., 2022) というマルチモーダル・ビジョン・アンド・ランゲージ・モデルを用いて,テキストと視覚入力のボリュームを独立に変化させて,異なるデータスケールでのビジョンによってどの程度のテキストデータがオフセットできるかを定量化する。我々は,ウィキペディアをベースとした比較的多様なデータセットであるWiT(Srinivasan et al., 2021)からサンプリングされたテキストのみのタスクとデータを含むマルチタスク事前学習システムを通じて,破滅的な忘れを抑えることを目的としている。マルチモーダル事前トレーニングは、私たちのモデルの言語性能に影響を与えませんが、一貫しては役に立ちません。とは言っても、私たちの結論は、少数の実行しかできなかったことによるものです。マルチモーダル入力は、LMと人間の間のデータ効率のギャップの一部を説明できる可能性を広げなければならないが、この仮説の肯定的な証拠は、マルチモーダルトレーニングのためのより良いアーキテクチャと技術を必要とするだろう。 In contrast to children, language models (LMs) exhibit considerably inferior data efficiency when acquiring language. In this submission to the BabyLM Challenge (Warstadt et al., 2023), we test the hypothesis that this data efficiency gap is partly caused by a lack of multimodal input and grounding in the learning environment of typical language models. Although previous work looking into this question found that multimodal training can even harm language-only performance, we speculate that these findings can be attributed to catastrophic forgetting of complex language due to fine-tuning on captions data. To test our hypothesis, we perform an ablation study on FLAVA (Singh et al., 2022), a multimodal vision-and-language model, independently varying the volume of text and vision input to quantify how much text data (if any) can be offset by vision at different data scales. 
We aim to limit catastrophic forgetting through a multitask pretraining regime that includes unimodal text-only tasks and data sampled from WiT, the relatively diverse Wikipedia-based dataset (Srinivasan et al., 2021). Our results are largely negative: Multimodal pretraining does not harm our models' language performance but does not consistently help either. That said, our conclusions are limited by our having been able to conduct only a small number of runs. While we must leave open the possibility that multimodal input explains some of the gap in data efficiency between LMs and humans, positive evidence for this hypothesis will require better architectures and techniques for multimodal training.	翻訳日:2024-02-29 16:56:16 公開日:2024-02-27
# 飽和低ランク混合を用いたマルチタスク多言語モデル適応 Multitask Multilingual Model Adaptation with Featurized Low-Rank Mixtures ( http://arxiv.org/abs/2402.17934v1 ) ライセンス: Link先を確認	Chu-Cheng Lin and Xinyi Wang and Jonathan H. Clark and Han Lu and Yun Zhu and Chenxi Whitehouse and Hongkun Yu	(参考訳) 事前訓練された大規模言語モデル(llm)を数十から数百の人間の言語で様々な下流タスクに適応させるのは計算コストがかかる。パラメータ効率のよい微調整(PEFT)は、少数のパラメータのみをチューニングすることで、適応コストを大幅に削減する。しかし,LoRA (Hu et al., 2022) などのPEFT法を多種多様なデータセットに直接適用すると,パラメータ容量の制限やデータセット間の負の干渉による最適以下の性能が向上する可能性がある。本研究では,マルチタスク多言語チューニングのための新しいPEFT手法であるFeaturized Low-rank Mixtures (FLix)を提案する。 FLixは、データセットの言語やタスクなど、それぞれのユニークなデータセット機能と、独自の低ランクの重み更新パラメータを関連付ける。各データセットに特有のパラメータを構成することで、FLixは多様なデータセットの混合を許容し、目に見えないデータセットをより一般化することができる。実験の結果,FLix は教師付き学習とゼロショット設定の両方において,異なる学習データ混合を用いた様々なタスクに対して,大幅な改善をもたらすことがわかった。 Adapting pretrained large language models (LLMs) to various downstream tasks in tens or hundreds of human languages is computationally expensive. Parameter-efficient fine-tuning (PEFT) significantly reduces the adaptation cost, by tuning only a small amount of parameters. However, directly applying PEFT methods such as LoRA (Hu et al., 2022) on diverse dataset mixtures could lead to suboptimal performance due to limited parameter capacity and negative interference among different datasets. In this work, we propose Featurized Low-rank Mixtures (FLix), a novel PEFT method designed for effective multitask multilingual tuning. FLix associates each unique dataset feature, such as the dataset's language or task, with its own low-rank weight update parameters. By composing feature-specific parameters for each dataset, FLix can accommodate diverse dataset mixtures and generalize better to unseen datasets. Our experiments show that FLix leads to significant improvements over a variety of tasks for both supervised learning and zero-shot settings using different training data mixtures.	翻訳日:2024-02-29 16:55:42 公開日:2024-02-27
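The FLix abstract says each dataset feature (such as language or task) gets its own low-rank weight update, and that the updates are composed per dataset. The sketch below shows one plausible reading: an additive, LoRA-style composition `W + sum_f B_f A_f` over the active features. The additive form, the feature names, and the tiny matrices are assumptions for illustration; the abstract does not give the exact parameterization.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def compose_weight(base, adapters, features):
    """Return base + sum over the active features of B_f @ A_f.

    adapters maps a feature name to (A_f, B_f), with A_f of shape (r x in)
    and B_f of shape (out x r); r is the low rank.
    """
    weight = [row[:] for row in base]
    for feature in features:
        down, up = adapters[feature]
        delta = matmul(up, down)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                weight[i][j] += value
    return weight

# Hypothetical rank-1 adapters for one language feature and one task feature.
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "lang=sw": ([[1.0, 0.0]], [[1.0], [0.0]]),
    "task=qa": ([[0.0, 1.0]], [[0.0], [2.0]]),
}
adapted = compose_weight(base, adapters, ["lang=sw", "task=qa"])
print(adapted)
```

Under this reading, a dataset with an unseen combination of known features can still be served by composing adapters that were trained on other datasets.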
# 協調言語誘導逆計画による実践的指導と目標支援 Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning ( http://arxiv.org/abs/2402.17930v1 ) ライセンス: Link先を確認	Tan Zhi-Xuan, Lance Ying, Vikash Mansinghka, Joshua B. Tenenbaum	(参考訳) 人々はしばしば、自分の行動や目標が意図を曖昧にすることを期待して、さらなる文脈なしに意味が曖昧である指示を与える。そのような指示に従う補助エージェントを、柔軟で文脈に敏感な方法でどうやって構築できるのか? 本稿では,実用的指導支援のためのベイジアンエージェントアーキテクチャであるclips(colleborative language-guided inverse plan search)を提案する。エージェントは, 協調プランナーとして人間をモデル化し, 補助者に対して共同計画を伝えるとともに, 行動や言語からの目標に対するマルチモーダルベイズ推定を行い, 大規模言語モデル(LLM)を用いて, 仮説的計画に基づく指導の可能性を評価する。この後続を前提として,我々のアシスタントは,目標達成コストの最小化を図り,不明瞭な指示を実践的に追従し,目標が不確実であっても効果的な支援を行う。本研究は,2つの協調計画領域(Doors, Keys & Gems, VirtualHome)において,CLIPSがGPT-4V, LLMをベースとしたリテラル命令, および不定型逆計画において, 精度と有用性の両方において有意に優れており, 推論と補助的判断とを密接に一致させた。 People often give instructions whose meaning is ambiguous without further context, expecting that their actions or goals will disambiguate their intentions. How can we build assistive agents that follow such instructions in a flexible, context-sensitive manner? This paper introduces cooperative language-guided inverse plan search (CLIPS), a Bayesian agent architecture for pragmatic instruction following and goal assistance. Our agent assists a human by modeling them as a cooperative planner who communicates joint plans to the assistant, then performs multimodal Bayesian inference over the human's goal from actions and language, using large language models (LLMs) to evaluate the likelihood of an instruction given a hypothesized plan. Given this posterior, our assistant acts to minimize expected goal achievement cost, enabling it to pragmatically follow ambiguous instructions and provide effective assistance even when uncertain about the goal. 
We evaluate these capabilities in two cooperative planning domains (Doors, Keys & Gems and VirtualHome), finding that CLIPS significantly outperforms GPT-4V, LLM-based literal instruction following and unimodal inverse planning in both accuracy and helpfulness, while closely matching the inferences and assistive judgments provided by human raters.	翻訳日:2024-02-29 16:55:21 公開日:2024-02-27
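The CLIPS abstract describes Bayesian inference over the human's goal from both actions and language, followed by acting to minimize expected goal achievement cost. A minimal sketch of that two-step pattern follows. All goal names, probabilities, and costs are hypothetical placeholders; in the paper the instruction likelihood comes from an LLM, which is replaced here by hard-coded numbers.

```python
def goal_posterior(prior, action_likelihood, utterance_likelihood):
    """Posterior over goals: P(g | actions, utterance) is proportional to
    P(g) * P(actions | g) * P(utterance | g)."""
    unnormalized = {g: prior[g] * action_likelihood[g] * utterance_likelihood[g]
                    for g in prior}
    total = sum(unnormalized.values())
    return {g: w / total for g, w in unnormalized.items()}

def best_action(posterior, cost):
    """Pick the assistant action with the lowest expected cost; cost[action][goal]."""
    expected = {a: sum(posterior[g] * c[g] for g in posterior)
                for a, c in cost.items()}
    return min(expected, key=expected.get)

# Hypothetical numbers for an ambiguous request such as "pass me the key".
prior = {"red_gem": 0.5, "blue_gem": 0.5}
action_likelihood = {"red_gem": 0.6, "blue_gem": 0.4}     # from observed movement
utterance_likelihood = {"red_gem": 0.8, "blue_gem": 0.2}  # stand-in for an LLM score
cost = {
    "fetch_red_key": {"red_gem": 1.0, "blue_gem": 5.0},
    "fetch_blue_key": {"red_gem": 5.0, "blue_gem": 1.0},
}

posterior = goal_posterior(prior, action_likelihood, utterance_likelihood)
choice = best_action(posterior, cost)
print(posterior, choice)
```

The point of combining both evidence sources is that neither the movement nor the utterance alone settles the goal, while their product can.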
# 統計的学習のための確率モデルと近似モデル Certain and Approximately Certain Models for Statistical Learning ( http://arxiv.org/abs/2402.17926v1 ) ライセンス: Link先を確認	Cheng Zhen, Nischal Aryal, Arash Termehchy, Amandeep Singh Chabada	(参考訳) 現実世界のデータはしばしば不完全であり、値が不足している。実世界のデータセット上で正確なモデルをトレーニングするには、ユーザーは膨大な時間とリソースを投入し、欠落したデータアイテムの適切な値を見つける必要がある。本稿では,特定のトレーニングデータや対象モデルに対して,不足値を持つデータから直接正確なモデルを学習できることを実証する。本稿では,様々な機械学習パラダイムにまたがって,正確なモデルを学ぶためのデータインプテーションの必要性をチェックするための統一的アプローチを提案する。この必要性を理論的に保証した効率的なアルゴリズムを構築し、インプテーションが不要な場合に正確なモデルを返す。実験の結果,提案アルゴリズムは計算オーバーヘッドを伴わずに,データ計算に要する時間と労力を大幅に削減できることがわかった。 Real-world data is often incomplete and contains missing values. To train accurate models over real-world datasets, users need to spend a substantial amount of time and resources imputing and finding proper values for missing data items. In this paper, we demonstrate that it is possible to learn accurate models directly from data with missing values for certain training data and target models. We propose a unified approach for checking the necessity of data imputation to learn accurate models across various widely-used machine learning paradigms. We build efficient algorithms with theoretical guarantees to check this necessity and return accurate models in cases where imputation is unnecessary. Our extensive experiments indicate that our proposed algorithms significantly reduce the amount of time and effort needed for data imputation without imposing considerable computational overhead.	翻訳日:2024-02-29 16:54:55 公開日:2024-02-27
# 量子カーネル推定のための線形フォトニックスワップ試験回路 A linear photonic swap test circuit for quantum kernel estimation ( http://arxiv.org/abs/2402.17923v1 ) ライセンス: Link先を確認	Alessio Baldazzi, Nicol\`o Leone, Matteo Sanna, Stefano Azzini and Lorenzo Pavesi	(参考訳) 教師付き学習モデルの中で、Support Vector Machineは、データクラスタを分類する最も堅牢で効率的なモデルのひとつである。この手法のコアでは、カーネル関数を用いてデータセットの異なる要素間の距離を計算し、それらの分類を可能にする。すべてのカーネル関数はスカラー積として表現できるので、確率振幅とスカラー積が基本対象である量子力学を用いて推定することができる。スワップテストは、2つの任意の波動関数のスカラー積を計算できる量子アルゴリズムであり、量子スピードアップを可能にする可能性がある。本稿では,スワップテストアルゴリズムを実装した集積フォトニック回路を提案する。我々のアプローチは、一組の導波路を伝搬する減衰レーザービームからの単一光子によって表される線形光集積成分とquditのみに依存する。 2$^3$の空間自由度をquditsに利用することにより、任意の2量子ビット状態を設定しスワップテストを実行するために必要なアレンジを全て設定できる。これは回路要素の要求を単純化し、マルチキュービットゲートを達成するために非線形性、ヘラルド、ポスト選択の必要性をなくす。我々のフォトニックスワップ試験回路は、2つの量子ビットを符号化し、そのスカラー積を0.05以下の根平均二乗誤差で推定する。この結果は、室温で動作している堅牢なデバイスで量子機械学習タスクを実行できる統合フォトニックアーキテクチャの開発の道を開くものである。 Among supervised learning models, Support Vector Machine stands out as one of the most robust and efficient models for classifying data clusters. At the core of this method, a kernel function is employed to calculate the distance between different elements of the dataset, allowing for their classification. Since every kernel function can be expressed as a scalar product, we can estimate it using Quantum Mechanics, where probability amplitudes and scalar products are fundamental objects. The swap test, indeed, is a quantum algorithm capable of computing the scalar product of two arbitrary wavefunctions, potentially enabling a quantum speed-up. Here, we present an integrated photonic circuit designed to implement the swap test algorithm. Our approach relies solely on linear optical integrated components and qudits, represented by single photons from an attenuated laser beam propagating through a set of waveguides. By utilizing 2$^3$ spatial degrees of freedom for the qudits, we can configure all the necessary arrangements to set any two-qubits state and perform the swap test. 
This simplifies the requirements on the circuitry elements and eliminates the need for non-linearity, heralding, or post-selection to achieve multi-qubit gates. Our photonic swap test circuit successfully encodes two qubits and estimates their scalar product with a measured root mean square error smaller than 0.05. This result paves the way for the development of integrated photonic architectures capable of performing Quantum Machine Learning tasks with robust devices operating at room temperature.	翻訳日:2024-02-29 16:54:43 公開日:2024-02-27
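The swap test mentioned in the abstract above estimates the overlap of two states from the statistics of an ancilla measurement: the ancilla reads 0 with probability (1 + |<psi|phi>|^2) / 2. The sketch below simulates that relation classically with sampled shots. It does not model the photonic circuit, and the function names and shot count are assumptions for illustration.

```python
import math
import random

def inner_product(psi, phi):
    """Complex inner product <psi|phi> of two state vectors."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))

def swap_test_p0(psi, phi):
    """Probability that the swap-test ancilla is measured in state 0."""
    overlap = abs(inner_product(psi, phi)) ** 2
    return 0.5 * (1.0 + overlap)

def estimate_overlap(psi, phi, shots=20_000, seed=0):
    """Estimate |<psi|phi>|^2 from simulated ancilla outcomes."""
    rng = random.Random(seed)
    p0 = swap_test_p0(psi, phi)
    zeros = sum(1 for _ in range(shots) if rng.random() < p0)
    # Invert p0 = (1 + overlap) / 2 and clip sampling noise below zero.
    return max(0.0, 2.0 * zeros / shots - 1.0)

zero = [1 + 0j, 0j]
one = [0j, 1 + 0j]
plus = [1 / math.sqrt(2) + 0j, 1 / math.sqrt(2) + 0j]

print(swap_test_p0(zero, plus))      # ideal probability for overlap 0.5
print(estimate_overlap(zero, plus))  # shot-noise-limited estimate of 0.5
```

The squared overlap is what a quantum kernel estimate needs, so the kernel value follows directly from the measured fraction of 0 outcomes.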
# 2段階量子推定と量子強調透過センシングの漸近 Two-stage Quantum Estimation and the Asymptotics of Quantum-enhanced Transmittance Sensing ( http://arxiv.org/abs/2402.17922v1 ) ライセンス: Link先を確認	Zihao Gong and Boulat A. Bash	(参考訳) 量子クラム・ラオ境界(Quantum Cram\'er-Rao bound)は、量子状態に埋め込まれた未知のパラメータの偏りのない推定に対する平均2乗誤差の極限である。多数の量子状態コピーに対して漸近的に達成できるが、要求される測定は、しばしば関心のパラメータの真の値に依存する。このパラドックスは2005年に林と松本が2段階アプローチで解決した。残念ながら、それらの分析は量子測定結果に適用される古典的推定器のクラスを厳しく制限する条件を課し、この方法の適用を妨げる。これらの条件を緩和し、2段階法の漸近特性をわずかに弱めるコストで使用可能な推定器のクラスを大幅に拡張する。本研究の結果を応用して,量子強調透過センシングの漸近性を得る。 Quantum Cram\'er-Rao bound is the ultimate limit of the mean squared error for unbiased estimation of an unknown parameter embedded in a quantum state. While it can be achieved asymptotically for large number of quantum state copies, the measurement required often depends on the true value of the parameter of interest. This paradox was addressed by Hayashi and Matsumoto using a two-stage approach in 2005. Unfortunately, their analysis imposes conditions that severely restrict the class of classical estimators applied to the quantum measurement outcomes, hindering applications of this method. We relax these conditions to substantially broaden the class of usable estimators at the cost of slightly weakening the asymptotic properties of the two-stage method. We apply our results to obtain the asymptotics of quantum-enhanced transmittance sensing.	翻訳日:2024-02-29 16:54:13 公開日:2024-02-27
# シーカーのジレンマ:ハードウェアトロイの木馬検出のためのリアルな定式化とベンチマーク The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection ( http://arxiv.org/abs/2402.17918v1 ) ライセンス: Link先を確認	Amin Sarihi, Ahmad Patooghy, Abdel-Hameed A. Badawy, Peter Jamieson	(参考訳) 本研究は,ハードウェアトロイジャン検出(HT)の現実的問題を明確に定義することにより,ハードウェア設計分野におけるセキュリティ研究の進展に焦点を当てる。問題は「探索者のジレンマ」(グラフ上のhid&seekの拡張)として表現され、検出エージェントはhtsによって回路が感染しているかどうかを知らない。この理論的問題定式化を用いて,HTフリー回路とHT感染回路を混合したベンチマークを作成した。再構成された回路はhtsによってランダムに感染し、回路が感染したかどうかディフェンダーが不確かである。当社の革新的なデータセットは,回路分類の成功率を比較することで,さまざまな方法の検出品質をコミュニティが判断する上で有効だと考えています。開発したベンチマークを用いて3つの最先端HT検出ツールを評価し,提案手法のベースライン結果を示す。我々はベンチマークの強度を評価するために主成分分析を用い、再構成されたHT感染回路の一部がHTフリー回路に密にマッピングされていることを観察し、検出器によるラベルのかなりの誤分類をもたらす。 This work focuses on advancing security research in the hardware design space by formally defining the realistic problem of Hardware Trojan (HT) detection. The goal is to model HT detection more closely to the real world, i.e., describing the problem as "The Seeker's Dilemma" (an extension of Hide&Seek on a graph), where a detecting agent is unaware of whether circuits are infected by HTs or not. Using this theoretical problem formulation, we create a benchmark that consists of a mixture of HT-free and HT-infected restructured circuits while preserving their original functionalities. The restructured circuits are randomly infected by HTs, causing a situation where the defender is uncertain if a circuit is infected or not. We believe that our innovative dataset will help the community better judge the detection quality of different methods by comparing their success rates in circuit classification. We use our developed benchmark to evaluate three state-of-the-art HT detection tools to show baseline results for this approach. 
We use Principal Component Analysis to assess the strength of our benchmark, where we observe that some restructured HT-infected circuits are mapped closely to HT-free circuits, leading to significant label misclassification by detectors.	翻訳日:2024-02-29 16:53:52 公開日:2024-02-27
# 日常的に収集された多変量icu生理信号における共通潜在表現の協調学習 Collaborative learning of common latent representations in routinely collected multivariate ICU physiological signals ( http://arxiv.org/abs/2402.17917v1 ) ライセンス: Link先を確認	Hollan Haule, Ian Piper, Patricia Jones, Tsz-Yan Milly Lo, Javier Escudero	(参考訳) Intensive Care Units (ICU) では、多変量時系列が豊富であることは、機械学習(ML)が患者の表現力を高める機会となる。電子健康記録(EHR)に着目した以前の研究とは対照的に,日常的に収集された生理的時系列データを用いた表現のMLアプローチを提案する。我々の新しいアルゴリズムは、Long Short-Term Memory(LSTM)ネットワークと協調フィルタリングの概念を統合し、患者間で共通の生理状態を特定する。脳損傷者における脳内高血圧(IH)検出のための実世界ICU臨床データを用いて,AUCが0.889,APが0.725であった。さらに,本アルゴリズムは,生理的信号のより構造化された潜在表現の学習において,オートエンコーダよりも優れる。これらの知見は, 日常的に収集した多変量時系列を活用し, 患者表現型化の方法論が期待できることを浮き彫りにした。 In Intensive Care Units (ICU), the abundance of multivariate time series presents an opportunity for machine learning (ML) to enhance patient phenotyping. In contrast to previous research focused on electronic health records (EHR), here we propose an ML approach for phenotyping using routinely collected physiological time series data. Our new algorithm integrates Long Short-Term Memory (LSTM) networks with collaborative filtering concepts to identify common physiological states across patients. Tested on real-world ICU clinical data for intracranial hypertension (IH) detection in patients with brain injury, our method achieved an area under the curve (AUC) of 0.889 and average precision (AP) of 0.725. Moreover, our algorithm outperforms autoencoders in learning more structured latent representations of the physiological signals. These findings highlight the promise of our methodology for patient phenotyping, leveraging routinely collected multivariate time series to improve clinical care practices.	翻訳日:2024-02-29 16:53:11 公開日:2024-02-27
# 逆攻撃による LLM-Resistant Math Word Problem 生成 LLM-Resistant Math Word Problem Generation via Adversarial Attacks ( http://arxiv.org/abs/2402.17916v1 ) ライセンス: Link先を確認	Roy Xie, Chengxuan Huang, Junlin Wang, Bhuwan Dhingra	(参考訳) 大型言語モデル(LLM)は教育の景観を大きく変えた。現在の盗作検出ツールは、LLMの急速な進歩に追随するために苦労しているため、教育コミュニティは、LLMの存在下での生徒の真の問題解決能力を評価するという課題に直面している。本研究は,評価対象の質問の構造と難易度を保ちつつも,LLMでは解決できないような,公正な評価を保証するための新たなパラダイムを探求する。数学用語問題の領域に着目し,抽象構文木を用いて,問題内の数値を単純に編集することによって,llmが不正確な回答を生じさせる敵意的な例を生成する。我々は様々なオープン・クローズド・ソース LLM の実験を行い、定量的かつ質的に、我々の手法が数学の問題解決能力を著しく低下させることを示した。 LLM間で共有脆弱性を識別し,高コストモデルに対するコスト効率の高いアプローチを提案する。さらに, 数学問題の自動解析を行い, LLMの数学的能力に関する今後の研究を導くのに失敗の原因について検討する。 Large language models (LLMs) have significantly transformed the educational landscape. As current plagiarism detection tools struggle to keep pace with LLMs' rapid advancements, the educational community faces the challenge of assessing students' true problem-solving abilities in the presence of LLMs. In this work, we explore a new paradigm for ensuring fair evaluation -- generating adversarial examples which preserve the structure and difficulty of the original questions aimed for assessment, but are unsolvable by LLMs. Focusing on the domain of math word problems, we leverage abstract syntax trees to structurally generate adversarial examples that cause LLMs to produce incorrect answers by simply editing the numeric values in the problems. We conduct experiments on various open- and closed-source LLMs, quantitatively and qualitatively demonstrating that our method significantly degrades their math problem-solving ability. We identify shared vulnerabilities among LLMs and propose a cost-effective approach to attack high-cost models. Additionally, we conduct automatic analysis on math problems and investigate the cause of failure to guide future research on LLM's mathematical capability.	翻訳日:2024-02-29 16:52:39 公開日:2024-02-27
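The abstract above describes generating problem variants by editing only the numeric values while an abstract syntax tree preserves the problem's structure. The sketch below shows the general idea with Python's standard `ast` module: the solution formula is parsed once, and each new set of numbers yields both a new problem text and its recomputed answer. The template, the formula, and the chosen numbers are hypothetical, and the paper's actual generation constraints are not reproduced here.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(node, values):
    """Evaluate an arithmetic AST, looking up variable names in `values`."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, values)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](evaluate(node.left, values),
                                   evaluate(node.right, values))
    if isinstance(node, ast.Name):
        return values[node.id]
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("unsupported expression")

def make_variant(template, formula, values):
    """Fill the problem text with new numbers and recompute the answer."""
    tree = ast.parse(formula, mode="eval")
    return template.format(**values), evaluate(tree, values)

TEMPLATE = ("Ana buys {boxes} boxes with {per_box} pens each and gives away "
            "{given} pens. How many pens are left?")
FORMULA = "boxes * per_box - given"

original = make_variant(TEMPLATE, FORMULA, {"boxes": 3, "per_box": 4, "given": 5})
variant = make_variant(TEMPLATE, FORMULA, {"boxes": 37, "per_box": 48, "given": 129})
print(original)
print(variant)
```

Because the formula is fixed, every variant keeps the same reasoning steps and therefore comparable difficulty for a student, while the ground-truth answer is always available for grading.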
# 解釈可能な方言分類器による方言の語彙的特徴の抽出 Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers ( http://arxiv.org/abs/2402.17914v1 ) ライセンス: Link先を確認	Roy Xie, Orevaoghene Ahia, Yulia Tsvetkov, Antonios Anastasopoulos	(参考訳) 言語の方言間の言語的差異を特定するには、しばしば専門家の知識と細心の注意深い人間分析が必要である。これは、様々な方言の研究に関わる複雑さとニュアンスが原因である。本稿では,人間がいなくても解釈可能な方言分類器を用いて,方言の語彙特徴を識別する新しい手法を提案する。本研究は,マンダリン,イタリア語,低サクソン語について実験を行い,方言変化に寄与する言語固有の語彙特徴の同定に成功していることを実験的に証明した。 Identifying linguistic differences between dialects of a language often requires expert knowledge and meticulous human analysis. This is largely due to the complexity and nuance involved in studying various dialects. We present a novel approach to extract distinguishing lexical features of dialects by utilizing interpretable dialect classifiers, even in the absence of human experts. We explore both post-hoc and intrinsic approaches to interpretability, conduct experiments on Mandarin, Italian, and Low Saxon, and experimentally demonstrate that our method successfully identifies key language-specific lexical features that contribute to dialectal variations.	翻訳日:2024-02-29 16:52:07 公開日:2024-02-27
# feduv:不均質なフェデレーション学習のための一様性と分散 FedUV: Uniformity and Variance for Heterogeneous Federated Learning ( http://arxiv.org/abs/2402.18372v1 ) ライセンス: Link先を確認	Ha Min Son, Moon Hyun Kim, Tai-Myoung Chung, Chao Huang, Xin Liu	(参考訳) フェデレーション学習は、広く分散されたデータでニューラルネットワークをトレーニングするための有望なフレームワークである。しかし、性能は異種分散データで大きく劣化する。最近の研究によると、これはネットワークの最終層が局所バイアスの最もやすいためであり、一部は直交分類器として最終層を凍結させることに成功したためである。凍結重量が一定の特異値をもたらすという観測によって動機付けられた重みにSVDを適用して分類器の訓練力学を考察する。 IIDと非IID設定でのトレーニングには違いがあることがわかった。この結果に基づき,(1)分類器の次元的確率分布のばらつき,(2)エンコーダの表現の超球面的一様性,という,iid設定を連続的にエミュレートするための局所学習のための2つの正規化項を導入する。これらの正規化は、ローカルなデータ分布に関わらず、ローカルなモデルがIDD設定であるかのように振る舞うように促すため、データに柔軟でありながらバイアスの傾向を相殺する。ラベルシフト設定と機能シフト設定の両方で広範な実験を行った結果,大規模モデルやデータセットにスケーラブルであることに加えて,特に非iidケースでは高いマージンで高い性能が得られることを確認した。 Federated learning is a promising framework to train neural networks with widely distributed data. However, performance degrades heavily with heterogeneously distributed data. Recent work has shown this is due to the final layer of the network being most prone to local bias, some finding success freezing the final layer as an orthogonal classifier. We investigate the training dynamics of the classifier by applying SVD to the weights motivated by the observation that freezing weights results in constant singular values. We find that there are differences when training in IID and non-IID settings. Based on this finding, we introduce two regularization terms for local training to continuously emulate IID settings: (1) variance in the dimension-wise probability distribution of the classifier and (2) hyperspherical uniformity of representations of the encoder. These regularizations promote local models to act as if it were in an IID setting regardless of the local data distribution, thus offsetting proneness to bias while being flexible to the data. 
In extensive experiments in both label-shift and feature-shift settings, we verify that our method achieves the highest performance by a large margin, especially in highly non-IID cases, in addition to being scalable to larger models and datasets.	翻訳日:2024-02-29 14:46:12 公開日:2024-02-27
# 敵対的サンプルスープ:複数の敵対的サンプルの平均化は、生成時間を増やすことなく転送可能性を改善する Adversarial example soups: averaging multiple adversarial examples improves transferability without increasing additional generation time ( http://arxiv.org/abs/2402.18370v1 ) ライセンス: Link先を確認	Bo Yang, Hengwei Zhang, Chenwei Li, Jindong Wang	(参考訳) 転送ベースの攻撃では、敵対的サンプルは代理(surrogate)モデル上で作成され、ターゲットモデルを効果的に誤らせるために用いられる。敵対的転送可能性を最大化する従来の方法は、(1)ハイパーパラメータを微調整して代理モデル上で敵対的サンプルの複数のバッチを生成し、(2)代理モデルとターゲットモデルで総合的な性能が最も高い敵対的サンプルのバッチを保持し、残りを破棄する、というものである。本研究では、敵対的サンプルを作成するためのハイパーパラメータ微調整という文脈でこのプロセスの第2段階を再検討する。そこでは、微調整された敵対的サンプルの複数のバッチが、しばしば単一の高誤差の丘の頂上(hilltop)に現れる。異なるハイパーパラメータ構成の下で得られた敵対的サンプルの複数のバッチを平均化する「敵対的サンプルスープ(adversarial example soups)」が、しばしば敵対的転送可能性を高めることを実証する。従来の手法と比較して,提案手法は追加の生成時間や計算コストを要しない。また,本手法は既存の転送ベース手法と直交しており,それらとシームレスに組み合わせて、より転送可能な敵対的サンプルを生成できる。ImageNetデータセットでの大規模な実験により、我々の手法は最先端の攻撃よりも高い攻撃成功率を達成することが示された。 For transfer-based attacks, the adversarial examples are crafted on the surrogate model, which can be implemented to mislead the target model effectively. The conventional method for maximizing adversarial transferability involves: (1) fine-tuning hyperparameters to generate multiple batches of adversarial examples on the substitute model; (2) conserving the batch of adversarial examples that has the best comprehensive performance on the substitute model and the target model, and discarding the others. In this work, we revisit the second step of this process in the context of fine-tuning hyperparameters to craft adversarial examples, where multiple batches of fine-tuned adversarial examples often appear in a single high-error hilltop. We demonstrate that averaging multiple batches of adversarial examples under different hyperparameter configurations, which we refer to as "adversarial example soups", can often enhance adversarial transferability. Compared with traditional methods, the proposed method incurs no additional generation time or computational cost. 
Besides, our method is orthogonal to existing transfer-based methods and can be combined with them seamlessly to generate more transferable adversarial examples. Extensive experiments on the ImageNet dataset show that our methods achieve a higher attack success rate than the state-of-the-art attacks.	翻訳日:2024-02-29 14:45:48 公開日:2024-02-27
# 微分プログラミングによるSGP4と高精度伝播のギャップの解消 Closing the Gap Between SGP4 and High-Precision Propagation via Differentiable Programming ( http://arxiv.org/abs/2402.04830v3 ) ライセンス: Link先を確認	Giacomo Acciarini, At{\i}l{\i}m G\"une\c{s} Baydin, Dario Izzo	(参考訳) SGP4(Simplified General Perturbations 4)軌道伝搬法は、地球周回物体の位置と速度を迅速かつ確実に予測するために広く用いられている。連続的な改良にもかかわらず、SGPモデルは数値プロパゲータの精度に欠けており、誤差は大幅に小さい。本研究では、PyTorchを用いて実装されたSGP4の新しい微分可能バージョンであるdSGP4を提案する。 SGP4を微分可能にすることで、dSGP4は、宇宙船の軌道決定、状態変換、共分散変換、状態遷移行列計算、共分散伝播など、様々な宇宙関連の応用を促進する。さらに、dsgp4のpytorch実装は、2ライン要素セット(tles)のバッチをまたいだ恥ずかしいほど並列な軌道伝播を可能にし、将来の衛星位置の分散予測にcpu、gpu、高度なハードウェアの計算能力を活用する。さらに、dSGP4の微分性は、現代の機械学習技術との統合を可能にする。そこで我々は,ニューラルネットを軌道伝搬器に統合した新しい軌道伝搬パラダイムML-dSGP4を提案する。確率勾配降下により、この合成モデルの入力、出力、パラメータは反復的に洗練され、SGP4の精度を超える。ニューラルネットワークはデフォルトでアイデンティティ演算子として機能し、SGP4の振舞いに固執する。しかし、dSGP4の微分性は、エフェメリスデータによる微調整を可能にし、計算速度を維持しながら精度を向上させる。これにより、衛星オペレーターや研究者は、特定のエフェミリや高精度数値伝播データを用いてモデルを訓練し、軌道予測能力を大幅に向上させることができる。 The Simplified General Perturbations 4 (SGP4) orbital propagation method is widely used for predicting the positions and velocities of Earth-orbiting objects rapidly and reliably. Despite continuous refinement, SGP models still lack the precision of numerical propagators, which offer significantly smaller errors. This study presents dSGP4, a novel differentiable version of SGP4 implemented using PyTorch. By making SGP4 differentiable, dSGP4 facilitates various space-related applications, including spacecraft orbit determination, state conversion, covariance transformation, state transition matrix computation, and covariance propagation. Additionally, dSGP4's PyTorch implementation allows for embarrassingly parallel orbital propagation across batches of Two-Line Element Sets (TLEs), leveraging the computational power of CPUs, GPUs, and advanced hardware for distributed prediction of satellite positions at future times. Furthermore, dSGP4's differentiability enables integration with modern machine learning techniques. 
Thus, we propose a novel orbital propagation paradigm, ML-dSGP4, where neural networks are integrated into the orbital propagator. Through stochastic gradient descent, this combined model's inputs, outputs, and parameters can be iteratively refined, surpassing SGP4's precision. Neural networks act as identity operators by default, adhering to SGP4's behavior. However, dSGP4's differentiability allows fine-tuning with ephemeris data, enhancing precision while maintaining computational speed. This empowers satellite operators and researchers to train the model using specific ephemeris or high-precision numerical propagation data, significantly advancing orbital prediction capabilities.	翻訳日:2024-02-29 12:05:50 公開日:2024-02-27
# 異常検出のための注意-GAN:サイバーセキュリティ脅威管理へのカット-エッジアプローチ Attention-GAN for Anomaly Detection: A Cutting-Edge Approach to Cybersecurity Threat Management ( http://arxiv.org/abs/2402.15945v2 ) ライセンス: Link先を確認	Mohammed Abo Sen	(参考訳) 本稿では,異常検出に焦点をあてた,サイバーセキュリティ向上のための革新的な注意-GANフレームワークを提案する。サイバー脅威の絶え間なく進化する性質から生じる課題に対応するため、提案手法は多様な現実的な合成攻撃シナリオを生成し、データセットを充実させ、脅威識別を改善することを目的としている。 GAN(Generative Adversarial Networks)と注意機構を統合することが提案手法の重要な特徴である。注意機構は、微妙で複雑な攻撃パターンを検出するのに不可欠な、関連する特徴にフォーカスするモデルの能力を強化する。さらに、GANは、既知の脅威と出現する脅威を含む、追加のさまざまな攻撃データを生成することによって、データの不足の問題に対処する。この二重アプローチは、システムは継続的に進化するサイバー攻撃に対して関連性を持ち、効果的であることを保証する。 kdd cupとcicids2017データセットは、このモデルの検証に使用され、異常検出を大幅に改善した。 kddデータセットでは99.69%、cicids2017データセットでは97.93%の精度を達成し、精度、リコール、f1-scoreは97%以上となり、複雑な攻撃パターンの認識に有効性を示している。本研究は,高度でダイナミックなサイバー脅威に直面した異常検出のためのスケーラブルで適応可能なソリューションを提供することで,サイバーセキュリティに大きく貢献する。データ拡張のためのGANの探索は、特にデータ制限がサイバーセキュリティシステムの開発を制限する状況において、将来の研究にとって有望な方向を示す。 attention-ganフレームワークは先駆的なアプローチとして登場し、高度なサイバー防衛戦略の新しいベンチマークを設定した。 This paper proposes an innovative Attention-GAN framework for enhancing cybersecurity, focusing on anomaly detection. In response to the challenges posed by the constantly evolving nature of cyber threats, the proposed approach aims to generate diverse and realistic synthetic attack scenarios, thereby enriching the dataset and improving threat identification. Integrating attention mechanisms with Generative Adversarial Networks (GANs) is a key feature of the proposed method. The attention mechanism enhances the model's ability to focus on relevant features, essential for detecting subtle and complex attack patterns. In addition, GANs address the issue of data scarcity by generating additional varied attack data, encompassing known and emerging threats. This dual approach ensures that the system remains relevant and effective against the continuously evolving cyberattacks. The KDD Cup and CICIDS2017 datasets were used to validate this model, which exhibited significant improvements in anomaly detection. 
It achieved an accuracy of 99.69% on the KDD dataset and 97.93% on the CICIDS2017 dataset, with precision, recall, and F1-scores above 97%, demonstrating its effectiveness in recognizing complex attack patterns. This study contributes significantly to cybersecurity by providing a scalable and adaptable solution for anomaly detection in the face of sophisticated and dynamic cyber threats. The exploration of GANs for data augmentation highlights a promising direction for future research, particularly in situations where data limitations restrict the development of cybersecurity systems. The attention-GAN framework has emerged as a pioneering approach, setting a new benchmark for advanced cyber-defense strategies.	翻訳日:2024-02-29 11:57:11 公開日:2024-02-27
# マルチアーム攻撃に対する最適ゼロショット検出器 Optimal Zero-Shot Detector for Multi-Armed Attacks ( http://arxiv.org/abs/2402.15808v2 ) ライセンス: Link先を確認	Federica Granese, Marco Romanelli, Pablo Piantanida	(参考訳) 本稿では、悪意あるアクターがマルチアーム攻撃戦略を用いてデータサンプルを操作し、データセットにノイズを導入する様々な手段を持つシナリオを検討する。我々の中心的な目的は、入力の改変を検出することでデータを保護することである。我々は、防御側が攻撃側に比べて著しく少ない情報しか持たない環境を想定し、この防御戦略に最大限慎重に取り組む。具体的には、防御側は防御モデルの訓練やチャネルの完全性の検証にデータサンプルを一切利用できない。代わりに、防御側は「既製品として」容易に入手できる既存の検出器の集合のみに依存する。この課題に対処するために、これらの検出器の決定を最適に集約する革新的な情報理論的防御手法を導出し、訓練データを不要にする。さらに、攻撃者が事前訓練された分類器を持ち、それに対してよく知られた敵対的攻撃を仕掛けるという、経験的評価のための実用的なユースケースシナリオについて検討する。実験では,最適設定から逸脱したシナリオにおいても,提案手法の有効性が示された。 This paper explores a scenario in which a malicious actor employs a multi-armed attack strategy to manipulate data samples, offering them various avenues to introduce noise into the dataset. Our central objective is to protect the data by detecting any alterations to the input. We approach this defensive strategy with utmost caution, operating in an environment where the defender possesses significantly less information compared to the attacker. Specifically, the defender is unable to utilize any data samples for training a defense model or verifying the integrity of the channel. Instead, the defender relies exclusively on a set of pre-existing detectors readily available "off the shelf". To tackle this challenge, we derive an innovative information-theoretic defense approach that optimally aggregates the decisions made by these detectors, eliminating the need for any training data. We further explore a practical use-case scenario for empirical evaluation, where the attacker possesses a pre-trained classifier and launches well-known adversarial attacks against it. Our experiments highlight the effectiveness of our proposed solution, even in scenarios that deviate from the optimal setup.	翻訳日:2024-02-29 11:56:23 公開日:2024-02-27
# 機械学習による未知多様体上のpdes解法 Solving PDEs on Unknown Manifolds with Machine Learning ( http://arxiv.org/abs/2106.06682v4 ) ライセンス: Link先を確認	Senwei Liang and Shixiao W. Jiang and John Harlim and Haizhao Yang	(参考訳) 本稿では,拡散マップ(DM)とディープラーニングに基づいて,点雲と同一視される未知多様体上の楕円型PDEを解くためのメッシュフリー計算フレームワークと機械学習理論を提案する。 PDEソルバは、PDEを近似する代数方程式を課す最小二乗回帰問題を解くための教師付き学習タスクとして定式化される。この代数方程式は、二階楕円微分作用素の一貫した推定器であるDM漸近展開によって得られるグラフ-ラプラシア型行列を含む。その結果,ニューラルネットワーク(NN)の仮説空間から解を導いた,非凸な経験的リスク最小化問題の解法が得られた。十分に仮定された楕円型pde設定では、仮説空間が無限幅または深さのニューラルネットワークからなるとき、経験的損失関数の大域的最小化は大きなトレーニングデータの限界における一貫した解であることを示す。仮説空間が2層ニューラルネットワークである場合、十分に広い幅に対して、勾配降下は経験的損失関数の大域的最小化を識別できることを示す。数値的な例を支持することは、解の収束を示し、低次元かつ高次元の単純多様体から境界のない粗曲面までである。また,提案したNNソルバは,Nystromを用いた補間法に代えて,トレーニングエラーとほぼ同一の一般化誤差を持つ新たなデータポイント上でPDE解を強固に一般化できることを示す。 This paper proposes a mesh-free computational framework and machine learning theory for solving elliptic PDEs on unknown manifolds, identified with point clouds, based on diffusion maps (DM) and deep learning. The PDE solver is formulated as a supervised learning task to solve a least-squares regression problem that imposes an algebraic equation approximating a PDE (and boundary conditions if applicable). This algebraic equation involves a graph-Laplacian type matrix obtained via DM asymptotic expansion, which is a consistent estimator of second-order elliptic differential operators. The resulting numerical method is to solve a highly non-convex empirical risk minimization problem subjected to a solution from a hypothesis space of neural networks (NNs). In a well-posed elliptic PDE setting, when the hypothesis space consists of neural networks with either infinite width or depth, we show that the global minimizer of the empirical loss function is a consistent solution in the limit of large training data. When the hypothesis space is a two-layer neural network, we show that for a sufficiently large width, gradient descent can identify a global minimizer of the empirical loss function. 
Supporting numerical examples demonstrate the convergence of the solutions, ranging from simple manifolds with low and high co-dimensions, to rough surfaces with and without boundaries. We also show that the proposed NN solver can robustly generalize the PDE solution on new data points with generalization errors that are almost identical to the training errors, superseding a Nystrom-based interpolation method.	翻訳日:2024-02-29 01:33:27 公開日:2024-02-27
# OneLog: ソフトウェアログ異常検出におけるエンドツーエンドトレーニングを目指して OneLog: Towards End-to-End Training in Software Log Anomaly Detection ( http://arxiv.org/abs/2104.07324v2 ) ライセンス: Link先を確認	Shayan Hashemi, Mika M\"antyl\"a	(参考訳) オンラインサービス、IoTデバイス、DevOps指向ソフトウェア開発の成長に伴い、ソフトウェアログの異常検出がますます重要になっている。従来の4段階アーキテクチャ(Preprocessor、Parser、Vectorizer、Classifier)に主に従っている。本論文では,複数のコンポーネントの代わりに1つのディープニューラルネットワーク(DNN)を利用するOneLogを提案する。 OneLogは、CNN(Convolutional Neural Networks)を文字レベルで利用して、従来の作業で削除された数字、数字、句読点を、主要な自然言語テキストとともに考慮している。このアプローチを,HDFS,Hadoop,BGL,Thunderbird,Spirit,Libertyの6つのメッセージおよびシーケンスベースのデータセットで評価する。私たちはonelogを、シングル、マルチ、クロスプロジェクトのセットアップで実験します。 Onelogは私たちのデータセットで最先端のパフォーマンスを提供します。 onelogはトレーニング中に複数のプロジェクトデータセットを同時に利用することができます。マルチプロジェクトトレーニングは、個々のプロジェクトで限られたトレーニングデータが利用できる場合に理想的なOnelogのパフォーマンスも改善する。また,一対のプロジェクト(自由とスピリット)で,プロジェクト横断異常検出が可能であった。モデル内部の分析は、1つのログに複数の異常検出モードがあり、モデルが手動でログメッセージのパースルールを学習することを示している。文字ベースのCNNは、ログ異常検出におけるエンドツーエンド学習への有望なアプローチである。複数のデータセットに対して優れたパフォーマンスと一般化を提供する。この論文が受け入れられ次第、私たちのスクリプトを公開します。 With the growth of online services, IoT devices, and DevOps-oriented software development, software log anomaly detection is becoming increasingly important. Prior works mainly follow a traditional four-staged architecture (Preprocessor, Parser, Vectorizer, and Classifier). This paper proposes OneLog, which utilizes a single Deep Neural Network (DNN) instead of multiple separate components. OneLog harnesses Convolutional Neural Networks (CNN) at the character level to take digits, numbers, and punctuations, which were removed in prior works, into account alongside the main natural language text. We evaluate our approach in six message- and sequence-based data sets: HDFS, Hadoop, BGL, Thunderbird, Spirit, and Liberty. We experiment with Onelog with single-, multi-, and cross-project setups. Onelog offers state-of-the-art performance in our datasets. Onelog can utilize multi-project datasets simultaneously during training, which suggests our model can generalize between datasets. 
Multi-project training also improves OneLog performance, making it ideal when limited training data is available for an individual project. We also found that cross-project anomaly detection is possible with a single project pair (Liberty and Spirit). Analysis of model internals shows that OneLog has multiple modes of detecting anomalies and that the model learns manually validated parsing rules for the log messages. We conclude that character-based CNNs are a promising approach toward end-to-end learning in log anomaly detection. They offer good performance and generalization over multiple datasets. We will make our scripts publicly available upon the acceptance of this paper.	翻訳日:2024-02-29 01:32:04 公開日:2024-02-27
# DoubleML - Rにおけるダブル機械学習のオブジェクト指向実装 DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R ( http://arxiv.org/abs/2103.09603v5 ) ライセンス: Link先を確認	Philipp Bach, Victor Chernozhukov, Malte S. Kurz, Martin Spindler, Sven Klaassen	(参考訳) RパッケージDoubleMLは、Chernozhukovら(2018)のダブル/デバイアス機械学習フレームワークを実装している。機械学習手法に基づいて因果モデルのパラメータを推定する機能を提供する。ダブル機械学習フレームワークは、ネイマン直交性、高品質な機械学習推定、サンプル分割という3つの主要な要素で構成されている。局外成分(nuisance components)の推定は、mlr3エコシステムで利用可能なさまざまな最先端の機械学習手法によって行うことができる。DoubleMLは、部分線形回帰モデルやインタラクティブ回帰モデル、およびそれらの操作変数推定への拡張を含む、さまざまな因果モデルで推論を行うことができる。DoubleMLのオブジェクト指向実装は、モデル仕様の柔軟性を高め、容易に拡張できるようにする。本稿では、ダブル機械学習フレームワークとRパッケージDoubleMLについて紹介する。シミュレーションデータおよび実データを用いた再現可能なコード例では、DoubleMLユーザが機械学習手法に基づいて有効な推論を行う方法を示す。 The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). It provides functionalities to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation and sample splitting. Estimation of nuisance components can be performed by various state-of-the-art machine learning methods that are available in the mlr3 ecosystem. DoubleML makes it possible to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. The object-oriented implementation of DoubleML enables a high flexibility for the model specification and makes it easily extendable. This paper serves as an introduction to the double machine learning framework and the R package DoubleML. In reproducible code examples with simulated and real data sets, we demonstrate how DoubleML users can perform valid inference based on machine learning methods.	翻訳日:2024-02-29 01:31:41 公開日:2024-02-27
# 非可換多項式最適化問題としての線形力学系の学習 Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimization Problem ( http://arxiv.org/abs/2002.01444v6 ) ライセンス: Link先を確認	Quan Zhou and Jakub Marecek	(参考訳) 線形力学系(LDS)の次の観測の予測(improper learningとして知られる)や、その系行列の推定(LDSのproper learningとして知られる)について、近年多くの進展があった。本稿では,問題の非凸性にもかかわらず,最小二乗推定量への数値解の大域収束を保証する,LDSのproper learningの手法を提案する。有望な計算結果を示す。 There has been much recent progress in forecasting the next observation of a linear dynamical system (LDS), which is known as improper learning, as well as in the estimation of its system matrices, which is known as proper learning of LDS. We present an approach to proper learning of LDS, which, in spite of the non-convexity of the problem, guarantees global convergence of numerical solutions to a least-squares estimator. We present promising computational results.	翻訳日:2024-02-29 01:30:44 公開日:2024-02-27
# 階層構造概念の学習 Learning Hierarchically Structured Concepts ( http://arxiv.org/abs/1909.04559v6 ) ライセンス: Link先を確認	Nancy Lynch and Frederik Mallmann-Trenn	(参考訳) 構造を持つ概念が脳内でどのように表現されるかという問題を考察する。具体的には,階層構造を持つ概念のモデルを導入し,生物学的に妥当なニューラルネットワークがこれらの概念をどのように認識できるか,そしてそもそもどのように学習できるかを示す。主な目的は,これらのタスクのための一般的な枠組みを導入し,認識と学習の両方がどのように達成できるかを形式的に証明することである。ノイズが存在する場合でも,両方のタスクが達成できることを示す。学習については,シナプスの重みを調整する生物学的に妥当な規則としてよく知られるOjaの規則を形式的に解析する。さらに,ある階層的深さの概念を認識するためには,ニューラルネットワークがそれに対応する数の層を持たなければならないことを示す下界によって,学習結果を補完する。 We study the question of how concepts that have structure get represented in the brain. Specifically, we introduce a model for hierarchically structured concepts and we show how a biologically plausible neural network can recognize these concepts, and how it can learn them in the first place. Our main goal is to introduce a general framework for these tasks and prove formally how both (recognition and learning) can be achieved. We show that both tasks can be accomplished even in the presence of noise. For learning, we formally analyze Oja's rule, a well-known biologically plausible rule for adjusting the weights of synapses. We complement the learning results with lower bounds asserting that, in order to recognize concepts of a certain hierarchical depth, neural networks must have a corresponding number of layers.	翻訳日:2024-02-29 01:30:36 公開日:2024-02-27
# 有質量スピン$1/2$粒子を用いた非相対論的速度でのウィグナー回転の検証実験の提案 Proposal for an experiment to verify Wigner's rotation at non-relativistic speeds with massive spin-$1/2$ particles ( http://arxiv.org/abs/1604.03389v2 ) ライセンス: Link先を確認	Veiko Palge, Jacob Dunningham, Yuji Hasegawa, Christian Pfeifer	(参考訳) スピンを持つ量子粒子のウィグナー回転は、特殊相対性理論と量子力学という自然の2つの基本的な側面の相互作用がもたらす興味深い帰結の1つである。スピン$1/2$粒子のウィグナー回転角を直接高精度に検証することは、一方では自然のローレンツ対称性を、他方では量子力学におけるその実現を検証することになる。本稿では、有質量スピン$1/2$粒子について、$2 \cdot 10^3$ m/sという非相対論的速度の領域でウィグナー回転を直接検証する実験を提案する。低速中性子を用いて実験室でこの実験を行う方法について議論する。ウィグナー回転は累積効果であるため、中性子を十分に長い時間伝播させることで、非相対論的速度での測定が可能となる。 The Wigner rotation of quantum particles with spin is one of the fascinating consequences of interplay between two fundamental aspects of Nature: special relativity and quantum mechanics. A direct high precision verification of the Wigner rotation angle of spin-$1/2$ particles would test on the one hand the Lorentz symmetry of Nature and on the other its realization in quantum mechanics. In this paper we propose such an experiment to directly verify Wigner's rotation in the regime of non-relativistic velocities at $2 \cdot 10^3$ m/s for massive spin-$1/2$ particles. We discuss how the experiment could be carried out in a laboratory using slow neutrons. The measurement at non-relativistic velocities becomes possible through letting neutrons propagate for a sufficiently long time because Wigner rotation is a cumulative effect.	翻訳日:2024-02-29 01:29:47 公開日:2024-02-27
# サブゴールモデルによる目標空間計画 Goal-Space Planning with Subgoal Models ( http://arxiv.org/abs/2206.02902v5 ) ライセンス: Link先を確認	Chunlok Lo, Kevin Roice, Parham Mohammad Panahi, Scott Jordan, Adam White, Gabor Mihucz, Farzane Aminmansour, Martha White	(参考訳) 本稿では,Dynaアーキテクチャと同様に(近似)動的計画法による更新とモデルフリー更新を混合する,バックグラウンドプランニングを用いたモデルベース強化学習の新しいアプローチについて検討する。学習したモデルを用いたバックグラウンドプランニングは,メモリと計算量をはるかに多く使用するにもかかわらず,Double DQNのようなモデルフリーの手法より劣ることが多い。根本的な問題は,学習したモデルが不正確になり得ることであり,特に多くのステップにわたって反復適用すると,しばしば無効な状態を生成することである。本稿では,バックグラウンドプランニングを(抽象的な)サブゴールの集合に制約し,局所的なサブゴール条件付きモデルのみを学習することで,この制限を回避する。この目標空間計画(GSP)アプローチは計算効率が高く,時間的抽象化を自然に組み込んで長期計画を高速化し,遷移ダイナミクスの学習を完全に回避する。GSPアルゴリズムが抽象空間から価値を伝播させることで,様々なベース学習器が異なる領域で大幅に高速に学習できるようになることを示す。 This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning and avoids learning the transition dynamics entirely. We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.	翻訳日:2024-02-29 01:25:04 公開日:2024-02-27
# Snapture -- 静的および動的ハンドジェスチャ認識を併用したニューラルアーキテクチャ Snapture -- A Novel Neural Architecture for Combined Static and Dynamic Hand Gesture Recognition ( http://arxiv.org/abs/2205.15862v2 ) ライセンス: Link先を確認	Hassan Ali, Doreen Jirak, Stefan Wermter	(参考訳) ロボットは人々の日常生活にもっと関与することが期待されているため、直感的なユーザーインターフェースを実現するフレームワークが要求される。ハンドジェスチャー認識システムは自然なコミュニケーション方法を提供しており、シームレスなヒューマンロボットインタラクション(HRI)の不可欠な部分である。近年、ディープラーニングによる計算モデルの膨大な進化が目撃されている。しかし、最先端モデルは、エンブレムや共同音声など、さまざまなジェスチャー領域にまたがる拡張に不足している。本稿では,新しい手ジェスチャー認識システムを提案する。我々のアーキテクチャは静的なジェスチャーと動的ジェスチャーの両方の学習を可能にし、そのピーク時にジェスチャーパフォーマンスのいわゆる「スナップショット」をキャプチャすることで、ダイナミックな動きとハンドポーズを統合する。さらに,ジェスチャーの動作プロファイルを分析し,その動的特性を明らかにすることで,動作量に基づいて静的チャネルを制御できる手法を提案する。 CNNLSTMベースラインと比較して,2つのジェスチャベンチマークに対するアプローチが優れていることを示す。また、パフォーマンス改善のためのSnaptureアーキテクチャの可能性を明らかにするジェスチャークラスに基づく分析も提供します。モジュラ実装により,HRIシナリオの重要な手がかりである表情やヘッドトラッキングといった,他のマルチモーダルデータをひとつのアーキテクチャに統合することが可能になる。そこで本研究は,ロボットとの非言語コミュニケーションのためのジェスチャー認識研究と機械学習応用の両方に貢献する。 As robots are expected to get more involved in people's everyday lives, frameworks that enable intuitive user interfaces are in demand. Hand gesture recognition systems provide a natural way of communication and, thus, are an integral part of seamless Human-Robot Interaction (HRI). Recent years have witnessed an immense evolution of computational models powered by deep learning. However, state-of-the-art models fall short in expanding across different gesture domains, such as emblems and co-speech. In this paper, we propose a novel hybrid hand gesture recognition system. Our architecture enables learning both static and dynamic gestures: by capturing a so-called "snapshot" of the gesture performance at its peak, we integrate the hand pose along with the dynamic movement. Moreover, we present a method for analyzing the motion profile of a gesture to uncover its dynamic characteristics and which allows regulating a static channel based on the amount of motion. 
Our evaluation demonstrates the superiority of our approach on two gesture benchmarks compared to a CNNLSTM baseline. We also provide an analysis on a gesture class basis that unveils the potential of our Snapture architecture for performance improvements. Thanks to its modular implementation, our framework allows the integration of other multimodal data like facial expressions and head tracking, which are important cues in HRI scenarios, into one architecture. Thus, our work contributes both to gesture recognition research and machine learning applications for non-verbal communication with robots.	翻訳日:2024-02-29 01:24:45 公開日:2024-02-27
# 光量子ウォークによる2量子ビットエンタングル測定のためのユニバーサルデバイス Universal device for two-qubit entangled measurements via photonic quantum walks ( http://arxiv.org/abs/2204.11310v2 ) ライセンス: Link先を確認	Wen-Zhe Yan, Zhibo Hou, Jun-Feng Tang, Guo-Yong Xiang, Chuan-Feng Li, Guang-Can Guo, Marc-Olivier Renou	(参考訳) 高度な量子測定は、多くの情報処理問題において量子優位性を得るための基礎となる。ここでは、2量子ビットの純粋状態に符号化された方向を推定するタスクを考える。棄権(abstention)を用いることで、非理想的な状態からでも最適な方向推定(忠実度スコアと最尤スコアで評価)を回復できることを実験的に示す。このプロトコルは9ステップの光量子ウォークを用いて、0.9850を超える忠実度で最適な5出力2量子ビット集団測定を実装する。棄権により,方向推定スコアは(最適推定スコアからの偏差の観点で)10倍以上改善された。本研究は,多量子ビットの高度な測定を実装する上での光量子ウォークの汎用性を実証するものである。 Sophisticated quantum measurements are fundamental to obtaining a quantum advantage in many informational problems. Here, we consider the task of guessing a direction encoded in a two-qubit pure state. We experimentally demonstrate that abstention can be used to recover optimal direction guessing (measured in terms of the fidelity and maximum likelihood scores) even from non-ideal states. Our protocol uses nine-step photonic quantum walks to implement the optimal five-output two-qubit collective measurements with fidelities above 0.9850. Thanks to abstention, we obtain more than a 10-fold improvement of the direction guessing scores (in terms of deviation from the optimal guessing scores). Our work demonstrates the versatility of photonic quantum walks for implementing many-qubit sophisticated measurements.	翻訳日:2024-02-29 01:24:09 公開日:2024-02-27
# 逆行性攻撃に対する適応的摂動 Adaptive Perturbation for Adversarial Attack ( http://arxiv.org/abs/2111.13841v3 ) ライセンス: Link先を確認	Zheng Yuan, Jie Zhang, Zhaoyan Jiang, Liangliang Li, Shiguang Shan	(参考訳) 近年、ディープラーニングモデルのセキュリティは、敵の例に弱いニューラルネットワークの急速な発展によって、ますます注目を集めている。既存のグラデーションベースの攻撃手法のほとんどすべてが生成時の符号関数を使用して、$l_\infty$ のノルムに対する摂動予算の要件を満たす。しかし, 符号関数は, 正確な勾配方向を変更するため, 逆例生成には不適切である可能性がある。符号関数の代わりに, 対向摂動を発生させるスケーリング係数を用いて, 正確な勾配方向を直接利用し, 対向的な摂動が少なくても対向的な例の攻撃成功率を向上させることを提案する。同時に,この手法がブラックボックス転送性の向上を理論的に証明する。また、最適なスケーリング係数が画像によって異なることを考慮し、各画像に対して適切なスケーリング係数を求める適応スケーリング係数生成器を提案し、スケーリング係数を手動で検索する計算コストを回避する。本手法は,攻撃成功率を改善するため,既存の攻撃手法のほとんどすべてと統合することができる。 CIFAR10とImageNetデータセットの大規模な実験により、我々の手法は高い転送可能性を示し、最先端の手法よりも優れていることが示された。 In recent years, the security of deep learning models achieves more and more attentions with the rapid development of neural networks, which are vulnerable to adversarial examples. Almost all existing gradient-based attack methods use the sign function in the generation to meet the requirement of perturbation budget on $L_\infty$ norm. However, we find that the sign function may be improper for generating adversarial examples since it modifies the exact gradient direction. Instead of using the sign function, we propose to directly utilize the exact gradient direction with a scaling factor for generating adversarial perturbations, which improves the attack success rates of adversarial examples even with fewer perturbations. At the same time, we also theoretically prove that this method can achieve better black-box transferability. Moreover, considering that the best scaling factor varies across different images, we propose an adaptive scaling factor generator to seek an appropriate scaling factor for each image, which avoids the computational cost for manually searching the scaling factor. Our method can be integrated with almost all existing gradient-based attack methods to further improve their attack success rates. 
Extensive experiments on the CIFAR10 and ImageNet datasets show that our method exhibits higher transferability and outperforms the state-of-the-art methods.	翻訳日:2024-02-29 01:23:25 公開日:2024-02-27
# カーネルを用いた複合適合度検定 Composite Goodness-of-fit Tests with Kernels ( http://arxiv.org/abs/2111.10275v4 ) ライセンス: Link先を確認	Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez	(参考訳) モデルの誤特定(misspecification)は確率モデルの実装に重大な課題をもたらし得るため、この問題を直接考慮する様々なロバストな手法が開発されてきた。しかし、こうしたより込み入った手法が必要かどうかは、モデルが実際に誤特定されているかどうかに依存し、この問いに答えるための一般に適用可能な方法は不足している。本稿では,そのような方法の1つを提案する。より正確には、データがあるパラメトリック族に属するいずれかの分布から得られたものかどうかを問う、難しい複合検定問題に対するカーネルベースの仮説検定を提案する。提案する検定は,最大平均不一致(maximum mean discrepancy)とカーネルStein不一致(kernel Stein discrepancy)に基づく最小距離推定量を利用する。これらは広く適用可能であり、パラメトリックモデルの密度が正規化定数を除いて既知である場合や、モデルがシミュレータの形式を取る場合も含まれる。主結果として,適切な検定水準を維持しつつ,(データ分割を行わずに)同じデータ上でパラメータの推定と検定を行えることを示す。提案手法を,非正規化ノンパラメトリック密度モデルの適合度検定や,生体細胞ネットワークの扱いにくい(intractable)生成モデルなど,様々な問題で例示する。 Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to the development of a range of robust methods which directly account for this issue. However, whether these more involved methods are required will depend on whether the model is really misspecified, and there is a lack of generally applicable methods to answer this question. In this paper, we propose one such method. More precisely, we propose kernel-based hypothesis tests for the challenging composite testing problem, where we are interested in whether the data comes from any distribution in some parametric family. Our tests make use of minimum distance estimators based on the maximum mean discrepancy and the kernel Stein discrepancy. They are widely applicable, including whenever the density of the parametric model is known up to a normalisation constant, or if the model takes the form of a simulator. As our main result, we show that we are able to estimate the parameter and conduct our test on the same data (without data splitting), while maintaining a correct test level. Our approach is illustrated on a range of problems, including testing for goodness-of-fit of an unnormalised non-parametric density model, and an intractable generative model of a biological cellular network.	翻訳日:2024-02-29 01:23:04 公開日:2024-02-27
# ロバストなサイバーセキュリティトピック分類ツール A Robust Cybersecurity Topic Classification Tool ( http://arxiv.org/abs/2109.02473v4 ) ライセンス: Link先を確認	Elijah Pelofske, Lorie M. Liebrock, Vincent Urias	(参考訳) 本研究では,インターネット上の3つのテキストソース(Reddit, Stackexchange, Arxiv)のユーザ定義ラベルを用いて,自然文中のサイバーセキュリティに関する議論を検出するトピック分類タスクのために21種類の機械学習モデルを訓練する。交差検証実験において,21モデルそれぞれの偽陽性率と偽陰性率を解析する。次に、サイバーセキュリティ関連テキストを検出する決定機構として、訓練済みの21個の機械学習モデルの多数決を用いるサイバーセキュリティトピック分類(CTC)ツールを提案する。また、CTCツールの多数決機構は、21個の個別モデルのいずれよりも平均的に低い偽陰性率と偽陽性率をもたらすことを示す。CTCツールは数十万の文書にスケーラブルであり、実時間(wall clock time)は数時間オーダーであることを示す。 In this research, we use user-defined labels from three internet text sources (Reddit, Stackexchange, Arxiv) to train 21 different machine learning models for the topic classification task of detecting cybersecurity discussions in natural text. We analyze the false positive and false negative rates of each of the 21 models in a cross-validation experiment. Then we present a Cybersecurity Topic Classification (CTC) tool, which takes the majority vote of the 21 trained machine learning models as the decision mechanism for detecting cybersecurity-related text. We also show that the majority vote mechanism of the CTC tool provides lower false negative and false positive rates on average than any of the 21 individual models. We show that the CTC tool is scalable to hundreds of thousands of documents with a wall clock time on the order of hours.	翻訳日:2024-02-29 01:21:46 公開日:2024-02-27
# 相対エントロピー正則化を伴う経験的リスク最小化 Empirical Risk Minimization with Relative Entropy Regularization ( http://arxiv.org/abs/2211.06617v4 ) ライセンス: Link先を確認	Samir M. Perlaza, Gaetan Bisson, Iñaki Esnaola, Alain Jean-Marie, Stefano Rini	(参考訳) 相対エントロピー正則化を伴う経験的リスク最小化(ERM)問題(ERM-RER)を、基準測度が$\sigma$-有限測度であり、必ずしも確率測度ではないという仮定の下で検討する。この仮定はERM-RER問題の一般化につながり、事前知識を組み込む際の柔軟性を高める。この仮定の下で、多くの関連する性質を述べる。これらの性質のうち、この問題の解は、存在すれば一意な確率測度であり、基準測度と互いに絶対連続であることが示される。そのような解は、ERM問題が解を持つかどうかに関わらず、ERM問題に対してPAC(probably-approximately-correct)保証を示す。固定されたデータセットと特定の条件の下で、モデルがERM-RER問題の解からサンプリングされるとき、経験的リスクは劣ガウス(sub-Gaussian)確率変数であることが示される。ERM-RER問題の解(ギブスアルゴリズム)の汎化能力は、そのような解から別の確率測度への偏差に対する期待経験的リスクの感度を通じて研究される。最後に、感度、汎化誤差、lautum情報の間の興味深い関係を確立する。 The empirical risk minimization (ERM) problem with relative entropy regularization (ERM-RER) is investigated under the assumption that the reference measure is a $\sigma$-finite measure, and not necessarily a probability measure. Under this assumption, which leads to a generalization of the ERM-RER problem allowing a larger degree of flexibility for incorporating prior knowledge, numerous relevant properties are stated. Among these properties, the solution to this problem, if it exists, is shown to be a unique probability measure, mutually absolutely continuous with the reference measure. Such a solution exhibits a probably-approximately-correct guarantee for the ERM problem independently of whether the latter possesses a solution. For a fixed dataset and under a specific condition, the empirical risk is shown to be a sub-Gaussian random variable when the models are sampled from the solution to the ERM-RER problem. The generalization capabilities of the solution to the ERM-RER problem (the Gibbs algorithm) are studied via the sensitivity of the expected empirical risk to deviations from such a solution towards alternative probability measures. Finally, an interesting connection between sensitivity, generalization error, and lautum information is established.	翻訳日:2024-02-29 01:15:07 公開日:2024-02-27
# GmGM: 高速マルチ軸ガウスグラフモデル GmGM: a Fast Multi-Axis Gaussian Graphical Model ( http://arxiv.org/abs/2211.02920v3 ) ライセンス: Link先を確認	Bailey Andrew, David Westhead, Luisa Cutillo	(参考訳) 本稿では,行列およびテンソル変量データのスパースグラフ表現を構成するモデルであるガウス多元グラフモデルを紹介する。我々は,この領域における先行研究を,軸を共有する数個のテンソルで同時に学習することにより一般化し,マルチオミクスで遭遇したようなマルチモーダルデータセットの解析を可能にする。我々のアルゴリズムは1軸あたり1つの固有分解しか使用せず、一般化されていない場合の先行処理よりも桁違いのスピードアップを達成する。これにより,従来のアプローチでは困難であったシングルセルマルチオミクスデータなど,大規模なマルチモーダルデータセット上での方法論の利用が可能となった。合成データと実世界の5つのデータセットでモデルを検証した。 This paper introduces the Gaussian multi-Graphical Model, a model to construct sparse graph representations of matrix- and tensor-variate data. We generalize prior work in this area by simultaneously learning this representation across several tensors that share axes, which is necessary to allow the analysis of multimodal datasets such as those encountered in multi-omics. Our algorithm uses only a single eigendecomposition per axis, achieving an order of magnitude speedup over prior work in the ungeneralized case. This allows the use of our methodology on large multi-modal datasets such as single-cell multi-omics data, which was challenging with previous approaches. We validate our model on synthetic data and five real-world datasets.	翻訳日:2024-02-29 01:14:23 公開日:2024-02-27
# 発音改善のためのフローベース音声変換による言語間テキスト音声合成 Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation ( http://arxiv.org/abs/2210.17264v2 ) ライセンス: Link先を確認	Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Georgia Maniati, Panos Kakoulidis, June Sig Sung, Inchul Hwang, Spyros Raptis, Aimilios Chalamandaris, Pirros Tsiakoulis	(参考訳) 本稿では、元の話者の言語によらず、対象言語の発音を維持することを目的とした、エンドツーエンドの言語間テキスト音声合成(TTS)手法を提案する。使用するモデルはnon-attentive Tacotronアーキテクチャに基づいており、デコーダを話者識別情報で条件付けした正規化フローネットワークに置き換えている。これにより、言語内容と話者識別情報が本質的に分離されるため、TTSと音声変換(VC)の両方を同じモデルで実行できる。言語間の設定で使用する場合、まず対象言語のネイティブ話者を用いて音響特徴を生成し、その後、同じモデルで音声変換を適用して、これらの特徴を対象話者の声に変換する。客観的および主観的な評価を通じて、本手法がベースラインの言語間合成と比べて利点を持ちうることを確認する。平均7.5分の音声しかない話者を含めることで、低リソースシナリオに対する肯定的な結果も提示する。 This paper presents a method for end-to-end cross-lingual text-to-speech (TTS) which aims to preserve the target language's pronunciation regardless of the original speaker's language. The model used is based on a non-attentive Tacotron architecture, where the decoder has been replaced with a normalizing flow network conditioned on the speaker identity, allowing both TTS and voice conversion (VC) to be performed by the same model due to the inherent linguistic content and speaker identity disentanglement. When used in a cross-lingual setting, acoustic features are initially produced with a native speaker of the target language and then voice conversion is applied by the same model in order to convert these features to the target speaker's voice. We verify through objective and subjective evaluations that our method can have benefits compared to baseline cross-lingual synthesis. By including speakers averaging 7.5 minutes of speech, we also present positive results on low-resource scenarios.	翻訳日:2024-02-29 01:14:10 公開日:2024-02-27
# All the Feels: 大面積触覚センシングを備えた器用なハンド All the Feels: A dexterous hand with large-area tactile sensing ( http://arxiv.org/abs/2210.15658v3 ) ライセンス: Link先を確認	Raunaq Bhirangi, Abigail DeFranco, Jacob Adkins, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, Vikash Kumar	(参考訳) 高いコストと信頼性の欠如が、ロボット工学における器用なハンドの普及を妨げてきた。さらに、手の全領域を感知できる実用的な触覚センサーがないことが、器用な操作スキルの学習を改善するはずの豊かな低レベルフィードバックを妨げている。本稿では、深層ロボット学習パラダイムが要求する大規模データ収集能力を満たしつつ、これらの課題を解決することを目的とした、安価でモジュール式、堅牢かつスケーラブルなプラットフォームであるDManusを紹介する。人間の操作に関する研究は、日常の器用な作業における低レベルの触覚フィードバックの重要性を示している。DManusは、手のひらの表面全体と指先にReSkinセンシングを備える。触覚を利用するタスクであるビンピッキングと仕分けにおいて、完全統合システムの有効性を示す。コード、ドキュメント、設計ファイル、詳細な組立手順、学習済みモデル、タスクビデオ、およびセットアップを再現するために必要な補足資料はすべてhttps://sites.google.com/view/roboticsbenchmarks/platforms/dmanusにある。 High cost and lack of reliability have precluded the widespread adoption of dexterous hands in robotics. Furthermore, the lack of a viable tactile sensor capable of sensing over the entire area of the hand impedes the rich, low-level feedback that would improve learning of dexterous manipulation skills. This paper introduces an inexpensive, modular, robust, and scalable platform -- the DManus -- aimed at resolving these challenges while satisfying the large-scale data collection capabilities demanded by deep robot learning paradigms. Studies on human manipulation point to the criticality of low-level tactile feedback in performing everyday dexterous tasks. The DManus comes with ReSkin sensing on the entire surface of the palm as well as the fingertips. We demonstrate effectiveness of the fully integrated system in a tactile aware task -- bin picking and sorting. Code, documentation, design files, detailed assembly instructions, trained models, task videos, and all supplementary materials required to recreate the setup can be found on https://sites.google.com/view/roboticsbenchmarks/platforms/dmanus.	翻訳日:2024-02-29 01:13:26 公開日:2024-02-27
# ラベルなしデータを用いたコントローラ誘導部分ラベル一貫性正則化 Controller-Guided Partial Label Consistency Regularization with Unlabeled Data ( http://arxiv.org/abs/2210.11194v4 ) ライセンス: Link先を確認	Qian-Wei Wang, Bowen Zhao, Mingyan Zhu, Tianxiang Li, Zimo Liu, Shu-Tao Xia	(参考訳) 部分ラベル学習(PLL)は、それぞれが複数の候補ラベル(うち正しいものは1つだけ)に関連付けられた訓練例から学習する。近年、曖昧な教師情報を扱う高い能力と最新のデータ拡張手法の後押しにより、一貫性正則化に基づくPLL法が一連の成功を収め、主流になってきた。しかし、部分アノテーションが不十分になると、性能は大幅に低下する。本稿では、入手しやすいラベルなし例を利用して、部分ラベル一貫性正則化を促進する。部分教師あり損失に加えて、ラベルなしデータの助けを借りて、ラベルレベルと表現レベルの両方でコントローラ誘導の一貫性正則化を行う。初期の教師ありモデルの能力不足による悪影響を最小限に抑えるため、コントローラを用いて現在の各予測の信頼度を推定し、その後の一貫性正則化を導く。さらに、一貫性正則化に参加する各クラスのサンプル数がほぼ等しくなるように信頼度閾値を動的に調整し、クラス不均衡の問題を緩和する。実験により、本手法はより実用的な状況で十分な性能を達成し、そのモジュールを既存のPLL法に適用して能力を高められることを示した。 Partial label learning (PLL) learns from training examples each associated with multiple candidate labels, among which only one is valid. In recent years, benefiting from the strong capability of dealing with ambiguous supervision and the impetus of modern data augmentation methods, consistency regularization-based PLL methods have achieved a series of successes and become mainstream. However, as the partial annotation becomes insufficient, their performances drop significantly. In this paper, we leverage easily accessible unlabeled examples to facilitate the partial label consistency regularization. In addition to a partial supervised loss, our method performs a controller-guided consistency regularization at both the label-level and representation-level with the help of unlabeled data. To minimize the disadvantages of insufficient capabilities of the initial supervised model, we use the controller to estimate the confidence of each current prediction to guide the subsequent consistency regularization. Furthermore, we dynamically adjust the confidence thresholds so that the number of samples of each class participating in consistency regularization remains roughly equal to alleviate the problem of class-imbalance. 
Experiments show that our method achieves satisfactory performances in more practical situations, and its modules can be applied to existing PLL methods to enhance their capabilities.	翻訳日:2024-02-29 01:13:06 公開日:2024-02-27
# ランダム行列理論からの動的量子相転移 Dynamical quantum phase transitions from random matrix theory ( http://arxiv.org/abs/2208.01659v2 ) ライセンス: Link先を確認	David Pérez-García, Leonardo Santilli and Miguel Tierz	(参考訳) ランダム行列理論とそれに伴う平面極限の概念を用いて、新しい動的量子相転移を明らかにする。これを等方的XYハイゼンベルクスピン鎖について研究する。このために、ロシュミットエコーを通じて実時間ダイナミクスを調べる。これは複素重みを持つランダム行列アンサンブルの研究につながり、その解析には新しい技術的考察が必要となるため、それを本研究で展開する。主な結果は3つある。 1) 再スケールされた臨界時刻に3次の相転移が存在し、その時刻を決定する。 2) 3次の相転移は熱力学的極限から離れても持続する。 3) 臨界値より前の時刻では、熱力学的極限と有限鎖との差はシステムサイズとともに指数関数的に減少する。これらの結果はすべて、フィデリティを構成する量子状態の反転スピン数のパリティに豊かな形で依存する。 We uncover a novel dynamical quantum phase transition, using random matrix theory and its associated notion of planar limit. We study it for the isotropic XY Heisenberg spin chain. For this, we probe its real-time dynamics through the Loschmidt echo. This leads to the study of a random matrix ensemble with a complex weight, whose analysis requires novel technical considerations, that we develop. We obtain three main results: 1) There is a third order phase transition at a rescaled critical time, that we determine. 2) The third order phase transition persists away from the thermodynamic limit. 3) For times below the critical value, the difference between the thermodynamic limit and a finite chain decreases exponentially with the system size. All these results depend in a rich manner on the parity of the number of flipped spins of the quantum state conforming the fidelity.	翻訳日:2024-02-29 01:12:28 公開日:2024-02-27
# インクリメンタルパルス再シードによる時間効率の良いquditゲート Time-Efficient Qudit Gates through Incremental Pulse Re-seeding ( http://arxiv.org/abs/2206.14975v2 ) ライセンス: Link先を確認	Lennart Maximilian Seifert, Jason Chadwick, Andrew Litteken, Frederic T. Chong, Jonathan M. Baker	(参考訳) 量子コンピュータを構築するための現在の取り組みは主に2状態の量子ビットに焦点を当てており、容易に利用できる高次の状態を抑制することが多い。本研究では、この抽象化を破り、一般化されたd状態qudit上のゲートに対する短時間の制御パルスを合成する。以前の結果をオプティマイザのシードとして反復的に与えることで、最適制御ソフトウェアを最短時間のパルスへ導く実用的な手法、Incremental Pulse Re-seedingを提案する。トランズモン上の1-quditおよび2-quditゲートに対する明示的なパルス最適化により、ヒルベルト空間次元とゲート時間の間にほぼ線形の関係を見いだす。以上の結果から、実用上関心のある領域においてqudit操作は従来予想されていたよりもはるかに効率的であり、現在のハードウェアの計算能力を大幅に向上できる可能性が示唆される。 Current efforts to build quantum computers focus mainly on the two-state qubit, which often involves suppressing readily-available higher states. In this work, we break this abstraction and synthesize short-duration control pulses for gates on generalized d-state qudits. We present Incremental Pulse Re-seeding, a practical scheme to guide optimal control software to the lowest-duration pulse by iteratively seeding the optimizer with previous results. We find a near-linear relationship between Hilbert space dimension and gate duration through explicit pulse optimization for one- and two-qudit gates on transmons. Our results suggest that qudit operations are much more efficient than previously expected in the practical regime of interest and have the potential to significantly increase the computational power of current hardware.	翻訳日:2024-02-29 01:11:30 公開日:2024-02-27
# 局所的および大域的な下位レベル問題を伴う通信効率の良い連合バイレベル最適化 Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems ( http://arxiv.org/abs/2302.06701v2 ) ライセンス: Link先を確認	Junyi Li, Feihu Huang, Heng Huang	(参考訳) バイレベル最適化は、新しい効率的なアルゴリズムの登場により近年顕著な進歩を遂げている。しかし、連合学習の設定におけるその応用は比較的未開拓であり、連合学習に固有の課題がバイレベルアルゴリズムの収束に与える影響は依然として不明である。本研究では、連合バイレベル最適化問題を調査し、FedBiOAccという通信効率の高いアルゴリズムを提案する。このアルゴリズムは、分散環境でのハイパー勾配の効率的な推定と、モーメンタムに基づく分散低減による加速を利用する。注目すべきことに、FedBiOAccは通信複雑性$O(\epsilon^{-1})$、サンプル複雑性$O(\epsilon^{-1.5})$、およびクライアント数に対する線形スピードアップを達成する。また、下位レベル問題をクライアントがローカルに管理する、連合バイレベル最適化問題の特別なケースについても分析する。FedBiOAccの修正版であるFedBiOAcc-Localが、この種の問題に対して同じ速度で収束することを証明する。最後に、連合データクリーニングと連合ハイパー表現学習という2つの実世界のタスクを通して、提案アルゴリズムを検証する。実験結果はアルゴリズムの優れた性能を示す。 Bilevel Optimization has witnessed notable progress recently with new emerging efficient algorithms. However, its application in the Federated Learning setting remains relatively underexplored, and the impact of Federated Learning's inherent challenges on the convergence of bilevel algorithms remains obscure. In this work, we investigate Federated Bilevel Optimization problems and propose a communication-efficient algorithm, named FedBiOAcc. The algorithm leverages an efficient estimation of the hyper-gradient in the distributed setting and utilizes the momentum-based variance-reduction acceleration. Remarkably, FedBiOAcc achieves a communication complexity $O(\epsilon^{-1})$, a sample complexity $O(\epsilon^{-1.5})$ and the linear speed up with respect to the number of clients. We also analyze a special case of the Federated Bilevel Optimization problems, where lower level problems are locally managed by clients. We prove that FedBiOAcc-Local, a modified version of FedBiOAcc, converges at the same rate for this type of problems. Finally, we validate the proposed algorithms through two real-world tasks: Federated Data-cleaning and Federated Hyper-representation Learning. 
Empirical results show superior performance of our algorithms.	翻訳日:2024-02-29 01:06:46 公開日:2024-02-27
# uvdoc:neural gridベースのドキュメントアンワーピング UVDoc: Neural Grid-based Document Unwarping ( http://arxiv.org/abs/2302.02887v2 ) ライセンス: Link先を確認	Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung	(参考訳) 折りたたまれたページのカジュアルな写真から印刷された文書のオリジナルの平らな外観を復元することは日常的な問題である。本稿では,グリッドベースの単一画像文書のアンウォープ手法を提案する。提案手法は,文書の3次元グリッドメッシュとそれに対応する2次元アンウォープグリッドを二重タスク方式で予測し,紙の形状と2次元画像との結合を暗黙的に符号化する,完全畳み込み型ディープニューラルネットワークを用いて幾何的歪み補正を行う。一般的なDoc3Dデータセットよりもリアルに見えるデータに基づいてアンウォープモデルをトレーニングできるように、擬似フォトリアリスティックな文書イメージと物理的に正確な3D形状とアンウォープ関数アノテーションを組み合わせた、UVDocと呼ばれるデータセットを作成し、公開します。私たちのデータセットには、典型的に野生のデータセットで見られる基盤構造が欠如していることに対処可能な、別々の損失関数を設計することなく、アンウォーピングネットワークをトレーニングするために必要なすべての情報がラベル付けされています。我々は、新しい擬似フォトリアリスティックデータセットを含めることで、DocUNetベンチマークで比較的小さなネットワークアーキテクチャが最先端の結果を達成することを示す詳細な評価を行う。 UVDocデータセットの擬似フォトリアリスティックな性質は、照明補正MS-SSIMのような新しい評価方法を可能にする。このような評価を容易にする新しいベンチマークデータセットを提案し、アンウォープ後の直線直線性を定量化する指標を提案する。私たちのコード、結果、UVDocデータセットはhttps://github.com/tanguymagne/UVDocで利用可能です。 Restoring the original, flat appearance of a printed document from casual photographs of bent and wrinkled pages is a common everyday problem. In this paper we propose a novel method for grid-based single-image document unwarping. Our method performs geometric distortion correction via a fully convolutional deep neural network that learns to predict the 3D grid mesh of the document and the corresponding 2D unwarping grid in a dual-task fashion, implicitly encoding the coupling between the shape of a 3D piece of paper and its 2D image. In order to allow unwarping models to train on data that is more realistic in appearance than the commonly used synthetic Doc3D dataset, we create and publish our own dataset, called UVDoc, which combines pseudo-photorealistic document images with physically accurate 3D shape and unwarping function annotations. 
Our dataset is labeled with all the information necessary to train our unwarping network, without having to engineer separate loss functions that can deal with the lack of ground-truth typically found in document in the wild datasets. We perform an in-depth evaluation that demonstrates that with the inclusion of our novel pseudo-photorealistic dataset, our relatively small network architecture achieves state-of-the-art results on the DocUNet benchmark. We show that the pseudo-photorealistic nature of our UVDoc dataset allows for new and better evaluation methods, such as lighting-corrected MS-SSIM. We provide a novel benchmark dataset that facilitates such evaluations, and propose a metric that quantifies line straightness after unwarping. Our code, results and UVDoc dataset are available at https://github.com/tanguymagne/UVDoc.	翻訳日:2024-02-29 01:05:41 公開日:2024-02-27
# プライベート、公平、正確:医療画像における大規模プライバシー保護AIモデルのトレーニング Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging ( http://arxiv.org/abs/2302.01622v4 ) ライセンス: Link先を確認	Soroosh Tayebi Arasteh, Alexander Ziller, Christiane Kuhl, Marcus Makowski, Sven Nebelung, Rickmer Braren, Daniel Rueckert, Daniel Truhn, Georgios Kaissis	(参考訳) 人工知能(AI)モデルは医療分野でますます利用されている。しかし、医療データは極めて機微であるため、その保護を確実にするための特別な予防措置が必要である。プライバシー保護のゴールドスタンダードは、モデルの訓練に差分プライバシー(DP)を導入することである。先行研究は、DPがモデルの精度と公平性に悪影響を及ぼすことを示しており、これは医療では許容されず、プライバシー保護技術の普及に対する主な障壁となっている。本研究では、AIモデルのプライバシー保護訓練が精度と公平性に与える影響を、非プライベートな訓練と比較して評価した。そのために、(1)高品質な臨床胸部X線画像の大規模データセット(N=193,311)と、(2)膵管腺癌(PDAC)の有無を分類するタスクを伴う3次元腹部CT画像のデータセット(N=1,625)の2つを用いた。どちらも後ろ向きに収集され、経験豊富な放射線科医によって手動でラベル付けされた。次に、非プライベートな深層畳み込みニューラルネットワーク(CNN)とプライバシー保護(DP)モデルを、受信者動作特性曲線下面積(AUROC)で測定したプライバシーと有用性のトレードオフ、およびPearsonのrまたは統計的パリティ差で測定したプライバシーと公平性のトレードオフの観点から比較した。プライバシー保護訓練では精度が低下したものの、年齢、性別、併存疾患に対する差別はおおむね増幅されなかった。本研究は、実際の臨床データセットという困難で現実的な状況下でも、診断用深層学習モデルのプライバシー保護訓練が、優れた診断精度と公平性をもって可能であることを示す。 Artificial intelligence (AI) models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions to ensure its protection are required. The gold standard for privacy preservation is the introduction of differential privacy (DP) to model training. Prior work indicates that DP has negative implications on model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training. For this, we used two datasets: (1) A large dataset (N=193,311) of high quality clinical chest radiographs, and (2) a dataset (N=1,625) of 3D abdominal computed tomography (CT) images, with the task of classifying the presence of pancreatic ductal adenocarcinoma (PDAC). 
Both were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs measured as area under the receiver-operator-characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference. We found that, while the privacy-preserving trainings yielded lower accuracy, they did largely not amplify discrimination against age, sex or co-morbidity. Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.	翻訳日:2024-02-29 01:05:10 公開日:2024-02-27
# 証拠とその強さ:ベイズ要因または相対的信念比? How to Measure Evidence and Its Strength: Bayes Factors or Relative Belief Ratios? ( http://arxiv.org/abs/2301.08994v2 ) ライセンス: Link先を確認	Luai Al-Labadi, Ayman Alzaatreh, Michael Evans	(参考訳) ベイズ因子と相対的信念比の両方が証拠の原理を満たすので、統計的証拠の有効な尺度と見なすことができる。ベイズ要因は定期的に採用されている。問題は、これらの証拠のどれがより適切かということです。ここでは、ベイズ因子の現在の一般的な定義の妥当性について、事前の混合に基づいて疑問があり、全てを考慮すると、相対的信念比は証拠の尺度としてより良い性質を持つ。さらに, 混合先に対する自然制限が課されると, ベイズ係数は, 混合先を使わずに得られる相対的信念比と等しくなることを示した。この制限にもかかわらず、証拠の強さがどのように測定されるのかという疑問が残る。ここでは、ベイズ因子のサイズを用いて強度を測定するという現在の実践は正しくなく、この問題に対する解決策が提示されている。これらの証拠に関する一般的な批判も議論され、取り扱われている。 Both the Bayes factor and the relative belief ratio satisfy the principle of evidence and so can be seen to be valid measures of statistical evidence. Certainly Bayes factors are regularly employed. The question then is: which of these measures of evidence is more appropriate? It is argued here that there are questions concerning the validity of a current commonly used definition of the Bayes factor based on a mixture prior and, when all is considered, the relative belief ratio has better properties as a measure of evidence. It is further shown that, when a natural restriction on the mixture prior is imposed, the Bayes factor equals the relative belief ratio obtained without using the mixture prior. Even with this restriction, this still leaves open the question of how the strength of evidence is to be measured. It is argued here that the current practice of using the size of the Bayes factor to measure strength is not correct and a solution to this issue is presented. Several general criticisms of these measures of evidence are also discussed and addressed.	翻訳日:2024-02-29 01:03:40 公開日:2024-02-27
# キタエフスピン液体二層におけるエニオン凝縮と閉じ込め転移 Anyon condensation and confinement transition in a Kitaev spin liquid bilayer ( http://arxiv.org/abs/2301.05721v3 ) ライセンス: Link先を確認	Kyusung Hwang	(参考訳) 量子スピン液体(QSL)間の転移はランダウのパラダイムを超えた基本的な問題であり、トポロジカル秩序と呼ばれるQSLのエンタングルメント構造の深い理解を必要とする。エニオン凝縮という新しい概念は、トポロジカル秩序間の様々な転移を予測する理論的機構として提案されてきたが、量子スピン系においてその機構を確認することは長らく困難であった。ここでは、エニオン凝縮転移の機構を具現化する具体的なスピンモデルを導入する。本モデルは、異なるパラメータ領域に2つのトポロジカルQSL、すなわち非可換キタエフスピン液体(KSL)二層状態と共鳴原子価結合(RVB)状態を持つ。二層KSLからRVBへの転移は実際にエニオン凝縮の機構によって起こり、これをパートン理論と厳密対角化を用いて同定する。さらに、高エネルギー物理学におけるクォーク閉じ込めに似た「エニオン閉じ込め」現象を数値結果において観測する。すなわち、二層KSLの非可換イジングエニオンは、RVB状態への転移において閉じ込められる。本研究の意義と拡張は、以下のような様々な側面から議論される。 (i) キタエフの16通りのエニオン理論のエニオン凝縮による多層構成 (ii) キタエフ二層系における、RVBから原子価結合固体(VBS)へのさらなるバイゾン(vison)凝縮転移 (iii) 非エルミート・キタエフ二層における動的エニオン凝縮 (iv) 本モデルの他の格子幾何への一般化 (v) 実験的実現。本研究は、現代の凝縮系物理学と量子物理学で広く研究されている2つの興味深いQSLを具体的なスピンモデルにまとめ、キタエフスピン液体と共鳴原子価結合のエニオン物理学を統一する包括的な描像を提供する。 Transitions between quantum spin liquids (QSLs) are fundamental problems lying beyond the Landau paradigm and requiring a deep understanding of the entanglement structures of QSLs called topological orders. The novel concept of anyon condensation has been proposed as a theoretical mechanism, predicting various possible transitions between topological orders, but it has long been elusive to confirm the mechanism in quantum spin systems. Here, we introduce a concrete spin model that incarnates the mechanism of anyon condensation transition. Our model harbors two topological QSLs in different parameter regions, a non-abelian Kitaev spin liquid (KSL) bilayer state and a resonating valence bond (RVB) state. The bilayer-KSL-to-RVB transition indeed occurs by the mechanism of anyon condensation, which we identify by using parton theories and exact diagonalization studies. Moreover, we observe "anyon confinement" phenomena in our numerical results, akin to the quark confinement in high energy physics. 
Namely, non-abelian Ising anyons of the bilayer KSL are confined in the transition to the RVB state. Implications and extensions of this study are discussed in various aspects such as (i) anyon-condensed multilayer construction of the Kitaev's sixteenfold way of anyon theories, (ii) additional vison condensation transition from the RVB to a valence bond solid (VBS) in the Kitaev bilayer system, (iii) dynamical anyon condensation in a non-Hermitian Kitaev bilayer, (iv) generalizations of our model to other lattice geometries, and (v) experimental realizations. This work puts together the two fascinating QSLs that are extensively studied in modern condensed matter and quantum physics into a concrete spin model, offering a comprehensive picture that unifies the anyon physics of the Kitaev spin liquids and the resonating valence bonds.	翻訳日:2024-02-29 01:03:24 公開日:2024-02-27
# 大振幅光猫状態を決定論的に生成する方法 Method to deterministically generate large-amplitude optical cat states ( http://arxiv.org/abs/2301.02839v2 ) ライセンス: Link先を確認	Zheng-Hong Li, Fei Yu, Zhen-Ya Li, M. Al-Amri, and M. Suhail Zubairy	(参考訳) 大振幅光猫状態の決定論的調製法を提案する。相互作用のない測定と量子ゼノ効果を利用してマクロとマイクロシステムの絡み合いを生成することが鍵となる。この方法では、量子マイクロシステムと直接相互作用することなく、強い光場を緩やかに操作できる。直接相互作用は、量子マイクロシステムと相互作用する光子のほんの一部しか持たない複数の相互作用によってバイパスされる。そこで,本手法では,弱磁場環境内で量子マイクロシステムを完全に機能させることができる。また,古典デバイスによる光学的損失が低い限り,量子マイクロシステムが大きな光子損失を被った場合でも,猫状態の調製が可能であることを示す。量子マイクロシステムを説明するためにキャビティと原子のカップリングシステムを用いる。相互作用の数が増加するにつれて、原子の自発的放出と原子とキャビティの間のデチューニングの両方に対する感度が低下することを示した。したがって、古典光学系を改良して完全化することにより、猫状態の忠実度を高めることができる。 A deterministic preparation method for large-amplitude optical cat state is proposed. The key ingredient is to generate entanglement between macro and micro systems by utilizing interaction-free measurement and quantum Zeno effect. Our method enables the quantum microsystem to gently manipulate strong light field without directly interacting with it. The direct interaction is bypassed by multiple interactions, each of which has only a small fraction of photons interacting with the quantum microsystem. Therefore, our method allows the quantum microsystem to function entirely within a weak field environment, which is a distinct advantage of our method. Moreover, we also show that the cat state preparation can be achieved even if the quantum microsystem suffers from large photon loss, as long as optical losses from classical devices remain low. We use a cavity-atom coupling system to illustrate the quantum microsystem. We demonstrate that as the number of interactions increases, our scheme becomes less and less sensitive to both atomic spontaneous emission and detuning between the atom and the cavity. Therefore, the fidelity of the cat state can be increased by improving and perfecting the classical optical system.	翻訳日:2024-02-29 01:02:28 公開日:2024-02-27
# 複合力学の合同生成モデルのためのマルチモーダルデータの統合 Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics ( http://arxiv.org/abs/2212.07892v2 ) ライセンス: Link先を確認	Manuel Brenner and Florian Hess and Georgia Koppe and Daniel Durstewitz	(参考訳) 科学に関心を持つシステムの多くは自然に非線形力学系として記述されている。経験的には、時系列測定によってこれらのシステムによくアクセスする。このような時系列は連続的な測定ではなく離散的なランダム変数で構成されたり、同時に観測される複数のデータモーダルから測定される。例えば神経科学では、スパイクカウントや継続的な生理的記録に加えて行動ラベルがあるかもしれない。動的システム再構成(DSR)の深層学習に関する文献は,現在,盛んに行われているが,この文脈ではマルチモーダルデータの統合はほとんど検討されていない。本稿では,DSRトレーニング技術の最近の進歩を生かして,再構成モデルのトレーニングを誘導するスパース教師信号を生成するマルチモーダル変分オートエンコーダ上に,効率的で柔軟なアルゴリズムフレームワークを提供する。最適な再構成のために様々な情報ソースを組み合わせることができ、シンボリックデータ(クラスラベル)のみからの再構築を可能にし、共通の潜在力学空間内で異なるタイプの観測を接続する。科学的応用のための従来のマルチモーダルデータ統合技術とは対照的に、我々のフレームワークは完全に \textit{generative} であり、トレーニングの後に、基底真理系と同じ幾何学的および時間的構造を持つ軌道を生成する。 Many, if not most, systems of interest in science are naturally described as nonlinear dynamical systems. Empirically, we commonly access these systems through time series measurements. Often such time series may consist of discrete random variables rather than continuous measurements, or may be composed of measurements from multiple data modalities observed simultaneously. For instance, in neuroscience we may have behavioral labels in addition to spike counts and continuous physiological recordings. While by now there is a burgeoning literature on deep learning for dynamical systems reconstruction (DSR), multimodal data integration has hardly been considered in this context. Here we provide such an efficient and flexible algorithmic framework that rests on a multimodal variational autoencoder for generating a sparse teacher signal that guides training of a reconstruction model, exploiting recent advances in DSR training techniques. It enables to combine various sources of information for optimal reconstruction, even allows for reconstruction from symbolic data (class labels) alone, and connects different types of observations within a common latent dynamics space. 
In contrast to previous multimodal data integration techniques for scientific applications, our framework is fully \textit{generative}, producing, after training, trajectories with the same geometrical and temporal structure as those of the ground truth system.	翻訳日:2024-02-29 01:01:24 公開日:2024-02-27
# ベイズ型ニューラルネットワークの事後推定シャープ化による暗黙的視覚バイアス軽減 Implicit Visual Bias Mitigation by Posterior Estimate Sharpening of a Bayesian Neural Network ( http://arxiv.org/abs/2303.16564v3 ) ライセンス: Link先を確認	Rebecca S Stone, Nishant Ravikumar, Andrew J Bulpitt, David C Hogg	(参考訳) ディープニューラルネットワークの公平性は、データセットバイアスと疑似相関に強く影響され、そのどちらも、特徴が豊富で複雑な現代の視覚データセットに通常存在する。タスクの難しさと多様性のため、単一のバイアス除去手法が普遍的に成功した例はない。特に、バイアス変数に関する明示的な知識を必要としない暗黙的手法は、実世界の応用にとって重要である。本研究では、ベイズ型ニューラルネットワークを用いた新しい暗黙的緩和法を提案し、認識論的不確実性と、サンプル中のバイアスや疑似相関の存在との関係を活用する。提案する事後推定シャープ化手順は、高い不確実性に寄与しないコア特徴にネットワークが集中するよう促す。3つのベンチマークデータセットでの実験結果から、事後推定をシャープ化したベイズ型ネットワークは既存の手法と同等の性能を示し、さらなる探究に値する可能性を持つことが示された。 The fairness of a deep neural network is strongly affected by dataset bias and spurious correlations, both of which are usually present in modern feature-rich and complex visual datasets. Due to the difficulty and variability of the task, no single de-biasing method has been universally successful. In particular, implicit methods not requiring explicit knowledge of bias variables are especially relevant for real-world applications. We propose a novel implicit mitigation method using a Bayesian neural network, allowing us to leverage the relationship between epistemic uncertainties and the presence of bias or spurious correlations in a sample. Our proposed posterior estimate sharpening procedure encourages the network to focus on core features that do not contribute to high uncertainties. Experimental results on three benchmark datasets demonstrate that Bayesian networks with sharpened posterior estimates perform comparably to prior existing methods and show potential worthy of further exploration.	翻訳日:2024-02-29 00:56:31 公開日:2024-02-27
# 深部生成モデルによるマルチモーダル・マルチコントラスト画像融合 Multimodal and multicontrast image fusion via deep generative models ( http://arxiv.org/abs/2303.15963v2 ) ライセンス: Link先を確認	Giovanna Maria Dimitri, Simeon Spasov, Andrea Duggento, Luca Passamonti, Pietro Li`o, Nicola Toschi	(参考訳) 近年,古典的診断ラベルが,いくつかの臨床表現型の複雑さや多様性を確実に記述できないことが次第に明らかになっている。これは、幅広い神経精神医学疾患(うつ病、不安障害、行動表現型など)に対して特に当てはまる。患者の不均一性は、伝統的なカテゴリー境界を越えて広がる横断性連続体の経験的に派生したセクションに基づいて、個人を新しいカテゴリに分類することでより良く説明できる。この文脈では、神経画像データは各患者の脳に関する時空間的に解決された豊富な情報を運ぶ。しかしながら、通常は、モデルトレーニングの一部としては学習されず、結果として下流予測タスクに最適化されない手順を通じて、優先度が大幅に崩れる。これは、通常、各被験者は複数の脳の3D画像モダリティを伴い、しばしば深い遺伝子型と表現型の特徴が伴うため、重大な計算課題が生じるためである。本稿では,モジュラーアプローチと分離可能な畳み込みブロックに根ざした生成モデルに基づくディープラーニングアーキテクチャを設計する。 a) ボクセルレベルで複数の3次元神経画像のモダリティを融合させる b) 重次元の縮小により情報を潜伏埋め込みに変換すること。 c) 良好な一般化性と情報損失を最小限に抑えること。概念実証として, 優れた特徴を持つHuman Connectome Projectデータベース上でアーキテクチャを検証し, 潜伏埋め込みが容易に分離可能な対象層にクラスタ化され, 組込み生成プロセスに含まれない表現型情報にマップされることを示した。これは、疾患の進化と薬物反応を予測する助けとなり、したがって機械的疾患の理解と臨床試験の強化を支援する。 Recently, it has become progressively more evident that classic diagnostic labels are unable to reliably describe the complexity and variability of several clinical phenotypes. This is particularly true for a broad range of neuropsychiatric illnesses (e.g., depression, anxiety disorders, behavioral phenotypes). Patient heterogeneity can be better described by grouping individuals into novel categories based on empirically derived sections of intersecting continua that span across and beyond traditional categorical borders. In this context, neuroimaging data carry a wealth of spatiotemporally resolved information about each patient's brain. However, they are usually heavily collapsed a priori through procedures which are not learned as part of model training, and consequently not optimized for the downstream prediction task. 
This is because every individual participant usually comes with multiple whole-brain 3D imaging modalities often accompanied by a deep genotypic and phenotypic characterization, hence posing formidable computational challenges. In this paper we design a deep learning architecture based on generative models rooted in a modular approach and separable convolutional blocks to a) fuse multiple 3D neuroimaging modalities on a voxel-wise level, b) convert them into informative latent embeddings through heavy dimensionality reduction, c) maintain good generalizability and minimal information loss. As proof of concept, we test our architecture on the well characterized Human Connectome Project database demonstrating that our latent embeddings can be clustered into easily separable subject strata which, in turn, map to different phenotypical information which was not included in the embedding creation process. This may be of aid in predicting disease evolution as well as drug response, hence supporting mechanistic disease understanding and empowering clinical trials.	翻訳日:2024-02-29 00:56:16 公開日:2024-02-27
# Dancing the Quantum Waltz: 4準位アーキテクチャ上での3量子ビットゲートのコンパイル Dancing the Quantum Waltz: Compiling Three-Qubit Gates on Four Level Architectures ( http://arxiv.org/abs/2303.14069v3 ) ライセンス: Link先を確認	Andrew Litteken (1), Lennart Maximilian Seifert (1), Jason D. Chadwick (1), Natalia Nottingham (1), Tanay Roy (1 and 2), Ziqian Li (1 and 3), David Schuster (1 and 3), Frederic T. Chong (1), Jonathan M. Baker (4) ((1) University of Chicago, (2) Fermilab, (3) Stanford University, (4) Duke University)	(参考訳) 超伝導量子デバイスは量子計算の主要な技術であるが、いくつかの課題を抱えている。ゲートエラー、コヒーレンスエラー、接続性の欠如はいずれも結果のフィデリティ低下の原因となる。特に接続性の制約のために、3量子ビットゲートを1量子ビットまたは2量子ビットのゲートに分解しなければならないゲートセットが強いられる。これにより、実行すべき2量子ビットゲートの数が大幅に増加する。しかし、多くの量子デバイスはより高いエネルギー準位にアクセスできる。 $\|0\rangle$と$\|1\rangle$からなる量子ビットの抽象化を、コヒーレンス時間は短くなるものの$\|2\rangle$と$\|3\rangle$の状態にもアクセスできるququartに拡張できる。これにより2つの量子ビットを1つのququartにエンコードでき、物理ユニット間の仮想的な接続性が、隣接する2つの量子ビットから完全に接続された4つの量子ビットへと増加する。この接続方式により、2つの物理デバイス間で3量子ビットゲートをネイティブに、より効率的に実行できる。最適制御により合成した複数の3量子ビットゲートのパルスへの直接実装を示し、4準位デバイスにアクセスできる超伝導ベースのアーキテクチャ上に3量子ビットゲートをコンパイルする。あわせて、最適制御により設計された4準位ququartゲートの最初の実験的実証を行う。高次の準位を一時的に使用してToffoliゲートを実行する戦略と、高次の準位を常に使用して量子回路のフィデリティを改善する戦略を示す。これらの手法により、期待されるフィデリティは、中間エンコーディングを用いた場合に各回路サイズにわたって2倍、完全にエンコードしたququartコンパイルでは3倍に向上する。 Superconducting quantum devices are a leading technology for quantum computation, but they suffer from several challenges. Gate errors, coherence errors and a lack of connectivity all contribute to low fidelity results. In particular, connectivity restrictions enforce a gate set that requires three-qubit gates to be decomposed into one- or two-qubit gates. This substantially increases the number of two-qubit gates that need to be executed. However, many quantum devices have access to higher energy levels. We can expand the qubit abstraction of $\|0\rangle$ and $\|1\rangle$ to a ququart which has access to the $\|2\rangle$ and $\|3\rangle$ state, but with shorter coherence times. 
This allows for two qubits to be encoded in one ququart, enabling increased virtual connectivity between physical units from two adjacent qubits to four fully connected qubits. This connectivity scheme allows us to more efficiently execute three-qubit gates natively between two physical devices. We present direct-to-pulse implementations of several three-qubit gates, synthesized via optimal control, for compilation of three-qubit gates onto a superconducting-based architecture with access to four-level devices with the first experimental demonstration of four-level ququart gates designed through optimal control. We demonstrate strategies that temporarily use higher level states to perform Toffoli gates and always use higher level states to improve fidelities for quantum circuits. We find that these methods improve expected fidelities with increases of 2x across circuit sizes using intermediate encoding, and increases of 3x for fully-encoded ququart compilation.	翻訳日:2024-02-29 00:55:35 公開日:2024-02-27
# CoLo-CAM:弱ラベル非拘束ビデオにおけるオブジェクトのコローカライゼーションのためのクラスアクティベーションマッピング CoLo-CAM: Class Activation Mapping for Object Co-Localization in Weakly-Labeled Unconstrained Videos ( http://arxiv.org/abs/2303.09044v3 ) ライセンス: Link先を確認	Soufiane Belharbi, Shakeeb Murtaza, Marco Pedersoli, Ismail Ben Ayed, Luke McCaffrey, Eric Granger	(参考訳) ビデオにおける時空間情報の活用は、弱教師付きビデオオブジェクトローカライゼーション(WSVOL)タスクにおいて重要である。しかし、最先端の手法は視覚と運動の手がかりにのみ依存するが、識別情報の破棄は不正確なローカライゼーションを許容する。近年,時間的クラスアクティベーションマッピング(CAM)法を用いたWSVOLタスクの識別モデルが検討されている。結果は有望だが、オブジェクトはフレームからフレームへの移動が限られていると仮定され、比較的長期の依存関係でパフォーマンスが低下する。本稿では、物体の位置を拘束することなく、訓練中の活性化マップの時空間情報を活用できる新しいWSVOLのCAM手法を提案する。訓練はコローカライゼーションに依存しており、CoLo-CAMという名称である。フレームのシーケンスが与えられると、オブジェクトが連続するフレームで同様の色を持つと仮定して、対応するマップ全体から抽出されたカラーキューに基づいて、ローカライゼーションを共同で学習する。 CAMアクティベーションは、同様の色を持つピクセルに対して同様の反応を制限され、コローカライゼーションが達成される。これは、共同学習がすべての画像位置と全フレーム間の直接通信を生成し、ローカライゼーションの転送、集約、修正を可能にするため、ローカライゼーション性能を向上させる。コローカライゼーションは、条件付きランダムフィールド(CRF)ロスの色項をフレーム/CAMのシーケンス上で最小化することにより、トレーニングに統合される。制約のないビデオの2つの挑戦的なYouTube-Objectsデータセットに対する大規模な実験は、当社のCoLo-CAMメソッドのメリットと、長期依存に対する堅牢性を示し、WSVOLタスクの新たな最先端パフォーマンスにつながった。 Leveraging spatiotemporal information in videos is critical for weakly supervised video object localization (WSVOL) tasks. However, state-of-the-art methods only rely on visual and motion cues, while discarding discriminative information, making them susceptible to inaccurate localizations. Recently, discriminative models have been explored for WSVOL tasks using a temporal class activation mapping (CAM) method. Although their results are promising, objects are assumed to have limited movement from frame to frame, leading to degradation in performance for relatively long-term dependencies. This paper proposes a novel CAM method for WSVOL that exploits spatiotemporal information in activation maps during training without constraining an object's position. Its training relies on Co-Localization, hence, the name CoLo-CAM. 
Given a sequence of frames, localization is jointly learned based on color cues extracted across the corresponding maps, by assuming that an object has a similar color in consecutive frames. CAM activations are constrained to respond similarly over pixels with similar colors, achieving co-localization. This improves localization performance because the joint learning creates direct communication among pixels across all image locations and over all frames, allowing for transfer, aggregation, and correction of localizations. Co-localization is integrated into training by minimizing the color term of a conditional random field (CRF) loss over a sequence of frames/CAMs. Extensive experiments on two challenging YouTube-Objects datasets of unconstrained videos show the merits of our CoLo-CAM method and its robustness to long-term dependencies, leading to new state-of-the-art performance for the WSVOL task.	翻訳日:2024-02-29 00:55:02 公開日:2024-02-27
# 正規化異方性球面ガウスによるオンラインニューラルパスガイディング Online Neural Path Guiding with Normalized Anisotropic Spherical Gaussians ( http://arxiv.org/abs/2303.08064v2 ) ライセンス: Link先を確認	Jiawei Huang, Akito Iizuka, Hajime Tanaka, Taku Komura, Yoshifumi Kitamura	(参考訳) 物理ベースレンダリングの分散低減速度は, 採用する重点サンプリング技術によって大きく影響を受ける。本稿では,確率的レイサンプルを用いて,単一の小さなニューラルネットワークで空間的に変化する密度モデルを学ぶための新しいオンラインフレームワークを提案する。そこで本研究では, 正規化異方性球面ガウス混合と呼ばれる, 複雑な放射照度場を少数のパラメータで表現できる新しい閉形式密度モデルを提案する。我々のフレームワークは、段階的に分布を学習し、ウォームアップフェーズは不要である。密度モデルのコンパクトで表現力に富んだ表現のため、このフレームワークはGPU上で完全に実装でき、限られた計算リソースで高品質な画像を生成できる。 The variance reduction speed of physically-based rendering is heavily affected by the adopted importance sampling technique. In this paper we propose a novel online framework to learn the spatially varying density model with a single small neural network using stochastic ray samples. To achieve this task, we propose a novel closed-form density model called the normalized anisotropic spherical Gaussian mixture, which can express complex irradiance fields with a small number of parameters. Our framework learns the distribution in a progressive manner and does not need any warm-up phases. Due to the compact and expressive representation of our density model, our framework can be implemented entirely on the GPU, allowing it to produce high-quality images with limited computational resources.	翻訳日:2024-02-29 00:54:32 公開日:2024-02-27
# 合成データ:方法、ユースケース、リスク Synthetic Data: Methods, Use Cases, and Risks ( http://arxiv.org/abs/2303.01230v3 ) ライセンス: Link先を確認	Emiliano De Cristofaro	(参考訳) データの共有は、しばしば魅力的なアプリケーションや分析を可能にする。しかし、多くの場合、貴重なデータセットにはセンシティブな性質の情報が含まれており、共有することはユーザーや組織のプライバシーを危険にさらす可能性がある。研究コミュニティと業界の両方で勢いを増している可能性のある代替手段は、代わりに合成データを共有することだ。そのアイデアは、実際のデータに似た人工的に生成されたデータセットをリリースすることです。本稿では,合成データに関する穏やかな紹介と,そのユースケース,未対応のプライバシ課題,効果的なプライバシ向上技術として固有の制限について論じる。 Sharing data can often enable compelling applications and analytics. However, more often than not, valuable datasets contain information of a sensitive nature, and thus, sharing them can endanger the privacy of users and organizations. A possible alternative gaining momentum in both the research community and industry is to share synthetic data instead. The idea is to release artificially generated datasets that resemble the actual data -- more precisely, having similar statistical properties. In this article, we provide a gentle introduction to synthetic data and discuss its use cases, the privacy challenges that are still unaddressed, and its inherent limitations as an effective privacy-enhancing technology.	翻訳日:2024-02-29 00:53:27 公開日:2024-02-27
# マルチエージェント社会選択によるダイナミックフェアネス・アウェア・レコメンデーション Dynamic fairness-aware recommendation through multi-agent social choice ( http://arxiv.org/abs/2303.00968v3 ) ライセンス: Link先を確認	Amanda Aird, Paresha Farastu, Joshua Sun, Elena \v{S}tefancov\'a, Cassidy All, Amy Voida, Nicholas Mattei, Robin Burke	(参考訳) パーソナライズドレコメンデーションの文脈におけるアルゴリズム的公平性は、分類タスクでよく遭遇する人々とは大きく異なる課題を示している。分類を研究する研究者は一般に、公正性は保護されたグループと保護されていないグループの間の結果の平等を達成する問題であるとみなし、この基準に基づいてアルゴリズムによる介入を構築した。私たちは、現実世界のアプリケーション全般、特にパーソナライズドレコメンデーションの文脈における公平性は、より複雑で多面的であり、より一般的なアプローチを必要とすると主張している。 2段階の社会的選択問題として,レコメンダシステムにおけるマルチテイクホルダフェアネスを定式化するモデルを提案する。特に,公平性問題とパーソナライズド・レコメンデーション規定の両方を統合したアロケーション問題とアグリゲーション問題の新たな組み合わせとしてレコメンデーション・フェアネスを表現し,この定式化に基づく新しいレコメンデーション手法を導出する。シミュレーションは、フレームワークが動的に複数の公正な関心事を統合する能力を示している。 Algorithmic fairness in the context of personalized recommendation presents significantly different challenges to those commonly encountered in classification tasks. Researchers studying classification have generally considered fairness to be a matter of achieving equality of outcomes between a protected and unprotected group, and built algorithmic interventions on this basis. We argue that fairness in real-world application settings in general, and especially in the context of personalized recommendation, is much more complex and multi-faceted, requiring a more general approach. We propose a model to formalize multistakeholder fairness in recommender systems as a two stage social choice problem. In particular, we express recommendation fairness as a novel combination of an allocation and an aggregation problem, which integrate both fairness concerns and personalized recommendation provisions, and derive new recommendation techniques based on this formulation. Simulations demonstrate the ability of the framework to integrate multiple fairness concerns in a dynamic way.	翻訳日:2024-02-29 00:53:17 公開日:2024-02-27
# PEM: 自動運転車の仮想テストにおける知覚誤差モデル PEM: Perception Error Model for Virtual Testing of Autonomous Vehicles ( http://arxiv.org/abs/2302.11919v2 ) ライセンス: Link先を確認	Andrea Piazzoni, Jim Cherian, Justin Dauwels, Lap-Pui Chau	(参考訳) 自律走行車(AV)のバーチャルテストは安全性評価に不可欠と認識されているものの、AVシミュレータはまだ活発な開発が続けられている。特に難しい問題のひとつは、S&P(Sensing and Perception)サブシステムをシミュレーションループに効果的に組み込むことである。本稿では,知覚誤差がAV安全性に与える影響を,センサ自体をモデル化することなく解析できる仮想シミュレーションコンポーネントである知覚誤りモデル(PEM)を定義する。本稿では,パラメトリックモデリングのための汎用的なデータ駆動手法を提案し,それをオープンソース駆動ソフトウェアであるapolloと,パブリックavデータセットであるnuscenesを用いて評価する。さらに,オープンソースの車両シミュレータSVLにPEMを実装した。さらに、カメラ、LiDAR、カメラ-LiDARのセットアップを評価することにより、PEMベースの仮想テストの有用性を示す。仮想テストでは,現状の評価基準の限界が強調され,提案手法はavの安全性に対する知覚誤差の影響を検証できる。 Even though virtual testing of Autonomous Vehicles (AVs) has been well recognized as essential for safety assessment, AV simulators are still undergoing active development. One particularly challenging question is to effectively include the Sensing and Perception (S&P) subsystem into the simulation loop. In this article, we define Perception Error Models (PEM), a virtual simulation component that can enable the analysis of the impact of perception errors on AV safety, without the need to model the sensors themselves. We propose a generalized data-driven procedure towards parametric modeling and evaluate it using Apollo, an open-source driving software, and nuScenes, a public AV dataset. Additionally, we implement PEMs in SVL, an open-source vehicle simulator. Furthermore, we demonstrate the usefulness of PEM-based virtual tests, by evaluating camera, LiDAR, and camera-LiDAR setups. Our virtual tests highlight limitations in the current evaluation metrics, and the proposed approach can help study the impact of perception errors on AV safety.	翻訳日:2024-02-29 00:52:07 公開日:2024-02-27
# ファウショットきめ細かな視覚認識のためのロバスト・サリエンシ・アウェア蒸留法 Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition ( http://arxiv.org/abs/2305.07180v3 ) ライセンス: Link先を確認	Haiqi Liu, C. L. Philip Chen, Xinrong Gong and Tong Zhang	(参考訳) サンプルが少ない新しいサブカテゴリを認識することは、コンピュータビジョンにおいて不可欠で挑戦的な研究課題である。既存の文献では、意味のあるオブジェクト固有の意味理解を十分に促進しない局所的な表現アプローチを採用することで、この課題に対処している。さらに、それらは主に高次元局所ディスクリプタに依存して複雑な埋め込み空間を構築し、一般化を制限している。上記の課題に対処するため,本論文では,数発のきめ細かな視覚認識のための新しいモデルであるロバスト・サリエンシ・アウェア蒸留(RSaD)を提案する。 RSaDは、本質的な識別領域に焦点を合わせるために、塩分検出による追加の塩分対応監視を導入している。具体的には、各サブカテゴリの臨界領域を強調するためにサリエンシ検出モデルを使用し、より詳細な予測のための追加のオブジェクト固有情報を提供する。 RSaDは、これらの情報を2つの対称分岐で相互学習パラダイムで伝達する。さらに、rsadは地域間の関係を利用して表現のインフォメーションを高めるとともに、強調された詳細をコンテキスト埋め込みに要約し、効果的な転送を促進し、新しいサブカテゴリへの迅速な一般化を可能にする。提案手法は3つのベンチマークで実証的に評価され,優れた性能を示す。 Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision. Existing literature addresses this challenge by employing local-based representation approaches, which may not sufficiently facilitate meaningful object-specific semantic understanding, leading to a reliance on apparent background correlations. Moreover, they primarily rely on high-dimensional local descriptors to construct complex embedding space, potentially limiting the generalization. To address the above challenges, this article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition. RSaD introduces additional saliency-aware supervision via saliency detection to guide the model toward focusing on the intrinsic discriminative regions. Specifically, RSaD utilizes the saliency detection model to emphasize the critical regions of each sub-category, providing additional object-specific information for fine-grained prediction. RSaD transfers such information with two symmetric branches in a mutual learning paradigm. 
Furthermore, RSaD exploits inter-regional relationships to enhance the informativeness of the representation and subsequently summarize the highlighted details into contextual embeddings to facilitate the effective transfer, enabling quick generalization to novel sub-categories. The proposed approach is empirically evaluated on three widely used benchmarks, demonstrating its superior performance.	翻訳日:2024-02-29 00:46:22 公開日:2024-02-27
# HeySQuAD: 音声質問応答データセット HeySQuAD: A Spoken Question Answering Dataset ( http://arxiv.org/abs/2304.13689v2 ) ライセンス: Link先を確認	Yijing Wu, SaiKrishna Rallabandi, Ravisutha Srinivasamurthy, Parag Pravin Dakle, Alolika Gon, Preethi Raghavan	(参考訳) 音声質問応答システム(SQA)は,デジタルアシスタントやその他の実世界のユースケースにおいて重要であるが,人間が発話した質問が重要であるため,その性能評価が課題である。本研究では,HeySQuADと呼ばれる大規模コミュニティ共有SQAデータセットを提案する。このデータセットには,7.6万件の人間が発話した質問,9.7万件のマシン生成質問,およびそれに対応するSQuAD QAデータセットからのテキスト回答が含まれている。我々の目標は、機械が雑音の多い音声質問を正確に理解し、信頼できる回答を提供する能力を測定することである。広範にわたるテストを通じて,書き起こした人間発話の質問と元のSQuADの質問を用いた訓練は,元のSQuADのテキスト質問のみを用いた訓練に比べて,人間が発話した質問に対する回答が大幅に向上する (12.51%) ことを実証した。さらに、高品質な書き起こしでの評価は2.03%のさらなる改善につながる可能性がある。本研究は,SQAシステムの開発と実世界のシナリオにおけるユーザニーズを満たす能力に重要な意味を持つ。 Spoken question answering (SQA) systems are critical for digital assistants and other real-world use cases, but evaluating their performance is a challenge due to the importance of human-spoken questions. This study presents a new large-scale community-shared SQA dataset called HeySQuAD, which includes 76k human-spoken questions, 97k machine-generated questions, and their corresponding textual answers from the SQuAD QA dataset. Our goal is to measure the ability of machines to accurately understand noisy spoken questions and provide reliable answers. Through extensive testing, we demonstrate that training with transcribed human-spoken and original SQuAD questions leads to a significant improvement (12.51%) in answering human-spoken questions compared to training with only the original SQuAD textual questions. Moreover, evaluating with a higher-quality transcription can lead to a further improvement of 2.03%. This research has significant implications for the development of SQA systems and their ability to meet the needs of users in real-world scenarios.	翻訳日:2024-02-29 00:43:41 公開日:2024-02-27
# 量子ニューラルネットワークとテンソルネットワークを用いた断面ストックリターン予測 The cross-sectional stock return predictions via quantum neural network and tensor network ( http://arxiv.org/abs/2304.12501v2 ) ライセンス: Link先を確認	Nozomu Kobayashi, Yoshiyuki Suimon, Koichi Miyamoto, Kosuke Mitarai	(参考訳) 本稿では,量子および量子に触発された機械学習アルゴリズムのストックリターン予測への応用について検討する。具体的には,ノイズの多い中間スケール量子コンピュータに適したアルゴリズムである量子ニューラルネットワークと,量子に触発された機械学習アルゴリズムであるテンソルネットワークの性能を,線形回帰やニューラルネットワークなどの古典モデルと比較して評価する。それらの能力を評価するため、予測に基づいてポートフォリオを構築し、投資実績を測定する。日本の株式市場における実証研究によれば、テンソルネットワークモデルは、線形およびニューラルネットワークモデルを含む従来のベンチマークモデルよりも優れた性能を達成している。量子ニューラルネットワークモデルは、全期間では古典的ニューラルネットワークモデルよりも低いリスク調整後超過リターンにとどまるが、量子ニューラルネットワークとテンソルネットワークモデルの両方が、最新の市場環境において優れたパフォーマンスを示しており、入力特徴間の非線形性を捉える能力を示唆している。 In this paper, we investigate the application of quantum and quantum-inspired machine learning algorithms to stock return predictions. Specifically, we evaluate the performance of a quantum neural network, an algorithm suited for noisy intermediate-scale quantum computers, and a tensor network, a quantum-inspired machine learning algorithm, against classical models such as linear regression and neural networks. To evaluate their abilities, we construct portfolios based on their predictions and measure investment performance. The empirical study on the Japanese stock market shows that the tensor network model achieves superior performance compared to classical benchmark models, including linear and neural network models. Though the quantum neural network model attains a lower risk-adjusted excess return than the classical neural network models over the whole period, both the quantum neural network and tensor network models perform better in the latest market environment, which suggests that these models can capture non-linearity between input features.	翻訳日:2024-02-29 00:43:23 公開日:2024-02-27
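The portfolio step mentioned in the entry above (build a portfolio from each model's predictions, then measure its performance) could look roughly like the sketch below. It assumes an equal-weighted long-short quantile portfolio for a single period; the abstract does not say which construction rule the authors use, and the function name `long_short_return`, the `quantile` parameter, and the tickers are hypothetical.

```python
def long_short_return(predicted, realized, quantile=0.2):
    """One-period return of an equal-weighted long-short portfolio.

    predicted, realized: dicts mapping ticker -> return for the same period.
    quantile: fraction of the universe held on each side (an assumption,
    not a value taken from the paper).
    """
    if predicted.keys() != realized.keys():
        raise ValueError("predicted and realized must cover the same tickers")
    # Rank tickers from highest to lowest predicted return.
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    k = max(1, int(len(ranked) * quantile))
    longs, shorts = ranked[:k], ranked[-k:]
    # Long the top-k names, short the bottom-k names, equal weights.
    long_ret = sum(realized[t] for t in longs) / k
    short_ret = sum(realized[t] for t in shorts) / k
    return long_ret - short_ret
```

Repeating this over many periods gives a return series from which a risk-adjusted measure can be computed; the choice of measure is likewise left open by the abstract.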
# 構造化された大規模離散行動空間のための動的近傍構築 Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces ( http://arxiv.org/abs/2305.19891v4 ) ライセンス: Link先を確認	Fabian Akkerman, Julius Luy, Wouter van Heeswijk, Maximilian Schiffer	(参考訳) 大規模離散行動空間(LDAS)は、強化学習における中心的な課題である。既存のソリューションアプローチでは、最大数百万のアクションで非構造化LDASを処理できる。しかし、物流、生産、輸送システムにおける現実世界のアプリケーションの多くは、小さなインスタンスでも数百万以上のアクションに達する組合せ的な行動空間を持っている。幸いなことに、そのような行動空間は構造、例えば等間隔の離散リソース単位を示す。本稿では,現在のベンチマークでは処理できないサイズの構造化LDAS(SLDAS)を扱うことに焦点を当て,SLDASの新しい活用パラダイムであるDynamic Neighborhood Construction(DNC)を提案する。本稿では,このパラダイムを応用したスケーラブルな近傍探索ヒューリスティックを提案し,最大$10^{73}$個のアクションを持つ構造化された行動空間における連続的プロキシアクションの周囲の離散的近傍を効率的に探索する。 2つの異なる環境にまたがる大きな離散的アクション空間向けに設計された3つの最先端のアプローチに対してベンチマークすることで,本手法の性能を実証する。以上の結果から,DNCは計算効率に優れつつ,最先端の手法と同等以上の性能を示すことがわかった。さらに,本手法は,既存の手法では計算的に扱えなかった行動空間にスケールする。 Large discrete action spaces (LDAS) remain a central challenge in reinforcement learning. Existing solution approaches can handle unstructured LDAS with up to a few million actions. However, many real-world applications in logistics, production, and transportation systems have combinatorial action spaces, whose size grows well beyond millions of actions, even on small instances. Fortunately, such action spaces exhibit structure, e.g., equally spaced discrete resource units. With this work, we focus on handling structured LDAS (SLDAS) with sizes that cannot be handled by current benchmarks: we propose Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm for SLDAS. We present a scalable neighborhood exploration heuristic that utilizes this paradigm and efficiently explores the discrete neighborhood around the continuous proxy action in structured action spaces with up to $10^{73}$ actions. We demonstrate the performance of our method by benchmarking it against three state-of-the-art approaches designed for large discrete action spaces across two distinct environments. 
Our results show that DNC matches or outperforms state-of-the-art approaches while being computationally more efficient. Furthermore, our method scales to action spaces that so far remained computationally intractable for existing methodologies.	翻訳日:2024-02-28 23:01:23 公開日:2024-02-27
# リアルタイム反復学習の約束と限界を探る Exploring the Promise and Limits of Real-Time Recurrent Learning ( http://arxiv.org/abs/2305.19044v2 ) ライセンス: Link先を確認	Kazuki Irie, Anand Gopalakrishnan, J\"urgen Schmidhuber	(参考訳) シーケンス処理リカレントニューラルネットワーク(rnns)のためのリアルタイムリカレント学習(rtrl)は、バックプロパゲーション時間(bptt)よりも概念上の利点を提供する。 RTRLは過去のアクティベーションやトラッピングコンテキストをキャッシュする必要がなく、オンライン学習を可能にする。しかし、rtrlの時間と空間の複雑さは実用的でない。この問題を解決するために、RTRLに関する最近の研究は近似理論に焦点を当てているが、実験は診断設定に限られることが多い。本稿では,より現実的な環境でのRTRLの実践的可能性について考察する。 DMLab-30, ProcGen, Atari-2600環境のいくつかのサブセットにおいて, RTRLとポリシー勾配を組み合わせたアクタ批判手法を検証した。 DMLabのメモリタスクでは、1.2B未満の環境フレームでトレーニングしたシステムは、よく知られたIMPALAとR2D2のベースラインで10Bフレームでトレーニングしたよりも優れている。このような困難なタスクにスケールするために、要素毎の繰り返しを伴う既知のニューラルアーキテクチャにフォーカスし、rtrlを近似することなく扱いやすいものにした。重要なのは、マルチレイヤの場合の複雑さなど、実世界のアプリケーションにおけるRTRLの制限にほとんど対処しないことだ。 Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) offers certain conceptual advantages over backpropagation through time (BPTT). RTRL requires neither caching past activations nor truncating context, and enables online learning. However, RTRL's time and space complexity make it impractical. To overcome this problem, most recent work on RTRL focuses on approximation theories, while experiments are often limited to diagnostic settings. Here we explore the practical promise of RTRL in more realistic settings. We study actor-critic methods that combine RTRL and policy gradients, and test them in several subsets of DMLab-30, ProcGen, and Atari-2600 environments. On DMLab memory tasks, our system trained on fewer than 1.2 B environmental frames is competitive with or outperforms well-known IMPALA and R2D2 baselines trained on 10 B frames. To scale to such challenging tasks, we focus on certain well-known neural architectures with element-wise recurrence, allowing for tractable RTRL without approximation. Importantly, we also discuss rarely addressed limitations of RTRL in real-world applications, such as its complexity in the multi-layer case.	
翻訳日:2024-02-28 23:01:04 公開日:2024-02-27
# 接続性を考慮した等価回路平均化のための量子フレドキンとトフォリゲートの浅一元分解 Shallow unitary decompositions of quantum Fredkin and Toffoli gates for connectivity-aware equivalent circuit averaging ( http://arxiv.org/abs/2305.18128v3 ) ライセンス: Link先を確認	Pedro M. Q. Cruz, Bruno Murta	(参考訳) 制御SWAPと制御制御NOTゲートは、FredkinとToffoliによる可逆的古典計算の提案の中心である。量子計算において広く使われているのは、量子アルゴリズムの古典論理サブルーチンの実装と、直接古典的手法を持たない量子スキームの両方であり、異なる物理プラットフォームに固有の下層ゲートセットの観点でそれらの効率的な分解を追求することが、早くから必須である。ここでは、全てのおよび線形量子ビット接続の下で、トフォリゲートとフレドキンゲートに対して論理的に等価な回路を提供し、後者は制御とターゲット量子ビットのための2つの異なるルーティングを持つ。これら全ての構成の文献における最低cnot数を達成するとともに、等価回路平均化による近距離量子コンピュータにおけるコヒーレントエラーの軽減における、得られた分解の有効性を実証する。まず,コヒーレントノイズモデルを用いてシリコの手法の性能を定量化し,超伝導量子プロセッサで実験的に検証する。さらに、トフォリゲートやフレドキンゲートが非自明に作用する3つのキュービットが隣接していない場合について考察し、SWAP毎に1つのCNOTを節約する新しいスキームを提案する。このスキームは、長距離CNOTの浅い実装にも使われる。本結果は,効率的な量子回路の設計において,異なる絡み合うゲート構造と接続制約を考えることの重要性を強調した。 The controlled-SWAP and controlled-controlled-NOT gates are at the heart of the original proposal of reversible classical computation by Fredkin and Toffoli. Their widespread use in quantum computation, both in the implementation of classical logic subroutines of quantum algorithms and in quantum schemes with no direct classical counterparts, has made it imperative early on to pursue their efficient decomposition in terms of the lower-level gate sets native to different physical platforms. Here, we add to this body of literature by providing several logically equivalent circuits for the Toffoli and Fredkin gates under all-to-all and linear qubit connectivity, the latter with two different routings for control and target qubits. Besides achieving the lowest CNOT counts in the literature for all these configurations, we also demonstrate the remarkable effectiveness of the obtained decompositions at mitigating coherent errors on near-term quantum computers via equivalent circuit averaging. 
We first quantify the performance of the method in silico with a coherent-noise model before validating it experimentally on a superconducting quantum processor. In addition, we consider the case where the three qubits on which the Toffoli or Fredkin gates act nontrivially are not adjacent, proposing a novel scheme to reorder them that saves one CNOT for every SWAP. This scheme also finds use in the shallow implementation of long-range CNOTs. Our results highlight the importance of considering different entangling gate structures and connectivity constraints when designing efficient quantum circuits.	翻訳日:2024-02-28 23:00:41 公開日:2024-02-27
# 測定不整合性は外乱よりも強い Measurement incompatibility is strictly stronger than disturbance ( http://arxiv.org/abs/2305.16931v5 ) ライセンス: Link先を確認	Marco Erba, Paolo Perinotti, Davide Rolino, Alessandro Tosini	(参考訳) ハイゼンベルクの不確実性原理に対するヒューリスティックな議論の核心は、有名な$\gamma$-ray microscope $\textit{Gedankenexperiment}$ であり、それらが作用しているシステムの状態を不可逆的に変化させる測定の存在に依存し、その後の測定に不可解な障害を引き起こす。この議論は、量子論における測定の不整合性を正当化するため、すなわち、測定障害の不可逆性とは異なると理解されている、共同で行うことができない測定の存在を正当化するために行われた。本稿では,測定の不整合性が測定障害の不可逆性に十分な条件であることを示す説得力のある論証を提供する一方で,逆含意の反例である最小古典理論(MCT)と呼ばれる玩具理論を示す。この理論は古典的であり、相補性も不確実性の関係も持たず、コッチェン・スペックと一般化された非文脈的関係である。しかし、MCTは測定障害の可逆性だけでなく、外乱や放送のない情報の性質も満足しており、非古典性のシグネチャとして$\textit{per se}$を理解できないことを意味する。 The core of Heisenberg's heuristic argument for the uncertainty principle, involving the famous $\gamma$-ray microscope $\textit{Gedankenexperiment}$, hinges upon the existence of measurements that irreversibly alter the state of the system on which they are acting, causing an irreducible disturbance on subsequent measurements. The argument was put forward to justify measurement incompatibility in quantum theory, namely, the existence of measurements that cannot be performed jointly$-$a feature that is now understood to be different from irreversibility of measurement disturbance, though related to it. In this article, on the one hand, we provide a compelling argument showing that measurement incompatibility is indeed a sufficient condition for irreversibility of measurement disturbance; while, on the other hand, we exhibit a toy theory, termed the minimal classical theory (MCT), that is a counterexample for the converse implication. This theory is classical, hence it does not have complementarity nor preparation uncertainty relations, and it is both Kochen-Specker and generalised noncontextual. 
However, MCT satisfies not only irreversibility of measurement disturbance, but also the properties of no-information without disturbance and no-broadcasting, implying that these cannot be understood $\textit{per se}$ as signatures of nonclassicality.	翻訳日:2024-02-28 22:59:23 公開日:2024-02-27
# adaptive chameleon or stubborn sloth: 知識衝突における大規模言語モデルの振る舞いを明らかにする Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts ( http://arxiv.org/abs/2305.13300v4 ) ライセンス: Link先を確認	Jian Xie, Kai Zhang, Jiangjie Chen, Renze Lou, Yu Su	(参考訳) 大規模言語モデル(LLM)に外部情報を提供することにより、LLMの静的パラメトリックメモリの限界に対処するための有望なソリューションとしてツール拡張(検索拡張を含む)が登場した。しかし、その証拠がパラメトリックメモリと矛盾する場合、LCMはこのような外部証拠に対してどの程度受容的か? 知識衝突に遭遇したLLMの行動に関する包括的かつ制御された最初の調査について述べる。本研究では,LLMから高品質なパラメトリックメモリを抽出し,対応する対向メモリを構築するための体系的枠組みを提案する。本研究は, LLMの動作に矛盾すると思われることが判明した。一方, 従来の知恵とは違って, LLM は, パラメトリックメモリと矛盾する場合であっても, 外部の証拠が一貫性があり, 説得力があることを考えると, 外部の証拠に対して高い受容性を持つことがわかった。一方、LCMは、矛盾する証拠を同時に提示されているにもかかわらず、外部証拠がパラメトリックメモリと整合した情報を含む場合、強い確証バイアスを示す。これらの結果は,ツールおよび検索拡張LDMのさらなる開発と展開に注意すべき重要な意味を持つ。リソースはhttps://github.com/OSU-NLP-Group/LLM-Knowledge-Conflictで入手できる。 By providing external information to large language models (LLMs), tool augmentation (including retrieval augmentation) has emerged as a promising solution for addressing the limitations of LLMs' static parametric memory. However, how receptive are LLMs to such external evidence, especially when the evidence conflicts with their parametric memory? We present the first comprehensive and controlled investigation into the behavior of LLMs when encountering knowledge conflicts. We propose a systematic framework to elicit high-quality parametric memory from LLMs and construct the corresponding counter-memory, which enables us to conduct a series of controlled experiments. Our investigation reveals seemingly contradicting behaviors of LLMs. On the one hand, different from prior wisdom, we find that LLMs can be highly receptive to external evidence even when that conflicts with their parametric memory, given that the external evidence is coherent and convincing. 
On the other hand, LLMs also demonstrate a strong confirmation bias when the external evidence contains some information that is consistent with their parametric memory, despite being presented with conflicting evidence at the same time. These results pose important implications that are worth careful consideration for the further development and deployment of tool- and retrieval-augmented LLMs. Resources are available at https://github.com/OSU-NLP-Group/LLM-Knowledge-Conflict.	翻訳日:2024-02-28 22:58:16 公開日:2024-02-27
# ニューラルテキスト生成のための苛立たしいほど単純な復号法 A Frustratingly Simple Decoding Method for Neural Text Generation ( http://arxiv.org/abs/2305.12675v2 ) ライセンス: Link先を確認	Haoran Yang, Deng Cai, Huayang Li, Wei Bi, Wai Lam, Shuming Shi	(参考訳) ニューラルテキスト生成のために,FSD(Frustratingly Simple Decoding)と呼ぶ,非常に単純で,超効率的で,驚くほど効果的な復号法を導入する。 FSDの背景にある考え方は単純で、以前に生成されたテキストに基づいてアンチLMを構築し、このアンチLMを使用して、既に生成した内容が以降に再び生成されることにペナルティを課す。アンチLMはn-gram言語モデルやそのベクトル化変種として簡単に実装できる。このように、FSDは余分なモデルパラメータを導入せず、計算オーバーヘッドも無視できる程度である(FSDは貪欲探索と同程度に高速になりうる)。実験によれば、FSDは現在の標準的手法(核サンプリング)および最近提案されたいくつかの強力なベースラインを上回りうる。 We introduce a frustratingly simple, super efficient and surprisingly effective decoding method, which we call Frustratingly Simple Decoding (FSD), for neural text generation. The idea behind FSD is straightforward: we build an anti-LM based on previously generated text and use this anti-LM to penalize future generation of what has been generated. The anti-LM can be implemented as simply as an n-gram language model or a vectorized variant. In this way, FSD introduces no extra model parameters and negligible computational overhead (FSD can be as fast as greedy search). Despite its simplicity, FSD is surprisingly effective: experiments show that FSD can outperform the canonical methods to date (i.e., nucleus sampling) as well as several strong baselines that were proposed recently.	翻訳日:2024-02-28 22:57:32 公開日:2024-02-27
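The anti-LM idea in the entry above can be illustrated with a small sketch. It assumes a bigram anti-LM estimated only from the tokens generated so far, and a score of the form `(1 - alpha) * p_lm - alpha * p_anti` maximized over the vocabulary; the exact scoring rule and candidate set used by the authors are not given in the abstract, so the function names `anti_lm_probs` and `fsd_step` and the parameters `alpha` and `n` are hypothetical.

```python
from collections import Counter


def anti_lm_probs(prefix, vocab_size, n=2):
    """n-gram 'anti-LM' built only from the tokens generated so far.

    Returns, for every token id, how often it followed the current
    (n-1)-token context inside the prefix, as a relative frequency.
    """
    probs = [0.0] * vocab_size
    if len(prefix) < n:
        return probs
    context = tuple(prefix[-(n - 1):]) if n > 1 else ()
    counts = Counter()
    for i in range(len(prefix) - n + 1):
        if tuple(prefix[i:i + n - 1]) == context:
            counts[prefix[i + n - 1]] += 1
    total = sum(counts.values())
    for token, count in counts.items():
        probs[token] = count / total
    return probs


def fsd_step(lm_probs, prefix, alpha=0.5, n=2):
    """Pick the next token id by penalizing what the anti-LM predicts.

    lm_probs: next-token probabilities from the language model.
    alpha: penalty strength (assumed form of the trade-off).
    """
    anti = anti_lm_probs(prefix, len(lm_probs), n)
    scores = [(1 - alpha) * p - alpha * q for p, q in zip(lm_probs, anti)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `alpha=0` the step reduces to greedy search, which matches the abstract's remark that the overhead over greedy decoding is negligible: the only extra work is counting n-grams in the prefix.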
# プロンプトチューニングにおけるスケルトン援用プロンプト転送によるマイトショット対話要約 Few-Shot Dialogue Summarization via Skeleton-Assisted Prompt Transfer in Prompt Tuning ( http://arxiv.org/abs/2305.12077v2 ) ライセンス: Link先を確認	Kaige Xie, Tong Yu, Haoliang Wang, Junda Wu, Handong Zhao, Ruiyi Zhang, Kanak Mahadik, Ani Nenkova, Mark Riedl	(参考訳) 実世界のシナリオでは、対話要約のためのラベル付きサンプルは通常、高品質な対話要約のための高いアノテーションコストのために制限される。少数のサンプルから効率的に学習するために、以前の研究では、他の下流タスクからの大量の注釈付きデータを活用し、プロンプトチューニングでプロンプト転送を行い、クロスタスクの知識転送を可能にした。しかし、既存の汎用的なプロンプト転送技術は、対話特有の情報を考慮していない。本稿では,対話状態追跡から対話要約への素早い伝達を改善することに着目し,スケルトン生成を個別のソースとターゲットタスクを繋ぐ媒体として機能する余分な監督として活用し,対話状態情報のより優れた消費を実現するためのスケルトン支援プロントランスファー(sapt)を提案する。骨格生成のための教師付きトレーニングデータとして対話スケルトンを自動的に抽出するために,アノテーションやドメイン知識を必要としない摂動型プローブを用いた新しいアプローチを設計する。このようなスケルトン上でモデルをトレーニングすることは、即時転送時のモデル能力の維持にも役立ちます。我々の手法は既存のベースラインを大きく上回る。本手法は,対話要約におけるタスク間知識伝達の促進に有効であることを示す。 In real-world scenarios, labeled samples for dialogue summarization are usually limited (i.e., few-shot) due to high annotation costs for high-quality dialogue summaries. To efficiently learn from few-shot samples, previous works have utilized massive annotated data from other downstream tasks and then performed prompt transfer in prompt tuning so as to enable cross-task knowledge transfer. However, existing general-purpose prompt transfer techniques lack consideration for dialogue-specific information. In this paper, we focus on improving the prompt transfer from dialogue state tracking to dialogue summarization and propose Skeleton-Assisted Prompt Transfer (SAPT), which leverages skeleton generation as extra supervision that functions as a medium connecting the distinct source and target task and resulting in the model's better consumption of dialogue state information. To automatically extract dialogue skeletons as supervised training data for skeleton generation, we design a novel approach with perturbation-based probes requiring neither annotation effort nor domain knowledge. 
Training the model on such skeletons can also help preserve model capability during prompt transfer. Our method significantly outperforms existing baselines. In-depth analyses demonstrate the effectiveness of our method in facilitating cross-task knowledge transfer in few-shot dialogue summarization.	翻訳日:2024-02-28 22:56:52 公開日:2024-02-27
# 情報最大化による関数的十分次元削減と分類への応用 Functional sufficient dimension reduction through information maximization with application to classification ( http://arxiv.org/abs/2305.10880v3 ) ライセンス: Link先を確認	Xinyu Li and Jianjun Xu and Wenquan Cui and Haoyang Cheng	(参考訳) 応答変数がカテゴリー変数であり、予測子がランダム関数である場合について、相互情報量と二乗損失相互情報量に基づく2つの新しい関数的十分次元削減(FSDR)法を提案する。関数スライス逆回帰法や関数スライス平均分散推定法などの古典的FSDR法と比較して,カテゴリ数が比較的少ない場合,特にバイナリ応答の場合に,複数の有効次元削減方向を推定できるため,提案手法は魅力的である。さらに,提案手法では,制約の強い線形条件付き平均の仮定と定数共分散の仮定は不要である。また、関数的十分次元削減でしばしば遭遇する共分散作用素の逆問題を回避できる。打ち切りを伴う関数主成分分析を正則化機構として用いる。穏やかな条件下で,提案手法の統計的一致性が確立される。シミュレーションと実データ解析により,この2つの手法が既存のFSDR法と比べて競争力を持つことを示す。 Considering the case where the response variable is a categorical variable and the predictor is a random function, two novel functional sufficient dimension reduction (FSDR) methods are proposed based on mutual information and square loss mutual information. Compared to the classical FSDR methods, such as functional sliced inverse regression and functional sliced average variance estimation, the proposed methods are appealing because they are capable of estimating multiple effective dimension reduction directions in the case of a relatively small number of categories, especially for the binary response. Moreover, the proposed methods do not require the restrictive linear conditional mean assumption and the constant covariance assumption. They avoid the inverse problem of the covariance operator, which is often encountered in functional sufficient dimension reduction. Functional principal component analysis with truncation is used as a regularization mechanism. Under some mild conditions, the statistical consistency of the proposed methods is established. Simulations and real data analyses demonstrate that the two methods are competitive with some existing FSDR methods.	翻訳日:2024-02-28 22:56:15 公開日:2024-02-27
# ジョイントベル計測による可変量子固有解法高速化 Accelerated variational quantum eigensolver with joint Bell measurement ( http://arxiv.org/abs/2307.00766v3 ) ライセンス: Link先を確認	Chenfeng Cao, Hiroshi Yano, Yuya O. Nakagawa	(参考訳) 変分量子固有解法(VQE)は、量子化学において分子ハミルトニアンの基底状態を得るために、短期量子コンピュータのための顕著な量子古典ハイブリッドアルゴリズムである。しかし、ハミルトニアンにおけるパウリ作用素の非可換性のため、量子コンピュータに要求される測定量は、システムのサイズが大きくなるにつれて著しく増加し、VQEの実用的な応用を妨げる可能性がある。本稿では,JBM-VQE (Joint Bell Measurement VQE) と呼ばれるプロトコルを提案する。本手法では、ハミルトニアンに存在するパウリ作用素のすべての期待値の絶対値を同時に測定できるジョイントベル測定器を用いる。最適化の過程では、jbm-vqeはジョイントベル測定により各イテレーション毎のポーリ演算子の期待値の絶対値を推定するが、それらの符号は従来の方法による期待値の測定ではより少ない頻度で測定される。我々のアプローチは、最適化中に標識が頻繁に変化しないという経験的観察に基づいている。小分子の分子ハミルトニアン基底状態を求める数値シミュレーションによる従来のVQEと比較して、JBM-VQEの高速化と、最適化の初期段階におけるJBM-VQEの高速化は、大規模システムではますます顕著になっている。共同ベル測定に基づくアプローチは、VQEに限らず、コスト関数が多くのパウリ演算子の期待値である様々な量子アルゴリズムで利用することができる。 The variational quantum eigensolver (VQE) stands as a prominent quantum-classical hybrid algorithm for near-term quantum computers to obtain the ground states of molecular Hamiltonians in quantum chemistry. However, due to the non-commutativity of the Pauli operators in the Hamiltonian, the number of measurements required on quantum computers increases significantly as the system size grows, which may hinder practical applications of VQE. In this work, we present a protocol termed joint Bell measurement VQE (JBM-VQE) to reduce the number of measurements and speed up the VQE algorithm. Our method employs joint Bell measurements, enabling the simultaneous measurement of the absolute values of all expectation values of Pauli operators present in the Hamiltonian. In the course of the optimization, JBM-VQE estimates the absolute values of the expectation values of the Pauli operators for each iteration by the joint Bell measurement, while the signs of them are measured less frequently by the conventional method to measure the expectation values. Our approach is based on the empirical observation that the signs do not often change during optimization. 
We illustrate the speed-up of JBM-VQE compared to conventional VQE by numerical simulations for finding the ground states of molecular Hamiltonians of small molecules, and the speed-up of JBM-VQE at the early stage of the optimization becomes increasingly pronounced in larger systems. Our approach based on the joint Bell measurement is not limited to VQE and can be utilized in various quantum algorithms whose cost functions are expectation values of many Pauli operators.	翻訳日:2024-02-28 22:50:24 公開日:2024-02-27
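The bookkeeping described in the JBM-VQE abstract above can be sketched as follows. This is only an illustration under stated assumptions: the energy is taken to be a weighted sum of Pauli expectation values, the measurements themselves are not simulated, and the names `should_refresh_signs`, `estimate_energy` and the `period` parameter are hypothetical rather than taken from the paper.

```python
def should_refresh_signs(iteration, period):
    """Return True on iterations where the signs would be re-measured
    with the conventional method. `period` is a hypothetical tuning knob."""
    return iteration % period == 0


def estimate_energy(coefficients, abs_values, signs):
    """Combine absolute expectation values with cached signs.

    coefficients: Hamiltonian weights c_i of the Pauli operators P_i
    abs_values:   estimates of |<P_i>| (from the joint Bell measurement)
    signs:        cached signs of <P_i>, each +1 or -1
    """
    if not (len(coefficients) == len(abs_values) == len(signs)):
        raise ValueError("all three inputs must have the same length")
    # Energy estimate: sum_i c_i * sign_i * |<P_i>|
    return sum(c * s * a for c, a, s in zip(coefficients, abs_values, signs))
```

In an optimization loop one would call `estimate_energy` at every iteration and update `signs` only when `should_refresh_signs` returns True, which mirrors the abstract's observation that signs rarely change.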
# atlas: atlasによる3次元医用画像分割のためのテスト時間適応法 Pay Attention to the Atlas: Atlas-Guided Test-Time Adaptation Method for Robust 3D Medical Image Segmentation ( http://arxiv.org/abs/2307.00676v2 ) ライセンス: Link先を確認	Jingjie Guo, Weitong Zhang, Matthew Sinclair, Daniel Rueckert, Chen Chen	(参考訳) 畳み込みニューラルネットワーク(CNN)は、トレーニング(ソース)データ分布とは異なるターゲットデータでテストした場合、特に、異なる臨床部位とスキャナーにわたるイメージングプロトコルのバリエーションが異なる画像の出現につながる医療画像アプリケーションにおいて、パフォーマンスが低下することが多い。しかし、教師なしドメイン適応のためのソーストレーニングデータの再アクセスやモデル微調整のための追加テストデータへのラベル付けは、それぞれプライバシー問題と高いラベル付けコストのために困難である。そこで本研究では,AdaAtlas と呼ばれる,堅牢な3次元医用画像分割のための新しいatlas-guided test-time adaptation (TTA)法を提案する。 AdaAtlasは1つの未ラベルのテストサンプルのみを入力として取り、アトラスベースの損失を最小限に抑えてセグメンテーションネットワークに適応する。具体的には、登録後の予測がatlas空間で学習されたatlasと一致するようにネットワークを適応させ、テスト時に解剖学的セグメンテーションエラーを低減させる。また、セグメント化ネットワークにおけるバッチ正規化ブロックへの適応を制限する既存のほとんどのTTA手法とは異なり、テスト時の適応性を向上させるためにチャネルおよび空間アテンションブロックの利用をさらに活用する。 AdaAtlas-Attention(AdaAtlas-Attention)に適応したアテンションブロックを持つAdaAtlasは優れたパフォーマンス向上を実現し、他の競合するTTA手法よりも大幅に優れていた。 Convolutional neural networks (CNNs) often suffer from poor performance when tested on target data that differs from the training (source) data distribution, particularly in medical imaging applications where variations in imaging protocols across different clinical sites and scanners lead to different imaging appearances. However, re-accessing source training data for unsupervised domain adaptation or labeling additional test data for model fine-tuning can be difficult due to privacy issues and high labeling costs, respectively. To solve this problem, we propose a novel atlas-guided test-time adaptation (TTA) method for robust 3D medical image segmentation, called AdaAtlas. AdaAtlas only takes one single unlabeled test sample as input and adapts the segmentation network by minimizing an atlas-based loss. Specifically, the network is adapted so that its prediction after registration is aligned with the learned atlas in the atlas space, which helps to reduce anatomical segmentation errors at test time. 
In addition, different from most existing TTA methods which restrict the adaptation to batch normalization blocks in the segmentation network only, we further exploit the use of channel and spatial attention blocks for improved adaptability at test time. Extensive experiments on multiple datasets from different sites show that AdaAtlas with attention blocks adapted (AdaAtlas-Attention) achieves superior performance improvements, greatly outperforming other competitive TTA methods.	翻訳日:2024-02-28 22:49:50 公開日:2024-02-27
# 人間アライメントのための選好ランキング最適化 Preference Ranking Optimization for Human Alignment ( http://arxiv.org/abs/2306.17492v2 ) ライセンス: Link先を確認	Feifan Song, Bowen Yu, Minghao Li, Haiyang Yu, Fei Huang, Yongbin Li and Houfeng Wang	(参考訳) 大規模言語モデル(LLM)は、しばしば誤解を招くコンテンツを含んでおり、安全なAIシステムを確保するために、それらを人間の価値観に合わせる必要性を強調している。人間のフィードバックからの強化学習(RLHF)がこのアライメントを達成するために採用されている。しかしながら、これには2つの主な欠点がある。(1) RLHFはSFTとは対照的に複雑さ、不安定性、ハイパーパラメータに対する感受性を示す。(2) 大規模な試行錯誤にもかかわらず,複数サンプリングはペアのコントラストに還元され,マクロの観点からのコントラストが欠如している。本稿では,人間のアライメントを直接微調整するための効率的なSFTアルゴリズムとして,選好ランキング最適化(PRO)を提案する。 PROは任意の長さの選好ランクに対応するためにペアワイズコントラストを拡張する。候補を反復的に対比することにより、PROはLLMに対して、残りの応答を段階的にランク付けしながら、最良の応答を優先するように指示する。このように、PROは人間のアライメントを、LLMが生成したn個の応答の確率ランキングを、これらの応答に対する人間の選好ランキングに整合させる問題へと効果的に変換する。 PROはベースラインアルゴリズムより優れており、自動ベース、報酬ベース、GPT-4、および人間の評価によって、ChatGPTと人間の応答に匹敵する結果が得られる。 Large language models (LLMs) often contain misleading content, emphasizing the need to align them with human values to ensure secure AI systems. Reinforcement learning from human feedback (RLHF) has been employed to achieve this alignment. However, it has two main drawbacks: (1) RLHF exhibits complexity, instability, and sensitivity to hyperparameters in contrast to SFT. (2) Despite massive trial-and-error, multiple sampling is reduced to pair-wise contrast, thus lacking contrasts from a macro perspective. In this paper, we propose Preference Ranking Optimization (PRO) as an efficient SFT algorithm to directly fine-tune LLMs for human alignment. PRO extends the pair-wise contrast to accommodate preference rankings of any length. By iteratively contrasting candidates, PRO instructs the LLM to prioritize the best response while progressively ranking the remaining responses. In this manner, PRO effectively transforms human alignment into aligning the probability ranking of n responses generated by LLM with the preference ranking of humans towards these responses. 
Experiments have shown that PRO outperforms baseline algorithms, achieving comparable results to ChatGPT and human responses through automatic-based, reward-based, GPT-4, and human evaluations.	翻訳日:2024-02-28 22:49:21 公開日:2024-02-27
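The idea of extending a pair-wise contrast to a ranking of any length, as the PRO abstract above describes, can be illustrated with a listwise (Plackett-Luce style) loss. The sketch below is an assumption made for illustration, not necessarily the authors' exact objective: `ranking_loss` is a hypothetical name, and `scores` stands in for whatever per-response score the model assigns.

```python
import math


def ranking_loss(scores):
    """Listwise ranking loss over responses ordered best-to-worst.

    At each step k, the k-th response is contrasted against itself and
    all lower-ranked responses; the step losses are summed.
    """
    total = 0.0
    for k in range(len(scores) - 1):
        rest = scores[k:]
        # Numerically stable log-sum-exp over the remaining candidates.
        m = max(rest)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in rest))
        total += -(scores[k] - log_denominator)
    return total
```

With two responses this reduces to the usual pair-wise contrast; with more, each step drops the current best response and contrasts the next one against the rest.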
# オペレーター学習における次元の呪い The curse of dimensionality in operator learning ( http://arxiv.org/abs/2306.15924v2 ) ライセンス: Link先を確認	Samuel Lanthaler and Andrew M. Stuart	(参考訳) ニューラルネットワークを用いて、関数のバナッハ空間間の演算子マッピングを近似し、エミュレーションによってモデル評価を加速したり、データからモデルを発見したりすることができる。その結果,近年,この手法が注目され,オペレーター学習の分野が急速に拡大している。この論文の第一の貢献は、C^r$-あるいはリプシッツ正則性のみによって特徴づけられる作用素の一般クラスに対して、無限次元の入力および出力関数空間の表現に関して正確に定義された次元性の呪いに苦しむことである。その結果は、PCA-Net、DeepONet、FNOなど、さまざまな既存のニューラル演算子に適用できる。この論文の第二の貢献は、ハミルトン・ヤコビ方程式によって定義される解作用素に対して、次元性の一般的な呪いが克服可能であることを証明することである。この目的のために、hj-netと呼ばれる新しいニューラルオペレーターアーキテクチャが導入され、基盤となるハミルトン系の特性情報を明示的に考慮した。 hj-net の誤差と複雑性の推定は、このアーキテクチャが無限次元の入出力関数空間に関連する次元の呪いを打ち負かすことができることを示している。 Neural operator architectures employ neural networks to approximate operators mapping between Banach spaces of functions; they may be used to accelerate model evaluations via emulation, or to discover models from data. Consequently, the methodology has received increasing attention over recent years, giving rise to the rapidly growing field of operator learning. The first contribution of this paper is to prove that for general classes of operators which are characterized only by their $C^r$- or Lipschitz-regularity, operator learning suffers from a curse of dimensionality, defined precisely here in terms of representations of the infinite-dimensional input and output function spaces. The result is applicable to a wide variety of existing neural operators, including PCA-Net, DeepONet and the FNO. The second contribution of the paper is to prove that the general curse of dimensionality can be overcome for solution operators defined by the Hamilton-Jacobi equation; this is achieved by leveraging additional structure in the underlying solution operator, going beyond regularity. To this end, a novel neural operator architecture is introduced, termed HJ-Net, which explicitly takes into account characteristic information of the underlying Hamiltonian system. 
Error and complexity estimates are derived for HJ-Net which show that this architecture can provably beat the curse of dimensionality related to the infinite-dimensional input and output function spaces.	翻訳日:2024-02-28 22:48:59 公開日:2024-02-27
# 効率的なコンテクストフォーマ:学習画像圧縮における高速コンテクストモデリングのための時空間ウィンドウアテンション Efficient Contextformer: Spatio-Channel Window Attention for Fast Context Modeling in Learned Image Compression ( http://arxiv.org/abs/2306.14287v2 ) ライセンス: Link先を確認	A. Burakhan Koyuncu, Panqi Jia, Atanas Boev, Elena Alshina, Eckehard Steinbach	(参考訳) エントロピー推定は学習画像圧縮の性能に不可欠である。変換器に基づくエントロピーモデルが高い圧縮比を達成する上で重要であることが実証されているが、かなりの計算努力を犠牲にしている。本稿では, 学習画像圧縮のための, 計算効率の良い変換器に基づく自己回帰文脈モデルである, 効率的なコンテキストフォーマ(eContextformer)を提案する。 eContextformerは、並列コンテキストモデリングのためのパッチワイド、チェッカード、チャネルワイドのグルーピング技術を効率よく融合し、シフトウインドウスパ比チャネルアテンション機構を導入する。より優れたトレーニング戦略とアーキテクチャ設計を検討し、さらなる複雑さの最適化を導入します。符号化中,提案手法は注意スパンを動的にスケールし,それまでの注意力計算をキャッシュし,モデルとランタイムの複雑さを劇的に低減する。非並列アプローチと比較して,提案手法はモデルの複雑さが約145倍小さく,デコード速度が約210倍速く,kodak,clic2020,tecnickのデータセットで平均的なビット節約を実現する。さらに,コンテクストモデルの複雑さが低く,オンラインレートゆがみアルゴリズムが可能となり,圧縮性能がさらに向上した。汎用ビデオ符号化(vvc)テストモデル(vtm)16.2のイントラコーディングよりも最大17%のビットレート節約を達成し,様々な学習に基づく圧縮モデルを上回る。 Entropy estimation is essential for the performance of learned image compression. It has been demonstrated that a transformer-based entropy model is of critical importance for achieving a high compression ratio, however, at the expense of a significant computational effort. In this work, we introduce the Efficient Contextformer (eContextformer) - a computationally efficient transformer-based autoregressive context model for learned image compression. The eContextformer efficiently fuses the patch-wise, checkered, and channel-wise grouping techniques for parallel context modeling, and introduces a shifted window spatio-channel attention mechanism. We explore better training strategies and architectural designs and introduce additional complexity optimizations. During decoding, the proposed optimization techniques dynamically scale the attention span and cache the previous attention computations, drastically reducing the model and runtime complexity. 
Compared to the non-parallel approach, our proposal has ~145x lower model complexity and ~210x faster decoding speed, and achieves higher average bit savings on Kodak, CLIC2020, and Tecnick datasets. Additionally, the low complexity of our context model enables online rate-distortion algorithms, which further improve the compression performance. We achieve up to 17% bitrate savings over the intra coding of Versatile Video Coding (VVC) Test Model (VTM) 16.2 and surpass various learning-based compression models.	翻訳日:2024-02-28 22:48:35 公開日:2024-02-27
# 教師付き学習のためのマスキング強化 Masking Augmentation for Supervised Learning ( http://arxiv.org/abs/2306.11339v2 ) ライセンス: Link先を確認	Byeongho Heo, Taekyung Kim, Sangdoo Yun, Dongyoon Han	(参考訳) ランダムマスキングを用いた事前トレーニングは、トレーニング技術の新たなトレンドとして現れている。しかしながら、教師付き学習は、主に不安定なトレーニングのために、マスキング強化を採用するという課題に直面している。本稿では,masksub (masked sub-model) と呼ばれるマスキング拡張を含む新しい手法を提案する。 MaskSubはメインモデルとサブモデルで構成され、前者は通常のトレーニングレシピを楽しみ、後者はトレーニングにおける強力なマスキング強化の利点を活用する。 MaskSubは、自己蒸留損失に似た緩和された損失関数を通じて副作用を緩和することで、この課題に対処する。分析の結果,MaskSubはトレーニングの損失が通常のトレーニングよりも早く収束し,パフォーマンスが向上することが示唆された。さらに、DeiT-III、MAEファインチューニング、CLIPファインチューニング、ResNet、Swin Transformerなど、さまざまなトレーニングレシピやモデルのMaskSubを検証する。 masksubは,すべてのケースにおいて,一貫して大幅なパフォーマンス向上を実現しています。 MaskSubは、様々なトレーニングレシピの下で追加の正規化を導入するための実用的で効果的なソリューションを提供する。コードはhttps://github.com/naver-ai/augsubで利用可能 Pre-training using random masking has emerged as a novel trend in training techniques. However, supervised learning faces a challenge in adopting masking augmentations, primarily due to unstable training. In this paper, we propose a novel way to involve masking augmentations dubbed Masked Sub-model (MaskSub). MaskSub consists of the main-model and sub-model; while the former enjoys conventional training recipes, the latter leverages the benefit of strong masking augmentations in training. MaskSub addresses the challenge by mitigating adverse effects through a relaxed loss function similar to a self-distillation loss. Our analysis shows that MaskSub improves performance, with the training loss converging even faster than regular training, which suggests our method facilitates training. We further validate MaskSub across diverse training recipes and models, including DeiT-III, MAE fine-tuning, CLIP fine-tuning, ResNet, and Swin Transformer. Our results show that MaskSub consistently provides significant performance gains across all the cases. MaskSub provides a practical and effective solution for introducing additional regularization under various training recipes. 
Code available at https://github.com/naver-ai/augsub	翻訳日:2024-02-28 22:47:38 公開日:2024-02-27
# マルチバススピンボーソンモデルにおけるエンタングルメントの強化 Enhanced entanglement in multi-bath spin-boson models ( http://arxiv.org/abs/2306.11036v2 ) ライセンス: Link先を確認	Charlie R. Hogg, Federico Cerisola, James D. Cresser, Simon A. R. Horsley, Janet Anders	(参考訳) スピン-ボーソンモデルは、通常、スピンと単一のボソニック浴との結合を考える。しかし、いくつかの物理的状況ではスピンを複数の環境に結合する必要がある。例えば、スピンは3次元磁気材料中のフォノンと相互作用する。ここではスピンが3つの独立な浴に等方的に結合する場合を考える。複数の浴との結合は, スピンと環境との絡み合いを0温度で著しく増大させることを示した。この効果は、平均力平衡状態におけるスピンの期待値を減少させることである。対照的に、古典的な3浴スピン平衡状態は環境結合から完全に独立であることが判明した。これらの結果から、多重バス結合から生じる純粋に量子効果が明らかとなり、磁気材料など幅広い分野で応用される可能性がある。 The spin-boson model usually considers a spin coupled to a single bosonic bath. However, some physical situations require coupling of the spin to multiple environments. Examples include spins interacting with phonons in three-dimensional magnetic materials. Here, we consider a spin coupled isotropically to three independent baths. We show that coupling to multiple baths can significantly increase entanglement between the spin and its environment at zero temperature. The effect of this is to reduce the spin's expectation values in the mean force equilibrium state. In contrast, the classical three-bath spin equilibrium state turns out to be entirely independent of the environmental coupling. These results reveal purely quantum effects that can arise from multi-bath couplings, with potential applications in a wide range of settings, such as magnetic materials.	翻訳日:2024-02-28 22:47:17 公開日:2024-02-27
# OCAtari:オブジェクト中心のAtari 2600強化学習環境 OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments ( http://arxiv.org/abs/2306.08649v2 ) ライセンス: Link先を確認	Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian Sztwiertnia, Kristian Kersting	(参考訳) 認知科学と心理学は、複雑なシーンのオブジェクト中心の表現が、低レベルの知覚的特徴から効率的な抽象的推論を実現するための有望なステップであることを示唆している。しかし、ほとんどの深層強化学習アプローチは、自然のシーンの合成特性を捉えないピクセルベースの表現にのみ依存する。そのためには、オブジェクト中心アプローチの作業と評価を可能にする環境とデータセットが必要です。本研究は,OCAtariの導入により,深層RLアプローチで最もよく使われている評価フレームワークであるAtari Learning Environmentsを拡張し,これらのゲームに対して,オブジェクト中心状態の資源効率の高い抽出を行う。我々のフレームワークは、オブジェクト発見、オブジェクト表現学習、およびオブジェクト中心のRLを可能にします。我々はOCAtariの検出能力と資源効率を評価する。ソースコードはgithub.com/k4ntz/OC_Atariから入手できます。 Cognitive science and psychology suggest that object-centric representations of complex scenes are a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep reinforcement learning approaches only rely on pixel-based representations that do not capture the compositional properties of natural scenes. For this, we need environments and datasets that allow us to work with and evaluate object-centric approaches. In our work, we extend the Atari Learning Environments, the most-used evaluation framework for deep RL approaches, by introducing OCAtari, which performs resource-efficient extractions of the object-centric states for these games. Our framework allows for object discovery, object representation learning, as well as object-centric RL. We evaluate OCAtari's detection capabilities and resource efficiency. Our source code is available at github.com/k4ntz/OC_Atari.	翻訳日:2024-02-28 22:47:07 公開日:2024-02-27
# gnomonic equiangular projectionを用いた生成逆ネットワークを用いたHRTFアップサンプリング HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection ( http://arxiv.org/abs/2306.05812v2 ) ライセンス: Link先を確認	Aidan O. T. Hogg, Mads Jenkins, He Liu, Isaac Squires, Samuel J. Cooper and Lorenzo Picinali	(参考訳) 個人化された頭部関連伝達関数(HRTF)は、現実的な仮想現実(VR)と拡張現実(AR)環境を作成する上で非常に重要である。しかし、高品質のHRTFを音響的に測定するには高価な機器と音響実験室が必要だ。これらの制限を克服し、この測定をより効率的にするために、低分解能のHRTFから高分解能のHRTFを生成するHRTFアップサンプリングが過去に利用されてきた。本稿では,HRTFアップサンプリングにGAN(generative adversarial network)を適用する方法を示す。本稿では,畳み込み型超解像生成対向ネットワーク(SRGAN)を用いてHRTFデータを直接利用するための新しい手法を提案する。この新しいアプローチは、barycentric upsampling、spherical harmonic (SH) upsampling、HRTF selection approachの3つのベースラインに対してベンチマークされている。実験の結果,入力HRTFがスパース(測定位置20未満)である場合,対数スペクトル歪み(LSD)および知覚モデルを用いた局所化性能において,提案手法が3つのベースラインを上回った。 An individualised head-related transfer function (HRTF) is very important for creating realistic virtual reality (VR) and augmented reality (AR) environments. However, acoustically measuring high-quality HRTFs requires expensive equipment and an acoustic lab setting. To overcome these limitations and to make this measurement more efficient, HRTF upsampling has been exploited in the past, where a high-resolution HRTF is created from a low-resolution one. This paper demonstrates how generative adversarial networks (GANs) can be applied to HRTF upsampling. We propose a novel approach that transforms the HRTF data for direct use with a convolutional super-resolution generative adversarial network (SRGAN). This new approach is benchmarked against three baselines: barycentric upsampling, spherical harmonic (SH) upsampling and an HRTF selection approach. Experimental results show that the proposed method outperforms all three baselines in terms of log-spectral distortion (LSD) and localisation performance using perceptual models when the input HRTF is sparse (less than 20 measured positions).	翻訳日:2024-02-28 22:46:25 公開日:2024-02-27
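The abstract above reports results in terms of log-spectral distortion (LSD) without giving the formula. A minimal sketch of one common definition follows, assuming LSD is the root-mean-square, over frequency bins, of the decibel difference between a reference and an estimated magnitude spectrum. The function name and the single-spectrum scope are assumptions; the paper may average over positions and ears differently.

```python
import math


def log_spectral_distortion(reference, estimate):
    """RMS over frequency bins of the dB difference between two
    magnitude spectra (lists of positive floats of equal length)."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_db = [
        (20.0 * math.log10(r / e)) ** 2
        for r, e in zip(reference, estimate)
    ]
    return math.sqrt(sum(squared_db) / len(squared_db))
```

A value of 0 means the two spectra are identical; a uniform factor-of-ten magnitude error corresponds to 20 dB.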
# 変分ガウス過程拡散過程 Variational Gaussian Process Diffusion Processes ( http://arxiv.org/abs/2306.02066v3 ) ライセンス: Link先を確認	Prakhar Verma, Vincent Adam, Arno Solin	(参考訳) 拡散過程は、動的モデリングタスクで自然に発生する豊かな表現型モデル群を提供する確率微分方程式(sdes)のクラスである。非線型拡散過程が先行する潜在過程を持つ生成モデルの下での確率的推論と学習は難解な問題である。我々は,後方過程を線形拡散過程として近似し,そのアプローチの病理を指摘している。サイトベース指数関数型家族記述を用いたガウス変分過程の代替パラメータ化を提案する。これにより、自然な勾配降下に類似した凸最適化のための高速アルゴリズムに対して、固定点反復と遅い推論アルゴリズムを交換することが可能となり、モデルパラメータを学習するためのより良い目的がもたらされる。 Diffusion processes are a class of stochastic differential equations (SDEs) providing a rich family of expressive models that arise naturally in dynamic modelling tasks. Probabilistic inference and learning under generative models with latent processes endowed with a non-linear diffusion process prior are intractable problems. We build upon work within variational inference, approximating the posterior process as a linear diffusion process, and point out pathologies in the approach. We propose an alternative parameterization of the Gaussian variational process using a site-based exponential family description. This allows us to trade a slow inference algorithm with fixed-point iterations for a fast algorithm for convex optimization akin to natural gradient descent, which also provides a better objective for learning model parameters.	翻訳日:2024-02-28 22:45:56 公開日:2024-02-27
# 視覚言語モデルのための一貫性誘導型プロンプト学習 Consistency-guided Prompt Learning for Vision-Language Models ( http://arxiv.org/abs/2306.01195v3 ) ライセンス: Link先を確認	Shuvendu Roy, Ali Etemad	(参考訳) 視覚言語モデルのための新しい微調整手法であるConsistency-Guided Prompt Learning (CoPrompt)を提案する。提案手法は,下流タスクを数ショットで微調整した場合に,大規模な基礎モデルの一般化を改善する。 CoPromptの基本的な考え方は、トレーニング可能なモデルと事前訓練されたモデルの予測に一貫性の制約を適用して、下流タスクの過度な適合を防ぐことである。さらに,2つの入力に一貫性を強制し,チューニング,プロンプト,アダプタという2つの支配的なパラダイムを組み合わせることで,一貫性の制約をさらに向上させます。摂動入力における一貫性の強制は、一貫性の制約をさらに規則化し、一般化を改善するのに役立つ。さらに、アダプタとプロンプトの統合により、下流タスクのパフォーマンスが向上するだけでなく、入出力スペースにおけるチューニング柔軟性も向上している。これにより、数ショットの学習環境で下流タスクへのより効果的な適応が可能になる。実験により、CoPromptは、ベース・ツー・ノーベルの一般化、ドメインの一般化、データセット間の評価など、様々な評価スイートにおいて既存の手法よりも優れていることが示された。一般化では、CoPromptはゼロショットタスクの最先端と11データセットの全体的な調和平均を改善している。詳細なアブレーション研究は、CoPromptの各成分の有効性を示している。コードはhttps://github.com/shuvenduroy/copromptで利用可能です。 We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models. Our approach improves the generalization of large foundation models when fine-tuned on downstream tasks in a few-shot setting. The basic idea of CoPrompt is to enforce a consistency constraint in the prediction of the trainable and pre-trained models to prevent overfitting on the downstream task. Additionally, we introduce the following two components into our consistency constraint to further boost the performance: enforcing consistency on two perturbed inputs and combining two dominant paradigms of tuning, prompting and adapter. Enforcing consistency on perturbed input serves to further regularize the consistency constraint, thereby improving generalization. Moreover, the integration of adapters and prompts not only enhances performance on downstream tasks but also offers increased tuning flexibility in both input and output spaces. This facilitates more effective adaptation to downstream tasks in a few-shot learning setting. 
Experiments show that CoPrompt outperforms existing methods on a range of evaluation suites, including base-to-novel generalization, domain generalization, and cross-dataset evaluation. On generalization, CoPrompt improves the state-of-the-art on zero-shot tasks and the overall harmonic mean over 11 datasets. Detailed ablation studies show the effectiveness of each of the components in CoPrompt. We make our code available at https://github.com/ShuvenduRoy/CoPrompt.	翻訳日:2024-02-28 22:45:44 公開日:2024-02-27
# 教師付き注意型マルチインスタンス学習による多視点超音波画像からの心疾患検出 Detecting Heart Disease from Multi-View Ultrasound Images via Supervised Attention Multiple Instance Learning ( http://arxiv.org/abs/2306.00003v2 ) ライセンス: Link先を確認	Zhe Huang, Benjamin S. Wessler, Michael C. Hughes	(参考訳) 大動脈弁狭窄症(AS)は、相当な罹患率と死亡率をもたらす変性弁疾患である。この疾患は診断も治療も十分に行われていない。臨床では、心臓の数十枚の超音波画像を生成する経胸部心エコー検査のエキスパートレビューで診断される。これらの図のいくつかだけが大動脈弁を示している。 ASのスクリーニングを自動化するためには、深層ネットワークは、ヒトの専門家が大動脈弁の視界を識別し、関連する画像を集約して研究レベルの診断を生成する能力の模倣を学ぶ必要がある。従来のAS検出手法では、画像間の非フレキシブル平均に依存するため、精度が不十分であった。さらに,市販の注目型マルチインスタンス学習 (MIL) では性能が低かった。 2つの重要な方法論的革新で、新しいエンドツーエンドのMILアプローチに貢献します。まず、教師付き注意技法は、学習した注意機構を導き、関連する視点を優先する。第2に,新しい自己教師付き事前学習戦略は,先行文献で一般的である個々の画像ではなく,研究全体の表現に対比学習を適用する。オープンアクセスデータセットと外部検証セットを用いた実験により,モデルサイズを削減しつつ,高い精度が得られることを示す。 Aortic stenosis (AS) is a degenerative valve condition that causes substantial morbidity and mortality. This condition is under-diagnosed and under-treated. In clinical practice, AS is diagnosed with expert review of transthoracic echocardiography, which produces dozens of ultrasound images of the heart. Only some of these views show the aortic valve. To automate screening for AS, deep networks must learn to mimic a human expert's ability to identify views of the aortic valve and then aggregate across these relevant images to produce a study-level diagnosis. We find previous approaches to AS detection yield insufficient accuracy due to relying on inflexible averages across images. We further find that off-the-shelf attention-based multiple instance learning (MIL) performs poorly. We contribute a new end-to-end MIL approach with two key methodological innovations. First, a supervised attention technique guides the learned attention mechanism to favor relevant views. Second, a novel self-supervised pretraining strategy applies contrastive learning on the representation of the whole study instead of individual images as commonly done in prior literature. 
Experiments on an open-access dataset and an external validation set show that our approach yields higher accuracy while reducing model size.	翻訳日:2024-02-28 22:45:18 公開日:2024-02-27
# 量子メッセージにおける分割状態非可算符号と秘密共有方式 Split-State Non-Malleable Codes and Secret Sharing Schemes for Quantum Messages ( http://arxiv.org/abs/2308.06466v2 ) ライセンス: Link先を確認	Naresh Goud Boddu, Vipul Goyal, Rahul Jain, João Ribeiro	(参考訳) 非可算符号は暗号理論と符号化理論の交わる基本対象である。これらのコードは、エラー訂正と検出が不可能な設定でもセキュリティ保証を提供しており、他の暗号処理にも応用されている。最も強力でよく研究されている敵対的改ざんモデルの一つが、$2$分割状態改ざんだ。ここでは、コードワードを2つの部分に分割し、敵は任意の関数を使ってそれぞれを独立に改ざんすることができる。このモデルは、敵が各共有に独立して改ざんすることで、複数の当事者と秘密の共有設定に自然に拡張することができる。分割状態改ざんモデルにおける非可算符号化と秘密共有に関する以前の研究は、古典メッセージの符号化のみを考慮していた。さらに、Aggarwal、Boddu、Jainによる最近の研究(IEEE Trans. Inf. Theory 2024)までは、量子能力や共有エンタングルメントを持つ敵は考慮されておらず、このモデルで以前のスキームが安全であるかどうかは定かではない。本稿では,分割状態非可算符号の概念と,量子メッセージのエンタングルメントを共用した量子敵に対してセキュアな秘密共有方式を提案する。次に、このようなスキームを明示的に構成し、低エラー非可算性を実現する。より正確には、外部システムとの絡み合いを保ち、コードワード長が$n$、メッセージ長が最大で$n^{\Omega(1)}$、エラー$\epsilon=2^{-{n^{\Omega(1)}}}$で、共有エンタングルメントを持つ量子敵に対するセキュリティを実現する、効率よくエンコード可能かつ復号可能な分割状態非可算符号と秘密共有スキームを構築した。より容易な平均ケース(average-case)非可算性の設定では、$1/11$に近いレートで効率的な非可算符号化を実現する。 Non-malleable codes are fundamental objects at the intersection of cryptography and coding theory. These codes provide security guarantees even in settings where error correction and detection are impossible, and have found applications to several other cryptographic tasks. One of the strongest and most well-studied adversarial tampering models is $2$-split-state tampering. Here, a codeword is split into two parts and the adversary can then independently tamper with each part using arbitrary functions. This model can be naturally extended to the secret sharing setting with several parties by having the adversary independently tamper with each share. Previous works on non-malleable coding and secret sharing in the split-state tampering model only considered the encoding of classical messages. 
Furthermore, until recent work by Aggarwal, Boddu, and Jain (IEEE Trans. Inf. Theory 2024), adversaries with quantum capabilities and shared entanglement had not been considered, and it is a priori not clear whether previous schemes remain secure in this model. In this work, we introduce the notions of split-state non-malleable codes and secret sharing schemes for quantum messages secure against quantum adversaries with shared entanglement. Then, we present explicit constructions of such schemes that achieve low-error non-malleability. More precisely, we construct efficiently encodable and decodable split-state non-malleable codes and secret sharing schemes for quantum messages preserving entanglement with external systems and achieving security against quantum adversaries having shared entanglement with codeword length $n$, any message length at most $n^{\Omega(1)}$, and error $\epsilon=2^{-{n^{\Omega(1)}}}$. In the easier setting of average-case non-malleability, we achieve efficient non-malleable coding with rate close to $1/11$.	翻訳日:2024-02-28 22:41:30 公開日:2024-02-27
# 量子ゆらぎの熱力学的視点 Thermodynamic perspective on quantum fluctuations ( http://arxiv.org/abs/2308.04951v2 ) ライセンス: Link先を確認	Akira Sone, Kanu Sinha, and Sebastian Deffner	(参考訳) 大きなシステムと小さなシステムの主な違いは何ですか? 小スケールでは力学は揺らぎに支配されるが、大規模ではゆらぎは無関係である。したがって、量子系の熱力学的に一貫した記述は、変動の性質と結果の完全な理解を必要とする。本章では, 通常は別々に扱われる、密接に関連する二つの研究分野、すなわち揺らぎ力とゆらぎ定理について概説する。現代研究におけるこれらのエキサイティングで活発な分野の要点に焦点を当て、お互いについて学ぶことに関心のある研究者のコミュニティの双方に指導的なエントリーポイントを提供しようとしている。 What is the major difference between large and small systems? At small length-scales the dynamics is dominated by fluctuations, whereas at large scales fluctuations are irrelevant. Therefore, any thermodynamically consistent description of quantum systems necessitates a thorough understanding of the nature and consequences of fluctuations. In this chapter, we outline two closely related fields of research that are commonly considered separately -- fluctuation forces and fluctuation theorems. Focusing on the main gist of these exciting and vivid fields of modern research, we seek to provide an instructive entry point for both communities of researchers interested in learning about the other.	翻訳日:2024-02-28 22:40:15 公開日:2024-02-27
# 確率的位置埋め込みはマスク画像モデリングを改善する Stochastic positional embeddings improve masked image modeling ( http://arxiv.org/abs/2308.00566v2 ) ライセンス: Link先を確認	Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun	(参考訳) Masked Image Modeling (MIM)は、ラベルなし画像からの学習を可能にする、有望な自己教師型学習アプローチである。最近の成功にもかかわらず、正確な場所で適切なセマンティックコンテンツを予測する必要があるため、MIMによる優れた表現の学習は依然として困難である。例えば、犬の不完全な画像を考えると、尾があると推測できるが、正確な位置を決定することはできない。本研究では,確率的位置埋め込み(StoP)を用いて位置不確実性をMIMに組み込むことを提案する。具体的には、ガウス分布から引き出された確率的なマスクトークン位置でモデルを条件付ける。 StoPは、ロケーション機能への過度な適合を減らし、ロケーションの不確実性に対して堅牢な学習機能に向けてモデルを導く。定量的には、StoPは様々なダウンストリームタスクのダウンストリームMIM性能を改善しており、例えば、ViT-Bを用いたImageNet線形プロービングで$+1.7\%$、データの$1\%$を用いたViT-Hで$+2.5\%$である。 Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose to incorporate location uncertainty into MIM by using stochastic positional embeddings (StoP). Specifically, we condition the model on stochastic masked token positions drawn from a Gaussian distribution. StoP reduces overfitting to location features and guides the model toward learning features that are more robust to location uncertainties. Quantitatively, StoP improves downstream MIM performance on a variety of downstream tasks, including $+1.7\%$ on ImageNet linear probing using ViT-B, and $+2.5\%$ for ViT-H using $1\%$ of the data.	翻訳日:2024-02-28 22:39:25 公開日:2024-02-27
# 3D/2Dレジストレーションにおける損失関数とシーン表現が単一視野蛍光X線量推定に及ぼす影響 The Impact of Loss Functions and Scene Representations for 3D/2D Registration on Single-view Fluoroscopic X-ray Pose Estimation ( http://arxiv.org/abs/2308.00214v3 ) ライセンス: Link先を確認	Chaochao Zhou, Syed Hasib Akhter Faruqui, Abhinav Patel, Ramez N. Abdalla, Michael C. Hurley, Ali Shaibani, Matthew B. Potts, Babak S. Jahromi, Sameer A. Ansari, Donald R. Cantrell	(参考訳) 画像誘導プロシージャで実行される多くのタスクは、特定の投影が3次元空間のターゲットに到達するために選択されるポーズ推定問題としてキャストすることができる。本研究では,新たに提案した2つの手法であるNeural Tuned Tomography (NeTT) と masked Neural Radiance Fields (mNeRF) を含む,コーンビーム型コンピュータ断層撮影 (CBCT) またはニューラルネットワークシーン表現と自動微分可能なデジタル再構成ラジオグラフ (DRR) の効率的な計算のための微分可能投影 (DiffProj) レンダリングフレームワークを開発した。次に, 様々な候補損失関数を用いた反復勾配降下法によるポーズ推定を行い, 合成したDRRの地上蛍光X線像に対する画像差を定量化する。代替損失関数と比較して、相互情報損失関数はポーズ推定精度を著しく向上することができ、局所最適の侵入を効果的に防止することができる。この相互情報損失を用いて、50人の患者の頭蓋骨のトモグラフィX線データセット上でのポーズ推定の総合的な評価により、DiffProjにおける離散化(CBCT)または神経(NeTT/mNeRF)のシーン表現のどちらかを使用することで、DRRの外観とポーズ推定のパフォーマンスが同等であることが示されている(3D角度誤差:平均$\leq$ 3.2° と 90% quantile $\leq$ 3.4°)。これらの知見は, 広汎な画像誘導介入における蛍光X線像推定の効率と有効性を改善するための適切な方法を選択するのに有用である。 Many tasks performed in image-guided procedures can be cast as pose estimation problems, where specific projections are chosen to reach a target in 3D space. In this study, we first develop a differentiable projection (DiffProj) rendering framework for the efficient computation of Digitally Reconstructed Radiographs (DRRs) with automatic differentiability from either Cone-Beam Computerized Tomography (CBCT) or neural scene representations, including two newly proposed methods, Neural Tuned Tomography (NeTT) and masked Neural Radiance Fields (mNeRF). We then perform pose estimation by iterative gradient descent using various candidate loss functions that quantify the image discrepancy of the synthesized DRR with respect to the ground-truth fluoroscopic X-ray image. 
Compared to alternative loss functions, the Mutual Information loss function can significantly improve pose estimation accuracy, as it can effectively prevent entrapment in local optima. Using the Mutual Information loss, a comprehensive evaluation of pose estimation performed on a tomographic X-ray dataset of 50 patients' skulls shows that utilizing either discretized (CBCT) or neural (NeTT/mNeRF) scene representations in DiffProj leads to comparable performance in DRR appearance and pose estimation (3D angle errors: mean $\leq 3.2^\circ$ and 90% quantile $\leq 3.4^\circ$), despite the latter often incurring considerable training expenses and time. These findings could be instrumental for selecting appropriate approaches to improve the efficiency and effectiveness of fluoroscopic X-ray pose estimation in widespread image-guided interventions.	翻訳日:2024-02-28 22:39:07 publication date:2024-02-27
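The entry above reports that a Mutual Information loss works best for comparing a synthesized DRR with the fluoroscopic X-ray. The sketch below only illustrates the quantity itself, as a hard-binned histogram estimate over two flattened images with intensities in `[lo, hi]`. The bin count, intensity range, and function names are assumptions. A hard histogram is not differentiable, so a gradient-based pipeline such as the one described would need a soft-binned variant that is not shown here.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b, bins=8, lo=0.0, hi=1.0):
    """Histogram estimate of mutual information (in nats) between two images.

    img_a and img_b are flattened intensity lists of equal length.
    """
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")

    def bin_index(value):
        idx = int((value - lo) / (hi - lo) * bins)
        return min(max(idx, 0), bins - 1)  # clamp to a valid bin

    n = len(img_a)
    joint = Counter((bin_index(a), bin_index(b)) for a, b in zip(img_a, img_b))
    count_a, count_b = Counter(), Counter()
    for (i, j), c in joint.items():
        count_a[i] += c
        count_b[j] += c

    mi = 0.0
    for (i, j), c in joint.items():
        # p_ij / (p_i * p_j) simplifies to c * n / (count_a[i] * count_b[j])
        mi += (c / n) * math.log(c * n / (count_a[i] * count_b[j]))
    return mi


def mi_loss(drr, xray):
    """Loss to minimise: negative mutual information (higher MI = better match)."""
    return -mutual_information(drr, xray)
```

Two identical images that split evenly into two intensity bins give an estimate of `log 2`, while statistically independent images give `0`.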
# ラベル不足下でのラーニング・トゥ・ランドにおけるGBDTよりも優れた事前学習深度モデル Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity ( http://arxiv.org/abs/2308.00177v3 ) ライセンス: Link先を確認	Charlie Hou, Kiran Koshy Thekumparampil, Michael Shavlovsky, Giulia Fanti, Yesh Dattatreya, Sujay Sanghavi	(参考訳) 表データでは、現在のディープラーニング(dl)モデルが勾配強化決定木(gbdts)とよく似ているが、外れ値データでは著しくパフォーマンスが低下していることが、多くの文献から示されている。ラベル不足下では,GBDTよりもDLモデルの方が優れた自然な表付きデータセットを同定する。検索やレコメンデーションを含むタブラルLTRアプリケーションは、ラベルなしデータが多く、ラベル付きデータが少ないことが多い。 dlランカは教師なし事前学習を利用してラベルなしのデータを利用することができる。パブリックデータセットとプロプライエタリデータセットの両方に関する広範な実験では、事前トレーニング済みのDLローダが、ランキングのメトリクスでGBDTローダよりも一貫して優れています。 On tabular data, a significant body of literature has shown that current deep learning (DL) models perform at best similarly to Gradient Boosted Decision Trees (GBDTs), while significantly underperforming them on outlier data. We identify a natural tabular data setting where DL models can outperform GBDTs: tabular Learning-to-Rank (LTR) under label scarcity. Tabular LTR applications, including search and recommendation, often have an abundance of unlabeled data, and scarce labeled data. We show that DL rankers can utilize unsupervised pretraining to exploit this unlabeled data. In extensive experiments over both public and proprietary datasets, we show that pretrained DL rankers consistently outperform GBDT rankers on ranking metrics -- sometimes by as much as $38\%$ -- both overall and on outliers.	翻訳日:2024-02-28 22:38:25 公開日:2024-02-27
# EFLNet:赤外小ターゲット検出のための特徴学習の強化 EFLNet: Enhancing Feature Learning for Infrared Small Target Detection ( http://arxiv.org/abs/2307.14723v2 ) ライセンス: Link先を確認	Bo Yang, Xinyu Zhang, Jian Zhang, Jun Luo, Mingliang Zhou, Yangjun Pi	(参考訳) 単一フレームの赤外小ターゲット検出は、ターゲットと背景の極端不均衡のため困難な作業であり、境界ボックスの回帰は、赤外線小ターゲットに対して極めて敏感であり、高レベルのセマンティック層では、ターゲット情報が失われやすい。本稿では,これらの問題に対処する機能学習ネットワーク(EFLNet)を提案する。まず、赤外線画像の背景とターゲットとの間には非常に不均衡があることに気づき、モデルがターゲット特徴よりも背景特徴に注意を払うようにした。この問題に対処するために、ターゲットと背景を分離する新しい適応しきい値焦点損失(ATFL)関数を提案し、アダプティブメカニズムを用いて損失重量を調整し、モデルに目標特徴により多くの注意を向けるよう強制する。第二に、正規化されたガウス・ワッサーシュタイン距離(NWD)を導入し、赤外小目標に対する境界箱回帰の極端感度による収束の困難を緩和する。最後に,動的なヘッド機構をネットワークに組み込んで,各意味層の相対的重要度を適応的に学習する。実験により,本手法は,最先端(SOTA)深層学習法と比較して,赤外線小目標の検出性能が向上することを示した。ソースコードとバウンディングボックスの注釈付きデータセットはhttps://github.com/YangBo0411/infrared-small-targetで入手できる。 Single-frame infrared small target detection is considered to be a challenging task, due to the extreme imbalance between target and background, bounding box regression is extremely sensitive to infrared small target, and target information is easy to lose in the high-level semantic layer. In this article, we propose an enhancing feature learning network (EFLNet) to address these problems. First, we notice that there is an extremely imbalance between the target and the background in the infrared image, which makes the model pay more attention to the background features rather than target features. To address this problem, we propose a new adaptive threshold focal loss (ATFL) function that decouples the target and the background, and utilizes the adaptive mechanism to adjust the loss weight to force the model to allocate more attention to target features. Second, we introduce the normalized Gaussian Wasserstein distance (NWD) to alleviate the difficulty of convergence caused by the extreme sensitivity of the bounding box regression to infrared small target. 
Finally, we incorporate a dynamic head mechanism into the network to enable adaptive learning of the relative importance of each semantic layer. Experimental results demonstrate that our method achieves better detection performance on infrared small targets than state-of-the-art (SOTA) deep-learning-based methods. The source code and bounding-box-annotated datasets are available at https://github.com/YangBo0411/infrared-small-target.	翻訳日:2024-02-28 22:38:09 公開日:2024-02-27
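The EFLNet entry above mentions the normalized Gaussian Wasserstein distance (NWD) for bounding box regression. As a rough sketch, the function below follows the commonly cited NWD formulation, in which each box `(cx, cy, w, h)` is modelled as a 2D Gaussian and the squared 2-Wasserstein distance reduces to a squared Euclidean distance between `(cx, cy, w/2, h/2)` vectors. The normalising constant `c` is a dataset-dependent hyperparameter, and the default here is only a placeholder assumption. How EFLNet plugs NWD into its loss is not stated in the abstract and is not modelled.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two boxes.

    Boxes are (cx, cy, w, h). Returns a similarity in (0, 1], equal to 1 for
    identical boxes. The constant c is an assumed placeholder value.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(pred_box, target_box, c=12.8):
    """One possible regression loss built on NWD: 1 - similarity."""
    return 1.0 - nwd(pred_box, target_box, c)
```

Unlike IoU, this similarity stays non-zero and smooth for tiny boxes that do not overlap, which is the property the abstract appeals to.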
# AdvDiff:拡散モデルを用いた非制限逆例の生成 AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models ( http://arxiv.org/abs/2307.12499v3 ) ライセンス: Link先を確認	Xuelong Dai, Kaisheng Liang and Bin Xiao	(参考訳) 制限のない敵攻撃は、深層学習モデルや敵防衛技術に深刻な脅威をもたらす。防御機構を効果的にバイパスできるため、深層学習アプリケーションには深刻なセキュリティ問題が発生する。しかし、従来の攻撃手法では、理論的に証明不可能なGAN(Generative Adversarial Networks)がよく使われており、特にImageNetのような大規模データセットにおいて、敵の目的を組み込んで非現実的な例を生成する。本稿では,拡散モデルを用いた非制限逆例を生成するAdvDiffという新しい手法を提案する。本研究では,拡散モデルの逆生成過程において,新たな2つの逆サンプリング手法を設計する。これら2つの手法は、ターゲット分類器の勾配を解釈可能に統合することにより、高品質で現実的な逆例を生成するのに効果的で安定である。 MNIST と ImageNet データセットの実験結果から,AdvDiff は攻撃性能と生成品質の点で GAN ベースの手法よりも優れた非制限逆例を生成するのに有効であることが示された。 Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. They pose severe security problems for deep learning applications because they can effectively bypass defense mechanisms. However, previous attack methods often utilize Generative Adversarial Networks (GANs), which are not theoretically provable and thus generate unrealistic examples by incorporating adversarial objectives, especially for large-scale datasets like ImageNet. In this paper, we propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models. We design two novel adversarial guidance techniques to conduct adversarial sampling in the reverse generation process of diffusion models. These two techniques are effective and stable to generate high-quality, realistic adversarial examples by integrating gradients of the target classifier interpretably. Experimental results on MNIST and ImageNet datasets demonstrate that AdvDiff is effective to generate unrestricted adversarial examples, which outperforms GAN-based methods in terms of attack performance and generation quality.	翻訳日:2024-02-28 22:37:47 公開日:2024-02-27
# 3次元分子事前学習のためのフラクショナルデノイジング Fractional Denoising for 3D Molecular Pre-training ( http://arxiv.org/abs/2307.10683v3 ) ライセンス: Link先を確認	Shikun Feng, Yuyan Ni, Yanyan Lan, Zhi-Ming Ma, Wei-Ying Ma	(参考訳) 座標デノイジング(coordinate denoising)は有望な3次元分子事前学習法であり、様々な下流の薬物発見タスクで顕著な性能を達成した。理論的には、この目的は力場を学ぶことと等価であり、下流のタスクに有用であることが示されている。それにもかかわらず、座標デノイジングが効果的な力場を学ぶには、低カバレッジサンプルと等方的な力場という2つの課題がある。その根底にある理由は、既存のデノイジング法によって仮定される分子分布が分子の異方性特性を捉えないからである。これらの課題に対処するために,二面角と座標の両方のノイズを含む,新しいハイブリッドノイズ戦略を提案する。しかし、そのようなハイブリッドノイズを従来の方法でデノイズすることは、もはや力場を学ぶことと等価ではない。理論的推論により、この問題は共分散が入力コンホメーションに依存することによって引き起こされることがわかった。そこで本研究では,2種類のノイズを分離し,後者の座標部分のみをデノイズする新しいフラクショナルデノイジング法(Frad)を設計することを提案する。このように、Fradはより低エネルギーな構造をサンプリングする利点と力場等価性の両方を享受している。広範な実験により、分子表現におけるFradの有効性が示され、QM9の12のタスクのうち9つとMD17の8つのターゲットのうち7つで新しい最先端性能を達成した。 Coordinate denoising is a promising 3D molecular pre-training method, which has achieved remarkable performance in various downstream drug discovery tasks. Theoretically, the objective is equivalent to learning the force field, which has been shown to be helpful for downstream tasks. Nevertheless, there are two challenges for coordinate denoising to learn an effective force field, namely low-coverage samples and an isotropic force field. The underlying reason is that molecular distributions assumed by existing denoising methods fail to capture the anisotropic characteristic of molecules. To tackle these challenges, we propose a novel hybrid noise strategy, including noise on both dihedral angles and coordinates. However, denoising such hybrid noise in a traditional way is no longer equivalent to learning the force field. Through theoretical deductions, we find that the problem is caused by the dependence of the covariance on the input conformation. To this end, we propose to decouple the two types of noise and design a novel fractional denoising method (Frad), which only denoises the latter coordinate part. In this way, Frad enjoys both the merits of sampling more low-energy structures and the force field equivalence. 
Extensive experiments show the effectiveness of Frad in molecular representation, with a new state-of-the-art on 9 out of 12 tasks of QM9 and on 7 out of 8 targets of MD17.	翻訳日:2024-02-28 22:37:28 公開日:2024-02-27
# 量子シミュレータにおける統計関連情報のデータ駆動検出 Data-driven discovery of statistically relevant information in quantum simulators ( http://arxiv.org/abs/2307.10040v3 ) ライセンス: Link先を確認	R. Verdel, V. Vitale, R. K. Panda, E. D. Donkor, A. Rodriguez, S. Lannig, Y. Deller, H. Strobel, M. K. Oberthaler, M. Dalmonte	(参考訳) 量子シミュレータは強い相関を持つ量子物質を調べる強力な手段を提供する。しかし,このようなシステムにおける測定結果の解釈には大きな課題が伴う。本稿では,スピノルボース・アインシュタイン凝縮実験における量子クエンチの場合の合成量子物質の情報抽出に関する理論的枠組みについて述べる。情報コンテンツの異なる尺度を提供する非パラメトリックな教師なし学習ツールを用いて,支配的自由度を識別するためのシステム非依存的アプローチを示す。これにより、実効場理論と同様に、作用素の関連性に応じてランク付けすることができる。対応する効果的な記述を特徴付けるために、データセットの固有次元をダイナミクスの複雑さの尺度として検討する。これは、研究システムにおける時間依存的普遍行動の出現と相関するデータ構造を単純化することを明らかにする。我々の仮定自由アプローチは、すぐに様々な実験プラットフォームに適用できる。 Quantum simulators offer powerful means to investigate strongly correlated quantum matter. However, interpreting measurement outcomes in such systems poses significant challenges. Here, we present a theoretical framework for information extraction in synthetic quantum matter, illustrated for the case of a quantum quench in a spinor Bose-Einstein condensate experiment. Employing non-parametric unsupervised learning tools that provide different measures of information content, we demonstrate a system-agnostic approach to identify dominant degrees of freedom. This enables us to rank operators according to their relevance, akin to effective field theory. To characterize the corresponding effective description, we then explore the intrinsic dimension of data sets as a measure of the complexity of the dynamics. This reveals a simplification of the data structure, which correlates with the emergence of time-dependent universal behavior in the studied system. Our assumption-free approach can be immediately applied in a variety of experimental platforms.	翻訳日:2024-02-28 22:37:01 公開日:2024-02-27
# 強化学習サロゲートによるカットプレーンアルゴリズムの高速化 Accelerating Cutting-Plane Algorithms via Reinforcement Learning Surrogates ( http://arxiv.org/abs/2307.08816v2 ) ライセンス: Link先を確認	Kyle Mana, Fernando Acero, Stephen Mak, Parisa Zehtabi, Michael Cashmore, Daniele Magazzeni, Manuela Veloso	(参考訳) 離散最適化は$\mathcal{NP}$-hard問題の集合に属し、混合整数プログラミングや組合せ最適化のような分野にまたがる。凸離散最適化問題を解くための現在の標準的なアプローチは切断平面アルゴリズム(英語版)(cut-plane algorithm)であり、これは実現可能な集合を洗練するために \textit{cuts} として知られる不等式を反復的に加えることによって最適解に達する。多くの汎用カット生成アルゴリズムが存在するにもかかわらず、大規模離散最適化問題は難解さに苦しんでいる。本研究では,強化学習による切削面アルゴリズムの高速化手法を提案する。私たちのアプローチでは、カット生成手順の$\mathcal{np}$-hard要素の代理として学習ポリシーを使用します。 (i)収束を加速し、 (ii)最適性の保証を保持する。提案手法は,確率最適化と混合整数二次計画法という,切削平面アルゴリズムが一般的に用いられる2種類の問題に適用する。我々は,Benders分解(確率的最適化)および反復的損失近似(四進法プログラミング)に適用した場合の手法の利点を観察し,現代の代替アルゴリズムと比較して最大45\%の高速平均収束を実現する。 Discrete optimization belongs to the set of $\mathcal{NP}$-hard problems, spanning fields such as mixed-integer programming and combinatorial optimization. A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms, which reach optimal solutions by iteratively adding inequalities known as \textit{cuts} to refine a feasible set. Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability. In this work, we propose a method for accelerating cutting-plane algorithms via reinforcement learning. Our approach uses learned policies as surrogates for $\mathcal{NP}$-hard elements of the cut generating procedure in a way that (i) accelerates convergence, and (ii) retains guarantees of optimality. We apply our method on two types of problems where cutting-plane algorithms are commonly used: stochastic optimization, and mixed-integer quadratic programming. 
We observe the benefits of our method when applied to Benders decomposition (stochastic optimization) and iterative loss approximation (quadratic programming), achieving up to $45\%$ faster average convergence when compared to modern alternative algorithms.	翻訳日:2024-02-28 22:36:47 公開日:2024-02-27
# 非マルコフ自由フェルミオンはしごにおける測定誘起遷移 Measurement induced transitions in non-Markovian free fermion ladders ( http://arxiv.org/abs/2307.06624v3 ) ライセンス: Link先を確認	Mikheil Tsitsishvili, Dario Poletti, Marcello Dalmonte and Giuliano Chiriac\`o	(参考訳) 近年、測定誘起遷移を理解するための懸命な努力がなされているが、これらの現象に対する非マルコフ効果についてはまだよく理解されていない。そこで我々は,2つの結合した自由フェルミオン鎖,一つは利子系として機能し,もう一つは浴槽として機能する。バスチェインはマルコフ測定の対象であり、量子軌道の観点からの数値研究にはまだ適している系チェインに作用する効果的な非マルコフ散逸ダイナミクスをもたらす。本設定では,システムチェーン内の絡み合いを解析し,ラダーホッピングパラメータと測定確率に基づいて位相図を特徴付ける。純粋な状態進化の場合、このシステムはバスチェーンの内部ホッピングが小さい場合の領域法相であり、バスのダイナミクスが速い場合には非領域法相が現れる。非領域法則は、エントロピーの対数的スケーリングと共形相との整合性を示すだけでなく、我々が研究できる有限系サイズの線形補正も示している。混合状態の進化の場合、その代わりに、両領域の領域を観察し、絡み合いの負性性の非領域スケーリングを観察する。我々は、系の連鎖力学の非マルコビアン性を定量化し、我々の研究するパラメータの体系において、より強い非マルコビアン性はシステム内のより大きな絡み合いと関連している。 Recently there has been an intense effort to understand measurement induced transitions, but we still lack a good understanding of non-Markovian effects on these phenomena. To that end, we consider two coupled chains of free fermions, one acting as the system of interest, and one as a bath. The bath chain is subject to Markovian measurements, resulting in an effective non-Markovian dissipative dynamics acting on the system chain which is still amenable to numerical studies in terms of quantum trajectories. Within this setting, we study the entanglement within the system chain, and use it to characterize the phase diagram depending on the ladder hopping parameters and on the measurement probability. For the case of pure state evolution, the system is in an area law phase when the internal hopping of the bath chain is small, while a non-area law phase appears when the dynamics of the bath is fast. The non-area law exhibits a logarithmic scaling of the entropy compatible with a conformal phase, but also displays linear corrections for the finite system sizes we can study. For the case of mixed state evolution, we instead observe regions with both area, and non-area scaling of the entanglement negativity. 
We quantify the non-Markovianity of the system chain dynamics and find that, for the parameter regimes we study, stronger non-Markovianity is associated with larger entanglement within the system.	翻訳日:2024-02-28 22:36:27 公開日:2024-02-27
# 進化的アルゴリズムによる大規模言語モデルの接続による高能率プロンプト最適化 Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers ( http://arxiv.org/abs/2309.08532v2 ) ライセンス: Link先を確認	Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang	(参考訳) 大規模言語モデル(LLM)は様々なタスクに優れるが、しばしば人的努力を必要とする注意深いプロンプトに依存している。本稿では,このプロセスを自動化するために,進化アルゴリズム(EA)の概念を借用し,優れた性能と高速収束を示す,離散的なプロンプト最適化のための新しいフレームワークであるEvoPromptを提案する。一貫性と人間可読性が必要な自然言語表現である個別のプロンプトでEAが動作できるようにするため、LEMをEAと接続する。このアプローチにより、LLMの強力な言語処理能力とEAの効率的な最適化性能を同時に活用できる。具体的には、いかなる勾配やパラメータも含まず、evopromptはプロンプトの集団から始まり、進化演算子に基づいたllmによる新しいプロンプトを反復的に生成し、開発セットに基づいて人口を増加させる。我々は、言語理解、生成タスク、BIG-Bench Hard(BBH)タスクを含む31のデータセットに対して、GPT-3.5やAlpacaを含むクローズドおよびオープンソースLLMのプロンプトを最適化する。 EvoPromptは人為的なプロンプトと既存の自動プロンプト生成方法(例えばBBHでは25%)を大きく上回っている。さらに、evoprompt は、llm と eas をつなぐことによって相乗効果が生まれ、llm と従来のアルゴリズムの組み合わせに関するさらなる研究が促進されることを示した。 Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. 
We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 31 datasets covering language understanding, generation tasks, as well as BIG-Bench Hard (BBH) tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation (e.g., up to 25% on BBH). Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.	翻訳日:2024-02-28 22:31:31 公開日:2024-02-27
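The EvoPrompt entry above describes starting from a population of prompts, generating new prompts with an LLM acting as the evolutionary operator, and improving the population using a development set. The loop below is a toy sketch of that structure only. `toy_llm_evolve` and `toy_dev_score` are hypothetical stand-ins for a real LLM call and a real dev-set evaluation, and the elitist survivor selection is an assumption rather than the paper's exact procedure.

```python
import random


def toy_llm_evolve(parent_a, parent_b, rng):
    """Hypothetical stand-in for an LLM that crosses over and mutates prompts."""
    words_a, words_b = parent_a.split(), parent_b.split()
    cut = rng.randint(0, min(len(words_a), len(words_b)))
    child = words_a[:cut] + words_b[cut:]
    if child and rng.random() < 0.5:
        # Mutation: replace one word with a word from a tiny toy vocabulary.
        child[rng.randrange(len(child))] = rng.choice(
            ["carefully", "step", "concisely"]
        )
    return " ".join(child)


def toy_dev_score(prompt):
    """Hypothetical stand-in for dev-set accuracy: counts helpful keywords."""
    keywords = {"step", "carefully", "answer"}
    return len(keywords & set(prompt.split()))


def evolve_prompts(population, score, evolve, generations, rng):
    """Evolve a prompt population; returns (final population, best-score history)."""
    population = list(population)
    size = len(population)
    history = []
    for _ in range(generations):
        children = []
        for _ in range(size):
            parent_a, parent_b = rng.sample(population, 2)
            children.append(evolve(parent_a, parent_b, rng))
        # Elitist selection: keep the best `size` prompts among parents and children.
        population = sorted(population + children, key=score, reverse=True)[:size]
        history.append(score(population[0]))
    return population, history


initial = [
    "Solve the problem",
    "Think step by step",
    "Give the final answer only",
    "Read the question carefully",
]
final, history = evolve_prompts(
    initial, toy_dev_score, toy_llm_evolve, generations=5, rng=random.Random(0)
)
```

Because parents compete with their children for survival, the best score in `history` can never decrease from one generation to the next in this sketch.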
# 競合する位相秩序における隠れサブシステム対称性保護状態 Hidden subsystem symmetry protected states in competing topological orders ( http://arxiv.org/abs/2309.02307v2 ) ライセンス: Link先を確認	Shi Feng	(参考訳) 本研究では,2次元サブシステム対称性保護トポロジカル状態と2次元トポロジカル秩序の相互関係を明らかにする。このモデルは、トーリック符号(tc)とその双対相互作用の強化であり、サブシステム対称性と部分拡張基底状態縮退性を持つ双対格子上で定義されるモデルにマッピングすることができる。地図は、フラストレーションされたTCを、線形サブシステム対称性を持つ強いSSPTモデルとして、トポロジカルプラケットイジングモデル(TPIM)の2つのコピーに正確に接続する。 TPIMの膜秩序パラメータは、フラストレーションTCモデルの秩序パラメータとして双対TC安定化器に正確にマッピングされ、TPIMのSSPTギャップレスエッジ状態は、開境界下のゼロエネルギーダングリング作用素にマッピングされ、SSPT順序TPIMから自明な常磁性相への遷移は、2つの異なる位相秩序間の遷移にマッピングされる。また,このマッピングを用いて他のSSPTモデルの構造を解明し,2次元におけるSSPT秩序と位相秩序との微妙な結びつきを反映できることを示した。 We reveal the connection between two-dimensional subsystem symmetry-protected topological (SSPT) states and two-dimensional topological orders via a self-dual frustrated toric code model. This model, an enrichment of the toric code (TC) with its dual interactions, can be mapped to a model defined on the dual lattice with subsystem symmetries and subextensive ground state degeneracy. The map connects exactly the frustrated TC to two copies of the topological plaquette Ising model (TPIM), as a strong SSPT model with linear subsystem symmetries. The membrane order parameter of the TPIM is exactly mapped to dual TC stabilizers as the order parameter of the frustrated TC model, SSPT gapless edge states of the TPIM are mapped to zero-energy dangling operators under open boundaries, and the transition from the SSPT-ordered TPIM to the trivial paramagnetic phase is mapped to the transition between two distinct topological orders. We also demonstrate that this mapping can be used to elucidate the structure of other SSPT models, reflecting the subtle linkage between SSPT order and topological order in two dimensions.	翻訳日:2024-02-28 22:30:09 公開日:2024-02-27
# 局所定常グラフプロセス Locally Stationary Graph Processes ( http://arxiv.org/abs/2309.01657v2 ) ライセンス: Link先を確認	Abdullah Canbolat and Elif Vural	(参考訳) 定常グラフプロセスモデルは、不規則なネットワークトポロジ上に収集されたデータセットの分析と推論によく用いられる。既存の手法のほとんどは、グラフ全体に対してグローバルに有効である単一の定常プロセスモデルを持つグラフ信号を表すが、多くの実践的な問題では、そのプロセスの特徴はグラフの異なる領域の局所的な変化に該当する可能性がある。本研究では,局所定常性の概念を不規則グラフ領域に拡張することを目的とした,局所定常グラフ処理(lsgp)モデルを提案する。我々は,各成分に付着する過程の程度がグラフ上でスムーズに変化するように,各成分プロセスの集合の組み合わせとして全体プロセスを表現することにより,局所定常性を特徴付ける。プロセスの実現からLSGPモデルを計算するためのアルゴリズムを提案し、またWSSプロセスを用いてLSGPを局所的に近似する。信号補間問題に関する実験は,提案手法が技術と競合する正確な信号表現を提供することを示す。 Stationary graph process models are commonly used in the analysis and inference of data sets collected on irregular network topologies. While most of the existing methods represent graph signals with a single stationary process model that is globally valid on the entire graph, in many practical problems, the characteristics of the process may be subject to local variations in different regions of the graph. In this work, we propose a locally stationary graph process (LSGP) model that aims to extend the classical concept of local stationarity to irregular graph domains. We characterize local stationarity by expressing the overall process as the combination of a set of component processes such that the extent to which the process adheres to each component varies smoothly over the graph. We propose an algorithm for computing LSGP models from realizations of the process, and also study the approximation of LSGPs locally with WSS processes. Experiments on signal interpolation problems show that the proposed process model provides accurate signal representations competitive with the state of the art.	翻訳日:2024-02-28 22:29:44 公開日:2024-02-27
# ゼロショット異常位置同定のためのブートストラップファイングラインドビジョンランゲージアライメント Bootstrap Fine-Grained Vision-Language Alignment for Unified Zero-Shot Anomaly Localization ( http://arxiv.org/abs/2308.15939v2 ) ライセンス: Link先を確認	Hanqiu Deng, Zhaoxiang Zhang, Jinan Bao, Xingyu Li	(参考訳) コントラスト型言語画像事前学習(clip)モデルは,自然言語管理下での視覚表現の学習により,ゼロショット視覚認識タスクにおいて有望な性能を示す。近年の研究では、CLIPを用いて、画像と正常および異常状態のプロンプトをマッチングすることで、ゼロショット異常検出に取り組んでいる。しかし、CLIPはペア化されたテキストプロンプトとグローバルな画像レベルの表現との対応性の構築に重点を置いているため、テキストアライメントに対するきめ細かいパッチレベルのビジョンの欠如は、正確な視覚的異常なローカライゼーションの能力を制限する。本研究では,ゼロショット異常局所化のためのAnoCLIPを提案する。ビジュアルエンコーダでは、パッチレベルのローカル記述のためのCLIPの固有のローカルトークンを抽出する、トレーニング不要なバリューワイドアテンション機構を導入する。テキストの監督の観点からは、特に、きめ細かい視覚言語マッチングのための、ドメイン認識のコントラスト状態プロンプトテンプレートをデザインする。提案するAnoCLIPに加えて,AnoCLIPの擬似ラベルとノイズ破損トークンを用いて,視覚的エンコーダの軽量アダプタを最適化し,視覚的異常なローカライゼーション結果を洗練するためのテスト時適応(TTA)機構を導入する。 AnoCLIPとTTAの両方を用いて、ゼロショット異常局所化のためのCLIPの可能性を大幅に活用し、各種データセットに対するAnoCLIPの有効性を実証した。 Contrastive Language-Image Pre-training (CLIP) models have shown promising performance on zero-shot visual recognition tasks by learning visual representations under natural language supervision. Recent studies attempt the use of CLIP to tackle zero-shot anomaly detection by matching images with normal and abnormal state prompts. However, since CLIP focuses on building correspondence between paired text prompts and global image-level representations, the lack of fine-grained patch-level vision to text alignment limits its capability on precise visual anomaly localization. In this work, we propose AnoCLIP for zero-shot anomaly localization. In the visual encoder, we introduce a training-free value-wise attention mechanism to extract intrinsic local tokens of CLIP for patch-level local description. From the perspective of text supervision, we particularly design a unified domain-aware contrastive state prompting template for fine-grained vision-language matching. 
On top of the proposed AnoCLIP, we further introduce a test-time adaptation (TTA) mechanism to refine visual anomaly localization results, where we optimize a lightweight adapter in the visual encoder using AnoCLIP's pseudo-labels and noise-corrupted tokens. With both AnoCLIP and TTA, we significantly exploit the potential of CLIP for zero-shot anomaly localization and demonstrate the effectiveness of AnoCLIP on various datasets.	翻訳日:2024-02-28 22:29:28 公開日:2024-02-27
# 量子物理学における1ハーフ位相数 One-Half Topological Number in Entangled Quantum Physics ( http://arxiv.org/abs/2308.14062v3 ) ライセンス: Link先を確認	Karyn Le Hur	(参考訳) トポロジカル位相は、放射磁場の結果としてヘッジホッグ構造を示すスピン-1/2のブロッホ球からの量子物理学で設計することができる。 1つの極における絡み合った波動関数の形成と、2スピンモデル、および1つの半位相数の興味深い対との関係について詳述する。超伝導体のクーパー対と同様に、アインシュタイン-ポドルスキー-ローゼン対またはベル状態は半フラックス量子化を生じ、これは表面上のベリー曲率の半分のフラックスを指す。これらの1/2数はまた、極に自由マヨラナフェルミオンの存在を示す。位相応答は、北から南へ走行する場合や、保護された横流の量子化または半量子化の性質を示す極の円偏波場から測定することができる。バンド構造における絡み合った波動関数の応用を示し、運動量空間に局所位相マーカーを導入し、二層幾何学における2次元半金属の位相応答を特徴付ける。 A topological phase can be engineered in quantum physics from the Bloch sphere of a spin-1/2 showing an hedgehog structure as a result of a radial magnetic field. We elaborate on a relation between the formation of an entangled wavefunction at one pole, in a two-spins model, and an interesting pair of one-half topological numbers. Similar to Cooper pairs in superconductors, the Einstein-Podolsky-Rosen pair or Bell state produces a half flux quantization, which here refers to the halved flux of the Berry curvature on the surface. These 1/2-numbers also reveal the presence of a free Majorana fermion at a pole. The topological responses can be measured when driving from north to south and also from a circularly polarized field at the poles revealing the quantized or half-quantized nature of the protected transverse currents. We show applications of entangled wavefunctions in band structures, introducing a local topological marker in momentum space, to characterize the topological response of two-dimensional semimetals in bilayer geometries.	翻訳日:2024-02-28 22:28:17 公開日:2024-02-27
# 物理過程の反転における希釈の役割--表上可逆性と一般化熱操作 Role of Dilations in Reversing Physical Processes: Tabletop Reversibility and Generalized Thermal Operations ( http://arxiv.org/abs/2308.13909v2 ) ライセンス: Link先を確認	Clive Cenxin Aw, Lin Htoo Zaw, Maria Balanz\'o-Juand\'o and Valerio Scarani	(参考訳) 熱力学と情報理論の両方において重要な可逆性は、(前)チャネルと関連する逆チャネルの進化を比較することによって自然に研究されている。この逆チャネルを定義するには2つの自然な方法がある。論理的推論を用いて、逆チャネルは元のベイズ回帰(量子形式論におけるペッツ回復写像)である。また物理学では、すべての可逆過程が開システムとしてモデル化できることが分かっている: 対応する閉システムを定義するには、浴槽(拡張)を追加し、大域可逆過程を自明に反転させ、最終的に浴槽を再び取り除く。 2つのレシピは、古典と量子形式の両方において、システムと浴の間に形成された相関を考慮に入れれば、厳密に同一であることが証明される。これを確立した後、マップの特別なクラスを定義し、研究する: 製品保存マップ(一般化された熱写像を含む)は、ある状態に対してそのようなシステム・バス相関を形成せず、テーブルトップの時間反転可能なマップは、逆チャネルを元のものと同一のデバイスで実装できる。これらのクラスを繋ぐいくつかの一般的な結果を確立し、システムと浴槽の両方が1キュービットである場合の詳細な特徴付けを行う。特に, 逆チャネルが適切に定義されている場合, 製品保存はテーブルトップ可逆性に十分な条件であるが, 局所エネルギースペクトルの保存は一般的な熱操作に必要な条件であることを示す。 Irreversibility, crucial in both thermodynamics and information theory, is naturally studied by comparing the evolution -- the (forward) channel -- with an associated reverse -- the reverse channel. There are two natural ways to define this reverse channel. Using logical inference, the reverse channel is the Bayesian retrodiction (the Petz recovery map in the quantum formalism) of the original one. Alternatively, we know from physics that every irreversible process can be modeled as an open system: one can then define the corresponding closed system by adding a bath ("dilation"), trivially reverse the global reversible process, and finally remove the bath again. We prove that the two recipes are strictly identical, both in the classical and in the quantum formalism, once one accounts for correlations formed between system and the bath. 
Having established this, we define and study special classes of maps: product-preserving maps (including generalized thermal maps), for which no such system-bath correlations are formed for some states; and tabletop time-reversible maps, for which the reverse channel can be implemented with the same devices as the original one. We establish several general results connecting these classes, and give a detailed characterization for the case where both the system and the bath are one qubit. In particular, we show that when reverse channels are well-defined, product-preservation is a sufficient but not necessary condition for tabletop reversibility; and that the preservation of local energy spectra is a necessary and sufficient condition for generalized thermal operations.	翻訳日:2024-02-28 22:28:01 公開日:2024-02-27
# FedSOL: フェデレートラーニングにおける直交学習の安定化 FedSOL: Stabilized Orthogonal Learning in Federated Learning ( http://arxiv.org/abs/2308.12532v4 ) ライセンス: Link先を確認	Gihun Lee, Minchan Jeong, Sangmook Kim, Jaehoon Oh, Se-Young Yun	(参考訳) フェデレーション学習(fl)は、グローバルモデルを構築するために個々のクライアントからローカルにトレーニングされたモデルを集約する。 flはデータプライバシを備えたモデル学習を可能にするが、クライアントデータ分散が異種である場合、パフォーマンスが著しく低下することが多い。従来のFLアルゴリズムの多くは、様々な近位制限を導入してこの問題に対処してきた。これらの制限は、地域学習のグローバル目標からの逸脱を制限することによって、グローバルアライメントを促進することを目的としている。しかし、それらは本来、本来のローカルな目的に干渉することによって、ローカルな学習を制限する。近年,局所学習の一般性向上に向けた新たなアプローチが出現している。スムーズな損失環境の中でローカルモデルを得ることで、このアプローチは、クライアントの異なるローカル目的間の競合を緩和する。しかし、地域学習はグローバルな目標を考慮していないため、安定したグローバルアライメントは保証されていない。本研究では,グローバルアライメントの概念と局所的一般性を組み合わせたFedSoL(Federated Stability on Learning)を提案する。 FedSoLでは、局所学習は近位摂動に対して頑健なパラメータ領域を求める。この戦略は、パラメータ更新の本来のローカル目的を維持しながら、局所学習における暗黙の近位制限効果を導入する。実験の結果,FedSoLは様々な設定で常に最先端の性能を実現していることがわかった。 Federated Learning (FL) aggregates locally trained models from individual clients to construct a global model. While FL enables learning a model with data privacy, it often suffers from significant performance degradation when client data distributions are heterogeneous. Many previous FL algorithms have addressed this issue by introducing various proximal restrictions. These restrictions aim to encourage global alignment by constraining the deviation of local learning from the global objective. However, they inherently limit local learning by interfering with the original local objectives. Recently, an alternative approach has emerged to improve local learning generality. By obtaining local models within a smooth loss landscape, this approach mitigates conflicts among different local objectives of the clients. Yet, it does not ensure stable global alignment, as local learning does not take the global objective into account. In this study, we propose Federated Stability on Learning (FedSoL), which combines both the concepts of global alignment and local generality. 
In FedSoL, the local learning seeks a parameter region robust against proximal perturbations. This strategy introduces an implicit proximal restriction effect in local learning while maintaining the original local objective for parameter update. Our experiments show that FedSoL consistently achieves state-of-the-art performance on various setups.	翻訳日:2024-02-28 22:27:19 公開日:2024-02-27
# 運転の危険を予知するビジュアル・アブダクティブ・推論 Visual Abductive Reasoning Meets Driving Hazard Prediction ( http://arxiv.org/abs/2310.04671v3 ) ライセンス: Link先を確認	Korawat Charoenpitaks, Van-Quang Nguyen, Masanori Suganuma, Masahiro Takahashi, Ryoma Niihara, Takayuki Okatani	(参考訳) 本稿では,運転中に運転者が遭遇する危険を予知する問題に対処する。車両ダッシュカムが捉えた単一入力画像を用いて,事故の予知作業として定式化する。計算シミュレーションや映像からの異常検出に依存するハザード予測の既存手法とは異なり,本研究は静的画像からのハイレベルな推論に焦点を当てている。この問題は、不確実な観測に基づいて将来の出来事を予測し、推論する必要がある。この調査対象領域の研究を可能にするために、DHPR(Driving Hazard Prediction and Reasoning)データセットと呼ばれる新しいデータセットが作成されている。データセットは、ストリートシーンの15Kダシュカム画像で構成され、各画像は、車速、仮説上の危険記述、シーンに存在する視覚的実体を含むタプルに関連付けられている。これらのアノテーションは、危険シーンを特定し、数秒後に起こりうる潜在的な事故の説明を提供する人間のアノテーションによって注釈される。我々は,いくつかのベースライン手法を提示し,データセット上での性能評価を行い,残る課題を特定し,今後の方向性について考察する。本研究は,ハザード予測のためのマルチモーダルaiの可能性を探ることを可能にする,新しい問題定式化とデータセットを導入することで,この分野に寄与する。 This paper addresses the problem of predicting hazards that drivers may encounter while driving a car. We formulate it as a task of anticipating impending accidents using a single input image captured by car dashcams. Unlike existing approaches to driving hazard prediction that rely on computational simulations or anomaly detection from videos, this study focuses on high-level inference from static images. The problem needs predicting and reasoning about future events based on uncertain observations, which falls under visual abductive reasoning. To enable research in this understudied area, a new dataset named the DHPR (Driving Hazard Prediction and Reasoning) dataset is created. The dataset consists of 15K dashcam images of street scenes, and each image is associated with a tuple containing car speed, a hypothesized hazard description, and visual entities present in the scene. These are annotated by human annotators, who identify risky scenes and provide descriptions of potential accidents that could occur a few seconds later. 
We present several baseline methods and evaluate their performance on our dataset, identifying remaining issues and discussing future directions. This study contributes to the field by introducing a novel problem formulation and dataset, enabling researchers to explore the potential of multi-modal AI for driving hazard prediction.	翻訳日:2024-02-28 22:22:42 公開日:2024-02-27
# Decoder-Only Side Information を用いた分散ディープジョイントソースチャネル符号化 Distributed Deep Joint Source-Channel Coding with Decoder-Only Side Information ( http://arxiv.org/abs/2310.04311v2 ) ライセンス: Link先を確認	Selim F. Yilmaz, Ezgi Ozyilkan, Deniz Gunduz, Elza Erkip	(参考訳) 本稿では,受信側のみに相関する側情報が存在する場合,ノイズの多い無線チャネル上での低遅延画像伝送を検討する(Wyner-Ziv シナリオ)。特に,従来,有限長法において従来の分離ベースアプローチよりも優れていたデータ駆動型ジョイント・ソース・チャネル・コーディング(jscc)手法を用いた実用的なスキームの開発と,チャネル品質の優雅な劣化を実現することに関心を寄せている。本稿では,デコーダのみの側情報をレシーバ側の複数段階に組み込んだニューラルネットワークアーキテクチャを提案する。提案手法は,低チャネル信号-雑音比 (SNRs) と小帯域比 (BRs) において, 様々な品質対策の観点から, 全チャネル条件での性能向上を図り, サイド情報の統合に成功していることを示す。提案手法のソースコードを公開し,さらなる研究と再現性について検討した。 We consider low-latency image transmission over a noisy wireless channel when correlated side information is present only at the receiver side (the Wyner-Ziv scenario). In particular, we are interested in developing practical schemes using a data-driven joint source-channel coding (JSCC) approach, which has been previously shown to outperform conventional separation-based approaches in the practical finite blocklength regimes, and to provide graceful degradation with channel quality. We propose a novel neural network architecture that incorporates the decoder-only side information at multiple stages at the receiver side. Our results demonstrate that the proposed method succeeds in integrating the side information, yielding improved performance at all channel conditions in terms of the various quality measures considered here, especially at low channel signal-to-noise ratios (SNRs) and small bandwidth ratios (BRs). We have made the source code of the proposed method public to enable further research, and the reproducibility of the results.	翻訳日:2024-02-28 22:22:01 公開日:2024-02-27
# 質量ニュートリノを用いた大規模構造形成のvlasovシミュレーションのための量子アルゴリズム Quantum algorithm for the Vlasov simulation of the large-scale structure formation with massive neutrinos ( http://arxiv.org/abs/2310.01832v2 ) ライセンス: Link先を確認	Koichi Miyamoto, Soichiro Yamazaki, Fumio Uchida, Kotaro Fujisawa, Naoki Yoshida	(参考訳) ニュートリノが有限質量を持つという事実の宇宙論的含意を調べることは基礎物理学にとって重要である。特に質量ニュートリノは宇宙の大規模構造(LSS)の形成に影響を与え、逆にLSSの観測はニュートリノ質量に制約を与える。従来の暗黒物質とともに, 巨大ニュートリノを含むLSS生成の数値シミュレーションが重要な課題である。このために、vlasov方程式を解いて位相空間内のニュートリノ分布を計算することは適切なアプローチであるが、これには$(6+1)$次元空間で pde を解く必要があり、したがって計算的に要求される: $n_\mathrm{gr}$ 座標内のグリッド点と $n_t$ 時間グリッド点を設定すると、$o(n_\mathrm{gr}^6)$ メモリ空間と $o(n_tn_\mathrm{gr}^6)$ が離散化された pde の係数に対するクエリとなる。我々はこの課題に対して量子アルゴリズムを提案する。ニュートリノの相対的弱自己重力を無視してブラゾフ方程式を線形化することにより、ニュートリノの位相空間分布をエンコードする量子状態を生成するハミルトンシミュレーションを行う。また,量子振幅推定の精度$\epsilon$と$\widetilde{o}((n_\mathrm{gr} + n_t)/\epsilon)$のクエリ複雑性を用いて,量子状態からニュートリノ密度摂動のパワースペクトルを抽出する手法を提案する。また、量子ランダムアクセスメモリを$O(n_\mathrm{gr}^3)$エントリで使用しながら、量子ビット数の観点から、空間複雑性を$O(\mathrm{polylog}(n_\mathrm{gr}/\epsilon))$に低減する。われわれが知る限り、これはLSSシミュレーションのための最初の量子アルゴリズムであり、精度を保証して実用的関心の量を出力する。 Investigating the cosmological implication of the fact that neutrino has finite mass is of importance for fundamental physics. In particular, massive neutrino affects the formation of the large-scale structure (LSS) of the universe, and, conversely, observations of the LSS can give constraints on the neutrino mass. Numerical simulations of the LSS formation including massive neutrino along with conventional cold dark matter is thus an important task. For this, calculating the neutrino distribution in the phase space by solving the Vlasov equation is a suitable approach, but it requires solving the PDE in the $(6+1)$-dimensional space and is thus computationally demanding: Configuring $n_\mathrm{gr}$ grid points in each coordinate and $n_t$ time grid points leads to $O(n_\mathrm{gr}^6)$ memory space and $O(n_tn_\mathrm{gr}^6)$ queries to the coefficients in the discretized PDE. 
We propose a quantum algorithm for this task. Linearizing the Vlasov equation by neglecting the relatively weak self-gravity of the neutrinos, we perform Hamiltonian simulation to produce quantum states that encode the phase space distribution of the neutrinos. We also propose a way to extract the power spectrum of the neutrino density perturbations as classical data from the quantum state by quantum amplitude estimation with accuracy $\epsilon$ and query complexity of order $\widetilde{O}((n_\mathrm{gr} + n_t)/\epsilon)$. Our method also reduces the space complexity to $O(\mathrm{polylog}(n_\mathrm{gr}/\epsilon))$ in terms of the qubit number, while using quantum random access memories with $O(n_\mathrm{gr}^3)$ entries. As far as we know, this is the first quantum algorithm for the LSS simulation that outputs a quantity of practical interest with guaranteed accuracy.	翻訳日:2024-02-28 22:20:59 公開日:2024-02-27
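To make the scaling quoted in the abstract above concrete, the following back-of-the-envelope sketch compares the stated classical cost ($O(n_\mathrm{gr}^6)$ memory, $O(n_t n_\mathrm{gr}^6)$ queries) with the stated quantum query complexity $\widetilde{O}((n_\mathrm{gr} + n_t)/\epsilon)$ and QRAM size $O(n_\mathrm{gr}^3)$. It drops all constants and the polylogarithmic factors hidden by the tilde, so it only indicates orders of magnitude; the function names and the sample values of $n_\mathrm{gr}$, $n_t$ and $\epsilon$ are illustrative assumptions, not figures from the paper.

```python
def classical_memory(n_gr: int) -> int:
    """Grid cells in 6-D phase space: O(n_gr^6), constants dropped."""
    return n_gr ** 6


def classical_queries(n_gr: int, n_t: int) -> int:
    """Coefficient queries over all time steps: O(n_t * n_gr^6), constants dropped."""
    return n_t * n_gr ** 6


def quantum_queries(n_gr: int, n_t: int, eps: float) -> float:
    """Leading term of O~((n_gr + n_t) / eps); polylog factors are ignored here."""
    return (n_gr + n_t) / eps


def qram_entries(n_gr: int) -> int:
    """Quantum random access memory size: O(n_gr^3), constants dropped."""
    return n_gr ** 3


# Illustrative (assumed) problem size: 100 grid points per coordinate,
# 1000 time steps, target accuracy 1e-2.
n_gr, n_t, eps = 100, 1000, 1e-2
print("classical memory :", classical_memory(n_gr))
print("classical queries:", classical_queries(n_gr, n_t))
print("quantum queries  :", quantum_queries(n_gr, n_t, eps))
print("QRAM entries     :", qram_entries(n_gr))
```

The comparison says nothing about the cost of a single query or of building the QRAM, which the abstract does not quantify.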
# LLMはプロンプトを通してグラフ構造情報を効果的に活用できるのか? Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why? ( http://arxiv.org/abs/2309.16595v3 ) ライセンス: Link先を確認	Jin Huang, Xingjian Zhang, Qiaozhu Mei, Jiaqi Ma	(参考訳) 大規模言語モデル(LLM)は、特にゼロショット方式で、リッチテキスト属性でグラフを処理する能力に注目が集まっている。最近の研究では、llmは一般的なテキストリッチグラフベンチマークで適切なテキスト分類性能を得ており、エンコードされた構造情報を自然言語としてプロンプトに付加することで、パフォーマンスを向上させることができる。グラフデータに固有の構造情報の取り込みにより,LLMの予測性能が向上する理由を理解することを目的とする。まず、新しいリークフリーデータセットをキュレートし、以前に広く使用されていたデータセットと比較分析を行うことで、データ漏洩の懸念を解消する。第二に、過去の研究は通常、自然言語でグラフ構造を記述することで、エゴグラフをエンコードするので、LLMは、プロンプトデザイナの意図に従ってグラフ構造を理解するのか? 第3に,LLMが構造情報を組み込んだ後,性能を向上できる理由を検討する。これらの疑問を探究した結果 i) LLMの性能がデータ漏洩に大きく起因しているという実質的な証拠はない。 (ii)プロンプトをプロンプトデザイナーが意図するグラフ構造として理解する代わりに、llmはプロンプトをコンテクスト段落として処理する傾向がある。 (iii)プロンプトに含まれる局所近傍の最も効率的な要素は、グラフ構造ではなく、ノードラベルに関連する句である。 Large language models (LLMs) are gaining increasing attention for their capability to process graphs with rich text attributes, especially in a zero-shot fashion. Recent studies demonstrate that LLMs obtain decent text classification performance on common text-rich graph benchmarks, and the performance can be improved by appending encoded structural information as natural languages into prompts. We aim to understand why the incorporation of structural information inherent in graph data can improve the prediction performance of LLMs. First, we rule out the concern of data leakage by curating a novel leakage-free dataset and conducting a comparative analysis alongside a previously widely-used dataset. Second, as past work usually encodes the ego-graph by describing the graph structure in natural language, we ask the question: do LLMs understand the graph structure in accordance with the intent of the prompt designers? Third, we investigate why LLMs can improve their performance after incorporating structural information. 
Our exploration of these questions reveals that (i) there is no substantial evidence that the performance of LLMs is significantly attributable to data leakage; (ii) instead of understanding prompts as graph structures as intended by the prompt designers, LLMs tend to process prompts more as contextual paragraphs; and (iii) the most efficient elements of the local neighborhood included in the prompt are phrases that are pertinent to the node label, rather than the graph structure.	翻訳日:2024-02-28 22:19:46 公開日:2024-02-27
# asymformer:モバイルプラットフォームリアルタイムrgb-dセマンティクスセグメンテーションのための非対称クロスモーダル表現学習 AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation ( http://arxiv.org/abs/2309.14065v5 ) ライセンス: Link先を確認	Siqi Du, Weixi Wang, Renzhong Guo and Shengjun Tang	(参考訳) ロボットインテリジェンスの世界では、効率的で正確なRGB-Dセマンティックセグメンテーションを実現することが鍵となる。最先端のマルチモーダルセマンティクスセグメンテーション手法は、主に対称スケルトンネットワークに根ざしており、計算効率と精度の調和が困難である。本研究では,実時間rgb-d意味セグメンテーションのための新しいネットワークであるasymformerを提案する。計算資源の分散を最適化することで超流動パラメータの最小化を目標とし,マルチモーダル特徴の効果的な融合を可能にする非対称バックボーンを導入する。さらに,パラメータ数を大幅に増加させることなく,特徴選択を再定義し,マルチモーダルな自己相似特徴を抽出することにより,ネットワークの精度を高める手法を検討する。さらに、LAFS(Local Attention-Guided Feature Selection)モジュールは、依存関係を活用することで、異なるモダリティから機能を選択的にフューズするために使用される。その後、CMA(Cross-Modal Attention-Guided Feature correlation Embedding)モジュールを導入し、クロスモーダル表現をさらに抽出する。この手法はNYUv2とSUNRGBDのデータセットで評価され、AsymFormerはNYUv2で54.1% mIoU、SUNRGBDで49.1% mIoUと競合する結果を示した。特に、AsymFormerは65 FPSの推論速度を達成し、混合精度量子化を実装した後、RTX3090上で79 FPSの予測速度を得る。これは既存のマルチモーダル法を大きく上回り、asymformerはrgb-dセマンティクスセグメンテーションの精度と効率のバランスを取ることができる。 In the realm of robotic intelligence, achieving efficient and precise RGB-D semantic segmentation is a key cornerstone. State-of-the-art multimodal semantic segmentation methods, primarily rooted in symmetrical skeleton networks, find it challenging to harmonize computational efficiency and precision. In this work, we propose AsymFormer, a novel network for real-time RGB-D semantic segmentation, which targets the minimization of superfluous parameters by optimizing the distribution of computational resources and introduces an asymmetrical backbone to allow for the effective fusion of multimodal features. Furthermore, we explore techniques to bolster network accuracy by redefining feature selection and extracting multi-modal self-similarity features without a substantial increase in the parameter count, thereby ensuring real-time execution on robotic platforms. 
Additionally, a Local Attention-Guided Feature Selection (LAFS) module is used to selectively fuse features from different modalities by leveraging their dependencies. Subsequently, a Cross-Modal Attention-Guided Feature Correlation Embedding (CMA) module is introduced to further extract cross-modal representations. The method is evaluated on the NYUv2 and SUNRGBD datasets, where AsymFormer achieves competitive results: 54.1% mIoU on NYUv2 and 49.1% mIoU on SUNRGBD. Notably, AsymFormer achieves an inference speed of 65 FPS and, after implementing mixed-precision quantization, 79 FPS on an RTX3090. This significantly outperforms existing multi-modal methods, demonstrating that AsymFormer can strike a balance between high accuracy and efficiency for RGB-D semantic segmentation.	翻訳日:2024-02-28 22:18:52 公開日:2024-02-27
# EvalLM: ユーザ定義基準に基づく大規模言語モデルの対話的評価 EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria ( http://arxiv.org/abs/2309.13633v2 ) ライセンス: Link先を確認	Tae Soo Kim, Yoonjoo Lee, Jamin Shin, Young-Ho Kim, Juho Kim	(参考訳) プロンプトを構成するだけで、開発者はLarge Language Models (LLM)を使った新しい生成アプリケーションをプロトタイプできる。しかし、プロトタイプを製品化するためには、開発者は弱点を診断するために出力を評価することでプロンプトを反復的に修正する必要がある。フォーマティブ・インタビュー(N=8)では、開発者は文脈特化基準と主観的基準を評価する際に、アウトプットを手作業で評価することに多大な努力を払っていることが明らかになった。ユーザ定義基準に基づいて複数の出力を評価することで,プロンプトを反復精製するインタラクティブシステムであるEvalLMを提案する。自然言語の基準を記述することにより、ユーザはシステムのLCMベースの評価器を使用して、どのプロンプトがエキサイティングか、失敗かを概観し、評価器のフィードバックに基づいて改善することができる。比較研究(N=12)では、手動による評価と比較すると、EvalLMは、参加者がより多様な基準を策定し、出力の2倍を検査し、59%のリビジョンで満足なプロンプトに達するのに役立った。プロンプト以外にも、作業は特定のアプリケーションコンテキストにおけるモデル評価とアライメントの強化にまで拡張できます。 By simply composing prompts, developers can prototype novel generative applications with Large Language Models (LLMs). To refine prototypes into products, however, developers must iteratively revise prompts by evaluating outputs to diagnose weaknesses. Formative interviews (N=8) revealed that developers invest significant effort in manually evaluating outputs as they assess context-specific and subjective criteria. We present EvalLM, an interactive system for iteratively refining prompts by evaluating multiple outputs on user-defined criteria. By describing criteria in natural language, users can employ the system's LLM-based evaluator to get an overview of where prompts excel or fail, and improve these based on the evaluator's feedback. A comparative study (N=12) showed that EvalLM, when compared to manual evaluation, helped participants compose more diverse criteria, examine twice as many outputs, and reach satisfactory prompts with 59% fewer revisions. Beyond prompts, our work can be extended to augment model evaluation and alignment in specific application contexts.	翻訳日:2024-02-28 22:18:17 公開日:2024-02-27
# 多変量時系列予測のためのピラミッド隠れマルコフモデル Pyramidal Hidden Markov Model For Multivariate Time Series Forecasting ( http://arxiv.org/abs/2310.14341v2 ) ライセンス: Link先を確認	YeXin Huang	(参考訳) 隠れマルコフモデル(HMM)は、現在の値と過去の値に基づいて時系列の将来値を予測することができ、様々な種類の時系列を扱うための強力なアルゴリズムである。多くの研究が先進的手法を用いてHMMの改良を探求し、様々なHMMの開発に繋がった。これらの研究は、他の高度なアルゴリズムと比較してHMMの競争力の増大を示しているが、その性能に多段階確率的状態を導入することの重要性と影響を認識しているものは少ない。本研究では,複数段階の確率的状態をキャプチャできるPyramidal Hidden Markov Model (PHMM)を提案する。まず、多段階HMMは、短い多段階確率状態の抽出のために設計されている。次に、ピラミッドのような積み重ねを利用して長い多段階確率状態を適応的に同定するPHMMに基づく新しい時系列予測構造を提案する。これら2つのスキームを使用することで,非定常データやノイズデータを効果的に処理できると同時に,より正確で包括的な予測のための長期的な依存関係を確立することができる。多変量時系列データセットの実験結果は、時系列予測における競合相手と比較して提案したPHMMの優れた性能を確実に実証している。 The Hidden Markov Model (HMM) can predict the future value of a time series based on its current and previous values, making it a powerful algorithm for handling various types of time series. Numerous studies have explored the improvement of HMM using advanced techniques, leading to the development of several variations of HMM. Despite these studies indicating the increased competitiveness of HMM compared to other advanced algorithms, few have recognized the significance and impact of incorporating multistep stochastic states into its performance. In this work, we propose a Pyramidal Hidden Markov Model (PHMM) that can capture multiple multistep stochastic states. Initially, a multistep HMM is designed for extracting short multistep stochastic states. Next, a novel time series forecasting structure is proposed based on PHMM, which utilizes pyramid-like stacking to adaptively identify long multistep stochastic states. By employing these two schemes, our model can effectively handle non-stationary and noisy data, while also establishing long-term dependencies for more accurate and comprehensive forecasting.
The experimental results on diverse multivariate time series datasets convincingly demonstrate the superior performance of our proposed PHMM compared to its competitive peers in time series forecasting.	翻訳日:2024-02-28 22:12:34 公開日:2024-02-27
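The abstract above starts from the observation that an HMM can forecast the next value of a series from its current and previous values. The sketch below shows that baseline idea only, for a plain HMM with Gaussian emissions: filter the hidden-state belief through the observations, push it one step through the transition matrix, and average the state means. It is not the paper's PHMM or its multistep states; the function names and the two-state toy parameters are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forecast_next(observations, initial, transition, means, stds):
    """One-step-ahead forecast E[y_{t+1} | y_1..y_t] for a Gaussian-emission HMM."""
    n = len(initial)
    belief = list(initial)  # distribution over the hidden state at the first step
    for t, y in enumerate(observations):
        if t > 0:
            # Predict: propagate the belief through the transition matrix.
            belief = [sum(belief[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        # Update: reweight by the emission likelihood of y and renormalise.
        weighted = [belief[j] * gaussian_pdf(y, means[j], stds[j]) for j in range(n)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
    # Distribution over the next hidden state, then the expected emission.
    next_state = [sum(belief[i] * transition[i][j] for i in range(n))
                  for j in range(n)]
    return sum(p * m for p, m in zip(next_state, means))


# Toy two-state model (hypothetical parameters): a "low" regime around 0
# and a "high" regime around 10, each persisting with probability 0.9.
forecast = forecast_next(
    observations=[9.8, 10.1, 10.3],
    initial=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 10.0],
    stds=[1.0, 1.0],
)
print(forecast)  # close to 9, since the series is almost surely in the high regime
```

Because the forecast only looks one transition ahead from the current belief, a plain HMM like this one carries no explicit long-range structure, which is the gap the multistep and pyramidal constructions in the abstract aim at.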
# 分散学習タスクにおける生成モデルの評価について On the Evaluation of Generative Models in Distributed Learning Tasks ( http://arxiv.org/abs/2310.11714v3 ) ライセンス: Link先を確認	Zixiao Wang, Farzan Farnia, Zhenghao Lin, Yunheng Shen, Bei Yu	(参考訳) 敵対的生成ネットワーク(GANs)や拡散モデルを含む深層生成モデルの評価は文献で広く研究されている。既存の評価方法は、主に単一のクライアントが格納したトレーニングデータによる集中学習問題を対象としているが、生成モデルの多くの応用は、複数のクライアント間でトレーニングデータを収集し分散するフェデレーション学習シナリオなど、分散学習設定に関するものである。本稿では,異種データ分布を持つ分散学習タスクにおける生成モデルの評価について検討する。まず、Fréchet Inception Distance(FID)に着目し、クライアントに対する以下のFIDベースの集計スコアを検討する。 1)クライアントの個別FIDスコアの平均としてのFID-avg 2)FID-allは、訓練されたモデルからすべてのクライアントのデータを含む集合データセットまでのFID距離である。 FID-allとFID-avgのスコアによるモデルランキングは矛盾する可能性があり、2つの集計スコアで最適な生成モデルが異なりうることを証明する。次に、Kernel Inception Distance(KID)を考察し、同様にKID-avgおよびKID-allアグリゲーションを定義する。 FIDの場合とは異なり、KID-allとKID-avgは生成モデルに対して同じランキングを与えることを証明する。我々は,分散学習問題における生成モデルの評価に関する理論的知見を支援するために,標準画像データセットとトレーニングスキームに関する数値実験を行った。 The evaluation of deep generative models including generative adversarial networks (GANs) and diffusion models has been extensively studied in the literature. While the existing evaluation methods mainly target a centralized learning problem with training data stored by a single client, many applications of generative models concern distributed learning settings, e.g., the federated learning scenario, where training data are collected by and distributed among several clients. In this paper, we study the evaluation of generative models in distributed learning tasks with heterogeneous data distributions. First, we focus on the Fréchet inception distance (FID) and consider the following FID-based aggregate scores over the clients: 1) FID-avg as the mean of clients' individual FID scores, 2) FID-all as the FID distance of the trained model to the collective dataset containing all clients' data. We prove that the model rankings according to the FID-all and FID-avg scores could be inconsistent, which can lead to different optimal generative models according to the two aggregate scores.
Next, we consider the kernel inception distance (KID) and similarly define the KID-avg and KID-all aggregations. Unlike the FID case, we prove that KID-all and KID-avg result in the same rankings of generative models. We perform several numerical experiments on standard image datasets and training schemes to support our theoretical findings on the evaluation of generative models in distributed learning problems.	翻訳日:2024-02-28 22:10:58 公開日:2024-02-27
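The two FID aggregates defined in the abstract above can disagree about which model is better. The toy sketch below illustrates this in a deliberately simplified setting that is an assumption of this example, not the paper's construction: features are one-dimensional and every distribution is summarised by a mean and a standard deviation, so the Fréchet distance between two Gaussians reduces to $(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$. Two clients hold narrow distributions centred at $-1$ and $+1$; all names and numbers are illustrative.

```python
import math


def frechet_1d(mu1, sd1, mu2, sd2):
    """Squared Frechet distance between the 1-D Gaussians N(mu1, sd1^2) and N(mu2, sd2^2)."""
    return (mu1 - mu2) ** 2 + (sd1 - sd2) ** 2


def pooled_stats(clients):
    """Mean and std of the equal-weight mixture of the clients' distributions."""
    k = len(clients)
    mean = sum(mu for mu, _ in clients) / k
    second_moment = sum(sd ** 2 + mu ** 2 for mu, sd in clients) / k
    return mean, math.sqrt(second_moment - mean ** 2)


def fid_avg(model, clients):
    """Average of the per-client distances (FID-avg)."""
    return sum(frechet_1d(*model, mu, sd) for mu, sd in clients) / len(clients)


def fid_all(model, clients):
    """Distance to the pooled data of all clients (FID-all)."""
    return frechet_1d(*model, *pooled_stats(clients))


# Each entry is (mean, std). Two heterogeneous clients and two candidate models.
clients = [(-1.0, 0.1), (1.0, 0.1)]
model_a = (0.0, 0.1)              # as narrow as each client, centred between them
model_b = (0.0, math.sqrt(1.01))  # matches the spread of the pooled data

for name, model in (("A", model_a), ("B", model_b)):
    print(name, "FID-avg:", fid_avg(model, clients), "FID-all:", fid_all(model, clients))
```

In this toy case FID-avg prefers model A (roughly 1.0 versus 1.82) while FID-all prefers model B (roughly 0 versus 0.82), which is the kind of ranking inconsistency the abstract refers to. Real FID uses multivariate Inception features and a matrix square root, which this sketch omits.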
# 医療におけるマルチモーダルフェデレート学習の展望 Multimodal Federated Learning in Healthcare: a Review ( http://arxiv.org/abs/2310.09650v2 ) ライセンス: Link先を確認	Jacob Thrasher, Alina Devkota, Prasiddha Siwakotai, Rohit Chivukula, Pranav Poudel, Chaunbo Hu, Binod Bhattarai, Prashnna Gyawali	(参考訳) マルチモーダル機械学習の最近の進歩は、医療領域、特に集中型データベースシステムにおいて、正確で堅牢なAIシステムの開発を促進する。同時に、フェデレートラーニング(FL)が進展し、データを統合する必要のない分散メカニズムを提供し、機密性の高い医療データのプライバシーとセキュリティを高める。これら2つの概念の統合は、医療におけるマルチモーダル学習の継続的な進歩をサポートしながら、ローカルなデータ保持機関内の患者の記録のセキュリティとプライバシを確保する。本稿では、医療におけるFLの重要性を簡潔に概説し、医療領域におけるMMFL(Multimodal Federated Learning)の最先端のアプローチについて概説する。この分野における既存の課題を包括的に調査し、現在のモデルの限界に光を当てる。最後に,最先端のai技術と医療アプリケーションにおける患者データプライバシの必要性とのギャップを埋めることを目的とした,この分野の今後の進歩への可能性について概説する。 Recent advancements in multimodal machine learning have empowered the development of accurate and robust AI systems in the medical domain, especially within centralized database systems. Simultaneously, Federated Learning (FL) has progressed, providing a decentralized mechanism where data need not be consolidated, thereby enhancing the privacy and security of sensitive healthcare data. The integration of these two concepts supports the ongoing progress of multimodal learning in healthcare while ensuring the security and privacy of patient records within local data-holding agencies. This paper offers a concise overview of the significance of FL in healthcare and outlines the current state-of-the-art approaches to Multimodal Federated Learning (MMFL) within the healthcare domain. It comprehensively examines the existing challenges in the field, shedding light on the limitations of present models. Finally, the paper outlines potential directions for future advancements in the field, aiming to bridge the gap between cutting-edge AI technology and the imperative need for patient data privacy in healthcare applications.	翻訳日:2024-02-28 22:09:37 公開日:2024-02-27
# PonderV2: ユニバーサルな事前学習パラダイムによる3Dファンデーションモデルへの道を開く PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm ( http://arxiv.org/abs/2310.08586v3 ) ライセンス: Link先を確認	Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang	(参考訳) 多くのNLPや2D視覚基礎モデルとは異なり、3D基礎モデルを学ぶことは大きな課題をもたらす。これは主に、ダウンストリームタスクの固有のデータばらつきと多様性に起因する。本稿では,効率的な3D表現の獲得を容易にするために設計された,新しいユニバーサル3D事前学習フレームワークを提案する。インフォメーション3d機能は、リアルな画像のレンダリングに使用できるリッチな幾何学と外観の手がかりをエンコードすべきであると考え、微分可能なニューラルネットワークによる3d表現を学習することを提案する。我々は、実画像と比較することにより、3Dバックボーンを設計したボリューム・ニューラル・レンダラーで訓練する。特に,本手法は学習した3Dエンコーダを様々な下流タスクにシームレスに統合する。これらのタスクは、3D検出やセグメンテーションといったハイレベルな課題だけでなく、3D再構成や画像合成といった低レベルな目標も含んでいる。また,提案手法を用いて2次元バックボーンを事前学習する能力を示し,従来のプレトレーニング法を大差で上回った。 PonderV2は11の室内および屋外ベンチマークで最先端のパフォーマンスを達成した。コードとモデルはhttps://github.com/opengvlab/ponderv2で入手できる。 In contrast to numerous NLP and 2D vision foundational models, learning a 3D foundational model poses considerably greater challenges. This is primarily due to the inherent data variability and diversity of downstream tasks. In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models. Considering that informative 3D features should encode rich geometry and appearance cues that can be utilized to render realistic images, we propose to learn 3D representations by differentiable neural rendering. We train a 3D backbone with a devised volumetric neural renderer by comparing the rendered with the real images. Notably, our approach seamlessly integrates the learned 3D encoder into various downstream tasks. These tasks encompass not only high-level challenges such as 3D detection and segmentation but also low-level objectives like 3D reconstruction and image synthesis, spanning both indoor and outdoor scenarios. 
We also show that the proposed methodology can pre-train a 2D backbone, surpassing conventional pre-training methods by a large margin. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, which indicates its effectiveness. Code and models are available at https://github.com/OpenGVLab/PonderV2.	翻訳日:2024-02-28 22:08:43 公開日:2024-02-27
# 現代の非参照画像とビデオ品質のロバスト性と敵攻撃の比較 Comparing the Robustness of Modern No-Reference Image- and Video-Quality Metrics to Adversarial Attacks ( http://arxiv.org/abs/2310.06958v4 ) ライセンス: Link先を確認	Anastasia Antsiferova, Khaled Abud, Aleksandr Gushchin, Ekaterina Shumitskaya, Sergey Lavrushkin, Dmitriy Vatolin	(参考訳) 現在、ニューラルネットワークベースの画像およびビデオ品質指標は、従来の方法よりもパフォーマンスが良い。しかし、視覚的品質を改善することなくメトリクスのスコアを上げる敵攻撃にもより脆弱になった。既存の品質指標のベンチマークは、主観的品質と計算時間との相関の観点からパフォーマンスを比較する。それでも、画像品質指標の敵対的堅牢性も研究に値する分野である。本稿では,様々な攻撃に対する現代の指標のロバスト性について分析する。コンピュータビジョンタスクからの敵対的攻撃を適応させ,15の非参照画像およびビデオ品質指標に対する攻撃の効率性を比較した。一部のメトリクスは、脆弱なメトリクスよりも安全なベンチマークでの使用を可能にする、敵対的攻撃に対する高い抵抗を示している。このベンチマークでは、攻撃に対してメトリクスをより堅牢にしたい研究者や、必要に応じてそのようなメトリクスを見つける研究者に、新たなメトリクスの提出を受け付けている。最新の結果は、https://videoprocessing.ai/benchmarks/metrics-robustness.htmlで見ることができる。 Nowadays, neural-network-based image- and video-quality metrics perform better than traditional methods. However, they also became more vulnerable to adversarial attacks that increase metrics' scores without improving visual quality. The existing benchmarks of quality metrics compare their performance in terms of correlation with subjective quality and calculation time. Nonetheless, the adversarial robustness of image-quality metrics is also an area worth researching. This paper analyses modern metrics' robustness to different adversarial attacks. We adapted adversarial attacks from computer vision tasks and compared attacks' efficiency against 15 no-reference image- and video-quality metrics. Some metrics showed high resistance to adversarial attacks, which makes their usage in benchmarks safer than vulnerable metrics. The benchmark accepts submissions of new metrics for researchers who want to make their metrics more robust to attacks or to find such metrics for their needs. The latest results can be found online: https://videoprocessing.ai/benchmarks/metrics-robustness.html.	翻訳日:2024-02-28 22:08:19 公開日:2024-02-27
# NECO: 神経崩壊に基づく分布外検出 NECO: NEural Collapse Based Out-of-distribution detection ( http://arxiv.org/abs/2310.06823v3 ) ライセンス: Link先を確認	Mouïn Ben Ammar, Nacim Belkhir, Sebastian Popescu, Antoine Manzanera, Gianni Franchi	(参考訳) アウト・オブ・ディストリビューション(OOD)データの検出は、モデル過信(model overconfidence)による機械学習における重要な課題である。我々は、損失収束を超えて訓練されたモデルの分布内データに影響を及ぼす現象である「神経崩壊」もOODデータに影響を与えると仮定する。この相互作用を生かしたNECOは,「神経崩壊」や主成分空間の幾何学的特性を活用してOODデータを識別する新しいポストホックなOOD検出法である。 NECOは,小規模OOD検出タスクと大規模OOD検出タスクの両方において,異なるネットワークアーキテクチャにまたがる強力な一般化能力を示しながら,最先端の成果が得られることを示す。さらに,OOD検出における本手法の有効性を理論的に説明する。コードはhttps://gitlab.com/drti/necoで入手できる。 Detecting out-of-distribution (OOD) data is a critical challenge in machine learning due to model overconfidence, often without awareness of their epistemological limits. We hypothesize that "neural collapse", a phenomenon affecting in-distribution data for models trained beyond loss convergence, also influences OOD data. To benefit from this interplay, we introduce NECO, a novel post-hoc method for OOD detection, which leverages the geometric properties of "neural collapse" and of principal component spaces to identify OOD data. Our extensive experiments demonstrate that NECO achieves state-of-the-art results on both small- and large-scale OOD detection tasks while exhibiting strong generalization capabilities across different network architectures. Furthermore, we provide a theoretical explanation for the effectiveness of our method in OOD detection. Code is available at https://gitlab.com/drti/neco	翻訳日:2024-02-28 22:08:06 公開日:2024-02-27
# 労働空間:大規模言語モデルによる労働市場の統一表現 Labor Space: A Unifying Representation of the Labor Market via Large Language Models ( http://arxiv.org/abs/2311.06310v3 ) ライセンス: Link先を確認	Seongwoon Kim, Yong-Yeol Ahn, Jaehyuk Park	(参考訳) 労働市場は、産業、職業、技能、企業など、多様な相互接続された組織からなる複雑なエコシステムである。これらの異種エンティティをマッピングするための体系的な方法が欠如していることから、各エンティティは孤立的あるいはペア的な関係を通じてのみ分析され、エコシステム全体の包括的理解を阻害している。ここでは,大規模言語モデルの微調整によって得られる,不均質な労働市場エンティティのベクトル空間埋め込みである$\textit{Labor Space}$を導入する。労働空間は、産業、職業、技能、企業のコヒーレントな統合分析を促進するとともに、タイプ固有のクラスタリングを維持しながら、様々な労働市場の構成要素の複雑な関係構造を公開する。我々は,「製造-医療」のような経済軸上で異質な実体を配置することを含む,前例のない分析能力を示す。さらに、これらの実体のベクトル演算を可能にして、労働空間は複雑な単位間関係の探索を可能にし、その後、個々の単位に対する経済ショックの影響と労働市場全体の波及効果を推定する。労働空間は、政策立案者やビジネスリーダーに労働市場分析とシミュレーションのための包括的な統合枠組みを提供し、よりきめ細かく効果的な戦略的意思決定を促進すると仮定する。 The labor market is a complex ecosystem comprising diverse, interconnected entities, such as industries, occupations, skills, and firms. Due to the lack of a systematic method to map these heterogeneous entities together, each entity has been analyzed in isolation or only through pairwise relationships, inhibiting a comprehensive understanding of the whole ecosystem. Here, we introduce $\textit{Labor Space}$, a vector-space embedding of heterogeneous labor market entities, derived through applying a large language model with fine-tuning. Labor Space exposes the complex relational fabric of various labor market constituents, facilitating coherent integrative analysis of industries, occupations, skills, and firms, while retaining type-specific clustering. We demonstrate its unprecedented analytical capacities, including positioning heterogeneous entities on economic axes, such as 'Manufacturing--Healthcare'. Furthermore, by allowing vector arithmetic of these entities, Labor Space enables the exploration of complex inter-unit relations, and subsequently the estimation of the ramifications of economic shocks on individual units and their ripple effect across the labor market.
We posit that Labor Space provides policymakers and business leaders with a comprehensive unifying framework for labor market analysis and simulation, fostering more nuanced and effective strategic decision-making.	翻訳日:2024-02-28 22:01:22 公開日:2024-02-27
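The abstract above mentions positioning heterogeneous entities on an economic axis such as 'Manufacturing--Healthcare' and doing vector arithmetic on entity embeddings. A minimal sketch of what such a projection could look like is given below. The three-dimensional vectors are made-up toy values, and scoring by cosine similarity with the difference of the two pole vectors is an assumption of this example; the abstract does not state how the paper defines its axes.

```python
import math


def subtract(u, v):
    """Component-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def position_on_axis(entity, pole_a, pole_b):
    """Cosine similarity between an entity vector and the axis from pole_a to pole_b.

    Positive values lean towards pole_b, negative values towards pole_a.
    """
    axis = subtract(pole_b, pole_a)
    return dot(entity, axis) / (norm(entity) * norm(axis))


# Hypothetical toy embeddings; real ones would come from the fine-tuned language model.
embeddings = {
    "Manufacturing": [1.0, 0.0, 0.2],
    "Healthcare": [0.0, 1.0, 0.2],
    "Registered nurse": [0.1, 0.9, 0.3],
    "Welder": [0.9, 0.1, 0.1],
}

nurse_score = position_on_axis(
    embeddings["Registered nurse"], embeddings["Manufacturing"], embeddings["Healthcare"]
)
welder_score = position_on_axis(
    embeddings["Welder"], embeddings["Manufacturing"], embeddings["Healthcare"]
)
print(nurse_score, welder_score)  # the nurse leans towards Healthcare, the welder towards Manufacturing
```

The same difference-vector construction works for any pair of poles, including poles of different entity types, since all entities share one embedding space.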
# 部分絡み合いエントロピーの幾何学化:PEEスレッドからビットスレッドへ Geometrizing the Partial Entanglement Entropy: from PEE Threads to Bit Threads ( http://arxiv.org/abs/2311.02301v5 ) ライセンス: Link先を確認	Jiong Lin, Yizhou Lu, Qiang Wen	(参考訳) ホログラフィックCFTにおける部分絡み合いエントロピー(PEE)をAdS/CFTの文脈で幾何学化する手法を提案する。より具体的には、ある点 $\textbf{x}$ が与えられたとき、これらの2点を接続するバルク測地線の観点で、$\textbf{x}$ と他の任意の点の間の2点 PEE を幾何学化する。我々はこれらの測地線を \textit{PEE threads} と呼び、これは自然に発散のないベクトル場 $V_{\textbf{x}}^{\mu}$ の積分曲線と見なすことができ、これは我々が \emph{PEE thread flow} と呼ぶ。 PEEスレッドの密度を特徴付ける$V_{\textbf{x}}^{\mu}$のノルムは、PEEの物理的要求によって決定できる。任意の静的区間または球面領域$A$に対して、状態によって決定されるPEEスレッド構成からユニークなビットスレッド構成を生成することができることを示す。したがって、非内在的なビットスレッドは、内在的なPEEスレッドから発生する。静的非連結区間の場合、発散のない流れを記述するベクトル場はもはやRT式を再現するのに適していない。我々は、PEEスレッドを任意のホモロジー曲面と交差する回数で重み付けする。代わりに、RT式は、あらゆる重みの割り当てにわたるPEEスレッドの和の最小化として完全に再構成される。 We give a scheme to geometrize the partial entanglement entropy (PEE) for holographic CFT in the context of AdS/CFT. More explicitly, given a point $\textbf{x}$, we geometrize the two-point PEEs between $\textbf{x}$ and any other points in terms of the bulk geodesics connecting these two points. We refer to these geodesics as the \textit{PEE threads}, which can be naturally regarded as the integral curves of a divergenceless vector field $V_{\textbf{x}}^{\mu}$, which we call \emph{PEE thread flow}. The norm of $V_{\textbf{x}}^{\mu}$ that characterizes the density of the PEE threads can be determined by some physical requirements of the PEE. We show that, for any static interval or spherical region $A$, a unique bit thread configuration can be generated from the PEE thread configuration determined by the state. Hence, the non-intrinsic bit threads are emergent from the intrinsic PEE threads. For static disconnected intervals, the vector fields describing a divergenceless flow are no longer suitable to reproduce the RT formula. We weight each PEE thread by the number of times it intersects any homologous surface.
Instead, the RT formula is perfectly reformulated as the minimization of the sum of the PEE threads over all possible assignments of weights.	翻訳日:2024-02-28 22:00:26 公開日:2024-02-27
# FairSeg: フェアエラー境界スケーリング付きセグメンテーションモデルを用いたフェアネス学習のための大規模医療画像セグメンテーションデータセット FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling ( http://arxiv.org/abs/2311.02189v3 ) ライセンス: Link先を確認	Yu Tian and Min Shi and Yan Luo and Ava Kouhana and Tobias Elze and Mengyu Wang	(参考訳) 人工知能モデルの公正さは、特に医学領域において、人々の幸福と生活にとって医療モデルの公正さが不可欠であるため、近年、注目されている。フェアネス学習研究を促進するためには、高品質な医療フェアネスデータセットが必要である。既存の医療用フェアネスデータセットはすべて分類作業のためであり、医療用セグメンテーションにはフェアネスデータセットは使用できないが、医療用セグメンテーションは分類として同等に重要な臨床課題であり、臨床医が評価できる臓器異常の詳細な空間情報を提供することができる。本稿では,1万件の被験者を対象とする医学的セグメンテーションのためのフェアネスデータセットであるHarvard-FairSegを提案する。さらに,segment anything model (sam) を用いて,各idグループにおける上位エラーバウンドによる損失関数の重み付けを行うための,公正なエラーバウンドスケーリング手法を提案する。各アイデンティティグループで高いトレーニングエラーでハードケースに明示的に対処することで、セグメンテーション性能のエクイティを向上できると予想する。公平な比較を容易にするために、新しいエクイティスケールのセグメンテーション性能指標を用いて、エクイティスケールのDice係数のようなフェアネスの文脈におけるセグメンテーション指標を比較する。総合的な実験を通して、我々の公正なエラーバウンドスケーリングアプローチは、最先端の公正学習モデルよりも優れているか同等の公平性性能を持つことを示した。データセットとコードはhttps://ophai.hms.harvard.edu/harvard-fairseg10kから公開されている。 Fairness in artificial intelligence models has gained significantly more attention in recent years, especially in the area of medicine, as fairness in medical models is critical to people's well-being and lives. High-quality medical fairness datasets are needed to promote fairness learning research. Existing medical fairness datasets are all for classification tasks, and no fairness datasets are available for medical segmentation, while medical segmentation is an equally important clinical task as classifications, which can provide detailed spatial information on organ abnormalities ready to be assessed by clinicians. In this paper, we propose the first fairness dataset for medical segmentation named Harvard-FairSeg with 10,000 subject samples. 
In addition, we propose a fair error-bound scaling approach to reweight the loss function with the upper error-bound in each identity group, using the segment anything model (SAM). We anticipate that the segmentation performance equity can be improved by explicitly tackling the hard cases with high training errors in each identity group. To facilitate fair comparisons, we utilize a novel equity-scaled segmentation performance metric to compare segmentation metrics in the context of fairness, such as the equity-scaled Dice coefficient. Through comprehensive experiments, we demonstrate that our fair error-bound scaling approach achieves fairness performance superior or comparable to that of the state-of-the-art fairness learning models. The dataset and code are publicly accessible via https://ophai.hms.harvard.edu/harvard-fairseg10k.	翻訳日:2024-02-28 22:00:01 公開日:2024-02-27
# 構造制約による進化的パレートセット学習 Evolutionary Pareto Set Learning with Structure Constraints ( http://arxiv.org/abs/2310.20426v3 ) ライセンス: Link先を確認	Xi Lin, Xiaoyuan Zhang, Zhiyuan Yang, Qingfu Zhang	(参考訳) 多目的進化最適化アルゴリズム(MOEA)は、多目的最適化問題(MOP)に取り組むための強力なアプローチであり、単一のランで近似パレート解の有限集合を見つけることができる。しかし、穏やかな正則性条件の下では、連続 MOP のパレート最適集合は無限の解を含む低次元連続多様体である。さらに、すべてのソリューション間で共有されるパターンを特徴付ける最適解集合全体の構造的制約は、多くの実生活アプリケーションで必要となる。既存の有限集団に基づくMOEAがこれらの構造制約を適切に扱うことは非常に困難である。本研究では,多目的最適化のための構造制約付き解集合全体を学習する最初のモデルベースアルゴリズムフレームワークを提案する。私たちのアプローチでは、paretoの最適性は、ソリューションセット全体の中で望ましい構造で切り離すことができます。また,構造制約のある集合モデルを学習するための効率的な進化的学習法を開発した。ベンチマークテストスイートと実世界のアプリケーション問題に関する実験的研究は,提案フレームワークの有望な性能を示すものである。 The multiobjective evolutionary optimization algorithm (MOEA) is a powerful approach for tackling multiobjective optimization problems (MOPs), which can find a finite set of approximate Pareto solutions in a single run. However, under mild regularity conditions, the Pareto optimal set of a continuous MOP could be a low dimensional continuous manifold that contains infinite solutions. In addition, structure constraints on the whole optimal solution set, which characterize the patterns shared among all solutions, could be required in many real-life applications. It is very challenging for existing finite population based MOEAs to handle these structure constraints properly. In this work, we propose the first model-based algorithmic framework to learn the whole solution set with structure constraints for multiobjective optimization. In our approach, the Pareto optimality can be traded off with a preferred structure among the whole solution set, which could be crucial for many real-world problems. We also develop an efficient evolutionary learning method to train the set model with structure constraints. Experimental studies on benchmark test suites and real-world application problems demonstrate the promising performance of our proposed framework.	翻訳日:2024-02-28 21:59:13 公開日:2024-02-27
# 軌道予測のための条件付き無臭オートエンコーダ Conditional Unscented Autoencoders for Trajectory Prediction ( http://arxiv.org/abs/2310.19944v2 ) ライセンス: Link先を確認	Faris Janjoš, Marcel Hallgarten, Anthony Knittel, Maxim Dolgov, Andreas Zell, J. Marius Zöllner	(参考訳) CVAEは自動運転(AD)の軌道予測において最も広く使われているモデルの一つである。運転状況と正解(ground-truth)の未来の間の相互作用を確率的潜在空間に捉え、それを用いて予測を生成する。本稿では,CVAE の重要な構成要素に挑戦する。 CVAEの基礎となるVAEの空間における最近の進歩を利用して,サンプリング手順の簡単な変更が性能に大きな恩恵をもたらすことを示す。任意の学習分布からサンプルを決定論的に抽出する無臭(unscented)サンプリングは,潜在的に危険なランダムサンプリングよりも軌道予測に適していることがわかった。さらに、より構造化されたガウス混合ラテント空間や、CVAEによる推論を行う新しい、より表現力のある方法など、さらなる改善も提供します。 INTERACTION予測データセットでの評価では最先端手法を上回り,CelebAデータセット上の画像モデリングタスクではベースラインのvanilla CVAEを上回ることで,我々のモデルの幅広い適用性を示す。コードはhttps://github.com/boschresearch/cuae-predictionで入手できる。 The CVAE is one of the most widely used models in trajectory prediction for autonomous driving (AD). It captures the interplay between a driving context and its ground-truth future into a probabilistic latent space and uses it to produce predictions. In this paper, we challenge key components of the CVAE. We leverage recent advances in the space of the VAE, the foundation of the CVAE, which show that a simple change in the sampling procedure can greatly benefit performance. We find that unscented sampling, which draws samples from any learned distribution in a deterministic manner, can naturally be better suited to trajectory prediction than potentially dangerous random sampling. We go further and offer additional improvements including a more structured Gaussian mixture latent space, as well as a novel, potentially more expressive way to do inference with CVAEs. We show wide applicability of our models by evaluating them on the INTERACTION prediction dataset, outperforming the state of the art, as well as at the task of image modeling on the CelebA dataset, outperforming the baseline vanilla CVAE. Code is available at https://github.com/boschresearch/cuae-prediction.	翻訳日:2024-02-28 21:58:56 公開日:2024-02-27
# 部分ベイズニューラルネットワークのFeynman-Kacトレーニングについて On Feynman--Kac training of partial Bayesian neural networks ( http://arxiv.org/abs/2310.19608v3 ) ライセンス: Link先を確認	Zheng Zhao and Sebastian Mair and Thomas B. Schön and Jens Sjölund	(参考訳) 近年,パラメータのサブセットのみを確率的と考える部分ベイズニューラルネットワーク (pBNNs) が,完全なベイズニューラルネットワークと競合することが示された。しかし、pBNNはしばしば潜在変数空間において多重モードであり、パラメトリックモデルに近似することは困難である。そこで本研究では,Feynman-Kacモデルのシミュレーションとして,pBNNのトレーニングを定式化した,効率的なサンプリングベーストレーニング戦略を提案する。次に,このモデルのパラメータと潜在事後分布を同時に計算可能な計算コストで推定できる逐次モンテカルロサンプリングの変種について述べる。様々な合成および実世界のデータセットを用いて,提案したトレーニング手法が予測性能において最先端手法より優れていることを示す。 Recently, partial Bayesian neural networks (pBNNs), which only consider a subset of the parameters to be stochastic, were shown to perform competitively with full Bayesian neural networks. However, pBNNs are often multi-modal in the latent variable space and thus challenging to approximate with parametric models. To address this problem, we propose an efficient sampling-based training strategy, wherein the training of a pBNN is formulated as simulating a Feynman--Kac model. We then describe variations of sequential Monte Carlo samplers that allow us to simultaneously estimate the parameters and the latent posterior distribution of this model at a tractable computational cost. Using various synthetic and real-world datasets we show that our proposed training scheme outperforms the state of the art in terms of predictive performance.	翻訳日:2024-02-28 21:58:01 公開日:2024-02-27
# 大規模言語モデルのコーディネートに微調整された小言語モデルは複雑な推論を改善する Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning ( http://arxiv.org/abs/2310.18338v2 ) ライセンス: Link先を確認	Gurusha Juneja, Subhabrata Dutta, Soumen Chakrabarti, Sunny Manchanda, Tanmoy Chakraborty	(参考訳) 大きな言語モデル(LLM)は、チェーン・オブ・ソート(chain-of-thought, CoT)の生成を促されると、優れた推論能力を示す。複雑で多段階の推論問題を解くための最近のプロンプト分解の試みは、LLMが問題の分解と解決を同時に行う能力に依存している。重大な欠点は、基礎的なLLMは一般に微調整には利用できないことであり、適応の計算コストが非常に高いことである。問題分解とソリューション生成は別個の能力であり、1つのモノリシックなLLMよりも別個のモジュールで対処する方がよいと確信している(そして実証する)。我々は,分解生成器を用いて複雑な問題を,より少ない推論ステップを必要とする部分問題に分解するDaSLaMを紹介する。これらの部分問題はソルバーによって解かれる。比較的小さな (13B パラメータ) LM を分解生成器として使用し、政策勾配最適化を用いて(ブラックボックスとみなす)ソルバーLMと相互作用し、部分問題を通して誘導するよう訓練することで、本手法をソルバー非依存にする。複数の異なる推論データセットの評価により,提案手法では1750億のパラメータLM(text-davinci-003)が,桁違いに大きい後継であるGPT-4と比較して,同等あるいはそれ以上の性能を達成できることがわかった。さらに,DaSLaMはスケールの関数としてのソルバーの能力に制限されないことを示し,例えば,様々な大きさのソルバーLMは,ソルバー非依存の分解技術により大幅な性能向上を示す。徹底的なアブレーション研究は、プロンプトのみに基づく非常に大きな分解器LLMよりも、我々のモジュラー微調整技術が優れていることを示す。 Large Language Models (LLMs) prompted to generate chain-of-thought (CoT) exhibit impressive reasoning capabilities. Recent attempts at prompt decomposition toward solving complex, multi-step reasoning problems depend on the ability of the LLM to simultaneously decompose and solve the problem. A significant disadvantage is that foundational LLMs are typically not available for fine-tuning, making adaptation computationally prohibitive. We believe (and demonstrate) that problem decomposition and solution generation are distinct capabilities, better addressed in separate modules, than by one monolithic LLM. We introduce DaSLaM, which uses a decomposition generator to decompose complex problems into subproblems that require fewer reasoning steps. These subproblems are answered by a solver. We use a relatively small (13B parameters) LM as the decomposition generator, which we train using policy gradient optimization to interact with a solver LM (regarded as black-box) and guide it through subproblems, thereby rendering our method solver-agnostic. 
Evaluations on multiple different reasoning datasets reveal that with our method, a 175 billion parameter LM (text-davinci-003) can produce competitive or even better performance, compared to its orders-of-magnitude larger successor, GPT-4. Additionally, we show that DaSLaM is not limited by the solver's capabilities as a function of scale; e.g., solver LMs with diverse sizes give significant performance improvement with our solver-agnostic decomposition technique. Exhaustive ablation studies evince the superiority of our modular finetuning technique over exorbitantly large decomposer LLMs, based on prompting alone.	翻訳日:2024-02-28 21:57:47 公開日:2024-02-27
# 低温原子中の自発な多極スピン密度波 Spontaneously sliding multipole spin density waves in cold atoms ( http://arxiv.org/abs/2310.17305v2 ) ライセンス: Link先を確認	G. Labeyrie, J. G. M. Walker, G. R. M. Robb, R. Kaiser, and T. Ackemann	(参考訳) レーザー駆動型ルビジウム原子の基底状態における自発ドリフト結合スピンと四極子密度波の観測について報告する。これらのレーザー冷却原子アンサンブルは、反射鏡から光フィードバックを受けると、光を媒介する相互作用によって自発的な磁性を示す。波のドリフト方向とキラリティは自発対称性の破れから生じる。この観測は、非平衡磁気系における新しい輸送過程を示す。 We report on the observation of spontaneously drifting coupled spin and quadrupolar density waves in the ground state of laser driven Rubidium atoms. These laser-cooled atomic ensembles exhibit spontaneous magnetism via light mediated interactions when submitted to optical feedback by a retro-reflecting mirror. Drift direction and chirality of the waves arise from spontaneous symmetry breaking. The observations demonstrate a novel transport process in out-of-equilibrium magnetic systems.	翻訳日:2024-02-28 21:56:52 公開日:2024-02-27
# ニューラルネットワークのトラクタブルサドルフリーニュートン最適化のためのヘシアンベクトル生成物シリーズ Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks ( http://arxiv.org/abs/2310.14901v2 ) ライセンス: Link先を確認	Elre T. Oldewage, Ross M. Clarke and José Miguel Hernández-Lobato	(参考訳) 連続最適化の分野で人気があるにもかかわらず、二階準ニュートン法はヘッセ行列が扱えないほど大きいため、機械学習に適用するのが困難である。この計算上の負担は、例えば、サドルフリーニュートン法のようにヘッセ行列の固有値を変更することで、非凸性に対処する必要性によって悪化する。本稿では,この2つの問題に対処する最適化アルゴリズムを提案する。我々の知る限り、このアルゴリズムは,絶対値固有値を持つ厳密な逆ヘッセ行列を漸近的に用いる最初の効率的にスケール可能な最適化アルゴリズムである。本手法は,主に二乗したヘッセ行列の平方根と逆行列をとる級数としてこの問題を定式化し,ヘッセ行列を明示的に計算したり固有値分解したりすることなく,それを用いて勾配ベクトルを前処理する。この無限級数の打ち切りは、実行時間と最適化性能の両方において、他の一階および二階の最適化方法に匹敵するスケーラブルな新しい最適化アルゴリズムを提供する。 CIFAR-10でトレーニングされたResNet-18など、さまざまな環境でこれを実証しています。 Despite their popularity in the field of continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, as the Hessian matrix is intractably large. This computational burden is exacerbated by the need to address non-convexity, for instance by modifying the Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an optimisation algorithm which addresses both of these concerns - to our knowledge, the first efficiently-scalable optimisation algorithm to asymptotically use the exact inverse Hessian with absolute-value eigenvalues. Our method frames the problem as a series which principally square-roots and inverts the squared Hessian, then uses it to precondition a gradient vector, all without explicitly computing or eigendecomposing the Hessian. A truncation of this infinite series provides a new optimisation algorithm which is scalable and comparable to other first- and second-order optimisation methods in both runtime and optimisation performance. We demonstrate this in a variety of settings, including a ResNet-18 trained on CIFAR-10.	翻訳日:2024-02-28 21:56:45 公開日:2024-02-27
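The idea of preconditioning a gradient with the absolute-value inverse Hessian, using only Hessian-vector products, can be illustrated with a small sketch. This is not the series derived in the paper; it is a textbook binomial expansion of $(H^2)^{-1/2}$, chosen only because it needs nothing beyond repeated Hessian-vector products. The function name, the `scale` parameter and the toy diagonal Hessian are assumptions made for illustration.

```python
import math


def abs_hessian_inverse_times(hvp, g, scale, num_terms=400):
    """Approximate |H|^{-1} g using only Hessian-vector products.

    Assumptions (illustrative, not taken from the paper): `scale` is an upper
    bound on the largest eigenvalue of H^2, and H has no zero eigenvalue.
    With A = H^2 / scale, the binomial series
        (1 - x)^(-1/2) = sum_k a_k x^k,  a_k = a_{k-1} * (2k - 1) / (2k),
    evaluated at x = I - A gives A^(-1/2), and |H|^{-1} = A^(-1/2) / sqrt(scale).
    """
    def residual_op(v):
        # Computes (I - H^2 / scale) v with two Hessian-vector products.
        hhv = hvp(hvp(v))
        return [vi - hi / scale for vi, hi in zip(v, hhv)]

    term = list(g)    # holds (I - A)^k g, starting at k = 0
    total = list(g)   # running sum; a_0 = 1
    coeff = 1.0
    for k in range(1, num_terms):
        term = residual_op(term)
        coeff *= (2 * k - 1) / (2 * k)
        total = [t + coeff * x for t, x in zip(total, term)]
    return [t / math.sqrt(scale) for t in total]


# Toy indefinite "Hessian": diagonal, with one negative eigenvalue.
EIGENVALUES = [2.0, -1.0, 0.5]


def toy_hvp(v):
    """Hessian-vector product for the toy diagonal Hessian."""
    return [e * vi for e, vi in zip(EIGENVALUES, v)]


step = abs_hessian_inverse_times(toy_hvp, [1.0, 1.0, 1.0], scale=4.0)
```

For this toy problem the result approaches `[1/|2|, 1/|-1|, 1/|0.5|]`, so the negative-curvature direction is rescaled rather than flipped. Convergence slows as the smallest eigenvalue of H^2 shrinks relative to `scale`, which is one reason a practical method needs a more careful construction than this sketch.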
# SED:Open-Vocabulary Semantic Segmentationのための簡易エンコーダデコーダ SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation ( http://arxiv.org/abs/2311.15537v2 ) ライセンス: Link先を確認	Bin Xie, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang	(参考訳) 開語彙のセマンティックセグメンテーションは、画素を開圏の集合から異なるセマンティックグループに区別しようとする。既存の手法の多くは、ピクセルレベルのセグメンテーションタスクに画像レベルモデルを採用することが鍵となる、事前学習された視覚言語モデルの利用を探求している。本稿では,階層的エンコーダに基づくコストマップ生成とカテゴリ早期拒絶を伴う段階的融合デコーダからなる,オープンボキャブラリー意味セグメンテーションのための簡易エンコーダ・デコーダSEDを提案する。階層エンコーダベースのコストマップ生成では、ピクセルレベルの画像テキストコストマップを予測するために、プレーントランスフォーマーの代わりに階層バックボーンを使用する。プレーントランスフォーマーに比べて、階層的なバックボーンは局所的な空間情報をよりよくキャプチャし、入力サイズに関して線形の計算複雑性を持つ。我々の段階的な融合デコーダは、コストマップと、セグメンテーションのための異なるバックボーンレベルの特徴マップを組み合わせるためにトップダウン構造を用いる。推論速度を高速化するために,デコーダの初期層で存在しない多くのカテゴリを拒絶するカテゴリ早期拒絶方式を導入し,精度低下なしに最大4.7倍の高速化を実現する。複数のopen-vocabulary semantic segmentationデータセット上で実験を行い,SED法の有効性を示した。 ConvNeXt-Bを使用する場合、SEDは150カテゴリのADE20K上で31.6%のmIoUスコアを達成し、単一のA6000上で画像当たり82ミリ秒(ms)で動作する。私たちはそれを https://github.com/xb534/SED.git でリリースします。 Open-vocabulary semantic segmentation strives to distinguish pixels into different semantic groups from an open set of categories. Most existing methods explore utilizing pre-trained vision-language models, in which the key is to adopt the image-level model for pixel-level segmentation task. In this paper, we propose a simple encoder-decoder, named SED, for open-vocabulary semantic segmentation, which comprises a hierarchical encoder-based cost map generation and a gradual fusion decoder with category early rejection. The hierarchical encoder-based cost map generation employs hierarchical backbone, instead of plain transformer, to predict pixel-level image-text cost map. Compared to plain transformer, hierarchical backbone better captures local spatial information and has linear computational complexity with respect to input size. 
Our gradual fusion decoder employs a top-down structure to combine cost map and the feature maps of different backbone levels for segmentation. To accelerate inference speed, we introduce a category early rejection scheme in the decoder that rejects many non-existing categories at the early layers of the decoder, resulting in at most 4.7 times acceleration without accuracy degradation. Experiments are performed on multiple open-vocabulary semantic segmentation datasets, which demonstrates the efficacy of our SED method. When using ConvNeXt-B, our SED method achieves mIoU score of 31.6% on ADE20K with 150 categories at 82 milliseconds (ms) per image on a single A6000. We will release it at https://github.com/xb534/SED.git.	翻訳日:2024-02-28 21:52:46 公開日:2024-02-27
# BS-Diff:胸部X線画像からの条件拡散モデルを用いた効果的な骨抑制 BS-Diff: Effective Bone Suppression Using Conditional Diffusion Models from Chest X-Ray Images ( http://arxiv.org/abs/2311.15328v2 ) ライセンス: Link先を確認	Zhanghao Chen, Yifei Sun, Wenjian Qin, Ruiquan Ge, Cheng Pan, Wenming Deng, Zhou Liu, Wenwen Min, Ahmed Elazab, Xiang Wan, Changmiao Wang	(参考訳) 胸部X線(CXR)は肺検診の低用量モードとして一般的に用いられる。しかし、肺領域の約75%が骨と重なり、疾患の検出と診断を妨げているため、CXRsの有効性は幾らか阻害されている。改善策として骨抑制技術が導入された。現在の病院のデュアルエネルギーサブトラクションイメージング技術では、高価な機器と被写体が高放射線にさらされる必要がある。これらの問題を回避すべく,深層学習に基づく画像生成アルゴリズムが提案されている。しかし, 既存の手法では, 高品質な画像が得られず, 特に肺血管のテクスチャの細部が捉えられにくい。これらの課題に対処するために,U-Netアーキテクチャとオートエンコーダを組み込むシンプルな拡張モジュールを備えた条件拡散モデルを備えた骨抑制フレームワークであるBS-Diffを提案する。提案するネットワークは骨抑制率の高い軟部組織像を生成するだけでなく,微細な画像の詳細を捉える能力も備えている。また,2010年以降で最大のデータセットを収集し,高精細度CXRと軟部組織像を関連病院で収集した120例のデータを収集した。広範囲な実験、比較分析、アブレーション研究、臨床評価は、提案されたBS-Diffが複数の指標でいくつかの骨圧モデルより優れていることを示している。私たちのコードはhttps://github.com/Benny0323/BS-Diffでアクセスできます。 Chest X-rays (CXRs) are commonly utilized as a low-dose modality for lung screening. Nonetheless, the efficacy of CXRs is somewhat impeded, given that approximately 75% of the lung area overlaps with bone, which in turn hampers the detection and diagnosis of diseases. As a remedial measure, bone suppression techniques have been introduced. The current dual-energy subtraction imaging technique in the clinic requires costly equipment and subjects being exposed to high radiation. To circumvent these issues, deep learning-based image generation algorithms have been proposed. However, existing methods fall short in terms of producing high-quality images and capturing texture details, particularly with pulmonary vessels. To address these issues, this paper proposes a new bone suppression framework, termed BS-Diff, that comprises a conditional diffusion model equipped with a U-Net architecture and a simple enhancement module to incorporate an autoencoder. 
Our proposed network can not only generate soft tissue images with a high bone suppression rate but also possesses the capability to capture fine image details. Additionally, we compiled the largest dataset since 2010, including data from 120 patients with high-definition, high-resolution paired CXRs and soft tissue images collected by our affiliated hospital. Extensive experiments, comparative analyses, ablation studies, and clinical evaluations indicate that the proposed BS-Diff outperforms several bone-suppression models across multiple metrics. Our code can be accessed at https://github.com/Benny0323/BS-Diff.	翻訳日:2024-02-28 21:52:14 公開日:2024-02-27
# TEA:テストタイムエネルギー適応 TEA: Test-time Energy Adaptation ( http://arxiv.org/abs/2311.14402v2 ) ライセンス: Link先を確認	Yige Yuan, Bingbing Xu, Liang Hou, Fei Sun, Huawei Shen, Xueqi Cheng	(参考訳) テストタイム適応(TTA)は、テストデータがトレーニング分布から分岐する際のモデル一般化性を改善することを目的としており、特に大規模な事前訓練モデルのコンテキストにおいて、トレーニングデータやプロセスへのアクセスを必要としないという明確な利点を提供する。しかし、現在のTTA法では基本的な問題に対処できない:共変量シフト(covariate shift)、すなわち、一般化可能性の低下は、モデルのキャリブレーションを損なう可能性があるトレーニングデータの限界分布に依存しているためである。そこで本研究では, 学習データやプロセスへのアクセスを必要とせず, モデルによる対象データ分布の知覚を向上させる, エネルギーに基づく新しい視点を提案する。この観点から、訓練された分類器をエネルギーベースのモデルに変換し、モデルの分布をテストデータと整合させ、テスト分布を知覚する能力を高め、全体的な一般化性を改善する。複数のタスク、ベンチマーク、アーキテクチャにわたる大規模な実験は、最先端の手法に対するTEAの優れた一般化性能を示している。さらに詳細な分析により、TAAはテスト分布を包括的に知覚し、最終的には一般化とキャリブレーションの改善への道を開くことができることが明らかになった。 Test-time adaptation (TTA) aims to improve model generalizability when test data diverges from training distribution, offering the distinct advantage of not requiring access to training data and processes, especially valuable in the context of large pre-trained models. However, current TTA methods fail to address the fundamental issue: covariate shift, i.e., the decreased generalizability can be attributed to the model's reliance on the marginal distribution of the training data, which may impair model calibration and introduce confirmation bias. To address this, we propose a novel energy-based perspective, enhancing the model's perception of target data distributions without requiring access to training data or processes. Building on this perspective, we introduce $\textbf{T}$est-time $\textbf{E}$nergy $\textbf{A}$daptation ($\textbf{TEA}$), which transforms the trained classifier into an energy-based model and aligns the model's distribution with the test data's, enhancing its ability to perceive test distributions and thus improving overall generalizability. Extensive experiments across multiple tasks, benchmarks and architectures demonstrate TEA's superior generalization performance against state-of-the-art methods. 
Further in-depth analyses reveal that TEA can equip the model with a comprehensive perception of test distribution, ultimately paving the way toward improved generalization and calibration.	翻訳日:2024-02-28 21:51:49 公開日:2024-02-27
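Reading a trained classifier as an energy-based model can be sketched with the common construction in which the energy of an input is the negative log-sum-exp of its logits. The abstract does not state the exact energy function TEA uses, so the formula, the function name and the `temperature` parameter below are assumptions for illustration only.

```python
import math


def free_energy(logits, temperature=1.0):
    """Energy of one input under the classifier-as-EBM reading.

    Assumed form (not confirmed by the abstract):
        E(x) = -T * log(sum_y exp(f_y(x) / T))
    Lower energy means the model assigns the input higher unnormalised density.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -temperature * log_sum_exp


confident = free_energy([10.0, 0.0])   # one class clearly dominates
uncertain = free_energy([0.0, 0.0])    # flat logits
```

Under this reading, test-time adaptation would adjust the model so that test inputs receive lower energy; how that objective is optimised is outside the scope of this sketch.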
# 積分可能なスピン-$\frac{1}{2}$ XYZモデルにおける固有状態絡み合いエントロピー Eigenstate entanglement entropy in the integrable spin-$\frac{1}{2}$ XYZ model ( http://arxiv.org/abs/2311.10819v3 ) ライセンス: Link先を確認	Rafał Świętek, Maksymilian Kliczkowski, Lev Vidmar and Marcos Rigol	(参考訳) 我々は、積分可能な相互作用スピン-$\frac{1}{2}$ XYZ鎖の高励起固有状態の絡み合いエントロピーの平均と標準偏差を、$U(1)$対称性や超対称性を持つ特別な直線上、およびそこから離れた領域で調べる。平均固有状態絡み合いエントロピーは、量子カオス相互作用モデルよりも小さい体積則係数を普遍的に示す。超対称点において、縮退が計算平均に及ぼす影響を解明する。さらに、固有状態絡み合いエントロピーの正規化標準偏差はシステムサイズの増加とともに多項式的に減衰し、量子カオス相互作用モデルにおける指数減衰とは対照的である。この結果は,スピン-$\frac{1}{2}$鎖における積分性が,量子カオス相互作用モデルと比較して,高励起エネルギー固有状態の絡み合いエントロピーの平均を減少させ,標準偏差を増大させることを示す最先端の数値的証拠を与える。 We study the average and the standard deviation of the entanglement entropy of highly excited eigenstates of the integrable interacting spin-$\frac{1}{2}$ XYZ chain away from and at special lines with $U(1)$ symmetry and supersymmetry. We universally find that the average eigenstate entanglement entropy exhibits a volume-law coefficient that is smaller than that of quantum-chaotic interacting models. At the supersymmetric point, we resolve the effect that degeneracies have on the computed averages. We further find that the normalized standard deviation of the eigenstate entanglement entropy decays polynomially with increasing system size, which we contrast to the exponential decay in quantum-chaotic interacting models. Our results provide state-of-the-art numerical evidence that integrability in spin-$\frac{1}{2}$ chains reduces the average, and increases the standard deviation, of the entanglement entropy of highly excited energy eigenstates when compared to those in quantum-chaotic interacting models.	翻訳日:2024-02-28 21:50:39 公開日:2024-02-27
# カスタマイズ可能なストックプールにおけるポートフォリオ管理のためのマスク可能なストック表現を用いた強化学習 Reinforcement Learning with Maskable Stock Representation for Portfolio Management in Customizable Stock Pools ( http://arxiv.org/abs/2311.10801v4 ) ライセンス: Link先を確認	Wentao Zhang, Yilei Zhao, Shuo Sun, Jie Ying, Yonggang Xie, Zitao Song, Xinrun Wang, Bo An	(参考訳) ポートフォリオ・マネジメント(pm)は金融取引の基本課題であり、長期利益を追求するために資本を異なる株式に最適に移すことを探求する。強化学習(rl)は金融市場との対話を通じてpmの有益なエージェントを訓練する可能性を最近示した。しかし、既存の仕事は、主に投資家の実際的な需要と矛盾する固定株プールに焦点を当てている。特に、異なる投資家のターゲットの株価プールは、市場国家との格差のために劇的に変動し、個々の投資家は、取引したい株式(例えば1つの人気株を追加する)を一時的に調整し、カスタマイズ可能な株式プール(csp)に繋がる可能性がある。既存のRL手法では、ストックプールを少し変更してもRLエージェントを再訓練する必要があるため、高い計算コストと不安定な性能が得られる。この課題に取り組むため,我々は,グローバルストックプール(gsp)でのワンショットトレーニングを通じてpmをcspで扱うための,マスキング可能なストック表現を備えた強化学習フレームワークであるearnmoreを提案する。具体的には,まず,ターゲットプールの外に在庫を隠蔽する機構を導入する。第2に,自己教師付きマスキングと再構築プロセスを通じて有意義な在庫表現を学習する。第3に、ポートフォリオが好意的な株式に集中し、ターゲットプールの外の株を無視するように再重み付けメカニズムが設計されている。米国株式市場の8つのサブセット株式プールに関する広範な実験を通じて、EarnMoreは、利益の40%以上向上した6つの一般的な財務指標において、14の最先端のベースラインを大きく上回っていることを実証した。 Portfolio management (PM) is a fundamental financial trading task, which explores the optimal periodical reallocation of capitals into different stocks to pursue long-term profits. Reinforcement learning (RL) has recently shown its potential to train profitable agents for PM through interacting with financial markets. However, existing work mostly focuses on fixed stock pools, which is inconsistent with investors' practical demand. Specifically, the target stock pool of different investors varies dramatically due to their discrepancy on market states and individual investors may temporally adjust stocks they desire to trade (e.g., adding one popular stocks), which lead to customizable stock pools (CSPs). Existing RL methods require to retrain RL agents even with a tiny change of the stock pool, which leads to high computational cost and unstable performance. 
To tackle this challenge, we propose EarnMore, a rEinforcement leARNing framework with Maskable stOck REpresentation to handle PM with CSPs through one-shot training in a global stock pool (GSP). Specifically, we first introduce a mechanism to mask out the representation of the stocks outside the target pool. Second, we learn meaningful stock representations through a self-supervised masking and reconstruction process. Third, a re-weighting mechanism is designed to make the portfolio concentrate on favorable stocks and neglect the stocks outside the target pool. Through extensive experiments on 8 subset stock pools of the US stock market, we demonstrate that EarnMore significantly outperforms 14 state-of-the-art baselines in terms of 6 popular financial metrics with over 40% improvement on profit.	翻訳日:2024-02-28 21:50:18 公開日:2024-02-27
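The effect of ignoring stocks outside a customizable pool can be sketched as a masked softmax over per-stock scores. This is only an illustration of the masking idea; EarnMore's learned representations and its actual re-weighting mechanism are not described in enough detail in the abstract to reproduce, and the function and argument names here are hypothetical.

```python
import math


def masked_portfolio_weights(scores, in_pool):
    """Turn per-stock scores into portfolio weights over the target pool only.

    `scores` and `in_pool` are hypothetical inputs: one score per stock in the
    global pool, and a boolean flag saying whether the investor's customized
    pool contains that stock. Stocks outside the pool get exactly zero weight.
    """
    kept = [s for s, keep in zip(scores, in_pool) if keep]
    if not kept:
        raise ValueError("the target pool must contain at least one stock")
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0
            for s, keep in zip(scores, in_pool)]
    total = sum(exps)
    return [e / total for e in exps]


# The second stock has the highest score but is outside the target pool.
weights = masked_portfolio_weights([1.0, 5.0, 1.0], [True, False, True])
```

Because the mask is applied at inference time, changing the pool only changes the `in_pool` flags; nothing in this sketch needs retraining, which is the property the abstract emphasises.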
# オンラインでどこで物語を語るのか? オンラインコミュニティ全体でのストーリー検出 Where Do People Tell Stories Online? Story Detection Across Online Communities ( http://arxiv.org/abs/2311.09675v2 ) ライセンス: Link先を確認	Maria Antoniak, Joel Mire, Maarten Sap, Elliott Ash, Andrew Piper	(参考訳) オンラインコミュニティにおけるストーリーの検出は、ストーリーがコミュニティに散らばり、単一のテキスト内でノンストーリーテリングスパンと織り交ぜられるため、難しい作業である。我々は、502のreddit投稿とコメントの豊富な注釈付きデータセット、ソーシャルメディアコンテキストに適応した詳細なコードブック、ドキュメントとスパンレベルでストーリーテリングを予測するモデルを含む、storyseekerツールキットを構築してリリースすることで、この課題に対処します。私たちのデータセットは、33のトピックカテゴリにわたる数百の英語コミュニティからサンプルを受け取り、バイナリストーリーラベル、ストーリースパン、イベントスパンなど、詳細な専門家アノテーションが含まれています。我々は,本データを用いたさまざまな検出手法の評価を行い,新たなタスクとして紹介するストーリーテリングスパン検出に着目し,オンラインストーリーテリングの特徴を識別する。我々は,大規模なコミュニティ中心のソーシャルメディアプラットフォーム上でのストーリーテリングの分布特性を照らし,また,物語テリングを多くの説得的戦略の1つとして活用するr/ChangeMyViewのケーススタディも実施し,我々のデータとモデルがコミュニティ間およびコミュニティ内研究の両方に利用できることを示した。最後に,ナラトロジーにおけるツールの意味と分析,およびオンラインコミュニティの研究について論じる。 Story detection in online communities is a challenging task as stories are scattered across communities and interwoven with non-storytelling spans within a single text. We address this challenge by building and releasing the StorySeeker toolkit, including a richly annotated dataset of 502 Reddit posts and comments, a detailed codebook adapted to the social media context, and models to predict storytelling at the document and span level. Our dataset is sampled from hundreds of popular English-language Reddit communities ranging across 33 topic categories, and it contains fine-grained expert annotations, including binary story labels, story spans, and event spans. We evaluate a range of detection methods using our data, and we identify the distinctive textual features of online storytelling, focusing on storytelling span detection, which we introduce as a new task. 
We illuminate distributional characteristics of storytelling on a large community-centric social media platform, and we also conduct a case study on r/ChangeMyView, where storytelling is used as one of many persuasive strategies, illustrating that our data and models can be used for both inter- and intra-community research. Finally, we discuss implications of our tools and analyses for narratology and the study of online communities.	翻訳日:2024-02-28 21:49:30 公開日:2024-02-27
# 知識ベース質問応答のためのマイナショット転送学習--教師付きモデルと文脈内学習の融合 Few-shot Transfer Learning for Knowledge Base Question Answering: Fusing Supervised Models with In-Context Learning ( http://arxiv.org/abs/2311.08894v2 ) ライセンス: Link先を確認	Mayur Patidar, Riya Sawhney, Avinash Singh, Biswajit Chatterjee, Mausam, Indrajit Bhattacharya	(参考訳) 既存のKnowledge Base Question Answering (KBQA)アーキテクチャは、注釈付きデータに飢えているため、デプロイに時間と費用がかかる。我々は、ターゲットドメインがいくつかのラベル付き例しか提供していないが、大きなラベル付きトレーニングデータセットがソースドメインで利用可能であるkbqaの、少数ショット転送学習の問題を紹介する。本稿では、複数のソーストレーニングされたレトリバーを用いてKB-retrievalを実行し、LLMを用いて再ランクし、これをLLMによる少数ショットインコンテキスト学習の入力として使用し、論理形式を生成するFuSIC-KBQAという新しいKBQAアーキテクチャを提案する。ソースターゲットKBQAの4組の実験により、FuSIC-KBQAはSoTA KBQAモデルの適応よりも大幅に優れていた。ドメイン内設定における追加実験により、FuSIC-KBQAは訓練データに制限がある場合、SoTA KBQAモデルよりも優れていることが示された。 Existing Knowledge Base Question Answering (KBQA) architectures are hungry for annotated data, which make them costly and time-consuming to deploy. We introduce the problem of few-shot transfer learning for KBQA, where the target domain offers only a few labeled examples, but a large labeled training dataset is available in a source domain. We propose a novel KBQA architecture called FuSIC-KBQA that performs KB-retrieval using multiple source-trained retrievers, re-ranks using an LLM and uses this as input for LLM few-shot in-context learning to generate logical forms, which are further refined using execution-guided feedback. Experiments over four source-target KBQA pairs of varying complexity show that FuSIC-KBQA significantly outperforms adaptations of SoTA KBQA models for this setting. Additional experiments in the in-domain setting show that FuSIC-KBQA also outperforms SoTA KBQA models when training data is limited.	翻訳日:2024-02-28 21:49:05 公開日:2024-02-27
# ActiveDC:Active Finetuningのための配電校正 ActiveDC: Distribution Calibration for Active Finetuning ( http://arxiv.org/abs/2311.07634v3 ) ライセンス: Link先を確認	Wenshuai Xu, Zhenghui Hu, Yu Lu, Jinzhou Meng, Qingjie Liu, Yunhong Wang	(参考訳) プレトレーニング・ファインタニングのパラダイムは様々なコンピュータビジョンタスクで人気を集めている。このパラダイムでは、大規模なデータとコストのかかるアノテーションの要求により、アクティブな微調整が出現する。アクティブな微調整は、アノテーションのためにラベルのないプールからデータのサブセットを選択し、その後の微調整を容易にする。しかし、限られた数のトレーニングサンプルを使用することでバイアスのある分布が生じ、モデルオーバーフィットにつながる可能性がある。本稿では,アクティブなファインタニングタスクのためのActiveDCと呼ばれる新しい手法を提案する。まず、選択すべき部分集合と連続空間における未ラベルプール全体の分布類似性を最適化することにより、アノテーションのためのサンプルを選択する。次に,ラベルなしプール内の暗黙のカテゴリ情報を利用して,選択したサンプルの分布を校正する。特徴の可視化は,分散キャリブレーションに対する我々のアプローチの有効性を直感的に把握する。サンプル比の異なる3つの画像分類データセットについて広範な実験を行った。その結果,ActiveDCは画像分類タスクのベースライン性能を一貫して上回ることがわかった。サンプリング比が低く、パフォーマンスが最大10%向上した場合には、特に改善が重要である。私たちのコードはリリースされます。 The pretraining-finetuning paradigm has gained popularity in various computer vision tasks. In this paradigm, the emergence of active finetuning arises due to the abundance of large-scale data and costly annotation requirements. Active finetuning involves selecting a subset of data from an unlabeled pool for annotation, facilitating subsequent finetuning. However, the use of a limited number of training samples can lead to a biased distribution, potentially resulting in model overfitting. In this paper, we propose a new method called ActiveDC for the active finetuning tasks. Firstly, we select samples for annotation by optimizing the distribution similarity between the subset to be selected and the entire unlabeled pool in continuous space. Secondly, we calibrate the distribution of the selected samples by exploiting implicit category information in the unlabeled pool. The feature visualization provides an intuitive sense of the effectiveness of our approach to distribution calibration. We conducted extensive experiments on three image classification datasets with different sampling ratios. 
The results indicate that ActiveDC consistently outperforms the baseline performance in all image classification tasks. The improvement is particularly significant when the sampling ratio is low, with performance gains of up to 10%. Our code will be released.	翻訳日:2024-02-28 21:47:50 公開日:2024-02-27
# より高速なLLM推論のためのカスケード投機 Cascade Speculative Drafting for Even Faster LLM Inference ( http://arxiv.org/abs/2312.11462v4 ) ライセンス: Link先を確認	Ziyi Chen, Xiaocong Yang, Jiacheng Lin, Chenkai Sun, Kevin Chen-Chuan Chang, Jie Huang	(参考訳) 大規模言語モデル(LLM)推論の効率を高めるために導入された投機的復号法は、より小さなモデルでドラフトを生成する。より大きなターゲットモデルは、その出力に合わせてこのドラフトをレビューし、ターゲットモデルによる受け入れは、ターゲットモデルの実行数を減らす結果となり、最終的に効率が向上する。しかし、投機的復号法における起草過程は、自己回帰生成が遅いことを含み、その重要性に関係なくトークンの生成に等しい時間を割り当てる。これらの非効率性が合わさって、投機的復号の性能は準最適なものにとどまっている。 LLM推論をさらに改善するため、2種類のカスケードを組み込んだ投機的実行アルゴリズムであるカスケード投機ドラフト(CS Drafting)を導入する。 Vertical Cascadeはニューラルネットワークモデルからの自己回帰生成を排除し、Horizontal Cascadeはドラフトの時間割当を最適化して効率を向上する。両方のカスケードを組み合わせることで、CS Draftingは、ターゲットモデルと同じ出力分布を維持しながら、我々の実験で投機的復号化よりも最大81%高速化できる。私たちのコードはhttps://github.com/lfsszd/CS-Draftingで公開されています。 Introduced to enhance the efficiency of large language model (LLM) inference, speculative decoding operates by having a smaller model generate a draft. A larger target model then reviews this draft to align with its output, and any acceptance by the target model results in a reduction of the number of the target model runs, ultimately improving efficiency. However, the drafting process in speculative decoding includes slow autoregressive generation and allocates equal time to generating tokens, irrespective of their importance. These inefficiencies collectively contribute to the suboptimal performance of speculative decoding. To further improve LLM inference, we introduce Cascade Speculative Drafting (CS Drafting), a speculative execution algorithm that incorporates two types of cascades. The Vertical Cascade eliminates autoregressive generation from neural models, while the Horizontal Cascade optimizes time allocation in drafting for improved efficiency. Combining both cascades, CS Drafting achieves up to an 81 percent additional speedup over speculative decoding in our experiments, while maintaining the same output distribution as the target model. 
Our code is publicly available at https://github.com/lfsszd/CS-Drafting.	翻訳日:2024-02-28 21:41:36 公開日:2024-02-27
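The draft-then-verify loop that speculative decoding builds on can be sketched with toy next-token functions. This is the simplified greedy variant, in which the target model's own greedy choice replaces the first mismatching draft token; it is not the sampling-based acceptance rule that preserves the target distribution, and it does not implement the Vertical or Horizontal Cascade. All function names and the toy models are assumptions for illustration.

```python
def speculative_greedy_decode(target_next, draft_next, prompt,
                              max_new_tokens, draft_len=4):
    """Greedy draft-then-verify decoding; returns (tokens, target_passes)."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) The cheap draft model proposes `draft_len` tokens autoregressively.
        draft = []
        for _ in range(draft_len):
            draft.append(draft_next(tokens + draft))

        # 2) The target model checks the draft. A real system would score all
        #    drafted positions in one batched forward pass, so this sketch
        #    counts the whole verification as a single target pass.
        target_passes += 1
        accepted = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            if proposed != expected:
                accepted.append(expected)  # fix the first mismatch and stop
                break
            accepted.append(proposed)
        else:
            # Whole draft accepted: the same pass also yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new_tokens], target_passes


def target_next(seq):
    """Toy target model: count upward modulo 10."""
    return (seq[-1] + 1) % 10


def draft_next(seq):
    """Toy draft model: agrees with the target except right after a 4."""
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


output, passes = speculative_greedy_decode(
    target_next, draft_next, [3], max_new_tokens=8)
```

The output matches plain greedy decoding with the target model, while the number of counted target passes is smaller than the number of generated tokens whenever the draft is mostly right. The drafting loop in step 1 is itself autoregressive, and that is the cost the abstract says CS Drafting targets.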
# グラフ上の一般化ニューラル拡散フレームワーク A Generalized Neural Diffusion Framework on Graphs ( http://arxiv.org/abs/2312.08616v3 ) ライセンス: Link先を確認	Yibo Li, Xiao Wang, Hongrui Liu, Chuan Shi	(参考訳) 近年の研究では、GNNと拡散過程の関連が明らかにされており、多くの拡散に基づくGNNが提案されている。しかしながら、これらの2つのメカニズムは密接に関連しているため、自然に1つの根本的な疑問が生じる: これらのGNNを正式に統一できる一般的な拡散フレームワークはあるか? この質問に対する回答は、GNNの学習プロセスの理解を深めるだけでなく、より広いクラスのGNNを設計するための新たな扉を開くかもしれない。本稿では,より多くのgnnと拡散過程の関係を形式的に確立する,忠実性項を持つ一般拡散方程式の枠組みを提案する。一方、この枠組みでは、グラフ拡散ネットワークの1つの特性、すなわち、現在の神経拡散過程は1次拡散方程式にのみ対応している。しかし, 実験により, 高次隣人のラベルは実際には単相性を示しており, 上位隣人のラベルに基づく類似性は, 一階隣人の類似性を必要としないことがわかった。この発見の動機は、新しい高次隣り合う拡散方程式を設計し、フレームワークに基づいた新しいタイプのグラフ拡散ネットワーク(HiD-Net)を導出することにある。高次拡散方程式では、hid-netは攻撃に対してより強固であり、ホモフィリーグラフとヘテロフィリーグラフの両方で動作する。我々は,HiD-Netと高次ランダムウォークの関係を理論的に解析するだけでなく,理論的収束保証を提供する。グラフ拡散ネットワークにおけるHiD-Netの有効性を実験的に検証した。 Recent studies reveal the connection between GNNs and the diffusion process, which motivates many diffusion-based GNNs to be proposed. However, since these two mechanisms are closely related, one fundamental question naturally arises: Is there a general diffusion framework that can formally unify these GNNs? The answer to this question can not only deepen our understanding of the learning process of GNNs, but also may open a new door to design a broad new class of GNNs. In this paper, we propose a general diffusion equation framework with the fidelity term, which formally establishes the relationship between the diffusion process with more GNNs. Meanwhile, with this framework, we identify one characteristic of graph diffusion networks, i.e., the current neural diffusion process only corresponds to the first-order diffusion equation. However, by an experimental investigation, we show that the labels of high-order neighbors actually exhibit monophily property, which induces the similarity based on labels among high-order neighbors without requiring the similarity among first-order neighbors. 
This discovery motivates us to design a new high-order neighbor-aware diffusion equation, and to derive a new type of graph diffusion network (HiD-Net) based on the framework. With the high-order diffusion equation, HiD-Net is more robust against attacks and works on both homophily and heterophily graphs. We not only theoretically analyze the relation between HiD-Net with high-order random walk, but also provide a theoretical convergence guarantee. Extensive experimental results well demonstrate the effectiveness of HiD-Net over state-of-the-art graph diffusion networks.	翻訳日:2024-02-28 21:41:15 公開日:2024-02-27
# DrivingGaussian: 動的自律走行シーンのための複合ガウススプラッティング DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes ( http://arxiv.org/abs/2312.07920v2 ) ライセンス: Link先を確認	Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang	(参考訳) 我々は、周囲の動的自律走行シーンのための効率的かつ効果的なフレームワークであるDrivingGaussianを提案する。移動物体を持つ複雑なシーンでは、まずシーン全体の静的背景を、インクリメンタルな静的3Dガウシアンで逐次的かつ段階的にモデル化する。次に,複合動的ガウスグラフを用いて複数の移動物体を処理し,個々の物体を個別に再構成し,シーン内でのそれらの正確な位置とオクルージョン関係を復元する。我々はさらに、ガウススプラッティングの事前情報(prior)としてLiDARを使用して、より詳細にシーンを再構築し、パノラマ一貫性を維持する。DrivingGaussianは運転シーン再構成の既存の手法よりも優れており、高忠実でマルチカメラの整合性を備えたフォトリアリスティックなサラウンドビュー合成を可能にする。プロジェクトページは https://github.com/VDIGPKU/DrivingGaussian にあります。 We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes. For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene with incremental static 3D Gaussians. We then leverage a composite dynamic Gaussian graph to handle multiple moving objects, individually reconstructing each object and restoring their accurate positions and occlusion relationships within the scene. We further use a LiDAR prior for Gaussian Splatting to reconstruct scenes with greater details and maintain panoramic consistency. DrivingGaussian outperforms existing methods in driving scene reconstruction and enables photorealistic surround-view synthesis with high-fidelity and multi-camera consistency. Our project page is at: https://github.com/VDIGPKU/DrivingGaussian.	翻訳日:2024-02-28 21:40:49 公開日:2024-02-27
# ヘマトキシリンおよびエオシンスライス画像からの乳がんHER2の予測のためのフェデレート学習を用いたポイントトランスフォーマー Point Transformer with Federated Learning for Predicting Breast Cancer HER2 Status from Hematoxylin and Eosin-Stained Whole Slide Images ( http://arxiv.org/abs/2312.06454v3 ) ライセンス: Link先を確認	Bao Li, Zhenyu Liu, Lizhi Shao, Bensheng Qiu, Hong Bu, Jie Tian	(参考訳) ヒト表皮成長因子受容体2(HER2)を、広く利用可能なヘマトキシリンおよびエオシン含有全スライド画像(WSI)から直接予測することで、技術的コストを低減し、治療選択を迅速化することができる。 HER2を正確に予測するには、多地点WSIの大規模なコレクションが必要である。フェデレートラーニングは、ギガバイトサイズのWSIとデータプライバシの懸念なしに、これらのWSIの協調的なトレーニングを可能にする。しかし,実世界の多地点WSIにおけるラベル不均衡に対処する上で,連合学習は課題に直面している。さらに、既存のwsi分類手法では、フェデレーション学習のサイト-エンド特徴表現において、ローカルコンテキスト情報と長距離依存性を同時に利用することはできない。そこで本研究では,多地点her2状態予測のためのフェデレーション学習を伴う点トランスフォーマを提案する。我々のアプローチには2つの新しいデザインが組み込まれている。本稿では, 動的ラベル分布戦略と補助分類器を提案し, 適切な初期化モデルを確立し, サイト間でのラベル分布のばらつきを軽減する。さらに,コサイン距離に基づく最遠のコサインサンプリングを提案する。最も特徴的な特徴をサンプリングし、長距離の依存関係をキャプチャする。広範な実験と解析により,本手法は4地点で2687wsisの最先端性能を達成できた。さらに,本モデルが229 wsisの未発見部位に一般化できることを実証する。 Directly predicting human epidermal growth factor receptor 2 (HER2) status from widely available hematoxylin and eosin (HE)-stained whole slide images (WSIs) can reduce technical costs and expedite treatment selection. Accurately predicting HER2 requires large collections of multi-site WSIs. Federated learning enables collaborative training of these WSIs without gigabyte-size WSIs transportation and data privacy concerns. However, federated learning encounters challenges in addressing label imbalance in multi-site WSIs from the real world. Moreover, existing WSI classification methods cannot simultaneously exploit local context information and long-range dependencies in the site-end feature representation of federated learning. To address these issues, we present a point transformer with federated learning for multi-site HER2 status prediction from HE-stained WSIs. Our approach incorporates two novel designs. 
We propose a dynamic label distribution strategy and an auxiliary classifier, which help to establish a well-initialized model and mitigate label distribution variations across sites. Additionally, we propose farthest cosine sampling, which is based on cosine distance; it samples the most distinctive features and captures long-range dependencies. Extensive experiments and analysis show that our method achieves state-of-the-art performance at four sites with a total of 2687 WSIs. Furthermore, we demonstrate that our model can generalize to two unseen sites with 229 WSIs.	翻訳日:2024-02-28 21:40:37 公開日:2024-02-27
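The sampling idea in the HER2 abstract above can be illustrated with a minimal sketch of greedy farthest-point sampling under cosine distance. This is not the authors' implementation: the function names, the choice of the first feature as the seed, and the toy 2D vectors are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def farthest_cosine_sampling(features, k, start=0):
    """Greedily pick k indices; each new pick is farthest (in cosine
    distance) from the set of features already selected."""
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and len(features)")
    selected = [start]  # hypothetical seed choice: the first feature
    # min_dist[i] = distance from feature i to its nearest selected feature
    min_dist = [cosine_distance(f, features[start]) for f in features]
    while len(selected) < k:
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], cosine_distance(f, features[nxt]))
    return selected


# Toy patch features (assumed data): two near-duplicates and two distinct directions.
features = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (-1.0, 0.0)]
picked = farthest_cosine_sampling(features, 3)
```

On this toy input the near-duplicate feature at index 1 is skipped, which is the behaviour one would want when selecting the most distinctive patch features.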
# knowgpt: 大きな言語モデルのための知識注入 KnowGPT: Knowledge Injection for Large Language Models ( http://arxiv.org/abs/2312.06185v3 ) ライセンス: Link先を確認	Qinggang Zhang, Junnan Dong, Hao Chen, Daochen Zha, Zailiang Yu, Xiao Huang	(参考訳) ChatGPTのようなジェネレーティブ大型言語モデル(LLM)は、人間-専門家レベルで一般的な質問に答えるインタラクティブAPIを提供する。しかしながら、これらのモデルは、トレーニングコーパスにカバーされていないドメイン固有の知識や専門的な知識を必要とする質問に直面した時に、不正確な、または誤った応答を与えることが多い。さらに、最先端のLLMの多くはオープンソースではないため、モデルAPIでのみ知識を注入することは困難である。本研究では,LLMのためのブラックボックス知識注入フレームワークであるKnowGPTを紹介する。 KnowGPTは、深い強化学習(RL)を活用して知識グラフ(KGs)から関連する知識を抽出し、マルチアーメッド帯域(MAB)を使用して各質問に最適なプロンプトを構築する。 3つのベンチマークデータセットに関する広範な実験では、knowgptが既存のメソッドを大幅に強化しています。特に、KnowGPTはChatGPTよりも平均23.7%改善し、GPT-4より平均2.9%改善した。さらに、KnowGPTはOpenbookQAの公式リーダーボードで91.6%の精度を達成している。 Generative Large Language Models (LLMs), such as ChatGPT, offer interactive APIs that can answer common questions at a human-expert level. However, these models often give inaccurate or incorrect responses when faced with questions requiring domain-specific or professional-specific knowledge not covered in their training corpus. Furthermore, many state-of-the-art LLMs are not open-source, making it challenging to inject knowledge with model APIs only. In this work, we introduce KnowGPT, a black-box knowledge injection framework for LLMs in question answering. KnowGPT leverages deep reinforcement learning (RL) to extract relevant knowledge from Knowledge Graphs (KGs) and use Multi-Armed Bandit (MAB) to construct the most suitable prompt for each question. Our extensive experiments on three benchmark datasets showcase that KnowGPT significantly enhances the existing methods. Notably, KnowGPT achieves an average improvement of 23.7% over ChatGPT and an average improvement of 2.9% over GPT-4. Additionally, KnowGPT attains a 91.6% accuracy on the OpenbookQA official leaderboard, which is comparable to human-level performance.	翻訳日:2024-02-28 21:40:12 公開日:2024-02-27
# InteractDiffusion:テキスト間拡散モデルにおける相互作用制御 InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2312.05849v2 ) ライセンス: Link先を確認	Jiun Tian Hoe and Xudong Jiang and Chee Seng Chan and Yap-Peng Tan and Weipeng Hu	(参考訳) 大規模テキスト・ツー・イメージ(t2i)拡散モデルは、テキスト記述に基づいてコヒーレントな画像を生成する素晴らしい能力を示しており、コンテンツ生成における広大な応用を可能にしている。近年, 物体の局所化, 姿勢, 画像の輪郭などの要因の制御が進んでいるが, 生成コンテンツ中の物体間の相互作用を制御できる重要なギャップが残っている。生成した画像内の対話をうまく制御することで、対話的なキャラクターで現実的なシーンを作るといった有意義な応用が可能になる。本研究では,三重項ラベル(人,行動,対象)と対応する境界ボックスからなる人間-対象間相互作用(hoi)情報を用いたt2i拡散モデルの条件付け問題について検討する。我々は、既存の訓練済みT2I拡散モデルを拡張して、相互作用により良い条件付けを可能にする、InteractDiffusionと呼ばれるプラグイン可能な相互作用制御モデルを提案する。具体的には、HOI情報をトークン化し、インタラクション埋め込みを通じてそれらの関係を学習する。条件付き自己アテンション層は、HOIトークンを視覚トークンにマッピングするように訓練され、既存のT2I拡散モデルにおいて視覚トークンをよりよく条件付ける。提案モデルでは,既存のT2I拡散モデルにおける相互作用と位置の制御が可能であり,HOI検出スコアの差が大きく,FIDおよびKIDの忠実度も大きく向上する。プロジェクトページ: https://jiuntian.github.io/interactdiffusion。 Large-scale text-to-image (T2I) diffusion models have showcased incredible capabilities in generating coherent images based on textual descriptions, enabling vast applications in content generation. While recent advancements have introduced control over factors such as object localization, posture, and image contours, a crucial gap remains in our ability to control the interactions between objects in the generated content. Well-controlling interactions in generated images could yield meaningful applications, such as creating realistic scenes with interacting characters. In this work, we study the problems of conditioning T2I diffusion models with Human-Object Interaction (HOI) information, consisting of a triplet label (person, action, object) and corresponding bounding boxes. We propose a pluggable interaction control model, called InteractDiffusion that extends existing pre-trained T2I diffusion models to enable them being better conditioned on interactions. Specifically, we tokenize the HOI information and learn their relationships via interaction embeddings. 
A conditioning self-attention layer is trained to map HOI tokens to visual tokens, thereby conditioning the visual tokens better in existing T2I diffusion models. Our model can control interaction and location in existing T2I diffusion models, and it outperforms existing baselines by a large margin in HOI detection score, as well as in fidelity measured by FID and KID. Project page: https://jiuntian.github.io/interactdiffusion.	翻訳日:2024-02-28 21:39:50 公開日:2024-02-27
# 深層生成ネットワークに基づく音声合成のためのニューラル音声埋め込み Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks ( http://arxiv.org/abs/2312.05814v2 ) ライセンス: Link先を確認	Seo-Hyun Lee, Young-Eun Lee, Soowon Kim, Byung-Kwan Ko, Jun-Young Kim, Seong-Whan Lee	(参考訳) 脳音声技術は、人工知能、脳-コンピュータインタフェース、音声合成の分野を含む学際的応用の融合を表す。ニューラル表現学習に基づく意図的復号と音声合成は、神経活動と人間の言語コミュニケーションの手段を直接接続し、コミュニケーションの自然性を大幅に向上させる。表現学習と音声合成技術の発展に関する最近の発見により、脳信号の音声への直接翻訳は大きな可能性を秘めている。特に、ニューラルネットワークに与えられた処理された入力特徴とニューラルスピーチ埋め込みは、脳信号からの音声生成に深い生成モデルを使用する場合、全体的なパフォーマンスにおいて重要な役割を果たす。本稿では,脳信号からの音声合成を可能とし,最終的には非言語コミュニケーションの革新を促進する現在の脳-音声技術を紹介する。また,音声合成作業において重要な役割を担っていると思われる,神経生理学的アクティベーションの基盤となる神経特徴や音声の埋め込みを包括的に分析する。 Brain-to-speech technology represents a fusion of interdisciplinary applications encompassing fields of artificial intelligence, brain-computer interfaces, and speech synthesis. Neural representation learning based intention decoding and speech synthesis directly connects the neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With the current discoveries on representation learning and the development of the speech synthesis technologies, direct translation of brain signals into speech has shown great promise. Especially, the processed input features and neural speech embeddings which are given to the neural network play a significant role in the overall performance when using deep generative models for speech generation from brain signals. In this paper, we introduce the current brain-to-speech technology with the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in non-verbal communication. Also, we perform comprehensive analysis on the neural features and neural speech embeddings underlying the neurophysiological activation while performing speech, which may play a significant role in the speech synthesis works.	翻訳日:2024-02-28 21:39:26 公開日:2024-02-27
# 量子物理学における創発時間と時間移動 Emergent Time and Time Travel in Quantum Physics ( http://arxiv.org/abs/2312.05202v2 ) ライセンス: Link先を確認	Ana Alonso-Serrano (Humboldt-Universität zu Berlin and Max-Planck-Institut für Gravitationsphysik, Potsdam), Sebastian Schuster (Charles University Prague), Matt Visser (Victoria University of Wellington)	(参考訳) タイムトラベルの可能性を得るには、基本物理学という重要な概念に必ず挑戦する。様々な確立された物理学の分野と異なる出発点を用いて、複数の論理矛盾を構築するのは比較的容易になる。時として、量子重力の完全な理論だけがこれらの論理的矛盾を解決できるという解釈がある。それでも、多くの問題が克服できるかどうかは不明だ。しかし、これは物理学における時間旅行の概念であるように思えるが、そのような量子重力への言及は、時間旅行に対するこれらの反論のほとんどに対して、長年にわたる挑戦を伴っている: これらの議論は時間に依存するが、量子重力は(明らかに)時間の問題に悩まされ、対処している。標準的な枠組みの中でこの問題に答えようとする試みの1つは、時間の概念としてページ・ウーターズ形式主義と最近のゲージ理論の再解釈をもたらした。ここでは、時間という創発的な概念が時間旅行の可能性について何を教えてくれるかを理解することを目的として、量子理論におけるハミルトンの制約を実装するおもちゃモデルを研究するプログラムを開始する。 Entertaining the possibility of time travel will invariably challenge dearly held concepts of fundamental physics. It becomes relatively easy to construct multiple logical contradictions using differing starting points from various well-established fields of physics. Sometimes, the interpretation is that only a full theory of quantum gravity will be able to settle these logical contradictions. Even then, it remains unclear if the multitude of problems could be overcome. Yet as definitive as this seems to the notion of time travel in physics, such a recourse to quantum gravity comes with its own, long-standing challenge to most of these counter-arguments to time travel: These arguments rely on time, while quantum gravity is (in)famously stuck with and dealing with the problem of time. One attempt to answer this problem within the canonical framework resulted in the Page-Wootters formalism, and its recent gauge-theoretic re-interpretation - as an emergent notion of time. 
Herein, we will begin a programme to study toy models implementing the Hamiltonian constraint in quantum theory, with an aim towards understanding what an emergent notion of time can tell us about the (im)possibility of time travel.	翻訳日:2024-02-28 21:39:08 公開日:2024-02-27
# AttriHuman-3D: 属性分解とインデックス化による編集可能な3次元アバター生成 AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing ( http://arxiv.org/abs/2312.02209v3 ) ライセンス: Link先を確認	Fan Yang, Tianyi Chen, Xiaosheng He, Zhongang Cai, Lei Yang, Si Wu, Guosheng Lin	(参考訳) ユーザインタラクション編集をサポートする編集可能な3D認識生成は、最近、急速な開発を目撃している。しかし、既存の編集可能な3d ganは高精度なローカル編集を達成できなかったり、膨大な計算コストを被ったりする。本稿では、上記の属性分解とインデックス化の問題に対処する編集可能な3次元人文生成モデルであるAttriHuman-3Dを提案する。提案モデルの中核となる考え方は、6つの特徴面を持つ全体属性空間において、すべての属性(人体、髪、衣服など)を生成し、それらを分解し、異なる属性インデックスで操作することである。生成した特徴平面から異なる属性の特徴を高精度に抽出するために,新しい属性索引法と直交射影正規化法を提案する。また,超ラテントトレーニング戦略と属性特異的サンプリング戦略を導入し,判別者からのスタイル絡み合いや誤解を招く罰を回避する。提案手法では, ユーザーが生成した3次元アバターの属性を対話的に編集し, 他者を固定する。質的かつ定量的な実験により,本モデルが異なる属性間の強い絡み合いを与え,精細な画像編集を可能にし,高品質な3dアバターを生成できることが証明された。 Editable 3D-aware generation, which supports user-interacted editing, has witnessed rapid development recently. However, existing editable 3D GANs either fail to achieve high-accuracy local editing or suffer from huge computational costs. We propose AttriHuman-3D, an editable 3D human generation model, which address the aforementioned problems with attribute decomposition and indexing. The core idea of the proposed model is to generate all attributes (e.g. human body, hair, clothes and so on) in an overall attribute space with six feature planes, which are then decomposed and manipulated with different attribute indexes. To precisely extract features of different attributes from the generated feature planes, we propose a novel attribute indexing method as well as an orthogonal projection regularization to enhance the disentanglement. We also introduce a hyper-latent training strategy and an attribute-specific sampling strategy to avoid style entanglement and misleading punishment from the discriminator. Our method allows users to interactively edit selected attributes in the generated 3D human avatars while keeping others fixed. 
Both qualitative and quantitative experiments demonstrate that our model provides a strong disentanglement between different attributes, allows fine-grained image editing and generates high-quality 3D human avatars.	翻訳日:2024-02-28 21:38:06 公開日:2024-02-27
# 量子極性計量学習: 古典的学習による量子埋め込み Quantum Polar Metric Learning: Efficient Classically Learned Quantum Embeddings ( http://arxiv.org/abs/2312.01655v3 ) ライセンス: Link先を確認	Vinayak Sharma and Aviral Shrivastava	(参考訳) deep metric learningは、最近、古典的なデータドメインで非常に有望な結果を示し、十分に分離された機能空間を作成しました。このアイデアは量子メトリックラーニング(QMeL)を通じて量子コンピュータにも適用された。 QMeLは、2段階のプロセスと古典的なモデルで構成され、データを圧縮して限られたキュービット数に収まるようにし、パラメータ化量子回路(PQC)を訓練してヒルベルト空間での分離を改善する。しかし、ノイズ中間スケール量子(NISQ)デバイス上では、QMeLソリューションは高い回路幅と深さをもたらし、どちらもスケーラビリティを制限している。量子極距離学習(QPMeL)を提案し,古典的モデルを用いて量子ビットの極形パラメータを学習する。次に、$R_y$と$R_z$の浅いPQCを使って状態を作り、$ZZ(\theta)$-gatesのトレーニング可能なレイヤで絡み合いを学習します。この回路は、古典的および量子的両方のコンポーネントをトレーニングするために使用される、提案したFidelity Triplet Loss関数のSWAPテストを通じて、フィデリティを計算する。 QMeLアプローチと比較して、QPMeLはゲート数と深さの1/2しか使用せず、3倍優れたマルチクラス分離を実現する。また、QPMeLは、同様の構成の古典的ネットワークよりも優れており、量子損失関数を持つ完全古典的モデルの将来的な研究の道筋を示す。 Deep metric learning has recently shown extremely promising results in the classical data domain, creating well-separated feature spaces. This idea was also adapted to quantum computers via Quantum Metric Learning (QMeL). QMeL is a two-step process: a classical model first compresses the data to fit into the limited number of qubits, and a Parameterized Quantum Circuit (PQC) is then trained to create better separation in Hilbert space. However, on Noisy Intermediate Scale Quantum (NISQ) devices, QMeL solutions result in high circuit width and depth, both of which limit scalability. We propose Quantum Polar Metric Learning (QPMeL) that uses a classical model to learn the parameters of the polar form of a qubit. We then utilize a shallow PQC with $R_y$ and $R_z$ gates to create the state and a trainable layer of $ZZ(\theta)$-gates to learn entanglement. The circuit also computes fidelity via a SWAP Test for our proposed Fidelity Triplet Loss function, used to train both classical and quantum components. When compared to QMeL approaches, QPMeL achieves 3X better multi-class separation, while using only half the number of gates and depth. 
We also demonstrate that QPMeL outperforms classical networks with similar configurations, presenting a promising avenue for future research on fully classical models with quantum loss functions.	翻訳日:2024-02-28 21:37:39 公開日:2024-02-27
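To make the QPMeL ingredients above concrete, here is a minimal classical sketch of a fidelity-based triplet loss for single-qubit states given in polar form. It is an illustration under assumptions, not the paper's method: the single-qubit restriction, the hinge form of the loss, the `margin` value, and all function names are hypothetical, and the SWAP test is only modelled through its ideal outcome probability, 1/2 + F/2.

```python
import cmath
import math


def qubit_state(theta, phi):
    """Amplitudes of cos(theta/2)|0> + e^{i*phi} sin(theta/2)|1>."""
    return (complex(math.cos(theta / 2)),
            cmath.exp(1j * phi) * math.sin(theta / 2))


def fidelity(a, b):
    """|<a|b>|^2 for two single-qubit pure states."""
    overlap = a[0].conjugate() * b[0] + a[1].conjugate() * b[1]
    return abs(overlap) ** 2


def swap_test_p0(a, b):
    """Probability of reading the ancilla as 0 in an ideal SWAP test."""
    return 0.5 + 0.5 * fidelity(a, b)


def fidelity_triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style loss (assumed form): push anchor-positive fidelity up
    and anchor-negative fidelity down until they differ by `margin`."""
    return max(0.0, fidelity(anchor, negative)
               - fidelity(anchor, positive) + margin)


anchor = qubit_state(0.0, 0.0)        # |0>
positive = qubit_state(0.2, 0.0)      # close to |0>
negative = qubit_state(math.pi, 0.0)  # |1>, orthogonal to the anchor
loss = fidelity_triplet_loss(anchor, positive, negative)
```

In this toy case the positive is nearly identical to the anchor and the negative is orthogonal to it, so the margin is already satisfied and the loss is zero. On hardware the fidelity would be estimated from repeated SWAP-test measurements rather than computed from amplitudes.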
# 観測不能条件下での因果フェアネス:ニューラル・センシティビティ・フレームワーク Causal Fairness under Unobserved Confounding: A Neural Sensitivity Framework ( http://arxiv.org/abs/2311.18460v2 ) ライセンス: Link先を確認	Maresa Schröder, Dennis Frauen, Stefan Feuerriegel	(参考訳) 機械学習の予測に対する公平さは、法的、倫理的、社会的理由のために広く求められている。既存の作業は、通常、観測されていない欠点のない設定に焦点を当てるが、観測されていない欠点は因果フェアネスを厳しく侵害し、したがって不公平な予測を引き起こす可能性がある。本研究では, 因果フェアネスの非観測的共振に対する感度を解析する。私たちの貢献は3倍です。第一に、異なる観測されていない共起の源の下で因果的公平度メトリクスの境界を導出する。これにより、実践者は、フェアネスクリティカルなアプリケーションで観測されていないコンファウンディングに対する機械学習モデルの感度を調べることができる。第2に,不測の一致によって因果的公平性が損なわれる可能性の最悪の場合の保証を可能にする,公平な予測学習のための新しいニューラルフレームワークを提案する。第3に,刑期予測に関する実世界のケーススタディを含む一連の実験において,この枠組みの有効性を実証する。私たちの知る限りでは、私たちの研究は観察できない一致の下で因果的公平性を研究する最初の研究です。この目的のために、我々の研究は、高スループットアプリケーションにおける予測の公平性を保証するための反論戦略として、直接的な実用的価値があります。 Fairness for machine learning predictions is widely required in practice for legal, ethical, and societal reasons. Existing work typically focuses on settings without unobserved confounding, even though unobserved confounding can lead to severe violations of causal fairness and, thus, unfair predictions. In this work, we analyze the sensitivity of causal fairness to unobserved confounding. Our contributions are three-fold. First, we derive bounds for causal fairness metrics under different sources of unobserved confounding. This enables practitioners to examine the sensitivity of their machine learning models to unobserved confounding in fairness-critical applications. Second, we propose a novel neural framework for learning fair predictions, which allows us to offer worst-case guarantees of the extent to which causal fairness can be violated due to unobserved confounding. Third, we demonstrate the effectiveness of our framework in a series of experiments, including a real-world case study about predicting prison sentences. To the best of our knowledge, ours is the first work to study causal fairness under unobserved confounding. 
To this end, our work is of direct practical value as a refutation strategy to ensure the fairness of predictions in high-stakes applications.	翻訳日:2024-02-28 21:37:10 公開日:2024-02-27
# DifFlow3D:反復拡散による不確実性を考慮したシーンフロー推定に向けて DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement ( http://arxiv.org/abs/2311.17456v2 ) ライセンス: Link先を確認	Jiuming Liu, Guangming Wang, Weicai Ye, Chaokang Jiang, Jinru Han, Zhe Liu, Guofeng Zhang, Dalong Du, Hesheng Wang	(参考訳) 動的シーンの点当たりの3次元変位を予測することを目的としたシーンフロー推定は,コンピュータビジョン分野の基本課題である。しかし,従来の研究は,局所的に制約された探索範囲による信頼できない相関や,粗い構造から生じる不正確な蓄積に苦慮することが多い。これらの問題を解決するために,拡散確率モデルを用いた新たな不確実性認識シーンフロー推定ネットワーク(difflow3d)を提案する。反復拡散に基づくリファインメントは、ダイナミクス、ノイズ入力、繰り返しパターンなど、挑戦的なケースに対する相関ロバスト性とレジリエンスを高めるように設計されている。生成の多様性を抑えるため,拡散モデルにおける3つの主要なフロー関連特徴を条件として利用した。さらに, 拡散中の不確かさ推定モジュールを開発し, 推定シーンフローの信頼性を評価する。 difflow3dはflyingthings3dとkitti 2015データセットでそれぞれ6.7\%と19.1\%のepe3d低減を実現しています。特に,本手法は,KITTIデータセット上での前例のないミリレベルの精度(EPE3Dでは0.0089m)を達成する。さらに,拡散型リファインメントパラダイムは,既存のシーンフローネットワークへのプラグアンドプレイモジュールとして容易に統合でき,推定精度が大幅に向上する。コードはhttps://github.com/IRMVLab/DifFlow3Dでリリースされる。 Scene flow estimation, which aims to predict per-point 3D displacements of dynamic scenes, is a fundamental task in the computer vision field. However, previous works commonly suffer from unreliable correlation caused by locally constrained searching ranges, and struggle with accumulated inaccuracy arising from the coarse-to-fine structure. To alleviate these problems, we propose a novel uncertainty-aware scene flow estimation network (DifFlow3D) with the diffusion probabilistic model. Iterative diffusion-based refinement is designed to enhance the correlation robustness and resilience to challenging cases, e.g., dynamics, noisy inputs, repetitive patterns, etc. To restrain the generation diversity, three key flow-related features are leveraged as conditions in our diffusion model. Furthermore, we also develop an uncertainty estimation module within diffusion to evaluate the reliability of estimated scene flow. 
Our DifFlow3D achieves state-of-the-art performance, with 6.7% and 19.1% EPE3D reduction on the FlyingThings3D and KITTI 2015 datasets, respectively. Notably, our method achieves an unprecedented millimeter-level accuracy (0.0089 m in EPE3D) on the KITTI dataset. Additionally, our diffusion-based refinement paradigm can be readily integrated as a plug-and-play module into existing scene flow networks, significantly increasing their estimation accuracy. Code will be released at https://github.com/IRMVLab/DifFlow3D.	翻訳日:2024-02-28 21:36:34 公開日:2024-02-27
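The EPE3D figures quoted in the DifFlow3D abstract above refer to the 3D end-point error, commonly defined as the mean Euclidean distance between predicted and ground-truth per-point flow vectors. The sketch below shows that definition in plain Python. The function name and the toy flow values are assumptions for illustration, and the benchmarks may apply additional masking or filtering that is not modelled here.

```python
import math


def epe3d(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth
    3D flow vectors, in the same units as the inputs (e.g. metres)."""
    if len(pred_flow) != len(gt_flow) or not pred_flow:
        raise ValueError("flows must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_flow, gt_flow))
    return total / len(pred_flow)


# Toy example (assumed data): two points with errors of 0.1 m and 0.5 m.
pred = [(0.1, 0.0, 0.0), (0.0, 0.3, 0.4)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error = epe3d(pred, gt)
```

With this definition, the reported 0.0089 m corresponds to an average per-point displacement error of just under 9 mm.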
# アモルファス酸化物トンネル接合の交流バイアスによる焼鈍 Alternating Bias Assisted Annealing of Amorphous Oxide Tunnel Junctions ( http://arxiv.org/abs/2401.07415v2 ) ライセンス: Link先を確認	David P. Pappas, Mark Field, Cameron Kopas, Joel A. Howard, Xiqiao Wang, Ella Lachman, Lin Zhou, Jinsu Oh, Kameshwar Yadavalli, Eyob A. Sete, Andrew Bestwick, Matthew J. Kramer and Joshua Y. Mutus	(参考訳) 熱酸化アモルファス酸化アルミニウムトンネル接合の電気的特性を制御的に調整するトランスフォーメーション技術を示す。従来の試験装置を用いて、加熱されたトンネル障壁に交互にバイアスを加えることで、室温抵抗の70%を超える巨大化を実現することができる。抵抗変化の速度は強い温度依存性を示し、サブミクロン系では接合サイズに依存しない。そのトンネル特性をmK温度で測定するために,この交互バイアス補助焼鈍法(ABAA)で処理したトランスモンクビット接合を特徴付ける。測定された周波数は、シフト抵抗と臨界電流の間のアンベガオカー-バラトフ関係に従う。さらに, 非処理試料と比較して, 共振・オフ共振・2レベル系の欠陥が有意に減少すると共に, 接合共振損失が約2 \times10^{-6}$の順に減少することを示した。高分解能TEMによるイメージングでは、バリアは依然として非晶質であり、未処理の接合に対するアルミニウムの配向がより均一に分布していることが示されている。この新しいアプローチは、アモルファス酸化アルミニウムや、現代のエレクトロニクスで使用される多くの金属-絶縁体-金属構造に依存する幅広いデバイスに適用できると期待されている。 We demonstrate a transformational technique for controllably tuning the electrical properties of fabricated thermally oxidized amorphous aluminum-oxide tunnel junctions. Using conventional test equipment to apply an alternating bias to a heated tunnel barrier, giant increases in the room temperature resistance, greater than 70%, can be achieved. The rate of resistance change is shown to be strongly temperature-dependent, and is independent of junction size in the sub-micron regime. In order to measure their tunneling properties at mK temperatures, we characterized transmon qubit junctions treated with this alternating-bias assisted annealing (ABAA) technique. The measured frequencies follow the Ambegaokar-Baratoff relation between the shifted resistance and critical current. Further, these studies show a reduction of junction-contributed loss on the order of $\approx 2 \times10^{-6}$, along with a significant reduction in resonant- and off-resonant-two level system defects when compared to untreated samples. 
Imaging with high-resolution TEM shows that the barrier is still predominantly amorphous with a more uniform distribution of aluminum coordination across the barrier relative to untreated junctions. This new approach is expected to be widely applicable to a broad range of devices that rely on amorphous aluminum oxide, as well as the many other metal-insulator-metal structures used in modern electronics.	翻訳日:2024-02-28 21:31:12 公開日:2024-02-27
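For readers unfamiliar with the Ambegaokar-Baratoff relation mentioned in the ABAA abstract above, the standard textbook forms are collected below. The symbols are assumptions introduced here, not notation from the paper: $R_n$ is the junction's normal-state resistance, $\Delta$ the superconducting gap, $E_J$ the Josephson energy, $E_C$ the charging energy, and $\Phi_0 = h/2e$ the flux quantum. The transmon frequency expression is the usual approximation for $E_J \gg E_C$.

```latex
\begin{align}
I_c R_n &= \frac{\pi \Delta(T)}{2e} \tanh\!\left(\frac{\Delta(T)}{2 k_B T}\right), \\
E_J &= \frac{\hbar I_c}{2e} = \frac{\Phi_0 I_c}{2\pi}, \\
f_{01} &\approx \frac{\sqrt{8 E_J E_C} - E_C}{h}.
\end{align}
```

Under these relations $I_c \propto 1/R_n$, so the leading term of $f_{01}$ scales as $R_n^{-1/2}$. As a rough worked example, assuming the normal-state resistance tracks the measured room-temperature resistance, a 70% resistance increase would scale $I_c$ by $1/1.7 \approx 0.59$ and the $\sqrt{8 E_J E_C}$ term by $1/\sqrt{1.7} \approx 0.77$.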
# 神経言語モデルの解剖学 Anatomy of Neural Language Models ( http://arxiv.org/abs/2401.03797v2 ) ライセンス: Link先を確認	Majd Saleh and Stéphane Paquelet	(参考訳) 生成AIと転移学習の分野は近年,特に自然言語処理(NLP)分野において顕著な進歩を遂げている。トランスフォーマーは、最先端のトランスフォーマーベース言語モデル(LM)が、様々な応用に新たな最先端の成果をもたらしたこれらの進歩の中心にある。ニューラルLMに関する研究は指数関数的に増加しているが、その大多数はハイレベルであり、自己完結にはほど遠い。したがって、この分野における文献の深い理解は、特にニューラルLMの主要なタイプを説明する統一された数学的枠組みが欠如している場合の難しい課題である。このチュートリアルでは、視覚的図形を伴って、より詳細に、単純化され、曖昧さのない数学的枠組みで、ニューラルLMを説明することが目的である。 BERT や GPT2 のような広く使われているモデルの具体例を探索する。最後に,言語モデリングを前提としたトランスフォーマーがコンピュータビジョンや時系列アプリケーションで広く採用されていることから,前述の領域でのトランスフォーマーの動作を読者が理解できるように,このようなソリューションのいくつかの例を,NLPのオリジナルと比較する。 The fields of generative AI and transfer learning have experienced remarkable advancements in recent years, especially in the domain of Natural Language Processing (NLP). Transformers have been at the heart of these advancements where the cutting-edge transformer-based Language Models (LMs) have led to new state-of-the-art results in a wide spectrum of applications. While the number of research works involving neural LMs is exponentially increasing, their vast majority are high-level and far from self-contained. Consequently, a deep understanding of the literature in this area is a tough task especially in the absence of a unified mathematical framework explaining the main types of neural LMs. We address the aforementioned problem in this tutorial where the objective is to explain neural LMs in a detailed, simplified and unambiguous mathematical framework accompanied by clear graphical illustrations. Concrete examples on widely used models like BERT and GPT2 are explored. Finally, since transformers pretrained on language-modeling-like tasks have been widely adopted in computer vision and time series applications, we briefly explore some examples of such solutions in order to enable readers to understand how transformers work in the aforementioned domains and compare this use with the original one in NLP.	翻訳日:2024-02-28 21:29:21 公開日:2024-02-27
# 分子動力学シミュレーションのための高精度力場の生成と分子構成変換器を用いた化学反応機構の研究 Generating High-Precision Force Fields for Molecular Dynamics Simulations to Study Chemical Reaction Mechanisms using Molecular Configuration Transformer ( http://arxiv.org/abs/2401.00499v2 ) ライセンス: Link先を確認	Sihao Yuan, Xu Han, Jun Zhang, Zhaoxin Xie, Cheng Fan, Yunlong Xiao, Yi Qin Gao, Yi Issac Yang	(参考訳) 化学反応機構の理論的研究は有機化学において重要である。伝統的に、量子化学計算を用いた化学反応の遷移状態の手動構成の分子コンフォメーションを計算することが最も一般的に用いられる方法である。しかし、この方法は個々の経験と化学直観に大きく依存している。これまでの研究では,分子動力学シミュレーションにおいて拡張サンプリングを用いて化学反応を研究する研究パラダイムを提案した。このアプローチは化学反応の全過程を直接シミュレートすることができる。しかし、計算速度はシミュレーションにおける高精度ポテンシャルエネルギー関数の使用を制限する。本稿では,従来開発されたグラフニューラルネットワークに基づく分子モデルである分子構成変換器を用いて,分子モデリングのための高精度な力場を訓練する手法を提案する。このポテンシャルエネルギー関数は、低い計算コストで高精度なシミュレーションを可能にし、化学反応のメカニズムをより正確に計算することができる。マンガン触媒を用いたクレイゼン再配置反応とカルボニル挿入反応の研究に本手法を適用した。 Theoretical studies on chemical reaction mechanisms have been crucial in organic chemistry. Traditionally, calculating the manually constructed molecular conformations of transition states for chemical reactions using quantum chemical calculations is the most commonly used method. However, this way is heavily dependent on individual experience and chemical intuition. In our previous study, we proposed a research paradigm that uses enhanced sampling in molecular dynamics simulations to study chemical reactions. This approach can directly simulate the entire process of a chemical reaction. However, the computational speed limits the use of high-precision potential energy functions for simulations. To address this issue, we present a scheme for training high-precision force fields for molecular modeling using a previously developed graph-neural-network-based molecular model, molecular configuration transformer. This potential energy function allows for highly accurate simulations at a low computational cost, leading to more precise calculations of the mechanism of chemical reactions. 
We applied this approach to study a Claisen rearrangement reaction and a Carbonyl insertion reaction catalyzed by Manganese.	翻訳日:2024-02-28 21:28:36 公開日:2024-02-27
# 逆転送多目的最適化 Inverse Transfer Multiobjective Optimization ( http://arxiv.org/abs/2312.14713v3 ) ライセンス: Link先を確認	Jiao Liu, Abhishek Gupta, and Yew-Soon Ong	(参考訳) 転送最適化により、関連するソースタスクからの経験的事前情報を活用することで、ターゲットタスクのデータ効率の最適化が可能になる。これは、厳密な評価予算の下で一連のトレードオフソリューションを求める多目的最適化設定において特に有用である。本稿では,多目的最適化における逆移動の概念を紹介する。逆伝達は、目的空間のパフォーマンスベクトルをタスク固有の決定空間における集団探索分布にマッピングするために確率的逆モデルを用いることで際立っている。このアイデアに基づいて,InvTrEMO(Inverse Transfer Multiobjective Evolutionary Optimizer)を提案する。 invtremoの重要な特徴は、意思決定空間がタスク間で正確に一致していない場合でも、多くのアプリケーション領域で広く使われている共通の客観的関数を利用する能力である。これにより、invTrEMOは異種ソースタスクからの情報をユニークかつ効果的に利用することができる。さらに、invTrEMOは、高精度の逆モデルを重要な副産物として提供し、ユーザの好みに基づいて、オンデマンドで調整されたソリューションを生成する。多目的および多目的ベンチマーク問題に関する実証研究は、実例研究と同様に、最先端の進化的およびベイズ最適化アルゴリズムと比較して、invTrEMOの高速収束率とモデリング精度を示す。 invTrEMOのソースコードはhttps://github.com/LiuJ-2023/invTrEMOで公開されている。 Transfer optimization enables data-efficient optimization of a target task by leveraging experiential priors from related source tasks. This is especially useful in multiobjective optimization settings where a set of trade-off solutions is sought under tight evaluation budgets. In this paper, we introduce a novel concept of inverse transfer in multiobjective optimization. Inverse transfer stands out by employing probabilistic inverse models to map performance vectors in the objective space to population search distributions in task-specific decision space, facilitating knowledge transfer through objective space unification. Building upon this idea, we introduce the first Inverse Transfer Multiobjective Evolutionary Optimizer (invTrEMO). A key highlight of invTrEMO is its ability to harness the common objective functions prevalent in many application areas, even when decision spaces do not precisely align between tasks. This allows invTrEMO to uniquely and effectively utilize information from heterogeneous source tasks as well. 
Furthermore, invTrEMO yields high-precision inverse models as a significant byproduct, enabling the generation of tailored solutions on-demand based on user preferences. Empirical studies on multi- and many-objective benchmark problems, as well as a practical case study, showcase the faster convergence rate and modelling accuracy of the invTrEMO relative to state-of-the-art evolutionary and Bayesian optimization algorithms. The source code of the invTrEMO is made available at https://github.com/LiuJ-2023/invTrEMO.	翻訳日:2024-02-28 21:27:04 公開日:2024-02-27
# リモートセンシング画像セグメント参照のための回転マルチスケールインタラクションネットワーク Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation ( http://arxiv.org/abs/2312.12470v2 ) ライセンス: Link先を確認	Sihan Liu, Yiwei Ma, Xiaoqing Zhang, Haowei Wang, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji	(参考訳) Referring Remote Sensing Image Segmentation (RRSIS)は、コンピュータビジョンと自然言語処理を組み合わせた新しい課題であり、テキストクエリによって記述された、空中画像の特定の領域を記述している。従来の参照画像セグメンテーション(RIS)アプローチは、空中画像に見られる複雑な空間スケールと向きによって妨げられ、最適部分セグメンテーションの結果をもたらす。これらの課題に対処するために、RRSISのユニークな要求に対応する革新的なアプローチであるRotated Multi-Scale Interaction Network (RMSIN)を導入する。 RMSINは、複数のスケールで必要とされる細かな詳細に効果的に対処するために、IIM(Intra-scale Interaction Module)と、これらの詳細をネットワーク全体に整合的に統合するためのCIM(Cross-scale Interaction Module)を組み込んでいる。さらに、RMSINは適応回転畳み込み(ARC)を用いて、オブジェクトの様々な向きを考慮し、セグメント化の精度を大幅に向上させる新しいコントリビューションである。 RMSINの有効性を評価するため、17,402個の画像キャプションマスクトレーレットからなる拡張データセットをキュレートした。このデータセットは、幅広い空間シナリオと回転シナリオを持つモデルを示すだけでなく、RRSISタスクの厳密なベンチマークを確立し、厳密な性能評価を保証する。実験結果から,rmsinの性能は従来の最先端モデルをかなり上回っており,その性能は極めて高いことが示された。すべてのデータセットとコードはhttps://github.com/Lsan2401/RMSINで公開されている。 Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing, delineating specific regions in aerial images as described by textual queries. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery, leading to suboptimal segmentation results. To address these challenges, we introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS. RMSIN incorporates an Intra-scale Interaction Module (IIM) to effectively address the fine-grained detail required at multiple scales and a Cross-scale Interaction Module (CIM) for integrating these details coherently across the network. 
Furthermore, RMSIN employs an Adaptive Rotated Convolution (ARC) to account for the diverse orientations of objects, a novel contribution that significantly enhances segmentation accuracy. To assess the efficacy of RMSIN, we have curated an expansive dataset comprising 17,402 image-caption-mask triplets, which is unparalleled in terms of scale and variety. This dataset not only presents the model with a wide range of spatial and rotational scenarios but also establishes a stringent benchmark for the RRSIS task, ensuring a rigorous evaluation of performance. Our experimental evaluations demonstrate the exceptional performance of RMSIN, surpassing existing state-of-the-art models by a significant margin. All datasets and code are made available at https://github.com/Lsan2401/RMSIN.	翻訳日:2024-02-28 21:26:43 公開日:2024-02-27
# Killer Apps: 低速で大規模なAI兵器 Killer Apps: Low-Speed, Large-Scale AI Weapons ( http://arxiv.org/abs/2402.01663v2 ) ライセンス: Link先を確認	Philip Feldman, Aaron Dant, James R. Foulds	(参考訳) 人工知能(AI)と機械学習(ML)の加速は、OpenAI、Meta、Anthropicなどの組織による最先端の生成事前学習トランスフォーマー(GPT)モデルの開発によって強調され、戦争とセキュリティにおける新たな挑戦と機会を提示している。現在注目されているのは、武器システムにおけるAIの統合と、速度論的衝突における迅速な意思決定におけるその役割である。しかし、同様に重要だが見落とされがちな側面は、情報領域内のインターネットスケールにおけるAIベースの心理的操作の可能性である。これらの能力は、世界中の個人、組織、社会に重大な脅威をもたらす可能性がある。本稿では,AI兵器の概念,その展開,検出,潜在的な対策について検討する。 The accelerating advancements in Artificial Intelligence (AI) and Machine Learning (ML), highlighted by the development of cutting-edge Generative Pre-trained Transformer (GPT) models by organizations such as OpenAI, Meta, and Anthropic, present new challenges and opportunities in warfare and security. Much of the current focus is on AI's integration within weapons systems and its role in rapid decision-making in kinetic conflict. However, an equally important but often overlooked aspect is the potential of AI-based psychological manipulation at internet scales within the information domain. These capabilities could pose significant threats to individuals, organizations, and societies globally. This paper explores the concept of AI weapons, their deployment, detection, and potential countermeasures.	翻訳日:2024-02-28 21:21:22 公開日:2024-02-27
# Vabs-Netを用いた多レベルタンパク質プレトレーニング Multi-level protein pre-training with Vabs-Net ( http://arxiv.org/abs/2402.01481v3 ) ライセンス: Link先を確認	Jiale Zhao, Wanru Zhuang, Jia Song, Yaqi Li, Shuqi Lu	(参考訳) 近年、3次元構造に基づく事前学習タンパク質モデルの開発が急増しており、様々な下流タスクにおける事前学習タンパク質言語モデルに対する顕著な進歩を示している。しかし、既存の構造に基づく事前訓練モデルは、主に残基レベル、すなわちアルファ炭素原子に焦点を当て、一方側鎖原子のような他の原子を無視している。側鎖の原子は、例えば分子ドッキングのような多くの下流のタスクにも重要であるので、残基と原子レベルのタンパク質のモデリングが重要であると我々は主張する。それにもかかわらず、予備訓練中に残基と原子情報を鼻で組み合わせることは通常失敗する。そこで,本研究では,残差レベルの事前学習タスクを自明に表現し,残差表現を不十分に表現する,入力に原子構造が組み込まれて情報漏洩が発生する原因を明らかにする。この問題に対処するために,3次元タンパク質鎖上でのスパンマスク事前学習戦略を導入し,残基と原子の有意義な表現を学習する。これにより、さまざまな下流タスクに適したタンパク質表現を学ぶための、シンプルで効果的なアプローチがもたらされる。バインディングサイト予測と関数予測タスクに関する広範囲な実験結果から,提案手法が他の手法を大きく上回ることを示した。私たちのコードは公開されます。 In recent years, there has been a surge in the development of 3D structure-based pre-trained protein models, representing a significant advancement over pre-trained protein language models in various downstream tasks. However, most existing structure-based pre-trained models primarily focus on the residue level, i.e., alpha carbon atoms, while ignoring other atoms like side chain atoms. We argue that modeling proteins at both residue and atom levels is important since the side chain atoms can also be crucial for numerous downstream tasks, for example, molecular docking. Nevertheless, we find that naively combining residue and atom information during pre-training typically fails. We identify a key reason is the information leakage caused by the inclusion of atom structure in the input, which renders residue-level pre-training tasks trivial and results in insufficiently expressive residue representations. To address this issue, we introduce a span mask pre-training strategy on 3D protein chains to learn meaningful representations of both residues and atoms. This leads to a simple yet effective approach to learning protein representation suitable for diverse downstream tasks. 
Extensive experimental results on binding site prediction and function prediction tasks demonstrate our proposed pre-training approach significantly outperforms other methods. Our code will be made public.	翻訳日:2024-02-28 21:21:08 公開日:2024-02-27
# 分解不可能な性能尺度のための雑音ラベルからのマルチクラス学習 Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures ( http://arxiv.org/abs/2402.01055v2 ) ライセンス: Link先を確認	Mingyuan Zhang, Shivani Agarwal	(参考訳) 近年、ノイズラベルのあるデータから良い分類器を学ぶことに多くの関心が寄せられている。ノイズラベルから学習するほとんどの研究は、標準的な損失ベースの性能尺度に重点を置いている。しかし、多くの機械学習問題は、個々の例における損失の期待値や総和として表現できない分解不可能な性能尺度を使用する必要があり、例えば、クラス不均衡設定におけるH平均、Q平均、G平均、情報検索におけるMicro $F_1$などである。本稿では,単調凸と線形比という2つの広いクラスのマルチクラス分解不可能な性能尺度に対して,雑音ラベルから学習するアルゴリズムを設計する。これらのクラスは上記の例をすべて包含する。本研究は,Narasimhan et al. (2015) のフランク=ウルフ法とバイセクション法に基づく。どちらの場合も、広範に研究されているクラス条件ノイズモデルの族の下で、アルゴリズムのノイズ補正バージョンを開発する。我々はアルゴリズムに対するリグレット(過剰リスク)境界を与え、ノイズの多いデータで訓練されているにもかかわらず、その性能がクリーンな(ノイズのない)分布に関する最適性能に収束するという意味でベイズ一貫性を持つことを示す。本実験はラベルノイズの処理におけるアルゴリズムの有効性を示す。 There has been much interest in recent years in learning good classifiers from data with noisy labels. Most work on learning from noisy labels has focused on standard loss-based performance measures. However, many machine learning problems require using non-decomposable performance measures which cannot be expressed as the expectation or sum of a loss on individual examples; these include for example the H-mean, Q-mean and G-mean in class imbalance settings, and the Micro $F_1$ in information retrieval. In this paper, we design algorithms to learn from noisy labels for two broad classes of multiclass non-decomposable performance measures, namely, monotonic convex and ratio-of-linear, which encompass all the above examples. Our work builds on the Frank-Wolfe and Bisection based methods of Narasimhan et al. (2015). In both cases, we develop noise-corrected versions of the algorithms under the widely studied family of class-conditional noise models. We provide regret (excess risk) bounds for our algorithms, establishing that even though they are trained on noisy data, they are Bayes consistent in the sense that their performance converges to the optimal performance w.r.t. the clean (non-noisy) distribution. 
Our experiments demonstrate the effectiveness of our algorithms in handling label noise.	翻訳日:2024-02-28 21:20:33 公開日:2024-02-27
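The abstract above names the G-mean and H-mean as examples of non-decomposable measures. As a minimal sketch, assuming the common definitions (geometric and harmonic mean of the per-class recalls; the paper itself may state them as losses or with other conventions), they can be computed from a confusion matrix as follows. The function names are illustrative and do not come from the paper.

```python
import math


def per_class_recall(confusion):
    """confusion[i][j] = number of class-i examples predicted as class j."""
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def g_mean(confusion):
    """Geometric mean of per-class recalls (assumed definition)."""
    recalls = per_class_recall(confusion)
    return math.prod(recalls) ** (1.0 / len(recalls))


def h_mean(confusion):
    """Harmonic mean of per-class recalls (assumed definition)."""
    recalls = per_class_recall(confusion)
    if any(r == 0 for r in recalls):
        return 0.0
    return len(recalls) / sum(1.0 / r for r in recalls)
```

Neither value can be written as an average of per-example losses, because each depends on the whole confusion matrix; that is what "non-decomposable" refers to here.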
# 最適スパース生存木 Optimal Sparse Survival Trees ( http://arxiv.org/abs/2401.15330v2 ) ライセンス: Link先を確認	Rui Zhang, Rui Xin, Margo Seltzer, Cynthia Rudin	(参考訳) 解釈性は、医師、病院、製薬会社、バイオテクノロジー企業にとって、人間の健康に関わる高リスク問題の分析と意思決定に不可欠である。木に基づく手法は, 高い解釈性と複雑な関係を捉える能力から, 生存分析に広く採用されている。しかし、生存木を生成する既存の方法のほとんどはヒューリスティックな(貪欲な)アルゴリズムに依存しており、準最適なモデルを生成するリスクがある。我々は境界付き動的計画法によるアプローチを提案し, 多くの場合わずか数秒で、証明可能な最適性を持つスパース生存木モデルを見出す。 Interpretability is crucial for doctors, hospitals, pharmaceutical companies and biotechnology corporations to analyze and make decisions for high stakes problems that involve human health. Tree-based methods have been widely adopted for survival analysis due to their appealing interpretability and their ability to capture complex relationships. However, most existing methods to produce survival trees rely on heuristic (or greedy) algorithms, which risk producing sub-optimal models. We present a dynamic-programming-with-bounds approach that finds provably-optimal sparse survival tree models, frequently in only a few seconds.	翻訳日:2024-02-28 21:19:19 公開日:2024-02-27
# ricciフロー誘導オートエンコーダによる学習時間依存ダイナミクス Ricci flow-guided autoencoders in learning time-dependent dynamics ( http://arxiv.org/abs/2401.14591v4 ) ライセンス: Link先を確認	Andrew Gracyk	(参考訳) 本稿では,時間的非線形力学,特に偏微分方程式 (PDE) を学習するための多様体ベースのオートエンコーダ法を提案する。これはリッチフローを物理的に変形した設定でシミュレートすることで達成でき、多様体量はリッチフローが経験的に達成されるように一致させることができる。我々の方法論では、多様体は訓練手順の一部として学習されるので、理想的な測地は識別されうるが、進化は静的な方法よりも共役な潜在表現を同時に引き起こす。本稿では,周期性やランダム性,分布内誤差,外挿シナリオなどの望ましい特徴を包含するPDEを用いた数値実験について述べる。 We present a manifold-based autoencoder method for learning nonlinear dynamics in time, notably partial differential equations (PDEs), in which the manifold latent space evolves according to Ricci flow. This can be accomplished by simulating Ricci flow in a physics-informed setting, and manifold quantities can be matched so that Ricci flow is empirically achieved. With our methodology, the manifold is learned as part of the training procedure, so ideal geometries may be discerned, while the evolution simultaneously induces a more accommodating latent representation over static methods. We present our method on a range of numerical experiments consisting of PDEs that encompass desirable characteristics such as periodicity and randomness, remarking error on in-distribution and extrapolation scenarios.	翻訳日:2024-02-28 21:19:08 公開日:2024-02-27
# 右を見ることが正しいこともある:シーケンスラベリングのためのデコーダのみのLLMの能力の検討 Looking Right is Sometimes Right: Investigating the Capabilities of Decoder-only LLMs for Sequence Labeling ( http://arxiv.org/abs/2401.14556v2 ) ライセンス: Link先を確認	David Duki\'c, Jan \v{S}najder	(参考訳) マスク付き言語モデリング(MLM)に基づく事前学習言語モデルは、自然言語理解(NLU)タスクに優れている。微調整されたMLMベースのエンコーダは、同等の大きさの因果言語モデリングデコーダを一貫して上回るが、最近のデコーダのみの大規模言語モデル(LLM)は、より小さなMLMベースのエンコーダと同等に動作する。その性能は規模によって向上するが、LLMは情報抽出(IE)タスクにおける最先端の成果を達成できず、その多くがシーケンスラベリング(SL)として定式化されている。 LLMの貧弱なSL性能は因果マスキングに起因すると仮定する。因果マスキングは、現在のトークンの右側のトークンにモデルが注意を向けることを妨げる。しかし、SLにおけるLLMの性能を具体的にどのように、どの程度改善できるかは、まだ不明である。 LLMの微調整中に因果マスク(CM)を層ごとに除去することで,IEタスク上でのオープンLLMのSL性能を向上させる手法を検討する。このアプローチは、最先端のSLモデルと競合する性能向上をもたらし、全てのブロックからCMを除去した場合の結果に匹敵するか、それを上回る。この知見は多様なSLタスクで成り立ち,層依存のCM除去を行ったオープンLLMは,強力なMLMベースのエンコーダや命令調整LLMよりも優れていることを示した。 Pre-trained language models based on masked language modeling (MLM) excel in natural language understanding (NLU) tasks. While fine-tuned MLM-based encoders consistently outperform causal language modeling decoders of comparable size, recent decoder-only large language models (LLMs) perform on par with smaller MLM-based encoders. Although their performance improves with scale, LLMs fall short of achieving state-of-the-art results in information extraction (IE) tasks, many of which are formulated as sequence labeling (SL). We hypothesize that LLMs' poor SL performance stems from causal masking, which prevents the model from attending to tokens on the right of the current token. Yet, how exactly and to what extent LLMs' performance on SL can be improved remains unclear. We explore techniques for improving the SL performance of open LLMs on IE tasks by applying layer-wise removal of the causal mask (CM) during LLM fine-tuning. This approach yields performance gains competitive with state-of-the-art SL models, matching or outperforming the results of CM removal from all blocks. 
Our findings hold for diverse SL tasks, demonstrating that open LLMs with layer-dependent CM removal outperform strong MLM-based encoders and even instruction-tuned LLMs.	翻訳日:2024-02-28 21:18:54 公開日:2024-02-27
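To make the idea of layer-wise causal-mask removal concrete, here is a minimal sketch, assuming a decoder with a boolean attention mask per layer. The helper names and the choice of which layers to unmask are hypothetical; the abstract does not say which layers the authors unmask or how their implementation is structured.

```python
def attention_mask(seq_len, causal):
    """Return mask[i][j] = True if position i may attend to position j.

    With causal=True each token sees only itself and tokens to its left;
    with causal=False every token also sees tokens to its right.
    """
    return [[(j <= i) or not causal for j in range(seq_len)]
            for i in range(seq_len)]


def layerwise_masks(num_layers, seq_len, unmasked_layers):
    """Build one mask per layer, dropping the causal mask only in
    the layers listed in `unmasked_layers` (a hypothetical parameter)."""
    return [attention_mask(seq_len, causal=(layer not in unmasked_layers))
            for layer in range(num_layers)]
```

In a sequence-labeling setting, the unmasked layers let the representation of a token depend on the words that follow it, which is the capability the hypothesis in the abstract says causal masking takes away.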
# ChatGPTは顔バイオメトリクスにどれほど優れているか? 認識, ソフトバイオメトリクス, 説明可能性に関する最初の考察 How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability ( http://arxiv.org/abs/2401.13641v2 ) ライセンス: Link先を確認	Ivan DeAndres-Tame, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia	(参考訳) OpenAI が開発した GPT などの大規模言語モデル (LLM) は,すでに驚くべき結果を示し,社会の急速な変化をもたらした。これは ChatGPT のリリースによって強化され、この分野の経験を一切必要とせずに、誰でも簡単な会話形式で LLM とやり取りできるようになっている。その結果、ChatGPT は、コードや歌詞の作成、教育、バーチャルアシスタントなど、多くの異なるタスクに急速に適用され、訓練を受けていないタスク(ゼロショット学習)でも印象的な結果を示している。本研究の目的は,最近のGPT-4マルチモーダルLLMに基づくChatGPTの、顔バイオメトリクスのタスクに対する能力を探ることである。特に,顔認証,ソフトバイオメトリクス推定,結果の説明可能性などのタスクを実行するChatGPTの能力を分析した。ChatGPTは、人間が関わるシナリオにおける自動決定の説明可能性と透明性をさらに高めるために非常に有用となりうる。ChatGPTの性能とロバスト性を評価するため,一般的な公開ベンチマークを用いて実験を行い,その結果をこの分野の最先端手法と比較した。本研究で得られた結果は, 顔バイオメトリクス, 特に説明可能性の向上に対する ChatGPT などの LLM の可能性を示している。再現性のために、すべてのコードをGitHubで公開する。 Large Language Models (LLMs) such as GPT developed by OpenAI, have already shown astonishing results, introducing quick changes in our society. This has been intensified by the release of ChatGPT which allows anyone to interact in a simple conversational way with LLMs, without any experience in the field needed. As a result, ChatGPT has been rapidly applied to many different tasks such as code- and song-writer, education, virtual assistants, etc., showing impressive results for tasks for which it was not trained (zero-shot learning). The present study aims to explore the ability of ChatGPT, based on the recent GPT-4 multimodal LLM, for the task of face biometrics. In particular, we analyze the ability of ChatGPT to perform tasks such as face verification, soft-biometrics estimation, and explainability of the results. ChatGPT could be very valuable to further increase the explainability and transparency of automatic decisions in human scenarios. 
Experiments are carried out in order to evaluate the performance and robustness of ChatGPT, using popular public benchmarks and comparing the results with state-of-the-art methods in the field. The results achieved in this study show the potential of LLMs such as ChatGPT for face biometrics, especially to enhance explainability. For reproducibility reasons, we release all the code in GitHub.	翻訳日:2024-02-28 21:18:26 公開日:2024-02-27
# 強結合開量子系に対する非エルミート擬モード:アンラベリング、相関、熱力学 Non-Hermitian Pseudomodes for Strongly Coupled Open Quantum Systems: Unravelings, Correlations and Thermodynamics ( http://arxiv.org/abs/2401.11830v2 ) ライセンス: Link先を確認	Paul Menczel, Ken Funo, Mauro Cirio, Neill Lambert, and Franco Nori	(参考訳) 擬モードフレームワークは、非マルコフ環境に結合した開量子系の力学の正確な記述を提供する。この枠組みでは、開放系が時間局所マスター方程式に従う有限個の非物理的な擬モードに結合された等価なモデルにおいて、系に対する環境の影響を研究する。このマスター方程式が擬モード状態のエルミート性を保存する必要はないという洞察に基づいて、本稿では系の元のダイナミクスの正確な再現を保証するマスター方程式の最も一般的な条件を求める。一般化した手法は、例えば、有限温度の不足減衰環境をモデル化するのに必要となる擬モードの数を減少させることを実証する。また,マスター方程式を非エルミート状態の量子ジャンプ軌道へとアンラベリングする方法を提示し,容易に並列化可能なモンテカルロシミュレーションを用いることで数値計算における擬モード法の利用をさらに促進する。最後に、擬モードは、その非物理的性質にもかかわらず、系と熱浴の相関の生成や熱交換といった物理過程を研究することができる自然な描像を与えることを示す。したがって、この結果は、マルコフの弱結合限界から遠く離れた開量子系をよりよく理解するための、系と環境の相互作用に関する今後の研究の道を開く。 The pseudomode framework provides an exact description of the dynamics of an open quantum system coupled to a non-Markovian environment. Using this framework, the influence of the environment on the system is studied in an equivalent model, where the open system is coupled to a finite number of unphysical pseudomodes that follow a time-local master equation. Building on the insight that this master equation does not need to conserve the hermiticity of the pseudomode state, we here ask for the most general conditions on the master equation that guarantee the correct reproduction of the system's original dynamics. We demonstrate that our generalized approach decreases the number of pseudomodes that are required to model, for example, underdamped environments at finite temperature. We also provide an unraveling of the master equation into quantum jump trajectories of non-Hermitian states, which further facilitates the utilization of the pseudomode technique for numerical calculations by enabling the use of easily parallelizable Monte Carlo simulations. 
Finally, we show that pseudomodes, despite their unphysical nature, provide a natural picture in which physical processes, such as the creation of system-bath correlations or the exchange of heat, can be studied. Hence, our results pave the way for future investigations of the system-environment interaction leading to a better understanding of open quantum systems far from the Markovian weak-coupling limit.	翻訳日:2024-02-28 21:17:41 公開日:2024-02-27
# FAIR Enough: 大規模言語モデルのトレーニングにFAIR互換のデータセットをどのように開発し評価するか? FAIR Enough: How Can We Develop and Assess a FAIR-Compliant Dataset for Large Language Models' Training? ( http://arxiv.org/abs/2401.11033v3 ) ライセンス: Link先を確認	Shaina Raza, Shardul Ghuge, Chen Ding, Elham Dolatabadi, Deval Pandya	(参考訳) 大規模言語モデル(LLM)の急速な進化は、AI開発における倫理的考慮とデータの完全性の重要性を強調し、FAIR(Findable, Accessible, Interoperable, Reusable)データ原則の役割を強調している。これらの原則は長年、倫理データスチュワードシップの基盤となっているが、LLMトレーニングデータへの応用はそれほど一般的ではない。本研究は,既存文献のレビューから始まり,モデルトレーニングにおけるデータ管理における公平な原則の重要性を強調する。この基盤の上に構築され、FAIR原則をLLMトレーニングプロセスに組み込む新しいフレームワークを導入します。このアプローチの重要な側面は包括的なチェックリストであり、モデル開発ライフサイクルを通じて、研究者や開発者が公平なデータ原則を一貫して適用することを支援するように設計されている。我々のフレームワークの実践性と有効性は、バイアスを検出して低減するFAIR準拠のデータセットを作成するケーススタディによって実証される。このケーススタディは、我々のフレームワークの有用性を検証するだけでなく、LLMトレーニングにおけるより公平で透明で倫理的な実践のための新しいベンチマークを確立する。我々は、技術的に進歩し、倫理的に健全で、社会的に責任のあるAIモデルを促進する手段として、このフレームワークをコミュニティに提供する。 The rapid evolution of Large Language Models (LLMs) underscores the critical importance of ethical considerations and data integrity in AI development, emphasizing the role of FAIR (Findable, Accessible, Interoperable, Reusable) data principles. While these principles have long been a cornerstone of ethical data stewardship, their application in LLM training data is less prevalent, an issue our research aims to address. Our study begins with a review of existing literature, highlighting the significance of FAIR principles in data management for model training. Building on this foundation, we introduce a novel framework that incorporates FAIR principles into the LLM training process. A key aspect of this approach is a comprehensive checklist, designed to assist researchers and developers in consistently applying FAIR data principles throughout the model development lifecycle. The practicality and effectiveness of our framework are demonstrated through a case study that involves creating a FAIR-compliant dataset to detect and reduce biases. 
This case study not only validates the usefulness of our framework but also establishes new benchmarks for more equitable, transparent, and ethical practices in LLM training. We offer this framework to the community as a means to promote technologically advanced, ethically sound, and socially responsible AI models.	翻訳日:2024-02-28 21:17:20 公開日:2024-02-27
# 粗粒核融合によるrgb赤外物体検出の改善と除去 Removal and Selection: Improving RGB-Infrared Object Detection via Coarse-to-Fine Fusion ( http://arxiv.org/abs/2401.10731v2 ) ライセンス: Link先を確認	Tianyi Zhao, Maoxun Yuan, Xingxing Wei	(参考訳) 近年,可視光(RGB)と赤外線(IR)画像の物体検出が広く行われている。オブジェクト検出器は、RGBとIR画像の補完特性を活用して、昼夜の信頼性と堅牢な物体位置決めを提供する。既存の融合戦略は、RGBとIR画像を畳み込みニューラルネットワークに直接注入し、検出性能が劣る。 RGB と IR の特徴はモーダリティ特有のノイズを持っているため、これらの戦略は伝搬とともに融合した特徴を悪化させる。人間の脳がマルチモーダル情報を処理するメカニズムに触発され、この研究は2つのモダリティの特徴を精製し融合するための新しい粗い視点を導入する。具体的には,各モダリティ内の干渉情報を粗末に除去する冗長スペクトル除去モジュールと,特徴融合に必要な特徴を微細に選択する動的特徴選択モジュールを設計した。粗大な核融合戦略の有効性を検証するため,除去・選択検出器 (RSDet) と呼ばれる新しい物体検出器を構築した。 3つのRGB-IRオブジェクト検出データセットの大規模な実験により,本手法の優れた性能が検証された。 Object detection in visible (RGB) and infrared (IR) images has been widely applied in recent years. Leveraging the complementary characteristics of RGB and IR images, the object detector provides reliable and robust object localization from day to night. Existing fusion strategies directly inject RGB and IR images into convolution neural networks, leading to inferior detection performance. Since the RGB and IR features have modality-specific noise, these strategies will worsen the fused features along with the propagation. Inspired by the mechanism of human brain processing multimodal information, this work introduces a new coarse-to-fine perspective to purify and fuse two modality features. Specifically, following this perspective, we design a Redundant Spectrum Removal module to coarsely remove interfering information within each modality and a Dynamic Feature Selection module to finely select the desired features for feature fusion. To verify the effectiveness of the coarse-to-fine fusion strategy, we construct a new object detector called Removal and Selection Detector (RSDet). Extensive experiments on three RGB-IR object detection datasets verify the superior performance of our method.	翻訳日:2024-02-28 21:16:55 公開日:2024-02-27
# Data-to-Text NLGのシステムレビュー A Systematic Review of Data-to-Text NLG ( http://arxiv.org/abs/2402.08496v3 ) ライセンス: Link先を確認	Chinonso Cynthia Osuji, Thiago Castro Ferreira, Brian Davis	(参考訳) この体系的なレビューは、データからテキストへの生成、分野におけるギャップ、課題、今後の方向性に関する現在の研究を包括的に分析している。本分野におけるデータセット,評価指標,応用領域,多言語主義,言語モデル,幻覚緩和手法に関する文献を概説する。データ対テキスト生成における幻覚の課題に対処し,高品質テキストを生成する様々な方法を検討した。これらの手法には、リグレード、従来型およびニューラルパイプラインアーキテクチャ、プランニングアーキテクチャ、データクリーニング、制御された生成、モデルとトレーニングテクニックの変更が含まれる。その効果と限界は評価され、幻覚を緩和するための普遍的な適用戦略の必要性が強調される。レビューでは、自動評価と人的評価の両方に重点を置いて、評価指標とともにデータセットの使用、人気、影響についても検討している。さらに,データ・ツー・テキストモデルの発展,特にトランスフォーマーモデルの普及について述べる。テキスト品質の進歩にもかかわらず、このレビューは、低リソース言語における研究の重要性と、排他性を促進するためにこれらの言語におけるデータセットのエンジニアリングを強調している。最後に、データ・ツー・テキストのアプリケーションドメインのいくつかが強調され、そのようなドメインとの関連性が強調される。全体として、このレビューはイノベーションを促進し、データからテキストへの生成を促進するための指針となる。 This systematic review undertakes a comprehensive analysis of current research on data-to-text generation, identifying gaps, challenges, and future directions within the field. Relevant literature in this field on datasets, evaluation metrics, application areas, multilingualism, language models, and hallucination mitigation methods is reviewed. Various methods for producing high-quality text are explored, addressing the challenge of hallucinations in data-to-text generation. These methods include re-ranking, traditional and neural pipeline architecture, planning architectures, data cleaning, controlled generation, and modification of models and training techniques. Their effectiveness and limitations are assessed, highlighting the need for universally applicable strategies to mitigate hallucinations. The review also examines the usage, popularity, and impact of datasets, alongside evaluation metrics, with an emphasis on both automatic and human assessment. Additionally, the evolution of data-to-text models, particularly the widespread adoption of transformer models, is discussed. 
Despite advancements in text quality, the review emphasizes the importance of research in low-resourced languages and the engineering of datasets in these languages to promote inclusivity. Finally, several application domains of data-to-text are highlighted, emphasizing their relevance in such domains. Overall, this review serves as a guiding framework for fostering innovation and advancing data-to-text generation.	翻訳日:2024-02-28 21:11:36 公開日:2024-02-27
# TriAug: 超音波における不均衡な乳腺病変の分布外検出 TriAug: Out-of-Distribution Detection for Imbalanced Breast Lesion in Ultrasound ( http://arxiv.org/abs/2402.07452v2 ) ライセンス: Link先を確認	Yinyu Ye, Shijing Chen, Dong Ni, Ruobing Huang	(参考訳) 乳腺病変の組織学的亜型のような異なる疾患は、発生頻度が著しく異なる。大量のID(In-distribution)データで訓練されたモデルでも、臨床現場では未知のクラスに属するOOD(Out-of-distribution)サンプルに遭遇することが多い。そこで本研究では,乳房超音波画像に対するロングテールOOD検出タスクに基づく新しい枠組みを提案する。この枠組みは、有望なOOD検出性能を維持しながらID分類精度を向上させる三重項状態拡張(TriAug)を備える。また、クラス不均衡問題を扱うために、バランスの取れた球損失を設計した。実験の結果、このモデルはID分類(F1-score=42.12%)とOOD検出(AUROC=78.06%)の両方において最先端のOODアプローチより優れていることが示された。 Different diseases, such as histological subtypes of breast lesions, have severely varying incidence rates. Even when trained with a substantial amount of in-distribution (ID) data, models often encounter out-of-distribution (OOD) samples belonging to unseen classes in clinical reality. To address this, we propose a novel framework built upon a long-tailed OOD detection task for breast ultrasound images. It is equipped with a triplet state augmentation (TriAug) which improves ID classification accuracy while maintaining a promising OOD detection performance. Meanwhile, we designed a balanced sphere loss to handle the class imbalanced problem. Experimental results show that the model outperforms state-of-the-art OOD approaches both in ID classification (F1-score=42.12%) and OOD detection (AUROC=78.06%).	翻訳日:2024-02-28 21:10:17 公開日:2024-02-27
# MIGC:テキスト・画像合成のためのマルチインスタンス生成制御 MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis ( http://arxiv.org/abs/2402.05408v2 ) ライセンス: Link先を確認	Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang	(参考訳) 本稿では,1枚の画像内で多様な制御の下に複数のインスタンスを同時に生成するマルチインスタンス生成(MIG)タスクを提案する。このタスクでは、事前に定義された座標とその対応する記述が与えられたとき、生成されたインスタンスが指定された位置に正確に配置され、すべてのインスタンスの属性が対応する記述に準拠することを保証する。これにより、単一インスタンス生成に関する現在の研究の範囲が拡大され、より多様で実用的な次元に引き上げられる。分割統治の考え方に着想を得て,MIGタスクの課題に対処するため,MIGC(Multi-Instance Generation Controller)という革新的なアプローチを導入する。まず、MIGタスクを、それぞれが単一インスタンスのシェーディングを扱ういくつかのサブタスクに分割する。各インスタンスの正確なシェーディングを確保するために,インスタンス強化注意機構を導入する。最後に、すべてのシェーディング済みインスタンスを集約し、Stable Diffusion(SD)において複数のインスタンスを正確に生成するために必要な情報を提供する。 MIGタスクにおける生成モデルの性能を評価するため、COCO-MIGベンチマークと評価パイプラインを提供する。提案したCOCO-MIGベンチマークおよび様々な一般的なベンチマークで大規模な実験を行った。評価結果は、数量、位置、属性、および相互作用の観点から、我々のモデルの卓越した制御能力を示す。コードとデモはhttps://migcproject.github.io/で公開される。 We present a Multi-Instance Generation (MIG) task, simultaneously generating multiple instances with diverse controls in one image. Given a set of predefined coordinates and their corresponding descriptions, the task is to ensure that generated instances are accurately at the designated locations and that all instances' attributes adhere to their corresponding description. This broadens the scope of current research on Single-instance generation, elevating it to a more versatile and practical dimension. Inspired by the idea of divide and conquer, we introduce an innovative approach named Multi-Instance Generation Controller (MIGC) to address the challenges of the MIG task. Initially, we break down the MIG task into several subtasks, each involving the shading of a single instance. To ensure precise shading for each instance, we introduce an instance enhancement attention mechanism. Lastly, we aggregate all the shaded instances to provide the necessary information for accurately generating multiple instances in stable diffusion (SD). 
To evaluate how well generation models perform on the MIG task, we provide a COCO-MIG benchmark along with an evaluation pipeline. Extensive experiments were conducted on the proposed COCO-MIG benchmark, as well as on various commonly used benchmarks. The evaluation results illustrate the exceptional control capabilities of our model in terms of quantity, position, attribute, and interaction. Code and demos will be released at https://migcproject.github.io/.	翻訳日:2024-02-28 21:09:59 公開日:2024-02-27
# 潜伏心理学の探究としての大規模言語モデル Large language models as probes into latent psychology ( http://arxiv.org/abs/2402.04470v2 ) ライセンス: Link先を確認	Zhicheng Lin	(参考訳) aiの進歩は、言語モデルの誤用を人間の心や参加者のスタンインとして招き、これらの統計アルゴリズムを根本的に誤用する。我々は、言語モデルが柔軟なシミュレーションツールとして受け入れられるべきであり、人間の言語データで明らかな幅広い行動、視点、心理的属性を模倣できるが、モデル自体が人間の心と同等あるいは擬人化されるべきではないと主張する。 Advances in AI invite the misuse of language models as stand-ins for human minds or participants, which fundamentally mischaracterizes these statistical algorithms. We argue that language models should be embraced as flexible simulation tools, able to mimic a wide range of behaviors, perspectives, and psychological attributes evident in human language data, but the models themselves should not be equated to or anthropomorphized as human minds.	翻訳日:2024-02-28 21:09:30 公開日:2024-02-27
# HarmBench: 自動レッドチーミングとロバストな拒否のための標準化された評価フレームワーク HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ( http://arxiv.org/abs/2402.04249v2 ) ライセンス: Link先を確認	Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks	(参考訳) 自動レッドチーミングは、大規模言語モデル(LLM)の悪意ある使用に伴うリスクを発見・緩和する上で大きな可能性を持っているが、この分野には新しい手法を厳密に評価するための標準化された評価フレームワークが欠如している。この問題に対処するために、自動レッドチーミングのための標準化された評価フレームワークであるHarmBenchを紹介する。レッドチーミング評価でこれまで考慮されていなかったいくつかの望ましい特性を特定し、これらの基準を満たすようにHarmBenchを体系的に設計する。 HarmBenchを用いて18のレッドチーミング手法と33の対象LLMおよび防御手法を大規模に比較し,新たな知見を得た。また,幅広い攻撃に対するLLMのロバスト性を大幅に向上させる高効率な敵対的訓練手法を提案し,HarmBenchが攻撃と防御の共同開発を可能にすることを実証する。私たちはHarmBenchをhttps://github.com/centerforaisafety/HarmBenchでオープンソースにしています。 Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. We identify several desirable properties previously unaccounted for in red teaming evaluations and systematically design HarmBench to meet these criteria. Using HarmBench, we conduct a large-scale comparison of 18 red teaming methods and 33 target LLMs and defenses, yielding novel insights. We also introduce a highly efficient adversarial training method that greatly enhances LLM robustness across a wide range of attacks, demonstrating how HarmBench enables codevelopment of attacks and defenses. We open source HarmBench at https://github.com/centerforaisafety/HarmBench.	翻訳日:2024-02-28 21:09:21 公開日:2024-02-27
# ANLS* -- 生成型大規模言語モデルのためのユニバーサルドキュメント処理メトリック ANLS* -- A Universal Document Processing Metric for Generative Large Language Models ( http://arxiv.org/abs/2402.03848v2 ) ライセンス: Link先を確認	David Peer, Philemon Sch\"opf, Volckmar Nebendahl, Alexander Rietzler, Sebastian Stabinger	(参考訳) 伝統的に、識別モデルが文書分類や情報抽出といったタスクの主要な選択肢となっている。これらのモデルは、限られた数の定義済みクラスに該当する予測を行うため、真か偽かの二値評価が容易であり、F1スコアのようなメトリクスを直接計算できる。しかし、生成型大規模言語モデル(GLLM)の最近の進歩により、ゼロショット能力が強化され、下流データセットと計算コストの高い微調整が不要になったため、この分野に変化が生じている。一方で、識別モデルに使用される真偽の二値評価は GLLM の予測には適用できないため、GLLM の評価は課題となる。本稿では,情報抽出や分類タスクを含む幅広いタスクを評価するために,ANLS*と呼ばれる生成モデルのための新しいメトリクスを提案する。 ANLS*メトリックは、既存のANLSメトリクスをドロップイン置換として拡張し、以前報告されたANLSスコアとの互換性を保つ。また、ANLS*メトリックを用いて、7つの異なるデータセットと3つの異なるGLLMの評価を行い、提案手法の重要性を示した。また、SFTと呼ばれる文書のプロンプトを生成する新しい手法を、LATINなどの他のプロンプト技術に対してベンチマークする。 21例中15例では、SFTは他のテクニックよりも優れており、最先端を改善し、場合によっては最大15パーセントポイント上回る。ソースはhttps://github.com/deepopinion/anls_star_metricにある。 Traditionally, discriminative models have been the predominant choice for tasks like document classification and information extraction. These models make predictions that fall into a limited number of predefined classes, facilitating a binary true or false evaluation and enabling the direct calculation of metrics such as the F1 score. However, recent advancements in generative large language models (GLLMs) have prompted a shift in the field due to their enhanced zero-shot capabilities, which eliminate the need for a downstream dataset and computationally expensive fine-tuning. However, evaluating GLLMs presents a challenge as the binary true or false evaluation used for discriminative models is not applicable to the predictions made by GLLMs. This paper introduces a new metric for generative models called ANLS* for evaluating a wide variety of tasks, including information extraction and classification tasks. The ANLS* metric extends existing ANLS metrics as a drop-in-replacement and is still compatible with previously reported ANLS scores. 
An evaluation of 7 different datasets and 3 different GLLMs using the ANLS* metric is also provided, demonstrating the importance of the proposed metric. We also benchmark a novel approach to generate prompts for documents, called SFT, against other prompting techniques such as LATIN. In 15 out of 21 cases, SFT outperforms other techniques and improves the state-of-the-art, sometimes by as much as 15 percentage points. Sources are available at https://github.com/deepopinion/anls_star_metric	翻訳日:2024-02-28 21:09:04 公開日:2024-02-27
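For readers unfamiliar with the baseline that ANLS* extends, the following is a minimal sketch of the classic ANLS (Average Normalized Levenshtein Similarity) score for plain string answers, assuming the usual 0.5 threshold and case-insensitive comparison. It is not the ANLS* implementation from the linked repository, and it does not cover the structured outputs (such as lists or dictionaries) that a universal metric would have to handle; all function names are illustrative.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def nls(pred, truth, threshold=0.5):
    """Normalized Levenshtein similarity, zeroed when the strings differ too much."""
    pred, truth = pred.strip().lower(), truth.strip().lower()
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 1.0
    dist = levenshtein(pred, truth) / longest
    return 1.0 - dist if dist < threshold else 0.0


def anls(predictions, ground_truths):
    """Average, over questions, of the best similarity to any accepted answer."""
    scores = [max(nls(p, t) for t in truths)
              for p, truths in zip(predictions, ground_truths)]
    return sum(scores) / len(scores) if scores else 0.0
```

Unlike a binary exact-match check, this score gives partial credit for near misses such as OCR-style character errors, which is one reason it suits free-form generative output.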
# 言語モデルを用いて知りうることと知りえないことを区別する Distinguishing the Knowable from the Unknowable with Language Models ( http://arxiv.org/abs/2402.03563v2 ) ライセンス: Link先を確認	Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, Benjamin L. Edelman	(参考訳) 本研究では,自由形式テキスト上での大規模言語モデル(LLM)の出力において、偶然的不確実性(基礎となる分布のエントロピーを反映する)とは対照的な認識的不確実性(知識の欠如を反映する)を同定できるかどうかを検討した。正解となる確率が得られない状況で, あるLLMの不確実性を(近似的に)切り分けるために, はるかに大きなモデルを正解の代理とする設定を探究する。凍結した事前学習モデルの埋め込み上で訓練された小さな線形プローブは、より大きなモデルがトークンレベルでより確信を持つタイミングを正確に予測し、あるテキストドメインで訓練されたプローブが他のドメインにも一般化することを示す。さらに,同一タスクにおいて非自明な精度を実現する完全教師なし手法を提案する。これらの結果を総合して、LLMが異なる種類の不確実性の内部表現を自然に含んでいる証拠と解釈する。これは、様々な実践的な環境でモデルの信頼度に関するより有益な指標を考案するのに活用できる可能性がある。 We study the feasibility of identifying epistemic uncertainty (reflecting a lack of knowledge), as opposed to aleatoric uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text. In the absence of ground-truth probabilities, we explore a setting where, in order to (approximately) disentangle a given LLM's uncertainty, a significantly larger model stands in as a proxy for the ground truth. We show that small linear probes trained on the embeddings of frozen, pretrained models accurately predict when larger models will be more confident at the token level and that probes trained on one text domain generalize to others. Going further, we propose a fully unsupervised method that achieves non-trivial accuracy on the same task. Taken together, we interpret these results as evidence that LLMs naturally contain internal representations of different types of uncertainty that could potentially be leveraged to devise more informative indicators of model confidence in diverse practical settings.	翻訳日:2024-02-28 21:08:00 公開日:2024-02-27
# guard: 大規模な言語モデルのガイドライン準拠をテストするために、自然言語脱獄を生成するロールプレイング GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models ( http://arxiv.org/abs/2402.03299v2 ) ライセンス: Link先を確認	Haibo Jin, Ruoxi Chen, Andy Zhou, Jinyin Chen, Yang Zhang, Haohan Wang	(参考訳) 大規模言語モデル(LLM)の安全フィルタをバイパスする"jailbreaks"の発見と有害な応答により、コミュニティは安全対策を実施するようになった。主要な安全対策の1つは、リリース前にLLMをジェイルブレイクで積極的にテストすることである。そのため、このようなテストはジェイルブレイクを大量かつ効率的に生成できる方法を必要とする。本稿では,人間の世代のスタイルでジェイルブレイクを発生させる新奇かつ直感的な戦略について述べる。我々は,新しいジェイルブレイクに協力するために,4つの異なる役割をユーザLLMに割り当てるロールプレイングシステムを提案する。さらに、既存のジェイルブレイクを収集し、クラスタリング周波数と文による意味パターンを用いて、異なる独立した特徴に分割する。これらの特徴を知識グラフに整理し、よりアクセスしやすく、検索しやすくします。我々の異なる役割のシステムは、この知識グラフを利用して新しいジェイルブレイクを生成するが、これはLLMを非倫理的またはガイドライン違反の応答を生成するのに有効である。さらに,llmがガイドラインに従っているかどうかをテストするために,政府発行のガイドラインに従って自動的にジェイルブレイクを発生させるシステムの設定の先駆者でもある。本稿では,GUARD (Guideline Upholding through Adaptive Role-play Diagnostics) と呼ぶ。我々は,GUARDが3つの最先端オープンソースLLM(Vicuna-13B,LongChat-7B,Llama-2-7B)および広く利用されている商用LLM(ChatGPT)に対する有効性を実証的に検証した。さらに,我々の研究は視覚言語モデル(minigpt-v2とgemini vision pro)の領域にまで及んで,ガードの汎用性を示し,多様なモダリティにまたがってより安全で信頼性の高いllmベースのアプリケーションを開発する上で有用な洞察を与えています。 The discovery of "jailbreaks" to bypass safety filters of Large Language Models (LLMs) and harmful responses have encouraged the community to implement safety measures. One major safety measure is to proactively test the LLMs with jailbreaks prior to the release. Therefore, such testing will require a method that can generate jailbreaks massively and efficiently. In this paper, we follow a novel yet intuitive strategy to generate jailbreaks in the style of the human generation. We propose a role-playing system that assigns four different roles to the user LLMs to collaborate on new jailbreaks. Furthermore, we collect existing jailbreaks and split them into different independent characteristics using clustering frequency and semantic patterns sentence by sentence. 
We organize these characteristics into a knowledge graph, making them more accessible and easier to retrieve. Our system of different roles will leverage this knowledge graph to generate new jailbreaks, which have proved effective in inducing LLMs to generate unethical or guideline-violating responses. In addition, we also pioneer a setting in our system that will automatically follow the government-issued guidelines to generate jailbreaks to test whether LLMs follow the guidelines accordingly. We refer to our system as GUARD (Guideline Upholding through Adaptive Role-play Diagnostics). We have empirically validated the effectiveness of GUARD on three cutting-edge open-sourced LLMs (Vicuna-13B, LongChat-7B, and Llama-2-7B), as well as a widely-utilized commercial LLM (ChatGPT). Moreover, our work extends to the realm of vision language models (MiniGPT-v2 and Gemini Vision Pro), showcasing GUARD's versatility and contributing valuable insights for the development of safer, more reliable LLM-based applications across diverse modalities.	翻訳日:2024-02-28 21:07:38 公開日:2024-02-27
# ISCUTE: テキスト埋め込みを用いたケーブルのインスタンスセグメンテーション ISCUTE: Instance Segmentation of Cables Using Text Embedding ( http://arxiv.org/abs/2402.11996v2 ) ライセンス: Link先を確認	Shir Kozlovsky, Omkar Joglekar and Dotan Di Castro	(参考訳) ロボット工学と自動化の分野では、電線やケーブル、柔軟なチューブといった変形可能な線形オブジェクト(DLO)を知覚する上で、従来のオブジェクト認識とインスタンスセグメンテーションの方法が大きな課題に直面している。この課題は主に、形状、色、テクスチャといった明確な特性の欠如から生じており、正確な識別を達成するには専用の解決策が必要となる。本稿では,テキストでプロンプト可能でユーザフレンドリーな、基盤モデルに基づくDLOインスタンスセグメンテーション手法を提案する。具体的には,CLIPSegモデルのテキスト条件付きセマンティックセグメンテーション機能とSegment Anything Model (SAM)のゼロショット一般化機能を組み合わせた。本手法はDLOインスタンスセグメンテーションにおいてSOTA性能を上回り,mIoU $91.21\%$ を達成することを示す。また、インスタンスセグメンテーションのためのリッチで多様なDLO特化データセットも導入する。 In the field of robotics and automation, conventional object recognition and instance segmentation methods face a formidable challenge when it comes to perceiving Deformable Linear Objects (DLOs) like wires, cables, and flexible tubes. This challenge arises primarily from the lack of distinct attributes such as shape, color, and texture, which calls for tailored solutions to achieve precise identification. In this work, we propose a foundation model-based DLO instance segmentation technique that is text-promptable and user-friendly. Specifically, our approach combines the text-conditioned semantic segmentation capabilities of CLIPSeg model with the zero-shot generalization capabilities of Segment Anything Model (SAM). We show that our method exceeds SOTA performance on DLO instance segmentation, achieving a mIoU of $91.21\%$. We also introduce a rich and diverse DLO-specific dataset for instance segmentation.	翻訳日:2024-02-28 21:02:16 公開日:2024-02-27
# PolypNextLSTM:ConvNextとConvLSTMを用いた軽量かつ高速なPolypビデオセグメンテーションネットワーク PolypNextLSTM: A lightweight and fast polyp video segmentation network using ConvNext and ConvLSTM ( http://arxiv.org/abs/2402.11585v2 ) ライセンス: Link先を確認	Debayan Bhattacharya, Konrad Reuter, Finn Behrendnt, Lennart Maack, Sarah Grube, Alexander Schlaefer	(参考訳) ポリプセグメンテーションで一般的に用いられる単一の画像unetアーキテクチャは、ポリープの診断においてビデオデータから得られる時間的洞察が欠如している。臨床実践をより忠実に反映するために,提案手法であるPolypNextLSTMは,映像に基づく深層学習を活用し,時間的情報を利用して,最小パラメータオーバーヘッドでセグメンテーション性能を向上させる。 PolypNextLSTMは、UNetライクな構造で、ConvNext-Tinyをバックボーンとして、パラメータオーバーヘッドを減らすために、最後の2つのレイヤを戦略的に省略する。我々の時間融合モジュールであるConvLSTM(Convolutional Long Short Term Memory)は、時間的特徴を効果的に活用する。我々の主な特徴はPolypNextLSTMであり、パラメータの最もリーンで最速のモデルであり、5つの最先端の画像モデルとビデオベースのディープラーニングモデルの性能を上回っている。 sun-segデータセットの評価は、高速モーションやオクルージョンのような挑戦的なアーティファクトを含むビデオとともに、検出が容易で検出が難しいポリプシナリオにまたがる。 5つの画像ベースモデルと5つのビデオベースモデルを比較すると、PolypNextLSTMの優位性が示され、画像ベース PraNet (0.7519) とビデオベース PNSPlusNet (0.7486) を上回った。特にこのモデルは,ゴーストやオクルージョンなどの複雑なアーティファクトを特徴とするビデオに優れている。 Pruned ConvNext-TinyとConvLSTMを統合したPolypNextLSTMは、セグメンテーション性能が優れているだけでなく、評価モデルの中でも最高フレームを維持している。アクセスコード https://github.com/mtec-tuhh/polypnextlstm Commonly employed in polyp segmentation, single image UNet architectures lack the temporal insight clinicians gain from video data in diagnosing polyps. To mirror clinical practices more faithfully, our proposed solution, PolypNextLSTM, leverages video-based deep learning, harnessing temporal information for superior segmentation performance with the least parameter overhead, making it possibly suitable for edge devices. PolypNextLSTM employs a UNet-like structure with ConvNext-Tiny as its backbone, strategically omitting the last two layers to reduce parameter overhead. Our temporal fusion module, a Convolutional Long Short Term Memory (ConvLSTM), effectively exploits temporal features. 
Our primary novelty lies in PolypNextLSTM, which stands out as the leanest in parameters and the fastest model, surpassing the performance of five state-of-the-art image and video-based deep learning models. The evaluation on the SUN-SEG dataset spans easy-to-detect and hard-to-detect polyp scenarios, along with videos containing challenging artefacts like fast motion and occlusion. Comparison against five image-based and five video-based models demonstrates PolypNextLSTM's superiority: it achieves a Dice score of 0.7898 on the hard-to-detect polyp test set, surpassing image-based PraNet (0.7519) and video-based PNSPlusNet (0.7486). Notably, our model excels in videos featuring complex artefacts such as ghosting and occlusion. PolypNextLSTM, integrating pruned ConvNext-Tiny with ConvLSTM for temporal fusion, not only exhibits superior segmentation performance but also maintains the highest frames per second among evaluated models. Code is available at https://github.com/mtec-tuhh/PolypNextLSTM	翻訳日:2024-02-28 21:02:01 公開日:2024-02-27
# RLHFを用いた翻訳選好モデルの改良:コスト効果ソリューションへの一歩 Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution ( http://arxiv.org/abs/2402.11525v3 ) ライセンス: Link先を確認	Nuo Xu, Jun Zhao, Can Zu, Sixian Li, Lu Chen, Zhihao Zhang, Rui Zheng, Shihan Dou, Wenjuan Qin, Tao Gui, Qi Zhang, Xuanjing Huang	(参考訳) 忠実さ、表現力、優雅さは機械翻訳における絶え間ない追求である。しかし、‘textit{BLEU} のような伝統的なメトリクスは、翻訳品質の人間の好みと厳密に一致しない。本稿では,人間のフィードバックによる強化学習(\textit{RLHF})の活用による翻訳品質の向上について検討する。特に低リソース言語において、翻訳間の人的比較の大規模な高品質データセットを収集するのは自明ではない。この問題に対処するために,人間と機械の翻訳を区別して報酬モデルを最適化する,費用対効果の高い選好学習戦略を提案する。このようにして、報酬モデルは人間に比べて機械翻訳の欠陥を学習し、その後の機械翻訳の改善を導く。実験により, \textit{RLHF} は翻訳品質を効果的に向上し, この改善は, \textit{RLHF} で訓練されていない他の翻訳指導に有効であることが示された。さらなる分析は、モデルの言語能力が嗜好学習において重要な役割を果たすことを示している。強力な言語能力を持つ報酬モデルは、翻訳品質の微妙な違いをよりセンシティブに学習し、実際の人間の翻訳好みに合致することができる。 Faithfulness, expressiveness, and elegance is the constant pursuit in machine translation. However, traditional metrics like \textit{BLEU} do not strictly align with human preference of translation quality. In this paper, we explore leveraging reinforcement learning with human feedback (\textit{RLHF}) to improve translation quality. It is non-trivial to collect a large high-quality dataset of human comparisons between translations, especially for low-resource languages. To address this issue, we propose a cost-effective preference learning strategy, optimizing reward models by distinguishing between human and machine translations. In this manner, the reward model learns the deficiencies of machine translation compared to human and guides subsequent improvements in machine translation. Experimental results demonstrate that \textit{RLHF} can effectively enhance translation quality and this improvement benefits other translation directions not trained with \textit{RLHF}. Further analysis indicates that the model's language capabilities play a crucial role in preference learning. 
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality and align better with real human translation preferences.	翻訳日:2024-02-28 21:01:23 公開日:2024-02-27
# irfundusset:調和した健康ラベルを持つ統合網膜眼底データセット IRFundusSet: An Integrated Retinal Fundus Dataset with a Harmonized Healthy Label ( http://arxiv.org/abs/2402.11488v2 ) ライセンス: Link先を確認	P. Bilha Githinji, Keming Zhao, Jiantao Wang, Peiwu Qin	(参考訳) 眼の条件は世界的関心事であり、網膜底色写真を利用した計算ツールは定期的なスクリーニングと管理に役立つ。しかし、包括的かつ十分な大きさのデータセットを持つことは、人口統計学や取得のバリエーションに加えて、病理学における異質性を示す複雑な網膜基底体にとって自明ではない。さらに、公共空間における網膜眼底データセットは、データの組織化と健全な観察の定義において断片化に苦しむ。本稿では,複数の公開データセットを統合し,調和させ,キュレーションするデータセットである統合網膜底セット(irfundusset)を提案する。 IRFundusSetはPythonパッケージで構成されており、調和を自動化し、PyTorchアプローチに従ってデータセットオブジェクトを活用する。さらに、画像が物理的にレビューされ、健康観察の一貫した定義のために新しいis_normalラベルが注釈付けされる。 10の公開データセットが46064の画像で検討され、そのうち25406が新しいis_normalラベルのためにキュレートされ、3515はソース全体で健全であると考えられている。 Ocular conditions are a global concern and computational tools utilizing retinal fundus color photographs can aid in routine screening and management. Obtaining comprehensive and sufficiently sized datasets, however, is non-trivial for the intricate retinal fundus, which exhibits heterogeneities within pathologies, in addition to variations from demographics and acquisition. Moreover, retinal fundus datasets in the public space suffer fragmentation in the organization of data and definition of a healthy observation. We present Integrated Retinal Fundus Set (IRFundusSet), a dataset that consolidates, harmonizes and curates several public datasets, facilitating their consumption as a unified whole and with a consistent is_normal label. IRFundusSet comprises a Python package that automates harmonization and avails a dataset object in line with the PyTorch approach. Moreover, images are physically reviewed and a new is_normal label is annotated for a consistent definition of a healthy observation. Ten public datasets are initially considered with a total of 46064 images, of which 25406 are curated for a new is_normal label and 3515 are deemed healthy across the sources.	翻訳日:2024-02-28 21:01:02 公開日:2024-02-27
# FViT:Gaborフィルタを用いた音声ビジョン変換器 FViT: A Focal Vision Transformer with Gabor Filter ( http://arxiv.org/abs/2402.11303v2 ) ライセンス: Link先を確認	Yulong Shi, Mingwei Sun, Yongshuai Wang, Rui Wang, Hui Sun, Zengqiang Chen	(参考訳) ビジョントランスフォーマーは、様々なコンピュータビジョンタスクの進歩を奨励している。これは、機能トークン間のグローバルな依存関係のモデリングにおける自己注意の能力に起因している、というのが一般的な考えである。残念ながら、自己注意は、高い計算複雑性や望ましい帰納バイアスの欠如など、高密度な予測タスクにおけるいくつかの課題に直面している。これらの問題に対処するために,視覚変換器とGaborフィルタの統合による潜在的な利点を再検討し,畳み込みを用いた学習可能なGaborフィルタ(LGF)を提案する。自己注意の代替として,生体視覚系の単純細胞のイメージ入力に対する応答をシミュレートするためにLGFを用い,様々なスケールや方向からターゲットの識別的特徴表現に焦点を合わせるようモデルに促した。さらに,LGF に基づいた Bionic Focal Vision (BFV) ブロックを設計する。このブロックは神経科学からインスピレーションを受け、生物学的視覚野処理情報の動作方法を並列にエミュレートするMulti-Path Feed Forward Network (MPFFN)を導入している。さらに、BFVブロックを積み重ねることにより、Focal Vision Transformers (FViT) と呼ばれる統合的で効率的なピラミッドバックボーンネットワークファミリーを開発する。 FViTは様々な視覚タスクにおいて高い競争性能を示す。特に計算効率とスケーラビリティの面では、FViTは他と比較して大きな優位性を示している。コードはhttps://github.com/nkusyl/FViTで入手できる。 Vision transformers have achieved encouraging progress in various computer vision tasks. A common belief is that this is attributed to the competence of self-attention in modeling the global dependencies among feature tokens. Unfortunately, self-attention still faces some challenges in dense prediction tasks, such as the high computational complexity and absence of desirable inductive bias. To address these issues, we revisit the potential benefits of integrating vision transformer with Gabor filter, and propose a Learnable Gabor Filter (LGF) by using convolution. As an alternative to self-attention, we employ LGF to simulate the response of simple cells in the biological visual system to input images, prompting models to focus on discriminative feature representations of targets from various scales and orientations. Additionally, we design a Bionic Focal Vision (BFV) block based on the LGF. 
This block draws inspiration from neuroscience and introduces a Multi-Path Feed Forward Network (MPFFN) to emulate the way the biological visual cortex processes information in parallel. Furthermore, we develop a unified and efficient pyramid backbone network family called Focal Vision Transformers (FViTs) by stacking BFV blocks. Experimental results show that FViTs exhibit highly competitive performance in various vision tasks. In terms of computational efficiency and scalability in particular, FViTs show significant advantages over their counterparts. Code is available at https://github.com/nkusyl/FViT	翻訳日:2024-02-28 20:59:40 公開日:2024-02-27
# linkner: 不確実性を用いたローカル名前付きエンティティ認識モデルと大規模言語モデルとのリンク LinkNER: Linking Local Named Entity Recognition Models to Large Language Models using Uncertainty ( http://arxiv.org/abs/2402.10573v2 ) ライセンス: Link先を確認	Zhen Zhang, Yuhua Zhao, Hang Gao, and Mengting Hu	(参考訳) 名前付きエンティティ認識(ner)は、自然言語理解において基本的なタスクであり、webコンテンツ分析、検索エンジン、情報検索システムに直接影響する。ファインチューニングされたNERモデルは標準のNERベンチマークで満足な性能を示す。しかしながら、微調整データの制限と知識の欠如により、未認識のエンティティ認識では性能が低下する。その結果、Web 関連アプリケーションにおける NER モデルのユーザビリティと信頼性が損なわれている。代わりに、GPT-4のようなLarge Language Models (LLM) は外部知識を持っているが、NERタスクの専門性を欠いている。さらに、非公開および大規模の重み付けにより、LLMのチューニングが困難になる。これらの課題に対処するため,我々は,小さな微調整モデルとllm(リンクナー)を組み合わせるフレームワークと,微調整モデルがブラックボックスllmを補完し,よりよい性能を実現するための不確実性に基づくリンク戦略を提案する。我々は標準のnerテストセットと騒がしいソーシャルメディアデータセットの両方で実験する。 LinkNERは、堅牢性テストにおいて、特にSOTAモデルを上回るNERタスクパフォーマンスを向上させる。また,不確実性推定手法やLLM,コンテキスト内学習などの重要要素が多様なNERタスクに与える影響を定量的に分析し,特定のWeb関連勧告を提供する。 Named Entity Recognition (NER) serves as a fundamental task in natural language understanding, bearing direct implications for web content analysis, search engines, and information retrieval systems. Fine-tuned NER models exhibit satisfactory performance on standard NER benchmarks. However, due to limited fine-tuning data and lack of knowledge, it performs poorly on unseen entity recognition. As a result, the usability and reliability of NER models in web-related applications are compromised. Instead, Large Language Models (LLMs) like GPT-4 possess extensive external knowledge, but research indicates that they lack specialty for NER tasks. Furthermore, non-public and large-scale weights make tuning LLMs difficult. To address these challenges, we propose a framework that combines small fine-tuned models with LLMs (LinkNER) and an uncertainty-based linking strategy called RDC that enables fine-tuned models to complement black-box LLMs, achieving better performance. We experiment with both standard NER test sets and noisy social media datasets. 
LinkNER enhances NER task performance, notably surpassing SOTA models in robustness tests. We also quantitatively analyze the influence of key components like uncertainty estimation methods, LLMs, and in-context learning on diverse NER tasks, offering specific web-related recommendations.	翻訳日:2024-02-28 20:58:41 公開日:2024-02-27
# Johnson-Lindenstraus の単純統一解析とその応用 Simple, unified analysis of Johnson-Lindenstrauss with applications ( http://arxiv.org/abs/2402.10232v3 ) ライセンス: Link先を確認	Yingru Li	(参考訳) 本稿では,ジョンソン・リンデンシュトラウス(JL)補題の簡易かつ統一的な解析法を提案する。我々のアプローチは理解を単純化するだけでなく、球面、バイナリコイン、スパースJL、ガウスおよびガウス以下のモデルを含む様々な構成をJLフレームワークで統一する。この単純化と統一は、ストリーミングアルゴリズムから強化学習まで、さまざまなアプリケーションで不可欠なデータの内在的な幾何学を維持する上で、大きな一歩を踏み出します。特に球面構成の有効性に関する最初の厳密な証明を提供し、この単純化された枠組みの中でサブガウス構成の一般的なクラスを提供する。私たちの貢献の中心は、ハンソン・ライトの不等式から高次元への革新的拡張であり、明示的な定数で完備である。拡張対角化プロセスのような単純で強力な確率的ツールと分析技術を利用することで、我々の分析はJL補題の理論的基礎を独立性の仮定を取り除き、その実践的リーチを拡張し、現代の計算アルゴリズムにおける適応性と重要性を示す。 We present a simple and unified analysis of the Johnson-Lindenstrauss (JL) lemma, a cornerstone in the field of dimensionality reduction critical for managing high-dimensional data. Our approach not only simplifies the understanding but also unifies various constructions under the JL framework, including spherical, binary-coin, sparse JL, Gaussian and sub-Gaussian models. This simplification and unification make significant strides in preserving the intrinsic geometry of data, essential across diverse applications from streaming algorithms to reinforcement learning. Notably, we deliver the first rigorous proof of the spherical construction's effectiveness and provide a general class of sub-Gaussian constructions within this simplified framework. At the heart of our contribution is an innovative extension of the Hanson-Wright inequality to high dimensions, complete with explicit constants. By employing simple yet powerful probabilistic tools and analytical techniques, such as an enhanced diagonalization process, our analysis not only solidifies the JL lemma's theoretical foundation by removing an independence assumption but also extends its practical reach, showcasing its adaptability and importance in contemporary computational algorithms.	翻訳日:2024-02-28 20:58:18 公開日:2024-02-27
# 大規模言語モデルによるAutoTutorのオーサリングのスケールアップ Scaling the Authoring of AutoTutors with Large Language Models ( http://arxiv.org/abs/2402.09216v2 ) ライセンス: Link先を確認	Sankalan Pal Chowdhury, Vil\'em Zouhar, Mrinmaya Sachan	(参考訳) 大規模言語モデル(LLM)は、自動質問生成からエッセイ評価まで、いくつかのユースケースを教育で発見した。本稿では,Large Language Models (LLM) を用いて知能学習システムを構築する可能性について検討する。 LLMの共通の落とし穴は、学生に答えを漏らすなど、望まれる教育戦略からの逸脱であり、一般に保証を与えないことである。特定のガードレールを持つLLMは、被験者に取って代わることができるが、総合的な教育設計は、最高の学習結果を得るために手作業で行う必要があると仮定する。この原理に基づいて, MWPTutor という, LLM を用いて予め定義された有限状態トランスデューサの状態空間を埋める, エンドツーエンドの学習システムを構築した。このアプローチは、長年にわたって科学者によって開発されてきた伝統的なチューリングシステムの構造と教育を保ちながら、LLMベースのアプローチのさらなる柔軟性をもたらす。数学の単語問題に基づく2つのデータセットについて人間による評価を行った結果,本手法は指導されるが自由形式であるgpt-4よりも総合的な学習スコアが向上することを示した。 MWPTutorは完全にモジュール化されており、個々のモジュールを改善したり、それに従うことができる異なる教育戦略を使うことで、コミュニティがパフォーマンスを向上させるためのスコープを開放する Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems. A common pitfall of LLMs is their straying from desired pedagogical strategies such as leaking the answer to the student, and in general, providing no guarantees. We posit that while LLMs with certain guardrails can take the place of subject experts, the overall pedagogical design still needs to be handcrafted for the best learning results. Based on this principle, we create a sample end-to-end tutoring system named MWPTutor, which uses LLMs to fill in the state space of a pre-defined finite state transducer. This approach retains the structure and the pedagogy of traditional tutoring systems that has been developed over the years by learning scientists but brings in additional flexibility of LLM-based approaches. Through a human evaluation study on two datasets based on math word problems, we show that our hybrid approach achieves a better overall tutoring score than an instructed, but otherwise free-form, GPT-4. 
MWPTutor is completely modular and opens up the scope for the community to improve its performance by improving individual modules or using different teaching strategies that it can follow.	翻訳日:2024-02-28 20:57:56 公開日:2024-02-27
# 点状不純物を持つ接環の連続体の束縛状態 Bound states in the continuum in a tangential ring with pointlike impurities ( http://arxiv.org/abs/2402.14134v2 ) ライセンス: Link先を確認	M.A. Figueroa, Vladimir Juricic, P.A. Orellana	(参考訳) 外部ナノワイヤと結合した量子環は、量子メゾスコピック輸送を操作するための汎用プラットフォームを提供する。本稿では,環に沿って周期的に分布する点状不純物を含むシステムについて検討する。ここで発見されたコンダクタンスの正確な表現に基づいて、環のブリルアンゾーンの高対称性モーメントにおける環状態から連続体(bics)の結合状態が形成されることを実証する。さらに、反転対称性の存在は共鳴状態の選択的分離を可能にし、bic生成を支持し、従って系の量子輸送において余分なチューナビリティを許容する。最後に、磁気フラックスとラシュバスピン軌道結合が、側結合量子環におけるBIC形成の他の経路となることを示唆する。 Quantum rings coupled to external nanowires offer a versatile platform for the manipulation of the quantum mesoscopic transport. Here, we study such a system, including periodically distributed pointlike impurities along the ring. Based on an exact expression for the conductance found here, we demonstrate that the bound states in the continuum (BICs) form from the ring states at the high-symmetry momenta in the ring's Brillouin zone. Furthermore, the presence of the inversion symmetry allows for a selective decoupling of resonant states, favoring the BIC generation and, therefore, allowing extra tunability in the quantum transport of the system. Finally, we suggest that the magnetic fluxes and Rashba spin-orbit coupling offer other possible routes for the BIC formation in laterally coupled quantum rings.	翻訳日:2024-02-28 20:53:03 公開日:2024-02-27
# 委員会としての知恵:基礎モデルから特殊応用モデルへ Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model ( http://arxiv.org/abs/2402.14035v2 ) ライセンス: Link先を確認	Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao	(参考訳) 基礎モデルの最近の進歩は、幅広いタスクで印象的なパフォーマンスをもたらしている。一方、特定のアプリケーションでは、実践者は特別なアプリケーションモデルを開発しています。両方のモデルの利点を享受するために、基礎モデルの知識を特殊なアプリケーションモデルに移すことが自然な道の1つだ。ここでは知識蒸留の技術が適用され、そこではアプリケーションモデルが基礎モデルの模倣を学ぶ。しかし、特殊なアプリケーションモデルと基礎モデルにはキャパシティにかなりのギャップがあり、異なるアーキテクチャを採用し、異なるモードから異なる入力機能を使用し、異なる分散に最適化されている。これらのモデル特性の違いは蒸留法に大きな課題をもたらす。本研究では,基礎モデル教員と補足教員の両方からなる教育委員会の創設を提案する。補足的な教師は、基礎モデルと専門アプリケーションモデルとのギャップを埋めることを目的として、生徒に類似したモデル特性を持っている。さらに,委員会における教師間の相違に対応するために,学生が各教師の専門知識を理解し,課題知識を抽出できる「DiverseDistill」を紹介した。本評価は,補完的な教員の追加が学生のパフォーマンスを向上させることを示すものである。最後に、DiverseDistillは教師の選択にかかわらず、基礎的な蒸留法を一貫して上回り、学生のパフォーマンスが大幅に向上する。 Recent advancements in foundation models have yielded impressive performance across a wide range of tasks. Meanwhile, for specific applications, practitioners have been developing specialized application models. To enjoy the benefits of both kinds of models, one natural path is to transfer the knowledge in foundation models into specialized application models, which are generally more efficient for serving. Techniques from knowledge distillation may be applied here, where the application model learns to mimic the foundation model. However, specialized application models and foundation models have substantial gaps in capacity, employing distinct architectures, using different input features from different modalities, and being optimized on different distributions. These differences in model characteristics lead to significant challenges for distillation methods. In this work, we propose creating a teaching committee comprising both foundation model teachers and complementary teachers. 
Complementary teachers possess model characteristics akin to the student's, aiming to bridge the gap between the foundation model and specialized application models for a smoother knowledge transfer. Further, to accommodate the dissimilarity among the teachers in the committee, we introduce DiverseDistill, which allows the student to understand the expertise of each teacher and extract task knowledge. Our evaluations demonstrate that adding complementary teachers enhances student performance. Finally, DiverseDistill consistently outperforms baseline distillation methods, regardless of the teacher choices, resulting in significantly improved student performance.	翻訳日:2024-02-28 20:52:52 公開日:2024-02-27
# 逐次ランダム投影のための確率ツール Probability Tools for Sequential Random Projection ( http://arxiv.org/abs/2402.14026v2 ) ライセンス: Link先を確認	Yingru Li	(参考訳) 不確実性下での逐次的意思決定の課題に根ざしたアプローチである、逐次ランダム投影のための最初の確率的フレームワークを提案する。解析は、逐次決定過程に固有の適応機構の副産物である確率変数の逐次依存性と高次元の性質によって複雑である。本研究は,連続的に相互接続される集中イベント列の解析を容易にするため,停止過程の新規構築を特徴とする。停止過程に由来する自己正規化過程内の混合法を用いることにより、所望の非漸近確率境界が得られる。この境界はジョンソン・リンデンシュトラウス(JL)補題の非自明なマーチンゲール拡大を表し、ランダム射影とシーケンシャル解析に関する文献への先駆的な貢献を示している。 We introduce the first probabilistic framework tailored for sequential random projection, an approach rooted in the challenges of sequential decision-making under uncertainty. The analysis is complicated by the sequential dependence and high-dimensional nature of random variables, a byproduct of the adaptive mechanisms inherent in sequential decision processes. Our work features a novel construction of a stopped process, facilitating the analysis of a sequence of concentration events that are interconnected in a sequential manner. By employing the method of mixtures within a self-normalized process, derived from the stopped process, we achieve a desired non-asymptotic probability bound. This bound represents a non-trivial martingale extension of the Johnson-Lindenstrauss (JL) lemma, marking a pioneering contribution to the literature on random projection and sequential analysis.	翻訳日:2024-02-28 20:52:26 公開日:2024-02-27
# 折り紙:(un)プログラム合成のための再帰スキームの抽象化 Origami: (un)folding the abstraction of recursion schemes for program synthesis ( http://arxiv.org/abs/2402.13828v2 ) ライセンス: Link先を確認	Matheus Campos Fernandes, Fabricio Olivetti de Franca, Emilio Francesquini	(参考訳) 遺伝的プログラミングを用いたプログラム合成は、通常入力出力の例として提供される入力仕様を満たす正しいプログラムを探索する。特定の課題はループと再帰を効果的に扱う方法であり、終わらないプログラムを避けることである。この問題を緩和できる有用な抽象化は、データ生産と消費の組み合わせを一般化する再帰スキームの利用である。再帰スキームはデータの要約、シーケンスの作成、高度な計算が可能なプログラムの構築を可能にするため、非常に強力である。 Recursion Schemesを使ってプログラムを書く主な利点は、プログラムがよく定義されたテンプレートで構成されており、いくつかの部分だけを合成する必要があることである。本稿では,テンプレートの折り畳みと折り畳みによるプログラム合成の利点に関する初期研究を行い,予備的な実験結果について概説する。このアプローチの利点とデメリットを強調するために,再帰スキームを用いてGPSBベンチマーク全体を手作業で解決し,代替実装と比較して進化すべき部分を強調した。我々は、再帰スキームが選択されると、テンプレートの欠落部分のそれぞれがより単純な関数に還元されるため、合成プロセスが単純化され、さらに独自の入力型と出力型によって制約されることに気付いた。 Program synthesis with Genetic Programming searches for a correct program that satisfies the input specification, which is usually provided as input-output examples. One particular challenge is how to effectively handle loops and recursion avoiding programs that never terminate. A helpful abstraction that can alleviate this problem is the employment of Recursion Schemes that generalize the combination of data production and consumption. Recursion Schemes are very powerful as they allow the construction of programs that can summarize data, create sequences, and perform advanced calculations. The main advantage of writing a program using Recursion Schemes is that the programs are composed of well defined templates with only a few parts that need to be synthesized. In this paper we make an initial study of the benefits of using program synthesis with fold and unfold templates, and outline some preliminary experimental results. To highlight the advantages and disadvantages of this approach, we manually solved the entire GPSB benchmark using recursion schemes, highlighting the parts that should be evolved compared to alternative implementations. 
We noticed that, once the recursion scheme is chosen, the synthesis process is simplified, as each of the missing parts of the template is reduced to a simpler function that is further constrained by its own input and output types.	翻訳日:2024-02-28 20:50:58 公開日:2024-02-27
# prosparse: 大規模言語モデルにおける内在的アクティベーションスパーシティの導入と拡張 ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models ( http://arxiv.org/abs/2402.13516v2 ) ライセンス: Link先を確認	Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, Kuai Li, Chen Chen, Zhiyuan Liu, Guangli Li, Tao Yang, Maosong Sun	(参考訳) アクティベーションスパーシティは、アクティベーションアウトプットの間にかなりの弱結合要素が存在することを意味する。 ReLUアクティベーション関数を用いたモデルの一般的な特性として、モデル推論効率を高めるための有望なパラダイムであることが証明されている。それにもかかわらず、ほとんどの大きな言語モデル(LLM)は、固有のアクティベーション間隔のないアクティベーション機能(GELUやSwishなど)を採用している。最近の研究では、LLMが活性化空間と推論加速度を達成するのに役立つ代替活性化関数としてReLUやその変種を導入することを検討しているが、高い間隔と同等のモデル性能を同時に得られるものはほとんどない。本稿では, モデル性能を低下させることなく, LLMを高機能化するために, プロスパース方式を提案する。具体的には、LLMの活性化関数をReLUで置換した後、ProSparseは複数の段階において正弦曲線に沿って滑らかに増加する因子で進行性スパーシティ正則化を採用する。これにより、アクティベーション分布の急変を避けることにより、アクティベーションスパーシティを高め、パフォーマンス低下を軽減することができる。 ProSparse では LLaMA2-7B と LLaMA2-13B に対して 89.32% と 88.80% の高間隔が得られる。さらに, 高い活性化スパース性によってもたらされる実用的加速を推算加速度実験により実証した。 Activation sparsity refers to the existence of considerable weakly-contributed elements among activation outputs. As a prevalent property of the models using the ReLU activation function, it has been proven a promising paradigm to boost model inference efficiency. Nevertheless, most large language models (LLMs) adopt activation functions without intrinsic activation sparsity (e.g., GELU and Swish). Some recent efforts have explored introducing ReLU or its variants as the substitutive activation function to help LLMs achieve activation sparsity and inference acceleration, but few can simultaneously obtain high sparsity and comparable model performance. This paper introduces an effective sparsification method named "ProSparse" to push LLMs for higher activation sparsity without decreasing model performance. Specifically, after substituting the activation function of LLMs with ReLU, ProSparse adopts progressive sparsity regularization with a factor smoothly increasing along sine curves in multiple stages. 
This can enhance activation sparsity and alleviate performance degradation by avoiding radical shifts in activation distribution. With ProSparse, we obtain high sparsity of 89.32% and 88.80% for LLaMA2-7B and LLaMA2-13B, respectively, achieving comparable performance to their original Swish-activated versions. Our inference acceleration experiments further demonstrate the practical acceleration brought by higher activation sparsity.	翻訳日:2024-02-28 20:50:37 公開日:2024-02-27
# 不均一データからの連合因果発見 Federated Causal Discovery from Heterogeneous Data ( http://arxiv.org/abs/2402.13241v2 ) ライセンス: Link先を確認	Loka Li, Ignavier Ng, Gongxu Luo, Biwei Huang, Guangyi Chen, Tongliang Liu, Bin Gu, Kun Zhang	(参考訳) 従来の因果探索法は、多くの実世界の状況におけるデータの分散的性質と矛盾する集中データに依存している。この相違は、fcd(federated causal discovery)アプローチの開発を動機付けた。しかし、既存のFCD法は、特定可能な機能因果モデルや同質なデータ分布の潜在的に制限的な仮定によって制限され、様々なシナリオで適用範囲を狭めることができる。本稿では,任意の因果モデルと不均一データに対応する新しいfcd法を提案する。まず、クライアントインデックスに対応する代理変数を使用して、異なるクライアント間のデータの均一性を考慮します。次に, 因果骨格発見のための連邦条件独立試験(FCIT)を開発し, 因果方向を決定するための連邦独立変化原則(FICP)を確立する。これらのアプローチには、データプライバシを保護するために生データのプロキシとして要約統計を構築することが含まれる。非パラメトリックな性質のため、FCIT と FICP は特定の機能形式を仮定せず、任意の因果モデルの扱いを容易にする。本手法の有効性を示すために,合成データと実データについて広範な実験を行った。コードはhttps://github.com/lokali/fedcdh.gitで入手できる。 Conventional causal discovery methods rely on centralized data, which is inconsistent with the decentralized nature of data in many real-world situations. This discrepancy has motivated the development of federated causal discovery (FCD) approaches. However, existing FCD methods may be limited by their potentially restrictive assumptions of identifiable functional causal models or homogeneous data distributions, narrowing their applicability in diverse scenarios. In this paper, we propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data. We first utilize a surrogate variable corresponding to the client index to account for the data heterogeneity across different clients. We then develop a federated conditional independence test (FCIT) for causal skeleton discovery and establish a federated independent change principle (FICP) to determine causal directions. These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy. Owing to the nonparametric properties, FCIT and FICP make no assumption about particular functional forms, thereby facilitating the handling of arbitrary causal models. 
We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method. The code is available at https://github.com/lokali/FedCDH.git.	翻訳日:2024-02-28 20:50:08 公開日:2024-02-27
# SMORE:マルチセンサ時系列分類のための類似性に基づく超次元ドメイン適応 SMORE: Similarity-based Hyperdimensional Domain Adaptation for Multi-Sensor Time Series Classification ( http://arxiv.org/abs/2402.13233v2 ) ライセンス: Link先を確認	Junyao Wang, Mohammad Abdullah Al Faruque	(参考訳) IoT(Internet of Things)の現実的なアプリケーションの多くは、機械学習(ML)アルゴリズムを使用して、相互接続されたセンサーによって収集された時系列情報を分析する。しかし、データ駆動型MLの基本的な課題である分散シフトは、トレーニングデータとは異なるデータ分散上にモデルがデプロイされ、モデルのパフォーマンスが著しく低下する時に発生する。さらに、マルチセンサー時系列データにおける複雑な空間的および時間的依存関係をキャプチャするためには、ますます高度なディープニューラルネットワーク(DNN)が必要である。本稿では,超次元演算の効率と並列性を活用した,多センサ時系列分類のための新しい資源効率ドメイン適応(da)アルゴリズムsmoreを提案する。 SMOREは、各サンプルのドメインコンテキストを明確に考慮してテスト時のモデルを動的にカスタマイズし、ドメインシフトの負の影響を軽減する。 SMOREは,18.81倍高速トレーニングと4.63倍高速推論で,最先端(SOTA)のDAアルゴリズムよりも平均1.98%高い精度で達成されている。 Many real-world applications of the Internet of Things (IoT) employ machine learning (ML) algorithms to analyze time series information collected by interconnected sensors. However, distribution shift, a fundamental challenge in data-driven ML, arises when a model is deployed on a data distribution different from the training data and can substantially degrade model performance. Additionally, increasingly sophisticated deep neural networks (DNNs) are required to capture intricate spatial and temporal dependencies in multi-sensor time series data, often exceeding the capabilities of today's edge devices. In this paper, we propose SMORE, a novel resource-efficient domain adaptation (DA) algorithm for multi-sensor time series classification, leveraging the efficient and parallel operations of hyperdimensional computing. SMORE dynamically customizes test-time models with explicit consideration of the domain context of each sample to mitigate the negative impacts of domain shifts. Our evaluation on a variety of multi-sensor time series classification tasks shows that SMORE achieves on average 1.98% higher accuracy than state-of-the-art (SOTA) DNN-based DA algorithms with 18.81x faster training and 4.63x faster inference.	翻訳日:2024-02-28 20:49:47 公開日:2024-02-27
# 信頼の問題:大規模言語モデルの固有の自己補正能力を再考する Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models ( http://arxiv.org/abs/2402.12563v2 ) ライセンス: Link先を確認	Loka Li, Guangyi Chen, Yusheng Su, Zhenhao Chen, Yixuan Zhang, Eric Xing, Kun Zhang	(参考訳) 近年の大規模言語モデル(llms)の成功は、自己修正機能への関心を高めている。本稿では,LLMの内在的自己補正に関する包括的調査を行い,その実現可能性に関する議論に対処する。我々の研究は、自己補正過程において重要な潜伏要因であるLSMの「信頼」を特定した。この因子を見渡すと、モデルは自分自身を過度に批判し、自己補正の有効性に関する信頼できない結論をもたらす。我々は,LSMが自身の反応において「自信」を理解する能力を持っていることを実験的に観察した。 if-or-else"(ioe)プロンプトフレームワークを開発する動機付けは、llmが自身の"信頼"を評価し、本質的な自己修正を促進するように設計されたことです。 IoEをベースとしたPromptは,初期回答に対する自己補正応答の精度に関して,一貫した改善を達成できることを示す。本研究は, LLMの自己補正に影響を及ぼす要因を明らかにするだけでなく, IoEプロンプト原理を利用した「自信」による自己補正能力を効率的に向上する実践的枠組みも導入する。コードはhttps://github.com/MBZUAI-CLeaR/IoE-Prompting.gitで公開されている。 The recent success of Large Language Models (LLMs) has catalyzed an increasing interest in their self-correction capabilities. This paper presents a comprehensive investigation into the intrinsic self-correction of LLMs, attempting to address the ongoing debate about its feasibility. Our research has identified an important latent factor - the "confidence" of LLMs - during the self-correction process. Overlooking this factor may cause the models to over-criticize themselves, resulting in unreliable conclusions regarding the efficacy of self-correction. We have experimentally observed that LLMs possess the capability to understand the "confidence" in their own responses. It motivates us to develop an "If-or-Else" (IoE) prompting framework, designed to guide LLMs in assessing their own "confidence", facilitating intrinsic self-corrections. We conduct extensive experiments and demonstrate that our IoE-based Prompt can achieve a consistent improvement regarding the accuracy of self-corrected responses over the initial answers. 
Our study not only sheds light on the underlying factors affecting self-correction in LLMs, but also introduces a practical framework that utilizes the IoE prompting principle to efficiently improve self-correction capabilities with "confidence". The code is available at https://github.com/MBZUAI-CLeaR/IoE-Prompting.git.	翻訳日:2024-02-28 20:49:07 公開日:2024-02-27
# 周波数空間のダウンスケーリングによるバックドアポゾンデータセットからのクリーン言語モデル取得 Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space ( http://arxiv.org/abs/2402.12026v2 ) ライセンス: Link先を確認	Zongru Wu, Zhuosheng Zhang, Pengzhou Cheng, Gongshen Liu	(参考訳) 自然言語処理(NLP)タスクにおける言語モデル(LM)の顕著な成功にもかかわらず、LMの信頼性はバックドア攻撃の影響を受けやすい。以前の研究は、毒付きデータセットでlmsをトレーニングしながらバックドア学習を緩和しようとするが、現実のシナリオでは複雑なバックドア攻撃に苦しむ。本稿では,フーリエ解析による周波数空間におけるバックドアlmsの学習機構について検討する。以上の結果から, 汚染されたデータセットに提示されたバックドアマッピングは, クリーンマッピングよりも低周波傾向が顕著であり, バックドアマッピングの収束が早いことが示唆された。このジレンマを緩和するために,マルチスケール低ランク適応法(musclelora)を提案する。対象モデルに低ランク適応を加えて周波数空間に複数のラジアルスケーリングを展開し,パラメータ更新時の勾配をさらに調整する。周波数空間のダウンスケーリングを通じて、MuScleLoRAは比較的高周波なクリーンマッピングの学習を優先させ、結果としてバックドア学習を緩和する。実験の結果, MuScleLoRAはベースラインを著しく上回ることがわかった。 muscleloraは、さまざまなバックドア攻撃の平均成功率を複数のデータセットで15\%以下に削減し、bert、roberta、llama2を含む様々なバックボーンlmmに一般化する。コードはhttps://github.com/zrw00/muscleloraで入手できる。 Despite the notable success of language models (LMs) in various natural language processing (NLP) tasks, the reliability of LMs is susceptible to backdoor attacks. Prior research attempts to mitigate backdoor learning while training the LMs on the poisoned dataset, yet struggles against complex backdoor attacks in real-world scenarios. In this paper, we investigate the learning mechanisms of backdoor LMs in the frequency space by Fourier analysis. Our findings indicate that the backdoor mapping presented on the poisoned datasets exhibits a more discernible inclination towards lower frequency compared to clean mapping, resulting in the faster convergence of backdoor mapping. To alleviate this dilemma, we propose Multi-Scale Low-Rank Adaptation (MuScleLoRA), which deploys multiple radial scalings in the frequency space with low-rank adaptation to the target model and further aligns the gradients when updating parameters. 
Through downscaling in the frequency space, MuScleLoRA encourages the model to prioritize the learning of relatively high-frequency clean mapping, consequently mitigating backdoor learning. Experimental results demonstrate that MuScleLoRA outperforms baselines significantly. Notably, MuScleLoRA reduces the average success rate of diverse backdoor attacks to below 15% across multiple datasets and generalizes to various backbone LMs, including BERT, RoBERTa, and Llama2. The code is available at https://github.com/ZrW00/MuScleLoRA.	翻訳日:2024-02-28 20:47:32 公開日:2024-02-27
# 事前訓練された視覚不確かさ Pretrained Visual Uncertainties ( http://arxiv.org/abs/2402.16569v2 ) ライセンス: Link先を確認	Michael Kirchhof and Mark Collier and Seong Joon Oh and Enkelejda Kasneci	(参考訳) 正確な不確実性推定は、信頼できる機械学習には不可欠であるが、通常、タスクごとに不確実性を学ぶ必要がある。この研究は、視覚モデルのための最初の事前訓練された不確実性モジュールを導入する。標準的なプリトレーニングと同様に、大きなプリトレーニングデータセットで学んだ不確実性を、特別なダウンストリームデータセットにゼロショットで転送することができる。我々は,以前の不確実性モジュールの勾配衝突を解決し,最大180倍のトレーニングを加速することにより,imagenet-21kの大規模事前トレーニングを可能にする。事前訓練された不確実性は、目に見えないデータセットに一般化される。学習した不確かさを精査すると、それらがてんかん成分から遠ざかっているアレラトリック不確かさを捉えていることが分かる。これにより、安全な検索と不確実性対応データセットの可視化が可能になる。アプリケーションにさらなる問題やドメインを推奨するために、トレーニング済みのチェックポイントとコードをhttps://github.com/mkirchhof/url でリリースします。 Accurate uncertainty estimation is vital to trustworthy machine learning, yet uncertainties typically have to be learned for each task anew. This work introduces the first pretrained uncertainty modules for vision models. Similar to standard pretraining this enables the zero-shot transfer of uncertainties learned on a large pretraining dataset to specialized downstream datasets. We enable our large-scale pretraining on ImageNet-21k by solving a gradient conflict in previous uncertainty modules and accelerating the training by up to 180x. We find that the pretrained uncertainties generalize to unseen datasets. In scrutinizing the learned uncertainties, we find that they capture aleatoric uncertainty, disentangled from epistemic components. We demonstrate that this enables safe retrieval and uncertainty-aware dataset visualization. To encourage applications to further problems and domains, we release all pretrained checkpoints and code under https://github.com/mkirchhof/url .	翻訳日:2024-02-28 20:39:49 公開日:2024-02-27
# Semantic Mirror Jailbreak: 遺伝的アルゴリズムによるオープンソースLLMに対するジェイルブレイクプロンプト Semantic Mirror Jailbreak: Genetic Algorithm Based Jailbreak Prompts Against Open-source LLMs ( http://arxiv.org/abs/2402.14872v2 ) ライセンス: Link先を確認	Xiaoxia Li, Siyuan Liang, Jiyi Zhang, Han Fang, Aishan Liu, Ee-Chien Chang	(参考訳) 大きな言語モデル(LLM)は、創造的な記述、コード生成、翻訳に使用され、入力シーケンスに基づいたテキストを生成するが、工芸的なプロンプトが有害な出力を誘導するジェイルブレイク攻撃に弱い。ほとんどのjailbreakプロンプトメソッドは、Jailbreakプロンプトの作成に関する質問に続いて、Jailbreakテンプレートの組み合わせを使用している。しかし、既存のjailbreakプロンプト設計は一般的に過剰なセマンティックな違いに悩まされ、単純なセマンティックメトリクスをしきい値として使用する防御に抵抗することができない。ジェイルブレイクプロンプトは、クエリに使われた質問よりも意味的に多様である。本稿では,semantic mirror jailbreak (smj) アプローチについて紹介する。セマンティック類似性とジェイルブレイク妥当性の両方を満たすジェイルブレイクプロンプトを多目的最適化問題としてモデル化し,適用可能なプロンプトを生成するための遺伝的アルゴリズムを標準化した。ベースラインのAutoDAN-GAと比較して、SMJは攻撃成功率(ASR)を最大35.4%上回っており、オニオン防衛は85.2%上回っている。また、Jailbreak Prompt、Simisity、Outlierの3つの意味論的意味度指標におけるSMJの優れたパフォーマンスは、これらの指標をしきい値として使用する防御に耐性があることを意味する。 Large Language Models (LLMs), used in creative writing, code generation, and translation, generate text based on input sequences but are vulnerable to jailbreak attacks, where crafted prompts induce harmful outputs. Most jailbreak prompt methods use a combination of jailbreak templates followed by questions to ask to create jailbreak prompts. However, existing jailbreak prompt designs generally suffer from excessive semantic differences, resulting in an inability to resist defenses that use simple semantic metrics as thresholds. Jailbreak prompts are semantically more varied than the original questions used for queries. In this paper, we introduce a Semantic Mirror Jailbreak (SMJ) approach that bypasses LLMs by generating jailbreak prompts that are semantically similar to the original question. We model the search for jailbreak prompts that satisfy both semantic similarity and jailbreak validity as a multi-objective optimization problem and employ a standardized set of genetic algorithms for generating eligible prompts. 
Compared to the baseline AutoDAN-GA, SMJ achieves attack success rates (ASR) that are up to 35.4% higher without ONION defense and 85.2% higher with ONION defense. SMJ's better performance on all three semantic meaningfulness metrics (Jailbreak Prompt, Similarity, and Outlier) also means that SMJ is resistant to defenses that use those metrics as thresholds.	翻訳日:2024-02-28 20:38:31 公開日:2024-02-27
# 医用画像セグメンテーションのための自己教師型コントラスト学習における次元崩壊の克服 Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation ( http://arxiv.org/abs/2402.14611v2 ) ライセンス: Link先を確認	Jamshid Hassanpour, Vinkle Srivastav, Didier Mutter, Nicolas Padoy	(参考訳) ラベル付きデータの量を制限する自己教師付き学習(SSL)アプローチは大きな成功を収めた。 SSL内では、プレテキストタスクを解決して堅牢な特徴表現を学ぶ。そのような前提的タスクの1つは、対照的な学習であり、類似した異なる入力サンプルのペアを形成し、モデルの区別を誘導する。本研究では,医療画像解析の領域におけるコントラスト学習の応用について検討する。この結果から,最先端のコントラスト学習手法であるMoCo v2は,医用画像に適用すると次元的崩壊に遭遇することがわかった。これは、医療画像間で共有される画像間の類似度が高いためである。そこで我々は,局所的な特徴学習と特徴デコレーションという2つの重要な貢献を提案する。局所的な特徴学習は、モデルのイメージの局所的な領域にフォーカスする能力を向上させ、特徴の分離は、特徴間の線形依存を取り除く。実験の結果,リニア評価と完全微調整設定の両方において,医療セグメンテーションの下流課題におけるモデルの性能が有意に向上した。本研究は,医療画像タスクの特徴にSSL技術を効果的に適応させることの重要性を示す。ソースコードは、https://github.com/CAMMA-public/med-mocoで公開されます。 Self-supervised learning (SSL) approaches have achieved great success when the amount of labeled data is limited. Within SSL, models learn robust feature representations by solving pretext tasks. One such pretext task is contrastive learning, which involves forming pairs of similar and dissimilar input samples, guiding the model to distinguish between them. In this work, we investigate the application of contrastive learning to the domain of medical image analysis. Our findings reveal that MoCo v2, a state-of-the-art contrastive learning method, encounters dimensional collapse when applied to medical images. This is attributed to the high degree of inter-image similarity shared between the medical images. To address this, we propose two key contributions: local feature learning and feature decorrelation. Local feature learning improves the ability of the model to focus on the local regions of the image, while feature decorrelation removes the linear dependence among the features. 
Our experimental findings demonstrate that our contributions significantly enhance the model's performance in the downstream task of medical segmentation, both in the linear evaluation and full fine-tuning settings. This work illustrates the importance of effectively adapting SSL techniques to the characteristics of medical imaging tasks. The source code will be made publicly available at: https://github.com/CAMMA-public/med-moco	翻訳日:2024-02-28 20:38:04 公開日:2024-02-27
# ハンミングスライスサンプリングのための局所性境界 Locality Bounds for Sampling Hamming Slices ( http://arxiv.org/abs/2402.14278v2 ) ライセンス: Link先を確認	Daniel M. Kane, Anthony Ostuni, Kewen Wu	(参考訳) viola(journal of computing 2012)の影響を受けて、過去10年間、(ほぼ)サンプリング分布の複雑さについて、従来の計算関数の複雑さに焦点を当てた研究が活発に行われてきた。我々は、viola(journal of computing 2012) と filmus, leigh, riazanov, sokolov(random 2023) の疑問に答え、特定のハミングウェイトの2進文字列上の一様分布をほぼサンプリングするブール関数の局所性に関する超定数下界を提供するために、 viola の初期の暗黙的な結果の上に構築し、明らかにする。データ構造下限と量子古典的分離への応用について論じる。 Spurred by the influential work of Viola (Journal of Computing 2012), the past decade has witnessed an active line of research into the complexity of (approximately) sampling distributions, in contrast to the traditional focus on the complexity of computing functions. We build upon and make explicit earlier implicit results of Viola to provide superconstant lower bounds on the locality of Boolean functions approximately sampling the uniform distribution over binary strings of particular Hamming weights, both exactly and modulo an integer, answering questions of Viola (Journal of Computing 2012) and Filmus, Leigh, Riazanov, and Sokolov (RANDOM 2023). Applications to data structure lower bounds and quantum-classical separations are discussed.	翻訳日:2024-02-28 20:37:43 公開日:2024-02-27
# COPR: 最適政策規則化による継続的人選学習 COPR: Continual Human Preference Learning via Optimal Policy Regularization ( http://arxiv.org/abs/2402.14228v2 ) ライセンス: Link先を確認	Han Zhang, Lin Gui, Yu Lei, Yuanzhao Zhai, Yehong Zhang, Yulan He, Hui Wang, Yue Yu, Kam-Fai Wong, Bin Liang, Ruifeng Xu	(参考訳) RLHF(Reinforcement Learning from Human Feedback)は、大規模言語モデル(LLM)と人間の嗜好の整合性を改善するために一般的に用いられる。人間の嗜好の進化的な性質を考えると、連続的なアライメントは従来の静的アライメントと比べてより重要で実用的になる。それでも、RLHFをCL(Continuous Learning)と互換性を持たせることは、複雑なプロセスのため困難である。一方、新しい人間の嗜好を直接学習することは、歴史的嗜好の破滅的なフォーッティング(CF)につながる可能性がある。これらの課題を克服するために, 最適政策理論から着想を得たcopr(continual optimal policy regularization)法を提案する。 COPRはCLのサンプル分布を実演と正規化の制約として利用する。これはラグランジアン双対性(ld)法を採用し、歴史的に最適な政策に基づいて現在の政策を動的に定式化する。また,COPRの学習可能性に関する公式な証明も提供する。実験の結果,COPR は報酬ベース,GPT-4 評価,人的評価において,提案したベンチマークのCL ベースラインよりも優れていた。さらに,異なるバックボーン,メモリサイズ,学習順序など,さまざまなCL設定下でのCOPRの堅牢性を検証する。 Reinforcement Learning from Human Feedback (RLHF) is commonly utilized to improve the alignment of Large Language Models (LLMs) with human preferences. Given the evolving nature of human preferences, continual alignment becomes more crucial and practical in comparison to traditional static alignment. Nevertheless, making RLHF compatible with Continual Learning (CL) is challenging due to its complex process. Meanwhile, directly learning new human preferences may lead to Catastrophic Forgetting (CF) of historical preferences, resulting in helpless or harmful outputs. To overcome these challenges, we propose the Continual Optimal Policy Regularization (COPR) method, which draws inspiration from the optimal policy theory. COPR utilizes a sampling distribution as a demonstration and regularization constraints for CL. It adopts the Lagrangian Duality (LD) method to dynamically regularize the current policy based on the historically optimal policy, which prevents CF and avoids over-emphasizing unbalanced objectives. We also provide formal proof for the learnability of COPR. 
The experimental results show that COPR outperforms strong CL baselines on our proposed benchmark in terms of reward-based evaluation, GPT-4 evaluation, and human assessment. Furthermore, we validate the robustness of COPR under various CL settings, including different backbones, replay memory sizes, and learning orders.	翻訳日:2024-02-28 20:37:23 公開日:2024-02-27
# 複素モジュラー算術におけるグロッケ変換器の解釈 Interpreting Grokked Transformers in Complex Modular Arithmetic ( http://arxiv.org/abs/2402.16726v2 ) ライセンス: Link先を確認	Hiroki Furuta, Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo	(参考訳) グローキングは遅れた一般化の謎を明らかにするために活発に研究されている。グラクテッドモデル内で解釈可能なアルゴリズムを識別することは、そのメカニズムを理解するための示唆的なヒントである。 In this work, beyond the simplest and well-studied modular addition, we observe the internal circuits learned through grokking in complex modular arithmetic via interpretable reverse engineering, which highlights the significant difference in their dynamics: subtraction poses a strong asymmetry on Transformer; multiplication requires cosine-biased components at all the frequencies in a Fourier domain; polynomials often result in the superposition of the patterns from elementary arithmetic, but clear patterns do not emerge in challenging cases; grokking can easily occur even in higher-degree formulas with basic symmetric and alternating expressions. また, モジュラー演算のための新しい進行測度, フーリエ周波数スパーシティとフーリエ係数比を導入し, 遅延一般化を示すだけでなく, グルークモデルの特異な内部表現をモジュラー演算毎に特徴付ける。実験分析では,様々な組み合わせの総合評価の重要性を強調した。 Grokking has been actively explored to reveal the mystery of delayed generalization. Identifying interpretable algorithms inside the grokked models is a suggestive hint to understanding its mechanism. In this work, beyond the simplest and well-studied modular addition, we observe the internal circuits learned through grokking in complex modular arithmetic via interpretable reverse engineering, which highlights the significant difference in their dynamics: subtraction poses a strong asymmetry on Transformer; multiplication requires cosine-biased components at all the frequencies in a Fourier domain; polynomials often result in the superposition of the patterns from elementary arithmetic, but clear patterns do not emerge in challenging cases; grokking can easily occur even in higher-degree formulas with basic symmetric and alternating expressions. 
We also introduce novel progress measures for modular arithmetic, Fourier Frequency Sparsity and Fourier Coefficient Ratio, which not only indicate late generalization but also characterize the distinctive internal representations of grokked models per modular operation. Our empirical analysis emphasizes the importance of holistic evaluation among various combinations.	翻訳日:2024-02-28 19:59:43 公開日:2024-02-27
# Nemotron-4 15B技術報告 Nemotron-4 15B Technical Report ( http://arxiv.org/abs/2402.16819v2 ) ライセンス: Link先を確認	Jupinder Parmar and Shrimai Prabhumoye and Joseph Jennings and Mostofa Patwary and Sandeep Subramanian and Dan Su and Chen Zhu and Deepak Narayanan and Aastha Jhunjhunwala and Ayush Dattagupta and Vibhu Jawa and Jiwei Liu and Ameya Mahabaleshwarkar and Osvald Nitski and Annika Brundyn and James Maki and Miguel Martinez and Jiaxuan You and John Kamalu and Patrick LeGresley and Denys Fridman and Jared Casper and Ashwath Aithal and Oleksii Kuchaiev and Mohammad Shoeybi and Jonathan Cohen and Bryan Catanzaro	(参考訳) 8兆のテキストトークンで訓練された150億パラメータの大規模多言語モデルであるNemotron-4 15Bを紹介する。 Nemotron-4 15Bは、英語、多言語、コーディングタスクでの評価において強力な性能を示しており、7つの下流評価領域のうち4つで同程度の大きさの既存オープンモデルを全て上回り、残りの領域でも主要なオープンモデルと競合する性能を達成している。具体的には、Nemotron-4 15Bは同程度の大きさの全てのモデルの中で最高の多言語能力を示し、4倍以上大きいモデルや、多言語タスクに明示的に特化したモデルさえも上回っている。 We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.	翻訳日:2024-02-28 19:43:50 公開日:2024-02-27
# 3次元腹部臓器セグメンテーションのための重み付きモンテカルロ拡張球状フーリエ・ベッセル畳み込み層 Weighted Monte Carlo augmented spherical Fourier-Bessel convolutional layers for 3D abdominal organ segmentation ( http://arxiv.org/abs/2402.16825v2 ) ライセンス: Link先を確認	Wenzhao Zhao, Steffen Albert, Barbara D. Wichtmann, Angelika Maurer, Ulrike Attenberger, Frank G. Zöllner, and Jürgen Hesser	(参考訳) フィルタ分解に基づく群同変畳み込みニューラルネットワークは、3次元画像特徴抽出に期待できる安定性とデータ効率を示す。しかし、既存のフィルタ分解に基づく3次元群同変ニューラルネットワークはパラメータ共有設計に依存しており、そのほとんどが回転変換群に限られ、選択された球面調和フィルタ基底は角度方向の直交性しか考慮していない。これらの制限は、医用画像セグメンテーションのためのディープニューラルネットワークアーキテクチャへの応用を妨げる。これらの問題に対処するために、モンテカルロ拡張球面フーリエ・ベッセルフィルタ基底の適応的集約に基づく、3次元医用画像セグメンテーションのための非パラメータ共有アフィン群同変ニューラルネットワークについて述べる。採用した非パラメータ共有戦略の効率性と柔軟性により、体積データに対する3次元アフィン群同変畳み込みニューラルネットワークの効率的な実装が初めて可能になる。導入された球面フーリエ・ベッセルフィルタ基底は、角度方向と半径方向の両方の直交性を組み合わせて特徴抽出を改善する。 BTCVとNIH Pancreasの2つの腹部画像データセットを用いた3次元画像セグメンテーション実験により、提案手法が高いトレーニング安定性とデータ効率で最先端の3次元ニューラルネットワークを上回ることを示した。コードはhttps://github.com/ZhaoWenzhao/WVMS で公開予定である。 Filter-decomposition-based group equivariant convolutional neural networks show promising stability and data efficiency for 3D image feature extraction. However, the existing filter-decomposition-based 3D group equivariant neural networks rely on parameter-sharing designs and are mostly limited to rotation transform groups, where the chosen spherical harmonic filter bases consider only angular orthogonality. These limitations hamper its application to deep neural network architectures for medical image segmentation. To address these issues, this paper describes a non-parameter-sharing affine group equivariant neural network for 3D medical image segmentation based on an adaptive aggregation of Monte Carlo augmented spherical Fourier-Bessel filter bases. The efficiency and flexibility of the adopted non-parameter-sharing strategy enable for the first time an efficient implementation of 3D affine group equivariant convolutional neural networks for volumetric data. 
The introduced spherical Fourier-Bessel filter basis combines both angular and radial orthogonality for better feature extraction. The 3D image segmentation experiments on two abdominal image sets, the BTCV and NIH Pancreas datasets, show that the proposed methods outperform state-of-the-art 3D neural networks with high training stability and data efficiency. The code will be available at https://github.com/ZhaoWenzhao/WVMS.	翻訳日:2024-02-28 19:30:47 公開日:2024-02-27
# 最適化可能なグラフとしての言語エージェント Language Agents as Optimizable Graphs ( http://arxiv.org/abs/2402.16823v2 ) ライセンス: Link先を確認	Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin and Jürgen Schmidhuber	(参考訳) 大規模言語モデル(LLM)に基づく問題解決器を改善するために、人間が設計したさまざまなプロンプトエンジニアリング手法が提案されており、多くの異なるコードベースが生まれている。 LLMをベースとしたエージェントを計算グラフとして記述することで、これらのアプローチを統一する。ノードはマルチモーダルデータの処理やLLMへのクエリを行う関数を実装し、エッジは操作間の情報フローを記述する。グラフは、(エッジが異なるエージェントの操作を接続する)エージェント間コラボレーションの階層を表す大きな複合グラフに再帰的に結合することができる。提案する新しい自動グラフオプティマイザは、(1)ノードレベルのLLMプロンプトを改良し(ノード最適化)、(2)グラフの接続性を変更してエージェントオーケストレーションを改善する(エッジ最適化)。実験により、我々のフレームワークは様々なLLMエージェントを効率的に開発、統合、自動改善するために利用できることが示された。コードはhttps://github.com/metauto-ai/gptswarm で見ることができる。 Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent collaboration (where edges connect operations of different agents). Our novel automatic graph optimizers (1) refine node-level LLM prompts (node optimization) and (2) improve agent orchestration by changing graph connectivity (edge optimization). Experiments demonstrate that our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents. The code can be found at https://github.com/metauto-ai/gptswarm.	翻訳日:2024-02-28 19:29:56 公開日:2024-02-27
# 基礎モデルの低ランク適応器の非対称性 Asymmetry in Low-Rank Adapters of Foundation Models ( http://arxiv.org/abs/2402.16842v2 ) ライセンス: Link先を確認	Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh Ghassemi, Mikhail Yurochkin, Justin Solomon	(参考訳) パラメータ効率の良い微調整は、パラメータのサブセットを更新することで、大規模な事前訓練済み基礎モデルを最適化する。この種の手法の中でも、低ランク適応(LoRA)は特に有効である。微調整におけるLoRA行列の役割の違いを調べる試みから着想を得て、本稿では低ランクアダプタ行列の重要性における予期せぬ非対称性を特徴付け、活用する。具体的には、積$BA$を加えてニューラルネットワークのパラメータ行列を更新するとき、$B$と$A$の行列が異なる機能を持つことを観察する。$A$は入力から特徴を抽出し、$B$はこれらの特徴を用いて所望の出力を生成する。この観察に基づいて、$B$の微調整は$A$の微調整よりも本質的に有効であり、ランダムな未訓練の$A$は微調整された$A$とほぼ同等に機能するはずであることを示す。また、情報理論の観点から低ランクアダプタの一般化の上界を与え、$B$のみを訓練することによるパラメータ削減がこの上界を改善することを示す。我々はRoBERTa, BART-Large, LLaMA-2, ViTsの実験で結論を支持する。 Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective. Inspired by an effort to investigate the different roles of LoRA matrices during fine-tuning, this paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices. Specifically, when updating the parameter matrices of a neural network by adding a product $BA$, we observe that the $B$ and $A$ matrices have distinct functions: $A$ extracts features from the input, while $B$ uses these features to create the desired output. Based on this observation, we demonstrate that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random untrained $A$ should perform nearly as well as a fine-tuned one. Using an information-theoretic lens, we also bound the generalization of low-rank adapters, showing that the parameter savings of exclusively training $B$ improve the bound. We support our conclusions with experiments on RoBERTa, BART-Large, LLaMA-2, and ViTs.	翻訳日:2024-02-28 19:19:41 公開日:2024-02-27
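The update described in the abstract above (adapting a weight matrix as $W + BA$ while leaving a random $A$ untrained and fitting only $B$) can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the authors' code: the least-squares objective, the zero initialisation of $B$, the Gaussian scale of $A$, the learning rate, and the helper names `init_adapter`, `forward`, `loss`, and `train_b_only` are all hypothetical choices made for the example.

```python
import random


def init_adapter(d_out, d_in, rank, seed=0):
    """Return (B, A): B starts at zero, A is random and will stay frozen."""
    rng = random.Random(seed)
    scale = 1.0 / d_in ** 0.5  # assumed initialisation scale for A
    A = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]
    return B, A


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def forward(W, B, A, x):
    """Compute (W + BA) x as W x + B (A x); A x plays the role of features."""
    h = matvec(A, x)
    return [wx + bh for wx, bh in zip(matvec(W, x), matvec(B, h))]


def loss(W, B, A, data):
    """Sum of 0.5 * squared error over (input, target) pairs."""
    total = 0.0
    for x, t in data:
        y = forward(W, B, A, x)
        total += 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total


def train_b_only(W, B, A, data, lr=0.1, steps=500):
    """Full-batch gradient descent on B; W and A are never modified."""
    for _ in range(steps):
        grad = [[0.0] * len(B[0]) for _ in B]
        for x, t in data:
            h = matvec(A, x)
            err = [yi - ti for yi, ti in zip(forward(W, B, A, x), t)]
            # d(loss)/dB[i][j] = err[i] * h[j]
            for i, e in enumerate(err):
                for j, hj in enumerate(h):
                    grad[i][j] += e * hj
        for i in range(len(B)):
            for j in range(len(B[0])):
                B[i][j] -= lr * grad[i][j]
    return B
```

With $B$ initialised to zero the adapted model starts out identical to the base model, and training touches only `d_out * rank` parameters; whether a frozen random $A$ is sufficient in practice is the empirical claim of the paper, which this toy does not test.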
# T-HITLは画像生成における問題関連に効果的に対応し、全体的な視覚的品質を維持する T-HITL Effectively Addresses Problematic Associations in Image Generation and Maintains Overall Visual Quality ( http://arxiv.org/abs/2402.17101v1 ) ライセンス: Link先を確認	Susan Epstein, Li Chen, Alessandro Vecchiato, Ankit Jain	(参考訳) 生成的AI画像モデルは、必然的に人々の問題表現を生成する。過去の研究では、何百万人ものユーザーがこれらのモデルに毎日関与しており、問題のある人々の表現を含むモデルが現実世界の差別やその他の害を複雑化し、加速する可能性があると指摘している(Bianchi et al, 2023)。本稿では,社会データに埋め込まれた否定的なナラティブを反映し,強化する意味概念と,人口統計群間の問題のある関連について考察する。社会学文献(Blumer, 1958)とモデル行動へのマッピング表現に基づいて,画像生成モデルにおける問題関連性を研究する分類学を開発した。これらの関連に対処する方法として,モデルレベルでの微調整の有効性を検討し,従来の微調整の限界として視覚品質の低下の可能性を明らかにする。また、問題のある関連の低減と視覚的品質の維持を両立させるT-HITLによる新しい手法を提案する。モデルレベルでのT-HITLによる3つの問題関連性を示すことによって,T-HITLの有効性を示す。私たちの奨学金への貢献は2倍です。機械学習モデルと生成AIの文脈で問題のある関連を定義することで、これらの関連に対処するための概念的および技術的分類を導入します。最後に、これらの関連に対処し、画像モデル生成の視覚的品質を同時に維持するT-HITLを提案する。この緩和はトレードオフである必要はなく、むしろ強化である。 Generative AI image models may inadvertently generate problematic representations of people. Past research has noted that millions of users engage daily across the world with these models and that the models, including through problematic representations of people, have the potential to compound and accelerate real-world discrimination and other harms (Bianchi et al, 2023). In this paper, we focus on addressing the generation of problematic associations between demographic groups and semantic concepts that may reflect and reinforce negative narratives embedded in social data. Building on sociological literature (Blumer, 1958) and mapping representations to model behaviors, we have developed a taxonomy to study problematic associations in image generation models. We explore the effectiveness of fine tuning at the model level as a method to address these associations, identifying a potential reduction in visual quality as a limitation of traditional fine tuning. 
We also propose a new methodology with twice-human-in-the-loop (T-HITL) that promises improvements in both reducing problematic associations and also maintaining visual quality. We demonstrate the effectiveness of T-HITL by providing evidence of three problematic associations addressed by T-HITL at the model level. Our contributions to scholarship are two-fold. By defining problematic associations in the context of machine learning models and generative AI, we introduce a conceptual and technical taxonomy for addressing some of these associations. Finally, we provide a method, T-HITL, that addresses these associations and simultaneously maintains visual quality of image model generations. This mitigation need not be a tradeoff, but rather an enhancement.	翻訳日:2024-02-28 18:25:26 公開日:2024-02-27
# 熱赤外物体追跡のためのベイズフィルタの防御と復活 In Defense and Revival of Bayesian Filtering for Thermal Infrared Object Tracking ( http://arxiv.org/abs/2402.17098v1 ) ライセンス: Link先を確認	Peng Gao, Shi-Min Li, Feng Gao, Fei Wang, Ru-Yue Yuan, Hamido Fujita	(参考訳) 深層学習に基づく手法は、熱赤外(TIR)物体追跡の分野における最新の研究を独占する。しかし、より優れた追跡結果を得るためには、ディープラーニングモデルのみに頼ってターゲットオブジェクトを表現するのに有用な特徴情報を慎重に選択し、合理的なテンプレート更新戦略を設計する必要がある。このように、最近のTIRトラッキング手法は複雑なシナリオにおいて多くの課題に直面している。本稿では,これらの困難な状況下でのTIR追跡を強化するために,新しいディープベイズフィルタ(DBF)手法を提案する。 DBFは、システムと観測モデルという二重モデル構造に特有である。システムモデルは、運動データを利用して2次元ブラウン運動に基づいて対象物体のポテンシャル位置を推定し、事前確率を生成する。その後、TIR画像を取得すると観測モデルが再生される。分類器として機能し、赤外線情報を用いて推定位置の可能性を判定し、可能性確率を生成する。 2つのモデルのガイダンスに従って、対象オブジェクトの位置を決定でき、テンプレートを動的に更新することができる。いくつかのベンチマークデータセットの実験的解析により、DBFは複雑なシナリオにおいて既存のほとんどのTIRトラッキングメソッドを超越して、競争性能を達成することが明らかになった。 Deep learning-based methods monopolize the latest research in the field of thermal infrared (TIR) object tracking. However, relying solely on deep learning models to obtain better tracking results requires carefully selecting feature information that is beneficial to representing the target object and designing a reasonable template update strategy, which undoubtedly increases the difficulty of model design. Thus, recent TIR tracking methods face many challenges in complex scenarios. This paper introduces a novel Deep Bayesian Filtering (DBF) method to enhance TIR tracking in these challenging situations. DBF is distinctive in its dual-model structure: the system and observation models. The system model leverages motion data to estimate the potential positions of the target object based on two-dimensional Brownian motion, thus generating a prior probability. Following this, the observation model comes into play upon capturing the TIR image. It serves as a classifier and employs infrared information to ascertain the likelihood of these estimated positions, creating a likelihood probability. 
According to the guidance of the two models, the position of the target object can be determined, and the template can be dynamically updated. Experimental analysis across several benchmark datasets reveals that DBF achieves competitive performance, surpassing most existing TIR tracking methods in complex scenarios.	翻訳日:2024-02-28 18:24:59 公開日:2024-02-27
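The predict/update cycle described in the abstract above can be sketched as a grid-based Bayes filter. This is a minimal sketch, not the DBF implementation: the discrete grid, the Gaussian kernel used as a stand-in for two-dimensional Brownian motion, the hand-written likelihood grid standing in for a classifier's output, and the function names `predict`, `update`, and `argmax` are all assumptions made for illustration.

```python
import math


def normalize(grid):
    """Scale a non-negative 2-D grid so that its entries sum to 1."""
    total = sum(map(sum, grid))
    if total <= 0.0:
        raise ValueError("grid has no probability mass")
    return [[v / total for v in row] for row in grid]


def predict(belief, sigma):
    """System model: diffuse the belief with an isotropic Gaussian kernel."""
    h, w = len(belief), len(belief[0])
    radius = max(1, int(3 * sigma))
    prior = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            p = belief[y][x]
            if p == 0.0:
                continue
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        weight = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
                        prior[ny][nx] += p * weight
    return normalize(prior)


def update(prior, likelihood):
    """Observation model: posterior is proportional to prior times likelihood."""
    posterior = [[p * l for p, l in zip(prow, lrow)]
                 for prow, lrow in zip(prior, likelihood)]
    return normalize(posterior)


def argmax(grid):
    """Return the (row, col) index of the largest entry."""
    best = max((v, y, x) for y, row in enumerate(grid) for x, v in enumerate(row))
    return best[1], best[2]
```

In a tracker built along these lines, the posterior's maximum would give the estimated target position for the current frame and the posterior itself would become the belief passed to `predict` for the next frame.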
# 再表現: LLM 応答における実誤差を低減した説明後の修正 Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses ( http://arxiv.org/abs/2402.17097v1 ) ライセンス: Link先を確認	Juyeon Kim, Jeongeun Lee, Yoonho Chang, Chanyeol Choi, Junseong Kim, Jy-yong Sohn	(参考訳) 幻覚の問題を緩和することは、現実のシナリオでそれらを確実に活用するために、克服すべきLCMの主要な課題の1つです。近年,llm生成テキストの事実誤りをチェックし,それに従って修正し,幻覚問題を低減させる手法が提案されている。本稿では,LLM生成テキストの修正手法であるRe-Exを提案する。第1に、外部ツールを使用して、応答中の事実エラーの証拠を取得すること、第2に、llmは、第1のステップで収集された証拠に基づいて、応答の問題部分を説明するように指示すること、最後に、第2のステップで得られた説明を使って、応答を改訂することである。説明ステップに加えて,反応修正プロセスに必要なトークン量と壁面時間を削減するための新しいプロンプト手法を提案する。 Factool、CoVE、RARRといった既存のメソッドと比較して、Re-Exは複数のベンチマークでより少ない時間と少ないトークンで改善されたリビジョンパフォーマンスを提供する。 Mitigating hallucination issues is one of the main challenges of LLMs we need to overcome, in order to reliably use them in real-world scenarios. Recently, various methods are proposed to check the factual errors in the LLM-generated texts and revise them accordingly, to reduce the hallucination issue. In this paper, we propose Re-Ex, a method of revising LLM-generated texts, which introduces a novel step dubbed as the factual error explanation step. Re-Ex revises the initial response of LLMs using 3-steps: first, external tools are used to get the evidences on the factual errors in the response; second, LLMs are instructed to explain the problematic parts of the response based on the evidences gathered in the first step; finally, LLMs revise the response using the explanation obtained in the second step. In addition to the explanation step, we propose new prompting techniques to reduce the amount of tokens and wall-clock time required for the response revision process. Compared with existing methods including Factool, CoVE, and RARR, Re-Ex provides better revision performance with less time and fewer tokens in multiple benchmarks.	翻訳日:2024-02-28 18:24:40 公開日:2024-02-27
# 最適空洞界面を有する繊維集積ファンデルワールス量子センサ Fibre-integrated van der Waals quantum sensor with an optimal cavity interface ( http://arxiv.org/abs/2402.17095v1 ) ライセンス: Link先を確認	Jong Sung Moon, Benjamin Whitefield, Lesley Spencer, Mehran Kianinia, Madeline Hennessey, Milos Toth, Woong Bae Jeon, Je-Hyung Kim and Igor Aharonovich	(参考訳) 量子材料とファイバー光学を統合することで、様々な応用に高度な機能が追加され、複数の物理パラメータを検出可能なリモートセンサーなどのファイバーベースの量子デバイスが導入される。しかし、量子材料とファイバーの最適な統合を達成することは、特に適切な寸法の量子素子の作成の困難と、商業的な光ファイバーへの効率的なフォトニックインタフェースのために困難である。ここでは、ファイバー積分したファンデルワールス量子センサの新しいモダリティを示す。我々は,ヘキサゴナル窒化ホウ素 (hbn) からのホール型円形ブラッググレーティングキャビティの設計と製作を行い,キャビティ内の光学活性スピン欠陥を発生させ,決定論的パターン移動技術を用いてキャビティと光ファイバーを統合する。ファイバー集積化hBN空洞は、hBNのスピン欠陥からの光信号の効率的な励起と収集を可能にし、全ファイバー集積量子センサーを可能にする。さらに,強磁性体と任意の磁場のリモートセンシングを示す。全体として、ハイブリッドファイバーベースの量子センシングプラットフォームは、新しい世代の堅牢でリモートで多機能な量子センサーへの道を開くかもしれない。 Integrating quantum materials with fibre optics adds advanced functionalities to a variety of applications, and introduces fibre-based quantum devices such as remote sensors capable of probing multiple physical parameters. However, achieving optimal integration between quantum materials and fibres is challenging, particularly due to difficulties in fabrication of quantum elements with suitable dimensions and an efficient photonic interface to a commercial optical fibre. Here we demonstrate a new modality for a fibre-integrated van der Waals quantum sensor. We design and fabricate a hole-based circular Bragg grating cavity from hexagonal boron nitride (hBN), engineer optically active spin defects within the cavity, and integrate the cavity with an optical fibre using a deterministic pattern transfer technique. The fibre-integrated hBN cavity enables efficient excitation and collection of optical signals from spin defects in hBN, thereby enabling all-fibre integrated quantum sensors. Moreover, we demonstrate remote sensing of a ferromagnetic material and of arbitrary magnetic fields. 
All in all, the hybrid fibre-based quantum sensing platform may pave the way to a new generation of robust, remote, multi-functional quantum sensors.	翻訳日:2024-02-28 18:24:21 公開日:2024-02-27
# 多クラス異常検出と局所化のための構造的教師・学生正規性学習 Structural Teacher-Student Normality Learning for Multi-Class Anomaly Detection and Localization ( http://arxiv.org/abs/2402.17091v1 ) ライセンス: Link先を確認	Hanqiu Deng and Xingyu Li	(参考訳) 視覚異常検出は、通常のデータをモデリングしながら未知の異常パターンを特定することを目的とした、困難なオープンセットタスクである。知識蒸留パラダイムは,教師と学生のネットワーク特徴比較を利用して,一級異常検出において顕著な性能を示した。しかし、このパラダイムをマルチクラス異常検出に拡張することは、新しいスケーラビリティの課題をもたらす。本研究では,従来の教師学生モデルにおいて,複数クラス間干渉による異常検出に適用した場合の有意な性能劣化について検討する。そこで,本稿では,snl(structureal teacher-sudent normality learning)と呼ばれる新しい手法を提案する。(1)空間的チャネル蒸留法とinter-inter-affinity蒸留法を提案し,教師と学生ネットワーク間の構造的距離を計測する。 2) 学生ネットワークの正規表現空間をカプセル化する中央残余集約モジュール(CRAM)を導入する。提案手法をMVTecADとVisAの2つの異常検出データセットで評価した。本手法は, MVTecADが3.9%, MVTecADが1.5%, VisAが2.5%, マルチクラス異常検出とローカライゼーションタスクが2.5%であった。さらに,本アルゴリズムは,MVTecADとVisAの両方において,現在の最先端統一モデルよりも優れている。 Visual anomaly detection is a challenging open-set task aimed at identifying unknown anomalous patterns while modeling normal data. The knowledge distillation paradigm has shown remarkable performance in one-class anomaly detection by leveraging teacher-student network feature comparisons. However, extending this paradigm to multi-class anomaly detection introduces novel scalability challenges. In this study, we address the significant performance degradation observed in previous teacher-student models when applied to multi-class anomaly detection, which we identify as resulting from cross-class interference. To tackle this issue, we introduce a novel approach known as Structural Teacher-Student Normality Learning (SNL): (1) We propose spatial-channel distillation and intra-&inter-affinity distillation techniques to measure structural distance between the teacher and student networks. (2) We introduce a central residual aggregation module (CRAM) to encapsulate the normal representation space of the student network. We evaluate our proposed approach on two anomaly detection datasets, MVTecAD and VisA. 
Our method surpasses the state-of-the-art distillation-based algorithms by a significant margin of 3.9% and 1.5% on MVTecAD and 1.2% and 2.5% on VisA in the multi-class anomaly detection and localization tasks, respectively. Furthermore, our algorithm outperforms the current state-of-the-art unified models on both MVTecAD and VisA.	Translated: 2024-02-28 18:24:01 Published: 2024-02-27
# SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution ( http://arxiv.org/abs/2402.17133v1 ) License: check the linked page	Chengcheng Wang, Zhiwei Hao, Yehui Tang, Jianyuan Guo, Yujie Yang, Kai Han, Yunhe Wang	Diffusion-based super-resolution (SR) models have recently garnered significant attention due to their potent restoration capabilities. However, conventional diffusion models perform noise sampling from a single distribution, constraining their ability to handle real-world scenes and complex textures across semantic regions. With the success of the Segment Anything Model (SAM), generating sufficiently fine-grained region masks can enhance the detail recovery of diffusion-based SR models. Yet directly integrating SAM into SR models results in a much higher computational cost. In this paper, we propose the SAM-DiffSR model, which can utilize the fine-grained structure information from SAM in the process of sampling noise to improve the image quality without additional computational cost during inference. In the process of training, we encode structural position information into the segmentation mask from SAM. Then the encoded mask is integrated into the forward diffusion process by modulating it to the sampled noise.
This adjustment allows us to independently adapt the noise mean within each corresponding segmentation area. The diffusion model is trained to estimate this modulated noise. Crucially, our proposed framework does not change the reverse diffusion process and does not require SAM at inference. Experimental results demonstrate the effectiveness of our proposed method, showcasing superior performance in suppressing artifacts, and surpassing existing diffusion-based methods by up to 0.74 dB in PSNR on the DIV2K dataset. The code and dataset are available at https://github.com/lose4578/SAM-DiffSR.	Translated: 2024-02-28 18:14:13 Published: 2024-02-27
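The PSNR figure quoted in the SAM-DiffSR abstract can be read against the standard definition, PSNR = 10 log10(MAX^2 / MSE). The snippet below is a minimal sketch of that textbook formula only; the function name `psnr`, the toy pixel lists, and the 8-bit peak value of 255 are illustrative assumptions and are not taken from the paper or its repository.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical toy data: two restorations of the same four reference pixels.
reference = [52, 55, 61, 59]
restored_a = [54, 55, 60, 57]
restored_b = [53, 55, 61, 58]

# A positive gain means restoration B is closer to the reference than A.
gain_db = psnr(reference, restored_b) - psnr(reference, restored_a)
print(f"PSNR gain of B over A: {gain_db:.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of a fraction of a dB corresponds to a multiplicative reduction in MSE rather than a fixed absolute one.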
# Predicting O-GlcNAcylation Sites in Mammalian Proteins with Transformers and RNNs Trained with a New Loss Function ( http://arxiv.org/abs/2402.17131v1 ) License: check the linked page	Pedro Seber	Glycosylation, a protein modification, has multiple essential functional and structural roles. O-GlcNAcylation, a subtype of glycosylation, has the potential to be an important target for therapeutics, but methods to reliably predict O-GlcNAcylation sites had not been available until 2023; a 2021 review correctly noted that published models were insufficient and failed to generalize. Moreover, many are no longer usable. In 2023, a considerably better RNN model with an F$_1$ score of 36.17% and an MCC of 34.57% on a large dataset was published. This article first sought to improve these metrics using transformer encoders. While transformers displayed high performance on this dataset, their performance was inferior to that of the previously published RNN. We then created a new loss function, which we call the weighted focal differentiable MCC, to improve the performance of classification models. RNN models trained with this new function display superior performance to models trained using the weighted cross-entropy loss; this new function can also be used to fine-tune trained models.
A two-cell RNN trained with this loss achieves state-of-the-art performance in O-GlcNAcylation site prediction with an F$_1$ score of 38.82% and an MCC of 38.21% on that large dataset.	Translated: 2024-02-28 18:13:52 Published: 2024-02-27
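The two metrics reported in this abstract, the F$_1$ score and the Matthews correlation coefficient (MCC), are both functions of the binary confusion matrix. The sketch below computes the standard, non-differentiable versions; it does not reproduce the paper's weighted focal differentiable MCC loss, and the counts in the example are made up to mimic a heavily imbalanced site-prediction task.

```python
import math

def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # MCC uses all four cells, so it stays informative under class imbalance.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return f1, mcc

# Hypothetical counts: 70 true sites among 10,000 candidate residues.
f1, mcc = f1_and_mcc(tp=30, fp=50, fn=40, tn=9880)
print(f"F1 = {f1:.4f}, MCC = {mcc:.4f}")
```

With so few positives, accuracy alone would look excellent even for a poor predictor, which is one reason abstracts in this area tend to report F$_1$ and MCC instead.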
# OSCaR: Object State Captioning and State Change Representation ( http://arxiv.org/abs/2402.17128v1 ) License: check the linked page	Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu	The capability of intelligent models to extrapolate and comprehend changes in object states is a crucial yet demanding aspect of AI research, particularly through the lens of human interaction in real-world settings. This task involves describing complex visual environments, identifying active objects, and interpreting their changes as conveyed through language. Traditional methods, which isolate object captioning and state change detection, offer a limited view of dynamic environments. Moreover, relying on a small set of symbolic words to represent changes has restricted the expressiveness of language. To address these challenges, in this paper, we introduce the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark. OSCaR consists of 14,084 annotated video segments with nearly 1,000 unique objects from various egocentric video collections. It sets a new testbed for evaluating multimodal large language models (MLLMs). Our experiments demonstrate that while MLLMs show some skill, they lack a full understanding of object state changes.
The benchmark includes a fine-tuned model that, despite initial capabilities, requires significant improvements in accuracy and generalization ability for effective understanding of these changes. Our code and dataset are available at https://github.com/nguyennm1024/OSCaR.	Translated: 2024-02-28 18:13:25 Published: 2024-02-27
# Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models ( http://arxiv.org/abs/2402.17124v1 ) License: check the linked page	Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Tongshuang Wu, Jianshu Chen	For an LLM to be trustworthy, its confidence level should be well-calibrated with its actual performance. While it is now widely accepted that LLM performance is greatly impacted by prompts, confidence calibration in prompting LLMs has yet to be thoroughly explored. In this paper, we explore how different prompting strategies influence LLM confidence calibration and how it could be improved. We conduct extensive experiments on six prompting methods in the question-answering context and we observe that, while these methods help improve the expected LLM calibration, they also trigger LLMs to be over-confident when responding to some instances. Inspired by human cognition, we propose Fact-and-Reflection (FaR) prompting, which improves the LLM calibration in two steps. First, FaR elicits the known "facts" that are relevant to the input prompt from the LLM. Then, it asks the model to "reflect" over them to generate the final answer. Experiments show that FaR prompting achieves significantly better calibration; it lowers the Expected Calibration Error by 23.5% on our multi-purpose QA tasks.
Notably, FaR prompting even elicits the capability of verbally expressing concerns in less confident scenarios, which helps trigger retrieval augmentation for solving these harder instances.	Translated: 2024-02-28 18:13:05 Published: 2024-02-27
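The Expected Calibration Error (ECE) mentioned in this abstract is commonly computed by bucketing predictions by confidence and averaging the gap between mean confidence and accuracy in each bucket, weighted by bucket size. The following is a minimal sketch of that common equal-width binning variant; the bin count, the function name, and the toy data are assumptions for illustration, and the paper's exact evaluation protocol is not specified in the abstract.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's |confidence - accuracy| gap by its share of samples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece

# Hypothetical answers: four stated at 95% confidence (two correct),
# two stated at 55% confidence (one correct).
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [True, True, False, False, True, False]
ece = expected_calibration_error(confidences, correct)
print(f"ECE = {ece:.4f}")
```

In this toy case the over-confident 95% bucket dominates the score, which matches the abstract's concern about prompts that make a model over-confident on some instances.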
# Theory-Independent Realism ( http://arxiv.org/abs/2402.17123v1 ) License: check the linked page	D. M. Fucci, R. M. Angelo	The distinctive features of quantum mechanics, which set it apart from other physical theories, challenge our notions of realism. Recovering realism from purely philosophical grounds, a quantitative and operational criterion was proposed in the past, but solely for the context of quantum mechanics. We use a framework of generalized probabilistic theories to expand the notion of realism for a theory-independent context, providing a criterion uniquely based on the probabilities assigned to measurement outcomes. Moreover, using robustness and the Kullback-Leibler divergence, we propose quantifiers for the realism of arbitrary physical properties given a particular state of a generic physical theory. These theory-independent quantifiers are then employed in quantum mechanics and we investigate their relation with another well-established irrealism measure.	Translated: 2024-02-28 18:12:39 Published: 2024-02-27
# LCEN: A Novel Feature Selection Algorithm for Nonlinear, Interpretable Machine Learning Models ( http://arxiv.org/abs/2402.17120v1 ) License: check the linked page	Pedro Seber and Richard D. Braatz	Interpretable architectures can have advantages over black-box architectures, and interpretability is essential for the application of machine learning in critical settings, such as aviation or medicine. However, the simplest, most commonly used interpretable architectures (such as LASSO or EN) are limited to linear predictions and have poor feature selection capabilities. In this work, we introduce the LASSO-Clip-EN (LCEN) algorithm for the creation of nonlinear, interpretable machine learning models. LCEN is tested on a wide variety of artificial and empirical datasets, creating more accurate, sparser models than other commonly used architectures. These experiments reveal that LCEN is robust against many issues typically present in datasets and modeling, including noise, multicollinearity, data scarcity, and hyperparameter variance. LCEN is also able to rediscover multiple physical laws from empirical data and, for processes with no known physical laws, LCEN achieves better results than many other dense and sparse methods -- including using 10.8 times fewer features than dense methods and 8.1 times fewer features than EN on one dataset, and is comparable to an ANN on another dataset.	Translated: 2024-02-28 18:12:25 Published: 2024-02-27
# Creating Suspenseful Stories: Iterative Planning with Large Language Models ( http://arxiv.org/abs/2402.17119v1 ) License: check the linked page	Kaige Xie, Mark Riedl	Automated story generation has been one of the long-standing challenges in NLP. Among all dimensions of stories, suspense is very common in human-written stories but relatively under-explored in AI-generated stories. While recent advances in large language models (LLMs) have greatly promoted language generation in general, state-of-the-art LLMs are still unreliable when it comes to suspenseful story generation. We propose a novel iterative-prompting-based planning method that is grounded in two theoretical foundations of story suspense from cognitive psychology and narratology. This theory-grounded method works in a fully zero-shot manner and does not rely on any supervised story corpora. To the best of our knowledge, this paper is the first attempt at suspenseful story generation with LLMs. Extensive human evaluations of the generated suspenseful stories demonstrate the effectiveness of our method.	Translated: 2024-02-28 18:12:08 Published: 2024-02-27
# A heralded single-photon source based on superpositions of squeezed states ( http://arxiv.org/abs/2402.17118v1 ) License: check the linked page	Hiroo Azuma, William J. Munro, Kae Nemoto	We propose a heralded single-photon source based on injecting a superposition of oppositely squeezed states onto a beam splitter. Our superposition of squeezed states is composed of only even photon number states (the number of photons is equal to $2, 6, 10, \ldots$), meaning that the probability of an emitted single photon given a heralded single-photon event is higher than what one can achieve from the usual two-mode squeezed state. This enables one to realize an enhanced heralded single-photon source. We discuss how to create this superposition of squeezed states utilizing single-mode squeezed states and the cross-Kerr nonlinearity. Our proposed method significantly improves the probability of emitting the heralded single photon compared to spontaneous parametric down conversion.	Translated: 2024-02-28 18:11:54 Published: 2024-02-27
# CharNeRF: 3D Character Generation from Concept Art ( http://arxiv.org/abs/2402.17115v1 ) License: check the linked page	Eddy Chu, Yiyang Chen, Chedy Raissi, Anand Bhojan	3D modeling holds significant importance in the realms of AR/VR and gaming, allowing for both artistic creativity and practical applications. However, the process is often time-consuming and demands a high level of skill. In this paper, we present a novel approach to create volumetric representations of 3D characters from consistent turnaround concept art, which serves as the standard input in the 3D modeling industry. While Neural Radiance Field (NeRF) has been a game-changer in image-based 3D reconstruction, to the best of our knowledge, there is no known research that optimizes the pipeline for concept art. To harness the potential of concept art, with its defined body poses and specific view angles, we propose encoding it as priors for our model. We train the network to make use of these priors for various 3D points through a learnable view-direction-attended multi-head self-attention layer. Additionally, we demonstrate that a combination of ray sampling and surface sampling enhances the inference capabilities of our network. Our model is able to generate high-quality 360-degree views of characters.
Subsequently, we provide a simple guideline to better leverage our model to extract the 3D mesh. It is important to note that our model's inference capabilities are influenced by the training data's characteristics, primarily focusing on characters with a single head, two arms, and two legs. Nevertheless, our methodology remains versatile and adaptable to concept art from diverse subject matters, without imposing any specific assumptions on the data.	Translated: 2024-02-28 18:11:36 Published: 2024-02-27
# Central Charge in Quantum Optics ( http://arxiv.org/abs/2402.17114v1 ) License: check the linked page	Daniel Burgarth, Paolo Facchi, Hiromichi Nakazato, Saverio Pascazio, Kazuya Yuasa	The product of two unitaries can normally be expressed as a single exponential through the famous Baker-Campbell-Hausdorff formula. We present here a counterexample in quantum optics, by showing that an expression in terms of a single exponential is possible only at the expense of the introduction of a new element (a central extension of the algebra), implying that there will be unitaries, generated by a sequence of gates, that cannot be generated by any time-independent quadratic Hamiltonian. A quantum-optical experiment is proposed that brings to light this phenomenon.	Translated: 2024-02-28 18:11:13 Published: 2024-02-27
# Transparent Image Layer Diffusion using Latent Transparency ( http://arxiv.org/abs/2402.17113v1 ) License: check the linked page	Lvmin Zhang, Maneesh Agrawala	We present LayerDiffusion, an approach enabling large-scale pretrained latent diffusion models to generate transparent images. The method allows generation of single transparent images or of multiple transparent layers. The method learns a "latent transparency" that encodes alpha channel transparency into the latent manifold of a pretrained latent diffusion model. It preserves the production-ready quality of the large diffusion model by regulating the added transparency as a latent offset with minimal changes to the original latent distribution of the pretrained model. In this way, any latent diffusion model can be converted into a transparent image generator by finetuning it with the adjusted latent space. We train the model with 1M transparent image layer pairs collected using a human-in-the-loop collection scheme. We show that latent transparency can be applied to different open source image generators, or be adapted to various conditional control systems to achieve applications like foreground/background-conditioned layer generation, joint layer generation, structural control of layer contents, etc.
A user study finds that in most cases (97%) users prefer our natively generated transparent content over previous ad-hoc solutions such as generating and then matting. Users also report the quality of our generated transparent images is comparable to real commercial transparent assets like Adobe Stock.	Translated: 2024-02-28 18:11:04 Published: 2024-02-27
# Sinkhorn Distance Minimization for Knowledge Distillation ( http://arxiv.org/abs/2402.17110v1 ) License: check the linked page	Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wengang Zhou and Houqiang Li	Knowledge distillation (KD) has been widely adopted to compress large language models (LLMs). Existing KD methods investigate various divergence measures including the Kullback-Leibler (KL), reverse Kullback-Leibler (RKL), and Jensen-Shannon (JS) divergences. However, due to limitations inherent in their assumptions and definitions, these measures fail to deliver effective supervision when little distribution overlap exists between the teacher and the student. In this paper, we show that the aforementioned KL, RKL, and JS divergences respectively suffer from issues of mode-averaging, mode-collapsing, and mode-underestimation, which deteriorate logits-based KD for diverse NLP tasks. We propose the Sinkhorn Knowledge Distillation (SinKD) that exploits the Sinkhorn distance to ensure a nuanced and precise assessment of the disparity between teacher and student distributions. Moreover, thanks to the properties of the Sinkhorn metric, we can dispense with sample-wise KD, which restricts the perception of divergence to each teacher-student sample pair.
Instead, we propose a batch-wise reformulation to capture geometric intricacies of distributions across samples in the high-dimensional space. Comprehensive evaluation on GLUE and SuperGLUE, in terms of comparability, validity, and generalizability, highlights our superiority over state-of-the-art methods on all kinds of LLMs with encoder-only, encoder-decoder, and decoder-only architectures.	Translated: 2024-02-28 18:10:45 Published: 2024-02-27
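For readers unfamiliar with the Sinkhorn distance named in this abstract, the sketch below shows the generic entropy-regularised optimal transport computation via Sinkhorn iterations between two discrete distributions. It is not the paper's SinKD loss or its batch-wise reformulation: the cost matrix `|i - j|`, the regularisation strength `eps`, the iteration count, and the toy teacher/student distributions are all assumptions chosen for illustration.

```python
import math

def sinkhorn_distance(a, b, cost, eps=0.5, n_iters=200):
    """Transport cost of the entropy-regularised plan between histograms a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel derived from the ground cost.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))

# Hypothetical 3-class output distributions and a simple ground cost |i - j|.
cost = [[abs(i - j) for j in range(3)] for i in range(3)]
teacher = [0.7, 0.2, 0.1]
student_close = [0.6, 0.3, 0.1]
student_far = [0.1, 0.2, 0.7]

d_close = sinkhorn_distance(teacher, student_close, cost)
d_far = sinkhorn_distance(teacher, student_far, cost)
print(f"close student: {d_close:.3f}, far student: {d_far:.3f}")
```

Unlike KL-type divergences, the value stays finite and graded even when the two distributions barely overlap, because it depends on how far probability mass has to move under the ground cost.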
# Repeated Contracting with Multiple Non-Myopic Agents: Policy Regret and Limited Liability ( http://arxiv.org/abs/2402.17108v1 ) License: check the linked page	Natalie Collina, Varun Gupta, Aaron Roth	We study a repeated contracting setting in which a Principal adaptively chooses amongst $k$ Agents at each of $T$ rounds. The Agents are non-myopic, and so a mechanism for the Principal induces a $T$-round extensive form game amongst the Agents. We give several results aimed at understanding an under-explored aspect of contract theory -- the game induced when choosing an Agent to contract with.
First, we show that this game admits a pure-strategy \emph{non-responsive} equilibrium amongst the Agents -- informally an equilibrium in which the Agent's actions depend on the history of realized states of nature, but not on the history of each other's actions, and so avoids the complexities of collusion and threats. Next, we show that if the Principal selects Agents using a \emph{monotone} bandit algorithm, then for any concave contract, in any such equilibrium, the Principal obtains no regret to contracting with the best Agent in hindsight -- not just given their realized actions, but also to the counterfactual world in which they had offered a guaranteed $T$-round contract to the best Agent in hindsight, which would have induced a different sequence of actions. Finally, we show that if the Principal selects Agents using a monotone bandit algorithm which guarantees no swap-regret, then the Principal can additionally offer only limited liability contracts (in which the Agent never needs to pay the Principal) while getting no-regret to the counterfactual world in which she offered a linear contract to the best Agent in hindsight -- despite the fact that linear contracts are not limited liability. We instantiate this theorem by demonstrating the existence of a monotone no swap-regret bandit algorithm, which to our knowledge has not previously appeared in the literature.	Translated: 2024-02-28 18:10:13 Published: 2024-02-27
# Dataset Fairness: Achievable Fairness on Your Data With Utility Guarantees ( http://arxiv.org/abs/2402.17106v1 ) License: check the linked page	Muhammad Faaiz Taufiq, Jean-Francois Ton, Yang Liu	In machine learning fairness, training models which minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off fundamentally depends on dataset characteristics such as dataset imbalances or biases. Therefore, using a uniform fairness requirement across datasets remains questionable and can often lead to models with substantially low utility. To address this, we present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets, backed by rigorous statistical guarantees. By utilizing the You-Only-Train-Once (YOTO) framework, our approach mitigates the computational burden of having to train multiple models when approximating the trade-off curve. Moreover, we quantify the uncertainty in our approximation by introducing confidence intervals around this curve, offering a statistically grounded perspective on the acceptable range of fairness violations for any given accuracy threshold.
Our empirical evaluation spanning tabular, image and language datasets underscores that our approach provides practitioners with a principled framework for dataset-specific fairness decisions across various data modalities.	Translated: 2024-02-28 18:09:37 Published: 2024-02-27
# Adversarial Perturbations of Physical Signals ( http://arxiv.org/abs/2402.17104v1 ) License: check the linked page	Robert L. Bassett, Austin Van Dellen, Anthony P. Austin	We investigate the vulnerability of computer-vision-based signal classifiers to adversarial perturbations of their inputs, where the signals and perturbations are subject to physical constraints. We consider a scenario in which a source and interferer emit signals that propagate as waves to a detector, which attempts to classify the source by analyzing the spectrogram of the signal it receives using a pre-trained neural network. By solving PDE-constrained optimization problems, we construct interfering signals that cause the detector to misclassify the source even though the perturbations to the spectrogram of the received signal are nearly imperceptible. Though such problems can have millions of decision variables, we introduce methods to solve them efficiently. Our experiments demonstrate that one can compute effective and physically realizable adversarial perturbations for a variety of machine learning models under various physical conditions.	Translated: 2024-02-28 18:09:15 Published: 2024-02-27
# Benchmarking Data Science Agents ( http://arxiv.org/abs/2402.17168v1 ) License: check the linked page	Yuge Zhang, Qiyang Jiang, Xingyu Han, Nan Chen, Yuqing Yang, Kan Ren	In the era of data-driven decision-making, the complexity of data analysis necessitates advanced expertise and tools of data science, presenting significant challenges even for specialists. Large Language Models (LLMs) have emerged as promising aids as data science agents, assisting humans in data analysis and processing. Yet their practical efficacy remains constrained by the varied demands of real-world applications and complicated analytical processes. In this paper, we introduce DSEval -- a novel evaluation paradigm, as well as a series of innovative benchmarks tailored for assessing the performance of these agents throughout the entire data science lifecycle. Incorporating a novel bootstrapped annotation method, we streamline dataset preparation, improve the evaluation coverage, and expand benchmarking comprehensiveness. Our findings uncover prevalent obstacles and provide critical insights to inform future advancements in the field.	Translated: 2024-02-28 18:04:29 Published: 2024-02-27
# Few-shot adaptation for morphology-independent cell instance segmentation ( http://arxiv.org/abs/2402.17165v1 ) License: check the linked page	Ram J. Zaveri and Voke Brume and Gianfranco Doretto	Microscopy data collections are becoming larger and more frequent. Accurate and precise quantitative analysis tools like cell instance segmentation are necessary to benefit from them. This is challenging due to the variability in the data, which requires retraining the segmentation model to maintain high accuracy on new collections. This is needed especially for segmenting cells with elongated and non-convex morphology like bacteria. We propose to reduce the amount of annotation and computing power needed for retraining the model by introducing a few-shot domain adaptation approach that requires annotating only one to five cells of the new data to process and that quickly adapts the model to maintain high accuracy. Our results show a significant boost in accuracy after adaptation to very challenging bacteria datasets.	Translated: 2024-02-28 18:04:11 Published: 2024-02-27
# 参加型都市計画のための大規模言語モデル Large Language Model for Participatory Urban Planning ( http://arxiv.org/abs/2402.17161v1 ) ライセンス: Link先を確認	Zhilun Zhou, Yuming Lin, Depeng Jin, Yong Li	(参考訳) 参加型都市計画は、住民の活発な関与を含む現代の都市計画の主流である。しかし、伝統的な参加パラダイムは経験豊富な計画専門家を必要とし、しばしば時間と費用がかかる。幸いなことに、LLM(Large Language Models)は人間のようなエージェントをシミュレートする能力を示しており、参加プロセスのエミュレートに利用することができる。本研究では, 住民のニーズの多様さを考慮し, 都市部における土地利用計画を作成できる, 参加型都市計画のためのLLMベースのマルチエージェント協調フレームワークを提案する。具体的には、プランナーと何千人もの住民の多様なプロファイルと背景をシミュレートするLSMエージェントを構築する。我々はまずプランナーに初期土地利用計画の実行を依頼する。住民の異なる施設のニーズに対処するため,各地域住民を対象に,住民のプロフィールに基づいてフィードバックを提供する計画について議論を開始する。さらに,議論の効率を向上させるために,住民の一部が議論し,残りが各ラウンドのリスナーとして振る舞うフィッシュボウル議論機構を採用した。最後に、住民のフィードバックに基づいてプランナーに計画を変更する。我々はこの手法を北京の2つの現実世界に展開する。実験により, 住民の満足度と包含度において, 最先端のパフォーマンスを達成し, サービスアクセシビリティと生態指標の点で, 人的専門家より優れていることが示された。 Participatory urban planning is the mainstream of modern urban planning that involves the active engagement of residents. However, the traditional participatory paradigm requires experienced planning experts and is often time-consuming and costly. Fortunately, the emerging Large Language Models (LLMs) have shown considerable ability to simulate human-like agents, which can be used to emulate the participatory process easily. In this work, we introduce an LLM-based multi-agent collaboration framework for participatory urban planning, which can generate land-use plans for urban regions considering the diverse needs of residents. Specifically, we construct LLM agents to simulate a planner and thousands of residents with diverse profiles and backgrounds. We first ask the planner to carry out an initial land-use plan. To deal with the different facilities needs of residents, we initiate a discussion among the residents in each community about the plan, where residents provide feedback based on their profiles. 
Furthermore, to improve the efficiency of discussion, we adopt a fishbowl discussion mechanism, where part of the residents discuss and the rest of them act as listeners in each round. Finally, we let the planner modify the plan based on residents' feedback. We deploy our method on two real-world regions in Beijing. Experiments show that our method achieves state-of-the-art performance in resident satisfaction and inclusion metrics, and also outperforms human experts in terms of service accessibility and ecology metrics.	翻訳日:2024-02-28 18:03:51 公開日:2024-02-27
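The fishbowl discussion mechanism in the abstract above (some residents speak in each round while the rest listen) can be sketched as a simple round scheduler. This is only an illustration: the function name `fishbowl_rounds`, the random choice of speakers, and all parameters are assumptions, not the authors' implementation.

```python
import random


def fishbowl_rounds(residents, speakers_per_round, num_rounds, seed=0):
    """Split residents into speakers and listeners for each round.

    Choosing speakers uniformly at random is an assumption made for
    illustration; the abstract does not say how speakers are selected.
    """
    if not 0 < speakers_per_round <= len(residents):
        raise ValueError("speakers_per_round must be between 1 and len(residents)")
    rng = random.Random(seed)
    schedule = []
    for _ in range(num_rounds):
        speakers = rng.sample(residents, speakers_per_round)
        chosen = set(speakers)
        # Everyone who is not speaking in this round listens.
        listeners = [r for r in residents if r not in chosen]
        schedule.append((speakers, listeners))
    return schedule


# Hypothetical usage with ten simulated residents.
residents = [f"resident_{i}" for i in range(10)]
schedule = fishbowl_rounds(residents, speakers_per_round=3, num_rounds=4)
for round_no, (speakers, listeners) in enumerate(schedule, start=1):
    print(round_no, len(speakers), "speakers,", len(listeners), "listeners")
```

Each round partitions the residents, so only a fraction of the agents has to generate text per round, which is the efficiency gain the abstract describes.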
# NocPlace: 生成的および継承的知識伝達を用いた夜間視覚的位置認識 NocPlace: Nocturnal Visual Place Recognition Using Generative and Inherited Knowledge Transfer ( http://arxiv.org/abs/2402.17159v1 ) ライセンス: Link先を確認	Bingxi Liu, Yiqun Wang, Huaqi Tao, Tingjun Huang, Fulin Tang, Yihong Wu, Jinqiang Cui and Hong Zhang	(参考訳) 視覚的位置認識(VPR)はコンピュータビジョンにおいて重要であり、既知の画像の広範なコレクションからクエリ画像に似たデータベースイメージを取得することを目的としている。しかしながら、多くの視覚関連タスクと同様に、学習ベースのVPRは夜間画像の不足により夜間にパフォーマンスが低下することが多い。具体的には、VPRは単一の夜間ドメインの問題ではなく、夜間のドメイン間の問題に対処する必要がある。これらの問題に対応するため、我々は、大規模で多視点の夜間VPRデータセットを利用して、学習したグローバルディスクリプタに、ダズリングライトと極暗光に対するレジリエンスを埋め込むNocPlaceを提案する。まず、NightCitiesと呼ばれる日夜の都市シーンデータセットを構築し、世界中の60都市で多様な夜間シナリオと照明のバリエーションを収集します。その後、このデータセット上で、画像対画像翻訳ネットワークを訓練する。この訓練された翻訳ネットワークを用いて既存のvprデータセットを処理し、夜間版を得る。 NocPlaceはナイトスタイルのイメージ、オリジナルのラベル、デイタイムVPRモデルから継承されたディスクリプタを使って微調整される。様々な夜間VPRテストセットに関する総合的な実験により、NocPlaceが従来の最先端手法をかなり上回っていることが判明した。 Visual Place Recognition (VPR) is crucial in computer vision, aiming to retrieve database images similar to a query image from an extensive collection of known images. However, like many vision-related tasks, learning-based VPR often experiences a decline in performance during nighttime due to the scarcity of nighttime images. Specifically, VPR needs to address the cross-domain problem of night-to-day rather than just the issue of a single nighttime domain. In response to these issues, we present NocPlace, which leverages a generated large-scale, multi-view, nighttime VPR dataset to embed resilience against dazzling lights and extreme darkness in the learned global descriptor. Firstly, we establish a day-night urban scene dataset called NightCities, capturing diverse nighttime scenarios and lighting variations across 60 cities globally. Following this, an unpaired image-to-image translation network is trained on this dataset. Using this trained translation network, we process an existing VPR dataset, thereby obtaining its nighttime version. 
NocPlace is then fine-tuned using night-style images, the original labels, and descriptors inherited from the daytime VPR model. Comprehensive experiments on various nighttime VPR test sets reveal that NocPlace considerably surpasses previous state-of-the-art methods.	翻訳日:2024-02-28 18:03:26 公開日:2024-02-27
# 複雑系のダイナミクス予測のための生成学習 Generative Learning for Forecasting the Dynamics of Complex Systems ( http://arxiv.org/abs/2402.17157v1 ) ライセンス: Link先を確認	Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos	(参考訳) 学習と効果的なダイナミクスを進化させることによって複雑なシステムのシミュレーションを加速する生成モデルを提案する。 g-led(generative learning of effective dynamics)では、高次元データの例を、自己回帰的注意機構によって進化する低次元多様体にサンプリングする。逆に、この低次元多様体を対応する高次元空間に写像するベイズ拡散モデルは、系の力学の統計を捉える。我々は,倉本-シヴァシンスキー方程式 (KS) や2次元高レイノルズ数流,3次元乱流流のシミュレーションなど,いくつかのベンチマークシステムにおけるG-LEDの性能と欠点を実証する。その結果、生成学習は計算コストを削減した複雑なシステムの統計特性を正確に予測するための新たなフロンティアを提供することを示した。 We introduce generative models for accelerating simulations of complex systems through learning and evolving their effective dynamics. In the proposed Generative Learning of Effective Dynamics (G-LED), instances of high dimensional data are down sampled to a lower dimensional manifold that is evolved through an auto-regressive attention mechanism. In turn, Bayesian diffusion models, that map this low-dimensional manifold onto its corresponding high-dimensional space, capture the statistics of the system dynamics. We demonstrate the capabilities and drawbacks of G-LED in simulations of several benchmark systems, including the Kuramoto-Sivashinsky (KS) equation, two-dimensional high Reynolds number flow over a backward-facing step, and simulations of three-dimensional turbulent channel flow. The results demonstrate that generative learning offers new frontiers for the accurate forecasting of the statistical properties of complex systems at a reduced computational cost.	翻訳日:2024-02-28 18:03:03 公開日:2024-02-27
# TaxDiff:タンパク質配列生成のための分類学的誘導拡散モデル TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation ( http://arxiv.org/abs/2402.17156v1 ) ライセンス: Link先を確認	Lin Zongying, Li Hao, Lv Liuzhenghao, Lin Bin, Zhang Junwu, Chen Calvin Yu-Chian, Yuan Li, Tian Yonghong	(参考訳) 特定の生物学的機能と構造安定性を持つタンパク質配列を設計することは、生物学や化学において重要である。生成モデルはすでに信頼できるタンパク質設計の能力を実証している。しかし、以前のモデルはタンパク質配列の無条件生成に制限されており、生物学的タスクに不可欠な制御可能な生成能力が欠如している。本研究では,生物種情報と拡散モデルの生成能力を組み合わせて,配列空間内で構造的に安定なタンパク質を生成する,制御可能なタンパク質配列生成のための分類学的拡散モデルであるtaxdiffを提案する。具体的には、変圧器ブロックの各層に分類制御情報を挿入して細粒度制御を行う。グローバルおよび局所的な注意の組み合わせにより、分類学的特異的タンパク質の配列整合性と構造的折りたたみ性が保証される。広範囲な実験により、TaxDiffは、分類学的誘導制御可能世代と無条件生成の両方において、複数のタンパク質配列生成ベンチマークにおいて、一貫してより良い性能を達成できることが示された。注目すべきは、TaxDiffが生成したシーケンスは、予測された構造に基づく信頼度の観点から直接構造生成モデルによって生成されたシーケンスを超え、拡散モデルに基づくモデルの4分の1しか必要としないことだ。タンパク質の生成と新しいバージョンのTaxDiffのトレーニングのためのコードは、https://github.com/Linzy19/TaxDiffで公開されている。 Designing protein sequences with specific biological functions and structural stability is crucial in biology and chemistry. Generative models already demonstrated their capabilities for reliable protein design. However, previous models are limited to the unconditional generation of protein sequences and lack the controllable generation ability that is vital to biological tasks. In this work, we propose TaxDiff, a taxonomic-guided diffusion model for controllable protein sequence generation that combines biological species information with the generative capabilities of diffusion models to generate structurally stable proteins within the sequence space. Specifically, taxonomic control information is inserted into each layer of the transformer block to achieve fine-grained control. The combination of global and local attention ensures the sequence consistency and structural foldability of taxonomic-specific proteins. 
Extensive experiments demonstrate that TaxDiff can consistently achieve better performance on multiple protein sequence generation benchmarks in both taxonomic-guided controllable generation and unconditional generation. Remarkably, the sequences generated by TaxDiff even surpass those produced by direct-structure-generation models in terms of confidence based on predicted structures and require only a quarter of the time of models based on the diffusion model. The code for generating proteins and training new versions of TaxDiff is available at: https://github.com/Linzy19/TaxDiff.	翻訳日:2024-02-28 18:02:47 公開日:2024-02-27
# シリコン上の超伝導回路におけるアクセプター誘起バルク誘電損失 Acceptor-induced bulk dielectric loss in superconducting circuits on silicon ( http://arxiv.org/abs/2402.17155v1 ) ライセンス: Link先を確認	Zi-Huai Zhang, Kadircan Godeneli, Justin He, Mutasem Odeh, Haoxin Zhou, Srujan Meesala, Alp Sipahigil	(参考訳) 超伝導量子回路の性能は主に2レベルシステム(tls)との相互作用による誘電損失によって制限される。材料界面を有する最先端回路はバルク基板からの誘電損失が重要な役割を果たす限界に近づいている。しかし、結晶性基板の誘電損失の微視的理解はまだ不十分である。本研究では,シリコン中のホウ素アクセプターが超伝導回路の強結合型tls浴を構成することを示す。ホウ素受容体の電子構造がシリコンのTLS応答に与える影響を論じる。シリコン中のホウ素濃度を網羅し、ホウ素受容体からのバルク誘電損失限界を示す。ホウ素が誘起する誘電損失は、ホウ素のスピン軌道構造により磁場中で低減できることを示した。この研究は超伝導回路のためのTLS浴について初めて詳細に説明し、次世代超伝導量子プロセッサのための超高純度基板の必要性を示す。 The performance of superconducting quantum circuits is primarily limited by dielectric loss due to interactions with two-level systems (TLS). State-of-the-art circuits with engineered material interfaces are approaching a limit where dielectric loss from bulk substrates plays an important role. However, a microscopic understanding of dielectric loss in crystalline substrates is still lacking. In this work, we show that boron acceptors in silicon constitute a strongly coupled TLS bath for superconducting circuits. We discuss how the electronic structure of boron acceptors leads to an effective TLS response in silicon. We sweep the boron concentration in silicon and demonstrate the bulk dielectric loss limit from boron acceptors. We show that boron-induced dielectric loss can be reduced in a magnetic field due to the spin-orbit structure of boron. This work provides the first detailed microscopic description of a TLS bath for superconducting circuits, and demonstrates the need for ultrahigh purity substrates for next-generation superconducting quantum processors.	翻訳日:2024-02-28 18:02:25 公開日:2024-02-27
# 行動は言葉よりも雄弁:生成レコメンデーションのための兆パラメータ逐次トランスデューサ Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations ( http://arxiv.org/abs/2402.17152v1 ) ライセンス: Link先を確認	Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, Yu Shi	(参考訳) 大規模レコメンデーションシステムは、高カーディナリティで異質な特徴に依存し、毎日数百億のユーザアクションを処理する必要があることが特徴である。何千もの特徴を備えた大量のデータでトレーニングされているにも関わらず、業界におけるほとんどのDeep Learning Recommendation Model(DLRM)は、計算量に応じてスケールしない。言語と視覚領域におけるトランスフォーマーの成功に触発され、推薦システムの基本的な設計選択を再考する。生成型モデリングフレームワーク(「Generative Recommenders」)内の逐次変換タスクとしてレコメンデーション問題を再構成し,高カーディナリティで非定常なストリーミングレコメンデーションデータ用に設計された新しいアーキテクチャであるHSTUを提案する。 HSTUは合成データセットと公開データセットにおいてNDCGでベースラインを最大65.8%上回り、長さ8192のシーケンスではFlashAttention2ベースのトランスフォーマーよりも5.3倍から15.2倍高速である。 HSTUベースのGenerative Recommendersは1.5兆のパラメータを持ち、オンラインA/Bテストのメトリクスを12.4%改善し、数十億のユーザがいる大規模なインターネットプラットフォームの複数の面にデプロイされている。さらに重要なことは、Generative Recommendersのモデル品質が、GPT-3/LLaMa-2スケールまでの3桁にわたるトレーニング計算量のべき乗則として実証的にスケールすることであり、これは将来のモデル開発に必要な炭素フットプリントを減らすとともに、レコメンデーションにおける最初の基礎モデルへの道を開く。 Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Despite being trained on a huge volume of data with thousands of features, most Deep Learning Recommendation Models (DLRMs) in industry fail to scale with compute. Inspired by the success achieved by Transformers in language and vision domains, we revisit fundamental design choices in recommendation systems. We reformulate recommendation problems as sequential transduction tasks within a generative modeling framework ("Generative Recommenders"), and propose a new architecture, HSTU, designed for high cardinality, non-stationary streaming recommendation data. HSTU outperforms baselines over synthetic and public datasets by up to 65.8% in NDCG, and is 5.3x to 15.2x faster than FlashAttention2-based Transformers on sequences of length 8192. 
HSTU-based Generative Recommenders, with 1.5 trillion parameters, improve metrics in online A/B tests by 12.4% and have been deployed on multiple surfaces of a large internet platform with billions of users. More importantly, the model quality of Generative Recommenders empirically scales as a power-law of training compute across three orders of magnitude, up to GPT-3/LLaMa-2 scale, which reduces carbon footprint needed for future model developments, and further paves the way for the first foundational models in recommendations.	翻訳日:2024-02-28 18:02:11 公開日:2024-02-27
# 文書部分のクラスタリング:文書からの影響キャンペーンの検出と特徴付け Clustering Document Parts: Detecting and Characterizing Influence Campaigns From Documents ( http://arxiv.org/abs/2402.17151v1 ) ライセンス: Link先を確認	Zhengxiang Wang, Owen Rambow	(参考訳) 本稿では,文書からの影響を検知し,特徴付ける新しいクラスタリングパイプラインを提案する。このアプローチでは、ドキュメントの一部をクラスタ化し、影響キャンペーンを反映する可能性のあるクラスタを検出し、高影響クラスタとの関連を通じて影響キャンペーンに関連するドキュメントを識別する。本手法は,文書がインフルエンスキャンペーンの一部であるかどうかを予測する際に,直接文書レベルの分類と直接文書レベルのクラスタリングアプローチの両方に勝る。本稿では,既存の事象事実予測システムを用いて文書部分を取得し,複数のクラスタリング実験を集約し,クラスタおよび文書分類の性能を向上させるなど,パイプラインを強化するための新しい手法を提案する。クラスタリングの上に文書を分類することは、影響のあるキャンペーンに関連する文書の一部を正確に抽出するだけでなく、影響のあるキャンペーンを協調的かつ全体的現象として捉えている。我々の手法は、文書からの影響キャンペーンのよりきめ細やかなキャラクタリゼーションを可能にする。 We propose a novel clustering pipeline to detect and characterize influence campaigns from documents. This approach clusters parts of document, detects clusters that likely reflect an influence campaign, and then identifies documents linked to an influence campaign via their association with the high-influence clusters. Our approach outperforms both the direct document-level classification and the direct document-level clustering approach in predicting if a document is part of an influence campaign. We propose various novel techniques to enhance our pipeline, including using an existing event factuality prediction system to obtain document parts, and aggregating multiple clustering experiments to improve the performance of both cluster and document classification. Classifying documents on the top of clustering not only accurately extracts the parts of the documents that are relevant to influence campaigns, but also capture influence campaigns as a coordinated and holistic phenomenon. Our approach makes possible more fine-grained and interpretable characterizations of influence campaigns from documents.	翻訳日:2024-02-28 18:01:39 公開日:2024-02-27
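The last step of the pipeline described above, linking documents to an influence campaign through the clusters their parts fall into, can be sketched as follows. The data layout, the function name `documents_linked_to_campaign`, and the `min_parts` threshold are hypothetical; the abstract does not specify the exact association rule.

```python
from collections import defaultdict


def documents_linked_to_campaign(part_to_doc, part_to_cluster,
                                 high_influence_clusters, min_parts=1):
    """Return documents with at least `min_parts` parts in high-influence clusters.

    The counting rule is an assumption for illustration only.
    """
    hits = defaultdict(int)
    for part, cluster in part_to_cluster.items():
        if cluster in high_influence_clusters:
            hits[part_to_doc[part]] += 1
    return {doc for doc, count in hits.items() if count >= min_parts}


# Hypothetical toy data: four document parts drawn from three documents.
part_to_doc = {"p1": "docA", "p2": "docA", "p3": "docB", "p4": "docC"}
part_to_cluster = {"p1": 0, "p2": 2, "p3": 1, "p4": 2}
flagged = documents_linked_to_campaign(
    part_to_doc, part_to_cluster, high_influence_clusters={2}
)
print(sorted(flagged))
```

Because the decision is made per part and then aggregated, the flagged parts themselves indicate which passages of a document relate to the campaign.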
# テンソルネットワークを用いた量子コンピュータにおけるオプション価格の時系列生成 Time series generation for option pricing on quantum computers using tensor network ( http://arxiv.org/abs/2402.17148v1 ) ライセンス: Link先を確認	Nozomu Kobayashi, Yoshiyuki Suimon, Koichi Miyamoto	(参考訳) 金融、特にオプション価格は、量子コンピューティングの恩恵を受ける可能性のある有望な産業分野である。オプション価格の量子アルゴリズムが提案されているが、アルゴリズム内のコストのかかる操作をより効率的に実装することが望まれており、そのうちの1つは、基礎となる資産価格の確率分布をエンコードする量子状態の作成である。特に、経路依存オプションの価格設定では、基盤となる資産価格の複数の時点における共同分布をエンコードする状態を生成する必要があります。そこで本研究では,行列積状態(mps)を時系列生成のための生成モデルとして用いる新しい手法を提案する。我々のアプローチを検証するために、ヘストンモデルを対象とし、モデル内の時系列を生成する数値実験を行う。我々は,MPSモデルがヘストンモデルで経路を生成する能力を示し,量子コンピュータ上での経路依存オプションの価格設定の可能性を強調した。 Finance, especially option pricing, is a promising industrial field that might benefit from quantum computing. While quantum algorithms for option pricing have been proposed, it is desired to devise more efficient implementations of costly operations in the algorithms, one of which is preparing a quantum state that encodes a probability distribution of the underlying asset price. In particular, in pricing a path-dependent option, we need to generate a state encoding a joint distribution of the underlying asset price at multiple time points, which is more demanding. To address these issues, we propose a novel approach using Matrix Product State (MPS) as a generative model for time series generation. To validate our approach, taking the Heston model as a target, we conduct numerical experiments to generate time series in the model. Our findings demonstrate the capability of the MPS model to generate paths in the Heston model, highlighting its potential for path-dependent option pricing on quantum computers.	翻訳日:2024-02-28 18:01:23 公開日:2024-02-27
# Metasql: 自然言語からSQLへの変換のためのジェネレーション-then-Rankフレームワーク Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation ( http://arxiv.org/abs/2402.17144v1 ) ライセンス: Link先を確認	Yuankai Fan, Zhenying He, Tonghui Ren, Can Huang, Yinan Jing, Kai Zhang, X.Sean Wang	(参考訳) データベースへの自然言語インターフェース(nlidb)は、直感的な自然言語(nl)インタラクションを通じて、非技術ユーザによるデータベースアクセスを促進する。ニューラルシークエンス・ツー・シーケンスモデルや大規模言語モデルを利用する高度なアプローチは、通常、ユニークなSQLクエリをシーケンシャルに生成するために自動回帰デコードを使用する。これらの翻訳モデルは全体的な翻訳精度を大幅に改善し、NLIDBベンチマークでは70%を超えているが、単一のSQLクエリを生成する自動回帰デコードを使用することで、サブ最適出力が得られ、誤翻訳につながる可能性がある。本稿では,既存のNLIDBに柔軟に組み込むことができ,翻訳精度を一貫した向上を図ることができる,統一型ジェネレータ-then-rankフレームワークMetasqlを提案する。 metasqlはクエリメタデータを導入し、より良いsqlクエリ候補の生成を制御し、ラーニング・トゥ・ランクアルゴリズムを使用してグローバルに最適化されたクエリを検索する。具体的には、Metasqlはまず与えられたNLクエリの意味をクエリメタデータのセットに分解し、セマンティクスの基本概念を表現します。これらのメタデータは言語制約として使用され、基盤となる翻訳モデルから候補となるSQLクエリを生成する。最後に、Metasqlは候補をランク付けし、与えられたNLクエリに最適な候補を特定する。 Metasqlを2つの公開NLIDBベンチマークで研究するために、大規模な実験が行われた。その結果,metasqlを用いて翻訳モデルの性能を効果的に向上できることがわかった。 The Natural Language Interface to Databases (NLIDB) empowers non-technical users with database access through intuitive natural language (NL) interactions. Advanced approaches, utilizing neural sequence-to-sequence models or large-scale language models, typically employ auto-regressive decoding to generate unique SQL queries sequentially. While these translation models have greatly improved the overall translation accuracy, surpassing 70% on NLIDB benchmarks, the use of auto-regressive decoding to generate single SQL queries may result in sub-optimal outputs, potentially leading to erroneous translations. In this paper, we propose Metasql, a unified generate-then-rank framework that can be flexibly incorporated with existing NLIDBs to consistently improve their translation accuracy. Metasql introduces query metadata to control the generation of better SQL query candidates and uses learning-to-rank algorithms to retrieve globally optimized queries. 
Specifically, Metasql first breaks down the meaning of the given NL query into a set of possible query metadata, representing the basic concepts of the semantics. These metadata are then used as language constraints to steer the underlying translation model toward generating a set of candidate SQL queries. Finally, Metasql ranks the candidates to identify the best matching one for the given NL query. Extensive experiments are performed to study Metasql on two public NLIDB benchmarks. The results show that the performance of the translation models can be effectively improved using Metasql.	翻訳日:2024-02-28 18:01:08 公開日:2024-02-27
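The generate-then-rank idea described above can be sketched as a small pipeline: generate one SQL candidate per metadata condition, deduplicate, then rank the candidates against the NL query. The names `Candidate`, `generate_then_rank`, `toy_generate` and `toy_score` are hypothetical stand-ins. A real system would plug in a translation model and a learned ranker, so this is a sketch under those assumptions, not Metasql's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sql: str
    metadata: frozenset  # metadata condition that steered the generation


def generate_then_rank(nl_query, metadata_sets, generate, score):
    """Generate one SQL candidate per metadata set, dedupe, and rank by score."""
    unique = {}
    for metadata in metadata_sets:
        sql = generate(nl_query, metadata)
        # Keep the first candidate for each distinct SQL string.
        unique.setdefault(sql, Candidate(sql, frozenset(metadata)))
    return sorted(unique.values(),
                  key=lambda c: score(nl_query, c.sql), reverse=True)


def toy_generate(nl_query, metadata):
    # Stand-in for a metadata-conditioned translation model.
    if "aggregate" in metadata:
        return "SELECT COUNT(*) FROM singer"
    return "SELECT name FROM singer"


def toy_score(nl_query, sql):
    # Stand-in for a learned ranker: reward agreement on counting intent.
    wants_count = "how many" in nl_query.lower()
    return 1.0 if wants_count == ("COUNT" in sql) else 0.0


ranked = generate_then_rank(
    "How many singers are there?",
    [{"project"}, {"aggregate"}, {"project", "where"}],
    toy_generate,
    toy_score,
)
print(ranked[0].sql)
```

Separating generation from ranking lets the ranker compare whole candidate queries, rather than committing to a single auto-regressive decoding path.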
# 予測を伴うエネルギー効率スケジューリング Energy-Efficient Scheduling with Predictions ( http://arxiv.org/abs/2402.17143v1 ) ライセンス: Link先を確認	Eric Balkanski and Noemie Perivier and Clifford Stein and Hao-Ting Wei	(参考訳) 現代のスケジューリングシステムの重要な目標は、電力使用量を効率的に管理することである。エネルギー効率の高いスケジューリングでは、オペレーティングシステムが機械のジョブ処理速度を制御し、エネルギー消費の最小化と、結果として得られるスケジュールのサービス品質コストの最適化という2つの目的を追求する。将来の要求に関する機械学習による予測は、過去のデータから学べることが多いため、近年の学習拡張アルゴリズムの研究は、予測を利用して性能保証の改善を目指している。特にエネルギー効率の高いスケジューリングでは、Bamasら [BamasMRS20] とAntoniadisら [antoniadis2021novel] が、期限付きエネルギー最小化問題に対する予測付きアルゴリズムを設計し、予測誤差が小さい場合に競争比を改善しつつ、予測誤差が任意に大きい場合でも最悪ケースの境界を維持した。本稿では,エネルギー効率スケジューリングの一般的な設定を考察し,対象とするエネルギー効率スケジューリング問題に対するオフラインアルゴリズムとオンラインアルゴリズムを入力として受け取る,フレキシブルな学習拡張アルゴリズムフレームワークを提案する。予測誤差が小さい場合、この枠組みは、期限付きエネルギー最小化を含む多くの異なるエネルギー効率スケジューリング問題に対して競争比を改善し、同時に予測誤差にかかわらず有界な競争比を維持する。最後に,本フレームワークが実データおよび合成データセットで性能を向上させることを実証的に示す。 An important goal of modern scheduling systems is to efficiently manage power usage. In energy-efficient scheduling, the operating system controls the speed at which a machine is processing jobs with the dual objective of minimizing energy consumption and optimizing the quality of service cost of the resulting schedule. Since machine-learned predictions about future requests can often be learned from historical data, a recent line of work on learning-augmented algorithms aims to achieve improved performance guarantees by leveraging predictions. In particular, for energy-efficient scheduling, Bamas et al. [BamasMRS20] and Antoniadis et al. [antoniadis2021novel] designed algorithms with predictions for the energy minimization with deadlines problem and achieved an improved competitive ratio when the prediction error is small while also maintaining worst-case bounds even when the prediction error is arbitrarily large. 
In this paper, we consider a general setting for energy-efficient scheduling and provide a flexible learning-augmented algorithmic framework that takes as input an offline and an online algorithm for the desired energy-efficient scheduling problem. We show that, when the prediction error is small, this framework gives improved competitive ratios for many different energy-efficient scheduling problems, including energy minimization with deadlines, while also maintaining a bounded competitive ratio regardless of the prediction error. Finally, we empirically demonstrate that this framework achieves an improved performance on real and synthetic datasets.	翻訳日:2024-02-28 18:00:41 公開日:2024-02-27
# 実世界の意思決定のための新しい言語としてのビデオ Video as the New Language for Real-World Decision Making ( http://arxiv.org/abs/2402.17139v1 ) ライセンス: Link先を確認	Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans	(参考訳) テキストデータもビデオデータもインターネット上で豊富であり、次のトークンやフレーム予測を通じて大規模な自己教師型学習をサポートする。しかし、それらは平等に活用されていない:言語モデルは現実世界に大きな影響を与え、ビデオ生成はメディアエンターテイメントに限られている。しかしビデオデータは、言語で表現するのが難しい物理的世界に関する重要な情報をキャプチャする。このギャップに対処するため,我々は,実世界の課題を解決するためにビデオ生成を拡張する機会を過小評価している。言語と同じく、ビデオはインターネットの知識を吸収し、多様なタスクを表現できる統一インターフェースとして機能する。さらに,映像生成は,言語モデルと同様に,インコンテキスト学習や計画,強化学習といった手法を用いて,プランナー,エージェント,計算エンジン,環境シミュレータとして機能することを示す。我々は、ロボット工学、自動運転、科学といった分野における大きなインパクトの機会を特定し、ビデオ生成におけるこのような高度な能力がいかに手の届く範囲内にあるかを示す最近の研究で支持されている。最後に、進捗を緩和するビデオ生成における重要な課題を特定します。これらの課題に対処することで、ビデオ生成モデルは、幅広いaiアプリケーションにおいて、言語モデルとともにユニークな価値を示すことができる。 Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had significant real-world impact, whereas video generation has remained largely limited to media entertainment. Yet video data captures important information about the physical world that is difficult to express in language. To address this gap, we discuss an under-appreciated opportunity to extend video generation to solve tasks in the real world. We observe how, akin to language, video can serve as a unified interface that can absorb internet knowledge and represent diverse tasks. Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning. We identify major impact opportunities in domains such as robotics, self-driving, and science, supported by recent work that demonstrates how such advanced capabilities in video generation are plausibly within reach. 
Lastly, we identify key challenges in video generation that mitigate progress. Addressing these challenges will enable video generation models to demonstrate unique value alongside language models in a wider array of AI applications.	翻訳日:2024-02-28 18:00:13 公開日:2024-02-27
# 機能報酬エンコーディングによる教師なしゼロショット強化学習 Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings ( http://arxiv.org/abs/2402.17135v1 ) ライセンス: Link先を確認	Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine	(参考訳) 大量の未ラベルのオフライン軌道からジェネラリストエージェントを事前訓練して、即座に新しい下流タスクにゼロショットで適応できるだろうか? 本稿では,このゼロショットRL問題に対する汎用かつスケーラブルな解として,関数型報酬符号化(FRE)を提案する。変換器をベースとした変分自動エンコーダを用いて状態回帰サンプルを符号化することで任意のタスクの関数表現を学習する。この機能的エンコーディングは、エージェントを広範囲の一般教師なし報酬関数から事前訓練するだけでなく、少数の報酬注釈サンプルが与えられた場合、新たな下流タスクをゼロショットで解決する手段も提供する。様々な無作為無防備報酬関数で訓練されたfreエージェントは、従来のゼロショットrlおよびオフラインrl法を上回って、様々なシミュレーションロボットベンチマークで新しいタスクを一般化できることを実証的に示す。このプロジェクトのコードは以下の通りである。 Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream tasks in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods. Code for this project is provided at: https://github.com/kvfrans/fre	翻訳日:2024-02-28 17:59:54 公開日:2024-02-27
# シーンテキストスポッティングにおける言語前処理の効率化 Efficiently Leveraging Linguistic Priors for Scene Text Spotting ( http://arxiv.org/abs/2402.17134v1 ) ライセンス: Link先を確認	Nguyen Nguyen, Yapeng Tian, Chenliang Xu	(参考訳) 言語知識を組み込むことでシーンのテキスト認識が向上するが、テキスト検出と認識を伴うシーンのテキストスポッティングにも同じことが当てはまるかどうか疑問である。本稿では,大規模テキストコーパスからの言語知識を活用し,自己回帰的なテキストスポッティングと認識モデルで使用される従来の1ホットエンコーディングを置き換える手法を提案する。これにより、モデルが同じ単語の文字間の関係を捉えることができる。さらに,シーンテキストデータセットに適合するテキスト分布を生成する手法を導入し,ドメイン内微調整の必要性をなくした。その結果、新たに作成されたテキスト配信は、純粋なワンホット符号化よりも情報的であり、スポッティングと認識性能が向上する。本手法は単純かつ効率的であり,既存の自己回帰型アプローチと容易に統合できる。提案手法は,認識精度を向上させるだけでなく,より正確な単語のローカライズを可能にする。最先端のシーンテキストスポッティングと認識パイプラインの両方を大幅に改善し、いくつかのベンチマークで最先端の結果を達成する。 Incorporating linguistic knowledge can improve scene text recognition, but it is questionable whether the same holds for scene text spotting, which typically involves text detection and recognition. This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models. This allows the model to capture the relationship between characters in the same word. Additionally, we introduce a technique to generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning. As a result, the newly created text distributions are more informative than pure one-hot encoding, leading to improved spotting and recognition performance. Our method is simple and efficient, and it can easily be integrated into existing auto-regressive-based approaches. Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words. It significantly improves both state-of-the-art scene text spotting and recognition pipelines, achieving state-of-the-art results on several benchmarks.	翻訳日:2024-02-28 17:59:35 公開日:2024-02-27
# スケーリングがllmの微調整に合致する場合:データ,モデル,微調整方法の影響 When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method ( http://arxiv.org/abs/2402.17193v1 ) ライセンス: Link先を確認	Biao Zhang, Zhongtao Liu, Colin Cherry, Orhan Firat	(参考訳) 大規模言語モデル(LLM)は、ダウンストリームアプリケーションにその機能を開放するためにファインタニングを採用することが多いが、異なるファインタニング手法の帰納的バイアス(特にスケーリング特性)に対する理解はまだ限られている。このギャップを埋めるために,llmモデルサイズ,事前トレーニングデータサイズ,新しい微調整パラメータサイズ,微調整データサイズなど,スケーリング要因が微調整性能に与える影響について,系統的な実験を行った。我々は,2種類の微調整-フルモデルチューニング (fmt) とパラメータ効率的なチューニング (pet, プロンプトチューニングとlora) について検討し,llmモデルサイズが微調整データサイズを大幅に上回るデータ制限条件下でのスケーリング動作について検討した。 1Bから16Bまでの2組の事前訓練されたバイリンガルLLMと、バイリンガル機械翻訳とマルチリンガル要約ベンチマークの実験から、我々はそのことが分かる。 1) llmの微調整は,データサイズと各スケーリング因子間の電力ベース乗算ジョイントスケーリング則に従う。 2) LLM の微調整は,データスケーリングの事前訓練よりも LLM モデルスケーリングの方が有効であり,PET パラメータスケーリングは一般的には効果がない。 3) 最適な微調整法は, タスクと微調整の精度が高い。 LLMファインタニング手法の理解,選択,開発に光を当てることが期待できる。 While large language models (LLMs) often adopt finetuning to unlock their capabilities for downstream applications, our understanding on the inductive biases (especially the scaling properties) of different finetuning methods is still limited. To fill this gap, we conduct systematic experiments studying whether and how different scaling factors, including LLM model size, pretraining data size, new finetuning parameter size and finetuning data size, affect the finetuning performance. We consider two types of finetuning -- full-model tuning (FMT) and parameter efficient tuning (PET, including prompt tuning and LoRA), and explore their scaling behaviors in the data-limited regime where the LLM model size substantially outweighs the finetuning data size. 
Based on two sets of pretrained bilingual LLMs from 1B to 16B and experiments on bilingual machine translation and multilingual summarization benchmarks, we find that 1) LLM finetuning follows a power-based multiplicative joint scaling law between finetuning data size and each other scaling factor; 2) LLM finetuning benefits more from LLM model scaling than pretraining data scaling, and PET parameter scaling is generally ineffective; and 3) the optimal finetuning method is highly task- and finetuning data-dependent. We hope our findings could shed light on understanding, selecting and developing LLM finetuning methods.	翻訳日:2024-02-28 17:53:44 公開日:2024-02-27
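To make the phrase "power-based multiplicative joint scaling law" concrete, the sketch below assumes a loss of the form A / (X^alpha * D_f^beta) + E, where X is one scaling factor (for example model size) and D_f is the finetuning data size. The functional form, the helper names, and every constant are illustrative assumptions; none of the values come from the paper.

```python
import math


def joint_scaling_loss(x, d_f, A=100.0, alpha=0.2, beta=0.1, E=1.0):
    """Hypothetical joint law: loss = A / (x**alpha * d_f**beta) + E."""
    return A / (x ** alpha * d_f ** beta) + E


def fit_exponent(xs, losses, E):
    """Estimate a power-law exponent by least squares in log-log space.

    Assumes the irreducible loss E is known and d_f is held fixed.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(loss - E) for loss in losses]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
             / sum((a - mean_x) ** 2 for a in log_x))
    return -slope  # the loss falls as x grows, so the exponent is -slope


# Synthetic check: recover alpha from losses generated by the assumed law.
model_sizes = [1e9, 2e9, 4e9, 8e9, 16e9]
losses = [joint_scaling_loss(x, d_f=1e5) for x in model_sizes]
alpha_hat = fit_exponent(model_sizes, losses, E=1.0)
print(round(alpha_hat, 4))
```

Under a multiplicative law, the relative gain from scaling X is the same at every finetuning data size, which is what distinguishes it from an additive combination of two power laws.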
# 多様なバイオメカニクスがマーカーレスモーションキャプチャーの機会を解き放つ Differentiable Biomechanics Unlocks Opportunities for Markerless Motion Capture ( http://arxiv.org/abs/2402.17192v1 ) ライセンス: Link先を確認	R. James Cotton	(参考訳) 近年、gpu上で高速化可能な機械学習パイプライン用に設計された微分可能な物理シミュレータが開発されている。これらは生体力学モデルをシミュレートできるが、生体力学の研究やマーカーレスモーションキャプチャーには利用されていない。これらのシミュレータは,個人の擬人化計測に適合するようにモデルをスケーリングすることを含む,マーカーレスモーションキャプチャデータに逆キネマティックスを適合させることができる。これは運動軌跡の暗黙的な表現でエンドツーエンドに行われ、前方運動モデルによって伝播され、画像に再投影された3Dマーカーからの誤差を最小限に抑える。ディファレンシャルオプティマイザは、トラジェクトリ最適化中にバンドル調整を加えて外部カメラパラメータを洗練させたり、メタ最適化して、複数の参加者のトラジェクトリと共同でベースモデルを改善するといった他の機会をもたらす。提案手法は, 前手法によるマーカーレスモーションキャプチャーによる再投影誤差を改善し, 制御・臨床用歩行路と比較して正確な空間ステップパラメータを生成する。 Recent developments have created differentiable physics simulators designed for machine learning pipelines that can be accelerated on a GPU. While these can simulate biomechanical models, these opportunities have not been exploited for biomechanics research or markerless motion capture. We show that these simulators can be used to fit inverse kinematics to markerless motion capture data, including scaling the model to fit the anthropomorphic measurements of an individual. This is performed end-to-end with an implicit representation of the movement trajectory, which is propagated through the forward kinematic model to minimize the error from the 3D markers reprojected into the images. The differential optimizer yields other opportunities, such as adding bundle adjustment during trajectory optimization to refine the extrinsic camera parameters or meta-optimization to improve the base model jointly over trajectories from multiple participants. This approach improves the reprojection error from markerless motion capture over prior methods and produces accurate spatial step parameters compared to an instrumented walkway for control and clinical populations.	翻訳日:2024-02-28 17:53:19 公開日:2024-02-27
# AI駆動匿名:機械学習を活用しながら個人情報のプライバシーを保護する AI-Driven Anonymization: Protecting Personal Data Privacy While Leveraging Machine Learning ( http://arxiv.org/abs/2402.17191v1 ) ライセンス: Link先を確認	Le Yang, Miao Tian, Duan Xin, Qishuo Cheng, Jiajian Zheng	(参考訳) 人工知能の開発は人々の生活を大きく変えた。しかし、プライバシーとセキュリティに重大な脅威をもたらしており、個人情報がオンラインで暴露されたり、犯罪や盗難の報告があったりしている。その結果、機械学習アルゴリズムによる個人情報の知的保護を実現する必要性が最重要課題となっている。人工知能は高度なアルゴリズムと技術を活用し、個人情報を効果的に暗号化し匿名化する。本稿では、個人データのプライバシー保護と匿名化の促進を研究の中心目的とする。機械学習の差分プライバシー保護アルゴリズムを使用して、個人データのプライバシ保護と検出を実現する。この論文は、プライバシと個人データ保護に関連する機械学習の既存の課題にも対処し、改善提案を提供し、データセットに影響する要因を分析し、タイムリーな個人データプライバシ検出と保護を可能にする。 The development of artificial intelligence has significantly transformed people's lives. However, it has also posed a significant threat to privacy and security, with numerous instances of personal information being exposed online and reports of criminal attacks and theft. Consequently, the need to achieve intelligent protection of personal information through machine learning algorithms has become a paramount concern. Artificial intelligence leverages advanced algorithms and technologies to effectively encrypt and anonymize personal data, enabling valuable data analysis and utilization while safeguarding privacy. This paper focuses on personal data privacy protection and the promotion of anonymity as its core research objectives. It achieves personal data privacy protection and detection through the use of machine learning's differential privacy protection algorithm. The paper also addresses existing challenges in machine learning related to privacy and personal data protection, offers improvement suggestions, and analyzes factors impacting datasets to enable timely personal data privacy detection and protection.	翻訳日:2024-02-28 17:53:02 公開日:2024-02-27
# An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement ( http://arxiv.org/abs/2402.17189v1 ) License: see linked page	Tzu-Ting Yang, Hsin-Wei Wang, Yi-Cheng Wang, Chi-Han Lin, and Berlin Chen	With the massive developments of end-to-end (E2E) neural networks, recent years have witnessed unprecedented breakthroughs in automatic speech recognition (ASR). However, the code-switching phenomenon remains a major obstacle that hinders ASR from perfection, as the lack of labeled data and the variations between languages often lead to degradation of ASR performance. In this paper, we focus exclusively on improving the acoustic encoder of E2E ASR to tackle the challenge caused by the code-switching phenomenon. Our main contributions are threefold. First, we introduce a novel disentanglement loss that enables the lower layers of the encoder to capture inter-lingual acoustic information while mitigating linguistic confusion at the higher layers of the encoder. Second, through comprehensive experiments, we verify that our proposed method outperforms the prior-art methods using pretrained dual-encoders, while having access only to the code-switching corpus and using half as many parameters. Third, the apparent differentiation of the encoders' output features also corroborates the complementarity between the disentanglement loss and the mixture-of-experts (MoE) architecture.	Translated: 2024-02-28 17:52:47 Published: 2024-02-27
# PE-MVCNet: Multi-view and Cross-modal Fusion Network for Pulmonary Embolism Prediction ( http://arxiv.org/abs/2402.17187v1 ) License: see linked page	Zhaoxin Guo, Zhipeng Wang, Ruiquan Ge, Jianxun Yu, Feiwei Qin, Yuan Tian, Yuqing Peng, Yonghong Li, Changmiao Wang	The early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates. Both image-based and non-image-based features are of utmost importance in medical classification tasks. In a clinical setting, physicians tend to rely on the contextual information provided by Electronic Medical Records (EMR) to interpret medical imaging. However, very few models effectively integrate clinical information with imaging data. To address this shortcoming, we propose a multimodal fusion methodology, termed PE-MVCNet, which capitalizes on Computed Tomography Pulmonary Angiography imaging and EMR data. This method comprises the Image-only module with an integrated multi-view block, the EMR-only module, and the Cross-modal Attention Fusion (CMAF) module. These modules cooperate to extract comprehensive features that subsequently generate predictions for PE. We conducted experiments using the publicly accessible Stanford University Medical Center dataset, achieving an AUROC of 94.1%, an accuracy rate of 90.2%, and an F1 score of 90.6%. Our proposed model outperforms existing methodologies, corroborating that our multimodal fusion model excels compared to models that use a single data modality.	Translated: 2024-02-28 17:52:26 Published: 2024-02-27
# Inpainting Computational Fluid Dynamics with Deep Learning ( http://arxiv.org/abs/2402.17185v1 ) License: see linked page	Dule Shu, Wilson Zhen, Zijie Li, Amir Barati Farimani	Fluid data completion is a research problem with high potential benefit for both experimental and computational fluid dynamics. An effective fluid data completion method reduces the required number of sensors in a fluid dynamics experiment, and allows a coarser and more adaptive mesh for a Computational Fluid Dynamics (CFD) simulation. However, the ill-posed nature of the fluid data completion problem makes it prohibitively difficult to obtain a theoretical solution and presents high numerical uncertainty and instability for a data-driven approach (e.g., a neural network model). To address these challenges, we leverage recent advancements in computer vision, employing the vector quantization technique to map both complete and incomplete fluid data spaces onto discrete-valued lower-dimensional representations via a two-stage learning procedure. We demonstrate the effectiveness of our approach on Kolmogorov flow data (Reynolds number: 1000) occluded by masks of different sizes and arrangements. Experimental results show that our proposed model consistently outperforms benchmark models under different occlusion settings in terms of point-wise reconstruction accuracy as well as turbulent energy spectrum and vorticity distribution.	Translated: 2024-02-28 17:52:07 Published: 2024-02-27
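The vector quantization step mentioned above can be pictured as a nearest-neighbour lookup into a codebook. The sketch below is a generic illustration under that assumption, not the paper's two-stage model; `quantize` and the toy codebook are hypothetical.

```python
def quantize(vector, codebook):
    """Return (index, code) for the codebook entry closest to `vector`
    in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


if __name__ == "__main__":
    # A toy codebook of three 2-D codes; a learned codebook would be much larger.
    codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
    # Each continuous feature vector is replaced by a discrete code index.
    print(quantize([0.9, 1.2], codebook))
```

The discrete index is what makes the representation "discrete-valued": downstream stages can work with code indices rather than raw continuous fields.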
# Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models ( http://arxiv.org/abs/2402.17184v1 ) License: see linked page	Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno	The accuracy of end-to-end (E2E) automatic speech recognition (ASR) models continues to improve as they are scaled to larger sizes, with some now reaching billions of parameters. Widespread deployment and adoption of these models, however, requires computationally efficient strategies for decoding. In the present work, we study one such strategy: applying multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames. While similar techniques have been investigated in previous work, we achieve dramatically more reduction than has previously been demonstrated through the use of multiple funnel reduction layers. Through ablations, we study the impact of various architectural choices in the encoder to identify the most effective strategies. We demonstrate that we can generate one encoder output frame for every 2.56 sec of input speech, without significantly affecting word error rate on a large-scale voice search task, while improving encoder and decoder latencies by 48% and 92% respectively, relative to a strong but computationally expensive baseline.	Translated: 2024-02-28 17:51:49 Published: 2024-02-27
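To make the 2.56-second figure above concrete, the sketch below stacks stride-2 reduction layers. Only the 2.56 s result comes from the abstract: the 40 ms starting frame period, the six layers, and the use of plain averaging in place of the paper's funnel layers are all assumptions chosen for illustration.

```python
def reduce_frames(frames, factor=2):
    """Average each group of `factor` consecutive frames.

    Plain average pooling is an assumed stand-in for a learned reduction layer.
    """
    reduced = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]
        reduced.append([sum(column) / len(group) for column in zip(*group)])
    return reduced


def output_frame_period_ms(base_period_ms, num_layers, factor=2):
    """Frame period after stacking `num_layers` reduction layers."""
    return base_period_ms * factor ** num_layers


if __name__ == "__main__":
    frames = [[float(i)] for i in range(64)]  # 64 one-dimensional frames
    for _ in range(6):                        # six assumed stride-2 layers
        frames = reduce_frames(frames)
    print(len(frames))                        # one frame remains
    print(output_frame_period_ms(40, 6))      # 40 ms * 2**6, in milliseconds
```

Under these assumed numbers, each 2x layer halves the sequence length the decoder must attend over, which is where the latency savings would come from.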
# QW-Search/Zeta Correspondence ( http://arxiv.org/abs/2402.17183v1 ) License: see linked page	Taisuke Hosaka, Norio Konno, Etsuo Segawa	We consider the connection between this zeta function and quantum search via quantum walk. First, we give an explicit expression of the zeta function on the one-dimensional torus in the general case of the number and position of marked vertices. Moreover, we deal with the two special cases of the position of the marked vertices on the $d$-dimensional torus $(d \ge 2)$. Additionally, we treat the property of the zeta function by using the Mahler measure. Our results show the relationship between the zeta function and quantum search algorithms for the first time.	Translated: 2024-02-28 17:51:27 Published: 2024-02-27
# On the variety of X-states ( http://arxiv.org/abs/2402.17181v1 ) License: see linked page	Luca Candelori, Vladimir Y. Chernyak, and John R. Klein	We introduce the notion of an X-state on $n$-qubits. After taking the Zariski closure of the set of X-states in the space of all mixed states, we obtain a complex algebraic variety $\scr X$ that is equipped with the action of the Lie group of local symmetries $G$. We show that the field of $G$-invariant rational functions on $\scr X$ is purely transcendental over the complex numbers of degree $2^{2n-1}-n-1$.	Translated: 2024-02-28 17:51:21 Published: 2024-02-27
# Dual-Space Optimization: Improved Molecule Sequence Design by Latent Prompt Transformer ( http://arxiv.org/abs/2402.17179v1 ) License: see linked page	Deqian Kong, Yuhao Huang, Jianwen Xie, Edouardo Honig, Ming Xu, Shuanghong Xue, Pei Lin, Sanping Zhou, Sheng Zhong, Nanning Zheng, Ying Nian Wu	Designing molecules with desirable properties, such as drug-likeness and high binding affinities towards protein targets, is a challenging problem. In this paper, we propose the Dual-Space Optimization (DSO) method that integrates latent space sampling and data space selection to solve this problem. DSO iteratively updates a latent space generative model and a synthetic dataset in an optimization process that gradually shifts the generative model and the synthetic data towards regions of desired property values. Our generative model takes the form of a Latent Prompt Transformer (LPT) where the latent vector serves as the prompt of a causal transformer. Our extensive experiments demonstrate the effectiveness of the proposed method, which sets new performance benchmarks across single-objective, multi-objective and constrained molecule design tasks.	Translated: 2024-02-28 17:51:13 Published: 2024-02-27
# Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models ( http://arxiv.org/abs/2402.17177v1 ) License: see linked page	Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, and Lichao Sun	Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and shows potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model's background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora's development and investigate the underlying technologies used to build this "world simulator". Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation.	Translated: 2024-02-28 17:50:59 Published: 2024-02-27
# DeepDRK: Deep Dependency Regularized Knockoff for Feature Selection ( http://arxiv.org/abs/2402.17176v1 ) License: see linked page	Hongyu Shen and Yici Yan and Zhizhen Zhao	Model-X knockoff, among various feature selection methods, has received much attention recently due to its guarantee on false discovery rate (FDR) control. Following its introduction in parametric design, knockoff has been extended to handle arbitrary data distributions using deep learning-based generative modeling. However, we observed that current implementations of the deep Model-X knockoff framework exhibit limitations. Notably, the "swap property" that knockoffs require frequently encounters challenges at the sample level, leading to diminished selection power. To overcome this, we develop "Deep Dependency Regularized Knockoff (DeepDRK)", a distribution-free deep learning method that strikes a balance between FDR and power. In DeepDRK, a generative model grounded in a transformer architecture is introduced to better achieve the "swap property". Novel efficient regularization techniques are also proposed to reach higher power. Our model outperforms other benchmarks in synthetic, semi-synthetic, and real-world data, especially when the sample size is small and the data distribution is complex.	Translated: 2024-02-28 17:50:41 Published: 2024-02-27
# Lane2Seq: Towards Unified Lane Detection via Sequence Generation ( http://arxiv.org/abs/2402.17172v1 ) License: see linked page	Kunyang Zhou	In this paper, we present a novel sequence generation-based framework for lane detection, called Lane2Seq. It unifies various lane detection formats by casting lane detection as a sequence generation task. This is different from previous lane detection methods, which depend on well-designed task-specific head networks and corresponding loss functions. Lane2Seq only adopts a plain transformer-based encoder-decoder architecture with a simple cross-entropy loss. Additionally, we propose a new multi-format model tuning based on reinforcement learning to incorporate the task-specific knowledge into Lane2Seq. Experimental results demonstrate that such a simple sequence generation paradigm not only unifies lane detection but also achieves competitive performance on benchmarks. For example, Lane2Seq achieves F1 scores of 97.95% and 97.42% on the TuSimple and LLAMAS datasets, establishing new state-of-the-art results on both benchmarks.	Translated: 2024-02-28 17:50:24 Published: 2024-02-27
# LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment ( http://arxiv.org/abs/2402.17171v1 ) License: see linked page	Yiming Ren, Xiao Han, Chengfeng Zhao, Jingya Wang, Lan Xu, Jingyi Yu, Yuexin Ma	For human-centric large-scale scenes, fine-grained modeling of 3D human global pose and shape is significant for scene understanding and can benefit many real-world applications. In this paper, we present LiveHPS, a novel single-LiDAR-based approach for scene-level human pose and shape estimation without any limitation of light conditions and wearable devices. In particular, we design a distillation mechanism to mitigate the distribution-varying effect of LiDAR point clouds and exploit the temporal-spatial geometric and dynamic information existing in consecutive frames to handle occlusion and noise disturbance. LiveHPS, with its efficient configuration and high-quality output, is well-suited for real-world applications. Moreover, we propose a huge human motion dataset, named FreeMotion, which is collected in various scenarios with diverse human poses, shapes and translations. It consists of multi-modal and multi-view acquisition data from calibrated and synchronized LiDARs, cameras, and IMUs. Extensive experiments on our new dataset and other public datasets demonstrate the SOTA performance and robustness of our approach. We will release our code and dataset soon.	Translated: 2024-02-28 17:50:08 Published: 2024-02-27
# Deep Umbra: A Generative Approach for Sunlight Access Computation in Urban Spaces ( http://arxiv.org/abs/2402.17169v1 ) License: see linked page	Kazi Shahrukh Omar, Gustavo Moreira, Daniel Hodczak, Maryam Hosseini, Nicola Colaninno, Marcos Lage, Fabio Miranda	Sunlight and shadow play critical roles in how urban spaces are utilized, thrive, and grow. While access to sunlight is essential to the success of urban environments, shadows can provide shaded places to stay during the hot seasons, mitigate the heat island effect, and increase pedestrian comfort levels. Properly quantifying sunlight access and shadows in large urban environments is key in tackling some of the important challenges facing cities today. In this paper, we propose Deep Umbra, a novel computational framework that enables the quantification of sunlight access and shadows at a global scale. Our framework is based on a conditional generative adversarial network that considers the physical form of cities to compute high-resolution spatial information of accumulated sunlight access for the different seasons of the year. We use data from seven different cities to train our model, and show, through an extensive set of experiments, its low overall RMSE (below 0.1) as well as its extensibility to cities that were not part of the training set. Additionally, we contribute a set of case studies and a comprehensive dataset with sunlight access information for more than 100 cities across six continents of the world. Deep Umbra is available at https://urbantk.org/shadows.	Translated: 2024-02-28 17:49:52 Published: 2024-02-27
# Efficient Backpropagation with Variance-Controlled Adaptive Sampling ( http://arxiv.org/abs/2402.17227v1 ) License: see linked page	Ziteng Wang, Jianfei Chen, Jun Zhu	Sampling-based algorithms, which eliminate "unimportant" computations during forward and/or back propagation (BP), offer potential solutions to accelerate neural network training. However, since sampling introduces approximations to training, such algorithms may not consistently maintain accuracy across various tasks. In this work, we introduce a variance-controlled adaptive sampling (VCAS) method designed to accelerate BP. VCAS computes an unbiased stochastic gradient with fine-grained layerwise importance sampling in the data dimension for activation gradient calculation and leverage score sampling in the token dimension for weight gradient calculation. To preserve accuracy, we control the additional variance by learning the sample ratio jointly with model parameters during training. We assessed VCAS on multiple fine-tuning and pre-training tasks in both vision and natural language domains. On all the tasks, VCAS can preserve the original training loss trajectory and validation accuracy with up to a 73.87% FLOPs reduction of BP and a 49.58% FLOPs reduction of the whole training process. The implementation is available at https://github.com/thu-ml/VCAS .	Translated: 2024-02-28 17:45:01 Published: 2024-02-27
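The phrase "unbiased stochastic gradient" above rests on a standard trick: drop entries at random, then rescale the survivors by the inverse of their keep probability. The sketch below shows only that generic idea. It is not the VCAS implementation (no layerwise scheme, no learned sample ratio), and both function names are hypothetical.

```python
import random


def norm_proportional_probs(values, keep_ratio):
    """Keep probabilities proportional to |value|, clipped to at most 1.

    Before clipping, the probabilities sum to keep_ratio * len(values).
    """
    total = sum(abs(v) for v in values)
    if total == 0:
        return [0.0 for _ in values]
    budget = keep_ratio * len(values)
    return [min(1.0, budget * abs(v) / total) for v in values]


def sample_and_rescale(values, keep_probs, rng):
    """Keep values[i] with probability keep_probs[i] and divide kept
    entries by that probability, so each entry's expectation is unchanged."""
    return [v / p if rng.random() < p else 0.0 for v, p in zip(values, keep_probs)]


if __name__ == "__main__":
    grads = [1.0, -2.0, 0.5, 0.0]
    probs = norm_proportional_probs(grads, keep_ratio=0.5)
    print(probs)
    print(sample_and_rescale(grads, probs, random.Random(0)))
```

Rescaling keeps the estimate unbiased but adds variance, which is the quantity the abstract says VCAS controls during training.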
# Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models ( http://arxiv.org/abs/2402.17226v1 ) License: see linked page	Xiaolong Wang, Yile Wang, Yuanchi Zhang, Fuwen Luo, Peng Li, Maosong Sun, Yang Liu	Large Language Models (LLMs) have achieved remarkable performance in objective tasks such as open-domain question answering and mathematical reasoning, which can often be solved through recalling learned factual knowledge or chain-of-thought style reasoning. However, we find that the performance of LLMs in subjective tasks, such as metaphor recognition and dark humor detection, is still unsatisfactory. Compared to objective tasks, subjective tasks focus more on interpretation or emotional response rather than a universally accepted reasoning pathway. Based on the characteristics of the tasks and the strong dialogue-generation capabilities of LLMs, we propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation. The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales, thereby offering potentially useful knowledge behind dialogues for giving the final answers. We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks. Experimental results show that RiC can yield significant improvement compared with various baselines.	Translated: 2024-02-28 17:44:42 Published: 2024-02-27
# Viblio: Introducing Credibility Signals and Citations to Video-Sharing Platforms ( http://arxiv.org/abs/2402.17218v1 ) License: see linked page	Emelia Hughes, Renee Wang, Prerna Juneja, Tony Li, Tanu Mitra, Amy Zhang	As more users turn to video-sharing platforms like YouTube as an information source, they may consume misinformation despite their best efforts. In this work, we investigate ways that users can better assess the credibility of videos by first exploring how users currently determine credibility using existing signals on platforms and then by introducing and evaluating new credibility-based signals. We conducted 12 contextual inquiry interviews with YouTube users, determining that participants used a combination of existing signals, such as the channel name, the production quality, and prior knowledge, to evaluate credibility, yet sometimes stumbled in their efforts to do so. We then developed Viblio, a prototype system that enables YouTube users to view and add citations and related information while watching a video based on our participants' needs. From an evaluation with 12 people, all participants found Viblio to be intuitive and useful in the process of evaluating a video's credibility and could see themselves using Viblio in the future.	Translated: 2024-02-28 17:44:22 Published: 2024-02-27
# Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning ( http://arxiv.org/abs/2402.17217v1 ) License: see linked page	Zijian Guo, Weichao Zhou, Wenchao Li	Offline safe reinforcement learning (RL) aims to train a constraint satisfaction policy from a fixed dataset. Current state-of-the-art approaches are based on supervised learning with a conditioned policy. However, these approaches fall short in real-world applications that involve complex tasks with rich temporal and logical structures. In this paper, we propose temporal logic Specification-conditioned Decision Transformer (SDT), a novel framework that harnesses the expressive power of signal temporal logic (STL) to specify complex temporal rules that an agent should follow and the sequential modeling capability of Decision Transformer (DT). Empirical evaluations on the DSRL benchmarks demonstrate the better capacity of SDT in learning safe and high-reward policies compared with existing approaches. In addition, SDT shows good alignment with respect to different desired degrees of satisfaction of the STL specification that it is conditioned on.	Translated: 2024-02-28 17:44:05 Published: 2024-02-27
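The "degrees of satisfaction" of an STL specification mentioned above are usually measured by a robustness value: positive when the trace satisfies the formula, negative when it violates it. The sketch below shows the standard quantitative semantics for two simple temporal operators on a discrete trace. It is a generic illustration, not SDT's code, and the function names are hypothetical.

```python
def always_below(signal, limit):
    """Robustness of 'always (x < limit)': the worst-case margin over the trace."""
    return min(limit - x for x in signal)


def eventually_above(signal, goal):
    """Robustness of 'eventually (x > goal)': the best-case margin over the trace."""
    return max(x - goal for x in signal)


def conjunction(*robustness_values):
    """Robustness of a conjunction is the minimum of its parts."""
    return min(robustness_values)


if __name__ == "__main__":
    speed = [1.0, 2.0, 3.0]
    progress = [0.0, 4.0, 1.0]
    # "Speed always stays below 5 AND progress eventually exceeds 3."
    rho = conjunction(always_below(speed, 5.0), eventually_above(progress, 3.0))
    print(rho)  # positive means the trace satisfies the specification
```

A scalar like `rho` is the kind of quantity a policy could be conditioned on to request stricter or looser satisfaction.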
# Application of Machine Learning Optimization in Cloud Computing Resource Scheduling and Management ( http://arxiv.org/abs/2402.17216v1 ) License: see linked page	Yifan Zhang, Bo Liu, Yulu Gong, Jiaxin Huang, Jingyu Xu, Weixiang Wan	In recent years, cloud computing has been widely used. Cloud computing centralizes computing resources: users access these resources to run their computations, and the cloud computing center returns the results to the user. Cloud computing serves not only individual users but also enterprise users. By purchasing a cloud server, users do not have to buy a large number of computers, which saves computing costs. According to a report by China Economic News Network, the scale of cloud computing in China has reached 209.1 billion yuan. At present, the more mature cloud service providers in China include Ali Cloud, Baidu Cloud, and Huawei Cloud. This paper therefore proposes an innovative approach to solve complex problems in cloud computing resource scheduling and management using machine learning optimization techniques. Through an in-depth study of challenges such as low resource utilization and unbalanced load in the cloud environment, this study proposes a comprehensive solution, including optimization methods such as deep learning and genetic algorithms, to improve system performance and efficiency, and thus bring new breakthroughs and progress in the field of cloud computing resource management. Rational allocation of resources plays a crucial role in cloud computing. In the resource allocation of cloud computing, the cloud computing center has limited cloud resources, and users arrive in sequence. Each user requests the cloud computing center to use a certain number of cloud resources at a specific time.	Translated: 2024-02-28 17:43:49 Published: 2024-02-27
# Multidimensional unstructured sparse recovery via eigenmatrix ( http://arxiv.org/abs/2402.17215v1 ) License: see linked page	Lexing Ying	This note considers the multidimensional unstructured sparse recovery problems. Examples include Fourier inversion and sparse deconvolution. The eigenmatrix is a data-driven construction with desired approximate eigenvalues and eigenvectors proposed for the one-dimensional problems. This note extends the eigenmatrix approach to multidimensional problems. Numerical results are provided to demonstrate the performance of the proposed method.	Translated: 2024-02-28 17:43:28 Published: 2024-02-27
# CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization ( http://arxiv.org/abs/2402.17214v1 ) License: see linked page	Hao-Yang Peng, Jia-Peng Zhang, Meng-Hao Guo, Yan-Pei Cao, Shi-Min Hu	In the field of digital content creation, generating high-quality 3D characters from single images is challenging, especially given the complexities of various body poses and the issues of self-occlusion and pose ambiguity. In this paper, we present CharacterGen, a framework developed to efficiently generate 3D characters. CharacterGen introduces a streamlined generation pipeline along with an image-conditioned multi-view diffusion model. This model effectively calibrates input poses to a canonical form while retaining key attributes of the input image, thereby addressing the challenges posed by diverse poses. A transformer-based, generalizable sparse-view reconstruction model is the other core component of our approach, facilitating the creation of detailed 3D models from multi-view images. We also adopt a texture-back-projection strategy to produce high-quality texture maps. Additionally, we have curated a dataset of anime characters, rendered in multiple poses and views, to train and evaluate our model. Our approach has been thoroughly evaluated through quantitative and qualitative experiments, showing its proficiency in generating 3D characters with high-quality shapes and textures, ready for downstream applications such as rigging and animation.	Translated: 2024-02-28 17:43:23 Published: 2024-02-27
# VCD: ビジュアルコモンセンス発見のための知識ベース VCD: Knowledge Base Guided Visual Commonsense Discovery in Images ( http://arxiv.org/abs/2402.17213v1 ) ライセンス: Link先を確認	Xiangqing Shen, Yurun Song, Siwei Wu and Rui Xia	(参考訳) ビジュアルコモンセンスは、視覚データ内のオブジェクトの特性、関係、行動に関する知識を含んでいる。視覚コモンセンスの発見は、より包括的でより豊かな画像の理解を提供し、コンピュータビジョンシステムの推論と意思決定能力を高めることができる。しかし、既存の視覚コモンセンス発見研究で定義された視覚コモンセンスは粗く、不完全である。本研究では,自然言語処理におけるコモンセンス知識ベース概念ネットから着想を得て,視覚コモンセンスのタイプを体系的に定義する。これに基づいて、画像内の異なるオブジェクトに含まれる異なる種類の細かなコモンセンスを抽出することを目的とした、Visual Commonsense Discovery (VCD)という新しいタスクを導入する。そこで我々は,Visual GenomeとConceptNetからVCD用のデータセット(VCDD)を構築し,10万以上の画像と1400万のオブジェクト・コモンセンスのペアを特徴とする。さらに、視覚言語モデルと命令調律を統合してVCDに取り組む生成モデル(VCDM)を提案する。自動的および人的評価は、VCDにおけるVCDMの熟練度を示し、特に暗黙のコモンセンス発見においてGPT-4Vを上回っている。 VCDの価値は、視覚的常識評価と視覚的質問応答を含む2つの下流タスクに適用することでさらに実証される。データとコードはgithubから入手できる。 Visual commonsense contains knowledge about object properties, relationships, and behaviors in visual data. Discovering visual commonsense can provide a more comprehensive and richer understanding of images, and enhance the reasoning and decision-making capabilities of computer vision systems. However, the visual commonsense defined in existing visual commonsense discovery studies is coarse-grained and incomplete. In this work, we draw inspiration from a commonsense knowledge base ConceptNet in natural language processing, and systematically define the types of visual commonsense. Based on this, we introduce a new task, Visual Commonsense Discovery (VCD), aiming to extract fine-grained commonsense of different types contained within different objects in the image. We accordingly construct a dataset (VCDD) from Visual Genome and ConceptNet for VCD, featuring over 100,000 images and 14 million object-commonsense pairs. We furthermore propose a generative model (VCDM) that integrates a vision-language model with instruction tuning to tackle VCD. 
Automatic and human evaluations demonstrate VCDM's proficiency in VCD, particularly outperforming GPT-4V in implicit commonsense discovery. The value of VCD is further demonstrated by its application to two downstream tasks, including visual commonsense evaluation and visual question answering. The data and code will be made available on GitHub.	翻訳日:2024-02-28 17:43:02 公開日:2024-02-27
# 精製・統一ステレオグラフィネットワーク Purified and Unified Steganographic Network ( http://arxiv.org/abs/2402.17210v1 ) ライセンス: Link先を確認	Guobiao Li, Sheng Li, Zicong Luo, Zhenxing Qian, Xinpeng Zhang	(参考訳) ステガノグラフィー(英: Steganography)とは、秘密データを秘密メディアに隠して隠蔽通信を行う技術である。近年,深層ニューラルネットワーク(dnn)ベースのステガノグラフィースキームが,秘密の埋め込みと回収のためにステガノグラフィーネットワークを訓練するために提案されている。手作りのステガノグラフィーツールと比較すると、ステガノグラフィーネットワークはサイズが大きくなる傾向にある。シークレット通信を容易にするために、これらのネットワークを送信側と受信側に知覚的かつ効果的に送信する方法に関する懸念を提起する。この問題に対処するため,本稿では,純粋で統一されたステガノグラフィーネットワーク (pusnet) を提案する。パーソナライズされたネットワークで通常の機械学習タスクを実行し、異なるキーを使って秘密の埋め込みやリカバリのためにステガノグラフィーネットワークにトリガーすることができる。我々は, 浄化されたネットワークとステガノグラフィーのネットワークを柔軟に切り替えるために, pusnetの構成を疎重充填問題として定式化する。我々はさらに,秘密画像埋め込みとリカバリのために2つのステガノグラフィネットワークを隠蔽した画像デニュージングネットワークとして,pusnetをインスタンス化する。包括的実験により、当社のPUSNetは、秘密画像の埋め込み、秘密画像の復元、単一アーキテクチャにおける画像の復号化に優れた性能を発揮することが示された。また、清浄されたネットワークでステガノグラフィーネットワークを受動的に搬送できることも示されている。コードは \url{https://github.com/albblgb/PUSNet} で入手できる。 Steganography is the art of hiding secret data into the cover media for covert communication. In recent years, more and more deep neural network (DNN)-based steganographic schemes are proposed to train steganographic networks for secret embedding and recovery, which are shown to be promising. Compared with the handcrafted steganographic tools, steganographic networks tend to be large in size. It raises concerns on how to imperceptibly and effectively transmit these networks to the sender and receiver to facilitate the covert communication. To address this issue, we propose in this paper a Purified and Unified Steganographic Network (PUSNet). It performs an ordinary machine learning task in a purified network, which could be triggered into steganographic networks for secret embedding or recovery using different keys. We formulate the construction of the PUSNet into a sparse weight filling problem to flexibly switch between the purified and steganographic networks. 
We further instantiate our PUSNet as an image denoising network with two steganographic networks concealed for secret image embedding and recovery. Comprehensive experiments demonstrate that our PUSNet achieves good performance on secret image embedding, secret image recovery, and image denoising in a single architecture. It is also shown to be capable of imperceptibly carrying the steganographic networks in a purified network. Code is available at \url{https://github.com/albblgb/PUSNet}	翻訳日:2024-02-28 17:42:41 公開日:2024-02-27
# 実行時calibratableオブジェクト検出のためのデプロイ事前インジェクション Deployment Prior Injection for Run-time Calibratable Object Detection ( http://arxiv.org/abs/2402.17207v1 ) ライセンス: Link先を確認	Mo Zhou, Yiding Yang, Haoxiang Li, Vishal M. Patel, Gang Hua	(参考訳) トレーニングとテスト分布の強いアライメントにより、コンテキストとしてのオブジェクト関係は、事前にオブジェクト検出を促進する。しかし、それは有害だが避けられない訓練となり、空間と時間に異なる変化をもたらすテスト分布に偏りが生じる。それでも既存の検出器は、パラメータ更新なしにテストフェーズの前に配置コンテキストを組み込むことはできない。この種の能力は、事前の文脈に関して、非絡み合った表現を明示的に学習する必要がある。これを実現するために,グラフが事前に配置コンテキストを表し,エッジ値がオブジェクトの関係を表すグラフ入力を検出器に追加する。そして、修正されたトレーニング目的を用いて、検出者の動作をグラフにバインドするように訓練する。その結果、テストフェーズの間、任意の適切なデプロイメントコンテキストをグラフ編集を介して検出器に注入することができ、そのためパラメータを更新することなく、指定された事前に検出器を調整または"バイアス"することができる。配置の事前が不明であっても、検出器は、独自の予測を用いて近似した配置の自己調整を行うことができる。 COCOデータセットの包括的な実験結果とObjects365データセットのクロスデータセットテストは、実行時のcalibratable検出器の有効性を示している。 With a strong alignment between the training and test distributions, object relation as a context prior facilitates object detection. Yet, it turns into a harmful but inevitable training set bias upon test distributions that shift differently across space and time. Nevertheless, the existing detectors cannot incorporate deployment context prior during the test phase without parameter update. Such a capability requires the model to explicitly learn disentangled representations with respect to context prior. To achieve this, we introduce an additional graph input to the detector, where the graph represents the deployment context prior, and its edge values represent object relations. Then, the detector behavior is trained to be bound to the graph with a modified training objective. As a result, during the test phase, any suitable deployment context prior can be injected into the detector via graph edits, hence calibrating, or "re-biasing" the detector towards the given prior at run-time without parameter update. Even if the deployment prior is unknown, the detector can self-calibrate using a deployment prior approximated using its own predictions. 
Comprehensive experimental results on the COCO dataset, as well as cross-dataset testing on the Objects365 dataset, demonstrate the effectiveness of the run-time calibratable detector.	翻訳日:2024-02-28 17:42:13 公開日:2024-02-27
# 神経モデルの視覚言語stemスキルの測定 Measuring Vision-Language STEM Skills of Neural Models ( http://arxiv.org/abs/2402.17205v1 ) ライセンス: Link先を確認	Jianhao Shen, Ye Yuan, Srbuhi Mirzoyan, Ming Zhang, Chenguang Wang	(参考訳) ニューラルモデルのSTEMスキルをテストするための新しい挑戦を紹介する。現実世界の問題は多くの場合、STEM(科学、技術、工学、数学)の知識を組み合わせて解決する必要がある。既存のデータセットとは異なり、我々のデータセットはSTEMのマルチモーダル視覚言語情報を理解する必要がある。私たちのデータセットは、この課題のための最大かつ最も包括的なデータセットの1つです。 448のスキルと、全STEM科目の1,073,146の質問が含まれている。専門家レベルの能力を調べることに集中する既存のデータセットと比較して、我々のデータセットは、K-12カリキュラムに基づいて設計された基本的なスキルと質問を含んでいる。ベンチマークにはCLIPやGPT-3.5-Turboといった最先端の基盤モデルも追加しています。その結果、最近のモデルの進歩は、データセット内の非常に限られた下位レベルのスキル(3年生の2.5%)の習得にしか役立たないことが分かりました。実際、これらのモデルはまだ小学生のパフォーマンスをかなり下回っており(平均54.7%)、専門家レベルのパフォーマンスには言及されていない。データセットのパフォーマンスを理解し、向上するために、データセットのトレーニング分割についてモデルを教えます。成績は改善したものの,平均的な小学生に比べてモデル性能は低いままである。 STEM問題を解決するには、コミュニティからの新しいアルゴリズムの革新が必要である。 We introduce a new challenge to test the STEM skills of neural models. The problems in the real world often require solutions, combining knowledge from STEM (science, technology, engineering, and math). Unlike existing datasets, our dataset requires the understanding of multimodal vision-language information of STEM. Our dataset features one of the largest and most comprehensive datasets for the challenge. It includes 448 skills and 1,073,146 questions spanning all STEM subjects. Compared to existing datasets that often focus on examining expert-level ability, our dataset includes fundamental skills and questions designed based on the K-12 curriculum. We also add state-of-the-art foundation models such as CLIP and GPT-3.5-Turbo to our benchmark. Results show that the recent model advances only help master a very limited number of lower grade-level skills (2.5% in the third grade) in our dataset. In fact, these models are still well below (averaging 54.7%) the performance of elementary students, not to mention near expert-level performance. 
To understand and increase the performance on our dataset, we teach the models on a training split of our dataset. Even though we observe improved performance, the model performance remains relatively low compared to average elementary students. To solve STEM problems, we will need novel algorithmic innovations from the community.	翻訳日:2024-02-28 17:41:52 公開日:2024-02-27
# 生成モデル評価の向上:OCRシステムにおける実写画像合成と比較のための新しいアルゴリズム Advancing Generative Model Evaluation: A Novel Algorithm for Realistic Image Synthesis and Comparison in OCR System ( http://arxiv.org/abs/2402.17204v1 ) ライセンス: Link先を確認	Majid Memari, Khaled R. Ahmed, Shahram Rahimi, Noorbakhsh Amiri Golilarz	(参考訳) 本研究は、生成モデル分野における重要な課題、特に合成画像の生成と評価について論じる。生成モデルの固有の複雑さとそれらの比較のための標準化された手順の欠如を考えると、本研究は合成画像のリアリズムを客観的に評価するための先駆的アルゴリズムを提案する。このアプローチは、Fréchet Inception Distance(FID)スコアを精細化し、画像品質をより正確かつ主観的に評価することで、評価手法を大幅に強化する。このアルゴリズムは,画像生成における現実主義の主観的性質から,従来ほとんど不可能であったアラビア語の手書き数字の現実的画像の生成と評価の課題に対処するために,特に調整されている。体系的かつ客観的なフレームワークを提供することにより, 異なる生成モデルの比較を可能にするだけでなく, 設計と出力の改善への道を開く。この評価と比較のブレークスルーは、OCRの分野、特に特異な複雑さを示すスクリプトの進歩に不可欠であり、高品質な合成画像の生成と評価において新しい標準を設定している。 This research addresses a critical challenge in the field of generative models, particularly in the generation and evaluation of synthetic images. Given the inherent complexity of generative models and the absence of a standardized procedure for their comparison, our study introduces a pioneering algorithm to objectively assess the realism of synthetic images. This approach significantly enhances the evaluation methodology by refining the Fréchet Inception Distance (FID) score, allowing for a more precise and subjective assessment of image quality. Our algorithm is particularly tailored to address the challenges in generating and evaluating realistic images of Arabic handwritten digits, a task that has traditionally been near-impossible due to the subjective nature of realism in image generation. By providing a systematic and objective framework, our method not only enables the comparison of different generative models but also paves the way for improvements in their design and output. This breakthrough in evaluation and comparison is crucial for advancing the field of OCR, especially for scripts that present unique complexities, and sets a new standard in the generation and assessment of high-quality synthetic images.	翻訳日:2024-02-28 17:41:37 公開日:2024-02-27
# FedBRB:デバイス・ヘテロジニティ・フェデレーション学習における小規模から大規模シナリオの効果的な解決法 FedBRB: An Effective Solution to the Small-to-Large Scenario in Device-Heterogeneity Federated Learning ( http://arxiv.org/abs/2402.17202v1 ) ライセンス: Link先を確認	Ziyue Xu, Mingfeng Xu, Tianchi Liao, Zibin Zheng and Chuan Chen	(参考訳) 近年,大規模モデルの成功は,モデルサイズをスケールアップすることの重要性を示している。このことが、連合学習の観点から大規模モデルの協調トレーニングの探求への関心を喚起した。計算の制約のため、多くの機関は大規模モデルをローカルで訓練するのに苦労している。したがって、より小さな局所モデルのみを用いてより大規模なグローバルモデルをトレーニングすることが重要なシナリオとなっている(つまり、small-to-large scenario)。最近のデバイス・ヘテロジニティ・フェデレーション・ラーニング・アプローチはこの領域を探求し始めているが、グローバルモデルのパラメータ空間を完全にカバーすることの限界に直面している。本稿では,ブロックの概念に基づいて,FedBRB(Block-wise Rolling and weighted Broadcast)という手法を提案する。 FedBRBは小さなローカルモデルを使用して、大きなグローバルモデルのすべてのブロックをトレーニングし、より高速な情報インタラクションのためにトレーニングされたパラメータを空間全体にブロードキャストすることができる。実験はFedBRBが実質的なパフォーマンス向上をもたらし、このシナリオで最先端の結果を得ることを示す。さらに、最小限のローカルモデルのみを使用するFedBRBは、より大きなローカルモデルを使用するベースラインを超えることができる。 Recently, the success of large models has demonstrated the importance of scaling up model size. This has spurred interest in exploring collaborative training of large-scale models from a federated learning perspective. Due to computational constraints, many institutions struggle to train a large-scale model locally. Thus, training a larger global model using only smaller local models has become an important scenario (i.e., the small-to-large scenario). Although recent device-heterogeneity federated learning approaches have started to explore this area, they face limitations in fully covering the parameter space of the global model. In this paper, we propose a method called FedBRB (Block-wise Rolling and weighted Broadcast) based on the block concept. FedBRB can use small local models to train all blocks of the large global model, and broadcasts the trained parameters to the entire space for faster information interaction. 
Experiments demonstrate FedBRB yields substantial performance gains, achieving state-of-the-art results in this scenario. Moreover, FedBRB using only minimal local models can even surpass baselines using larger local models.	翻訳日:2024-02-28 17:41:17 公開日:2024-02-27
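The block-wise rolling idea in the FedBRB abstract can be illustrated with a toy schedule. This is only a sketch under stated assumptions, not the authors' implementation: it assumes the global model is split into equally sized blocks and that a client able to hold `capacity` blocks trains a window that shifts by one block per round, so that every block is eventually covered. The names `rolling_blocks`, `coverage`, `num_blocks`, and `capacity` are hypothetical, and the weighted broadcast step is not modelled.

```python
def rolling_blocks(num_blocks, capacity, round_idx):
    """Return the global-model block indices a small client trains in one round.

    Assumption (not from the paper): the window of `capacity` consecutive
    blocks shifts by one position per round and wraps around at the end.
    """
    if not 0 < capacity <= num_blocks:
        raise ValueError("capacity must be in 1..num_blocks")
    start = round_idx % num_blocks
    return [(start + offset) % num_blocks for offset in range(capacity)]


def coverage(num_blocks, capacity, rounds):
    """Collect every block index touched during the first `rounds` rounds."""
    covered = set()
    for round_idx in range(rounds):
        covered.update(rolling_blocks(num_blocks, capacity, round_idx))
    return covered
```

With four blocks and a client that fits two of them, three rounds of this hypothetical schedule already touch every block of the larger model, which is the property the abstract describes as training all blocks with small local models.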
# 圧縮領域に対する強調バイアス緩和による圧縮画像の品質向上 Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain ( http://arxiv.org/abs/2402.17200v1 ) ライセンス: Link先を確認	Qunliang Xing, Mai Xu, Shengxi Li, Xin Deng, Meisong Zheng, Huaida Liu and Ying Chen	(参考訳) 既存の圧縮画像の品質向上手法では、強調領域を生領域と整合させることに重点を置いている。しかし、これらの手法は圧縮領域に対して広範に拡張バイアスを示し、不注意に原領域よりも現実的であると見なす。このバイアスにより、強調画像は圧縮された画像とよく似ているため、知覚品質は低下する。本稿では,このバイアスを緩和し,圧縮画像の品質を高めるための,シンプルで効果的な方法を提案する。本手法では,圧縮画像をキーとする条件付き判別器を用い,領域分割正規化を組み込んで圧縮領域から強調領域を積極的に距離づける。この2つの戦略により,提案手法は圧縮領域に対する識別を可能にし,拡張領域を生領域に近づける。総合的な品質評価は,提案手法が推論オーバーヘッドを発生させることなく,他の最先端手法よりも優れていることを示す。 Existing quality enhancement methods for compressed images focus on aligning the enhancement domain with the raw domain to yield realistic images. However, these methods exhibit a pervasive enhancement bias towards the compression domain, inadvertently regarding it as more realistic than the raw domain. This bias makes enhanced images closely resemble their compressed counterparts, thus degrading their perceptual quality. In this paper, we propose a simple yet effective method to mitigate this bias and enhance the quality of compressed images. Our method employs a conditional discriminator with the compressed image as a key condition, and then incorporates a domain-divergence regularization to actively distance the enhancement domain from the compression domain. Through this dual strategy, our method enables the discrimination against the compression domain, and brings the enhancement domain closer to the raw domain. Comprehensive quality evaluations confirm the superiority of our method over other state-of-the-art methods without incurring inference overheads.	翻訳日:2024-02-28 17:40:55 公開日:2024-02-27
# 不確実性定量化を用いたベイズ深層学習法によるSYM-H指標の予測 Prediction of the SYM-H Index Using a Bayesian Deep Learning Method with Uncertainty Quantification ( http://arxiv.org/abs/2402.17196v1 ) ライセンス: Link先を確認	Yasser Abduallah, Khalid A. Alobaid, Jason T. L. Wang, Haimin Wang, Vania K. Jordanova, Vasyl Yurchyshyn, Huseyin Cavus, Ju Jing	(参考訳) 本研究では,SYM-H指数の短期予測を1分間および5分間の解像度データに基づいて,太陽風と惑星間磁場パラメータから協調的に学習するための,グラフニューラルネットワークと双方向長短期記憶ネットワークを用いた新しいディープラーニングフレームワークSYMHnetを提案する。 SYMHnetは、入力として、NASAのSpace Science Data Coordinated Archiveが提供するパラメータ値の時系列を取得し、出力として、wが1または2の所定の時間点tに対して、時間点t + w時間におけるSYM-Hインデックス値を予測する。ベイズ推論を学習フレームワークに組み込むことで、将来のSYM-H指標を予測する際に、SYMHnetはアレタリック(データ)不確実性とエピステミック(モデル)不確実性の両方を定量化することができる。実験の結果、SYMHnetは1分と5分の両方の解像度データに対して、静かな時間と嵐時にうまく動作することがわかった。また,SYMHnetは関連する機械学習手法よりも性能がよいことを示した。例えば、SYMHnetは、5分間の解像度データを用いて大嵐(SYM-H = -393 nT)におけるSYM-H指数(事前1時間)を予測する際に、最近の勾配押し上げ機(GBM)法のFSSの0.074と比較して0.343の予測スキルスコア(FSS)を達成する。大きな嵐でSYM-H指数を予測した場合(2時間前)、SYMHnetはGBM法の0.087のFSSと比較して0.553のFSSを達成する。さらに、SYMHnetはデータとモデルの不確実性定量化の両方に結果を提供できるが、関連する手法はできない。 We propose a novel deep learning framework, named SYMHnet, which employs a graph neural network and a bidirectional long short-term memory network to cooperatively learn patterns from solar wind and interplanetary magnetic field parameters for short-term forecasts of the SYM-H index based on 1-minute and 5-minute resolution data. SYMHnet takes, as input, the time series of the parameters' values provided by NASA's Space Science Data Coordinated Archive and predicts, as output, the SYM-H index value at time point t + w hours for a given time point t where w is 1 or 2. By incorporating Bayesian inference into the learning framework, SYMHnet can quantify both aleatoric (data) uncertainty and epistemic (model) uncertainty when predicting future SYM-H indices. Experimental results show that SYMHnet works well at quiet time and storm time, for both 1-minute and 5-minute resolution data. 
The results also show that SYMHnet generally performs better than related machine learning methods. For example, SYMHnet achieves a forecast skill score (FSS) of 0.343 compared to the FSS of 0.074 of a recent gradient boosting machine (GBM) method when predicting SYM-H indices (1 hour in advance) in a large storm (SYM-H = -393 nT) using 5-minute resolution data. When predicting the SYM-H indices (2 hours in advance) in the large storm, SYMHnet achieves an FSS of 0.553 compared to the FSS of 0.087 of the GBM method. In addition, SYMHnet can provide results for both data and model uncertainty quantification, whereas the related methods cannot.	翻訳日:2024-02-28 17:40:41 公開日:2024-02-27
# 未知領域検出におけるLLMの性能調査 Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent Detection ( http://arxiv.org/abs/2402.17256v1 ) ライセンス: Link先を確認	Pei Wang, Keqing He, Yejie Wang, Xiaoshuai Song, Yutao Mou, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu	(参考訳) Out-of-domain(OOD)インテント検出は、ユーザのクエリが、タスク指向対話(TOD)システムの適切な機能に欠かせない、システムの事前定義されたドメイン外にあるかどうかを調べることを目的としている。従来の方法は、識別モデルの微調整によってそれに対処する。近年,ChatGPT で表される大規模言語モデル (LLM) を様々な下流タスクに適用する研究が行われているが,OOD 検出タスクの能力についてはまだ不明であり,様々な実験環境下で LLM の総合評価を行い,その強みと弱点を概説する。 LLMには強力なゼロショット機能と少数ショット機能があるが、フルリソースで微調整されたモデルに比べれば不利である。より深く、一連の追加分析実験を通じて、LLMが直面する課題を議論、要約し、ドメイン知識の注入、IND(In-domain)からOODへの知識伝達の強化、ロングインストラクションの理解など、今後の研究の指針を提供する。 Out-of-domain (OOD) intent detection aims to examine whether the user's query falls outside the predefined domain of the system, which is crucial for the proper functioning of task-oriented dialogue (TOD) systems. Previous methods address it by fine-tuning discriminative models. Recently, some studies have been exploring the application of large language models (LLMs), represented by ChatGPT, to various downstream tasks, but their ability on the OOD detection task remains unclear. This paper conducts a comprehensive evaluation of LLMs under various experimental settings and then outlines the strengths and weaknesses of LLMs. We find that LLMs exhibit strong zero-shot and few-shot capabilities, but are still at a disadvantage compared to models fine-tuned with full resources. More deeply, through a series of additional analysis experiments, we discuss and summarize the challenges faced by LLMs and provide guidance for future work, including injecting domain knowledge, strengthening knowledge transfer from in-domain (IND) to OOD, and understanding long instructions.	翻訳日:2024-02-28 17:35:11 公開日:2024-02-27
# 構成型ゼロショット学習における文脈ベースと多様性による特異性 Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning ( http://arxiv.org/abs/2402.17251v1 ) ライセンス: Link先を確認	Yun Li, Zhe Liu, Hang Chen, and Lina Yao	(参考訳) 合成ゼロショット学習 (CZSL) は、観測例の限られたセットに基づいて、未知の属性オブジェクト対を認識することを目的としている。現在のczsl方法論は、その進歩にもかかわらず、属性に存在する異なる特異性レベルを無視する傾向がある。例えば、スライスしたイチゴのイメージを考えると、前者がより情報に富むにもかかわらず、一般的な「レッド・ストローベリー」よりも「スライス・ストローベリー」を優先できない可能性がある。また、クローズワールド (cw) から open-world (ow) czsl へ移行する際には、検索スペースが膨らむ。本稿では,CZSL(CDS-CZSL)のためのコンテキストベースおよび多様性駆動型特化学習フレームワークを提案する。本フレームワークは, 属性の特異性について, 適用対象の多様性と関連するコンテキストを考慮して評価する。この手法は、特定の属性オブジェクト対を強調してより正確な予測を可能にし、OW-CZSLにおける合成フィルタリングを改善する。我々はCWシナリオとOWシナリオの両方で実験を行い、我々のモデルは3つのデータセットで最先端の結果を得る。 Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object pairs based on a limited set of observed examples. Current CZSL methodologies, despite their advancements, tend to neglect the distinct specificity levels present in attributes. For instance, given images of sliced strawberries, they may fail to prioritize `Sliced-Strawberry' over a generic `Red-Strawberry', despite the former being more informative. They also suffer from ballooning search space when shifting from Close-World (CW) to Open-World (OW) CZSL. To address the issues, we introduce the Context-based and Diversity-driven Specificity learning framework for CZSL (CDS-CZSL). Our framework evaluates the specificity of attributes by considering the diversity of objects they apply to and their related context. This novel approach allows for more accurate predictions by emphasizing specific attribute-object pairs and improves composition filtering in OW-CZSL. We conduct experiments in both CW and OW scenarios, and our model achieves state-of-the-art results across three datasets.	翻訳日:2024-02-28 17:34:48 公開日:2024-02-27
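The CDS-CZSL abstract says attribute specificity is judged from the diversity of objects an attribute applies to. A minimal sketch of that intuition follows. It assumes, purely for illustration, that specificity is the reciprocal of the number of distinct objects an attribute co-occurs with in the training pairs; the framework itself also uses context and is learned, which this sketch does not attempt. The function name `attribute_specificity` and the toy pairs are hypothetical.

```python
from collections import defaultdict


def attribute_specificity(pairs):
    """Score each attribute by how few distinct objects it is seen with.

    Assumption (not the paper's formula): specificity = 1 / number of
    distinct objects, so an attribute applied to many objects (such as a
    color) scores lower than one applied to few (such as a cutting state).
    """
    objects_per_attribute = defaultdict(set)
    for attribute, obj in pairs:
        objects_per_attribute[attribute].add(obj)
    return {attr: 1.0 / len(objs) for attr, objs in objects_per_attribute.items()}


# Toy attribute-object pairs, invented for this example.
train_pairs = [
    ("red", "strawberry"), ("red", "car"), ("red", "apple"), ("red", "shirt"),
    ("sliced", "strawberry"), ("sliced", "apple"),
]
scores = attribute_specificity(train_pairs)
```

Under this toy measure `sliced` scores higher than `red`, matching the abstract's example in which `Sliced-Strawberry` should be preferred over the more generic `Red-Strawberry`.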
# 多層適応フレームワークを用いた深層学習音声と視覚合成によるフィッシング攻撃検出の改善 Deep Learning-Based Speech and Vision Synthesis to Improve Phishing Attack Detection through a Multi-layer Adaptive Framework ( http://arxiv.org/abs/2402.17249v1 ) ライセンス: Link先を確認	Tosin Ige, Christopher Kiekintveld, Aritran Piplai	(参考訳) 最新のフィッシング検出手法をバイパスするフィッシング技術は、業界とアカデミアの両方の研究者にとって、複雑なフィッシング攻撃を検出するための現在のアプローチが不可能であることから、大きな課題となっている。このように、攻撃者による高度化戦略と、検出を回避するために新たな戦術が開発されている速度が相まって、現在のアンチフィッシング手法は複雑なフィッシングに弱いままである。本研究では,深層学習とランダムフォレストを組み合わせた適応型フレームワークを提案し,画像の読み出し,深層映像からの音声合成,各種予測における自然言語処理を行い,フィッシング攻撃検出のための機械学習モデルの性能を大幅に向上させる。 The ever-evolving ways in which attackers continue to improve their phishing techniques to bypass existing state-of-the-art phishing detection methods pose a mountain of challenges to researchers in both industry and academia, owing to the inability of current approaches to detect complex phishing attacks. Thus, current anti-phishing methods remain vulnerable to complex phishing because of the increasingly sophisticated tactics adopted by attackers, coupled with the rate at which new tactics are being developed to evade detection. In this research, we propose an adaptable framework that combines deep learning and Random Forest to read images, synthesize speech from deep-fake videos, and apply natural language processing at various prediction layers to significantly increase the performance of machine learning models for phishing attack detection.	翻訳日:2024-02-28 17:34:24 公開日:2024-02-27
# SDR-Former:3D Multi-Phase Imaging を用いた肝病変分類用シームズデュアルリゾリューショントランス SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging ( http://arxiv.org/abs/2402.17246v1 ) ライセンス: Link先を確認	Meng Lou, Hanning Ying, Xiaoqing Liu, Hong-Yu Zhou, Yuqing Zhang, Yizhou Yu	(参考訳) 多相CTおよびMRスキャンにおける肝病変の自動分類は臨床的に重要であるが困難である。本研究は,3次元多相CTおよびMR画像における肝病変分類のための新しいSDR-Formerフレームワークを提案する。提案するSDR-Formerは、SNNを用いて多相画像入力を処理し、計算効率を維持しつつロバストな特徴表現を有する。 SNNの重み共有機能は、3D畳み込みニューラルネットワーク(CNN)と高解像度と低解像度の画像をそれぞれ処理するための調整された3DトランスからなるハイブリッドDual-Resolution Transformer(DR-Former)によってさらに強化される。このハイブリッドサブアーキテクチャは、詳細なローカル特徴をキャプチャし、グローバルなコンテキスト情報を理解することで、SNNの機能抽出能力を向上する。さらに、新しい適応位相選択モジュール(APSM)を導入し、位相特異的通信を促進し、各位相が診断結果に与える影響を動的に調整する。提案するSDR-Formerフレームワークは、3相CTデータセットと8相MRデータセットの2つの臨床データセットに関する包括的な実験を通じて検証されている。実験の結果,提案手法の有効性が確認された。科学コミュニティを支援するため,肝病変解析のための多段階MRデータセットを公開しています。この分野で初めて公開されたマルチフェーズMRデータセットであるこの先駆的なデータセットは、MICCAI LLD-MMRI Challengeを支えている。データセットは https://bit.ly/3IyYlgN でアクセスできる。 Automated classification of liver lesions in multi-phase CT and MR scans is of clinical significance but challenging. This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework, specifically designed for liver lesion classification in 3D multi-phase CT and MR imaging with varying phase counts. The proposed SDR-Former utilizes a streamlined Siamese Neural Network (SNN) to process multi-phase imaging inputs, possessing robust feature representations while maintaining computational efficiency. The weight-sharing feature of the SNN is further enriched by a hybrid Dual-Resolution Transformer (DR-Former), comprising a 3D Convolutional Neural Network (CNN) and a tailored 3D Transformer for processing high- and low-resolution images, respectively. This hybrid sub-architecture excels in capturing detailed local features and understanding global contextual information, thereby boosting the SNN's feature extraction capabilities. 
Additionally, a novel Adaptive Phase Selection Module (APSM) is introduced, promoting phase-specific intercommunication and dynamically adjusting each phase's influence on the diagnostic outcome. The proposed SDR-Former framework has been validated through comprehensive experiments on two clinical datasets: a three-phase CT dataset and an eight-phase MR dataset. The experimental results affirm the efficacy of the proposed framework. To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public. This pioneering dataset, being the first publicly available multi-phase MR dataset in this field, also underpins the MICCAI LLD-MMRI Challenge. The dataset is accessible at: https://bit.ly/3IyYlgN.	翻訳日:2024-02-28 17:34:08 公開日:2024-02-27
# Playground v2.5: テキスト・画像生成における美的品質向上に向けた3つの視点 Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation ( http://arxiv.org/abs/2402.17245v1 ) ライセンス: Link先を確認	Daiqing Li, Aleks Kamko, Ehsan Akhgari, Ali Sabet, Linmiao Xu, Suhail Doshi	(参考訳) 本稿では,テキストから画像への生成モデルにおいて,最先端の美的品質を実現するための3つの知見について述べる。色とコントラストの強化、複数のアスペクト比における生成の改善、人間中心の細部の改善の3つの重要な側面に焦点を当てた。まず,拡散モデルの学習におけるノイズスケジュールの重要性を掘り下げ,現実性と視覚的忠実性に大きな影響を与えることを示す。第2に、バランスの取れたデータセットを作成することの重要性を強調し、画像生成における様々なアスペクト比を調節することの課題に対処する。最後に、モデル出力と人間の嗜好を一致させる重要な役割について検討し、生成した画像が人間の知覚的期待に共鳴することを保証する。幅広い分析と実験を通じて、Playground v2.5は様々な条件とアスペクト比の美的品質の観点から最先端のパフォーマンスを示し、SDXLやPlayground v2のような広く使われているオープンソースモデルと、DALLE 3やMidjourney v5.2のようなクローズドソースの商用システムの両方を上回っている。われわれのモデルはオープンソースであり、Playground v2.5の開発は、拡散型画像生成モデルの美的品質を高めることを目的とした研究者に貴重なガイドラインを提供することを期待している。 In this work, we share three insights for achieving state-of-the-art aesthetic quality in text-to-image generative models. We focus on three critical aspects for model improvement: enhancing color and contrast, improving generation across multiple aspect ratios, and improving human-centric fine details. First, we delve into the significance of the noise schedule in training a diffusion model, demonstrating its profound impact on realism and visual fidelity. Second, we address the challenge of accommodating various aspect ratios in image generation, emphasizing the importance of preparing a balanced bucketed dataset. Lastly, we investigate the crucial role of aligning model outputs with human preferences, ensuring that generated images resonate with human perceptual expectations. Through extensive analysis and experiments, Playground v2.5 demonstrates state-of-the-art performance in terms of aesthetic quality under various conditions and aspect ratios, outperforming both widely-used open-source models like SDXL and Playground v2, and closed-source commercial systems such as DALLE 3 and Midjourney v5.2. 
Our model is open-source, and we hope the development of Playground v2.5 provides valuable guidelines for researchers aiming to elevate the aesthetic quality of diffusion-based image generation models.	翻訳日:2024-02-28 17:33:37 公開日:2024-02-27
# HardTaint: Selective Hardware Tracingによるプロダクション実行動的テイント解析 HardTaint: Production-Run Dynamic Taint Analysis via Selective Hardware Tracing ( http://arxiv.org/abs/2402.17241v1 ) ライセンス: Link先を確認	Yiyu Zhang, Tianyi Liu, Yueyang Wang, Yun Qi, Kai Ji, Jian Tang, Xiaoliang Wang, Xuandong Li, Zhiqiang Zuo	(参考訳) 動的テイント解析(DTA)は,セキュリティやプライバシ,診断などにおいて,基本的な解析手法として広く用いられている。 DTAは大量のテイントデータをオンラインで収集し分析することを要求するため、非常に高いランタイムオーバーヘッドを被る。過去数十年にわたり、DTAのオーバーヘッドを下げるために多くの試みがなされてきた。残念ながら、彼らが達成した削減は限界であり、DTAはデバッグ/テストシナリオのみに適用できる。本稿では,実運用環境における動的テイント追跡を実現するシステムであるHardTaintを提案する。 HardTaintは静的解析、選択的ハードウェアトレース、並列グラフ処理技術を組み合わせたハイブリッドで体系的な設計を採用している。包括的な評価では、テイント検出機能を犠牲にすることなく、最先端よりも桁違いに低いランタイムオーバーヘッドを約9%導入している。 Dynamic taint analysis (DTA), as a fundamental analysis technique, is widely used in security, privacy, diagnosis, and other areas. Because DTA must collect and analyze massive taint data online, it suffers extremely high runtime overhead. Over the past decades, numerous attempts have been made to lower the overhead of DTA. Unfortunately, the reductions they achieved are marginal, leaving DTA applicable only to debugging/testing scenarios. In this paper, we propose and implement HardTaint, a system that can realize production-run dynamic taint tracking. HardTaint adopts a hybrid and systematic design which combines static analysis, selective hardware tracing and parallel graph processing techniques. The comprehensive evaluations demonstrate that HardTaint introduces only around 9% runtime overhead, which is an order of magnitude lower than the state of the art, without sacrificing any taint detection capability.	翻訳日:2024-02-28 17:33:12 公開日:2024-02-27
# ネガティブサンプリングは重要か? その理論と応用についての考察 Does Negative Sampling Matter? A Review with Insights into its Theory and Applications ( http://arxiv.org/abs/2402.17238v1 ) ライセンス: Link先を確認	Zhen Yang, Ming Ding, Tinglin Huang, Yukuo Cen, Junshuai Song, Bin Xu, Yuxiao Dong, and Jie Tang	(参考訳) 負のサンプリングは、機械学習、コンピュータビジョン、自然言語処理、データマイニング、リコメンダシステムといった幅広い応用によって、研究の焦点として急速に注目を集めている。負のサンプリングは本当に重要か? 既存の否定的サンプリングメソッドをすべて組み込むことのできる一般的なフレームワークはありますか? どんな分野で適用されますか? これらの疑問に対して,我々は負のサンプリングを利用する汎用フレームワークを提案する。負のサンプリングの歴史を掘り下げて,5つの進化経路を通じて負のサンプリングの展開を追跡した。我々は、グローバル、ローカル、ミニバッチ、ホップ、メモリベースのアプローチを詳述し、ネガティブなサンプル候補の選択に使用する戦略を特定し分類する。本稿では,現在の負サンプリング法を静的,ハード,ganベース,補助ベース,インバッチ法と5つのタイプに分類し,負サンプリングを理解するための明確な構造を提供する。詳細な分類以外にも,様々な分野における負のサンプリングの適用を強調し,その実用的メリットについて考察する。最後に,オープン問題と負サンプリングの今後の方向性について概説する。 Negative sampling has swiftly risen to prominence as a focal point of research, with wide-ranging applications spanning machine learning, computer vision, natural language processing, data mining, and recommender systems. This growing interest raises several critical questions: Does negative sampling really matter? Is there a general framework that can incorporate all existing negative sampling methods? In what fields is it applied? Addressing these questions, we propose a general framework that leverages negative sampling. Delving into the history of negative sampling, we trace the development of negative sampling through five evolutionary paths. We dissect and categorize the strategies used to select negative sample candidates, detailing global, local, mini-batch, hop, and memory-based approaches. Our review categorizes current negative sampling methods into five types: static, hard, GAN-based, Auxiliary-based, and In-batch methods, providing a clear structure for understanding negative sampling. Beyond detailed categorization, we highlight the application of negative sampling in various areas, offering insights into its practical benefits. 
Finally, we briefly discuss open problems and future directions for negative sampling.	翻訳日:2024-02-28 17:32:52 公開日:2024-02-27
# マルチビューアテンションによる画像テキストマッチング Image-Text Matching with Multi-View Attention ( http://arxiv.org/abs/2402.17237v1 ) ライセンス: Link先を確認	Rui Cheng, Wanqing Cui	(参考訳) 既存の画像テキストマッチングの2ストリームモデルでは,検索速度を確保しつつ良好な性能を示し,産業や学界から広く注目を集めている。これらの方法は、画像とテキストを別々にエンコードする単一の表現を使用し、コサイン類似性やベクトルの内積と一致するスコアを得る。しかし、2ストリームモデルの性能はしばしば準最適である。一方、単一の表現は複雑なコンテンツを包括的にカバーすることが難しい。一方,インタラクションの欠如というこの枠組みでは,情報の無視につながる複数の意味を一致させることが困難である。上記の問題に対処し、2ストリームモデルの性能を向上させるために、2ストリーム画像テキストマッチングMVAM(Multi-View Attention Model)を提案する。まず、異なるビューコードを持つ様々な注意ヘッドによって、複数の画像とテキストの表現を学習する。そして、これらの表現をマッチングのために1つにまとめる。多様性の目標は、アテンションヘッド間の多様性を促進するためにも用いられる。この方法で、モデルは異なるビューから画像やテキストをエンコードし、より重要なポイントに到達することができる。より多くの情報を含む表現を得ることができます検索タスクを行う場合、画像とテキストのマッチングスコアを異なる側面から計算することができ、マッチングパフォーマンスが向上する。 MSCOCO と Flickr30K の実験結果から,提案モデルが既存モデルよりも改良されていることが示された。さらなるケーススタディでは、異なる注意頭が異なるコンテンツに集中でき、最終的により包括的な表現が得られることが示されている。 Existing two-stream models for image-text matching show good performance while ensuring retrieval speed and have received extensive attention from industry and academia. These methods use a single representation to encode image and text separately and get a matching score with cosine similarity or the inner product of vectors. However, the performance of the two-stream model is often sub-optimal. On the one hand, a single representation struggles to cover complex content comprehensively. On the other hand, because this framework lacks interaction, it is challenging to match multiple meanings, which leads to information being ignored. To address the problems mentioned above and improve the performance of the two-stream model, we propose a multi-view attention approach for two-stream image-text matching, MVAM (Multi-View Attention Model). It first learns multiple image and text representations by diverse attention heads with different view codes. It then concatenates these representations into one for matching. 
A diversity objective is also used to promote diversity between attention heads. With this method, models are able to encode images and text from different views and attend to more key points. So we can get representations that contain more information. When doing retrieval tasks, the matching scores between images and texts can be calculated from different aspects, leading to better matching performance. Experiment results on MSCOCO and Flickr30K show that our proposed model brings improvements over existing models. Further case studies show that different attention heads can focus on different contents and finally obtain a more comprehensive representation.	翻訳日:2024-02-28 17:32:34 公開日:2024-02-27
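The matching step described in the abstract above (several per-view representations concatenated into one vector, then scored with cosine similarity) can be sketched as follows. This is only an illustration under stated assumptions: the function names `concat_views`, `cosine_similarity` and `matching_score` are hypothetical, the per-view vectors are plain Python lists, and nothing here reproduces the paper's attention heads, view codes or diversity objective.

```python
import math


def concat_views(views):
    """Concatenate per-view embedding vectors into one flat vector."""
    return [x for view in views for x in view]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def matching_score(image_views, text_views):
    """Hypothetical two-stream score: concatenate the views of each
    modality, then compare the two concatenated vectors."""
    return cosine_similarity(concat_views(image_views), concat_views(text_views))
```

Because each modality is encoded independently, the concatenated image vectors could be computed once and cached, which is presumably what keeps retrieval fast in a two-stream design.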
# A Review of Data Mining in Personalized Education: Current Trends and Future Prospects ( http://arxiv.org/abs/2402.17236v1 ) License: see link	Zhang Xiong, Haoxuan Li, Zhuang Liu, Zhuofan Chen, Hao Zhou, Wenge Rong, Yuanxin Ouyang	Personalized education, tailored to individual student needs, leverages educational technology and artificial intelligence (AI) in the digital age to enhance learning effectiveness. The integration of AI in educational platforms provides insights into academic performance, learning preferences, and behaviors, optimizing the personal learning process. Driven by data mining techniques, it not only benefits students but also provides educators and institutions with tools to craft customized learning experiences. To offer a comprehensive review of recent advancements in personalized educational data mining, this paper focuses on four primary scenarios: educational recommendation, cognitive diagnosis, knowledge tracing, and learning analysis. This paper presents a structured taxonomy for each area, compiles commonly used datasets, and identifies future research directions, emphasizing the role of data mining in enhancing personalized education and paving the way for future exploration and innovation.	Translated: 2024-02-28 17:32:07 Published: 2024-02-27
# Stochastic Gradient Succeeds for Bandits ( http://arxiv.org/abs/2402.17235v1 ) License: see link	Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans	We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an $O(1/t)$ rate, even with a constant step size. Remarkably, global convergence of the stochastic gradient bandit algorithm has not been previously established, even though it is an old algorithm known to be applicable to bandits. The new result is achieved by establishing two novel technical findings: first, the noise of the stochastic updates in the gradient bandit algorithm satisfies a strong "growth condition" property, where the variance diminishes whenever progress becomes small, implying that additional noise control via diminishing step sizes is unnecessary; second, a form of "weak exploration" is automatically achieved through the stochastic gradient updates, since they prevent the action probabilities from decaying faster than $O(1/t)$, thus ensuring that every action is sampled infinitely often with probability $1$. These two findings can be used to show that the stochastic gradient update is already "sufficient" for bandits in the sense that exploration versus exploitation is automatically balanced in a manner that ensures almost sure convergence to a global optimum. These novel theoretical findings are further verified by experimental results.	Translated: 2024-02-28 17:31:49 Published: 2024-02-27
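For readers unfamiliar with the algorithm the abstract above refers to, the following is a minimal sketch of a softmax gradient bandit run with a constant step size. It follows the classic textbook form of the update (no baseline); whether the paper analyses exactly this variant is an assumption, and the Bernoulli reward model, the function names and the default values of `steps`, `eta` and `seed` are illustrative choices, not taken from the paper.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_bandit(mean_rewards, steps=20000, eta=0.1, seed=0):
    """Softmax gradient bandit with a constant step size and no baseline.

    mean_rewards: hypothetical Bernoulli success probabilities, one per arm.
    Returns the action probabilities after the final update.
    """
    rng = random.Random(seed)
    k = len(mean_rewards)
    theta = [0.0] * k  # one logit per action
    for _ in range(steps):
        pi = softmax(theta)
        # Sample an action from the current policy.
        a = rng.choices(range(k), weights=pi)[0]
        # Observe a noisy reward (Bernoulli here, purely for illustration).
        r = 1.0 if rng.random() < mean_rewards[a] else 0.0
        # Stochastic gradient step on the logits:
        # theta_i += eta * r * (1[i == a] - pi_i)
        for i in range(k):
            indicator = 1.0 if i == a else 0.0
            theta[i] += eta * r * (indicator - pi[i])
    return softmax(theta)
```

Calling `gradient_bandit([0.2, 0.5, 0.8])` returns a probability vector over the three arms. Note that the step size `eta` never decays, which is the regime the abstract singles out, and that an arm's logit only moves when some reward is observed, so the size of the noise is tied to the policy itself.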
# Hybrid Square Neural ODE Causal Modeling ( http://arxiv.org/abs/2402.17233v1 ) License: see link	Bob Junyi Zou, Matthew E. Levine, Dessi P. Zaharieva, Ramesh Johari, Emily B. Fox	Hybrid models combine mechanistic ODE-based dynamics with flexible and expressive neural network components. Such models have grown rapidly in popularity, especially in scientific domains where such ODE-based modeling offers important interpretability and validated causal grounding (e.g., for counterfactual reasoning). The incorporation of mechanistic models also provides inductive bias in standard blackbox modeling approaches, critical when learning from small datasets or partially observed, complex systems. Unfortunately, as hybrid models become more flexible, the causal grounding provided by the mechanistic model can quickly be lost. We address this problem by leveraging another common source of domain knowledge: ranking of treatment effects for a set of interventions, even if the precise treatment effect is unknown. We encode this information in a causal loss that we combine with the standard predictive loss to arrive at a hybrid loss that biases our learning towards causally valid hybrid models. We demonstrate our ability to achieve a win-win (state-of-the-art predictive performance and causal validity) in the challenging task of modeling glucose dynamics during exercise.	Translated: 2024-02-28 17:31:29 Published: 2024-02-27
# Two-scale Neural Networks for Partial Differential Equations with Small Parameters ( http://arxiv.org/abs/2402.17232v1 ) License: see link	Qiao Zhuang, Chris Ziyi Yao, Zhongqiang Zhang, George Em Karniadakis	We propose a two-scale neural network method for solving partial differential equations (PDEs) with small parameters using physics-informed neural networks (PINNs). We directly incorporate the small parameters into the architecture of neural networks. The proposed method enables solving PDEs with small parameters in a simple fashion, without adding Fourier features or other computationally taxing searches of truncation parameters. Various numerical examples demonstrate reasonable accuracy in capturing features of large derivatives in the solutions caused by small parameters.	Translated: 2024-02-28 17:31:11 Published: 2024-02-27
# MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning ( http://arxiv.org/abs/2402.17231v1 ) License: see link	Debrup Das, Debopriyo Banerjee, Somak Aditya, Ashish Kulkarni	Tool-augmented Large Language Models (TALM) are known to enhance the skillset of large language models (LLM), thereby leading to their improved reasoning abilities across many tasks. While TALMs have been successfully employed in different question-answering benchmarks, their efficacy on complex mathematical reasoning benchmarks, and the potential complementary benefits offered by tools for knowledge retrieval and mathematical equation solving, are open research questions. In this work, we present MATHSENSEI, a tool-augmented large language model for mathematical reasoning. Augmented with tools for knowledge retrieval (Bing Web Search), program execution (Python), and symbolic equation solving (Wolfram-Alpha), we study the complementary benefits of these tools through evaluations on mathematical reasoning datasets. We perform exhaustive ablations on MATH, a popular dataset for evaluating mathematical reasoning on diverse mathematical disciplines. We also conduct experiments involving well-known tool planners to study the impact of tool sequencing on the model performance. MATHSENSEI achieves 13.5% better accuracy over gpt-3.5-turbo with chain-of-thought on the MATH dataset. We further observe that TALMs are not as effective for simpler math word problems (in GSM-8k), and the benefit increases as the complexity and required knowledge increase (progressively over AQuA, MMLU-Math, and higher level complex questions in MATH). The code and data are available at https://github.com/Debrup-61/MathSensei.	Translated: 2024-02-28 17:31:01 Published: 2024-02-27
# Preserving Fairness Generalization in Deepfake Detection ( http://arxiv.org/abs/2402.17229v1 ) License: see link	Li Lin, Xinan He, Yan Ju, Xin Wang, Feng Ding, Shu Hu	Although effective deepfake detection models have been developed in recent years, recent studies have revealed that these models can result in unfair performance disparities among demographic groups, such as race and gender. This can lead to particular groups facing unfair targeting or exclusion from detection, potentially allowing misclassified deepfakes to manipulate public opinion and undermine trust in the model. The existing method for addressing this problem is providing a fair loss function. It shows good fairness performance for intra-domain evaluation but does not maintain fairness for cross-domain testing. This highlights the significance of fairness generalization in the fight against deepfakes. In this work, we propose the first method to address the fairness generalization problem in deepfake detection by simultaneously considering features, loss, and optimization aspects. Our method employs disentanglement learning to extract demographic and domain-agnostic forgery features, fusing them to encourage fair learning across a flattened loss landscape. Extensive experiments on prominent deepfake datasets demonstrate our method's effectiveness, surpassing state-of-the-art approaches in preserving fairness during cross-domain deepfake detection. The code is available at https://github.com/Purdue-M2/Fairness-Generalization	Translated: 2024-02-28 17:30:33 Published: 2024-02-27
# Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology ( http://arxiv.org/abs/2402.17228v1 ) License: see link	Wenhao Tang, Fengtao Zhou, Sheng Huang, Xiang Zhu, Yi Zhang, Bo Liu	Multiple instance learning (MIL) is the most widely used framework in computational pathology, encompassing sub-typing, diagnosis, prognosis, and more. However, the existing MIL paradigm typically requires an offline instance feature extractor, such as a pre-trained ResNet or a foundation model. This approach lacks the capability for feature fine-tuning within the specific downstream tasks, limiting its adaptability and performance. To address this issue, we propose a Re-embedded Regional Transformer (R$^2$T) for re-embedding the instance features online, which captures fine-grained local features and establishes connections across different regions. Unlike existing works that focus on pre-training a powerful feature extractor or designing a sophisticated instance aggregator, R$^2$T is tailored to re-embed instance features online. It serves as a portable module that can seamlessly integrate into mainstream MIL models. Extensive experimental results on common computational pathology tasks validate that: 1) feature re-embedding improves the performance of MIL models based on ResNet-50 features to the level of foundation model features, and further enhances the performance of foundation model features; 2) the R$^2$T can introduce more significant performance improvements to various MIL models; 3) R$^2$T-MIL, as an R$^2$T-enhanced AB-MIL, outperforms other latest methods by a large margin. The code is available at https://github.com/DearCaat/RRT-MIL	Translated: 2024-02-28 17:30:12 Published: 2024-02-27
# Quantum Distance Approximation for Persistence Diagrams ( http://arxiv.org/abs/2402.17295v1 ) License: see link	Bernardo Ameneyro, Rebekah Herrman, George Siopsis, Vasileios Maroulas	Topological Data Analysis methods can be useful for classification and clustering tasks in many different fields as they can provide two-dimensional persistence diagrams that summarize important information about the shape of potentially complex and high-dimensional data sets. The space of persistence diagrams can be endowed with various metrics, such as the Wasserstein distance, which admit a statistical structure and allow these summaries to be used in machine learning algorithms. However, computing the distance between two persistence diagrams involves finding an optimal way to match the points of the two diagrams and may not always be an easy task for classical computers. In this work we explore the potential of quantum computers to estimate the distance between persistence diagrams. In particular, we propose variational quantum algorithms for the Wasserstein distance as well as the $d^{c}_{p}$ distance. Our implementation is a weighted version of the Quantum Approximate Optimization Algorithm that relies on control clauses to encode the constraints of the optimization problem.	Translated: 2024-02-28 17:25:14 Published: 2024-02-27
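To make the matching problem in the abstract above concrete, here is a small classical sketch of the $p$-Wasserstein distance between two persistence diagrams, computed by brute force over all matchings. It is not the paper's quantum algorithm and does not cover the $d^{c}_{p}$ distance. The L-infinity ground metric, the function names, and the standard trick of padding each diagram with "diagonal" slots are assumptions made for illustration. The factorial cost restricts it to a handful of points, which is exactly the bottleneck that motivates faster approaches.

```python
import itertools


def dist_to_diagonal(point):
    """L-infinity distance from a (birth, death) point to the diagonal."""
    birth, death = point
    return (death - birth) / 2.0


def linf(p, q):
    """L-infinity distance between two (birth, death) points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def wasserstein(diagram_x, diagram_y, p=2):
    """Brute-force p-Wasserstein distance between two small diagrams.

    Rows are the points of X followed by len(Y) diagonal slots; columns
    are the points of Y followed by len(X) diagonal slots, so every point
    may be matched either to a point of the other diagram or to the diagonal.
    """
    n, m = len(diagram_x), len(diagram_y)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                cost[i][j] = linf(diagram_x[i], diagram_y[j]) ** p
            elif i < n:
                cost[i][j] = dist_to_diagonal(diagram_x[i]) ** p
            elif j < m:
                cost[i][j] = dist_to_diagonal(diagram_y[j]) ** p
            # Diagonal slot matched to diagonal slot costs nothing.
    best = min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in itertools.permutations(range(size))
    )
    return best ** (1.0 / p)
```

For example, with `p=1` the diagrams `[(0, 4)]` and `[(0, 3)]` are best matched point to point (cost 1) rather than both being sent to the diagonal (cost 2 + 1.5).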
# DivAvatar: Diverse 3D Avatar Generation with a Single Prompt ( http://arxiv.org/abs/2402.17292v1 ) License: see link	Weijing Tao, Biwen Lei, Kunhao Liu, Shijian Lu, Miaomiao Cui, Xuansong Xie, Chunyan Miao	Text-to-Avatar generation has recently made significant strides due to advancements in diffusion models. However, most existing work remains constrained by limited diversity, producing avatars with subtle differences in appearance for a given text prompt. We design DivAvatar, a novel framework that generates diverse avatars, empowering 3D creatives with a multitude of distinct and richly varied 3D avatars from a single text prompt. Different from most existing work that exploits scene-specific 3D representations such as NeRF, DivAvatar finetunes a 3D generative model (i.e., EVA3D), allowing diverse avatar generation from simple noise sampling at inference time. DivAvatar has two key designs that help achieve generation diversity and visual quality. The first is a noise sampling technique during the training phase, which is critical in generating diverse appearances. The second is a semantic-aware zoom mechanism and a novel depth loss, the former producing appearances of high textual fidelity by separate fine-tuning of specific body parts and the latter improving geometry quality greatly by smoothing the generated mesh in the feature space. Extensive experiments show that DivAvatar is highly versatile in generating avatars of diverse appearances.	Translated: 2024-02-28 17:24:55 Published: 2024-02-27
# Dielectric Loss due to Charged-Defect Acoustic Phonon Emission ( http://arxiv.org/abs/2402.17291v1 ) License: see link	Mark E. Turiansky, Chris G. Van de Walle	The coherence times of state-of-the-art superconducting qubits are limited by bulk dielectric loss, yet the microscopic mechanism leading to this loss is unclear. Here we propose that the experimentally observed loss can be attributed to the presence of charged defects that enable the absorption of electromagnetic radiation by the emission of acoustic phonons. Our explicit derivation of the absorption coefficient for this mechanism allows us to derive a loss tangent of $7.2 \times 10^{-9}$ for Al$_2$O$_3$, in good agreement with recent high-precision measurements [A. P. Read et al., Phys. Rev. Appl. 19, 034064 (2023)]. We also find that for temperatures well below ~0.2 K, the loss should be independent of temperature, also in agreement with observations. Our investigations show that the loss per defect depends mainly on properties of the host material, and a high-throughput search suggests that diamond, cubic BN, AlN, and SiC are optimal in this respect.	Translated: 2024-02-28 17:24:31 Published: 2024-02-27
# Active propulsion noise shaping for multi-rotor aircraft localization ( http://arxiv.org/abs/2402.17289v1 ) License: see link	Serussi Gabriele, Shor Tamir, Hirshberg Tom, Baskin Chaim, Bronstein Alex	Multi-rotor aerial autonomous vehicles (MAVs) primarily rely on vision for navigation purposes. However, visual localization and odometry techniques suffer from poor performance in low or direct sunlight, a limited field of view, and vulnerability to occlusions. Acoustic sensing can serve as a complementary or even alternative modality for vision in many situations, and it also has the added benefits of lower system cost and energy footprint, which is especially important for micro aircraft. This paper proposes actively controlling and shaping the aircraft propulsion noise generated by the rotors to benefit localization tasks, rather than considering it a harmful nuisance. We present a neural network architecture for self-noise-based localization in a known environment. We show that training it simultaneously with learning time-varying rotor phase modulation achieves accurate and robust localization. The proposed methods are evaluated using a computationally affordable simulation of MAV rotor noise in 2D acoustic environments that is fitted to real recordings of rotor pressure fields.	Translated: 2024-02-28 17:24:09 Published: 2024-02-27
# An Interpretable Evaluation of Entropy-based Novelty of Generative Models ( http://arxiv.org/abs/2402.17287v1 ) License: see link	Jingwei Zhang, Cheuk Ting Li, Farzan Farnia	The massive developments of generative model frameworks and architectures require principled methods for the evaluation of a model's novelty compared to a reference dataset or baseline generative models. While the recent literature has extensively studied the evaluation of the quality, diversity, and generalizability of generative models, the assessment of a model's novelty compared to a baseline model has not been adequately studied in the machine learning community. In this work, we focus on the novelty assessment under multi-modal generative models and attempt to answer the following question: Given the samples of a generative model $\mathcal{G}$ and a reference dataset $\mathcal{S}$, how can we discover and count the modes expressed by $\mathcal{G}$ more frequently than in $\mathcal{S}$? We introduce a spectral approach to the described task and propose the Kernel-based Entropic Novelty (KEN) score to quantify the mode-based novelty of distribution $P_\mathcal{G}$ with respect to distribution $P_\mathcal{S}$. We analytically interpret the behavior of the KEN score under mixture distributions with sub-Gaussian components. Next, we develop a method based on Cholesky decomposition to compute the KEN score from observed samples. We support the KEN-based quantification of novelty by presenting several numerical results on synthetic and real image distributions. Our numerical results indicate the success of the proposed approach in detecting the novel modes and the comparison of state-of-the-art generative models.	Translated: 2024-02-28 17:23:49 Published: 2024-02-27
# Enhancing Hyperspectral Images via Diffusion Model and Group-Autoencoder Super-resolution Network ( http://arxiv.org/abs/2402.17285v1 ) License: see link	Zhaoyang Wang, Dongyang Li, Mingyang Zhang, Hao Luo, Maoguo Gong	Existing hyperspectral image (HSI) super-resolution (SR) methods struggle to effectively capture the complex spectral-spatial relationships and low-level details, while diffusion models represent a promising generative model known for their exceptional performance in modeling complex relations and learning high and low-level visual features. The direct application of diffusion models to HSI SR is hampered by challenges such as difficulties in model convergence and protracted inference time. In this work, we introduce a novel Group-Autoencoder (GAE) framework that synergistically combines with the diffusion model to construct a highly effective HSI SR model (DMGASR). Our proposed GAE framework encodes high-dimensional HSI data into low-dimensional latent space where the diffusion model works, thereby alleviating the difficulty of training the diffusion model while maintaining band correlation and considerably reducing inference time. Experimental results on both natural and remote sensing hyperspectral datasets demonstrate that the proposed method is superior to other state-of-the-art methods both visually and metrically.	Translated: 2024-02-28 17:23:27 Published: 2024-02-27
# Eigenstate switching of topologically ordered states using non-Hermitian perturbations ( http://arxiv.org/abs/2402.17280v1 ) License: see link	Cheol Hun Yeom, Beom Hyun Kim, Moon Jip Park	Topologically ordered phases have degenerate ground states that are robust against local perturbations, providing a promising platform for fault-tolerant quantum computation. Despite the non-local nature of topological order, we find that local non-Hermitian perturbations can induce transitions between the topologically ordered ground states. In this work, we study the toric code in the presence of non-Hermitian perturbations. By controlling the non-Hermiticity, we show that non-orthogonal ground states can exhibit an eigenstate coalescence and have a spectral singularity, known as an exceptional point (EP). We explore the potential of EPs in the control of topological order. Adiabatically encircling EPs allows for the controlled switching of eigenstates, enabling dynamic manipulation among the degenerate ground states. Interestingly, we show a property of our scheme that arbitrary strengths of local perturbations can induce the EP and eigenstate switching. Finally, we also show the orientation-dependent behavior of non-adiabatic transitions (NAT) during the dynamic encirclement around an EP. Our work shows that control of the non-Hermiticity can serve as a promising strategy for fault-tolerant quantum information processing.	Translated: 2024-02-28 17:23:05 Published: 2024-02-27
# One-Shot Structure-Aware Stylized Image Synthesis ( http://arxiv.org/abs/2402.17275v1 ) License: see link	Hansam Cho, Jonghyun Lee, Seunggyu Chang, Yonghyun Jeong	While GAN-based models have been successful in image stylization tasks, they often struggle with structure preservation while stylizing a wide range of input images. Recently, diffusion models have been adopted for image stylization but still lack the capability to maintain the original quality of input images. Building on this, we propose OSASIS: a novel one-shot stylization method that is robust in structure preservation. We show that OSASIS is able to effectively disentangle the semantics from the structure of an image, allowing it to control the level of content and style implemented to a given input. We apply OSASIS to various experimental settings, including stylization with out-of-domain reference images and stylization with text-driven manipulation. Results show that OSASIS outperforms other stylization methods, especially for input images that were rarely encountered during training, providing a promising solution to stylization via diffusion models.	Translated: 2024-02-28 17:22:45 Published: 2024-02-27
# Multi-Agent, Human-Agent and Beyond: A Survey on Cooperation in Social Dilemmas ( http://arxiv.org/abs/2402.17270v1 ) License: see link	Hao Guo, Chunjiang Mu, Yang Chen, Chen Shen, Shuyue Hu, Zhen Wang	The study of cooperation within social dilemmas has long been a fundamental topic across various disciplines, including computer science and social science. Recent advancements in Artificial Intelligence (AI) have significantly reshaped this field, offering fresh insights into understanding and enhancing cooperation. This survey examines three key areas at the intersection of AI and cooperation in social dilemmas. First, focusing on multi-agent cooperation, we review the intrinsic and external motivations that support cooperation among rational agents, and the methods employed to develop effective strategies against diverse opponents. Second, looking into human-agent cooperation, we discuss the current AI algorithms for cooperating with humans and the human biases towards AI agents. Third, we review the emergent field of leveraging AI agents to enhance cooperation among humans. We conclude by discussing future research avenues, such as using large language models, establishing unified theoretical frameworks, revisiting existing theories of human cooperation, and exploring multiple real-world applications.	Translated: 2024-02-28 17:22:28 Published: 2024-02-27
# Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition ( http://arxiv.org/abs/2402.17269v1 ) License: see link	Cam-Van Thi Nguyen, Cao-Bach Nguyen, Quang-Thuy Ha, Duc-Trong Le	Emotion recognition in conversation (ERC) is a crucial task in natural language processing and affective computing. This paper proposes MultiDAG+CL, a novel approach for multimodal ERC that employs a Directed Acyclic Graph (DAG) to integrate textual, acoustic, and visual features within a unified framework. The model is enhanced by Curriculum Learning (CL) to address challenges related to emotional shifts and data imbalance. Curriculum learning facilitates the learning process by gradually presenting training samples in a meaningful order, thereby improving the model's performance in handling emotional variations and data imbalance. Experimental results on the IEMOCAP and MELD datasets demonstrate that the MultiDAG+CL models outperform baseline models.	Translated: 2024-02-28 17:22:11 Published: 2024-02-27
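The idea of "gradually presenting training samples in a meaningful order" from the abstract above can be illustrated with a short sketch. Everything specific here is an assumption for illustration: the difficulty measure (the number of emotion shifts in a conversation), the `"emotions"` key, and the function names are hypothetical and are not taken from the MultiDAG+CL paper.

```python
def count_emotion_shifts(labels):
    """Number of positions where the emotion label differs from the previous one."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def curriculum_stages(conversations, num_stages=3):
    """Split conversations into cumulative training stages, easiest first.

    Each conversation is a dict with an "emotions" list (hypothetical
    schema). Stage s contains the easiest s/num_stages share of the data,
    so later stages add harder samples to those already seen.
    """
    ordered = sorted(
        conversations, key=lambda c: count_emotion_shifts(c["emotions"])
    )
    stages = []
    for s in range(1, num_stages + 1):
        # Integer ceiling of len(ordered) * s / num_stages.
        cutoff = (len(ordered) * s + num_stages - 1) // num_stages
        stages.append(ordered[:cutoff])
    return stages
```

A training loop would then run one or more epochs on each stage in turn. Making the stages cumulative, rather than disjoint, is one common design choice that keeps easy examples in play while harder ones are introduced.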
# 融合型位置認識のための明示的相互作用 Explicit Interaction for Fusion-Based Place Recognition ( http://arxiv.org/abs/2402.17264v1 ) ライセンス: Link先を確認	Jingyi Xu, Junyi Ma, Qi Wu, Zijie Zhou, Yue Wang, Xieyuanli Chen, and Ling Pei	(参考訳) フュージョンベースの位置認識は、マルチモーダルな知覚データを利用して、ロボットや自動運転車のGPSが利用できないシナリオでこれまで訪れた場所を認識する新しい技術である。近年の融合型位置認識法は, 暗黙的に多モード特徴を組み合わせている。顕著な結果が得られたが、融合系において個々のモダリティが与える価値を明示的に考慮していない。したがって、マルチモーダルな特徴融合の利点を十分に探求することはできない。本稿では,2つのモードの明示的な相互作用を実現するために,EINetと呼ばれる新しい融合型ネットワークを提案する。 EINetはLiDARレンジを使用して長期にわたってより堅牢な視覚機能を監視し、同時にカメラRGBデータを使用してLiDARポイントクラウドの識別を改善する。さらに, nuScenesデータセットに基づく位置認識タスクのための新しいベンチマークを開発する。このベンチマークを総合的な比較で確立するために,評価プロトコルとともに教師付きおよび自己監督型のトレーニングスキームを導入する。提案するベンチマークを広範囲に実験し,実験結果から,最先端の融合型位置認識手法と比較して,EINetの認識性能が向上し,高い一般化性が得られた。私たちのオープンソースコードとベンチマークは、https://github.com/BIT-XJY/EINet で公開されています。 Fusion-based place recognition is an emerging technique jointly utilizing multi-modal perception data, to recognize previously visited places in GPS-denied scenarios for robots and autonomous vehicles. Recent fusion-based place recognition methods combine multi-modal features in implicit manners. While achieving remarkable results, they do not explicitly consider what the individual modality affords in the fusion system. Therefore, the benefit of multi-modal feature fusion may not be fully explored. In this paper, we propose a novel fusion-based network, dubbed EINet, to achieve explicit interaction of the two modalities. EINet uses LiDAR ranges to supervise more robust vision features for long time spans, and simultaneously uses camera RGB data to improve the discrimination of LiDAR point clouds. In addition, we develop a new benchmark for the place recognition task based on the nuScenes dataset. To establish this benchmark for future research with comprehensive comparisons, we introduce both supervised and self-supervised training schemes alongside evaluation protocols. 
We conduct extensive experiments on the proposed benchmark, and the experimental results show that our EINet exhibits better recognition performance as well as solid generalization ability compared to the state-of-the-art fusion-based place recognition approaches. Our open-source code and benchmark are released at: https://github.com/BIT-XJY/EINet.	翻訳日:2024-02-28 17:21:58 公開日:2024-02-27
# パラメータ効率の良い微調整のためのミニセンブル低ランクアダプタ Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2402.17263v1 ) ライセンス: Link先を確認	Pengjie Ren, Chengshun Shi, Shiguang Wu, Mengqi Zhang, Zhaochun Ren, Maarten de Rijke, Zhumin Chen, Jiahuan Pei	(参考訳) パラメータ効率細調整(PEFT)は、特にモデルのスケールやタスクの多様性が増大するにつれて、訓練済みの大規模言語モデル(LLM)を調整するための一般的な手法である。低ランク適応(LoRA)は、適応過程が本質的に低次元である、すなわち重要なモデル変化を比較的少数のパラメータで表すことができるという考えに基づいている。しかし、フルパラメータの微調整と比較した場合、ランクの低下は特定のタスクの一般化エラーと遭遇する。我々は,より高いランクを維持しながらトレーニング可能なパラメータを少なくし,性能を向上するミニアンサンブル低ランクアダプタMELoRAを提案する。基本的なアイデアは、トレーニング済みのオリジナルのウェイトを凍結し、少数のパラメータしか持たないミニロラスのグループをトレーニングすることだ。これはミニロラスのかなりの多様性を捉え、より優れた一般化能力を促進することができる。種々のNLPタスクに関する理論的解析と実証的研究を行う。実験の結果, MELoRA は LoRA と比較して,自然言語理解タスクの8倍のトレーニングパラメータ,36倍のトレーニングパラメータで性能が向上し,MELoRA の有効性が示された。 Parameter-efficient fine-tuning (PEFT) is a popular method for tailoring pre-trained large language models (LLMs), especially as the models' scale and the diversity of tasks increase. Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional, i.e., significant model changes can be represented with relatively few parameters. However, decreasing the rank encounters challenges with generalization errors for specific tasks when compared to full-parameter fine-tuning. We present MELoRA, a mini-ensemble low-rank adapters that uses fewer trainable parameters while maintaining a higher rank, thereby offering improved performance potential. The core idea is to freeze original pretrained weights and train a group of mini LoRAs with only a small number of parameters. This can capture a significant degree of diversity among mini LoRAs, thus promoting better generalization ability. We conduct a theoretical analysis and empirical studies on various NLP tasks. 
Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction following tasks, which demonstrates the effectiveness of MELoRA.	翻訳日:2024-02-28 17:21:36 公開日:2024-02-27
# ターンアウト:多ターン対話における大規模言語モデルの安全性脆弱性 Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue ( http://arxiv.org/abs/2402.17262v1 ) ライセンス: Link先を確認	Zhenhong Zhou, Jiuyang Xiang, Haopeng Chen, Quan Liu, Zherui Li, Sen Su	(参考訳) 大規模言語モデル(LLM)は、特に「ジェイルブレイク」を受ける場合、違法または非倫理的な応答を生成することが示されている。脱獄の研究はLLMの安全性の問題を浮き彫りにした。しかし、従来の研究では、LLMから人間が情報を引き出す重要なモードであるマルチターン対話によって生じる潜在的な複雑さやリスクを無視して、シングルターン対話に主に焦点を合わせてきた。本稿では,人間が多ターン対話を利用してLSMを誘導し,有害な情報を生成することを論じる。 LLMは、マルチターン対話において、各ターンが悪意ある1つの目的のために密に提供されたとしても、警告やバウンダリのアンセーフクエリを拒否する意図はない。そこで,マルチターン対話のために,安全でないクエリを複数のサブクエリに分解することで,LSMに有害なサブクエリに対する回答を徐々に誘導し,全体として有害な応答を導いた。本実験は多方向対話におけるLLMの安全性メカニズムの問題点を示唆するものである。本研究は,マルチターン対話を伴う複雑なシナリオにおいて,LLMの脆弱性を明らかにし,LLMの安全性に関する新たな課題を提示する。 Large Language Models (LLMs) have been demonstrated to generate illegal or unethical responses, particularly when subjected to "jailbreak." Research on jailbreak has highlighted the safety issues of LLMs. However, prior studies have predominantly focused on single-turn dialogue, ignoring the potential complexities and risks presented by multi-turn dialogue, a crucial mode through which humans derive information from LLMs. In this paper, we argue that humans could exploit multi-turn dialogue to induce LLMs into generating harmful information. LLMs may not intend to reject cautionary or borderline unsafe queries, even if each turn is closely served for one malicious purpose in a multi-turn dialogue. Therefore, by decomposing an unsafe query into several sub-queries for multi-turn dialogue, we induced LLMs to answer harmful sub-questions incrementally, culminating in an overall harmful response. Our experiments, conducted across a wide range of LLMs, indicate current inadequacies in the safety mechanisms of LLMs in multi-turn dialogue. Our findings expose vulnerabilities of LLMs in complex scenarios involving multi-turn dialogue, presenting new challenges for the safety of LLMs.	翻訳日:2024-02-28 17:21:14 公開日:2024-02-27
# 無限次元における確率近似 Stochastic approximation in infinite dimensions ( http://arxiv.org/abs/2402.17258v1 ) ライセンス: Link先を確認	Rajeeva Laxman Karandikar, Bhamidi V Rao	(参考訳) 確率近似(Stochastic Approximation、SA)は1950年代初頭に導入され、数十年にわたって研究の活発な領域であった。初期の焦点は統計的な問題であったが、信号処理や凸最適化に応用されていた。近年では強化学習(RL)に応用され、SAに対する関心が復活している。文献の大部分は、観測が有限次元ユークリッド空間からのものである場合のSAを扱っているが、同じものを無限次元に拡張することに興味がある。ヒルベルト空間への拡張は比較的容易に行うことができるが、バナッハ空間を考えるとそうではない。バナッハ空間では大数の法則でさえ一般には成り立たないからである。近似がバナッハ空間で作用するいくつかの場合を考える。我々のフレームワークは、バナッハ空間 $\Bb$ が $\Cb([0,1],\R^d)$ である場合と、$\L^1([0,1],\R^d)$ である場合を含み、ラドン-ニコディムの性質さえ持たない2つの場合を含む。 Stochastic Approximation (SA) was introduced in the early 1950s and has been an active area of research for several decades. While the initial focus was on statistical questions, it was seen to have applications to signal processing and convex optimisation. In later years SA has found application in Reinforcement Learning (RL), which led to a revival of interest. While the bulk of the literature is on SA for the case when the observations are from a finite-dimensional Euclidean space, there has been interest in extending the same to infinite dimensions. Extension to Hilbert spaces is relatively easy, but this is not so for a Banach space, since in a Banach space even the law of large numbers does not hold in general. We consider some cases where approximation works in a Banach space. Our framework includes the case when the Banach space $\Bb$ is $\Cb([0,1],\R^d)$, as well as $\L^1([0,1],\R^d)$, two cases which do not even have the Radon-Nikodym property.	翻訳日:2024-02-28 17:20:55 公開日:2024-02-27
# RIME:雑音を考慮したロバスト推論に基づく強化学習 RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences ( http://arxiv.org/abs/2402.17257v1 ) ライセンス: Link先を確認	Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang	(参考訳) 嗜好に基づく強化学習(PbRL)は、報酬信号として人間の嗜好を活用することにより、報酬工学の必要性を回避する。しかし、現在のPbRLアルゴリズムは、ドメインエキスパートからの高品質なフィードバックに過度に依存しているため、堅牢性が欠如している。本稿では,雑音の選好から効果的な報酬学習のための頑健なPbRLアルゴリズムであるRIMEを提案する。提案手法は,ロバストトレーニングのための選別選好を動的にフィルタするために,サンプル選択に基づく判別器を組み込んだ。誤選択による累積誤差を軽減するため,PbRLにおける事前トレーニングからオンライントレーニングへの移行時のパフォーマンスギャップを橋渡しし,報酬モデルのウォームスタートを提案する。ロボット操作とロコモーションタスクに関する実験により,現在のPbRL法のロバスト性が大幅に向上することを示した。アブレーション研究は、限られたフィードバックの場合の堅牢性とフィードバック効率の両方にウォームスタートが不可欠であることを示した。 Preference-based Reinforcement Learning (PbRL) avoids the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL algorithms rely excessively on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method incorporates a sample selection-based discriminator to dynamically filter denoised preferences for robust training. To mitigate the accumulated error caused by incorrect selection, we propose to warm start the reward model, which additionally bridges the performance gap during transition from pre-training to online training in PbRL. Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the current state-of-the-art PbRL method. Ablation studies further demonstrate that the warm start is crucial for both robustness and feedback-efficiency in limited-feedback cases.	翻訳日:2024-02-28 17:20:29 公開日:2024-02-27
# SDDGR: クラスインクリメンタルオブジェクト検出のための安定拡散に基づくDeep Generative Replay SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection ( http://arxiv.org/abs/2402.17323v1 ) ライセンス: Link先を確認	Junsu Kim, Hoseong Cho, Jihyeon Kim, Yihalem Yimolal Tiruneh, Seungryul Baek	(参考訳) クラスインクリメンタルラーニング(CIL)の分野では、生成的リプレイは、生成モデルにおける連続的な改善とともに、破滅的な忘れを緩和する方法として、ますます注目されている。しかし、クラスインクリメンタルオブジェクト検出(CIOD)におけるその応用は、主に複数のラベルを含むシーンの複雑さのため、著しく制限されている。本稿では,CIODのための安定拡散深層生成リプレイ(SDDGR)と呼ばれる新しい手法を提案する。本手法は,事前学習したtext-to-diffusionネットワークを用いた拡散ベースの生成モデルを用いて,現実的で多様な合成画像を生成する。 SDDGRは、古いクラスを含む高品質な画像を作成するための反復的な改善戦略を取り入れている。さらに,合成画像における事前知識の保持を改善するために,L2知識蒸留手法を採用する。さらに,新しいタスクイメージ内の古いオブジェクトを擬似ラベル化することで,背景要素の誤分類を防止する。 COCO 2017データセットに関する大規模な実験では、SDDGRが既存のアルゴリズムを著しく上回り、さまざまなCIODシナリオで新たな最先端を実現することが示されている。ソースコードは一般公開される予定だ。 In the field of class incremental learning (CIL), generative replay has become increasingly prominent as a method to mitigate catastrophic forgetting, alongside the continuous improvements in generative models. However, its application in class incremental object detection (CIOD) has been significantly limited, primarily due to the complexities of scenes involving multiple labels. In this paper, we propose a novel approach called stable diffusion deep generative replay (SDDGR) for CIOD. Our method utilizes a diffusion-based generative model with pre-trained text-to-diffusion networks to generate realistic and diverse synthetic images. SDDGR incorporates an iterative refinement strategy to produce high-quality images encompassing old classes. Additionally, we adopt an L2 knowledge distillation technique to improve the retention of prior knowledge in synthetic images. Furthermore, our approach includes pseudo-labeling for old objects within new task images, preventing misclassification as background elements. 
Extensive experiments on the COCO 2017 dataset demonstrate that SDDGR significantly outperforms existing algorithms, achieving a new state-of-the-art in various CIOD scenarios. The source code will be made available to the public.	翻訳日:2024-02-28 17:16:41 公開日:2024-02-27
# 集中的視覚予測のためのバニラマルチタスクフレームワーク - 第1回vclチャレンジ -- マルチタスクロバスト性トラック A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track ( http://arxiv.org/abs/2402.17319v1 ) ライセンス: Link先を確認	Zehui Chen, Qiuchen Wang, Zhenyu Li, Jiaming Liu, Shanghang Zhang, Feng Zhao	(参考訳) 本稿では,ICCV 2023 Workshopにおいて,第1回視覚連続学習(VCL)チャレンジのマルチタスクロバスト性トラックに対するソリューションを提案する。様々な視覚知覚アルゴリズムをマルチタスクモデルにシームレスに組み合わせた,uninetというバニラフレームワークを提案する。具体的には,DreTR3D,Mask2Former,BinsFormerの3次元オブジェクト検出,インスタンス分割,深さ推定タスクを選択する。最終的な提出は、InternImage-Lのバックボーンを持つシングルモデルで、ShiFT検証セットで49.6スコア(29.5デットmAP、80.3mTPS、46.4セグマAP、7.93シログ)を達成した。また, 密集した視覚予測におけるマルチタスク学習の開発を促進するため, 実験で興味深い観察を行った。 In this report, we present our solution to the multi-task robustness track of the 1st Visual Continual Learning (VCL) Challenge at ICCV 2023 Workshop. We propose a vanilla framework named UniNet that seamlessly combines various visual perception algorithms into a multi-task model. Specifically, we choose DETR3D, Mask2Former, and BinsFormer for 3D object detection, instance segmentation, and depth estimation tasks, respectively. The final submission is a single model with InternImage-L backbone, and achieves a 49.6 overall score (29.5 Det mAP, 80.3 mTPS, 46.4 Seg mAP, and 7.93 silog) on SHIFT validation set. Besides, we provide some interesting observations in our experiments which may facilitate the development of multi-task learning in dense visual prediction.	翻訳日:2024-02-28 17:16:17 公開日:2024-02-27
# Augmented Auxiliary Networksによるローカル学習のスケーリング Scaling Supervised Local Learning with Augmented Auxiliary Networks ( http://arxiv.org/abs/2402.17318v1 ) ライセンス: Link先を確認	Chenxiang Ma, Jibin Wu, Chenyang Si, Kay Chen Tan	(参考訳) ディープニューラルネットワークは一般的に、生物学的に理解できないだけでなく、更新ロックの問題に悩まされ、膨大なメモリ消費を必要とするグローバルエラー信号を使用して訓練される。各レイヤを独立して勾配分離した補助ネットワークで更新するローカル学習は、上記の問題に対処するための有望な代替手段を提供する。しかし,既存のローカル学習手法は,特に大規模ネットワークにおいてBPとの大きな精度差に直面している。これは、レイヤ間の勾配通信がないため、ローカル層とその後のネットワーク層の間の弱い結合に起因する。この問題に対処するため,AugLocalと呼ばれる拡張ローカル学習手法を提案する。 AugLocalは、後続のネットワーク層から小さなレイヤのサブセットを均一に選択することで、各隠されたレイヤの補助ネットワークを構築する。また,隠れた層が深くなるにつれて補助ネットワークの深さを線形に削減し,補助ネットワークの計算コストを低減し,十分なネットワーク容量を確保することを提案する。 4つの画像分類データセット(CIFAR-10、SVHN、STL-10、ImageNet)に関する広範な実験により、AugLocalは、BPトレーニングネットワークに匹敵する精度で、数十のローカルレイヤに効果的にスケールアップでき、GPUメモリ使用量を約40%削減できることを示した。したがって、提案されたauglocalメソッドは、リソース制約のあるプラットフォーム上でハイパフォーマンスなディープニューラルネットワークをトレーニングする無数の機会を開く。 Deep neural networks are typically trained using global error signals that backpropagate (BP) end-to-end, which is not only biologically implausible but also suffers from the update locking problem and requires huge memory consumption. Local learning, which updates each layer independently with a gradient-isolated auxiliary network, offers a promising alternative to address the above problems. However, existing local learning methods are confronted with a large accuracy gap with the BP counterpart, particularly for large-scale networks. This is due to the weak coupling between local layers and their subsequent network layers, as there is no gradient communication across layers. To tackle this issue, we put forward an augmented local learning method, dubbed AugLocal. AugLocal constructs each hidden layer's auxiliary network by uniformly selecting a small subset of layers from its subsequent network layers to enhance their synergy. 
We also propose to linearly reduce the depth of auxiliary networks as the hidden layer goes deeper, ensuring sufficient network capacity while reducing the computational cost of auxiliary networks. Our extensive experiments on four image classification datasets (i.e., CIFAR-10, SVHN, STL-10, and ImageNet) demonstrate that AugLocal can effectively scale up to tens of local layers with a comparable accuracy to BP-trained networks while reducing GPU memory usage by around 40%. The proposed AugLocal method, therefore, opens up a myriad of opportunities for training high-performance deep neural networks on resource-constrained platforms. Code is available at https://github.com/ChenxiangMA/AugLocal.	翻訳日:2024-02-28 17:15:58 公開日:2024-02-27
# 2023年の成人グリオーマチャレンジをどうやって勝ち取ったか? 偽物だ! 脳腫瘍セグメンテーションのための合成データの強化とモデルアンサンブル How we won BraTS 2023 Adult Glioma challenge? Just faking it! Enhanced Synthetic Data Augmentation and Model Ensemble for brain tumour segmentation ( http://arxiv.org/abs/2402.17317v1 ) ライセンス: Link先を確認	André Ferreira, Naida Solak, Jianning Li, Philipp Dammann, Jens Kleesiek, Victor Alves, Jan Egger	(参考訳) Deep Learningは、脳腫瘍をセグメント化するための最先端技術である。しかし、これは多くの高品質なデータを必要とするため、特に医療分野では入手が困難である。そこで本研究では,データ拡張のための非従来的メカニズムを用いてこの問題に対処する。 BraTS 2023チャレンジの最初のタスクである脳腫瘍セグメンテーションのために、3つの異なるディープラーニングモデルをトレーニングするための利用可能なサンプルの量を大幅に増やすべく、敵対的生成ネットワークとレジストレーションが使用される。最初のモデルは標準のnnU-Net、2番目はSwin UNETR、3番目はBraTS 2021 Challengeの勝利のソリューションである。パイプライン全体は、合成データの生成を除いて、nnU-Net実装に基づいて構築されている。畳み込みアルゴリズムとトランスフォーマーを使用することで、互いの知識ギャップを埋めることができる。新しい評価指標を用いて, 我々の最良の解法は, 検証セットでDiceスコア 0.9005, 0.8673, 0.8509, HD95 14.940, 14.467, 17.699(全腫瘍, 腫瘍コア, 造影腫瘍)を達成する。 Deep Learning is the state-of-the-art technology for segmenting brain tumours. However, this requires a lot of high-quality data, which is difficult to obtain, especially in the medical field. Therefore, our solutions address this problem by using unconventional mechanisms for data augmentation. Generative adversarial networks and registration are used to massively increase the amount of available samples for training three different deep learning models for brain tumour segmentation, the first task of the BraTS 2023 challenge. The first model is the standard nnU-Net, the second is the Swin UNETR and the third is the winning solution of the BraTS 2021 Challenge. The entire pipeline is built on the nnU-Net implementation, except for the generation of the synthetic data. The use of convolutional algorithms and transformers is able to fill each other's knowledge gaps. Using the new metric, our best solution achieves Dice scores of 0.9005, 0.8673, 0.8509 and HD95 values of 14.940, 14.467, 17.699 (whole tumour, tumour core and enhancing tumour) in the validation set.	翻訳日:2024-02-28 17:15:27 公開日:2024-02-27
# 選択的エントロピー蒸留によるロバストで効率的な雲縁弾性モデル適応 Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation ( http://arxiv.org/abs/2402.17316v1 ) ライセンス: Link先を確認	Yaofo Chen, Shuaicheng Niu, Shoukai Xu, Hengjie Song, Yaowei Wang, Mingkui Tan	(参考訳) 従来のディープラーニングパラダイムでは、しばしば、サーバー上でディープモデルをトレーニングし、モデルまたは蒸留したモデルをリソース制限エッジデバイスにデプロイする。通常、モデルは、サーバ側とエッジ側の両方に対するモデル適応の潜在的高コストのために、一度(少なくとも一定期間)デプロイしても固定され続けなければならない。しかし、多くの実世界のシナリオでは、テスト環境は動的に変化し(分散シフトと呼ばれる)、しばしば性能が低下する。したがって、エッジモデルに迅速に適応して、有望なパフォーマンスを達成する必要がある。さらに、エッジで収集されるデータの増加に伴い、このパラダイムは、パフォーマンス向上のためにクラウドモデルをさらに適応することができない。これらに対処するために、私たちは2つの大きな課題に遭遇します。 1)エッジモデルは計算能力が限られており,前方伝播のみをサポートすることができる。 2) クラウドとエッジデバイス間のデータ転送予算は遅延に敏感なシナリオで制限される。本稿では,クラウド-エッジ弾性モデル適応(CEMA)パラダイムを構築し,エッジモデルが前方伝播のみを実行し,エッジモデルをオンラインで適用可能にする。 CEMAでは、通信負担を軽減するため、不要なサンプルをクラウドにアップロードすること、すなわち動的で信頼性の低いサンプルを除外することの2つの基準を考案した。アップロードしたサンプルに基づいて,より強力な基礎モデルから試料再生戦略を用いてエッジモデルに蒸留することにより,正規化層のアフィンパラメータを更新,分散する。 ImageNet-C と ImageNet-R の大規模な実験結果により,CEMA の有効性が検証された。 The conventional deep learning paradigm often involves training a deep model on a server and then deploying the model or its distilled ones to resource-limited edge devices. Usually, the models shall remain fixed once deployed (at least for some period) due to the potential high cost of model adaptation for both the server and edge sides. However, in many real-world scenarios, the test environments may change dynamically (known as distribution shifts), which often results in degraded performance. Thus, one has to adapt the edge models promptly to attain promising performance. Moreover, with the increasing data collected at the edge, this paradigm also fails to further adapt the cloud model for better performance. To address these, we encounter two primary challenges: 1) the edge model has limited computation power and may only support forward propagation; 2) the data transmission budget between cloud and edge devices is limited in latency-sensitive scenarios. 
In this paper, we establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation and the edge models can be adapted online. In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud, i.e., dynamic unreliable and low-informative sample exclusion. Based on the uploaded samples, we update and distribute the affine parameters of normalization layers by distilling from the stronger foundation model to the edge model with a sample replay strategy. Extensive experimental results on ImageNet-C and ImageNet-R verify the effectiveness of our CEMA.	翻訳日:2024-02-28 17:15:06 公開日:2024-02-27
# SKT5SciSumm - 多文書科学要約のためのハイブリッド生成手法 SKT5SciSumm - A Hybrid Generative Approach for Multi-Document Scientific Summarization ( http://arxiv.org/abs/2402.17311v1 ) ライセンス: Link先を確認	Huy Quoc To, Hung-Nghiep Tran, André Greiner-Petter, Felix Beierle, Akiko Aizawa	(参考訳) 科学文献の要約は、研究コミュニティと人間社会の両方に多大な利益をもたらしている。科学的テキストの性質が独特であり、多文書要約タスクの入力がかなり長いことを考えると、重要な情報を失うことなく十分な埋め込み生成とテキストトランケーションが必要である。本稿では,多文書科学要約(MDSS)のためのハイブリッドフレームワークであるSKT5SciSummを提案する。Citation-Informed Transformersを用いた科学論文埋め込み(SPECTER)のSentence-Transformer版を用いて,テキスト文の符号化と表現を行い,k-meansクラスタリングを用いた効率的な抽出要約を可能にする。抽出された文を用いて抽象要約を生成するために,T5系モデルを用いる。 SKT5SciSummはMulti-XScienceデータセット上で最先端のパフォーマンスを達成する。より広範な実験と評価を通じて、より単純なモデルを用いて顕著な結果を得ることにより、科学文書の多文書要約の分野を前進させる可能性を明らかにする。 Summarization for scientific text has shown significant benefits both for the research community and human society. Given the fact that the nature of scientific text is distinctive and the input of the multi-document summarization task is substantially long, the task requires sufficient embedding generation and text truncation without losing important information. To tackle these issues, in this paper, we propose SKT5SciSumm - a hybrid framework for multi-document scientific summarization (MDSS). We leverage the Sentence-Transformer version of Scientific Paper Embeddings using Citation-Informed Transformers (SPECTER) to encode and represent textual sentences, allowing for efficient extractive summarization using k-means clustering. We employ the T5 family of models to generate abstractive summaries using extracted sentences. SKT5SciSumm achieves state-of-the-art performance on the Multi-XScience dataset. Through extensive experiments and evaluation, we showcase the benefits of our model by using less complicated models to achieve remarkable results, thereby highlighting its potential in advancing the field of multi-document summarization for scientific text.	翻訳日:2024-02-28 17:14:41 公開日:2024-02-27
# 自動しきい値とラベリングを用いた蛍光ラベルセルの追跡および解析方法 Method of Tracking and Analysis of Fluorescent-Labeled Cells Using Automatic Thresholding and Labeling ( http://arxiv.org/abs/2402.17310v1 ) ライセンス: Link先を確認	Mizuki Fukasawa (1), Tomokazu Fukuda (1), Takuya Akashi (1) ((1) Iwate University)	(参考訳) 細胞画像を用いた高スループットスクリーニングは、医薬品の新しい候補をスクリーニングする効率的な方法である。スクリーニング処理を完了させるためには、細胞画像を分析するための効率的なプロセスが不可欠である。本稿では,細胞を効率的に追跡し,細胞質と核のシグナル比を定量的に検出する新しい方法を提案する。既存の手法には、画像処理技術を使用するものや、人工知能(AI)を利用するものが含まれる。しかし、これらの手法は画像間のセルの対応を考慮せず、AIを訓練するために大量の新しい学習データを必要とする。そこで本手法では,各セルの位置を画像間で比較し,セルの信号比を連続的に測定・解析するために,自動しきい値付けとラベル付けアルゴリズムを用いる。本稿では,本手法のアルゴリズムについて述べる。本手法を用いて,バイナライズ過程における開封操作回数が細胞追跡に及ぼす影響について検討した。実験により,開封プロセスの適切な数を決定することができた。 High-throughput screening using cell images is an efficient method for screening new candidates for pharmaceutical drugs. To complete the screening process, it is essential to have an efficient process for analyzing cell images. This paper presents a new method for efficiently tracking cells and quantitatively detecting the signal ratio between cytoplasm and nuclei. Existing methods include those that use image processing techniques and those that utilize artificial intelligence (AI). However, these methods do not consider the correspondence of cells between images, or require a significant amount of new learning data to train AI. Therefore, our method uses automatic thresholding and labeling algorithms to compare the position of each cell between images, and continuously measure and analyze the signal ratio of cells. This paper describes the algorithm of our method. Using the method, we experimented to investigate the effect of the number of opening and closing operations during the binarization process on the tracking of the cells. Through the experiment, we determined the appropriate number of opening and closing processes.	翻訳日:2024-02-28 17:14:21 公開日:2024-02-27
# キラル分子の完全な量子状態制御 Full quantum state control of chiral molecules ( http://arxiv.org/abs/2402.17308v1 ) ライセンス: Link先を確認	JuHyeon Lee, Elahe Abdiha, Boris G. Sartakov, Gerard Meijer, Sandra Eibenberger-Arias	(参考訳) 選択されたエナンチオマーに対するキラル分子の内部量子状態の制御には、幅広い基礎応用がある。調整されたマイクロ波場を用いて、選択された回転状態は、ラセミ混合物からでも選択されたエナンチオマーに対して濃縮することができる。これにより、特定の状態で異なるエナンチオマーのサンプルを素早く切り替えることができ、例えばキラル分子のパリティ違反を測定するといった大きな可能性を秘めている。完全なエナンチオマー特異的な状態移動を達成することは、これや他の多くのアプリケーションにとって重要な要件である。理論的には実現可能ではあったが、必要な実験条件を達成することは現実的ではなかった。ここでは, 熱人口の限界と回転状態の空間縮退を克服し, ほぼ理想的条件を実現する。以上の結果から,C1対称性のすべてのキラル分子に普遍的に適用可能なアプローチにより,96%の状態特異なエナンチオマー純度をラセミ混合物から得ることができた。 Controlling the internal quantum states of chiral molecules for a selected enantiomer has a wide range of fundamental applications. Using tailored microwave fields, a chosen rotational state can be enriched for a selected enantiomer, even starting from a racemic mixture. This enables rapid switching between samples of different enantiomers in a given state, holding great promise, for instance, for measuring parity violation in chiral molecules. Achieving full enantiomer-specific state transfer is a key requirement for this and many other applications. Although theoretically feasible, achieving the required experimental conditions seemed unrealistic. Here, we realize near-ideal conditions, overcoming both the limitations of thermal population and spatial degeneracy in rotational states. Our results show that 96% state-specific enantiomeric purity can be obtained from a racemic mixture, in an approach that is universally applicable to all chiral molecules of C1 symmetry.	翻訳日:2024-02-28 17:14:07 公開日:2024-02-27
# 健康な脳組織を塗りつぶすための消音拡散モデル Denoising Diffusion Models for Inpainting of Healthy Brain Tissue ( http://arxiv.org/abs/2402.17307v1 ) ライセンス: Link先を確認	Alicia Durrer, Philippe C. Cattin, Julia Wolleb	(参考訳) 本論文は,「脳の局所的な脳組織合成法(brats 2023 local synthesis of healthy brain tissue via inpainting challenge)」への貢献である。この課題の課題は、腫瘍組織を脳磁気共鳴(MR)画像の健康な組織に変換することである。この考え方は、MR画像が自動処理ツールで評価できるという問題に端を発するが、これらのツールの多くは健康組織の分析に最適化されている。与えられた塗装課題を解決することにより、病変を特徴とする画像の自動解析と、さらに下流のタスクを可能にする。本手法は拡散確率モデルに基づく。健康な組織を切り抜いたスライスを用いてトレーニングした2dモデルを用いて、再び塗り替えられることを学習した。これにより、トレーニング中に真実の健全な組織を使用することができます。サンプリング段階では、元の3dボリュームの病組織を含むスライスを、健康な組織を塗るスライスに置き換える。我々のアプローチでは、競合する手法に匹敵する結果を得る。検証セットでは、平均SSIMは0.7804、PSNRは20.3525、MSEは0.0113である。将来、我々は2Dモデルを3Dモデルに拡張し、隣り合うスライスのコンテキスト情報を失うことなく、関心領域全体をインペイントする計画を立てる。 This paper is a contribution to the "BraTS 2023 Local Synthesis of Healthy Brain Tissue via Inpainting Challenge". The task of this challenge is to transform tumor tissue into healthy tissue in brain magnetic resonance (MR) images. This idea originates from the problem that MR images can be evaluated using automatic processing tools, however, many of these tools are optimized for the analysis of healthy tissue. By solving the given inpainting task, we enable the automatic analysis of images featuring lesions, and further downstream tasks. Our approach builds on denoising diffusion probabilistic models. We use a 2D model that is trained using slices in which healthy tissue was cropped out and is learned to be inpainted again. This allows us to use the ground truth healthy tissue during training. In the sampling stage, we replace the slices containing diseased tissue in the original 3D volume with the slices containing the healthy tissue inpainting. With our approach, we achieve comparable results to the competing methods. On the validation set our model achieves a mean SSIM of 0.7804, a PSNR of 20.3525 and a MSE of 0.0113. 
In the future, we plan to extend our 2D model to a 3D model, allowing the region of interest to be inpainted as a whole without losing context information from neighboring slices.	翻訳日:2024-02-28 17:13:50 公開日:2024-02-27
# 第2ラウンド: ソフトウェアエンジニアリングへのさまざまな道 The Second Round: Diverse Paths Towards Software Engineering ( http://arxiv.org/abs/2402.17306v1 ) ライセンス: Link先を確認	Sonja Hyrynsalmi, Ella Peltonen, Fanny Vainionpää, Sami Hyrynsalmi	(参考訳) 現存する文献では、マイノリティがソフトウェア業界に入る動機や動機について議論されている。例えば、大学は、より多様な学生を惹きつけるために、より多様なイメージに投資してきた。しかし,本研究では,なぜ学生が現在の専攻を選ぶのか,そしてソフトウェア工学を専攻することを最初に決めたのかを考察する。私たちはまた、より多くの女性をテック業界に誘い込むためにマーケティングに役立ててくれる兆候があるかどうかを学ぶことにも興味を持っていた。フィンランドのソフトウェア工学の大学生に送ったオンライン調査(N = 78)を通じて、このトピックにアプローチしました。結果から,女性は平均して男性よりも遅い時期にソフトウェア工学の学習課程に出願しており,男女差は統計的に有意であった。さらに、マーケティング行動は性別によって異なる影響があることがわかりました。ライブイベントやプラットフォームにおける個人指導は女性に最も影響を及ぼす一方、教師やソーシャルメディアは男性に大きな影響を与えます。また, 従来のリニア教育経路と成人キャリア変化経路の2つの主要な経路が性別によって大きく異なることを示している。 In the extant literature, there has been discussion on the drivers and motivations of minorities to enter the software industry. For example, universities have invested in more diverse imagery for years to attract a more diverse pool of students. However, in our research, we consider whether we understand why students choose their current major and how they originally decided to apply to study software engineering. We were also interested in learning if there could be some signs that would help us in marketing to get more women into tech. We approached the topic via an online survey (N = 78) sent to the university students of software engineering in Finland. Our results show that, on average, women apply later to software engineering studies than men, with statistically significant differences between genders. Additionally, we found that marketing actions have different impacts based on gender: personal guidance in live events or platforms is most influential for women, whereas teachers and social media have a more significant impact on men. The results also indicate two main paths into the field: the traditional linear educational pathway and the adult career change pathway, each significantly varying by gender.	翻訳日:2024-02-28 17:13:29 公開日:2024-02-27
# グローバル・ローカル意味表現のためのマルチモーダル大言語モデル探索 Probing Multimodal Large Language Models for Global and Local Semantic Representation ( http://arxiv.org/abs/2402.17304v1 ) ライセンス: Link先を確認	Mingxu Tao, Quzhe Huang, Kun Xu, Liwei Chen, Yansong Feng, Dongyan Zhao	(参考訳) 大規模な言語モデルの成功は、研究者にその例外的な表現能力を他のモダリティに移すきっかけとなった。イメージキャプチャアライメントデータセットを活用して、mllm(multimodal large language model)をトレーニングし、画像からテキストへのタスクで最先端のパフォーマンスを実現する。しかし、MLLMが完全な画像情報、すなわちグローバルな情報、あるいはローカルなオブジェクト情報のみをキャプチャできるかどうかを真に理解する研究はほとんどない。本研究では,モデルの中間層がより大域的な意味情報をエンコードできることを示す。さらに、オブジェクト検出タスクを通して局所的な意味表現のモデルを探索する。そして,最上位層が地域情報に過度に集中し,グローバル情報をエンコードする能力が低下する可能性があるという結論を導いた。 The success of large language models has inspired researchers to transfer their exceptional representing ability to other modalities. Several recent works leverage image-caption alignment datasets to train multimodal large language models (MLLMs), which achieve state-of-the-art performance on image-to-text tasks. However, there are very few studies exploring whether MLLMs truly understand the complete image information, i.e., global information, or if they can only capture some local object information. In this study, we find that the intermediate layers of models can encode more global semantic information, whose representation vectors perform better on visual-language entailment tasks, rather than the topmost layers. We further probe models for local semantic representation through object detection tasks. And we draw a conclusion that the topmost layers may excessively focus on local information, leading to a diminished ability to encode global information.	翻訳日:2024-02-28 17:13:11 公開日:2024-02-27
# LLMは文化関連コモンセンスQAデータを生成することができるか? インドネシア語とスンダ語の事例研究 Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese ( http://arxiv.org/abs/2402.17302v1 ) ライセンス: Link先を確認	Rifki Afina Putri, Faiz Ghifari Haznitrama, Dea Adhista, Alice Oh	(参考訳) 大規模言語モデル(LLM)は、モデルのトレーニングと評価のために合成データを生成するためにますます使われている。しかしながら、言語に埋め込まれた知識や文化的ニュアンス、特に低リソース言語を組み込んだ、良質な質問応答(QA)データセットを生成できるかどうかは不明だ。本研究では,インドネシア語とスンダ語における文化関連コモンセンスQAデータセット作成におけるLLMの有効性を検討した。そのために、LLMと人間アノテーションの両方を含む様々な手法を用いて、これらの言語のためのデータセットを作成する。実験の結果,現在最高のLLMであるGPT-4 Turboは,インドネシア語では十分な知識を持つ質問を生成できるが,スンダ語ではそうではなく、中リソース言語と低リソース言語の間の性能差が浮き彫りになった。また、生成されたデータセット上で様々なLLMをベンチマークし、人間によって作成されたデータセットよりもLLM生成データセットにおいて優れた性能を発揮することを発見した。 Large Language Models (LLMs) are increasingly being used to generate synthetic data for training and evaluating models. However, it is unclear whether they can generate a good-quality question answering (QA) dataset that incorporates knowledge and cultural nuance embedded in a language, especially for low-resource languages. In this study, we investigate the effectiveness of using LLMs in generating culturally relevant commonsense QA datasets for Indonesian and Sundanese languages. To do so, we create datasets for these languages using various methods involving both LLMs and human annotators. Our experiments show that the current best-performing LLM, GPT-4 Turbo, is capable of generating questions with adequate knowledge in Indonesian but not in Sundanese, highlighting the performance discrepancy between medium- and lower-resource languages. We also benchmark various LLMs on our generated datasets and find that they perform better on the LLM-generated datasets compared to those created by humans.	翻訳日:2024-02-28 17:12:51 公開日:2024-02-27
# コンパイルされたXORゲームの値に関する計算論的Tsirelsonの定理 A Computational Tsirelson's Theorem for the Value of Compiled XOR Games ( http://arxiv.org/abs/2402.17301v1 ) ライセンス: Link先を確認	David Cui, Giulio Malavolta, Arthur Mehta, Anand Natarajan, Connor Paddock, Simon Schmidt, Michael Walter, Tina Zhang	(参考訳) 非局所ゲームは、複数の空間的に分離された量子デバイスの設定において、エンタングルメントを理解し、量子プロトコルを構築するための基本的なツールである。本研究は、古典的検証者と暗号学的に制限された単一の量子デバイスの間で行われる、コンパイルされた非局所ゲームについてKalaiら(STOC '23)が始めた研究を継続する。我々の主な成果は、Kalaiらによって提案されたコンパイラが、任意の2プレーヤXORゲームに対して健全(sound)であることである。Tsirelsonの有名な定理は、XORゲームでは量子値が半正定値計画(SDP)によって正確に与えられることを示しており、我々はSDPの上界が、コンパイルに起因する無視できる誤差を除いて、コンパイルされたゲームに対しても成り立つことを示すことで結果を得る。これは、CHSHゲームという特定のケースについて健全性を示したNatarajanとZhang (FOCS '23)が提起した問いに答えるものである。我々の手法を用いて,(1)並列繰り返しXORゲームのコンパイル値に対するタイトな上下界,(2)任意のコンパイルされたXORゲームに対する演算子自己検証の命題,(3)演算子剛性が明白となる,任意のXORゲームに対する「nice」な二乗和(sum-of-squares)証明書,といったいくつかの追加結果を得た。 Nonlocal games are a foundational tool for understanding entanglement and constructing quantum protocols in settings with multiple spatially separated quantum devices. In this work, we continue the study initiated by Kalai et al. (STOC '23) of compiled nonlocal games, played between a classical verifier and a single cryptographically limited quantum device. Our main result is that the compiler proposed by Kalai et al. is sound for any two-player XOR game. A celebrated theorem of Tsirelson shows that for XOR games, the quantum value is exactly given by a semidefinite program, and we obtain our result by showing that the SDP upper bound holds for the compiled game up to a negligible error arising from the compilation. This answers a question raised by Natarajan and Zhang (FOCS '23), who showed soundness for the specific case of the CHSH game. Using our techniques, we obtain several additional results, including (1) tight bounds on the compiled value of parallel-repeated XOR games, (2) operator self-testing statements for any compiled XOR game, and (3) a "nice" sum-of-squares certificate for any XOR game, from which operator rigidity is manifest.	翻訳日:2024-02-28 17:12:20 公開日:2024-02-27
# ArcSin: 言語駆動視覚タスクに対する適応的範囲のコサイン類似性注入ノイズ ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks ( http://arxiv.org/abs/2402.17298v1 ) ライセンス: Link先を確認	Yang Liu, Xiaomin Yu, Gongyu Zhang, Christos Bergeles, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin	(参考訳) 本研究では,視覚的質問応答 (VQA) やイメージキャプション (IC) ,ビジュアル・エンターテイメント (VE) など,視覚的タスクに対する言語からの学習と推論の間のモダリティギャップを埋めることの課題に対処する。我々は、これらのタスクのモデルをゼロショットクロスモーダル転送設定でトレーニングする。このドメインでは、以前のstate-of-the-artメソッドは固定されたスケールのノイズインジェクションに依存しており、しばしば元のモダリティ埋め込みの意味的内容に妥協する。そこで本研究では,適応射程コサイン類似性注入ノイズ(ArcSin)と呼ばれる新しい手法を提案する。まず,従来のテキスト特徴の整合性を維持しつつ,より可変性の高いテキスト要素を効果的に生成する適応雑音尺度を提案する。次に、類似性プール戦略を採用し、全体のノイズスケールを広げることで、ドメイン一般化の可能性を広げる。この二重戦略は、コンテンツ整合性を守りながら、元のドメインの範囲を効果的に拡大する。実験結果から,これらのモデルが画像上で訓練されたモデルと性能的に密接に競合していることが判明した。具体的には,S-Cap と M-Cap の 1.9 と 1.1 の CIDEr 点をそれぞれ獲得した。さらに, VQA, VQA-E, VEの精度は1.5パーセンテージ(pp), 1.4pp, 1.4ppの増加を観察し, 画像学習モデルベンチマークの制約内で達成可能な領域の境界を押し上げる。コードはリリースされます。 In this study, we address the challenging task of bridging the modality gap between learning from language and inference for visual tasks, including Visual Question Answering (VQA), Image Captioning (IC) and Visual Entailment (VE). We train models for these tasks in a zero-shot cross-modal transfer setting, a domain where the previous state-of-the-art method relied on the fixed scale noise injection, often compromising the semantic content of the original modality embedding. To combat it, we propose a novel method called Adaptive ranged cosine Similarity injected noise (ArcSin). First, we introduce an innovative adaptive noise scale that effectively generates the textual elements with more variability while preserving the original text feature's integrity. Second, a similarity pool strategy is employed, expanding the domain generalization potential by broadening the overall noise scale. This dual strategy effectively widens the scope of the original domain while safeguarding content integrity. 
Our empirical results demonstrate that these models closely rival those trained on images in terms of performance. Specifically, our method exhibits substantial improvements over the previous state-of-the-art, achieving gains of 1.9 and 1.1 CIDEr points in S-Cap and M-Cap, respectively. Additionally, we observe increases of 1.5 percentage points (pp), 1.4 pp, and 1.4 pp in accuracy for VQA, VQA-E, and VE, respectively, pushing the boundaries of what is achievable within the constraints of image-trained model benchmarks. The code will be released.	翻訳日:2024-02-28 17:11:40 公開日:2024-02-27
# 動的シーンにおける学習露光補正 Learning Exposure Correction in Dynamic Scenes ( http://arxiv.org/abs/2402.17296v1 ) ライセンス: Link先を確認	Jin Liu, Bo Wang, Chuanming Wang, Huiyuan Fu, Huadong Ma	(参考訳) 露光を間違えたビデオの撮影は、通常は不満足な視覚効果をもたらす。画像の露光補正は一般的な話題だが、ビデオは文献ではあまり研究されていない。ビデオ入力に事前のイメージベース手法を直接適用すると、時間的不整合が生じ、視覚的品質が低下する。この領域における既存の研究は、高品質なベンチマークデータセットの欠如によっても制限されている。これらの問題に対処するために、私たちは、過度な露出と過度な露出の両方を含む、最初の実世界のペアビデオデータセットを構築します。空間アライメントを実現するために,2台のデジタル一眼レフカメラとビームスプリッタを用いて不適切な露光映像と通常の露光映像を同時に撮影する。また,2流照明学習機構を組み込んだretinex理論に基づく映像露光補正ネットワーク(vecnet)を提案する。推定多重フレーム反射率とデュアルパス照明成分は特徴レベルと画像レベルの両方で融合し、視覚的に魅力的な結果をもたらす。実験結果から,提案手法は既存の画像の露出補正やビデオ強調手法よりも優れていた。コードとデータセットは近く提供される。 Capturing videos with wrong exposure usually produces unsatisfactory visual effects. While image exposure correction is a popular topic, the video counterpart is less explored in the literature. Directly applying prior image-based methods to input videos often results in temporal incoherence with low visual quality. Existing research in this area is also limited by the lack of high-quality benchmark datasets. To address these issues, we construct the first real-world paired video dataset, including both underexposure and overexposure dynamic scenes. To achieve spatial alignment, we utilize two DSLR cameras and a beam splitter to simultaneously capture improper and normal exposure videos. In addition, we propose a Video Exposure Correction Network (VECNet) based on Retinex theory, which incorporates a two-stream illumination learning mechanism to enhance the overexposure and underexposure factors, respectively. The estimated multi-frame reflectance and dual-path illumination components are fused at both feature and image levels, leading to visually appealing results. Experimental results demonstrate that the proposed method outperforms existing image exposure correction and underexposed video enhancement methods. The code and dataset will be available soon.	翻訳日:2024-02-28 17:10:49 公開日:2024-02-27
# CGGM:IoTネットワークにおけるノード異常検出のための適応間隔付き条件付きグラフ生成モデル CGGM: A conditional graph generation model with adaptive sparsity for node anomaly detection in IoT networks ( http://arxiv.org/abs/2402.17363v1 ) ライセンス: Link先を確認	Xianshi Su, Munan Li, Tongbang Jiang, Hao Long	(参考訳) 動的グラフはiot(internet of things)内のノードの異常な振る舞いを検出するために広く使われている。生成モデルは、動的グラフにおける不均衡ノードカテゴリの問題に対処するためにしばしば用いられる。それでも、それらが直面する制約には、隣接関係の単調性、ノードの多次元機能構築の難しさ、ノードの複数カテゴリのエンドツーエンド生成方法の欠如などがある。本稿では,マイノリティクラスに属するノードを多数生成することを目的として,CGGMと呼ばれる新しいグラフ生成モデルを提案する。適応スパーシティにより隣接行列を生成する機構は、その構造における柔軟性を高める。多次元特徴生成(MFG)と呼ばれる特徴生成モジュールは、位相情報とともにノード特徴を生成する。ラベルは埋め込みベクトルに変換され、複数のカテゴリにわたる合成データの生成を制御するための条件付き制約となる。多段階の損失を用いて合成データの分布を調整し、実データとよく似ている。大規模な実験では、CGGMの合成データが様々な指標で最先端の手法よりも優れていることを示す。その結果,多種多様なデータカテゴリの効率的な生成が可能となり,多種分類モデルの性能が向上した。 Dynamic graphs are extensively employed for detecting anomalous behavior in nodes within the Internet of Things (IoT). Generative models are often used to address the issue of imbalanced node categories in dynamic graphs. Nevertheless, the constraints it faces include the monotonicity of adjacency relationships, the difficulty in constructing multi-dimensional features for nodes, and the lack of a method for end-to-end generation of multiple categories of nodes. This paper presents a novel graph generation model, called CGGM, designed specifically to generate a larger number of nodes belonging to the minority class. The mechanism for generating an adjacency matrix, through adaptive sparsity, enhances flexibility in its structure. The feature generation module, called multidimensional features generator (MFG) to generate node features along with topological information. Labels are transformed into embedding vectors, serving as conditional constraints to control the generation of synthetic data across multiple categories. Using a multi-stage loss, the distribution of synthetic data is adjusted to closely resemble that of real data. 
In extensive experiments, we show that CGGM's synthetic data outperforms state-of-the-art methods across various metrics. Our results demonstrate efficient generation of diverse data categories, robustly enhancing multi-category classification model performance.	翻訳日:2024-02-28 17:05:45 公開日:2024-02-27
# CAPT:変圧器を用いた単一点雲からのカテゴリーレベルの調音推定 CAPT: Category-level Articulation Estimation from a Single Point Cloud Using Transformer ( http://arxiv.org/abs/2402.17360v1 ) ライセンス: Link先を確認	Lian Fu, Ryoichi Ishikawa, Yoshihiro Sato and Takeshi Oishi	(参考訳) ジョイントパラメータを推定する能力は、ロボット工学やコンピュータビジョンの様々な応用に不可欠である。本稿では,変圧器を用いた点雲からのカテゴリレベルの調音推定法captを提案する。 CAPTはエンドツーエンドのトランスフォーマーベースのアーキテクチャを使用して、単一点雲からの調音対象の結合パラメータと状態推定を行う。提案手法は, 高精度で頑健な各種調音物体の接合パラメータと状態を高精度に推定する。また, 物体の動的特徴を強調することで, 音節推定性能を向上させる動き損失手法を提案する。さらに,このフレームワークに粗いパラメータ推定を提供するために,二重投票戦略を提案する。いくつかのカテゴリデータセットにおける実験結果は,提案手法が既存の調音推定法よりも優れていることを示している。本研究は,調音オブジェクト分析にトランスフォーマティブアーキテクチャを適用するための有望なソリューションを提供する。 The ability to estimate joint parameters is essential for various applications in robotics and computer vision. In this paper, we propose CAPT: category-level articulation estimation from a point cloud using Transformer. CAPT uses an end-to-end transformer-based architecture for joint parameter and state estimation of articulated objects from a single point cloud. The proposed CAPT methods accurately estimate joint parameters and states for various articulated objects with high precision and robustness. The paper also introduces a motion loss approach, which improves articulation estimation performance by emphasizing the dynamic features of articulated objects. Additionally, the paper presents a double voting strategy to provide the framework with coarse-to-fine parameter estimation. Experimental results on several category datasets demonstrate that our methods outperform existing alternatives for articulation estimation. Our research provides a promising solution for applying Transformer-based architectures in articulated object analysis.	翻訳日:2024-02-28 17:05:23 公開日:2024-02-27
# SoFA: 優先ルールによるオンザフライアライメント SoFA: Shielded On-the-fly Alignment via Priority Rule Following ( http://arxiv.org/abs/2402.17358v1 ) ライセンス: Link先を確認	Xinyu Lu, Bowen Yu, Yaojie Lu, Hongyu Lin, Haiyang Yu, Le Sun, Xianpei Han, Yongbin Li	(参考訳) 大規模言語モデル(llm)におけるアライメント問題は、それらを幅広い人間の価値観に適応させることである。この要件は、好みや規制基準の多様性によって既存のアライメント手法に挑戦する。本稿では,各ダイアログにおけるルールを主制御機構として定義し,ユーザの指示を優先する新たなアライメントパラダイムである優先ルールを提案する。予備分析の結果, GPT-4 のような先進的な LLM でさえ,ルールの理解と優先順位付けに欠点があることが判明した。そこで本研究では,llmシミュレーションからの信号に追従した優先度を蒸留する半自動法であるprioritydistillを提案する。実験により,本手法は1つの一般規則のみを用いた誤調整を効果的に最小化するだけでなく,様々な未知規則に順応し,ハイジャックから保護され,モデルが適切に応答することを示す。 The alignment problem in Large Language Models (LLMs) involves adapting them to the broad spectrum of human values. This requirement challenges existing alignment methods due to diversity of preferences and regulatory standards. This paper introduces a novel alignment paradigm, priority rule following, which defines rules as the primary control mechanism in each dialog, prioritizing them over user instructions. Our preliminary analysis reveals that even the advanced LLMs, such as GPT-4, exhibit shortcomings in understanding and prioritizing the rules. Therefore, we present PriorityDistill, a semi-automated approach for distilling priority following signals from LLM simulations to ensure robust rule integration and adherence. Our experiments show that this method not only effectively minimizes misalignments utilizing only one general rule but also adapts smoothly to various unseen rules, ensuring they are shielded from hijacking and that the model responds appropriately.	翻訳日:2024-02-28 17:05:12 公開日:2024-02-27
# RECOST:外部知識ガイドによるデータ効率インストラクションチューニング RECOST: External Knowledge Guided Data-efficient Instruction Tuning ( http://arxiv.org/abs/2402.17355v1 ) ライセンス: Link先を確認	Qi Zhang, Yiming Zhang, Haobo Wang, Junbo Zhao	(参考訳) 現在の大規模言語モデル(LLM)の展望では、命令チューニングのプロセスが重要なステップとなっている。高い計算能力のオーバヘッドを考慮して,高品質な命令データの選択を目的とした,このプロセスのトレーニングデータサイズを削減するために,データ効率の高い命令チューニングが提案されている。それでも、現在のデータ効率のよい命令チューニング手法のほとんどは、元の命令チューニングデータセットの品質に大きく依存していると論じる。この分野で一般的なシナリオであるLLMによって合成されたデータセットに関しては、汚れたサンプルが他のサンプルよりも高い確率で選択されることさえある。これらの課題に対処するために,外部知識(関連する例や段落)を用いて,LLMで合成したサンプルを,文脈内相対予測エントロピーを用いて評価した。新しい指標に基づいて,外部知識ベースの再ランキングと多様性に一貫性のあるサンプリングをひとつのパイプラインに統合するフレームワークを,RECOSTとして提案した。いくつかの合成データセット(AlpacaとAlpaca-gpt4)の広範な実験を通じて、本手法の有効性を実証し、全データセットのわずか1%でさらに優れた結果を得る。 In the current landscape of large language models (LLMs), the process of instruction tuning serves as an essential step. Considering the high computing power overhead, data-efficient instruction tuning was proposed to reduce the training data size in this process, aiming at selecting high-quality instructional data. Nevertheless, we argue that most current data-efficient instruction-tuning methods are highly dependent on the quality of the original instruction-tuning dataset. When it comes to datasets synthesized by LLMs, a common scenario in this field, dirty samples will even be selected with a higher probability than other samples. To address these challenges, we utilized external knowledge (relevant examples or paragraphs) to evaluate those samples synthesized by LLMs with an in-context-based relative predictive entropy. Based on the new metric, we proposed a framework, dubbed RECOST, which integrates external-knowledge-base re-ranking and diversity-consistent sampling into a single pipeline. Through extensive experiments on several synthetic datasets (Alpaca and Alpaca-gpt4), we demonstrate the effectiveness of our method and achieve even better results with only 1% of the full dataset.	翻訳日:2024-02-28 17:04:55 公開日:2024-02-27
# ICP-Flow:ICPを用いたLiDARシーンフロー推定 ICP-Flow: LiDAR Scene Flow Estimation with ICP ( http://arxiv.org/abs/2402.17351v1 ) ライセンス: Link先を確認	Yancong Lin and Holger Caesar	(参考訳) シーンフローは、近くの時間ステップで自動運転車が捉えた2つのLiDARスキャン間の3D運動を特徴付ける。代表的な方法は、シーンフローを、大規模トレーニングまたは推論時の時間的最適化によって学習できる、ポイントワイズな非制約フローベクトルとして考えることである。しかし、これらの方法は、自律運転の物体がしばしば固く動くことを考慮しない。この剛体運動の仮定を我々の設計に取り入れ、目的はスキャン上のオブジェクトを関連付け、局所的な剛体変換を推定することである。学習自由フロー推定器icp-flowを提案する。我々の設計の中核は、オブジェクトを時間とともに整列させ、対応する剛性変換を出力する従来の反復閉点(ICP)アルゴリズムである。 icpを支援するために,最も可能性の高い翻訳を見つけるためのヒストグラムに基づく初期化を提案し,icpの出発点を提供する。完全なシーンフローは、剛性変換から回復される。教師付きモデルを含む最先端のベースラインをWaymoデータセットで上回り、Argoverse-v2とnuScenesで競合的に実行します。さらに,我々のモデルから擬似ラベルによって教師されるフィードフォワードニューラルネットワークを訓練し,リアルタイム推論が可能なすべてのモデルでトップパフォーマンスを実現する。我々は,他のモデルで有意義な結果が得られない場合の0.5秒までの時間間隔で,シーンフロー推定におけるモデルの有用性を検証する。 Scene flow characterizes the 3D motion between two LiDAR scans captured by an autonomous vehicle at nearby timesteps. Prevalent methods consider scene flow as point-wise unconstrained flow vectors that can be learned by either large-scale training beforehand or time-consuming optimization at inference. However, these methods do not take into account that objects in autonomous driving often move rigidly. We incorporate this rigid-motion assumption into our design, where the goal is to associate objects over scans and then estimate the locally rigid transformations. We propose ICP-Flow, a learning-free flow estimator. The core of our design is the conventional Iterative Closest Point (ICP) algorithm, which aligns the objects over time and outputs the corresponding rigid transformations. Crucially, to aid ICP, we propose a histogram-based initialization that discovers the most likely translation, thus providing a good starting point for ICP. The complete scene flow is then recovered from the rigid transformations. We outperform state-of-the-art baselines, including supervised models, on the Waymo dataset and perform competitively on Argoverse-v2 and nuScenes. 
Further, we train a feedforward neural network, supervised by the pseudo labels from our model, and achieve top performance among all models capable of real-time inference. We validate the advantage of our model on scene flow estimation with longer temporal gaps, up to 0.5 seconds where other models fail to deliver meaningful results.	翻訳日:2024-02-28 17:04:33 公開日:2024-02-27
# 執行可能なGDPR仕様に向けて Towards an Enforceable GDPR Specification ( http://arxiv.org/abs/2402.17350v1 ) ライセンス: Link先を確認	François Hublet and Alexander Kvamme and Srđan Krstić	(参考訳) プライバシ・バイ・デザイン(PbD)はEUのGDPRのような現代的なプライバシー規制によって規定されているが、実際のソフトウェアシステムでPbDを達成することは、非常に難しい課題である。 PbDを実現するための新たなテクニックのひとつが実行時執行(RE)である。システムのプライバシ要件の仕様をロードした執行者が,システムによって実行されるアクションを観察し,その要件を常に遵守するように指示する。 PbDにREテクニックを使用するためには、まずプライバシ規則を執行可能な仕様に変換する必要がある。本稿では,GDPRの形式化に向けた取り組みについて報告する。まず、法律規定の執行可能な形式仕様を作成するための一連の要件と反復的方法論を提示する。次に,GDPRの一部の執行可能な仕様を導出するために,我々の方法論を用いた予備事例研究を報告する。本事例研究は,本手法が正確な執行可能仕様の策定に有効であることを示唆している。 While Privacy by Design (PbD) is prescribed by modern privacy regulations such as the EU's GDPR, achieving PbD in real software systems is a notoriously difficult task. One emerging technique to realize PbD is Runtime enforcement (RE), in which an enforcer, loaded with a specification of a system's privacy requirements, observes the actions performed by the system and instructs it to perform actions that will ensure compliance with these requirements at all times. To be able to use RE techniques for PbD, privacy regulations first need to be translated into an enforceable specification. In this paper, we report on our ongoing work in formalizing the GDPR. We first present a set of requirements and an iterative methodology for creating enforceable formal specifications of legal provisions. Then, we report on a preliminary case study in which we used our methodology to derive an enforceable specification of part of the GDPR. Our case study suggests that our methodology can be effectively used to develop accurate enforceable specifications.	翻訳日:2024-02-28 17:04:10 公開日:2024-02-27
# 入力サブドメインレベルの損失関数勾配を通して見る、プランジング翼まわりの非定常流れに対するPINNの訓練の理解 Understanding the training of PINNs for unsteady flow past a plunging foil through the lens of input subdomain level loss function gradients ( http://arxiv.org/abs/2402.17346v1 ) ライセンス: Link先を確認	Rahul Sundar, Didier Lucor, and Sunetra Sarkar	(参考訳) 近年, 移動境界対応PINN (MB-PINN) を含む、埋め込み境界法に着想を得た物理インフォームドニューラルネットワーク (PINN) は, 移動物体を過ぎる非定常流れに対して,速度を正確に再構成し, 圧力を隠れ変数として回復する能力を示した。プランジング翼を過ぎる流れを対象に、MB-PINNは大域的な物理損失緩和を用いて,さらに物理に基づくアンダーサンプリング法とも併用して訓練され、良好な精度が得られた。本研究の目的は,物理損失緩和と物理に基づくアンダーサンプリングの効果の下で,どの入力空間サブドメインが訓練に寄与するかを調べることである。MB-PINNの訓練の文脈で、移動物体、後流(ウェイク)、外側ゾーンの3つの空間ゾーンを定義した。どの空間ゾーンが訓練を駆動するかを定量化するために、ゾーンごとの損失成分の勾配統計と各ゾーンにおけるサンプル点の割合から、2つの新しい指標を算出する。その結果, 学習は確かに, ゾーンごとの損失成分勾配と各ゾーンにおける点の割合の複合効果に依存することが確認された。さらに、支配的な入力ゾーンは、ある意味で最も強い解勾配を持つゾーンでもある。 Recently, immersed boundary method-inspired physics-informed neural networks (PINNs), including the moving boundary-enabled PINNs (MB-PINNs), have shown the ability to accurately reconstruct velocity and recover pressure as a hidden variable for unsteady flow past moving bodies. Considering flow past a plunging foil, MB-PINNs were trained with global physics loss relaxation and also in conjunction with a physics-based undersampling method, obtaining good accuracy. The purpose of this study was to investigate which input spatial subdomain contributes to the training under the effect of physics loss relaxation and physics-based undersampling. In the context of MB-PINNs training, three spatial zones: the moving body, wake, and outer zones were defined. To quantify which spatial zone drives the training, two novel metrics are computed from the zonal loss component gradient statistics and the proportion of sample points in each zone. Results confirm that the learning indeed depends on the combined effect of the zonal loss component gradients and the proportion of points in each zone. Moreover, the dominant input zones are also the ones that have the strongest solution gradients in some sense.	翻訳日:2024-02-28 17:03:53 公開日:2024-02-27
# LocalGCL: グラフのためのローカル認識型コントラスト学習 LocalGCL: Local-aware Contrastive Learning for Graphs ( http://arxiv.org/abs/2402.17345v1 ) ライセンス: Link先を確認	Haojun Jiang, Jiawei Sun, Jie Li, Chentao Wu	(参考訳) グラフ表現学習(GRL)は、トポロジカルな構造を持つグラフを低次元の埋め込みに符号化するもので、近年大きく進歩している。一方、グラフラベルを手動でアノテートする時間とコストのかかるプロセスは、自己教師あり学習(SSL)技術の成長を促す。 SSLの主流のアプローチとして、Contrastive Learning (CL)は、正と負のサンプルを区別することで差別的な表現を学ぶ。しかし、グラフデータに適用すると、局所構造を無視しながらグローバルパターンを過度に強調する。そこで本研究では,局所グラフ情報をマスキングベースのモデルで補足的にキャプチャする自己教師付き学習フレームワークである Local-aware Graph Contrastive Learning (LocalGCL) を提案する。大規模な実験により, 最先端手法に対するLocalGCLの優位性を検証し, 包括的グラフ表現学習者としての可能性を実証した。 Graph representation learning (GRL), which encodes graphs with topological structures into low-dimensional embeddings, has made considerable progress recently. Meanwhile, the time-consuming and costly process of annotating graph labels manually prompts the growth of self-supervised learning (SSL) techniques. As a dominant approach of SSL, Contrastive learning (CL) learns discriminative representations by differentiating between positive and negative samples. However, when applied to graph data, it overemphasizes global patterns while neglecting local structures. To tackle the above issue, we propose Local-aware Graph Contrastive Learning (LocalGCL), a self-supervised learning framework that supplementarily captures local graph information with masking-based modeling compared with vanilla contrastive learning. Extensive experiments validate the superiority of LocalGCL against state-of-the-art methods, demonstrating its promise as a comprehensive graph representation learner.	翻訳日:2024-02-28 17:03:31 公開日:2024-02-27
# 抽象特性の選択的モデリングによるベイズ最適化の強化 Enhanced Bayesian Optimization via Preferential Modeling of Abstract Properties ( http://arxiv.org/abs/2402.17343v1 ) ライセンス: Link先を確認	Arun Kumar A V, Alistair Shilton, Sunil Gupta, Santu Rana, Stewart Greenhill, Svetha Venkatesh	(参考訳) 実験的な(設計)最適化は、新しい製品やプロセスの設計と発見において重要な要素である。ベイズ最適化(BO)は、コストとブラックボックスの実験的設計プロセスを最適化するための効果的なツールである。ベイズ最適化は、実験最適化に対する原則化されたデータ駆動のアプローチであるが、スクラッチからすべてを学習し、必ずしも直接測定されない物理的特性(または測定可能な)を使用して、異なる抽象レベルでシステムについて推論する人間(ドメイン)の専門家の専門知識から大きな恩恵を受けることができる。本稿では,不測の抽象特性に関する専門家の嗜好をサロゲートモデルに取り入れ,BOの性能をさらに向上させる,人間-AI協調型ベイズフレームワークを提案する。我々は,不正確かつ誤解を招く専門家のバイアスを優先判断で対処できる効率的な戦略を提供する。提案するフレームワークの収束挙動について論じる。合成関数と実世界のデータセットを含む実験結果は,本手法のベースラインに対する優越性を示す。 Experimental (design) optimization is a key driver in designing and discovering new products and processes. Bayesian Optimization (BO) is an effective tool for optimizing expensive and black-box experimental design processes. While Bayesian optimization is a principled data-driven approach to experimental optimization, it learns everything from scratch and could greatly benefit from the expertise of its human (domain) experts who often reason about systems at different abstraction levels using physical properties that are not necessarily directly measured (or measurable). In this paper, we propose a human-AI collaborative Bayesian framework to incorporate expert preferences about unmeasured abstract properties into the surrogate modeling to further boost the performance of BO. We provide an efficient strategy that can also handle any incorrect/misleading expert bias in preferential judgments. We discuss the convergence behavior of our proposed framework. Our experimental results involving synthetic functions and real-world datasets show the superiority of our method against the baselines.	翻訳日:2024-02-28 17:03:10 公開日:2024-02-27
# Groverウォークにおけるグラフの対称性と完全状態移動 Symmetry of graphs and perfect state transfer in Grover walks ( http://arxiv.org/abs/2402.17341v1 ) ライセンス: Link先を確認	Sho Kubota, Kiyoto Yoshino	(参考訳) Groverウォークにおけるグラフの対称性と完全状態移動の関係について検討した。グラフの対称性は数学的にグラフの自己同型を指す。完全状態移動が2つの頂点の間で起こるとき、以下の2つの命題が成立する。 1つは自己同型が完全状態移動の発生を保存することである。もう一つは、これらの二つの頂点に関して自己同型群の安定化部分群が一致することである。これらの結果を用いて、完全状態移動を許容する次数$4$以下の巡回グラフ(circulant graph)を完全に特徴付ける。その証明には代数的数論も用いる。 We study relationships between symmetry of graphs and perfect state transfer in Grover walks. Symmetry of graphs mathematically refers to automorphisms of graphs. When perfect state transfer occurs between two vertices, the following two statements hold. One is that automorphisms preserve the occurrence of perfect state transfer. The other is that the stabilizer subgroups of the automorphism groups with respect to those two vertices coincide. Using these results, we completely characterize circulant graphs up to valency $4$ that admit perfect state transfer. The proof also uses algebraic number theory.	翻訳日:2024-02-28 17:02:52 公開日:2024-02-27
# SocialCVAE:相互作用条件付き潜伏剤による歩行者軌道予測 SocialCVAE: Predicting Pedestrian Trajectory via Interaction Conditioned Latents ( http://arxiv.org/abs/2402.17339v1 ) ライセンス: Link先を確認	Wei Xiang, Haoteng Yin, He Wang, Xiaogang Jin	(参考訳) 歩行者の軌道予測は、人間の行動に関する洞察を与え、将来の動きを予測するための多くの応用において重要な技術である。既存の経験モデルの多くは、説明可能性を維持しながら、強力な表現力の学習に基づく技術と組み合わせたハイブリッドモデルの開発に重点を置いている。しかしながら、経験モデルから学習された操舵行動の決定論的性質は、モデルの実用的性能を制限する。本研究は,人的行動決定における行動の不確実性を探るため,CVAEを用いた歩行者軌道予測のための社会条件変分オートエンコーダ(SocialCVAE)を提案する。ソーシャルCVAEは、社会的に説明可能な相互作用エネルギーマップをCVAEの状態として利用することにより、社会的に合理的な動作ランダム性を学ぶ。エネルギーマップは、歩行者と隣人の相互作用のエネルギーコスト(すなわち反発強度)を予測するエネルギーベース相互作用モデルを用いて生成される。 25のシーンを含む2つの公開ベンチマーク実験の結果、SocialCVAEは最先端の手法と比較して予測精度を著しく改善し、平均変位誤差(ADE)は16.85%、最終変位誤差(FDE)は69.18%改善した。 Pedestrian trajectory prediction is the key technology in many applications for providing insights into human behavior and anticipating human future motions. Most existing empirical models are explicitly formulated by observed human behaviors using explicable mathematical terms with a deterministic nature, while recent work has focused on developing hybrid models combined with learning-based techniques for powerful expressiveness while maintaining explainability. However, the deterministic nature of the learned steering behaviors from the empirical models limits the models' practical performance. To address this issue, this work proposes the social conditional variational autoencoder (SocialCVAE) for predicting pedestrian trajectories, which employs a CVAE to explore behavioral uncertainty in human motion decisions. SocialCVAE learns socially reasonable motion randomness by utilizing a socially explainable interaction energy map as the CVAE's condition, which illustrates the future occupancy of each pedestrian's local neighborhood area. The energy map is generated using an energy-based interaction model, which anticipates the energy cost (i.e., repulsion intensity) of pedestrians' interactions with neighbors. 
Experimental results on two public benchmarks including 25 scenes demonstrate that SocialCVAE significantly improves prediction accuracy compared with the state-of-the-art methods, with up to 16.85% improvement in Average Displacement Error (ADE) and 69.18% improvement in Final Displacement Error (FDE).	翻訳日:2024-02-28 17:02:45 公開日:2024-02-27
# 無線伝搬経路の深層学習による屋外環境復元 Outdoor Environment Reconstruction with Deep Learning on Radio Propagation Paths ( http://arxiv.org/abs/2402.17336v1 ) ライセンス: Link先を確認	Hrant Khachatrian, Rafayel Mkrtchyan, Theofanis P. Raptis	(参考訳) 従来の屋外環境の再構築手法は、フォトグラム法やLiDARのような視覚に基づく技術に大きく依存しており、制約付きカバレッジ、環境条件への感受性、高い計算とエネルギー要求といった制限に直面している。これらの課題は、拡張現実ナビゲーションのようなアプリケーション、特に制約のある計算リソースとエネルギー予算を備えたウェアラブルデバイスとの統合において顕著である。そこで本稿では,屋外環境復元のための環境無線信号を用いた新しい手法を提案する。無線周波数(RF)データを解析することにより,環境特性を推定し,屋外環境をデジタル的に再構築することを目的とする。合成rfデータセットwair-dにおける選択型深層学習(dl)手法の有効性を検討するため,本研究は,この領域における研究ギャップに対処することを目的としている。 2つのDL駆動アプローチ(ビジョントランスフォーマーに基づく畳み込みU-NetとCLIP+)が評価され、交叉結合(IoU)、ハウスドルフ距離、シャンファー距離などの指標を用いて評価される。その結果,RFを用いた再建法の性能が向上し,軽量でスケーラブルな再建ソリューションへの道が開けた。 Conventional methods for outdoor environment reconstruction rely predominantly on vision-based techniques like photogrammetry and LiDAR, facing limitations such as constrained coverage, susceptibility to environmental conditions, and high computational and energy demands. These challenges are particularly pronounced in applications like augmented reality navigation, especially when integrated with wearable devices featuring constrained computational resources and energy budgets. In response, this paper proposes a novel approach harnessing ambient wireless signals for outdoor environment reconstruction. By analyzing radio frequency (RF) data, the paper aims to deduce the environmental characteristics and digitally reconstruct the outdoor surroundings. Investigating the efficacy of selected deep learning (DL) techniques on the synthetic RF dataset WAIR-D, the study endeavors to address the research gap in this domain. Two DL-driven approaches are evaluated (convolutional U-Net and CLIP+ based on vision transformers), with performance assessed using metrics like intersection-over-union (IoU), Hausdorff distance, and Chamfer distance. 
The results demonstrate promising performance of the RF-based reconstruction method, paving the way towards lightweight and scalable reconstruction solutions.	翻訳日:2024-02-28 17:02:18 公開日:2024-02-27
# BiVRec:双方向ビューベースのマルチモーダルシーケンスレコメンデーション BiVRec: Bidirectional View-based Multimodal Sequential Recommendation ( http://arxiv.org/abs/2402.17334v1 ) ライセンス: Link先を確認	Jiaxi Hu, Jingtong Gao, Xiangyu Zhao, Yuehong Hu, Yuxuan Liang, Yiqi Wang, Ming He, Zitao Liu, Hongzhi Yin	(参考訳) シーケンシャルレコメンデータシステムへのマルチモーダル情報の統合は、最近の研究で大きな注目を集めている。マルチモーダルシーケンシャルレコメンデーションモデルの初期段階では、メインストリームのパラダイムはID優先レコメンデーションであり、マルチモーダル情報はサイド情報として融合された。しかし、転送可能性と情報侵入の面での制限により、別のパラダイムが出現し、マルチモーダル機能は推奨のために直接使用されるようになり、データセット間のレコメンデーションを可能にした。それでも、このパラダイムはユーザーID情報を見落としており、情報利用率の低下とトレーニングコストの高騰を招いた。そこで本研究では,IDとマルチモーダルビューの両方でレコメンデーションタスクを協調的に訓練し,その相乗的関係を利用してレコメンデーション性能を双方向に向上させる,革新的なフレームワークであるBiVRecを提案する。情報の不均一性問題に取り組むために,まず構造化されたユーザ関心表現を構築し,それらの間の相乗的関係を学習する。具体的には、BiVRecは3つのモジュールから構成される。マルチスケールパッチングによりユーザのインタラクション系列を拡張し、ユーザの興味を包括的にモデル化するMulti-scale Interest Embedding、注意深く設計されたGaussian attentionとCluster attentionを用いて高度に構造化された興味表現を構築するIntra-View Interest Decomposition、そして粗粒度の全体的な意味的類似性と細粒度の興味配分の類似性を通じて2つの推薦ビュー間の相乗的関係を学習するCross-View Interest Learningである。BiVRecは5つのデータセットで最先端の性能を達成し、様々な実用上の利点を示す。 The integration of multimodal information into sequential recommender systems has attracted significant attention in recent research. In the initial stages of multimodal sequential recommendation models, the mainstream paradigm was ID-dominant recommendations, wherein multimodal information was fused as side information. However, due to their limitations in terms of transferability and information intrusion, another paradigm emerged, wherein multimodal features were employed directly for recommendation, enabling recommendation across datasets. 
Nonetheless, it overlooked user ID information, resulting in low information utilization and high training costs. To this end, we propose an innovative framework, BiVRec, that jointly trains the recommendation tasks in both ID and multimodal views, leveraging their synergistic relationship to enhance recommendation performance bidirectionally. To tackle the information heterogeneity issue, we first construct structured user interest representations and then learn the synergistic relationship between them. Specifically, BiVRec comprises three modules: Multi-scale Interest Embedding, comprehensively modeling user interests by expanding user interaction sequences with multi-scale patching; Intra-View Interest Decomposition, constructing highly structured interest representations using carefully designed Gaussian attention and Cluster attention; and Cross-View Interest Learning, learning the synergistic relationship between the two recommendation views through coarse-grained overall semantic similarity and fine-grained interest allocation similarity. BiVRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.	翻訳日:2024-02-28 17:01:57 公開日:2024-02-27
# ユニバーサルコーパスによる教師なし複数選択質問応答 Unsupervised multiple choices question answering via universal corpus ( http://arxiv.org/abs/2402.17333v1 ) ライセンス: Link先を確認	Qin Zhang, Hao Ge, Xiaojun Chen, Meng Fang	(参考訳) 教師なしの質問応答は有望だが難しい課題であり、新しいドメインで大規模な注釈付きデータを構築することの負担を軽減する。 mcqa(unsupervised multi-choice question answering)問題の研究を動機付けています。本稿では,手作業によるアノテーションを一切必要とせずに,ユニバーサルドメインの文脈にほとんど基づかない合成型mcqaデータを生成するための新しいフレームワークを提案する。可能な答えは抽出され、関連する質問を生成するために使用される。次に、名前付きエンティティ(NE)と知識グラフの両方を利用して、プラウシブルなイントラクタを発見し、完全な合成サンプルを形成する。複数のMCQAデータセットに対する実験により,本手法の有効性が示された。 Unsupervised question answering is a promising yet challenging task, which alleviates the burden of building large-scale annotated data in a new domain. It motivates us to study the unsupervised multiple-choice question answering (MCQA) problem. In this paper, we propose a novel framework designed to generate synthetic MCQA data barely based on contexts from the universal domain without relying on any form of manual annotation. Possible answers are extracted and used to produce related questions, then we leverage both named entities (NE) and knowledge graphs to discover plausible distractors to form complete synthetic samples. Experiments on multiple MCQA datasets demonstrate the effectiveness of our method.	翻訳日:2024-02-28 17:01:31 公開日:2024-02-27
# クラスタリングに基づく感度サンプリングによるデータ効率学習:基礎モデルとそれ以上 Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond ( http://arxiv.org/abs/2402.17327v1 ) ライセンス: Link先を確認	Kyriakos Axiotis, Vincent Cohen-Addad, Monika Henzinger, Sammy Jerome, Vahab Mirrokni, David Saulpic, David Woodruff, Michael Wunder	(参考訳) 本研究では,機械学習モデルを効率的に学習することのできる,データの小さな代表的サブセットを選択することを目的としたデータ選択問題について検討する。我々は,$k$-meansクラスタリングと感度サンプリングに基づく新しいデータ選択手法を提案する。モデル損失がHölder連続となるようなデータの埋め込み表現にアクセスできると仮定すると、このアプローチは、平均損失がデータセット全体の平均損失と、乗法的な $(1\pm\varepsilon)$ 係数と加法的な $\varepsilon \lambda \Phi_k$ の範囲で一致する、$k + 1/\varepsilon^2$ 個の「典型的な」要素の集合を選択できることが証明される。ここで $\Phi_k$ は入力埋め込みの $k$-means コスト、$\lambda$ は Hölder 定数である。さらに,基礎モデルの微調整に対するアプローチの性能とスケーラビリティを実証し,最先端手法よりも優れていることを示す。また,線形回帰に適用する方法を示し,レバレッジスコアサンプリングの性能に驚くほど合致する新しいサンプリング戦略を導いており,概念的にはよりシンプルでスケーラブルである。 We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on $k$-means clustering and sensitivity sampling. Assuming access to an embedding representation of the data with respect to which the model loss is Hölder continuous, our approach provably allows selecting a set of "typical" $k + 1/\varepsilon^2$ elements whose average loss corresponds to the average loss of the whole dataset, up to a multiplicative $(1\pm\varepsilon)$ factor and an additive $\varepsilon \lambda \Phi_k$, where $\Phi_k$ represents the $k$-means cost for the input embeddings and $\lambda$ is the Hölder constant. We furthermore demonstrate the performance and scalability of our approach on fine-tuning foundation models and show that it outperforms state-of-the-art methods. We also show how it can be applied to linear regression, leading to a new sampling strategy that surprisingly matches the performance of leverage score sampling, while being conceptually simpler and more scalable.	翻訳日:2024-02-28 17:01:20 公開日:2024-02-27
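
As a rough illustration of the idea in the entry above, the following Python sketch draws a weighted subset using a clustering-based sensitivity score. It is only a sketch under stated assumptions: the score used here (a point's share of the $k$-means cost plus a uniform share within its cluster) is a common textbook form and is not claimed to be the paper's exact sampling distribution, the cluster centers are assumed to be given, and the names `sensitivity_sample`, `points`, and `centers` are hypothetical.

```python
import random


def sq_dist(x, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def sensitivity_sample(points, centers, m, rng):
    """Draw m indices with probability proportional to an assumed sensitivity score.

    Assumed score for point i: cost_i / Phi_k + 1 / |cluster(i)|, where cost_i is
    the squared distance to the nearest center and Phi_k is the total k-means cost.
    Returns the sampled indices, importance weights, and sampling probabilities.
    """
    n = len(points)
    # Assign every point to its nearest center.
    assign = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
              for p in points]
    costs = [sq_dist(p, centers[a]) for p, a in zip(points, assign)]
    sizes = [assign.count(j) for j in range(len(centers))]
    phi = sum(costs)  # k-means cost of the given centers
    scores = [(c / phi if phi > 0 else 0.0) + 1.0 / sizes[a]
              for c, a in zip(costs, assign)]
    total = sum(scores)
    probs = [s / total for s in scores]
    idx = rng.choices(range(n), weights=probs, k=m)
    # With these weights, sum(w_j * loss(idx_j)) is an unbiased estimate
    # of the average loss over all n points.
    weights = [1.0 / (n * m * probs[i]) for i in idx]
    return idx, weights, probs


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (9.0, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]  # assumed to come from a prior k-means run
idx, weights, probs = sensitivity_sample(points, centers, m=4, rng=random.Random(0))
```

Points far from their center, such as `(9.0, 5.0)` here, receive a larger sampling probability and a correspondingly smaller weight, which is what keeps the weighted average unbiased.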
# 家庭内虐待の被害者や生存者を支援するチャットボット Designing Chatbots to Support Victims and Survivors of Domestic Abuse ( http://arxiv.org/abs/2402.17393v1 ) ライセンス: Link先を確認	Rahime Belen Saglam, Jason R. C. Nurse, Lisa Sugiura	(参考訳) 目的: 新型コロナウイルス(covid-19)のパンデミックや、被害者や生存者が支援を受けることの難しさなどにより、過去4年間で家庭内暴力事件が著しく増加している。本研究では,人工知能(AI)やルールベースのチャットボットが,このような状況や支援への直接アクセスが制限される状況において,被害者や生き残りを支援する役割について検討する。方法: 家庭内虐待支援サービスや組織(慈善団体,法執行機関など)の専門家によるインタビューを行い, 関連する支援サービス提供者のウェブサイトの内容を収集した。その後、テーマコンテンツ分析を用いて、インタビューデータと被害者支援ウェブサイト上のコンテンツから洞察を抽出した。また,被災者支援に使用されるエージェントの設計原則やインタラクションパターンを反映する研究を振り返って,関連するチャットボット文献をレビューした。結果:本分析では,チャットボットが持つ可能性のあるユースケースや対象グループ,対話構造,パーソナリティ特性,最後に対処すべき安全性やプライバシの問題などを考慮した,チャットボットの設計上の考察と実践について概説した。特に注目すべきは、aiシステム(例えば、chatgpt、copilot、gemini)が使用のために推奨されない状況、感情的なサポートを伝える価値、透明性の重要性、安全で機密性の高い空間の必要性である。結論:これらの考察や実践が、チャットボットやai開発者やサービスプロバイダの間で議論を刺激し、チャットボットが使用に適した状況において、家庭内虐待の生き残りを支援するために、チャットボットの効率的な利用を促すことを期待します。 Objective: Domestic abuse cases have risen significantly over the last four years, in part due to the COVID-19 pandemic and the challenges for victims and survivors in accessing support. In this study, we investigate the role that chatbots - Artificial Intelligence (AI) and rule-based - may play in supporting victims/survivors in situations such as these or where direct access to help is limited. Methods: Interviews were conducted with experts working in domestic abuse support services and organizations (e.g., charities, law enforcement) and the content of websites of related support-service providers was collected. Thematic content analysis was then applied to assess and extract insights from the interview data and the content on victim-support websites. We also reviewed pertinent chatbot literature to reflect on studies that may inform design principles and interaction patterns for agents used to support victims/survivors. 
Results: From our analysis, we outlined a set of design considerations/practices for chatbots that consider potential use cases and target groups, dialog structure, personality traits that might be useful for chatbots to possess, and finally, safety and privacy issues that should be addressed. Of particular note are situations where AI systems (e.g., ChatGPT, Copilot, Gemini) are not recommended for use, the value of conveying emotional support, the importance of transparency, and the need for a safe and confidential space. Conclusion: It is our hope that these considerations/practices will stimulate debate among chatbot and AI developers and service providers and, for situations where chatbots are deemed appropriate for use, inspire efficient use of chatbots in the support of survivors of domestic abuse.	翻訳日:2024-02-28 16:56:23 公開日:2024-02-27
# Spot the Bot: ボットと人間のセマンティクスパスの粒度の粗い分割 Spot the bot: Coarse-Grained Partition of Semantic Paths for Bots and Humans ( http://arxiv.org/abs/2402.17392v1 ) ライセンス: Link先を確認	Vasilii A. Gromov, Alexandra S. Kogan	(参考訳) 現在、テクノロジーは急速に進歩している。ボットはコメント、記事、レビューを書いている。そのため、テキストが人間によって書かれたものかボットによって書かれたものかを知ることが重要である。本稿では,人間が書いたテキストとボットが生成したテキストについて,セマンティックパスの粗粒度分割の構造を比較することに焦点をあてる。文学テキストと複数のボットが生成したテキストから得たn-gramデータセットのクラスタリングを比較した。仮説は、構造とクラスタリングが異なるというものである。我々の研究はこの仮説を支持している。意味構造は言語によって異なる可能性があるため、ロシア語、英語、ドイツ語、ベトナム語を調査する。 Nowadays, technology is rapidly advancing: bots are writing comments, articles, and reviews. It is therefore crucial to know whether a text was written by a human or by a bot. This paper focuses on comparing the structures of the coarse-grained partitions of semantic paths for human-written and bot-generated texts. We compare the clusterings of datasets of n-grams from literary texts and from texts generated by several bots. The hypothesis is that the structures and clusterings are different. Our research supports the hypothesis. As the semantic structure may differ between languages, we investigate Russian, English, German, and Vietnamese.	翻訳日:2024-02-28 16:55:54 公開日:2024-02-27
# 点滅エミッタを用いたナノセンシングにおける空間超解像 Spatial super-resolution in nanosensing with blinking emitters ( http://arxiv.org/abs/2402.17391v1 ) ライセンス: Link先を確認	Alexander Mikhalychev, Aleksandr Saushin, Alex Ulyanenkov, and Polina Kuzhir	(参考訳) 本稿では,点滅する蛍光ナノセンサを用いたメトロロジー(温度測定,磁気測定,pH推定など)において,センシングを超解像光ゆらぎイメージング(SOFI)と組み合わせることで空間分解能を向上させる手法を提案する。この手法の効率性を示すために、ゆらぐナノエミッタを模擬するレーザーダイオードと画像の意図的なぼかしを用いたモデル実験を行う。2次、3次、4次のキュムラント画像はコントラストを改善し、強度に基づくアプローチと比べて、モデル化した温度(または他の物理パラメータ)分布のより小さな特徴の再構成を可能にする。我々は,開発した画像解析技術で補完された点滅蛍光センシング剤が,例えば目的の細胞やオルガネラに標的送達された点滅蛍光体のスペクトル応答の局所的変化を認識するために,ライフサイエンス分野で日常的に利用できると考えている。これは、疾患の影響、老化、治癒、治療への応答などの外部の「力」による生体細胞の物理パラメータの変化を局所的に測定するのに非常に有用である。 We propose a method of spatial resolution enhancement in metrology (thermometry, magnetometry, pH estimation and similar methods) with blinking fluorescent nanosensors by combining sensing with super-resolution optical fluctuation imaging (SOFI). To demonstrate the efficiency of this approach, a model experiment is performed with laser diodes modeling fluctuating nanoemitters and intentional blurring of the image. The 2nd, 3rd, and 4th order cumulant images improve the contrast and enable successful reconstruction of smaller features of the modeled temperature (or any other physical parameter) distribution relative to the intensity-based approach. We believe that blinking fluorescent sensing agents, complemented with the developed image analysis technique, could be used routinely in the life science sector for recognizing local changes in the spectral response of blinking fluorophores, e.g., those delivered in a targeted manner to the desired cell or even organelle. This is extremely useful for local measurements of changes in the physical parameters of living cells caused by external "forces", including disease effects, aging, healing, or response to treatment.	翻訳日:2024-02-28 16:55:45 公開日:2024-02-27
# セキュア機械学習モデル更新のためのロバストネス整合型敵対的訓練 Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates ( http://arxiv.org/abs/2402.17390v1 ) ライセンス: Link先を確認	Daniele Angioni, Luca Demetrio, Maura Pintor, Luca Oneto, Davide Anguita, Battista Biggio, Fabio Roli	(参考訳) 機械学習モデルは、平均精度を改善するために定期的な更新を要求し、新しいアーキテクチャと追加データを活用する。しかし、新しく更新されたモデルは、以前のモデルが行わなかった間違いを犯す可能性がある。このような誤分類は負のフリップと呼ばれ、ユーザがパフォーマンスの回帰として経験する。本研究では,この問題が敵対的事例に対する堅牢性にも影響を及ぼし,セキュアなモデル更新手法の開発を妨げていることを示す。特に、敵対的堅牢性を改善するためにモデルを更新する場合、以前は効果のなかった敵対的事例の一部が誤分類されるようになり、システムのセキュリティが低下する。この問題に対処するために,ロバストネス整合型敵対的訓練(robustness-congruent adversarial training)という新しい手法を提案する。敵対的訓練でモデルを微調整すると同時に、更新前に正しく分類された敵対的事例に対して高い堅牢性を維持するように制約する。我々のアルゴリズムと、より一般的には非回帰制約による学習は、一貫性のある推定子を訓練するための理論的根拠付きフレームワークを提供する。コンピュータビジョンのためのロバストモデルに関する実験により,(i) モデル更新後に改善しても、精度と堅牢性の両方が負のフリップの影響を受け得ること,(ii) ロバストネス整合型敵対的訓練は問題を軽減し, 競合するベースライン法よりも優れることを確認した。 Machine-learning models require periodic updates to improve their average accuracy, exploiting novel architectures and additional data. However, a newly-updated model may commit mistakes that the previous model did not make. Such misclassifications are referred to as negative flips, and experienced by users as a regression of performance. In this work, we show that this problem also affects robustness to adversarial examples, thereby hindering the development of secure model update practices. In particular, when updating a model to improve its adversarial robustness, some previously-ineffective adversarial examples may become misclassified, causing a regression in the perceived security of the system. We propose a novel technique, named robustness-congruent adversarial training, to address this issue. It amounts to fine-tuning a model with adversarial training, while constraining it to retain higher robustness on the adversarial examples that were correctly classified before the update. 
We show that our algorithm and, more generally, learning with non-regression constraints, provides a theoretically-grounded framework to train consistent estimators. Our experiments on robust models for computer vision confirm that (i) both accuracy and robustness, even if improved after model update, can be affected by negative flips, and (ii) our robustness-congruent adversarial training can mitigate the problem, outperforming competing baseline methods.	翻訳日:2024-02-28 16:55:22 公開日:2024-02-27
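
The notion of a negative flip described in the entry above can be made concrete with a short Python sketch. This is a minimal illustration, not the authors' evaluation code: the function name `negative_flip_rate` and the choice to normalize by the total number of samples are assumptions made here. The same function applies to robustness if the predictions are taken on adversarial examples rather than clean inputs.

```python
def negative_flip_rate(y_true, old_pred, new_pred):
    """Fraction of samples the old model classified correctly but the new model gets wrong.

    Normalizing by the total number of samples is an assumption of this sketch.
    """
    if not (len(y_true) == len(old_pred) == len(new_pred)):
        raise ValueError("all inputs must have the same length")
    if not y_true:
        return 0.0
    flips = sum(1 for y, o, n in zip(y_true, old_pred, new_pred)
                if o == y and n != y)
    return flips / len(y_true)


y_true = [0, 1, 1, 0, 1]
old_pred = [0, 1, 0, 0, 1]  # old model: 4 of 5 correct
new_pred = [0, 1, 1, 1, 1]  # new model: also 4 of 5 correct
rate = negative_flip_rate(y_true, old_pred, new_pred)  # one flip (index 3) out of five
```

In this toy case both models have the same accuracy, yet the update still introduces one negative flip, which is the kind of regression that average accuracy alone does not reveal.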
# FairBelief - 言語モデルにおける有害な信念の評価 FairBelief - Assessing Harmful Beliefs in Language Models ( http://arxiv.org/abs/2402.17389v1 ) ライセンス: Link先を確認	Mattia Setzu, Marta Marchiori Manerba, Pasquale Minervini, Debora Nozza	(参考訳) 言語モデル(lms)は、もしそのようなシステムが注意深く公正な監査なしで現実世界のアプリケーションに統合されたら、少数派や少数派グループを傷つけるであろう望ましくない偏見を継承することが示されている。本論文は,信仰を捉え,評価するための分析的アプローチであるFairBeliefを提案する。 fairbeliefでは、モデルスケールや確率など、これまで無視されていた異なる軸にまたがる最先端のlmsの挙動を調査し、特にlms出力の有害性を定量化するために設計されたフェアネスデータセット上での予測を評価する。最後に,モデルによる信念の詳細な質的評価を行った。本研究は、FairBeliefを英語のLMに適用し、これらのアーキテクチャは様々な自然言語処理タスクにおいて高いパフォーマンスを実現するが、特定の性別に対する有害な信念を示す。興味深いことに、トレーニング手順とデータセット、モデルスケール、アーキテクチャは、異なるレベルの傷つきの信念を誘発する。 Language Models (LMs) have been shown to inherit undesired biases that might hurt minorities and underrepresented groups if such systems were integrated into real-world applications without careful fairness auditing. This paper proposes FairBelief, an analytical approach to capture and assess beliefs, i.e., propositions that an LM may embed with different degrees of confidence and that covertly influence its predictions. With FairBelief, we leverage prompting to study the behavior of several state-of-the-art LMs across different previously neglected axes, such as model scale and likelihood, assessing predictions on a fairness dataset specifically designed to quantify LMs' outputs' hurtfulness. Finally, we conclude with an in-depth qualitative assessment of the beliefs emitted by the models. We apply FairBelief to English LMs, revealing that, although these architectures enable high performances on diverse natural language processing tasks, they show hurtful beliefs about specific genders. Interestingly, training procedure and dataset, model scale, and architecture induce beliefs of different degrees of hurtfulness.	翻訳日:2024-02-28 16:55:01 公開日:2024-02-27
# 高エネルギー粒子物理学への応用のためのテストベンチへのグラフニューラルネットワークの送信に関するケーススタディ A case study of sending graph neural networks back to the test bench for applications in high-energy particle physics ( http://arxiv.org/abs/2402.17386v1 ) ライセンス: Link先を確認	Emanuel Pfeffer and Michael Waßmer and Yee-Ying Cung and Roger Wolf and Ulrich Husemann	(参考訳) 高エネルギー粒子衝突では、主衝突生成物は通常さらに崩壊し、事前には未知の多重度を持つ木のような階層構造となる。安定粒子レベルでは、衝突の全ての崩壊生成物は、最終状態オブジェクトの置換不変な集合を形成する。数学的なグラフとの類似性から、これらの性質を自然に備えたグラフニューラルネットワーク(GNN)が、高エネルギー粒子物理学に関連する多くの課題に対処するのに最適であるという考えが生まれている。本稿では,よく確立された深層全結合フィードフォワードアーキテクチャのニューラルネットワークに対する,典型的なGNNのベンチマークテストについて述べる。我々は、ノード、隠れ層、ニューラルネットワークの学習可能なパラメータの観点から、この比較を可能な限り偏りなく行うことを目指している。物理学の事例としては、CERNの大型ハドロン衝突型加速器における陽子-陽子衝突において、トップクォーク-反クォーク対に随伴して生成される最終状態Xの分類を用いる。 In high-energy particle collisions, the primary collision products usually decay further, resulting in tree-like, hierarchical structures with a priori unknown multiplicity. At the stable-particle level, all decay products of a collision form permutation-invariant sets of final state objects. The analogy to mathematical graphs gives rise to the idea that graph neural networks (GNNs), which naturally resemble these properties, should be best-suited to address many tasks related to high-energy particle physics. In this paper we describe a benchmark test of a typical GNN against neural networks of the well-established deep fully-connected feed-forward architecture. We aim to make this comparison as unbiased as possible in terms of nodes, hidden layers, or trainable parameters of the neural networks under study. As a physics case, we use the classification of the final state X produced in association with top quark-antiquark pairs in proton-proton collisions at the Large Hadron Collider at CERN, where X stands for a bottom quark-antiquark pair produced either non-resonantly or through the decay of an intermediately produced Z or Higgs boson.	翻訳日:2024-02-28 16:54:40 公開日:2024-02-27
# LLM支援意思決定の要因 Determinants of LLM-assisted Decision-Making ( http://arxiv.org/abs/2402.17385v1 ) ライセンス: Link先を確認	Eva Eigner and Thorsten Händler	(参考訳) 意思決定は日常生活の基本的な能力である。大規模言語モデル(LLM)は、人間の意思決定プロセスを強化するための多面的サポートを提供する。しかし, LLMによる意思決定の要因を理解することは, 個人がLLMが提供する利点を活用し, 関連リスクを最小限に抑えるために重要である。本研究は,包括的文献分析の結果を示し,LLM支援による意思決定に影響する決定要因の構造的概要と詳細な分析を行った。特に、透明性やプロンプトエンジニアリングなどのLLMの技術的側面、感情や意思決定スタイルなどの心理的要因、タスクの難易度や説明責任などの意思決定固有の決定要因の影響について検討する。加えて、決定要因が意思決定プロセスに与える影響は、複数のアプリケーションシナリオを通じて示されます。分析から,これらの決定要因間の相互依存性の観点から,可能な相互作用を体系化する依存フレームワークを開発した。本研究は, 各種決定因子との多面的相互作用により, LLMへの信頼や依存, ユーザのメンタルモデル, 情報処理特性などの要因が, LLMによる意思決定プロセスに影響を及ぼす重要な側面として認識されることを明らかにする。我々の発見は、人間とAIのコラボレーションにおける意思決定の質の向上、ユーザと組織双方の強化、より効果的なLLMインターフェースの設計に不可欠であると考えられる。さらに,本研究は,LLMによる意思決定の決定要因に関する今後の実証研究の基盤を提供する。 Decision-making is a fundamental capability in everyday life. Large Language Models (LLMs) provide multifaceted support in enhancing human decision-making processes. However, understanding the influencing factors of LLM-assisted decision-making is crucial for enabling individuals to utilize LLM-provided advantages and minimize associated risks in order to make more informed and better decisions. This study presents the results of a comprehensive literature analysis, providing a structural overview and detailed analysis of determinants impacting decision-making with LLM support. In particular, we explore the effects of technological aspects of LLMs, including transparency and prompt engineering, psychological factors such as emotions and decision-making styles, as well as decision-specific determinants such as task difficulty and accountability. In addition, the impact of the determinants on the decision-making process is illustrated via multiple application scenarios. Drawing from our analysis, we develop a dependency framework that systematizes possible interactions in terms of reciprocal interdependencies between these determinants. 
Our research reveals that, due to the multifaceted interactions with various determinants, factors such as trust in or reliance on LLMs, the user's mental model, and the characteristics of information processing are identified as significant aspects influencing LLM-assisted decision-making processes. Our findings can be seen as crucial for improving decision quality in human-AI collaboration, empowering both users and organizations, and designing more effective LLM interfaces. Additionally, our work provides a foundation for future empirical investigations on the determinants of decision-making assisted by LLMs.	翻訳日:2024-02-28 16:54:21 公開日:2024-02-27
# 近似複素振幅符号化によるVQEのウォームスタート Warm-Starting the VQE with Approximate Complex Amplitude Encoding ( http://arxiv.org/abs/2402.17378v1 ) ライセンス: Link先を確認	Felix Truger, Johanna Barzen, Frank Leymann, Julian Obst	(参考訳) 変分量子固有解法(VQE)は、量子力学系の基底状態を決定する変分量子アルゴリズム(VQA)である。 VQAとして、量子回路のパラメータ値を最適化するために古典的なコンピュータを使用している。しかしながら、VQEの各反復は様々な測定を必要とし、最適化は不毛の台地、局所的なミニマ、そしてその後の緩やかな収束などの障害にさらされる。本稿では,これらの効果を緩和するためのVQEの初期パラメータ値を近似を用いて生成するウォームスタート手法を提案する。ウォームスタートは、古典的な影からの忠実度推定を用いて複雑な振幅ベクトルを量子状態に符号化するVQAである。このようなウォームスタートは、古典近似アルゴリズムと量子アルゴリズムの実りある組み合わせへの道を開く。特に,本手法の評価は,従来のVQEよりも早く,より高品質なVQEが得られることを示す。 The Variational Quantum Eigensolver (VQE) is a Variational Quantum Algorithm (VQA) to determine the ground state of quantum-mechanical systems. As a VQA, it makes use of a classical computer to optimize parameter values for its quantum circuit. However, each iteration of the VQE requires a multitude of measurements, and the optimization is subject to obstructions, such as barren plateaus, local minima, and subsequently slow convergence. We propose a warm-starting technique, that utilizes an approximation to generate beneficial initial parameter values for the VQE aiming to mitigate these effects. The warm-start is based on Approximate Complex Amplitude Encoding, a VQA using fidelity estimations from classical shadows to encode complex amplitude vectors into quantum states. Such warm-starts open the path to fruitful combinations of classical approximation algorithms and quantum algorithms. In particular, the evaluation of our approach shows that the warm-started VQE reaches higher quality solutions earlier than the original VQE.	翻訳日:2024-02-28 16:53:55 公開日:2024-02-27
# KoDialogBench: 韓国語対話ベンチマークによる言語モデルの会話理解の評価 KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark ( http://arxiv.org/abs/2402.17377v1 ) ライセンス: Link先を確認	Seongbo Jang, Seonghyeon Lee, Hwanjo Yu	(参考訳) 言語モデルはしばしばチャットボットアシスタントとしてデプロイされるため、ユーザの母語で会話できることがモデルの美点となる。これらのモデルは幅広い言語で訓練されているが、韓国語のような低リソース言語における能力の総合的な評価は不足している。本研究では,韓国語における言語モデルの対話能力を評価するためのベンチマークであるKoDialogBenchを紹介する。この目的のために,日常的な話題に関する韓国語対話を公開資料から収集したり,他言語からの対話を翻訳したりする。次に,対話理解から応答選択タスクに至るまで,これらの会話をさまざまなテストデータセットに構成する。提案するベンチマークを用いて,韓国語対話の基盤的理解を測定するため,様々な言語モデルの広範な評価と分析を行う。実験の結果,モデルの会話能力には大きな改善の余地があることが示唆された。さらに, 異なる言語モデル間の詳細な比較では, 会話能力向上における最近の訓練手法の有効性が強調された。我々はKoDialogBenchが韓国語モデルの発展を促進することを期待する。 As language models are often deployed as chatbot assistants, it becomes a virtue for models to engage in conversations in a user's first language. While these models are trained on a wide range of languages, a comprehensive evaluation of their proficiency in low-resource languages such as Korean has been lacking. In this work, we introduce KoDialogBench, a benchmark designed to assess language models' conversational capabilities in Korean. To this end, we collect native Korean dialogues on daily topics from public sources, or translate dialogues from other languages. We then structure these conversations into diverse test datasets, spanning from dialogue comprehension to response selection tasks. Leveraging the proposed benchmark, we conduct extensive evaluations and analyses of various language models to measure a foundational understanding of Korean dialogues. Experimental results indicate that there exists significant room for improvement in models' conversation skills. Furthermore, our in-depth comparisons across different language models highlight the effectiveness of recent training techniques in enhancing conversational proficiency. We anticipate that KoDialogBench will promote the progress towards conversation-aware Korean language models.	翻訳日:2024-02-28 16:53:37 公開日:2024-02-27
# 最適時間ステップによる拡散サンプリングの高速化 Accelerating Diffusion Sampling with Optimized Time Steps ( http://arxiv.org/abs/2402.17376v1 ) ライセンス: Link先を確認	Shuchen Xue, Zhaoqiang Liu, Fei Chen, Shifeng Zhang, Tianyang Hu, Enze Xie, Zhenguo Li	(参考訳) 拡散確率モデル (DPMs) は高分解能画像合成において顕著な性能を示したが, 通常多くのサンプリングステップを要するため,サンプリング効率にはなお改善の余地がある。近年, DPM用高次数値ODEソルバの進歩により, はるかに少ないサンプリングステップで高品質画像の生成が可能となった。これは重要な発展であるが、ほとんどのサンプリング方法は依然として均一な時間ステップを用いており、これは少数のステップを使用する場合に最適ではない。この問題に対処するため,DPMの特定の数値ODEソルバに対して,より適切な時間ステップを求める最適化問題を設計するための一般的なフレームワークを提案する。この最適化問題は,ODEの真の解と数値解法に対応する近似解との距離を最小化することを目的としている。制約付き信頼領域法を用いて効率的に解くことができ、15秒未満で解ける。ピクセル空間と潜在空間のDPMを用いた無条件サンプリングと条件サンプリングの両方に関する広範な実験により,最先端のサンプリング手法であるUniPCと組み合わせることで,CIFAR-10やImageNetといったデータセットのFIDスコアの点で,最適化した時間ステップが均一な時間ステップに比べて画像生成性能を大幅に向上できることが示された。 Diffusion probabilistic models (DPMs) have shown remarkable performance in high-resolution image synthesis, but their sampling efficiency still leaves much to be desired due to the typically large number of sampling steps. Recent advancements in high-order numerical ODE solvers for DPMs have enabled the generation of high-quality images with far fewer sampling steps. While this is a significant development, most sampling methods still employ uniform time steps, which is not optimal when using a small number of steps. To address this issue, we propose a general framework for designing an optimization problem that seeks more appropriate time steps for a specific numerical ODE solver for DPMs. This optimization problem aims to minimize the distance between the ground-truth solution to the ODE and an approximate solution corresponding to the numerical solver. It can be efficiently solved using the constrained trust region method, taking less than $15$ seconds. 
Our extensive experiments on both unconditional and conditional sampling using pixel- and latent-space DPMs demonstrate that, when combined with the state-of-the-art sampling method UniPC, our optimized time steps significantly improve image generation performance in terms of FID scores for datasets such as CIFAR-10 and ImageNet, compared to using uniform time steps.	翻訳日:2024-02-28 16:53:21 公開日:2024-02-27
# 連続時間制御のための積分強化学習における計算の影響 Impact of Computation in Integral Reinforcement Learning for Continuous-Time Control ( http://arxiv.org/abs/2402.17375v1 ) ライセンス: Link先を確認	Wenhan Cao, Wei Pan	(参考訳) 積分強化学習(IntRL)は、その政策評価(PEV)段階における効用関数の積分の正確な計算を要求する。これは求積法,すなわち離散時間で得られた状態サンプルで評価した効用関数の重み付き和によって実現される。計算手法の選択(この場合は求積法)は制御性能に大きな影響を及ぼす可能性がある。この影響は、PEV段階で導入された計算誤差がポリシーイテレーションの収束挙動に影響し、結果として学習したコントローラに影響を与えるという事実に遡ることができる。計算が制御に与える影響を解明するために、IntRLのポリシーイテレーションと、ハミルトン・ヤコビ・ベルマン方程式に適用したニュートン法との類似性を示す。この観点では、PEVの計算誤差はニュートン法の各反復において余分な誤差項として表され、その上限は計算誤差に比例する。さらに、効用関数が再生核ヒルベルト空間 (RKHS) に属するとき、RKHSを誘導するカーネル関数を用いたベイズ求積によって最適な求積が達成可能であることを示す。また、台形則およびMatérnカーネルを用いたベイズ求積によるIntRLの局所収束率が、それぞれ $O(N^{-2})$ と $O(N^{-b})$ であることを証明する。ここで $N$ は等間隔のサンプルの数、$b$ はMatérnカーネルの滑らかさパラメータである。これらの理論的な発見は、2つの標準制御タスクによって最終的に検証される。 Integral reinforcement learning (IntRL) demands the precise computation of the utility function's integral at its policy evaluation (PEV) stage. This is achieved through quadrature rules, which are weighted sums of utility functions evaluated from state samples obtained in discrete time. Our research reveals a critical yet underexplored phenomenon: the choice of the computational method (in this case, the quadrature rule) can significantly impact control performance. This impact is traced back to the fact that computational errors introduced in the PEV stage can affect the policy iteration's convergence behavior, which in turn affects the learned controller. To elucidate how computation impacts control, we draw a parallel between IntRL's policy iteration and Newton's method applied to the Hamilton-Jacobi-Bellman equation. In this light, computational error in PEV manifests as an extra error term in each iteration of Newton's method, with its upper bound proportional to the computational error. 
Further, we demonstrate that when the utility function resides in a reproducing kernel Hilbert space (RKHS), the optimal quadrature is achievable by employing Bayesian quadrature with the RKHS-inducing kernel function. We prove that the local convergence rates for IntRL using the trapezoidal rule and Bayesian quadrature with a Matérn kernel are $O(N^{-2})$ and $O(N^{-b})$, respectively, where $N$ is the number of evenly-spaced samples and $b$ is the Matérn kernel's smoothness parameter. These theoretical findings are finally validated by two canonical control tasks.	翻訳日:2024-02-28 16:52:59 公開日:2024-02-27
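
The $O(N^{-2})$ rate quoted above for the trapezoidal rule can be checked numerically on the quadrature error itself. The Python sketch below only illustrates the error of the integral approximation, which the abstract says propagates into the policy iteration; it does not implement IntRL, and the integrand `math.sin` on $[0, 1]$ is an arbitrary smooth stand-in for a utility function.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule using n evenly spaced samples (n >= 2)."""
    h = (b - a) / (n - 1)
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n - 1):
        total += f(a + i * h)
    return total * h


exact = 1.0 - math.cos(1.0)  # integral of sin(x) over [0, 1]
sample_counts = (11, 21, 41, 81)  # each step halves the spacing h
errors = {n: abs(trapezoid(math.sin, 0.0, 1.0, n) - exact) for n in sample_counts}
ratios = [errors[11] / errors[21], errors[21] / errors[41], errors[41] / errors[81]]
```

Each halving of the sample spacing should shrink the error by a factor close to 4, so the entries of `ratios` cluster around 4, which is consistent with second-order convergence in the number of samples.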
# 局所認識3次元剛点クラウドマッチングのための結合ラプラス固有写像 Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching ( http://arxiv.org/abs/2402.17372v1 ) ライセンス: Link先を確認	Matteo Bastico, Etienne Decencière, Laurent Corté, Yannick Tillier, David Ryckelynck	(参考訳) コンピュータビジョン、医療、ロボット分野において重要な技術であるポイントクラウドマッチングは、ポイントクラウドまたはボクセルのペア間の対応を見つけることに関心がある。いくつかの実践シナリオでは、正確なマッチングを正確に識別するためには、局所的な差異を強調することが重要である。一般的に使われる形状記述子にはいくつかの制限があり、ペアのジオメトリに関する意味のある局所的洞察を提供しないことが多い。本研究では,グラフラプラシアン固有写像を基礎として,細かな局所構造を考慮した点雲のマッチングを行う新しい手法を提案する。ラプラシアン固有写像の順序と符号のあいまいさに対処するために、複数の厳密に登録された幾何学に対して、アライメントされた固有空間を容易に生成できる結合ラプラシアンと呼ばれる新しい作用素を導入する。これらの高次元空間間の類似性は、形状に一致するような局所的な意味のあるスコアを与えることを示す。本稿ではまず,MVTec 3D-ADデータセットを用いて,物体の異常な位置決めのタスクに着目し,提案手法の性能をポイントワイズで評価する。さらに,結合した固有空間から得られるグローバルな類似度スコアを用いて,BSE(Automatic Bone Side Estimation)と呼ばれる新しい医療タスクを定義する。そこで本研究では,様々な公共データセットから骨表面構造を収集するベンチマークを提案する。 Coupled Laplacianをベースとしたマッチング手法は,両タスクの精度向上によって,他の手法よりも優れている。実験を再現するコードは https://github.com/matteo-bastico/CoupledLaplacian と Supplementary Code で公開されている。 Point cloud matching, a crucial technique in computer vision, medicine, and robotics, is primarily concerned with finding correspondences between pairs of point clouds or voxels. In some practical scenarios, emphasizing local differences is crucial for accurately identifying a correct match, thereby enhancing the overall robustness and reliability of the matching process. Commonly used shape descriptors have several limitations and often fail to provide meaningful local insights on the paired geometries. In this work, we propose a new technique, based on graph Laplacian eigenmaps, to match point clouds by taking into account fine local structures. To deal with the order and sign ambiguity of Laplacian eigenmaps, we introduce a new operator, called Coupled Laplacian, that makes it easy to generate aligned eigenspaces for multiple rigidly-registered geometries. 
We show that the similarity between those aligned high-dimensional spaces provides a locally meaningful score to match shapes. We initially evaluate the performance of the proposed technique in a point-wise manner, specifically focusing on the task of object anomaly localization using the MVTec 3D-AD dataset. Additionally, we define a new medical task, called automatic Bone Side Estimation (BSE), which we address through a global similarity score derived from coupled eigenspaces. In order to test it, we propose a benchmark collecting bone surface structures from various public datasets. Our matching technique, based on Coupled Laplacian, outperforms other methods by reaching an impressive accuracy on both tasks. The code to reproduce our experiments is publicly available at https://github.com/matteo-bastico/CoupledLaplacian and in the Supplementary Code.	翻訳日:2024-02-28 16:52:26 公開日:2024-02-27
# 初期中世ヘブライ語詩におけるメタファー検出のためのデータセット A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry ( http://arxiv.org/abs/2402.17371v1 ) ライセンス: Link先を確認	Michael Toker, Oren Mishali, Ophir Münz-Manor, Benny Kimelfeld, Yonatan Belinkov	(参考訳) 古代後期と中世のヘブライ語のテキストが多数ある。これらは聖書ヘブライ語と現代ヘブライ語の間の重要な言語的・文化的な橋渡しである。詩はこれらのテキストで顕著であり、その主な特徴の1つはメタファの頻繁な使用である。比喩的な言語使用と字義的な言語使用を区別することは、人文科学、特に文学、言語学、解釈学の分野の学者にとって大きな課題である。本稿では,古代後期と中世のヘブライ詩に専門家によるメタファのアノテーションを付与した,新しく挑戦的なデータセットと,いくつかのベースライン結果を提示し,この領域におけるさらなる研究の促進を期待する。 There is a large volume of late antique and medieval Hebrew texts. They represent a crucial linguistic and cultural bridge between Biblical and modern Hebrew. Poetry is prominent in these texts and one of its main characteristics is the frequent use of metaphor. Distinguishing figurative and literal language use is a major task for scholars of the Humanities, especially in the fields of literature, linguistics, and hermeneutics. This paper presents a new, challenging dataset of late antique and medieval Hebrew poetry with expert annotations of metaphor, as well as some baseline results, which we hope will facilitate further research in this area.	翻訳日:2024-02-28 16:51:58 公開日:2024-02-27
# 曖昧な境界を持つ鉱石画像に対する効率的なMLPに基づくポイント誘導セグメンテーションネットワーク An Efficient MLP-based Point-guided Segmentation Network for Ore Images with Ambiguous Boundary ( http://arxiv.org/abs/2402.17370v1 ) ライセンス: Link先を確認	Guodong Sun, Yuting Peng, Le Cheng, Mengya Xu, An Wang, Bo Wu, Hongliang Ren, Yang Zhang	(参考訳) 鉱石画像の正確なセグメンテーションは、選鉱プロセスの実行の成功に不可欠である。低コントラストと不明瞭な境界につながる鉱石の均一な外観のため、正確なセグメンテーションが困難となり、認識が問題となる。本稿では,エッジのぼやけの問題を解決することを目的とした,MLP(Multi-Layer Perceptron)に基づく軽量フレームワークを提案する。具体的には,低レベル機能の効率的な抽出に適した軽量なバックボーンを導入する。さらに,局所情報とグローバル情報のバランスをとる2つのMLP構造からなる特徴ピラミッドネットワークを設計し,検出精度を向上させる。さらに,予測点をインスタンスのエッジ点と一致させて明確なオブジェクト境界を実現するための新たな損失関数を提案する。提案手法の有効性を検証するために広範な実験を行った。提案手法は,モデルサイズがわずか73MBで,27フレーム/秒(FPS)以上の処理速度を実現する。さらに,鉱石画像データセットでの評価において,本手法は $AP_{50}^{box}$ で60.4,$AP_{50}^{mask}$ で48.9を達成し,現在の最先端技術と比較しても一貫して高い精度を示す。ソースコードは https://github.com/MVME-HBUT/ORENEXT で公開される。 The precise segmentation of ore images is critical to the successful execution of the beneficiation process. Due to the homogeneous appearance of the ores, which leads to low contrast and unclear boundaries, accurate segmentation becomes challenging, and recognition becomes problematic. This paper proposes a lightweight framework based on Multi-Layer Perceptron (MLP), which focuses on solving the problem of edge blurring. Specifically, we introduce a lightweight backbone better suited for efficiently extracting low-level features. In addition, we design a feature pyramid network consisting of two MLP structures that balance local and global information, thus enhancing detection accuracy. Furthermore, we propose a novel loss function that guides the prediction points to match the instance edge points to achieve clear object boundaries. We have conducted extensive experiments to validate the efficacy of our proposed method. Our approach achieves a remarkable processing speed of over 27 frames per second (FPS) with a model size of only 73 MB. 
Moreover, when tested on the ore image dataset, our method achieves consistently high accuracy compared with current state-of-the-art techniques, scoring 60.4 in $AP_{50}^{box}$ and 48.9 in $AP_{50}^{mask}$. The source code will be released at https://github.com/MVME-HBUT/ORENEXT.	翻訳日:2024-02-28 16:51:35 公開日:2024-02-27
# 高品質なトーキングヘッド合成のための動的テトラヘドラの学習 Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis ( http://arxiv.org/abs/2402.17364v1 ) ライセンス: Link先を確認	Zicheng Zhang, Ruobing Zheng, Ziwen Liu, Congying Han, Tianqi Li, Meng Wang, Tiande Guo, Jingdong Chen, Bonan Li, Ming Yang	(参考訳) ニューラル・ラジアンス・フィールド(Neural Radiance Fields、NeRF)のような暗黙の表現における最近の研究は、ビデオシーケンスから現実的でアニマタブルな頭部アバターの生成を進歩させている。しかし、明示的な幾何学的制約の欠如が複雑な顔の変形を正確にモデル化する上で根本的な課題となるため、これらの暗黙的手法は依然として視覚的アーティファクトやジッタに悩まされている。本稿では、ニューラルネットワークによって明示的な動的メッシュを符号化し、様々な動きや視点の幾何的整合性を確保する新しいハイブリッド表現であるDynamic Tetrahedra(DynTet)を紹介する。 DynTetは、符号付き距離、変形、材料テクスチャを学習し、トレーニングデータを予め定義されたテトラヘドラグリッドに固定する座標ベースのネットワークによってパラメータ化される。マーチング・テトラヘドラを利用することで、DynTetはテクスチャメッシュを一貫したトポロジで効率的にデコードし、微分可能なラスタライザによる高速レンダリングとピクセルロスによる監督を可能にする。学習効率を向上させるため,幾何学的学習を容易にする古典的な3D Morphable Modelを組み込み,テクスチャ学習を簡略化するための標準空間を定義する。これらの利点は、DynTetで使われる効果的な幾何学的表現によって容易に達成できる。以前の作品と比較すると、DynTetはさまざまなメトリクスにおいて忠実度、唇の同期、リアルタイムパフォーマンスの大幅な改善を示している。安定して視覚的に魅力的な合成ビデオを生成するだけでなく、多くの新興アプリケーションを可能にすることが期待される動的メッシュも出力する。 Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences. These implicit methods are still confronted by visual artifacts and jitters, since the lack of explicit geometric constraints poses a fundamental challenge in accurately modeling complex facial deformations. In this paper, we introduce Dynamic Tetrahedra (DynTet), a novel hybrid representation that encodes explicit dynamic meshes by neural networks to ensure geometric consistency across various motions and viewpoints. DynTet is parameterized by the coordinate-based networks which learn signed distance, deformation, and material texture, anchoring the training data into a predefined tetrahedra grid. Leveraging Marching Tetrahedra, DynTet efficiently decodes textured meshes with a consistent topology, enabling fast rendering through a differentiable rasterizer and supervision via a pixel loss. 
To enhance training efficiency, we incorporate classical 3D Morphable Models to facilitate geometry learning and define a canonical space for simplifying texture learning. These advantages are readily achievable owing to the effective geometric representation employed in DynTet. Compared with prior works, DynTet demonstrates significant improvements in fidelity, lip synchronization, and real-time performance according to various metrics. Beyond producing stable and visually appealing synthesis videos, our method also outputs dynamic meshes, which promise to enable many emerging applications.	翻訳日:2024-02-28 16:51:11 公開日:2024-02-27
# 量子多体断熱性のレート関数モデリング Rate Function Modelling of Quantum Many-Body Adiabaticity ( http://arxiv.org/abs/2402.17415v1 ) ライセンス: Link先を確認	Vibhu Mishra, Salvatore Manamana, Stefan Kehrein	(参考訳) 量子断熱定理 (quantum adiabatic theorem) は、量子力学における基礎的な結果であり、理論と実用の両方に多くの応用がある。本研究では, オブザーバブルに依存しない示強的な量の性質を解析することで, 相互作用する量子多体系における断熱過程のダイナミクスについて検討する。特に、レート関数 $f(T)$ のランプ時間 $T$ への依存性を調べる。これにより、多体断熱忠実度を $T$ とパラメータ変位の強さ $\Delta \lambda$ の関数として完全に特徴づけることができる。その結果、多体系における断熱性の概念を制御および定義することができる。熱力学極限と断熱極限の相互作用に関する文献のいくつかの重要な結果は、大きな $T$ の極限における $f(T)$ の性質からの帰結として得られる。 The quantum adiabatic theorem is a fundamental result in quantum mechanics, which has a multitude of applications, both theoretical and practical. Here, we investigate the dynamics of adiabatic processes for interacting quantum many-body systems by analysing the properties of observable-free, intensive quantities. In particular, we study the rate function $f(T)$ and its dependence on the ramp time $T$, which gives us a complete characterization of the many-body adiabatic fidelity as a function of $T$ and the strength of the parameter displacement $\Delta \lambda$. This allows us to control and define the notion of adiabaticity in many-body systems. Several key results in the literature regarding the interplay of the thermodynamic and the adiabatic limit are obtained as inferences from the properties of $f(T)$ in the large $T$ limit.	翻訳日:2024-02-28 16:48:15 公開日:2024-02-27
# 特徴変調によるニューラルビデオ圧縮 Neural Video Compression with Feature Modulation ( http://arxiv.org/abs/2402.17414v1 ) ライセンス: Link先を確認	Jiahao Li, Bin Li, Yan Lu	(参考訳) 新たな条件付きコーディングベースのニューラルビデオコーデック(NVC)は、一般的に使用されている残留コーディングベースのコーデックよりも優れている。しかし、NVCの実用性を阻害する重大な問題がある。本稿では,特徴変調による2つの重要な問題を解く,条件付き符号化に基づく強力なNVCを提案する。ひとつは、単一のモデルで幅広い品質範囲をサポートする方法です。以前のNVCでは、平均で約3.8dBのPSNRしかサポートしていない。この制限に対処するため、学習可能な量子化スケーラを用いて現在のフレームの潜時特性を変調する。本研究では,符号化と量子化の調和を改善するために,一様量子化パラメータサンプリング機構を特別に設計する。これにより、量子化スケーラの学習が向上し、NVCが約11.4dBのPSNRの範囲をサポートするのに役立ちます。 2つ目は、NVCを長い予測チェーンの下で機能させる方法だ。我々は, 従来のSOTA NVCは, 時間内設定が大きい場合に, 明らかに品質劣化の問題があることを明らかにした。そこで本研究では,品質向上のための周期的リフレッシュ機構による時間的特徴の変調を提案する。 % 以上の2つの問題を解決する一方で,RGB と YUV のカラースペースをサポートする単一モデルも設計する。特に,フレーム内の単一設定では,従来のSOTA NVCよりも29.7\%のビットレートを削減でき,MACは16\%減少する。私たちのコーデックは、NVC進化の旅で目立ったランドマークとなります。コードはhttps://github.com/microsoft/DCVCにある。 The emerging conditional coding-based neural video codec (NVC) shows superiority over commonly-used residual coding-based codec and the latest NVC already claims to outperform the best traditional codec. However, there still exist critical problems blocking the practicality of NVC. In this paper, we propose a powerful conditional coding-based NVC that solves two critical problems via feature modulation. The first is how to support a wide quality range in a single model. Previous NVC with this capability only supports about 3.8 dB PSNR range on average. To tackle this limitation, we modulate the latent feature of the current frame via the learnable quantization scaler. During the training, we specially design the uniform quantization parameter sampling mechanism to improve the harmonization of encoding and quantization. This results in a better learning of the quantization scaler and helps our NVC support about 11.4 dB PSNR range. The second is how to make NVC still work under a long prediction chain. We expose that the previous SOTA NVC has an obvious quality degradation problem when using a large intra-period setting. 
To this end, we propose modulating the temporal feature with a periodically refreshing mechanism to boost the quality. Besides solving the above two problems, we also design a single model that can support both RGB and YUV colorspaces. Notably, under single intra-frame setting, our codec can achieve 29.7% bitrate saving over previous SOTA NVC with 16% MACs reduction. Our codec serves as a notable landmark in the journey of NVC evolution. The codes are at https://github.com/microsoft/DCVC.	翻訳日:2024-02-28 16:48:01 公開日:2024-02-27
# DiffuseKronA:パーソナライズド拡散モデルのためのパラメータ効率的なファインチューニング法 DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model ( http://arxiv.org/abs/2402.17412v1 ) ライセンス: Link先を確認	Shyam Marjit, Harshit Singh, Nityanand Mathur, Sayak Paul, Chia-Mu Yu, Pin-Yu Chen	(参考訳) 近年のDreamBoothやBLIP-Diffusionのような対象駆動型テキスト・トゥ・イメージ(T2I)生成モデルは、印象的な結果をもたらした一方で、負荷の高い微調整とかなりのパラメータ数を必要とするため、限界に直面している。 DreamBooth内のローランク適応(LoRA)モジュールはトレーニング可能なパラメータの削減を提供するが、ハイパーパラメータに顕著な感度を導入し、パラメータ効率とT2Iパーソナライズされた画像合成の品質の妥協につながった。これらの制約に対処するため,LoRA-DreamBooth およびオリジナルの DreamBooth と比較してパラメータ数をそれぞれ35%,99.947%と大幅に削減するだけでなく,画像合成の品質も高める,新しいクロネッカー積に基づく適応モジュールである DiffuseKronA を導入する。重要なことに、 DiffuseKronA はハイパーパラメータ感度の問題を緩和し、幅広いハイパーパラメータにわたって一貫した高品質な生成を提供し、大規模な微調整の必要性を低減する。さらに、より制御可能な分解により、 DiffuseKronA はより解釈しやすくなり、LoRA-DreamBooth に匹敵する結果で最大 50% の削減を達成できる。多様な複雑な入力画像やテキストプロンプトに対して評価された DiffuseKronA は、既存のモデルよりも一貫して優れており、改良された忠実さとオブジェクトのより正確な色分布を持つ高品質の多様な画像を生成する。コードへのリンクと事前トレーニングされたチェックポイントからなる私たちのプロジェクトページは、 https://diffusekrona.github.io/ で利用可能です。 In the realm of subject-driven text-to-image (T2I) generative models, recent developments like DreamBooth and BLIP-Diffusion have led to impressive results yet encounter limitations due to their intensive fine-tuning demands and substantial parameter requirements. While the low-rank adaptation (LoRA) module within DreamBooth offers a reduction in trainable parameters, it introduces a pronounced sensitivity to hyperparameters, leading to a compromise between parameter efficiency and the quality of T2I personalized image synthesis. Addressing these constraints, we introduce DiffuseKronA, a novel Kronecker product-based adaptation module that not only significantly reduces the parameter count by 35% and 99.947% compared to LoRA-DreamBooth and the original DreamBooth, respectively, but also enhances the quality of image synthesis. 
Crucially, DiffuseKronA mitigates the issue of hyperparameter sensitivity, delivering consistent high-quality generations across a wide range of hyperparameters, thereby diminishing the necessity for extensive fine-tuning. Furthermore, a more controllable decomposition makes DiffuseKronA more interpretable, and it can even achieve up to a 50% reduction with results comparable to LoRA-DreamBooth. Evaluated against diverse and complex input images and text prompts, DiffuseKronA consistently outperforms existing models, producing diverse images of higher quality with improved fidelity and a more accurate color distribution of objects, all the while upholding exceptional parameter efficiency, thus presenting a substantial advancement in the field of T2I generative modeling. Our project page, consisting of links to the code and pre-trained checkpoints, is available at https://diffusekrona.github.io/.	翻訳日:2024-02-28 16:47:39 公開日:2024-02-27
# 一貫性の問題 - ブラックボックスの観点からのLLMの一貫性を探る Consistency Matters: Explore LLMs Consistency From a Black-Box Perspective ( http://arxiv.org/abs/2402.17411v1 ) ライセンス: Link先を確認	Fufangchen Zhao, Guoqiang Jin, Jiaheng Huang, Rui Zhao and Fei Tan	(参考訳) 現在、商用とオープンソースの両方の学術的 LLM が NLP の主流となっている。しかし、LLMの一貫性、すなわちLLMの研究と展開の様々な段階を通じて内部のパラメータと能力が変わらないことに関する研究は、まだ不足している。この問題は産業と学術の両方に存在している。この問題に対する解決策は、しばしば時間と労力を要し、また二次配備の追加コストがあり、結果として経済的および時間的損失が生じる。このギャップを埋めるために、LLM一貫性タスクデータセットを構築し、いくつかのベースラインを設計する。さらに,本実験では,様々なスケールのモデルを選択する。具体的には、LightGBM実験において、従来のNLGメトリクス(ROUGE、BLEU、METEOR)をモデルトレーニングに必要な特徴量として使用しました。最終結果は、手動評価とGPT3.5、およびメイン実験における他のモデルを超え、最高の性能を達成する。最終的には、最高のパフォーマンスのLightGBMモデルをベースモデルとして使用して評価ツールを構築し、ビジネスモデルの展開を効果的に支援します。私たちのコードとツールのデモは https://github.com/heavenhellchen/Consistency.git で利用可能です。 Nowadays both commercial and open-source academic LLMs have become the mainstream models of NLP. However, there is still a lack of research on LLM consistency, meaning that throughout the various stages of LLM research and deployment, its internal parameters and capabilities should remain unchanged. This issue exists in both the industrial and academic sectors. The solution to this problem is often time-consuming and labor-intensive, and there is also an additional cost of secondary deployment, resulting in economic and time losses. To fill this gap, we build an LLM consistency task dataset and design several baselines. Additionally, we choose models of diverse scales for the main experiments. Specifically, in the LightGBM experiment, we used traditional NLG metrics (i.e., ROUGE, BLEU, METEOR) as the features needed for model training. The final result exceeds the manual evaluation and GPT3.5 as well as other models in the main experiment, achieving the best performance. In the end, we use the best performing LightGBM model as the base model to build the evaluation tool, which can effectively assist in the deployment of business models. 
Our code and tool demo are available at https://github.com/heavenhellchen/Consistency.git	翻訳日:2024-02-28 16:46:57 公開日:2024-02-27
# ノイズ伝搬解析のためのフーリエ領域補間ニューラルネットワークの新たな画像空間形式 A novel image space formalism of Fourier domain interpolation neural networks for noise propagation analysis ( http://arxiv.org/abs/2402.17410v1 ) ライセンス: Link先を確認	Peter Dawood, Felix Breuer, Istvan Homolya, Jannik Stebani, Maximilian Gram, Peter M. Jakob, Moritz Zaiss, Martin Blaimer	(参考訳) 目的:MRI再構成におけるフーリエ領域補間のための多層畳み込みニューラルネットワーク(CNN)の画像空間定式化とCNN推論時の雑音伝搬を解析的に推定する。理論と方法:複素数値整流器線形単位を用いたフーリエ領域(k-空間としても知られる)の非線形活性化は、活性化マスクとの要素積として表される。この操作は、画像空間内の畳み込みに変換される。 k-空間でのネットワークトレーニングの後、この手法は、画像空間内のネットワークへの入力テンソルとして機能する、エイリアスコイル画像に対する再構成画像の微分に対する代数的表現を提供する。これにより、ネットワーク推論のばらつきを解析的に推定し、ノイズ特性を記述することができる。自動微分に基づくモンテカルロシミュレーションと数値手法を用いて検証を行った。このフレームワークは、invivoの脳画像の逆行例でテストされた。結果: 画像領域で実行される推論は、k-空間における推論と準同一性を持ち、対応する量的指標によって導かれる。解析式から得られたノイズ分散マップは,モンテカルロシミュレーションおよび自動微分法により得られたものに対応する。ノイズレジリエンスは、古典的並列画像の場合のように、よく特徴付けられる。コモルゴロフ・スミルノフ実験はモンテカルロシミュレーションにより得られた分散写像におけるボクセル等級のガウス分布を実証した。結論:k空間補間のためのニューラルネットワークの準同値画像空間形式は、従来の並列画像法における幾何因子写像と類似したcnn推論中のノイズ特性の高速かつ正確な記述を可能にする。 Purpose: To develop an image space formalism of multi-layer convolutional neural networks (CNNs) for Fourier domain interpolation in MRI reconstructions and analytically estimate noise propagation during CNN inference. Theory and Methods: Nonlinear activations in the Fourier domain (also known as k-space) using complex-valued Rectifier Linear Units are expressed as elementwise multiplication with activation masks. This operation is transformed into a convolution in the image space. After network training in k-space, this approach provides an algebraic expression for the derivative of the reconstructed image with respect to the aliased coil images, which serve as the input tensors to the network in the image space. This allows the variance in the network inference to be estimated analytically and to be used to describe noise characteristics. Monte-Carlo simulations and numerical approaches based on auto-differentiation were used for validation. 
The framework was tested on retrospectively undersampled in vivo brain images. Results: Inferences conducted in the image domain are quasi-identical to inferences in the k-space, underlined by corresponding quantitative metrics. Noise variance maps obtained from the analytical expression correspond with those obtained via Monte-Carlo simulations, as well as via an auto-differentiation approach. The noise resilience is well characterized, as in the case of classical Parallel Imaging. Kolmogorov-Smirnov tests demonstrate Gaussian distributions of voxel magnitudes in variance maps obtained via Monte-Carlo simulations. Conclusion: The quasi-equivalent image space formalism for neural networks for k-space interpolation enables fast and accurate description of the noise characteristics during CNN inference, analogous to geometry-factor maps in traditional parallel imaging methods.	翻訳日:2024-02-28 16:46:37 公開日:2024-02-27
# 超高速クロック周波数量子プロセッサのための非ガウス量子状態の高速生成と状態トモグラフィー High-rate Generation and State Tomography of Non-Gaussian Quantum States for Ultra-fast Clock Frequency Quantum Processors ( http://arxiv.org/abs/2402.17408v1 ) ライセンス: Link先を確認	Akito Kawasaki, Ryuhoh Ide, Hector Brunel, Takumi Suzuki, Rajveer Nehra, Katsuki Nakashima, Takahiro Kashiwazaki, Asuka Inoue, Takeshi Umeki, Fumihiro China, Masahiro Yabuno, Shigehito Miki, Hirotaka Terai, Taichi Yamashima, Atsushi Sakaguchi, Kan Takase, Mamoru Endo, Warit Asavanant, Akira Furusawa	(参考訳) 量子情報プロセッサは、デコヒーレンスによって洗い流される前に量子の利点を十分に活用するため、高クロック周波数の恩恵を受ける。この目的において、全光学系は、固有の100 THzキャリア周波数という独自の利点があり、THzクロック周波数プロセッサの開発を可能にする。実際には、量子光源と測定装置の帯域幅はMHz範囲に制限されており、非古典状態の生成速度はkHzオーダーに制限されている。これは達成可能な値のごく一部にすぎない。本研究では,光パラメトリック増幅器(OPA)をスクイーズド光源として,また光位相感応増幅器(PSA)として利用することでこの制限を超え,広帯域非ガウス状態の高速生成とその量子トモグラフィーを実現する。我々の状態生成・測定システムは,6 THzスクイーズド光源,6 THz PSA,66 GHzホモダイン検出器からなる。このシステムでは、連続波レーザーを用いたサブナノ秒波束による非ガウス状態生成を0.9 MHzのレート(現在の最先端実験より約3桁高い)で実証することに成功した。性能は光学系や電子系ではなく、超伝導検出器のジッタのみによって制限されており、これが現在、スクイーズド光の利用可能な帯域幅を1 GHzに制限している。したがって、超伝導検出器のタイミングジッタの制限を克服できれば、光量子プロセッサにおけるGHzレート、さらにはTHzレートでの非ガウス状態の生成と検出が、OPAによって可能になるかもしれない。 Quantum information processors greatly benefit from high clock frequency to fully harness the quantum advantages before they get washed out by the decoherence. In this pursuit, all-optical systems offer unique advantages due to their inherent 100 THz carrier frequency, permitting one to develop THz clock frequency processors. In practice, the bandwidth of the quantum light sources and the measurement devices has been limited to the MHz range and the generation rate of nonclassical states to kHz order -- a tiny fraction of what can be achieved. In this work, we go beyond this limitation by utilizing an optical parametric amplifier (OPA) as a squeezed-light source and optical phase-sensitive amplifiers (PSA) to realize high-rate generation of broadband non-Gaussian states and their quantum tomography. 
Our state generation and measurement system consists of a 6-THz squeezed-light source, a 6-THz PSA, and a 66-GHz homodyne detector. With this system, we have successfully demonstrated non-Gaussian state generation at a 0.9 MHz rate -- almost three orders of magnitude higher than the current state-of-the-art experiments -- with a sub-nanosecond wave packet using a continuous-wave laser. The performance is constrained only by the superconducting detector's jitter, which currently limits the usable bandwidth of the squeezed light to 1 GHz, rather than by the optical and electronic systems. Therefore, if we can overcome the limitation of the timing jitter of the superconducting detector, non-Gaussian state generation and detection at GHz rate, or even THz rate, for optical quantum processors might be possible with OPAs.	翻訳日:2024-02-28 16:46:10 公開日:2024-02-27
# アルゴリズム問題解決のためのニューラルネットワーク書き換えシステム A Neural Rewriting System to Solve Algorithmic Problems ( http://arxiv.org/abs/2402.17407v1 ) ライセンス: Link先を確認	Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti	(参考訳) 現代のニューラルネットワークアーキテクチャは、分布外の問題インスタンスを解決するために構成規則を体系的に適用する必要があるアルゴリズムの手順を学ぶのに未だに苦労している。本研究では,記号型人工知能の古典的枠組みである書き換えシステムに触発された,アルゴリズムタスクを学習する独自の手法を提案する。書き換えシステムは,特殊化されたモジュールからなるニューラルアーキテクチャとして実装できることを示す。すなわち,処理対象の部分式を識別するセレクタ,対応する結果を計算することで部分式を単純化するソルバ,部分式を与えられた解に置き換えて元の式の新しいバージョンを生成するコンバイナである。我々は、リスト、算術、代数式を含む記号式を単純化する必要のある3種類のアルゴリズム的タスクでモデルを評価する。提案アーキテクチャの外挿能力を,訓練中に見られたものよりも多くのオペランドとネストレベルを含む式を用いて検証し,その性能を,体系的一般化に特化した最近のモデルであるNeural Data Router,および高度なプロンプト戦略で探索された最先端の大規模言語モデル(GPT-4)と比較した。 Modern neural network architectures still struggle to learn algorithmic procedures that require systematically applying compositional rules to solve out-of-distribution problem instances. In this work, we propose an original approach to learn algorithmic tasks inspired by rewriting systems, a classic framework in symbolic artificial intelligence. We show that a rewriting system can be implemented as a neural architecture composed of specialized modules: the Selector identifies the target sub-expression to process, the Solver simplifies the sub-expression by computing the corresponding result, and the Combiner produces a new version of the original expression by replacing the sub-expression with the solution provided. We evaluate our model on three types of algorithmic tasks that require simplifying symbolic formulas involving lists, arithmetic, and algebraic expressions. We test the extrapolation capabilities of the proposed architecture using formulas involving a higher number of operands and nesting levels than those seen during training, and we benchmark its performance against the Neural Data Router, a recent model specialized for systematic generalization, and a state-of-the-art large language model (GPT-4) probed with advanced prompting strategies.	翻訳日:2024-02-28 16:45:22 公開日:2024-02-27
# LSPT:視覚表現学習のための長期空間プロンプトチューニング LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning ( http://arxiv.org/abs/2402.17406v1 ) ライセンス: Link先を確認	Shentong Mo, Yansen Wang, Xufang Luo, Dongsheng Li	(参考訳) ビジュアルプロンプトチューニング(VPT)技術は、事前訓練された視覚変換器(ViT)をプロンプトと呼ばれる特別な学習可能なトークンを使用して下流の視覚タスクに適応させる能力で有名になった。現代のVPT方法論、特に自己監督型視覚変換器を使用する場合、しばしば新しい学習可能なプロンプトを導入するか、モデルの以前のブロックから主に引き出されたプロンプトトークンをゲートする。このようなアプローチにおける重要な監視は、各自己監督型ViT内のプロンプトの源として、長距離前のブロックの可能性を利用することができないことである。この重要なギャップを埋めるために、視覚表現学習の革新的アプローチであるLSPT(Long-term Spatial Prompt Tuning)を導入する。 LSPTは人間の脳の複雑さからインスピレーションを得て、長期のゲートプロンプトを巧みに取り入れている。この機能は時間的コーディングとして機能し、以前のブロックから取得したパラメータを忘れるリスクを抑制する。 LSPTはその技術をさらに強化し、空間符号化としてプレイパッチトークンを導入している。戦略的には、クラスを意識した特徴を永久に蓄積し、視覚カテゴリーの識別と識別におけるモデルの長所を固めるように設計されている。提案手法の有効性を検証するため、5つのFGVCと19のVTAB-1Kベンチマークで厳密な実験を行った。実験の結果,lsptの優位性が強調され,視覚プロンプトチューニング性能における新しいベンチマークの設定能力が示された。 Visual Prompt Tuning (VPT) techniques have gained prominence for their capacity to adapt pre-trained Vision Transformers (ViTs) to downstream visual tasks using specialized learnable tokens termed as prompts. Contemporary VPT methodologies, especially when employed with self-supervised vision transformers, often default to the introduction of new learnable prompts or gated prompt tokens predominantly sourced from the model's previous block. A pivotal oversight in such approaches is their failure to harness the potential of long-range previous blocks as sources of prompts within each self-supervised ViT. To bridge this crucial gap, we introduce Long-term Spatial Prompt Tuning (LSPT) - a revolutionary approach to visual representation learning. Drawing inspiration from the intricacies of the human brain, LSPT ingeniously incorporates long-term gated prompts. This feature serves as temporal coding, curbing the risk of forgetting parameters acquired from earlier blocks. Further enhancing its prowess, LSPT brings into play patch tokens, serving as spatial coding. 
This is strategically designed to perpetually amass class-conscious features, thereby fortifying the model's prowess in distinguishing and identifying visual categories. To validate the efficacy of our proposed method, we engaged in rigorous experimentation across 5 FGVC and 19 VTAB-1K benchmarks. Our empirical findings underscore the superiority of LSPT, showcasing its ability to set new benchmarks in visual prompt tuning performance.	翻訳日:2024-02-28 16:44:45 公開日:2024-02-27
# Soraは驚くべき幾何学的一貫性を持つビデオを生成する Sora Generates Videos with Stunning Geometrical Consistency ( http://arxiv.org/abs/2402.17403v1 ) ライセンス: Link先を確認	Xuanyi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou and Ming-Ming Cheng	(参考訳) 最近開発されたSoraモデル[1]は、ビデオ生成において顕著な能力を示し、実世界の現象をシミュレートする能力に関する激しい議論を引き起こした。人気が高まっているにもかかわらず、実世界の物理学への忠実さを定量的に評価する確立した指標が不足している。本稿では,実世界の物理原理への準拠度に基づいて生成映像の品質を評価する新しいベンチマークを提案する。3次元再構成の精度が映像品質に大きく依存するという前提を生かし,生成した映像を3次元モデルに変換する手法を用いる。 3次元再構成の観点からは,構築した3次元モデルが満足する幾何学的制約の忠実性を用いて,生成した映像が実世界の物理法則に適合する程度を測定する。プロジェクトページ: https://sora-geometrical-consistency.github.io/ The recently developed Sora model [1] has exhibited remarkable capabilities in video generation, sparking intense discussions regarding its ability to simulate real-world phenomena. Despite its growing popularity, there is a lack of established metrics to evaluate its fidelity to real-world physics quantitatively. In this paper, we introduce a new benchmark that assesses the quality of the generated videos based on their adherence to real-world physics principles. We employ a method that transforms the generated videos into 3D models, leveraging the premise that the accuracy of 3D reconstruction is heavily contingent on the video quality. From the perspective of 3D reconstruction, we use the fidelity of the geometric constraints satisfied by the constructed 3D models as a proxy to gauge the extent to which the generated videos conform to real-world physics rules. Project page: https://sora-geometrical-consistency.github.io/	翻訳日:2024-02-28 16:44:04 公開日:2024-02-27
# beacon - フロー制御のための軽量深層強化学習ベンチマークライブラリ Beacon, a lightweight deep reinforcement learning benchmark library for flow control ( http://arxiv.org/abs/2402.17402v1 ) ライセンス: Link先を確認	Jonathan Viquerat and Philippe Meliga and Pablo Jeken and Elie Hachem	(参考訳) 近年,流れ制御問題に対する深部強化学習の利用が増加し,数値流体力学環境の制御に対する既存アルゴリズムの結合と適応に着目した新たな研究領域が生まれている。まだ初期段階だが、この分野は短期間で複数の成功を収めており、その迅速な開発ペースは、コミュニティの拡大を推進するオープンソースの取り組みに部分的に影響を与えている。しかし、この新興ドメインは依然として共通の根拠を逃している。 (i)結果の再現性を確保すること、 (ii)適切なアドホックなベンチマークベースを提供する。そこで本研究では,様々な特性,動作・観測空間特性,cpu要件の7つの軽量1dおよび2dフロー制御問題からなる,オープンソースのベンチマークライブラリであるbeaconを提案する。このコントリビューションでは、考慮された7つの問題を説明し、参照制御ソリューションを提供する。以下の作業のソースはhttps://github.com/jviquerat/beaconにある。 Recently, the increasing use of deep reinforcement learning for flow control problems has led to a new area of research, focused on the coupling and the adaptation of the existing algorithms to the control of numerical fluid dynamics environments. Although still in its infancy, the field has seen multiple successes in a short time span, and its fast development pace can certainly be partly imparted to the open-source effort that drives the expansion of the community. Yet, this emerging domain still misses a common ground to (i) ensure the reproducibility of the results, and (ii) offer a proper ad-hoc benchmarking basis. To this end, we propose Beacon, an open-source benchmark library composed of seven lightweight 1D and 2D flow control problems with various characteristics, action and observation space characteristics, and CPU requirements. In this contribution, the seven considered problems are described, and reference control solutions are provided. The sources for the following work are available at https://github.com/jviquerat/beacon.	翻訳日:2024-02-28 16:43:37 公開日:2024-02-27
# 位相差測定のための量子エンタングルメント有効楕円計 Quantum entanglement enabled ellipsometer for phase retardance measurement ( http://arxiv.org/abs/2402.17401v1 ) ライセンス: Link先を確認	Meng-Yu Xie, Su-Jian Niu, Yin-Hai Li, Zheng Ge, Ming-Yuan Gao, Zhao-Qi-Zhi Han, Ren-Hui Chen, Zhi-Yuan Zhou, and Bao-Sen Shi	(参考訳) エリプソメータ(英: ellipsometer)は、光学パラメータの測定に用いられる重要な精密測定ツールであり、膜厚、光学定数、構造プロファイルなどの正確な測定を含む多くの分野で幅広く応用されている。しかし, 過度な入力光子により, 感光性材料の精密測定は大きな障害に遭うため, 低入射光強度下での検出精度の向上は, 精密測定における重要な課題である。本研究では, 偏光もつれ光子源と古典的な透過型エリプソメータを組み合わせることにより, PSA (Polarizer-Sample-Analyzer) 法とセナルモン法を用いた量子エリプソメータを構築し, 複屈折材料の位相遅延を測定する。実験の結果, 極端に低い入力強度でも精度はナノメートルスケールに到達でき, 補償器を用いて試験したすべての試料で安定性は1%以内であることが判明した。我々の研究は、低入射光強度での精密測定への道を開き、感光性材料、活性生物試料、その他の遠隔監視シナリオの計測への応用が期待される。 An ellipsometer is a vital precision tool used for measuring optical parameters with wide applications in many fields, including accurate measurements in film thickness, optical constants, structural profiles, etc. However, the precise measurement of photosensitive materials meets huge obstacles because of the excessive input photons, therefore the requirement of enhancing detection accuracy under low incident light intensity is an essential topic in the precision measurement. In this work, by combining a polarization-entangled photon source with a classical transmission-type ellipsometer, the quantum ellipsometer with the PSA (Polarizer-Sample-Analyzer) and the Sénarmont method is constructed firstly to measure the phase retardation of the birefringent materials. The experimental results show that the accuracy can reach the nanometer scale at extremely low input intensity, and the stability is within 1% for all specimens tested with a compensator involved. Our work paves the way for precision measurement at low incident light intensity, with potential applications in measuring photosensitive materials, active-biological samples and other remote monitoring scenarios.	翻訳日:2024-02-28 16:43:12 公開日:2024-02-27
# 大規模言語モデルにおける継続事前学習の考察:洞察と意味 Investigating Continual Pretraining in Large Language Models: Insights and Implications ( http://arxiv.org/abs/2402.17400v1 ) ライセンス: Link先を確認	Çağatay Yıldız, Nishaanth Kanna Ravichandran, Prishruit Punia, Matthias Bethge, Beyza Ermis	(参考訳) 本稿では,大規模言語モデル(LLM)における進化途上の継続学習(CL)領域について検討し,効率的かつ持続的な学習のための戦略の開発に焦点をあてる。主に、継続的なドメイン適応事前学習に重点を置く。これは、ドメイン固有の識別に頼ることなく、学習済みの知識を維持し、ドメイン間の知識伝達を向上しながら、様々なドメインからの新しい情報を統合する能力をLLMに持たせるためのプロセスである。タスクやドメインの限られた選択に集中し,主に忘却の問題に対処する従来の研究とは異なり,本研究では,実践的なシナリオにおけるデータランドスケープの変化に対するLLMの適応性と能力を評価する。この目的のために,これらの発展するデータ環境へのLLMの適応性を測定するための新しいベンチマークを導入し,総合的な評価フレームワークを提供する。モデルサイズが学習の効率性や忘却に及ぼす影響や、新興ドメインの進行と類似性がこれらのモデル内の知識伝達に与える影響について検討する。私たちの発見は、いくつかの重要な洞察を明らかにする。 (i) ドメインのシーケンスがセマンティックな類似性を示す場合、継続事前学習により、LLMはスタンドアローンの微調整に比べて、現在のドメインでより専門化することができる。 (ii) 様々なドメインにわたる訓練は、後方および前方の知識伝達の両方を増強する。 (iii) より小さなモデルは特に継続事前学習に敏感であり, 忘却と学習の両方において最も顕著な割合を示す。我々の研究は,LLMにおけるCL調査のためのより現実的なベンチマークの確立に向けた転換点であり,今後の研究の方向性を導く上で重要な役割を果たす可能性があると考える。 This paper studies the evolving domain of Continual Learning (CL) in large language models (LLMs), with a focus on developing strategies for efficient and sustainable training. Our primary emphasis is on continual domain-adaptive pretraining, a process designed to equip LLMs with the ability to integrate new information from various domains while retaining previously learned knowledge and enhancing cross-domain knowledge transfer without relying on domain-specific identification. Unlike previous studies, which mostly concentrate on a limited selection of tasks or domains and primarily aim to address the issue of forgetting, our research evaluates the adaptability and capabilities of LLMs to changing data landscapes in practical scenarios. To this end, we introduce a new benchmark designed to measure the adaptability of LLMs to these evolving data environments, offering a comprehensive framework for evaluation. 
We examine the impact of model size on learning efficacy and forgetting, as well as how the progression and similarity of emerging domains affect the knowledge transfer within these models. Our findings uncover several key insights: (i) when the sequence of domains shows semantic similarity, continual pretraining enables LLMs to better specialize in the current domain compared to stand-alone fine-tuning, (ii) training across a diverse range of domains enhances both backward and forward knowledge transfer, and (iii) smaller models are particularly sensitive to continual pretraining, showing the most significant rates of both forgetting and learning. We posit that our research marks a shift towards establishing a more realistic benchmark for investigating CL in LLMs, and has the potential to play a key role in guiding the direction of future research in the field.	翻訳日:2024-02-28 16:42:37 公開日:2024-02-27
# 合成マイノリティオーバーサンプリング技術(SMOTE)への量子的アプローチ A Quantum Approach to Synthetic Minority Oversampling Technique (SMOTE) ( http://arxiv.org/abs/2402.17398v1 ) ライセンス: Link先を確認	Nishikanta Mohanty, Bikash K. Behera and Christopher Ferrie	(参考訳) 本稿では,機械学習データセットにおけるクラス不均衡の問題を解くために,量子コンピューティング技術を用いた新しい解法であるQuantum-SMOTE法を提案する。Synthetic Minority Oversampling Technique(SMOTE)に着想を得たQuantum-SMOTEは、スワップテストや量子回転といった量子プロセスを用いて合成データポイントを生成する。このプロセスは、k近傍法 (KNN) とユークリッド距離を用いる従来のSMOTEアルゴリズムとは異なり、近傍点との近さに頼らずにマイノリティクラスのデータポイントから合成インスタンスを生成することができる。このアルゴリズムは、回転角、マイノリティパーセンテージ、分割係数などのハイパーパラメータを導入することで、合成データ生成プロセスをより細かく制御でき、特定のデータセット要求に合わせたカスタマイズが可能になる。このアプローチはTelecomChurnの公開データセット上でテストされ、ランダムフォレストとロジスティック回帰という2つの著名な分類アルゴリズムと共に評価され、合成データの比率を変えた場合の影響が調べられる。 The paper proposes the Quantum-SMOTE method, a novel solution that uses quantum computing techniques to solve the prevalent problem of class imbalance in machine learning datasets. Quantum-SMOTE, inspired by the Synthetic Minority Oversampling Technique (SMOTE), generates synthetic data points using quantum processes such as swap tests and quantum rotation. The process varies from the conventional SMOTE algorithm's usage of K-Nearest Neighbors (KNN) and Euclidean distances, enabling synthetic instances to be generated from minority class data points without relying on neighbor proximity. The algorithm asserts greater control over the synthetic data generation process by introducing hyperparameters such as rotation angle, minority percentage, and splitting factor, which allow for customization to specific dataset requirements. The approach is tested on a public dataset of TelecomChurn and evaluated alongside two prominent classification algorithms, Random Forest and Logistic Regression, to determine its impact along with varying proportions of synthetic data.	翻訳日:2024-02-28 16:42:09 公開日:2024-02-27
# アルゴリズム問題におけるGPT-4のベンチマーク:プロンプト戦略の体系的評価 Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies ( http://arxiv.org/abs/2402.17396v1 ) ライセンス: Link先を確認	Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti	(参考訳) 大規模言語モデル(LLM)は、大量のテキストコーパスで得られた知識を、さまざまな下流タスクにおいて最小限の(もしあれば)チューニングステップで再利用する能力によって、自然言語処理の分野に革命をもたらした。同時に、LLMには、学習した統計的規則性を訓練分布の外へ外挿することを可能にする体系的な一般化が欠如していることが繰り返し示されている。本研究では、2つのパラメータで問題の難易度を制御できることを特徴とする3つのアルゴリズム的タスクに対して、利用可能な最も先進的なLLMの一つであるGPT-4の体系的ベンチマークを行う。我々は、GPT-4の性能を前身(GPT-3.5)の性能と比較し、また、同様のタスクを解くために最近導入されたTransformer-Encoderアーキテクチャの変種であるNeural Data Routerとも比較した。高度なプロンプト技術の導入により,GPT-4はすべてのタスクにおいて優れた精度を達成でき,体系的な一般化を必要とする困難な課題においても,最先端のLLMが極めて強力なベースラインとなることを示した。 Large Language Models (LLMs) have revolutionized the field of Natural Language Processing thanks to their ability to reuse knowledge acquired on massive text corpora on a wide variety of downstream tasks, with minimal (if any) tuning steps. At the same time, it has been repeatedly shown that LLMs lack systematic generalization, which allows extrapolating the learned statistical regularities outside the training distribution. In this work, we offer a systematic benchmarking of GPT-4, one of the most advanced LLMs available, on three algorithmic tasks characterized by the possibility to control the problem difficulty with two parameters. We compare the performance of GPT-4 with that of its predecessor (GPT-3.5) and with a variant of the Transformer-Encoder architecture recently introduced to solve similar tasks, the Neural Data Router. We find that the deployment of advanced prompting techniques allows GPT-4 to reach superior accuracy on all tasks, demonstrating that state-of-the-art LLMs constitute a very strong baseline also in challenging tasks that require systematic generalization.	翻訳日:2024-02-28 16:41:47 公開日:2024-02-27
# 量子プロセッサの周波数チューニングのためのジョセフソン接合の電子ビームアニール Electron-beam annealing of Josephson junctions for frequency tuning of quantum processors ( http://arxiv.org/abs/2402.17395v1 ) ライセンス: Link先を確認	Yashwanth Balaji, Narendra Acharya, Robert Armstrong, Kevin G. Crawford, Sergey Danilin, Thomas Dixon, Oscar W. Kennedy, Renuka Devi Pothuraju, Kowsar Shahbazi, Connor D. Shelly	(参考訳) 超伝導量子ビットは、大規模量子コンピュータを実現するための有望な経路である。大規模超伝導量子プロセッサの実現における重要な課題は、周波数衝突の緩和である。本稿では,定周波量子ビットを電子ビームで調整し,ジョセフソン接合部を局所的に焼成する手法を提案する。接合バリア抵抗の増大と減少を両立する能力を示す。本手法は,我々のキュービットアーキテクチャにおける周波数衝突の評価により,ウェハスケールの周波数目標化の改善を示す。コヒーレンス測定は、チューニング前後のパフォーマンスを評価するためにも行われる。チューニングプロセスは標準的な電子ビームリソグラフィーシステムを利用し、ジョセフソン接合を作製できる任意のグループによる再現性と実装を保証する。この手法は、大規模量子コンピューティングシステムの性能を大幅に向上させ、量子コンピューティングの将来への道を開く可能性がある。 Superconducting qubits are a promising route to achieving large-scale quantum computers. A key challenge in realising large-scale superconducting quantum processors involves mitigating frequency collisions. In this paper, we present an approach to tuning fixed-frequency qubits with the use of an electron beam to locally anneal the Josephson junction. We demonstrate the ability to both increase and decrease the junction barrier resistance. The technique shows an improvement in wafer scale frequency targetting by assessing the frequency collisions in our qubit architecture. Coherence measurements are also done to evaluate the performance before and after tuning. The tuning process utilises a standard electron beam lithography system, ensuring reproducibility and implementation by any group capable of fabricating these Josephson junctions. This technique has the potential to significantly improve the performance of large-scale quantum computing systems, thereby paving the way for the future of quantum computing.	翻訳日:2024-02-28 16:41:27 公開日:2024-02-27
# DS-Agent:ケースベース推論による大規模言語モデルを活用したデータサイエンスの自動化 DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning ( http://arxiv.org/abs/2402.17453v1 ) ライセンス: Link先を確認	Siyuan Guo, Cheng Deng, Ying Wen, Hechang Chen, Yi Chang, Jun Wang	(参考訳) 本研究では,データサイエンスタスクを自動化するための大規模言語モデル(llms)ベースのエージェントの可能性について,タスク要件の理解と,最適な機械学習モデルの構築とトレーニングを目標として検討する。その成功にもかかわらず、既存のLLMエージェントは、このシナリオ内で不合理な実験計画を生成することで妨げられている。この目的のために, LLMエージェントとケースベース推論(CBR)を利用した新しい自動フレームワークDS-Agentを提案する。開発段階では、DS-AgentはCBRフレームワークに従って自動イテレーションパイプラインを構築し、Kaggleから専門家の知識を柔軟に活用し、フィードバックメカニズムを通じて一貫したパフォーマンス改善を促進する。さらにDS-Agentは、開発段階で成功したソリューションを直接コード生成に適応させるため、シンプルなCBRパラダイムで低リソースのデプロイメントステージを実装しており、LCMの基本能力に対する需要を著しく減らしている。 GPT-4を用いたDS-Agentは、開発段階では前例のない100%の成功率を達成し、デプロイ段階では、代替LLMの平均1パスレートを36%改善した。どちらの段階でもDS-AgentはGPT-4で1ラン当たり1.60ドルと0.13ドルという最高の成績を収めている。 In this work, we investigate the potential of large language models (LLMs) based agents to automate data science tasks, with the goal of comprehending task requirements, then building and training the best-fit machine learning models. Despite their widespread success, existing LLM agents are hindered by generating unreasonable experiment plans within this scenario. To this end, we present DS-Agent, a novel automatic framework that harnesses LLM agent and case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on the expert knowledge from Kaggle, and facilitate consistent performance improvement through the feedback mechanism. Moreover, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm to adapt past successful solutions from the development stage for direct code generation, significantly reducing the demand on foundational capabilities of LLMs. 
Empirically, DS-Agent with GPT-4 achieves an unprecedented 100% success rate in the development stage, while attaining 36% improvement on average one pass rate across alternative LLMs in the deployment stage. In both stages, DS-Agent achieves the best rank in performance, costing $1.60 and $0.13 per run with GPT-4, respectively.	翻訳日:2024-02-28 16:36:44 公開日:2024-02-27
# レシピの深層学習に基づく名前付きエンティティ認識モデル Deep Learning Based Named Entity Recognition Models for Recipes ( http://arxiv.org/abs/2402.17447v1 ) ライセンス: Link先を確認	Mansi Goel, Ayush Agarwal, Shubham Agrawal, Janak Kapuriya, Akhil Vamshi Konam, Rishabh Gupta, Shrey Rastogi, Niharika, and Ganesh Bagler	(参考訳) 食べ物は、風味、栄養、健康、持続可能性など、さまざまな取り組みを通じて私たちの生活に触れます。レシピは、非構造化テキストを介して世代にわたって伝達される文化カプセルである。名前付きエンティティを認識するための自動プロトコルであるレシピテキストのビルディングブロックは、情報抽出から新しいレシピ生成に至るまで、さまざまなアプリケーションにとって大きな価値を持つ。名前付きエンティティ認識は、未知または半構造化データから既知のラベルで情報を抽出する技術である。 6,611句の注釈付きデータから,26,445句の累積的な拡張データセットを作成した。同時に,レシピデータベースであるgold-standard recipe data repository から成分句を整理・分析し,stanford ner を用いてアノテートした。この分析に基づいて,マシンアノテートデータセットの作成には多様性を維持しつつ,クラスタリングに基づく手法を用いて88,526句のサブセットをサンプリングした。深層学習に基づく言語モデルの統計的、微調整と、大規模言語モデル(LLM)へのわずかなプロンプトを含む、これらの3つのデータセットに対するNERアプローチの徹底的な調査は、深い洞察を提供する。 llms上でのマイショット・プロンプトは漸近的性能を持つが,マクロf1スコアの95.9%,96.04%,95.71%が手動アノテーション付き,拡張型,機械アノテーション付きデータセットにおいて,微調整されたスペイサーが最良モデルとして出現する。 Food touches our lives through various endeavors, including flavor, nourishment, health, and sustainability. Recipes are cultural capsules transmitted across generations via unstructured text. Automated protocols for recognizing named entities, the building blocks of recipe text, are of immense value for various applications ranging from information extraction to novel recipe generation. Named entity recognition is a technique for extracting information from unstructured or semi-structured data with known labels. Starting with manually-annotated data of 6,611 ingredient phrases, we created an augmented dataset of 26,445 phrases cumulatively. Simultaneously, we systematically cleaned and analyzed ingredient phrases from RecipeDB, the gold-standard recipe data repository, and annotated them using the Stanford NER. Based on the analysis, we sampled a subset of 88,526 phrases using a clustering-based approach while preserving the diversity to create the machine-annotated dataset. 
A thorough investigation of NER approaches on these three datasets, covering statistical methods, fine-tuning of deep learning-based language models, and few-shot prompting of large language models (LLMs), provides deep insights. We conclude that few-shot prompting on LLMs has abysmal performance, whereas the fine-tuned spaCy-transformer emerges as the best model with macro-F1 scores of 95.9%, 96.04%, and 95.71% for the manually-annotated, augmented, and machine-annotated datasets, respectively.	翻訳日:2024-02-28 16:36:21 公開日:2024-02-27
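The macro-F1 score quoted in this abstract averages the per-class F1 values with equal weight, so rare entity types count as much as frequent ones. Below is a minimal sketch at the token level. The label names in the usage note are made up for illustration, and the paper may score at the entity level instead.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # guessed label was wrong
            fn[truth] += 1   # true label was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With gold labels `["QUANTITY", "UNIT", "NAME", "NAME"]` and predictions `["QUANTITY", "UNIT", "NAME", "STATE"]`, the per-label F1 values are 1, 1, 2/3 and 0, so the macro-F1 is 2/3.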
# Ansible Lightspeed: IT自動化のためのコード生成サービス Ansible Lightspeed: A Code Generation Service for IT Automation ( http://arxiv.org/abs/2402.17442v1 ) ライセンス: Link先を確認	Priyam Sahoo, Saurabh Pujar, Ganesh Nalawade, Richard Gebhardt, Louis Mandel, Luca Buratti	(参考訳) コードを生成するLarge Language Models(LLMs)が利用可能になったことで、開発者の生産性を向上させるツールの開発が可能になった。開発者がソフトウェアを書くのに使用する統合開発環境やIDEは、しばしばLLMと対話するためのインターフェースとして使用される。多くのツールがリリースされたが、ほとんどが汎用プログラミング言語に焦点を当てている。 IT自動化に不可欠なようなドメイン固有言語はあまり注目されていません。 Ansibleは、YAMLベースのIT自動化専用言語である。 red hat ansible lightspeed with ibm watson code assistant、別名ansible lightspeedは、自然言語からansibleコード生成に明示的に設計されたllmベースのサービスである。本稿では,ansible lightspeedサービスの設計と実装について述べるとともに,数千人の実ユーザからのフィードバックを分析する。利用者の感情とともに、即時および拡張された利用パターンによって分類された多様なパフォーマンス指標について検討した。分析の結果、Ansible Lightspeed提案のユーザ受け入れ率は、より汎用的で、プログラミング言語に特有でない同等のツールよりも高いことがわかった。これは、受け入れられたモデル提案と見なされるものに対してより厳密な基準を使用した後でも事実であり、受け入れられた後に大々的に編集された提案を破棄する。比較的高い受け入れ率は、期待以上のユーザ保持と概ね肯定的なユーザフィードバックをもたらす。本稿では,ドメイン固有言語上で,比較的小さな専用モデルがどのように機能するか,さらにユーザからどのように受信されるのかについて考察する。 The availability of Large Language Models (LLMs) which can generate code, has made it possible to create tools that improve developer productivity. Integrated development environments or IDEs which developers use to write software are often used as an interface to interact with LLMs. Although many such tools have been released, almost all of them focus on general-purpose programming languages. Domain-specific languages, such as those crucial for IT automation, have not received much attention. Ansible is one such YAML-based IT automation-specific language. Red Hat Ansible Lightspeed with IBM Watson Code Assistant, further referred to as Ansible Lightspeed, is an LLM-based service designed explicitly for natural language to Ansible code generation. In this paper, we describe the design and implementation of the Ansible Lightspeed service and analyze feedback from thousands of real users. 
We examine diverse performance indicators, classified according to both immediate and extended utilization patterns along with user sentiments. The analysis shows that the user acceptance rate of Ansible Lightspeed suggestions is higher than that of comparable tools that are more general and not specific to a programming language. This remains true even after we use much more stringent criteria for what is considered an accepted model suggestion, discarding suggestions which were heavily edited after being accepted. The relatively high acceptance rate results in higher-than-expected user retention and generally positive user feedback. This paper provides insights on how a comparatively small, dedicated model performs on a domain-specific language and, more importantly, how it is received by users.	翻訳日:2024-02-28 16:35:55 公開日:2024-02-27
# ハイパーパラメータの原則的アーキテクチャ対応スケーリング Principled Architecture-aware Scaling of Hyperparameters ( http://arxiv.org/abs/2402.17440v1 ) ライセンス: Link先を確認	Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin	(参考訳) 高品質のディープニューラルネットワークをトレーニングするには、適切なハイパーパラメータを選択する必要がある。現在の作業では、ハイパーパラメータの原則を最適化したり、設計したりすることを試みている。しかしながら、ほとんどの設計や最適化手法はネットワーク構造の選択に依存しないため、ニューラルアーキテクチャがハイパーパラメータに与える影響を無視する。本研究では,ネットワークの深さ,幅,畳み込みカーネルサイズ,接続パターンを含むネットワークアーキテクチャに対する初期化と最大学習率の依存性を正確に特徴付ける。プリアクティベーションの平均2乗変化ですべてのパラメータを最大に更新することで、高度なグラフトポロジによるmlp(multi-layer perception)とcnn(convolutional neural network)間の初期化と学習率を一般化することができる。包括的な実験で原則を検証する。さらに重要なことに、当社の戦略はアーキテクチャ設計の現在のベンチマークの進展に光を当てています。 AutoMLアルゴリズムの公正な比較には、正確なネットワークランキングが必要である。しかし,アーキテクチャを意識した学習率と初期化によるベンチマークでは,ネットワークのランク付けがより優れたトレーニングネットワークによって容易に変更可能であることを示す。 Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process. Current works try to automatically optimize or design principles of hyperparameters, such that they can generalize to diverse unseen scenarios. However, most designs or optimization methods are agnostic to the choice of network structures, and thus largely ignore the impact of neural architectures on hyperparameters. In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture, which includes the network depth, width, convolutional kernel size, and connectivity patterns. By pursuing every parameter to be maximally updated with the same mean squared change in pre-activations, we can generalize our initialization and learning rates across MLPs (multi-layer perception) and CNNs (convolutional neural network) with sophisticated graph topologies. We verify our principles with comprehensive experiments. More importantly, our strategy further sheds light on advancing current benchmarks for architecture design. A fair comparison of AutoML algorithms requires accurate network rankings. 
However, we demonstrate that network rankings can be easily changed by better training networks in benchmarks with our architecture-aware learning rates and initialization.	翻訳日:2024-02-28 16:35:30 公開日:2024-02-27
# V2C-Long:経時的対応による縦隔皮質再建術 V2C-Long: Longitudinal Cortex Reconstruction with Spatiotemporal Correspondence ( http://arxiv.org/abs/2402.17438v1 ) ライセンス: Link先を確認	Fabian Bongratz, Jan Fecht, Anne-Marie Rickmann, Christian Wachinger	(参考訳) 縦型mriから皮質を再構築することは、ヒトの脳の形態変化を分析する上で不可欠である。最近の深層学習による皮質表面再構成の混乱にもかかわらず、縦断データから生じる課題は依然として続いている。特に強時空間対応の欠如は、導入した雑音による下流解析を妨げる。この問題に対処するため,V2C-Longは縦型MRIのための,初めて専用の深層学習型大脳皮質再建法である。既存の方法とは対照的に、V2C-Long曲面は断面的および縦方向の方法で直接比較される。 2つのディープメッシュ変形ネットワークの新規構成と特徴強化されたイントラサブジェクトテンプレートの高速集約により,強固有時空間対応を確立する。内部および外部試験データから,V2C-Longは従来法に比べて精度と整合性が改善された。最後に、この改善はアルツハイマー病の局所性皮質萎縮に対する高い感受性を示す。 Reconstructing the cortex from longitudinal MRI is indispensable for analyzing morphological changes in the human brain. Despite the recent disruption of cortical surface reconstruction with deep learning, challenges arising from longitudinal data are still persistent. Especially the lack of strong spatiotemporal point correspondence hinders downstream analyses due to the introduced noise. To address this issue, we present V2C-Long, the first dedicated deep learning-based cortex reconstruction method for longitudinal MRI. In contrast to existing methods, V2C-Long surfaces are directly comparable in a cross-sectional and longitudinal manner. We establish strong inherent spatiotemporal correspondences via a novel composition of two deep mesh deformation networks and fast aggregation of feature-enhanced within-subject templates. The results on internal and external test data demonstrate that V2C-Long yields cortical surfaces with improved accuracy and consistency compared to previous methods. Finally, this improvement manifests in higher sensitivity to regional cortical atrophy in Alzheimer's disease.	翻訳日:2024-02-28 16:35:12 公開日:2024-02-27
# 共感応答生成のための爆発的感情・情動相関 Exploiting Emotion-Semantic Correlations for Empathetic Response Generation ( http://arxiv.org/abs/2402.17437v1 ) ライセンス: Link先を確認	Zhou Yang, Zhaochun Ren, Yufeng Wang, Xiaofei Zhu, Zhihao Chen, Tiecheng Cai, Yunbing Wu, Yisong Su, Sibo Ju, Xiangwen Liao	(参考訳) 共感応答生成は、対話の言語から話者の感情を理解することによって共感応答を生成することを目的としている。最近の手法では、コミュニケーションの言語で感情的な単語を捉え、ニュアンスされた感情を知覚する静的ベクターとして構築している。しかし言語学的研究により、言語における感情的な単語は動的であり、文法における他の文法的意味を持つ単語と相関があることが示されている。以前の手法ではこれら2つの特徴を見落としており、感情の誤解や重要な意味の無視に繋がる。本稿では,共感的対話生成タスクのための動的感情・感情相関モデル(escm)を提案する。 ESCMは文脈と感情の相互作用を通じて動的感情意味ベクトルを構成する。感情と意味の相関関係を反映した依存木を導入する。動的感情論的ベクトルと係り受け木に基づいて,対話における文脈意味の学習と共感応答の生成においてモデルを導く動的相関グラフ畳み込みネットワークを提案する。 EMPAtheTIC-DIALOGUESデータセットによる実験結果から,ESCMは意味や感情をより正確に理解し,流動的で情報的な共感反応を表現できることがわかった。分析の結果,感情と意味の相関関係は,情緒的知覚や表現に非常に重要である対話において頻繁に用いられていることが示唆された。 Empathetic response generation aims to generate empathetic responses by understanding the speaker's emotional feelings from the language of dialogue. Recent methods capture emotional words in the language of communicators and construct them as static vectors to perceive nuanced emotions. However, linguistic research has shown that emotional words in language are dynamic and have correlations with other grammar semantic roles, i.e., words with semantic meanings, in grammar. Previous methods overlook these two characteristics, which easily lead to misunderstandings of emotions and neglect of key semantics. To address this issue, we propose a dynamical Emotion-Semantic Correlation Model (ESCM) for empathetic dialogue generation tasks. ESCM constructs dynamic emotion-semantic vectors through the interaction of context and emotions. We introduce dependency trees to reflect the correlations between emotions and semantics. Based on dynamic emotion-semantic vectors and dependency trees, we propose a dynamic correlation graph convolutional network to guide the model in learning context meanings in dialogue and generating empathetic responses. 
Experimental results on the EMPATHETIC-DIALOGUES dataset show that ESCM understands semantics and emotions more accurately and expresses fluent and informative empathetic responses. Our analysis results also indicate that the correlations between emotions and semantics are frequently used in dialogues, which is of great significance for empathetic perception and expression.	翻訳日:2024-02-28 16:34:54 公開日:2024-02-27
# 事前学習したコントラスト型EEG-Text Masked Autoencoderからの伝達可能な表現によるEEG-to-Textデコーディングの強化 Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder ( http://arxiv.org/abs/2402.17433v1 ) ライセンス: Link先を確認	Jiaqi Wang, Zhenxi Song, Zhengyu Ma, Xipeng Qiu, Min Zhang, Zhiguo Zhang	(参考訳) 非侵襲的脳波から自然言語を再構築することは、bcis(brain-computer interface)のための言語デコード技術として大きな期待を抱いている。しかし、EEGベースの言語デコーディングはまだ初期段階にあり、次のような技術的な問題に直面している。 1) 脳波の特徴又はテクストシーケンスのモダリティ内自己構築と(脳波とテキストの間の)相互モダリティを効果的に統合できるハイブリッド戦略の欠如 2) 大規模言語モデル(llms)の過小利用によるeegに基づく言語デコーディングの強化。以上の課題に対処するため,コントラスト型脳波テキストマスケドオートエンコーダ(CET-MAE)を提案する。さらに、CET-MAEからのEEGストリームと並行してトレーニング済みのモジュールを活用できるE2T-PTR(Pretrained Transferable Representationsを用いたEEG-to-Text decoding)というフレームワークを開発し、さらにLLM(特にBART)がEEGシーケンスからテキストをデコードできるようにする。一般的なテキスト誘発脳波データベースであるzucoを用いた包括的な実験により、e2t-ptrはrouge-1 f1とbleu-4のスコアをそれぞれ8.34%、32.21%で上回っている。これらの結果はこの分野の大きな進歩を示し、より強力で広範なbciアプリケーションを可能にするフレームワークの可能性を強調している。 Reconstructing natural language from non-invasive electroencephalography (EEG) holds great promise as a language decoding technology for brain-computer interfaces (BCIs). However, EEG-based language decoding is still in its nascent stages, facing several technical issues such as: 1) Absence of a hybrid strategy that can effectively integrate cross-modality (between EEG and text) self-learning with intra-modality self-reconstruction of EEG features or textual sequences; 2) Under-utilization of large language models (LLMs) to enhance EEG-based language decoding. To address above issues, we propose the Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text through a dedicated multi-stream encoder. 
Furthermore, we develop a framework called E2T-PTR (EEG-to-Text decoding using Pretrained Transferable Representations), which leverages pre-trained modules alongside the EEG stream from CET-MAE and further enables an LLM (specifically BART) to decode text from EEG sequences. Comprehensive experiments conducted on the popular text-evoked EEG database, ZuCo, demonstrate the superiority of E2T-PTR, which outperforms the state-of-the-art in ROUGE-1 F1 and BLEU-4 scores by 8.34% and 32.21%, respectively. These results indicate significant advancements in the field and underscore the proposed framework's potential to enable more powerful and widespread BCI applications.	翻訳日:2024-02-28 16:34:32 公開日:2024-02-27
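ROUGE-1 F1, one of the two metrics reported in this abstract, is the harmonic mean of unigram precision and unigram recall between a decoded sentence and its reference. The sketch below assumes lower-cased whitespace tokenisation, which is a simplification. Published evaluations normally use a standard ROUGE package with its own tokeniser and optional stemming.

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference and a candidate sentence."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as in the reference.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

For the reference "the cat sat on the mat" and the candidate "the cat lay on the mat", five of six unigrams match on each side, so precision, recall and F1 all equal 5/6.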
# KanDYベンチマーク: インクリメンタルニューロシンボリック学習とカンディンスキーパターンによる推論 The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns ( http://arxiv.org/abs/2402.17431v1 ) ライセンス: Link先を確認	Luca Salvatore Lorello, Marco Lippi, Stefano Melacci	(参考訳) 人工知能は、パフォーマンスを効果的に測定し、最先端を前進させるために、新しい挑戦とベンチマークを継続的に求めている。本稿では,Kandinskyパターンにインスパイアされたさまざまな学習・推論タスクを生成するためのベンチマークフレームワークであるKanDYを紹介する。複雑さが増し、少ない監督でバイナリ分類タスクのカリキュラムを作成することで、kandyは記号構成性に特化しながら、連続的および半教師付き学習のためのベンチマークを実装することができる。分類規則は、解釈可能な解の分析を可能にするために基底真理でも提供されている。ベンチマーク生成パイプラインと合わせて,研究コミュニティに新たな課題として提案する,より簡単で難しい2つのカリキュラムをリリースします。実験により,最先端のニューラルモデルと純粋にシンボリックなアプローチが,課題のほとんどを解決するのにいかに苦労しているかを検証し,時間とともに訓練された高度なニューロシンボリック手法の適用を求める。 Artificial intelligence is continuously seeking novel challenges and benchmarks to effectively measure performance and to advance the state-of-the-art. In this paper we introduce KANDY, a benchmarking framework that can be used to generate a variety of learning and reasoning tasks inspired by Kandinsky patterns. By creating curricula of binary classification tasks with increasing complexity and with sparse supervisions, KANDY can be used to implement benchmarks for continual and semi-supervised learning, with a specific focus on symbol compositionality. Classification rules are also provided in the ground truth to enable analysis of interpretable solutions. Together with the benchmark generation pipeline, we release two curricula, an easier and a harder one, that we propose as new challenges for the research community. With a thorough experimental evaluation, we show how both state-of-the-art neural models and purely symbolic approaches struggle with solving most of the tasks, thus calling for the application of advanced neuro-symbolic methods trained over time.	翻訳日:2024-02-28 16:34:03 公開日:2024-02-27
# ベクトルマップ構築のための点集合の強化クエリの活用 Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction ( http://arxiv.org/abs/2402.17430v1 ) ライセンス: Link先を確認	Zihao Liu, Xiaoyu Zhang, Guangwei Liu, Ji Zhao, and Ningyi Xu	(参考訳) 自律運転では、ハイデフィニション(HD)マップはローカライゼーションと計画において重要な役割を果たす。近年,DeTRのようなフレームワークにおけるエンドツーエンドのオンラインマップ構築を容易にする手法がいくつかある。しかし、クエリメカニズムを探索する潜在的な能力にはほとんど注意が払われていない。本稿では,オンラインベクトル化マップ構築のためのクエリ機能の向上を重視したエンドツーエンドのMapQRを紹介する。マップの構成は基本的にはポイントセット予測タスクであるが、MapQRはポイントクエリではなくインスタンスクエリを使用する。これらのインスタンスクエリは点集合の予測のために分散され、その後最終マッチングのために収集される。このクエリ設計は、scatter-and-gatherクエリと呼ばれ、コンテンツ情報を同じマップ要素で共有し、ポイントクエリにおけるコンテンツ情報の矛盾を回避する。さらに、参照ポイントから埋め込まれた位置情報を追加することにより、事前情報を活用してインスタンスクエリを強化する。 BEVエンコーダの単純かつ効果的な改善とともに、提案したMapQRは、最高の平均精度(mAP)を達成し、nuScenesとArgoverse 2の両方で優れた効率を維持する。さらに、クエリ設計を他のモデルに統合することで、パフォーマンスを大幅に向上できます。コードはhttps://github.com/hxmap/mapqrで入手できる。 In autonomous driving, the high-definition (HD) map plays a crucial role in localization and planning. Recently, several methods have facilitated end-to-end online map construction in DETR-like frameworks. However, little attention has been paid to the potential capabilities of exploring the query mechanism. This paper introduces MapQR, an end-to-end method with an emphasis on enhancing query capabilities for constructing online vectorized maps. Although the map construction is essentially a point set prediction task, MapQR utilizes instance queries rather than point queries. These instance queries are scattered for the prediction of point sets and subsequently gathered for the final matching. This query design, called the scatter-and-gather query, shares content information in the same map element and avoids possible inconsistency of content information in point queries. We further exploit prior information to enhance an instance query by adding positional information embedded from their reference points. 
Together with a simple and effective improvement of a BEV encoder, the proposed MapQR achieves the best mean average precision (mAP) and maintains good efficiency on both nuScenes and Argoverse 2. In addition, integrating our query design into other models can boost their performance significantly. The code will be available at https://github.com/HXMap/MapQR.	翻訳日:2024-02-28 16:33:43 公開日:2024-02-27
# VastGaussian: 大きなシーン再構築のための3Dガウシアン VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction ( http://arxiv.org/abs/2402.17427v1 ) ライセンス: Link先を確認	Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang	(参考訳) 既存のNeRFベースの大規模なシーン再構成手法は、視覚的品質とレンダリング速度に制限があることが多い。最近の3D Gaussian Splattingは、小規模でオブジェクト中心のシーンでうまく機能するが、大きなシーンにスケールアップすることは、ビデオメモリの制限、長い最適化時間、目立った外観の変化による課題を引き起こす。これらの課題に対処するため,我々は3次元ガウス型スプラッティングによる高品質な再現とリアルタイムレンダリングのための最初の手法である vastgaussian を提案する。本研究では,大規模シーンを複数のセルに分割し,訓練用カメラとポイントクラウドを空域対応の可視性基準で適切に配置するプログレッシブパーティショニング戦略を提案する。これらのセルは並列最適化後に完全なシーンにマージされる。また,レンダリング画像の外観変化を低減させるため,デカップリングされた外観モデリングを最適化プロセスに導入する。提案手法は,既存のNeRF手法より優れ,複数の大規模シーンデータセットの最先端結果を実現し,高速な最適化と高速リアルタイムレンダリングを実現する。 Existing NeRF-based methods for large scene reconstruction often have limitations in visual quality and rendering speed. While the recent 3D Gaussian Splatting works well on small-scale and object-centric scenes, scaling it up to large scenes poses challenges due to limited video memory, long optimization time, and noticeable appearance variations. To address these challenges, we present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting. We propose a progressive partitioning strategy to divide a large scene into multiple cells, where the training cameras and point cloud are properly distributed with an airspace-aware visibility criterion. These cells are merged into a complete scene after parallel optimization. We also introduce decoupled appearance modeling into the optimization process to reduce appearance variations in the rendered images. Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets, enabling fast optimization and high-fidelity real-time rendering.	翻訳日:2024-02-28 16:33:25 公開日:2024-02-27
# ViTaL:視覚変換器と線形投影を用いた葉画像中の植物病自動識別のための高度なフレームワーク ViTaL: An Advanced Framework for Automated Plant Disease Identification in Leaf Images Using Vision Transformers and Linear Projection For Feature Reduction ( http://arxiv.org/abs/2402.17424v1 ) ライセンス: Link先を確認	Abhishek Sebastian, Annis Fathima A, Pragna R, Madhan Kumar S, Yaswanth Kannan G, Vinay Murali	(参考訳) 本稿では,植物葉画像中の疾患の自動識別のための堅牢な枠組みを提案する。このフレームワークは、病気認識の精度を高めるためにいくつかの重要な段階を組み込んでいる。プリプロセッシング段階では、画像のリサイズにサムネイルリサイズ技術が用いられ、重要な画像詳細の損失を最小限に抑えつつ、計算効率を確保できる。特徴抽出の前に画像データの標準化に正規化手順を適用する。画像解析における最先端のアプローチであるvision transformers上に構築された新しいフレームワークによって、機能抽出が容易になる。さらに、線形射影とブロックワイズ線形射影の追加層を持つフレームワークの代替バージョンも検討されている。この比較分析により、線形射影が特徴抽出および全体モデル性能に与える影響を評価することができる。提案手法の有効性を評価するために,様々な畳み込みニューラルネットワーク(CNN)アーキテクチャを用いて,線形射影が鍵評価指標に与える影響を総合的に評価する。その結果, 提案手法の有効性が示され, トップパフォーマンスモデルのハミング損失は0.054。さらに,病葉を全方位的にスキャンするための新しいハードウェア設計を提案する。ハードウェア実装では、Raspberry Pi Compute Moduleを使用して低メモリ構成に対応し、実用性と手頃さを確保する。この革新的なハードウェアソリューションは、提案する自動疾患識別システムの全体的な実現可能性とアクセシビリティを高める。この研究は、植物病の早期発見と管理のための貴重な洞察とツールを提供することで、農業の分野で貢献し、収穫量の向上と食料安全保障の向上に繋がる可能性がある。 Our paper introduces a robust framework for the automated identification of diseases in plant leaf images. The framework incorporates several key stages to enhance disease recognition accuracy. In the pre-processing phase, a thumbnail resizing technique is employed to resize images, minimizing the loss of critical image details while ensuring computational efficiency. Normalization procedures are applied to standardize image data before feature extraction. Feature extraction is facilitated through a novel framework built upon Vision Transformers, a state-of-the-art approach in image analysis. Additionally, alternative versions of the framework with an added layer of linear projection and blockwise linear projections are explored. This comparative analysis allows for the evaluation of the impact of linear projection on feature extraction and overall model performance. 
To assess the effectiveness of the proposed framework, various Convolutional Neural Network (CNN) architectures are utilized, enabling a comprehensive evaluation of linear projection's influence on key evaluation metrics. The findings demonstrate the efficacy of the proposed framework, with the top-performing model achieving a Hamming loss of 0.054. Furthermore, we propose a novel hardware design specifically tailored for scanning diseased leaves in an omnidirectional fashion. The hardware implementation utilizes a Raspberry Pi Compute Module to address low-memory configurations, ensuring practicality and affordability. This innovative hardware solution enhances the overall feasibility and accessibility of the proposed automated disease identification system. This research contributes to the field of agriculture by offering valuable insights and tools for the early detection and management of plant diseases, potentially leading to improved crop yields and enhanced food security.	翻訳日:2024-02-28 16:33:04 公開日:2024-02-27
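The Hamming loss reported in this abstract is the fraction of individual label decisions that are wrong, so lower is better and 0.054 corresponds to about 5.4% of label slots being mispredicted. The sketch below assumes binary indicator rows, one slot per disease label. The abstract does not say how the labels were encoded, so that layout is an assumption.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots where prediction and ground truth differ."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each prediction must have one entry per label")
        for truth, guess in zip(true_row, pred_row):
            total += 1
            wrong += int(truth != guess)
    if total == 0:
        raise ValueError("no labels to score")
    return wrong / total
```

With ground truth `[[1, 0, 0], [0, 1, 1]]` and predictions `[[1, 0, 1], [0, 1, 1]]`, one of six slots differs, so the loss is 1/6.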
# 強化インコンテキストブラックボックス最適化 Reinforced In-Context Black-Box Optimization ( http://arxiv.org/abs/2402.17423v1 ) ライセンス: Link先を確認	Lei Song, Chenxiao Gao, Ke Xue, Chenyang Wu, Dong Li, Jianye Hao, Zongzhang Zhang, Chao Qian	(参考訳) Black-Box Optimization (BBO) は、科学と工学の分野で成功している。近年、bboアルゴリズムの特定のコンポーネントをメタ学習することで最適化をスピードアップし、退屈な手作りのヒューリスティックを取り除こうという関心が高まっている。拡張として、データからアルゴリズム全体を学習するには、専門家の少ない労力が必要であり、最も柔軟性を提供することができる。本稿では,bboアルゴリズムをオフラインデータからエンドツーエンドで強化学習する手法であるribboを提案する。 RIBBOは、複数の行動アルゴリズムとタスクによって生成された最適化履歴を学習するために表現的シーケンスモデルを使用し、大規模モデルのコンテキスト内学習能力を活用してタスク情報を抽出し、それに応じて決定を行う。提案手法の中心は,蓄積的後悔に基づくアルゴリズムの性能を表現するために設計されたreste-to-goトークンを用いた最適化履歴の強化である。 RIBBOは,BBOB関数やハイパーパラメータ最適化,ロボット制御問題など,さまざまな問題に対して,ユーザ希望の後悔を満足するクエリポイントのシーケンスを自動的に生成する。 Black-Box Optimization (BBO) has found successful applications in many fields of science and engineering. Recently, there has been a growing interest in meta-learning particular components of BBO algorithms to speed up optimization and get rid of tedious hand-crafted heuristics. As an extension, learning the entire algorithm from data requires the least labor from experts and can provide the most flexibility. In this paper, we propose RIBBO, a method to reinforce-learn a BBO algorithm from offline data in an end-to-end fashion. RIBBO employs expressive sequence models to learn the optimization histories produced by multiple behavior algorithms and tasks, leveraging the in-context learning ability of large models to extract task information and make decisions accordingly. Central to our method is to augment the optimization histories with regret-to-go tokens, which are designed to represent the performance of an algorithm based on cumulative regret of the histories. 
The integration of regret-to-go tokens enables RIBBO to automatically generate sequences of query points that satisfy the user-desired regret, which is verified by its universally good empirical performance on diverse problems, including BBOB functions, hyper-parameter optimization and robot control problems.	翻訳日:2024-02-28 16:32:39 公開日:2024-02-27
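One way to read the regret-to-go tokens mentioned in this abstract is as suffix sums of per-step regret over an optimization history. The sketch below assumes a maximization problem with a known optimum and defines the token at step t as the regret accumulated over all later steps. Both the exact definition and the function name are assumptions made for illustration, not the paper's specification.

```python
import numpy as np

def regret_to_go(y_history, y_optimum):
    """Return R_0..R_T, where R_t sums (y_optimum - y_s) over steps s > t."""
    y = np.asarray(y_history, dtype=float)
    instant = y_optimum - y                    # regret of each query
    suffix = np.cumsum(instant[::-1])[::-1]    # sums from step s to the end
    # R_0 is the total regret of the history; R_T is 0 by construction.
    return np.concatenate([suffix, [0.0]])
```

For the history `[1, 3, 4]` with optimum 5, the per-step regrets are 4, 2, 1 and the tokens are 7, 3, 1, 0. Under this reading, conditioning a sequence model on a small initial token asks it for a low-regret trajectory.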
# PANDAS: プロトタイプベースの新しいクラス発見と検出 PANDAS: Prototype-based Novel Class Discovery and Detection ( http://arxiv.org/abs/2402.17420v1 ) ライセンス: Link先を確認	Tyler L. Hayes, C\'esar R. de Souza, Namil Kim, Jiwon Kim, Riccardo Volpi, Diane Larlus	(参考訳) オブジェクト検出器は通常、固定されたクラスのセットで一度、あるいはすべてトレーニングされる。しかし、このクローズドワールドの仮定は実際には非現実的であり、検出器が野生に展開された後に必然的に新しいクラスが出現する。この研究では、基礎クラスのセットのために訓練された検出器を拡張する方法について検討する。一新規の授業の有無を突き止めること、及び ii) 新たに発見されたクラスをベースクラスとともに検出できるように、自動的にレパートリーを豊かにすること。本研究では,新しいクラス発見・検出手法であるPANDASを提案する。ラベルのないデータから新しいクラスを表すクラスタを発見し、プロトタイプで古いクラスと新しいクラスを表現する。推論中、距離ベースの分類器はこれらのプロトタイプを使用して検出された各オブジェクトインスタンスにラベルを割り当てる。我々の方法の単純さは広く適用できる。 VOC 2012 と COCO-to-LVIS ベンチマークにおける PANDAS の有効性を実験的に検証した。計算量的に手頃な価格ながら、このタスクの最先端技術に対して有利に機能する。 Object detectors are typically trained once and for all on a fixed set of classes. However, this closed-world assumption is unrealistic in practice, as new classes will inevitably emerge after the detector is deployed in the wild. In this work, we look at ways to extend a detector trained for a set of base classes so it can i) spot the presence of novel classes, and ii) automatically enrich its repertoire to be able to detect those newly discovered classes together with the base ones. We propose PANDAS, a method for novel class discovery and detection. It discovers clusters representing novel classes from unlabeled data, and represents old and new classes with prototypes. During inference, a distance-based classifier uses these prototypes to assign a label to each detected object instance. The simplicity of our method makes it widely applicable. We experimentally demonstrate the effectiveness of PANDAS on the VOC 2012 and COCO-to-LVIS benchmarks. It performs favorably against the state of the art for this task while being computationally more affordable.	翻訳日:2024-02-28 16:32:18 公開日:2024-02-27
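The inference step in this abstract, a distance-based classifier over class prototypes, can be sketched as a nearest-prototype rule. The version below assumes that a prototype is the mean feature vector of a class and that distance is Euclidean. Both choices, and the function names, are illustrative assumptions rather than details confirmed by the abstract.

```python
import numpy as np

def build_prototypes(features, labels):
    """Average the feature vectors of each class into one prototype."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    prototypes = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes

def classify(features, classes, prototypes):
    """Assign each feature vector the label of its closest prototype."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    dist = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    return classes[np.argmin(dist, axis=1)]
```

Under this reading, adding a newly discovered class only means appending one more prototype row, which fits the abstract's remark that the method stays computationally affordable.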
# メモリ効果検出のための距離とエントロピー識別性定量化器の比較 Comparison of Distances and Entropic Distinguishability Quantifiers for the Detection of Memory Effects ( http://arxiv.org/abs/2402.17419v1 ) ライセンス: Link先を確認	Bassano Vacchini	(参考訳) 本稿では,量子状態の微分可能性量化器に基づくメモリ効果記述のためのフレームワークについて考察する。提案手法を簡潔に提示した後,デコヒーレンスを行う2レベルシステムの還元ダイナミクスの特性評価において,異なる量子化器の性能を考慮した検証を行う。本研究では,これらの量化器の挙動が,環境温度や結合強度といったモデルの物理的特性に依存するかを検討した。異なる量化器の性能は同じ物理情報を伝達するが、異なる感度を持つため、アプローチの堅牢性を支持する。 We consider a recently introduced framework for the description of memory effects based on quantum state distinguishability quantifiers, in which entropic quantifiers can be included. After briefly presenting the approach, we validate it considering the performance of different quantifiers in the characterization of the reduced dynamics of a two-level system undergoing decoherence. We investigate the different behavior of these quantifiers in the dependence on physical features of the model, such as environmental temperature and coupling strength. It appears that the performance of the different quantifiers conveys the same physical information, though with different sensitivities, thus supporting robustness of the approach.	翻訳日:2024-02-28 16:32:03 公開日:2024-02-27
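As a concrete instance of a distinguishability quantifier, the sketch below computes the trace distance between two qubit states and applies a pure-dephasing factor to the coherences. The helper `dephase` and its `coherence` parameter are illustrative assumptions. The entropic quantifiers studied in the paper are not reproduced here.

```python
import numpy as np

def trace_distance(rho, sigma):
    """D(rho, sigma) = 0.5 * sum of |eigenvalues| of the Hermitian difference."""
    diff = np.asarray(rho, dtype=complex) - np.asarray(sigma, dtype=complex)
    return float(0.5 * np.sum(np.abs(np.linalg.eigvalsh(diff))))

def dephase(rho, coherence):
    """Scale the off-diagonal elements of a qubit state by a decoherence factor."""
    out = np.array(rho, dtype=complex)
    out[0, 1] *= coherence
    out[1, 0] *= np.conj(coherence)
    return out

# The states |+> and |-> differ only in the sign of their coherences.
plus = 0.5 * np.array([[1, 1], [1, 1]])
minus = 0.5 * np.array([[1, -1], [-1, 1]])
```

For this pair the trace distance after dephasing equals the magnitude of the decoherence factor: it is 1 for the undamped states and 0.3 when `coherence=0.3`. A later increase of that magnitude would show up as a revival of distinguishability, which is the kind of signature such frameworks associate with memory effects.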
# CARZero: 放射線科ゼロショット分類のためのクロスアテンションアライメント CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification ( http://arxiv.org/abs/2402.17417v1 ) ライセンス: Link先を確認	Haoran Lai and Qingsong Yao and Zihang Jiang and Rongsheng Wang and Zhiyang He and Xiaodong Tao and S. Kevin Zhou	(参考訳) 医用領域におけるゼロショット学習の進歩は、画像テキストアライメントを中心に、大規模画像テキストペア上で事前訓練されたモデルを使用することによって前進してきた。しかし、既存の方法は主にアライメントのコサイン類似性に依存しており、医療画像とレポートの複雑な関係を完全に捉えることはできない。このギャップに対処するために,放射線科ゼロショット分類のためのクロスアテンションアライメント(CARZero)と呼ばれる新しいアプローチを提案する。本手法は,画像とレポートの特徴の処理にクロスアテンション機構を活用し,医用意味論における複雑な関係をより正確に反映した類似性表現を創出する。この表現は線形に投影され、モダリティ間アライメントのための画像-テキスト類似性行列を形成する。さらに、ゼロショット学習におけるプロンプト選択の重要な役割を認識し、CARZeroは大規模言語モデルに基づくプロンプトアライメント戦略を取り入れている。この戦略は多種多様な診断表現を、訓練と推論の両フェーズで統一された形式に標準化し、手動プロンプト設計の課題を克服する。本手法は単純だが有効であり, 5つの公式な胸部X線写真診断テストセットのゼロショット分類において, 稀な疾患のロングテール分布を示すデータセットでの顕著な結果を含む, 最先端の成績を示す。この成果は、医用画像とレポートの複雑な関係を効果的に扱う新しい画像テキストアライメント戦略によるものである。 The advancement of Zero-Shot Learning in the medical domain has been driven forward by using pre-trained models on large-scale image-text pairs, focusing on image-text alignment. However, existing methods primarily rely on cosine similarity for alignment, which may not fully capture the complex relationship between medical images and reports. To address this gap, we introduce a novel approach called Cross-Attention Alignment for Radiology Zero-Shot Classification (CARZero). Our approach innovatively leverages cross-attention mechanisms to process image and report features, creating a Similarity Representation that more accurately reflects the intricate relationships in medical semantics. This representation is then linearly projected to form an image-text similarity matrix for cross-modality alignment. Additionally, recognizing the pivotal role of prompt selection in zero-shot learning, CARZero incorporates a Large Language Model-based prompt alignment strategy. 
This strategy standardizes diverse diagnostic expressions into a unified format for both training and inference phases, overcoming the challenges of manual prompt design. Our approach is simple yet effective, demonstrating state-of-the-art performance in zero-shot classification on five official chest radiograph diagnostic test sets, including remarkable results on datasets with long-tail distributions of rare diseases. This achievement is attributed to our new image-text alignment strategy, which effectively addresses the complex relationship between medical images and reports.	翻訳日:2024-02-28 16:31:53 公開日:2024-02-27
# MGE: トレーニング不要で効率的なモデル生成と拡張スキーム MGE: A Training-Free and Efficient Model Generation and Enhancement Scheme ( http://arxiv.org/abs/2402.17486v1 ) ライセンス: Link先を確認	Xuan Wang, Zeshan Pang, Yuliang Lu, Xuehu Yan	(参考訳) ディープラーニングモデルの研究の基盤を提供するには、モデルプールの構築が不可欠である。本稿では,MGE(Training-free and Efficient Model Generation and Enhancement Scheme)を提案する。このスキームは、主にモデル生成プロセスにおいてモデルパラメータの分布とモデル性能の2つの側面を考察する。実験の結果、生成したモデルは通常の訓練によって得られたモデルに匹敵し、場合によっては優れていることが示された。さらに、モデル生成に費やされる時間は、通常のモデルトレーニングに必要な時間のわずか1%に過ぎない。さらに重要なのは、Evolution-MGEの強化により、生成されたモデルは、少数ショット(few-shot)タスクで競争力のある汎化能力を示すことである。また、生成されたモデルの挙動の相違性は、敵対的防御に役立つ可能性を秘めている。 To provide a foundation for the research of deep learning models, the construction of a model pool is an essential step. This paper proposes a Training-Free and Efficient Model Generation and Enhancement Scheme (MGE). This scheme primarily considers two aspects during the model generation process: the distribution of model parameters and model performance. Experimental results show that generated models are comparable to models obtained through normal training, and even superior in some cases. Moreover, the time consumed in generating models accounts for only 1% of the time required for normal model training. More importantly, with the enhancement of Evolution-MGE, generated models exhibit competitive generalization ability in few-shot tasks. In addition, the behavioral dissimilarity of generated models has potential for adversarial defense.	翻訳日:2024-02-28 16:26:54 公開日:2024-02-27
# EMO: Emote Portrait Alive - 弱い条件下でのAudio2Video Diffusionモデルによる表現豊かなポートレート動画生成 EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions ( http://arxiv.org/abs/2402.17485v1 ) ライセンス: Link先を確認	Linrui Tian, Qi Wang, Bang Zhang, Liefeng Bo	(参考訳) 本研究では,音声キューと顔の動きの動的で微妙な関係に着目し,トーキングヘッドビデオ生成におけるリアリズムと表現力の向上に取り組む。人間の表情の全スペクトルや個々の顔のスタイルの独自性を捉えられないことが多い従来手法の限界を明らかにする。これらの課題に対処するために,中間的な3Dモデルや顔のランドマークを必要としない,音声から動画への直接合成アプローチを利用する新しいフレームワークであるEMOを提案する。本手法は,映像全体を通じてフレームのシームレスな遷移と一貫したアイデンティティ保存を保証し,表現力が高く生き生きとしたアニメーションを実現する。実験結果から,EMOは説得力のある発話ビデオだけでなく,様々なスタイルの歌唱ビデオも生成可能であり,表現性やリアリズムの点で既存の最先端手法を大きく上回ることが示された。 In this work, we tackle the challenge of enhancing the realism and expressiveness in talking head video generation by focusing on the dynamic and nuanced relationship between audio cues and facial movements. We identify the limitations of traditional techniques that often fail to capture the full spectrum of human expressions and the uniqueness of individual facial styles. To address these issues, we propose EMO, a novel framework that utilizes a direct audio-to-video synthesis approach, bypassing the need for intermediate 3D models or facial landmarks. Our method ensures seamless frame transitions and consistent identity preservation throughout the video, resulting in highly expressive and lifelike animations. Experimental results demonstrate that EMO is able to produce not only convincing speaking videos but also singing videos in various styles, significantly outperforming existing state-of-the-art methodologies in terms of expressiveness and realism.	翻訳日:2024-02-28 16:26:44 公開日:2024-02-27
# AlignMiF: LiDAR-カメラ同時合成のための幾何整合マルチモーダル暗黙的場 AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis ( http://arxiv.org/abs/2402.17483v1 ) ライセンス: Link先を確認	Tao Tang, Guangrun Wang, Yixing Lao, Peng Chen, Jie Liu, Liang Lin, Kaicheng Yu, Xiaodan Liang	(参考訳) ニューラル暗黙的場は、新規ビュー合成におけるデファクトスタンダードである。近年,一つの場の中で複数のモダリティを融合し,異なるモダリティ間で暗黙的な特徴を共有することで再構成性能の向上を目指す手法が提案されている。しかし、これらのモダリティはしばしば整合しない挙動を示し、LiDARのような1つのモダリティに対する最適化が、カメラのような他のモダリティの性能に悪影響を及ぼし、その逆も起こり得る。本研究では,LiDAR-カメラ同時合成のマルチモーダル暗黙的場に関する包括的解析を行い,異なるセンサ間の不整合が根本的な問題であることを明らかにした。さらに,幾何認識アライメント(GAA)と共有幾何初期化(SGI)という2つのモジュールを備えた,幾何学的に整合したマルチモーダル暗黙的場であるAlignMiFを紹介する。これらのモジュールは、異なるモダリティ間で粗い幾何を効果的に整列させ、LiDARとカメラデータの融合プロセスを大幅に強化する。様々なデータセットやシーンにわたる広範な実験を通じて、統一されたニューラル場内でのLiDARとカメラモダリティの相互作用を改善するという点で、本アプローチの有効性を実証する。具体的には,提案するAlignMiFは最近の暗黙融合法に対して顕著な改善(KITTI-360およびWaymoデータセットにおいて画像PSNRで+2.01および+3.11)を実現し,単一モダリティの性能を一貫して上回る(各データセットにおいてLiDARチャンファー距離を13.8%および14.2%低減)。 Neural implicit fields have been a de facto standard in novel view synthesis. Recently, some methods have explored fusing multiple modalities within a single field, aiming to share implicit features from different modalities to enhance reconstruction performance. However, these modalities often exhibit misaligned behaviors: optimizing for one modality, such as LiDAR, can adversely affect another, like camera performance, and vice versa. In this work, we conduct comprehensive analyses on the multimodal implicit field of LiDAR-camera joint synthesis, revealing that the underlying issue lies in the misalignment of different sensors. Furthermore, we introduce AlignMiF, a geometrically aligned multimodal implicit field with two proposed modules: Geometry-Aware Alignment (GAA) and Shared Geometry Initialization (SGI). These modules effectively align the coarse geometry across different modalities, significantly enhancing the fusion process between LiDAR and camera data. 
Through extensive experiments across various datasets and scenes, we demonstrate the effectiveness of our approach in facilitating better interaction between LiDAR and camera modalities within a unified neural field. Specifically, our proposed AlignMiF achieves remarkable improvement over recent implicit fusion methods (+2.01 and +3.11 image PSNR on the KITTI-360 and Waymo datasets) and consistently surpasses single-modality performance (13.8% and 14.2% reduction in LiDAR Chamfer Distance on the respective datasets).	翻訳日:2024-02-28 16:26:25 公開日:2024-02-27
# Raw Ultrasound Imaging を用いた子どもの音声における音声学的セグメントの自動分類 Automated Classification of Phonetic Segments in Child Speech Using Raw Ultrasound Imaging ( http://arxiv.org/abs/2402.17482v1 ) ライセンス: Link先を確認	Saja Al Ani, Joanne Cleland, Ahmed Zoha	(参考訳) 語音障害 (SSD) は, 語音産出の持続的障害として定義され, 音声の明瞭度が低下し, 言語コミュニケーションが阻害される。SSD児の早期の認識と介入、および治療のための言語療法士(SLT)へのタイムリーな紹介が重要である。音声障害の自動検出は, 大規模集団を検査・スクリーニングする効率的な手法であると考えられる。本研究は、超音波舌画像(UTI)とディープラーニングモデルを統合する技術ソリューションを提案し、幼児期におけるSSDの自動診断の進歩に焦点を当てた。導入されたFusionNetモデルは、UTIデータを抽出したテクスチャ特徴と組み合わせてUTIを分類する。本研究の包括的な目的は,特にSSDに関連する語音の分類において,UTI分析の精度と効率を高めることである。本研究は、FusionNetアプローチと標準ディープラーニング手法を比較し、UTI分類におけるFusionNetモデルの優れた改善結果と、音声治療クリニックにおけるUTI分類の改善におけるマルチラーニングの可能性を強調した。 Speech sound disorder (SSD) is defined as a persistent impairment in speech sound production leading to reduced speech intelligibility and hindered verbal communication. Early recognition and intervention of children with SSD and timely referral to speech and language therapists (SLTs) for treatment are crucial. Automated detection of speech impairment is regarded as an efficient method for examining and screening large populations. This study focuses on advancing the automatic diagnosis of SSD in early childhood by proposing a technical solution that integrates ultrasound tongue imaging (UTI) with deep-learning models. The introduced FusionNet model combines UTI data with the extracted texture features to classify UTI. The overarching aim is to elevate the accuracy and efficiency of UTI analysis, particularly for classifying speech sounds associated with SSD. This study compared the FusionNet approach with standard deep-learning methodologies, highlighting the excellent improvement achieved by the FusionNet model in UTI classification and the potential of multi-learning in improving UTI classification in speech therapy clinics.	翻訳日:2024-02-28 16:25:54 公開日:2024-02-27
# GPT-4はプロパガンダを同定できるか? ニュース記事におけるプロパガンダスパンのアノテーションと検出 Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles ( http://arxiv.org/abs/2402.17478v1 ) ライセンス: Link先を確認	Maram Hasanain, Fatema Ahmed, Firoj Alam	(参考訳) プロパガンダの使用は主流やソーシャルメディアに急増し、ユーザーを操ったり誤解させたりすることを目指している。テキスト、ビジュアル、マルチモーダルコンテンツにおけるプロパガンダ技術を自動的に検出する取り組みが増加しているが、そのほとんどは主に英語コンテンツに焦点を当てている。中から低リソース言語をターゲットとする最近の取り組みの大部分は、比較的小さな注釈付きデータセットを生成しており、分布が歪んでいて、洗練されたプロパガンダ検出モデルの開発に挑戦している。この課題に対処するため,本稿では,これまでで最大のプロパガンダデータセットであるArProを,23のプロパガンダ手法の分類基準に従って,テキストスパンレベルにラベル付けした新聞記事から8K節からなる。さらに,本研究は,GPT-4を用いた大規模言語モデル(LLM)の性能をテキストから微細なプロパガンダ検出に利用するための最初の試みである。その結果, GPT-4の性能低下は, 段落を単にプロパガンダ的か否かの分類から, プロパガンダ技術の検出やテキストでの表現のきめ細かいタスクへと移行することが明らかとなった。異なる分類粒度でプロパガンダ検出のためのデータセットに微調整されたモデルと比較すると、gpt-4はまだずっと遅れている。最後に,他の6つの言語からなるデータセット上でGPT-4を評価し,そのモデルが言語間のタスクに苦しむことを示唆した。私たちのデータセットとリソースはコミュニティにリリースされます。 The use of propaganda has spiked on mainstream and social media, aiming to manipulate or mislead users. While efforts to automatically detect propaganda techniques in textual, visual, or multimodal content have increased, most of them primarily focus on English content. The majority of the recent initiatives targeting medium to low-resource languages produced relatively small annotated datasets, with a skewed distribution, posing challenges for the development of sophisticated propaganda detection models. To address this challenge, we carefully develop the largest propaganda dataset to date, ArPro, comprised of 8K paragraphs from newspaper articles, labeled at the text span level following a taxonomy of 23 propagandistic techniques. Furthermore, our work offers the first attempt to understand the performance of large language models (LLMs), using GPT-4, for fine-grained propaganda detection from text. 
Results showed that GPT-4's performance degrades as the task moves from simply classifying a paragraph as propagandistic or not, to the fine-grained task of detecting propaganda techniques and their manifestation in text. Compared to models fine-tuned on the dataset for propaganda detection at different classification granularities, GPT-4 is still far behind. Finally, we evaluate GPT-4 on a dataset consisting of six other languages for span detection, and results suggest that the model struggles with the task across languages. Our dataset and resources will be released to the community.	翻訳日:2024-02-28 16:25:37 公開日:2024-02-27
# グローバルおよびローカルリレーショナルインタラクションによる不正検出 Fraud Detection with Binding Global and Local Relational Interaction ( http://arxiv.org/abs/2402.17472v1 ) ライセンス: Link先を確認	Haolin Li, Shuyang Jiang, Lifeng Zhang, Siyuan Du, Guangnan Ye, Hongfeng Chai	(参考訳) グラフニューラルネットワークは,ノード間の相互作用を符号化し,総合的な視点で特徴を集約することで,不正検出に有効であることが証明されている。近年,系列符号化能力の優れたTransformerネットワークは,文献において他のGNNベースの手法よりも優れた性能を示している。しかし、GNNベースのネットワークとTransformerベースのネットワークはいずれもグラフ全体の一視点のみをエンコードしており、GNNはグローバル特徴を、Transformerネットワークはローカル特徴をエンコードする。さらに、以前の研究では、異種グラフのグローバルインタラクション特徴を別々のネットワークでエンコードすることを無視していたため、最適とは言えない性能にとどまっていた。本稿では,対象ノードに局所的特徴と大域的特徴を同時に埋め込む,Relation-Aware GNN with transFormer (RAGFormer) という新しいフレームワークを提案する。単純かつ効果的なこのネットワークは、各Transformer層の後に関係横断(cross-relation)アグリゲーション層が続く修正GAGAモジュールを適用し、局所埋め込みと異なる関係にまたがるノード間相互作用を符号化する。Transformerベースのネットワークとは別に,グローバル埋め込みを学習するための関係認識型GNNモジュールも導入し,後にアテンション融合モジュールとスキップ接続によってローカル埋め込みにマージする。 2つの人気のあるパブリックデータセットと産業データセットに関する広範な実験は、RAGFormerが最先端のパフォーマンスを達成していることを示している。充実した分析実験は、RAGFormerの各サブモジュールの有効性と、小規模データの活用における高い効率およびハイパーパラメータへの感度の低さを検証する。 Graph neural networks have proven effective for fraud detection owing to their capability to encode node interaction and aggregate features in a holistic view. Recently, Transformer networks, with their strong sequence encoding ability, have also outperformed other GNN-based methods in the literature. However, both GNN-based and Transformer-based networks only encode one perspective of the whole graph: GNNs encode global features, whereas Transformer networks encode local ones. Furthermore, previous works ignored encoding global interaction features of the heterogeneous graph with separate networks, thus leading to suboptimal performance. In this work, we present a novel framework called Relation-Aware GNN with transFormer (RAGFormer) which simultaneously embeds local and global features into a target node. 
The simple yet effective network applies a modified GAGA module where each transformer layer is followed by a cross-relation aggregation layer, to encode local embeddings and node interactions across different relations. Apart from the Transformer-based network, we further introduce a Relation-Aware GNN module to learn global embeddings, which are later merged into the local embeddings by an attention fusion module and a skip connection. Extensive experiments on two popular public datasets and an industrial dataset demonstrate that RAGFormer achieves state-of-the-art performance. Substantial analysis experiments validate the effectiveness of each submodule of RAGFormer, its high efficiency in utilizing small-scale data, and its low hyper-parameter sensitivity.	翻訳日:2024-02-28 16:25:09 公開日:2024-02-27
# JPEG-AI標準化におけるビット分布研究と空間品質マップの実装 Bit Distribution Study and Implementation of Spatial Quality Map in the JPEG-AI Standardization ( http://arxiv.org/abs/2402.17470v1 ) ライセンス: Link先を確認	Panqi Jia, Jue Mao, Esin Koyuncu, A. Burakhan Koyuncu, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Elena Alshina, Andre Kaup	(参考訳) 現在、ニューラルネットワークベースの画像圧縮コーデックには高い需要がある。これらのコーデックは非線形変換を用いてコンパクトなビット表現を作成し、古典的なフレームワークで使用される手作りの変換と比較してデバイス上でのコーディング速度を高速化する。科学と工業のコミュニティはこれらの特性に非常に興味を持ち、JPEG-AIの標準化努力に繋がる。 JPEG-AI検証モデルがリリースされ、現在標準化に向けて開発中である。ニューラルネットワークを利用することで、従来のコーデックvvc intraを、ベース操作点での10%以上のbdレートで上回ることができる。この成功は、一定の品質ポイントで生成されるvvc intraのアンカーとは対照的に、空間領域におけるフレキシブルなビット分布が特徴である。しかし,vvcイントラは,様々なブロックサイズの実装により,より適応性の高いビット分布構造を示すことが明らかになった。本研究では,JPEG-AI検証モデルのビット分布を最適化し,視覚的品質を向上させるための空間ビット割り当て手法を提案する。さらに、VVCビット分散戦略を適用することにより、JPEG-AI検証モードの客観的性能をさらに向上し、PSNR-Yでは最大0.45dBとなる。 Currently, there is a high demand for neural network-based image compression codecs. These codecs employ non-linear transforms to create compact bit representations and facilitate faster coding speeds on devices compared to the hand-crafted transforms used in classical frameworks. The scientific and industrial communities are highly interested in these properties, leading to the standardization effort of JPEG-AI. The JPEG-AI verification model has been released and is currently under development for standardization. Utilizing neural networks, it can outperform the classic codec VVC intra by over 10% BD-rate operating at base operation point. Researchers attribute this success to the flexible bit distribution in the spatial domain, in contrast to VVC intra's anchor that is generated with a constant quality point. However, our study reveals that VVC intra displays a more adaptable bit distribution structure through the implementation of various block sizes. 
As a result of our observations, we have proposed a spatial bit allocation method to optimize the JPEG-AI verification model's bit distribution and enhance the visual quality. Furthermore, by applying the VVC bit distribution strategy, the objective performance of the JPEG-AI verification model can be further improved, resulting in a maximum gain of 0.45 dB in PSNR-Y.	翻訳日:2024-02-28 16:24:42 公開日:2024-02-27
# シンボリック音楽生成と情報検索のための自然言語処理手法に関する調査 Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey ( http://arxiv.org/abs/2402.17467v1 ) ライセンス: Link先を確認	Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller and Dorien Herremans	(参考訳) 自然言語処理(NLP)におけるブレークスルー以来、トランスフォーマーモデルのいくつかの適応が様々な領域で開発されてきた。この傾向は音楽データ処理の研究を含む音楽情報検索(MIR)の分野にも及んでいる。しかし, シンボリックな音楽データにNLPツールを活用する実践は, MIRにおいて新しいものではない。音楽は、テキストや音楽の逐次表現などいくつかの類似点を共有しているため、しばしば言語と比較される。これらの類似は、MIRやNLPでも同様のタスクを通して反映される。本調査では,2つの軸によるシンボリック音楽生成と情報検索に応用したNLP手法について検討する。まず,自然言語の逐次表現から適応した記号音楽の表現について概説する。このような表現は、象徴音楽の特異性を考慮して設計されている。これらの表現はモデルによって処理される。このようなモデルは、おそらく元々テキスト用に開発され、象徴音楽に適応したもので、様々なタスクで訓練されている。これらのモデル、特にディープラーニングモデルについて、さまざまなプリズムを通じて説明し、音楽特化メカニズムを強調する。最終的に、シンボリック音楽データに対するNLPツールの有効利用に関する議論を行う。これには、NLPの手法に関する技術的な問題と、テキストと音楽の根本的な違いが含まれており、NLPツールをより効果的に記号的MIRに適応させるためのいくつかの扉を開く可能性がある。 Several adaptations of Transformers models have been developed in various domains since its breakthrough in Natural Language Processing (NLP). This trend has spread into the field of Music Information Retrieval (MIR), including studies processing music data. However, the practice of leveraging NLP tools for symbolic music data is not novel in MIR. Music has been frequently compared to language, as they share several similarities, including sequential representations of text and music. These analogies are also reflected through similar tasks in MIR and NLP. This survey reviews NLP methods applied to symbolic music generation and information retrieval studies following two axes. We first propose an overview of representations of symbolic music adapted from natural language sequential representations. Such representations are designed by considering the specificities of symbolic music. These representations are then processed by models. Such models, possibly originally developed for text and adapted for symbolic music, are trained on various tasks. 
We describe these models, in particular deep learning models, through different prisms, highlighting music-specialized mechanisms. We finally present a discussion surrounding the effective use of NLP tools for symbolic music data. This includes technical issues regarding NLP methods and fundamental differences between text and music, which may open several doors for further research into more effectively adapting NLP tools to symbolic MIR.	翻訳日:2024-02-28 16:24:20 公開日:2024-02-27
# モデルX線:決定境界によるバックドアモデルの検出 Model X-ray:Detect Backdoored Models via Decision Boundary ( http://arxiv.org/abs/2402.17465v1 ) ライセンス: Link先を確認	Yanghao Su, Jie Zhang, Ting Xu, Tianwei Zhang, Weiming Zhang, Nenghai Yu	(参考訳) ディープニューラルネットワーク(DNN)は、さまざまな産業に革命をもたらし、MLaaS(Machine Learning as a Service)の台頭につながった。このパラダイムでは、よく訓練されたモデルは一般的にAPIを通じてデプロイされます。しかし、DNNはバックドア攻撃の影響を受けやすく、アプリケーションに重大なリスクをもたらす。この脆弱性は、使用前にAPIが悪用されているかどうかを確認する方法を必要とする。多くのバックドア検出方法が開発されているが、ディフェンダーが攻撃の詳細、モデルAPIからのソフトな予測、さらにはモデルパラメータの知識といった特定の情報にアクセスでき、MLaaSシナリオでの実用性を制限するという仮定の下で運用されることが多い。そこで本論文では, バックドアモデルの決定境界は, クリーンモデルよりも密接度が高いという興味深い観察結果から始める。同時に、1つのラベルしか感染しない場合、攻撃されたラベルが領域の大部分を占めることになる。そこで本研究では,mlaasにおける新しいバックドア検出手法であるmodel x-rayを提案する。 Model X-rayは、ターゲットAPIがバックドアアタックに感染しているかどうかを識別するだけでなく、オールツーワンアタック戦略の下で攻撃対象ラベルを決定する。重要なことは、攻撃に関する仮定やモデルのトレーニング詳細に関する事前知識に関係なく、クリーンな入力のハード予測によってのみこれを達成します。大規模な実験により、モデルX線はさまざまなバックドア攻撃、データセット、アーキテクチャにわたってMLaaSに有効であることが示された。 Deep neural networks (DNNs) have revolutionized various industries, leading to the rise of Machine Learning as a Service (MLaaS). In this paradigm, well-trained models are typically deployed through APIs. However, DNNs are susceptible to backdoor attacks, which pose significant risks to their applications. This vulnerability necessitates a method for users to ascertain whether an API is compromised before usage. Although many backdoor detection methods have been developed, they often operate under the assumption that the defender has access to specific information such as details of the attack, soft predictions from the model API, and even the knowledge of the model parameters, limiting their practicality in MLaaS scenarios. To address it, in this paper, we begin by presenting an intriguing observation: the decision boundary of the backdoored model exhibits a greater degree of closeness than that of the clean model. Simultaneously, if only one single label is infected, a larger portion of the regions will be dominated by the attacked label. 
Building upon this observation, we propose Model X-ray, a novel backdoor detection approach for MLaaS through the analysis of decision boundaries. Model X-ray can not only identify whether the target API is infected by backdoor attacks but also determine the target attacked label under the all-to-one attack strategy. Importantly, it accomplishes this solely by the hard prediction of clean inputs, regardless of any assumptions about attacks and prior knowledge of the training details of the model. Extensive experiments demonstrated that Model X-ray can be effective for MLaaS across diverse backdoor attacks, datasets, and architectures.	翻訳日:2024-02-28 16:24:00 公開日:2024-02-27
# 部分-全体階層メッセージパッシングによる生成的3D部品アセンブリ Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing ( http://arxiv.org/abs/2402.17464v1 ) ライセンス: Link先を確認	Bi'an Du, Xiang Gao, Wei Hu, Renjie Liao	(参考訳) 生成的3D部品アセンブリは、部品間の関係を理解し、リアルな3D形状を組み立てるための各部品の6-DoFポーズを予測するタスクである。先行研究は、しばしば個々の部品の幾何に焦点を合わせ、物体の部分-全体階層を無視している。 1) スーパーパートのポーズはパートのポーズに関する強いヒントを与える、 2) スーパーパートは数が少ないためそのポーズは予測しやすい、という2つの重要な観察を活用し、効率的な3D部品アセンブリのための部分-全体階層メッセージパッシングネットワークを提案する。まず、意味ラベルを使わずに幾何学的に類似した部品をグループ化することでスーパーパートを導入する。次に、スーパーパートエンコーダが入力部品に基づいて潜在的なスーパーパートのポーズを予測する、部分-全体階層エンコーダを用いる。その後、潜在ポーズを用いて点群を変換し、それをパートエンコーダに入力してスーパーパートの情報を集約し、部品間の関係を推論して全ての部品のポーズを予測する。トレーニングでは、正解(ground-truth)の部品ポーズのみが必要となる。推論時には、予測されたスーパーパートの潜在ポーズが解釈可能性を高める。 PartNetデータセットにおける実験結果から、本手法は部品精度と接続精度において最先端の性能を達成し、解釈可能な階層的部品アセンブリを可能にすることが示された。 Generative 3D part assembly involves understanding part relationships and predicting their 6-DoF poses for assembling a realistic 3D shape. Prior work often focuses on the geometry of individual parts, neglecting part-whole hierarchies of objects. Leveraging two key observations: 1) super-part poses provide strong hints about part poses, and 2) predicting super-part poses is easier due to fewer super-parts, we propose a part-whole-hierarchy message passing network for efficient 3D part assembly. We first introduce super-parts by grouping geometrically similar parts without any semantic labels. Then we employ a part-whole hierarchical encoder, wherein a super-part encoder predicts latent super-part poses based on input parts. Subsequently, we transform the point cloud using the latent poses, feeding it to the part encoder for aggregating super-part information and reasoning about part relationships to predict all part poses. In training, only ground-truth part poses are required. During inference, the predicted latent poses of super-parts enhance interpretability. Experimental results on the PartNet dataset show that our method achieves state-of-the-art performance in part and connectivity accuracy and enables an interpretable hierarchical part assembly.	
翻訳日:2024-02-28 16:23:33 公開日:2024-02-27
# 大規模言語モデルの学習不要な長文コンテキストスケーリング Training-Free Long-Context Scaling of Large Language Models ( http://arxiv.org/abs/2402.17463v1 ) ライセンス: Link先を確認	Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong	(参考訳) 大規模言語モデル(LLM)が一貫性のあるテキストを処理・生成する能力は,入力トークンの数が事前学習時の系列長を超えると著しく低下する。大規模モデルをより長いシーケンスで微調整するコストは高いため、Llama2 70Bが追加の継続学習なしに10万(100k)トークンを超えるコンテキストウィンドウをサポートできるようにするデュアルチャンクアテンション(DCA)を提案する。長いシーケンスの注意計算をチャンクベースのモジュールに分解することで、DCAは同一チャンク内(Intra-Chunk)および異なるチャンク間(Inter-Chunk)のトークンの相対位置情報を効果的に捉え、Flash Attentionともシームレスに統合できる。 DCAは、その印象的な外挿能力に加えて、実用的な長文コンテキストタスクにおいて、微調整されたモデルに匹敵するか、それを上回る性能を達成する。プロプライエタリモデルと比較すると,学習不要の70Bモデルはgpt-3.5-16kの性能の94%を達成しており,有力なオープンソースの代替となることを示している。この研究で使用したすべてのコードとデータは https://github.com/HKUNLP/ChunkLlama で公開されている。 The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of more than 100k tokens without continual training. By decomposing the attention computation for long sequences into chunk-based modules, DCA manages to effectively capture the relative positional information of tokens within the same chunk (Intra-Chunk) and across distinct chunks (Inter-Chunk), and integrates seamlessly with Flash Attention. In addition to its impressive extrapolation capability, DCA achieves performance on practical long-context tasks that is comparable to or even better than that of finetuned models. When compared with proprietary models, our training-free 70B model attains 94% of the performance of gpt-3.5-16k, indicating it is a viable open-source alternative. All code and data used in this work are released at https://github.com/HKUNLP/ChunkLlama.	翻訳日:2024-02-28 16:23:12 公開日:2024-02-27
# フィードバック制御による機械的スクイージングの促進 Enhancement of mechanical squeezing via feedback control ( http://arxiv.org/abs/2402.17460v1 ) ライセンス: Link先を確認	Chao Meng and Warwick P. Bowen	(参考訳) 連続位置測定とフィードバック制御を組み合わせた非古典的機械状態の生成について検討する。フィードバックによって誘発されるスプリング軟化は、位置絞りを大きく促進する。逆に, 純位置測定においても, スプリング硬化により運動量スクイーズが可能となることがわかった。スクイージングの強化に加えて,フィードバックは背景機械モードによる劣化を緩和することを示した。これにより、室温での非古典的機械的状態の測定に基づく調製に対する障壁が著しく低下する。 We explore the generation of nonclassical mechanical states by combining continuous position measurement and feedback control. We find that feedback-induced spring softening can greatly enhance position squeezing. Conversely, even with a pure position measurement, we find that spring hardening can enable momentum squeezing. Beyond enhanced squeezing, we show that feedback also mitigates degradation introduced by background mechanical modes. Together, this significantly lowers the barrier to measurement-based preparation of nonclassical mechanical states at room temperature.	翻訳日:2024-02-28 16:22:50 公開日:2024-02-27
# なぜ学習率は転移するのか? ディープラーニングにおける最適化とスケーリング極限の調和 Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning ( http://arxiv.org/abs/2402.17457v1 ) ライセンス: Link先を確認	Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto	(参考訳) 近年、ニューラルネットワークの幅と深さを、いわゆるリッチな特徴学習極限($\mu$Pとその深さ方向の拡張)に向かってスケールさせると、学習率などの一部のハイパーパラメータが小さなモデルから非常に大きなモデルへと転移し、ハイパーパラメータチューニングのコストが削減されるという証拠が増えている。最適化の観点からは、この現象は、大きく異なるモデルサイズの間で損失ランドスケープが驚くほど一貫していることを意味するため、不可解である。本研究では,$\mu$Pとその深さ方向の拡張の下で,トレーニング損失のヘッセ行列の最大固有値(すなわちシャープネス)が,学習の持続的な期間にわたってネットワークの幅と深さにほとんど依存しないという事実に,学習率の転移が起因しうるという実証的な証拠を見出した。一方,ニューラル・タンジェント・カーネル(NTK)領域では,シャープネスは異なるスケールで非常に異なるダイナミクスを示し,学習率の転移を妨げることを示す。しかし、なぜシャープネスのダイナミクスに違いが生じるのか? ヘッセ行列とNTK行列のスペクトルの関係を通して、その原因は特徴学習の存在($\mu$Pの場合)または漸進的な不在(NTK領域の場合)にあり、それがNTKの、ひいてはシャープネスの異なる発展をもたらすと論じる。我々は、ベンチマーク画像データセットで学習したResNetやVision TransformerからWikiTextで学習したTransformerベースの言語モデルまで、幅広いデータセットとアーキテクチャを網羅する充実した一連の実験によって主張を裏付ける。 Recently, there has been growing evidence that if the width and depth of a neural network are scaled toward the so-called rich feature learning limit ($\mu$P and its depth extension), then some hyperparameters - such as the learning rate - exhibit transfer from small to very large models, thus reducing the cost of hyperparameter tuning. From an optimization perspective, this phenomenon is puzzling, as it implies that the loss landscape is remarkably consistent across very different model sizes. In this work, we find empirical evidence that learning rate transfer can be attributed to the fact that under $\mu$P and its depth extension, the largest eigenvalue of the training loss Hessian (i.e. the sharpness) is largely independent of the width and depth of the network for a sustained period of training time. On the other hand, we show that under the neural tangent kernel (NTK) regime, the sharpness exhibits very different dynamics at different scales, thus preventing learning rate transfer. But what causes these differences in the sharpness dynamics? 
Through a connection between the spectra of the Hessian and the NTK matrix, we argue that the cause lies in the presence (for $\mu$P) or progressive absence (for the NTK regime) of feature learning, which results in a different evolution of the NTK, and thus of the sharpness. We corroborate our claims with a substantial suite of experiments, covering a wide range of datasets and architectures: from ResNets and Vision Transformers trained on benchmark vision datasets to Transformer-based language models trained on WikiText.	翻訳日:2024-02-28 16:22:43 公開日:2024-02-27
# a piece of theatre: 思春期のサイバーいじめ教育を支援するために教師がllmチャットボットをどのように設計するかを調査 A Piece of Theatre: Investigating How Teachers Design LLM Chatbots to Assist Adolescent Cyberbullying Education ( http://arxiv.org/abs/2402.17456v1 ) ライセンス: Link先を確認	Michael A. Hedderich, Natalie N. Bazarova, Wenting Zou, Ryun Shim, Xinda Ma, Qian Yang	(参考訳) サイバーいじめはティーンエイジャーのメンタルヘルスを損なうものであり、それらを先行する介入を教えることが重要である。ウィザード・オブ・オズの研究では、チャットボットはパーソナライズされた、インタラクティブなサイバーいじめ教育をスケールできるが、そのようなチャットボットの実装は困難で繊細なタスクである。私たちは、K-12教師のためのノーコードチャットボットデザインツールを作成しました。大規模言語モデルとプロンプトチェインを用いることで,教師は対話フローやチャットボット発話のプロトタイプを作成できる。このツールを提供することで、教師が指導を支援するチャットボットを設計する際のユニークなニーズと、チャットボット設計ツールがそれらをどのようにサポートするかを探る。調査の結果,教師が熱心にツールを歓迎していることが判明した。さらに彼らは自らを、学生の行動とチャットボットの行動の両方を導く劇作家として捉え、即興を許す。彼らの目標は、生徒が安全な環境でのサイバーいじめに対する望ましい反応と望ましくない反応の両方をリハーサルできるようにすることだ。 llm-chainsが教師に力を与えるための設計機会と、この研究の機会について論じる。 Cyberbullying harms teenagers' mental health, and teaching them upstanding intervention is crucial. Wizard-of-Oz studies show chatbots can scale up personalized and interactive cyberbullying education, but implementing such chatbots is a challenging and delicate task. We created a no-code chatbot design tool for K-12 teachers. Using large language models and prompt chaining, our tool allows teachers to prototype bespoke dialogue flows and chatbot utterances. In offering this tool, we explore teachers' distinctive needs when designing chatbots to assist their teaching, and how chatbot design tools might better support them. Our findings reveal that teachers welcome the tool enthusiastically. Moreover, they see themselves as playwrights guiding both the students' and the chatbot's behaviors, while allowing for some improvisation. Their goal is to enable students to rehearse both desirable and undesirable reactions to cyberbullying in a safe environment. We discuss the design opportunities LLM-Chains offer for empowering teachers and the research opportunities this work opens up.	翻訳日:2024-02-28 16:22:13 公開日:2024-02-27
# CT、PET、MRIマルチモダリティ画像を用いた頭頸部腫瘍セグメンテーションのためのSegment Anything Model Segment anything model for head and neck tumor segmentation with CT, PET and MRI multi-modality images ( http://arxiv.org/abs/2402.17454v1 ) ライセンス: Link先を確認	Jintao Ren, Mathis Rasmussen, Jasper Nijkamp, Jesper Grau Eriksen and Stine Korreman	(参考訳) 深層学習は頭頸部癌(HNC)における肉眼的腫瘍体積(GTV)の自動セグメンテーションに新たな機会を提供するが、完全自動の手法は通常、かなりの手作業による修正を必要とする。本研究は,必要な人間のプロンプトが最小限であることと,自然画像全般にわたるゼロショット汎化能力で知られるSegment Anything Model(SAM)について検討する。具体的には,大規模な公開医用画像で微調整されたSAMのバージョンであるMedSAMを検証する。その進歩にもかかわらず、効果的なGTVの輪郭描出のためのマルチモダリティ画像(CT、PET、MRI)の統合は依然として課題である。 HNC GTVセグメンテーションにおけるSAMの応用に着目し、単一モダリティ(CTのみ)および融合したマルチモダリティ画像を用いて、ゼロショットと微調整の両方のシナリオでその性能を評価する。本研究は,バウンディングボックスプロンプトによって得られる既に有効なゼロショット結果を基礎として,SAMの微調整がセグメンテーション精度を大幅に向上させることを示す。これらの知見は半自動HNC GTVセグメンテーションへの有望な道を開く。 Deep learning presents novel opportunities for the auto-segmentation of gross tumor volume (GTV) in head and neck cancer (HNC), yet fully automatic methods usually necessitate significant manual refinement. This study investigates the Segment Anything Model (SAM), recognized for requiring minimal human prompting and its zero-shot generalization ability across natural images. We specifically examine MedSAM, a version of SAM fine-tuned with large-scale public medical images. Despite its progress, the integration of multi-modality images (CT, PET, MRI) for effective GTV delineation remains a challenge. Focusing on SAM's application in HNC GTV segmentation, we assess its performance in both zero-shot and fine-tuned scenarios using single (CT-only) and fused multi-modality images. Our study demonstrates that fine-tuning SAM significantly enhances its segmentation accuracy, building upon the already effective zero-shot results achieved with bounding box prompts. These findings open a promising avenue for semi-automatic HNC GTV segmentation.	翻訳日:2024-02-28 16:21:53 公開日:2024-02-27
# 模倣学習における言語条件付きスキル発見のための相互情報再考 Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning ( http://arxiv.org/abs/2402.17511v1 ) ライセンス: Link先を確認	Zhaoxun Ju, Chao Yang, Hongbo Wang, Yu Qiao and Fuchun Sun	(参考訳) 言語条件付きロボットの動作は、人間の命令や指示と知覚や動作を関連付けることで複雑なタスクを実行する上で重要な役割を果たす。制約のない言語命令に基づいて長期タスクを構成する能力は、多種多様な汎用スキルの獲得を必要とする。しかし,対外報酬や人的監督を伴わない連成・長期ホリゾン環境における本質的原始的スキルの獲得には大きな課題がある。本稿では,言語条件付き政策学習の枠組みの中で,2種類の相互情報を用いた数学的観点から,スキルと言語指導の関係を評価する。教師なしの方法で言語とスキルの相互情報を最大化するために,言語条件付きスキル発見(lcsd)と呼ばれるエンドツーエンドの模倣学習手法を提案する。具体的には,ベクトル量子化を用いて離散潜在スキルを学習し,軌跡のスキルシーケンスを活用し,高レベル意味命令を再構成する。 BabyAI,LORel,CALVINを含む言語条件のロボットナビゲーションおよび操作タスクに関する広範な実験を通じて,本手法が先行研究よりも優れていることを示す。提案手法は,未確認タスクに対する一般化能力の向上,スキル解釈性の向上,タスク完了の成功率の向上などを示す。 Language-conditioned robot behavior plays a vital role in executing complex tasks by associating human commands or instructions with perception and actions. The ability to compose long-horizon tasks based on unconstrained language instructions necessitates the acquisition of a diverse set of general-purpose skills. However, acquiring inherent primitive skills in a coupled and long-horizon environment without external rewards or human supervision presents significant challenges. In this paper, we evaluate the relationship between skills and language instructions from a mathematical perspective, employing two forms of mutual information within the framework of language-conditioned policy learning. To maximize the mutual information between language and skills in an unsupervised manner, we propose an end-to-end imitation learning approach known as Language Conditioned Skill Discovery (LCSD). Specifically, we utilize vector quantization to learn discrete latent skills and leverage skill sequences of trajectories to reconstruct high-level semantic instructions. 
Through extensive experiments on language-conditioned robotic navigation and manipulation tasks, encompassing BabyAI, LORel, and CALVIN, we demonstrate the superiority of our method over prior works. Our approach exhibits enhanced generalization capabilities towards unseen tasks, improved skill interpretability, and notably higher rates of task completion success.	翻訳日:2024-02-28 16:19:02 公開日:2024-02-27
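The vector-quantization step mentioned in the LCSD abstract can be pictured with a minimal sketch: each continuous embedding is replaced by the index of its nearest codebook entry, so a trajectory becomes a sequence of discrete skill indices. The codebook values, the `quantize` helper, and the toy trajectory below are hypothetical and are not taken from the paper; this shows only the generic nearest-neighbour lookup, not the authors' training procedure.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector: Sequence[float],
             codebook: List[Sequence[float]]) -> Tuple[int, Sequence[float]]:
    """Return the index and value of the codebook entry closest to `vector`."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical 2-D codebook with three discrete "skills".
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]

# Hypothetical per-step embeddings of one trajectory.
trajectory_embeddings = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.3]]

# Each embedding is mapped to the index of its nearest code.
skill_sequence = [quantize(e, codebook)[0] for e in trajectory_embeddings]
print(skill_sequence)
```

In a learned system the codebook would be trained jointly with the encoder; here it is fixed by hand to keep the lookup itself visible.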
# 視覚言語表現学習におけるショートカットの実証と削減 Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning ( http://arxiv.org/abs/2402.17510v1 ) ライセンス: Link先を確認	Maurits Bleeker, Mariya Hendriksen, Andrew Yates, Maarten de Rijke	(参考訳) 視覚言語モデル(VLM)は主に画像やキャプションの汎用表現を学ぶための対照的な訓練に依存している。 1つの画像が複数のキャプションに関連付けられた状況に注目し,各キャプションに共有された情報と,その画像に描かれたシーンに関するキャプションごとにユニークな情報の両方を含むキャプションについて述べる。このようなケースでは、キャプションが提供する全ての情報を含むタスク最適表現を学習するのにコントラスト的損失が十分であるかどうか、あるいはコントラスト的損失を最小限に抑える単純なショートカットの学習を奨励しているかどうかが不明である。画像テキストデータに合成ショートカットを注入する学習・評価フレームワークである視覚言語のための合成ショートカットを紹介する。これらの合成ショートカットを含むデータをスクラッチまたは微調整したコントラストvlmは、主にショートカットを表す特徴を学習する。したがって、画像と関連するキャプション間で共有されるすべてのタスク関連情報を含む、タスク最適表現を学ぶのに、対照的な損失は十分ではない。トレーニングおよび評価フレームワークにおけるショートカット学習を減らす2つの方法を検討する。 (i)潜伏目標復号、及び (ii)暗黙的な特徴修正。いずれの手法も評価タスクの性能を向上させるが,ショートカット学習フレームワークを用いてトレーニングや評価を行う際のショートカット学習を部分的に削減する。したがって、コントラッシブな視覚言語表現学習のためのショートカット学習フレームワークの難しさと課題を示す。 Vision-language models (VLMs) mainly rely on contrastive training to learn general-purpose representations of images and captions. We focus on the situation when one image is associated with several captions, each caption containing both information shared among all captions and unique information per caption about the scene depicted in the image. In such cases, it is unclear whether contrastive losses are sufficient for learning task-optimal representations that contain all the information provided by the captions or whether the contrastive learning setup encourages the learning of a simple shortcut that minimizes contrastive loss. We introduce synthetic shortcuts for vision-language: a training and evaluation framework where we inject synthetic shortcuts into image-text data. We show that contrastive VLMs trained from scratch or fine-tuned with data containing these synthetic shortcuts mainly learn features that represent the shortcut. 
Hence, contrastive losses are not sufficient to learn task-optimal representations, i.e., representations that contain all task-relevant information shared between the image and associated captions. We examine two methods to reduce shortcut learning in our training and evaluation framework: (i) latent target decoding and (ii) implicit feature modification. We show empirically that both methods improve performance on the evaluation task, but only partly reduce shortcut learning when training and evaluating with our shortcut learning framework. Hence, we show the difficulty and challenge of our shortcut learning framework for contrastive vision-language representation learning.	翻訳日:2024-02-28 16:18:24 公開日:2024-02-27
# 極端な誤解と敵意の強固さの錯覚 Extreme Miscalibration and the Illusion of Adversarial Robustness ( http://arxiv.org/abs/2402.17509v1 ) ライセンス: Link先を確認	Vyas Raina, Samson Tan, Volkan Cevher, Aditya Rawal, Sheng Zha, George Karypis	(参考訳) ディープラーニングベースの自然言語処理(NLP)モデルは、小さな摂動によってモデルが誤分類される可能性のある敵攻撃に対して脆弱である。逆行訓練(AT)は、しばしばモデルの堅牢性を高めるために使用される。しかし、我々は、意図的または誤ってモデルが勾配をマスクし、敵の攻撃探索方法に干渉し、強靭性が明らかに増加するという興味深い現象を発見した。本研究は, 強靭性の観察による獲得は, 強靭性の錯覚(IOR)であり, 上記の干渉を無効化し, 敵の攻撃が敵の例を見つけることを可能にするために, 種々のテスト時間温度校正を行うことができることを示す。したがって、我々はNLPコミュニティに対して、観測された利得が本物であることを確実にするために、試験時間温度スケーリングを彼らの堅牢性評価に組み込むよう促す。最後に、真に堅牢性を改善するために、 \textit{training}中に温度をスケールする方法を示す。 Deep learning-based Natural Language Processing (NLP) models are vulnerable to adversarial attacks, where small perturbations can cause a model to misclassify. Adversarial Training (AT) is often used to increase model robustness. However, we have discovered an intriguing phenomenon: deliberately or accidentally miscalibrating models masks gradients in a way that interferes with adversarial attack search methods, giving rise to an apparent increase in robustness. We show that this observed gain in robustness is an illusion of robustness (IOR), and demonstrate how an adversary can perform various forms of test-time temperature calibration to nullify the aforementioned interference and allow the adversarial attack to find adversarial examples. Hence, we urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations to ensure that any observed gains are genuine. Finally, we show how the temperature can be scaled during \textit{training} to improve genuine robustness.	翻訳日:2024-02-28 16:17:40 公開日:2024-02-27
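To make the role of test-time temperature scaling more concrete, here is a toy sketch under stated assumptions: a classifier with extremely overconfident logits has a saturated softmax, so the cross-entropy gradient with respect to the logits is numerically almost zero, while dividing the logits by a temperature restores a usable gradient without changing the predicted class. The logit values and the temperature of 40 are illustrative choices, not numbers from the paper, and this is not the authors' calibration procedure.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ce_grad_wrt_logits(logits: List[float], label: int,
                       temperature: float = 1.0) -> List[float]:
    """Gradient of cross-entropy(softmax(logits / T), label) w.r.t. the logits.

    For softmax with temperature T this is (p - onehot(label)) / T.
    """
    p = softmax(logits, temperature)
    return [(pi - (1.0 if i == label else 0.0)) / temperature
            for i, pi in enumerate(p)]


# Hypothetical, extremely overconfident logits for a 3-class problem.
miscalibrated_logits = [80.0, 0.0, -40.0]
true_label = 0

grad_raw = ce_grad_wrt_logits(miscalibrated_logits, true_label)
grad_scaled = ce_grad_wrt_logits(miscalibrated_logits, true_label,
                                 temperature=40.0)

# The predicted class is the same with and without temperature scaling.
pred_raw = max(range(3), key=lambda i: softmax(miscalibrated_logits)[i])
pred_scaled = max(range(3),
                  key=lambda i: softmax(miscalibrated_logits, 40.0)[i])

print(max(abs(g) for g in grad_raw))     # vanishingly small
print(max(abs(g) for g in grad_scaled))  # orders of magnitude larger
```

A gradient-guided attack that sees only the raw gradient has almost no signal to follow, which is one way an apparent robustness gain can arise without the model being any harder to fool.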
# 線形複雑度を有する対話型多頭部セルフアテンション Interactive Multi-Head Self-Attention with Linear Complexity ( http://arxiv.org/abs/2402.17507v1 ) ライセンス: Link先を確認	Hankyul Kang, Ming-Hsuan Yang, Jongbin Ryu	(参考訳) 本稿では,マルチヘッド・セルフアテンションの分解による効率的な対話的手法を提案する。マルチヘッドセルフアテンションを用いた既存の手法では、各ヘッドの注意操作を独立に計算する。しかし,アテンションマトリックスのクロスヘッド間の相互作用はアテンション操作の情報フローを増加させることを示した。それぞれのヘッドの注意行列をネットワークの特徴と見なすことができれば,相互作用をよりよく捉えるために,それらの間の接続を確立することは有益である。しかし、複雑度が注意行列の高次元とともに大きくなるにつれて、クロスヘッド間の相互作用を捉える直接的なアプローチは計算コストが高すぎる。本研究では,アテンション操作をクエリレスおよびキーレスの成分に分解する効果的な手法を提案する。これにより、注意行列、特にクロスヘッド相互作用に対してより管理可能なサイズが得られる。広範な実験結果から,提案手法は既存の効率的なアテンション手法や最先端のバックボーンモデルに対して良好に作用することが示された。 We propose an efficient interactive method for multi-head self-attention via decomposition. For existing methods using multi-head self-attention, the attention operation of each head is computed independently. However, we show that the interactions between cross-heads of the attention matrix enhance the information flow of the attention operation. Considering that the attention matrix of each head can be seen as a feature of networks, it is beneficial to establish connectivity between them to capture interactions better. However, a straightforward approach to capture the interactions between the cross-heads is computationally prohibitive as the complexity grows substantially with the high dimension of an attention matrix. In this work, we propose an effective method to decompose the attention operation into query- and key-less components. This will result in a more manageable size for the attention matrix, specifically for the cross-head interactions. Extensive experimental results show that the proposed cross-head interaction approach performs favorably against existing efficient attention methods and state-of-the-art backbone models.	翻訳日:2024-02-28 16:17:09 公開日:2024-02-27
# 熱力学による希少時間力学データの超解像 Thermodynamics-informed super-resolution of scarce temporal dynamics data ( http://arxiv.org/abs/2402.17506v1 ) ライセンス: Link先を確認	Carlos Bermejo-Barbanoj, Beatriz Moya, Alberto Bad\'ias, Francisco Chinesta, El\'ias Cueto	(参考訳) 本稿では,物理系の測定値の分解能を高め,熱力学を意識したニューラルネットワークを用いて時間進化を予測する手法を提案する。本手法は敵対的オートエンコーダを用いて,例えば正規分布のような事前分布に適合するように制約された潜在変数の集合へと,フルオーダーモデルの次元を低減する。敵対的オートエンコーダは生成モデルと見なされ、低解像度入力から高解像度のサンプルを生成するように訓練することができる。すなわち、いわゆる超解像問題に対処できる。次に、第2のニューラルネットワークがトレーニングされ、潜在変数の物理的構造を学び、その時間的進化を予測する。このニューラルネットワークは構造保存ニューラルネットワークとして知られている。系のメトリプレクティック構造を学習し、熱力学の第一法則と第二法則が満たされることを保証するために物理的バイアスを適用する。積分された軌道は、元の次元と、敵対的オートエンコーダによって生成される高次元空間の両方にデコードされ、正解(ground truth)の解と比較される。この手法はシリンダー上の流れの2つの例で試験され、それぞれの例で流体特性が変化する。 We present a method to increase the resolution of measurements of a physical system and subsequently predict its time evolution using thermodynamics-aware neural networks. Our method uses adversarial autoencoders, which reduce the dimensionality of the full order model to a set of latent variables that are constrained to match a prior, for example a normal distribution. Adversarial autoencoders are seen as generative models, and they can be trained to generate high-resolution samples from low-resolution inputs, meaning they can address the so-called super-resolution problem. Then, a second neural network is trained to learn the physical structure of the latent variables and predict their temporal evolution. This neural network is known as a structure-preserving neural network. It learns the metriplectic structure of the system and applies a physical bias to ensure that the first and second principles of thermodynamics are fulfilled. The integrated trajectories are decoded to their original dimensionality, as well as to the higher dimensionality space produced by the adversarial autoencoder, and they are compared to the ground truth solution. The method is tested with two examples of flow over a cylinder, where the fluid properties are varied between both examples.	
翻訳日:2024-02-28 16:16:41 公開日:2024-02-27
# bases: 大言語モデルに基づくエージェントによる大規模web検索ユーザシミュレーション BASES: Large-scale Web Search User Simulation with Large Language Model based Agents ( http://arxiv.org/abs/2402.17505v1 ) ライセンス: Link先を確認	Ruiyang Ren, Peng Qiu, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Hua Wu, Ji-Rong Wen, Haifeng Wang	(参考訳) 大規模言語モデル(LLM)の優れた能力のため、信頼性の高いユーザシミュレーションのためのLLMベースのエージェントの開発が可能である。本稿では,実際のユーザデータの不足と限界(プライバシ問題など)を考慮して,web検索における大規模ユーザシミュレーションを行い,ユーザ検索行動の分析とモデリングを改善する。特に,web検索ユーザの行動の総合的なシミュレーションを容易にするために,llmベースのエージェントを用いた新しいユーザシミュレーションフレームワークであるbasesを提案する。シミュレーションフレームワークは,ユーザプロファイルを大規模に生成することで,検索行動の多様化を実現する。ベースの有効性を実証するために,中国語と英語の2つのベンチマークに基づく評価実験を行い,ベースが大規模人間ライクな検索行動を効果的にシミュレートできることを実証した。 web 検索の研究をさらに充実させるために,中国語版と英語版の両方を含む web 検索ユーザの行動を包含する新たな大規模データセット warriors を開発した。コードとデータはまもなく公開されます。 Due to the excellent capacities of large language models (LLMs), it becomes feasible to develop LLM-based agents for reliable user simulation. Considering the scarcity and limit (e.g., privacy issues) of real user data, in this paper, we conduct large-scale user simulation for web search, to improve the analysis and modeling of user search behavior. Specially, we propose BASES, a novel user simulation framework with LLM-based agents, designed to facilitate comprehensive simulations of web search user behaviors. Our simulation framework can generate unique user profiles at scale, which subsequently leads to diverse search behaviors. To demonstrate the effectiveness of BASES, we conduct evaluation experiments based on two human benchmarks in both Chinese and English, demonstrating that BASES can effectively simulate large-scale human-like search behaviors. To further accommodate the research on web search, we develop WARRIORS, a new large-scale dataset encompassing web search user behaviors, including both Chinese and English versions, which can greatly bolster research in the field of information retrieval. Our code and data will be publicly released soon.	翻訳日:2024-02-28 16:15:58 公開日:2024-02-27
# FedLPPA:Federated Weaklysupervised Medical Image Segmentationのための個人化プロンプトとアグリゲーションの学習 FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-supervised Medical Image Segmentation ( http://arxiv.org/abs/2402.17502v1 ) ライセンス: Link先を確認	Li Lin, Yixiang Liu, Jiewei Wu, Pujin Cheng, Zhiyuan Cai, Kenneth K. Y. Wong, Xiaoying Tang	(参考訳) フェデレートラーニング(FL)は、ポリシーやプライバシの懸念によって引き起こされるデータサイロの課題を効果的に軽減し、深いモデルトレーニングにより多くのデータを活用する。しかし、従来の集中型FLモデルは様々なマルチセンターデータ、特に医学的文脈において重要なデータ不均一性に直面している。医用画像セグメンテーションの分野では,アノテーションコストを削減しようとする命令性が高まり,ポイントやスクリブルなどの分散アノテーションを利用する弱い教師技術の重要性が増している。実用的flパラダイムは、研究テーマが未調査のままである様々なサイトにわたる多様なアノテーション形式を適応させなければならない。このような状況下で,医用画像セグメンテーションのための不均一な弱い監督を均一に活用するために,学習可能なプロンプトとアグリゲーション(FedLPPA)を備えた新規なFLフレームワークを提案する。 FedLPPAでは、学習可能な普遍的な知識プロンプトが維持され、複数の学習可能なパーソナライズされたデータ配布プロンプトと、監督空間を表すプロンプトが補完される。デュアルアテンション機構によってサンプル機能と統合され、各ローカルタスクデコーダが局所分布と監督形式の両方に順応的に適応するように促される。同時に、局所データに固有のオーバーフィッティングやノイズの蓄積を緩和し、弱教師付き学習における擬似ラベルの生成を促進するために、プロンプト類似性に先行するデュアルデコーダ戦略を導入し、タスクデコーダをパラメータ単位でカスタマイズする適応型アグリゲーション法を用いる。異なるモダリティを含む3つの異なる医用画像セグメンテーションタスクに対する広範囲な実験は、FedLPPAの優位性を浮き彫りにした。コードとデータは利用可能です。 Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing imperative to curtail annotation costs has amplified the importance of weakly-supervised techniques which utilize sparse annotations such as points, scribbles, etc. A pragmatic FL paradigm shall accommodate diverse annotation formats across different sites, which research topic remains under-investigated. 
In this context, we propose a novel personalized FL framework with learnable prompt and aggregation (FedLPPA) to uniformly leverage heterogeneous weak supervision for medical image segmentation. In FedLPPA, a learnable universal knowledge prompt is maintained, complemented by multiple learnable personalized data distribution prompts and prompts representing the supervision sparsity. Integrated with sample features through a dual-attention mechanism, those prompts empower each local task decoder to adeptly adjust to both the local distribution and the supervision form. Concurrently, a dual-decoder strategy, predicated on prompt similarity, is introduced for enhancing the generation of pseudo-labels in weakly-supervised learning, alleviating overfitting and noise accumulation inherent to local data, while an adaptable aggregation method is employed to customize the task decoder on a parameter-wise basis. Extensive experiments on three distinct medical image segmentation tasks involving different modalities underscore the superiority of FedLPPA, with efficacy closely paralleling that of fully supervised centralized training. Our code and data will be available.	翻訳日:2024-02-28 16:15:21 公開日:2024-02-27
# ビッグシーケンスモデリング問題としての集中ケア Intensive Care as One Big Sequence Modeling Problem ( http://arxiv.org/abs/2402.17501v1 ) ライセンス: Link先を確認	Vadim Liventsev, Tobias Fritz	(参考訳) 医療における強化学習は、典型的には敗血症の予測や麻酔管理のような狭い自己完結したタスクに関係している。しかし、従来の研究では、暗黙的な伝達学習能力により、タスク固有のアプローチよりも優れた汎用モデル(主な例はLarge Language Models)の可能性を実証している。医療基盤モデルのトレーニングを可能にするとともに,トランスフォーマーアーキテクチャの能力を活用するために,患者と医療提供者のインタラクションをイベントストリームとして表現し,診断や治療選択などのタスクをストリーム内の将来の事象の予測としてモデル化する,医療のパラダイム・アズ・シーケンス・モデリングを提案する。このパラダイムを実験的に検討するために、MIMIC-IVデータセットから異種臨床記録を均一なイベントストリーム形式に翻訳し、ベースラインモデルをトレーニングし、その能力を探索するシーケンスモデリングベンチマークMIMIC-SEQを開発した。 Reinforcement Learning in Healthcare is typically concerned with narrow self-contained tasks such as sepsis prediction or anesthesia control. However, previous research has demonstrated the potential of generalist models (the prime example being Large Language Models) to outperform task-specific approaches due to their capability for implicit transfer learning. To enable training of foundation models for Healthcare as well as leverage the capabilities of state of the art Transformer architectures, we propose the paradigm of Healthcare as Sequence Modeling, in which interaction between the patient and the healthcare provider is represented as an event stream and tasks like diagnosis and treatment selection are modeled as prediction of future events in the stream. To explore this paradigm experimentally we develop MIMIC-SEQ, a sequence modeling benchmark derived by translating heterogenous clinical records from MIMIC-IV dataset into a uniform event stream format, train a baseline model and explore its capabilities.	翻訳日:2024-02-28 16:14:42 公開日:2024-02-27
# 複雑発振器ネットワークにおける不安定性予測 : ネットワーク計測と機械学習の限界と可能性 Predicting Instability in Complex Oscillator Networks: Limitations and Potentials of Network Measures and Machine Learning ( http://arxiv.org/abs/2402.17500v1 ) ライセンス: Link先を確認	Christian Nauck, Michael Lindner, Nora Molkenthin, J\"urgen Kurths, Eckehard Sch\"oll, J\"org Raisch and Frank Hellmann	(参考訳) ネットワーク科学の中心的な問題は、システムの機能的性質がその構造からどのように生じるかである。ネットワーク力学系の場合、構造は通常ネットワーク測度で定量化される。振動系において理論的かつ実用的な興味を持つ機能的性質は、局所摂動に対する同期の安定性である。近年、グラフニューラルネットワーク(gnns)は、この安定性をうまく予測できることが示されている。ここでは46の関連するネットワーク測度を収集し、小さなサブセットが確実に安定性を予測できることを見出す。 GNNの性能は、すべてのネットワーク測度とノードワイズ機械学習を組み合わせることでのみ一致できる。しかし、GNNとは異なり、このアプローチはネットワークアンサンブルから複数の実電力グリッドトポロジへの外挿に失敗する。このことは,ネットワーク計測と関数の相関関係が誤解を招く可能性があり,GNNが構造と安定性の因果関係をかなりよく捉えていることを示唆している。 A central question of network science is how functional properties of systems arise from their structure. For networked dynamical systems, structure is typically quantified with network measures. A functional property that is of theoretical and practical interest for oscillatory systems is the stability of synchrony to localized perturbations. Recently, Graph Neural Networks (GNNs) have been shown to predict this stability successfully; at the same time, network measures have struggled to paint a clear picture. Here we collect 46 relevant network measures and find that no small subset can reliably predict stability. The performance of GNNs can only be matched by combining all network measures and nodewise machine learning. However, unlike GNNs, this approach fails to extrapolate from network ensembles to several real power grid topologies. This suggests that correlations of network measures and function may be misleading, and that GNNs capture the causal relationship between structure and stability substantially better.	翻訳日:2024-02-28 16:14:25 公開日:2024-02-27
# REAR: オープンドメイン質問応答のための関連性対応検索フレームワーク REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering ( http://arxiv.org/abs/2402.17497v1 ) ライセンス: Link先を確認	Yuhao Wang, Ruiyang Ren, Junyi Li, Wayne Xin Zhao, Jing Liu, Ji-Rong Wen	(参考訳) 内部パラメトリック知識の制限を考慮すると、検索拡張生成(RAG)は大規模言語モデル(LLM)の知識範囲の拡大に広く用いられている。 RAG研究の広範な取り組みにもかかわらず、既存の手法では、LLMは検索された文書の関連性を正確に評価できないため、外部知識(すなわち、回収された文書)の誤用や不正な利用につながる可能性がある。本稿では,オープンドメイン質問応答(qa)のための関連性を考慮した検索手法であるreaを提案する。鍵となる動機は、RAGシステムにおける外部知識を適応的に活用するために、LLMにおけるソース関連性の自己認識を高めることである。具体的には、検索した文書の関連性を正確に評価する特別設計のランクヘッドを組み込むことにより、LLMベースのRAGシステムのための新しいアーキテクチャを開発する。さらに,両粒度相関融合と耐雑音訓練に基づく改良訓練法を提案する。アーキテクチャとトレーニングの両方の改善を組み合わせることで,検索文書の関連性を効果的に把握することにより,外部知識をより活用することができる。オープンドメインの4つのQAタスクの実験では、REARは以前の競合RAGアプローチよりも大幅に優れていた。私たちのコードとデータはhttps://github.com/RUCAIBox/REARでアクセスできます。 Considering the limited internal parametric knowledge, retrieval-augmented generation (RAG) has been widely used to extend the knowledge scope of large language models (LLMs). Despite the extensive efforts on RAG research, in existing methods, LLMs cannot precisely assess the relevance of retrieved documents, thus likely leading to misleading or even incorrect utilization of external knowledge (i.e., retrieved documents). To address this issue, in this paper, we propose REAR, a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA). As the key motivation, we aim to enhance the self-awareness of source relevance for LLMs, so as to adaptively utilize external knowledge in RAG systems. Specially, we develop a new architecture for LLM based RAG system, by incorporating a specially designed rank head that precisely assesses the relevance of retrieved documents. Furthermore, we propose an improved training method based on bi-granularity relevance fusion and noise-resistant training. 
By combining the improvements in both architecture and training, our proposed REAR can better utilize external knowledge by effectively perceiving the relevance of retrieved documents. Experiments on four open-domain QA tasks show that REAR significantly outperforms a number of previous competitive RAG approaches. Our code and data can be accessed at https://github.com/RUCAIBox/REAR.	翻訳日:2024-02-28 16:14:09 公開日:2024-02-27
# 感情音声メッセージ(emovome)データベース : 自発的音声メッセージにおける感情認識 Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages ( http://arxiv.org/abs/2402.17496v1 ) ライセンス: Link先を確認	Luc\'ia G\'omez Zaragoz\'a (1), Roc\'io del Amor (1), Elena Parra Vargas (1), Valery Naranjo (1), Mariano Alca\~niz Raya (1), Javier Mar\'in-Morales (1) ((1) HUMAN-tech Institute, Universitat Polit\`enica de Val\`encia, Valencia, Spain)	(参考訳) EMOVOME(Emotional Voice Messages)は、スペイン語話者100人のメッセージアプリで、実際の会話から999の音声メッセージを含む、自発的な音声データセットである。ボイスメッセージは、参加者が採用される前に、実験室環境による意識的な偏見を避けるために、現場で発生した。音声は3人の非専門家と2人の専門家によってヴァレンスと覚醒次元でラベル付けされ、それらを組み合わせて次元ごとに最終ラベルを得た。専門家は7つの感情カテゴリーに対応する追加ラベルも提供した。 EMOVOMEを用いた将来の調査のベースラインを設定するために,音声と音声の両方を用いた感情認識モデルを実装した。音声では,標準の eGeMAPS 機能セットとサポートベクターを用いて,それぞれ49.27% と44.71% の未加重精度を得た。テキストでは、多言語BERTモデルを微調整し、それぞれ61.15%と47.43%の未重み付き精度を達成した。このデータベースは、野生における感情認識の研究に大きく貢献すると同時に、スペイン語に固有の自然で自由にアクセスできるリソースを提供する。 Emotional Voice Messages (EMOVOME) is a spontaneous speech dataset containing 999 audio messages from real conversations on a messaging app from 100 Spanish speakers, gender balanced. Voice messages were produced in-the-wild conditions before participants were recruited, avoiding any conscious bias due to laboratory environment. Audios were labeled in valence and arousal dimensions by three non-experts and two experts, which were then combined to obtain a final label per dimension. The experts also provided an extra label corresponding to seven emotion categories. To set a baseline for future investigations using EMOVOME, we implemented emotion recognition models using both speech and audio transcriptions. For speech, we used the standard eGeMAPS feature set and support vector machines, obtaining 49.27% and 44.71% unweighted accuracy for valence and arousal respectively. For text, we fine-tuned a multilingual BERT model and achieved 61.15% and 47.43% unweighted accuracy for valence and arousal respectively. 
This database will significantly contribute to research on emotion recognition in the wild, while also providing a unique natural and freely accessible resource for Spanish.	翻訳日:2024-02-28 16:13:45 公開日:2024-02-27
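The "unweighted accuracy" figures in the EMOVOME abstract can be read more easily with a small sketch. It assumes the term is used in its common speech-emotion-recognition sense, namely the mean of per-class recalls, so that every class counts equally regardless of size; the abstract itself does not spell out the definition. The labels below are made up for illustration and are not EMOVOME data.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def unweighted_accuracy(y_true: Sequence[Hashable],
                        y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recalls (each class contributes equally)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def plain_accuracy(y_true: Sequence[Hashable],
                   y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples predicted correctly (dominated by large classes)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced valence labels.
y_true = ["pos", "pos", "pos", "pos", "neg", "neu"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]

ua = unweighted_accuracy(y_true, y_pred)
acc = plain_accuracy(y_true, y_pred)
print(ua, acc)
```

On imbalanced data the two numbers can differ noticeably, which is why the choice of metric matters when comparing against the reported baselines.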
# 周術期ケアのための大規模言語モデルの作成:事前学習モデルに適切な用法は何か? Prescribing Large Language Models for Perioperative Care: What's The Right Dose for Pre-trained Models? ( http://arxiv.org/abs/2402.17493v1 ) ライセンス: Link先を確認	Bing Xue, Charles Alba, Joanna Abraham, Thomas Kannampallil, Chenyang Lu	(参考訳) 術後のリスク予測は、効果的な周術期ケア管理と計画に影響を及ぼす。臨床大言語モデル (LLM) が術後のリスクを予測できるかどうかを, 様々なトレーニング戦略を用いて評価することを目的とした。 2018年から2021年の間、バーンズ・ユダヤ人病院(BJH)の84,875件の記録を保有していた。方法はベス・イスラエル・デコネスのMIMICデータセットで再現された。両研究とも術後のICU持続期間は7日以内であった。 BJHデータセットでは,30日間の死亡,肺塞栓症(PE),肺炎が認められた。 BioGPT, ClinicalBERT, BioClinicalBERTの3つのドメイン適応および微調整戦略が, 自己指導目的, ラベルを半教師付き微調整, マルチタスク学習による基礎的モデリングによって実現された。モデル性能は,受信者の動作特性曲線 (auroc) の下の領域と, 分類タスクの精度リコール曲線 (auprc) の領域, 回帰タスクの平均二乗誤差 (mse) と r2 を用いて比較した。事前訓練されたLLMは従来の単語埋め込みよりも優れており、AUROCは38.3%、AUPRCは14%だった。適応モデルの性能はさらに向上した:(1)aurocでは3.2%、auprcでは1.5%、(2)aurocでは1.8%、auprcでは2%、(3)aurocでは3.6%、auprcでは2.6%の自己教師付き微調整である。事前訓練された臨床LSMは、周術期医療におけるLSMの一般化可能性に対するタスク非依存学習の可能性を示す基礎モデルにおいて、予期せぬデータにおける術後リスク予測の機会を提供する。 Postoperative risk predictions can inform effective perioperative care management and planning. We aimed to assess whether clinical large language models (LLMs) can predict postoperative risks using clinical texts with various training strategies. The main cohort involved 84,875 records from Barnes Jewish Hospital (BJH) system between 2018 and 2021. Methods were replicated on Beth Israel Deaconess's MIMIC dataset. Both studies had mean duration of follow-up based on the length of postoperative ICU stay less than 7 days. For the BJH dataset, outcomes included 30-day mortality, pulmonary embolism (PE) and pneumonia. Three domain adaptation and finetuning strategies were implemented for BioGPT, ClinicalBERT and BioClinicalBERT: self-supervised objectives; incorporating labels with semi-supervised fine-tuning; and foundational modelling through multi-task learning. 
Model performance was compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC) for classification tasks, and mean squared error (MSE) and R2 for regression tasks. Pre-trained LLMs outperformed traditional word embeddings, with absolute maximal gains of 38.3% for AUROC and 14% for AUPRC. Adapting models further improved performance: (1) self-supervised finetuning by 3.2% for AUROC and 1.5% for AUPRC; (2) semi-supervised finetuning by 1.8% for AUROC and 2% for AUPRC, compared to self-supervised finetuning; (3) foundational modelling by 3.6% for AUROC and 2.6% for AUPRC, compared to self-supervised finetuning. Pre-trained clinical LLMs offer opportunities for postoperative risk predictions in unforeseen data, with peaks in foundational models indicating the potential of task-agnostic learning towards the generalizability of LLMs in perioperative care.	翻訳日:2024-02-28 16:13:24 公開日:2024-02-27
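As a reminder of what the AUROC figures above measure, the following sketch computes AUROC directly from its probabilistic interpretation: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented for illustration and have no connection to the BJH or MIMIC cohorts; a production evaluation would normally rely on an established library routine instead.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Runs in O(n_pos * n_neg), which is fine for small illustrative inputs.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = complication) and predicted risks.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]

result = auroc(labels, scores)
print(result)
```

Because the metric depends only on the ranking of scores, any monotonic rescaling of the predicted risks leaves it unchanged.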
# syren-halofit:$\lambda$cdm非線形物質パワースペクトルの高速で解釈可能で高精度な公式 syren-halofit: A fast, interpretable, high-precision formula for the $\Lambda$CDM nonlinear matter power spectrum ( http://arxiv.org/abs/2402.17492v1 ) ライセンス: Link先を確認	Deaglan J. Bartlett, Benjamin D. Wandelt, Matteo Zennaro, Pedro G. Ferreira, Harry Desmond	(参考訳) 宇宙論パラメータと赤方偏移の関数としての非線形物質パワースペクトルの迅速かつ正確な評価は、宇宙論において基本的な重要性である。解析近似は解釈可能な解を提供するが、現在の近似はブラックボックス数値エミュレータと比較して高速でも正確でもない。シンボリック回帰法を用いて、非線形スケールに対する単純な解析近似である$k_\sigma$、有効スペクトル指数である$n_{\rm eff}$、およびハロフィットモデルに必要な曲率である$C$を得る。次にハロフィットの係数を再最適化し、幅広い宇宙論と赤方偏移に適合させる。次に、シンボリック回帰を利用して解析表現の空間を探索し、$p(k)$ と halofit の最適化された予測の間の残差に適合させる。すべてのメソッドは$n$-bodyシミュレーションに対して検証される。 k_\sigma$, $n_{\rm eff}$ と $C$ の記号式はそれぞれ 3 以下の赤方偏移と幅広い宇宙論に対して 0.8%, 0.2%, 0.3% の根平均二乗誤差を持つ。再最適化ハロフィットパラメータは、波数 $k=9\times10^{-3}-9 \, h{\rm mpc^{-1}}$ に対して、根平均二乗分数誤差を3%から2%以下に低減する。本稿では,短い記号補正を含むハロフィットの拡張であるシレンハロフィット(シンボリック・レグレス・エンハンス・ハロフィット)を導入し,この誤差を1%に改善する。本手法は,現在のhalofitおよびhmcodeの実装よりも2350および3170倍高速であり,euclidemulator2(実行クラスが必要)およびbaccoエミュレータよりも2680および64倍高速である。 n$-bodyシミュレーションでテストすると euclidemulator2 と bacco emulator に匹敵する精度が得られる。我々の研究はシンボリック近似の速度と精度を$P(k)$に大きく上げ、精度を損なうことなく数値計算よりも大幅に速くする。 Rapid and accurate evaluation of the nonlinear matter power spectrum, $P(k)$, as a function of cosmological parameters and redshift is of fundamental importance in cosmology. Analytic approximations provide an interpretable solution, yet current approximations are neither fast nor accurate relative to black-box numerical emulators. We use symbolic regression to obtain simple analytic approximations to the nonlinear scale, $k_\sigma$, the effective spectral index, $n_{\rm eff}$, and the curvature, $C$, which are required for the halofit model. We then re-optimise the coefficients of halofit to fit a wide range of cosmologies and redshifts. We then again exploit symbolic regression to explore the space of analytic expressions to fit the residuals between $P(k)$ and the optimised predictions of halofit. 
All methods are validated against $N$-body simulations. Our symbolic expressions for $k_\sigma$, $n_{\rm eff}$ and $C$ have root mean squared fractional errors of 0.8%, 0.2% and 0.3%, respectively, for redshifts below 3 and a wide range of cosmologies. The re-optimised halofit parameters reduce the root mean squared fractional error from 3% to below 2% for wavenumbers $k=9\times10^{-3}-9 \, h{\rm Mpc^{-1}}$. We introduce syren-halofit (symbolic-regression-enhanced halofit), an extension to halofit containing a short symbolic correction which improves this error to 1%. Our method is 2350 and 3170 times faster than current halofit and hmcode implementations, respectively, and 2680 and 64 times faster than EuclidEmulator2 (which requires running class) and the BACCO emulator. We obtain comparable accuracy to EuclidEmulator2 and the BACCO emulator when tested on $N$-body simulations. Our work greatly increases the speed and accuracy of symbolic approximations to $P(k)$, making them significantly faster than their numerical counterparts without loss of accuracy.	翻訳日:2024-02-28 16:12:46 公開日:2024-02-27
# メカニカル・トルクネス:戦術メディアアートと企業AIの批判 The Mechanical Turkness: Tactical Media Art and the Critique of Corporate AI ( http://arxiv.org/abs/2402.17490v1 ) ライセンス: Link先を確認	Dejan Grba	(参考訳) 2010年代中盤以降、人工知能(AI)の広範な工業化は、その経済的・社会政治的な結果に対処するためにアーティストを動機付けている。本章では,AI技術の社会的ルーツを明らかにするために,クリエイティブエージェンシー,クラウドソースワーク,委譲アートメーカを模倣する相互関連技術プラクティスについて論じ,その発展における生産的ヒューマンロールの基盤となるものについて論じる。私は、現代AIによる科学、技術、経済、社会の幅広い問題を示す詩的な特徴を持つ作品に焦点を当てています。企業AIの政治体制をディスラプトする上でのそれらの効果の概念的、方法論的、倫理的側面を探求することにより、その戦術的影響に影響を与えるいくつかの問題を特定し、課題に対処し、分野を前進させる潜在的な道筋を概説する。 The extensive industrialization of artificial intelligence (AI) since the mid-2010s has increasingly motivated artists to address its economic and sociopolitical consequences. In this chapter, I discuss interrelated art practices that thematize creative agency, crowdsourced labor, and delegated artmaking to reveal the social rootage of AI technologies and underline the productive human roles in their development. I focus on works whose poetic features indicate broader issues of contemporary AI-influenced science, technology, economy, and society. By exploring the conceptual, methodological, and ethical aspects of their effectiveness in disrupting the political regime of corporate AI, I identify several problems that affect their tactical impact and outline potential avenues for tackling the challenges and advancing the field.	翻訳日:2024-02-28 16:12:05 公開日:2024-02-27
# jpeg-ai検証モデルにおけるビットレートマッチングアルゴリズム最適化 Bit Rate Matching Algorithm Optimization in JPEG-AI Verification Model ( http://arxiv.org/abs/2402.17487v1 ) ライセンス: Link先を確認	Panqi Jia, A. Burakhan Koyuncu, Jue Mao, Ze Cui, Yi Ma, Tiansheng Guo, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Jing Wang, Elena Alshina, Andre Kaup	(参考訳) ニューラルネットワーク(NN)に基づく画像圧縮の研究は、古典的な圧縮フレームワークよりも優れた性能を示している。古典的フレームワークのハンドエンジニアリング変換とは異なり、nnベースのモデルはよりコンパクトなビット表現を提供する非線形変換を学習し、従来のモデルよりも並列デバイスで高速なコーディング速度を達成する。これらの特性は、科学と工業の両方のコミュニティの注目を集め、JPEG-AIの標準化活動に繋がった。 JPEG-AIの標準化プロセスの検証モデルは、すでに開発されており、高度なVVCイントラコーデックを超えている。 1ピクセルあたりの所望のビット数で再構成された画像を生成し、jpeg-ai検証モデルとvvc intraの両方のbdレート性能を評価するために、ビットレートマッチングを用いる。しかし、jpeg-ai検証モデルの現在の状態はビットレートマッチング中に大幅に遅くなり、不適切なモデルのために最適でない性能をもたらす。提案手法は,ビットレートのマッチングを段階的に最適化し,基本動作点における4倍の加速とBDレートの1%以上の改善を実現する。高い演算点では、加速は最大6倍に増加する。 The research on neural network (NN) based image compression has shown superior performance compared to classical compression frameworks. Unlike the hand-engineered transforms in the classical frameworks, NN-based models learn the non-linear transforms providing more compact bit representations, and achieve faster coding speed on parallel devices over their classical counterparts. Those properties evoked the attention of both scientific and industrial communities, resulting in the standardization activity JPEG-AI. The verification model for the standardization process of JPEG-AI is already in development and has surpassed the advanced VVC intra codec. To generate reconstructed images with the desired bits per pixel and assess the BD-rate performance of both the JPEG-AI verification model and VVC intra, bit rate matching is employed. However, the current state of the JPEG-AI verification model experiences significant slowdowns during bit rate matching, resulting in suboptimal performance due to an unsuitable model. 
The proposed methodology offers a gradual algorithmic optimization for matching bit rates, resulting in a fourfold acceleration and over 1% improvement in BD-rate at the base operation point. At the high operation point, the acceleration increases up to sixfold.	翻訳日:2024-02-28 16:11:43 公開日:2024-02-27
# 画像品質評価モデルに対するブラックボックス広告攻撃 Black-box Adversarial Attacks Against Image Quality Assessment Models ( http://arxiv.org/abs/2402.17533v1 ) ライセンス: Link先を確認	Yu Ran, Ao-Xiang Zhang, Mingjie Li, Weixuan Tang, Yuan-Gen Wang	(参考訳) No-Reference Image Quality Assessment (NR-IQA)の目標は、画像の知覚的品質を主観的評価に従って予測することである。 NR-IQAモデルを実践するためには、モデル改良のための潜在的な抜け穴を研究することが不可欠である。本稿では,NR-IQAモデルに対するブラックボックス攻撃を初めて検討する。具体的には、まず、視覚品質保存のための摂動画像歪みを制限しつつ、原画像と摂動画像の推定品質スコアの偏差を最大化する攻撃問題を定式化する。このような定式化の下では,最大偏差のある反対方向に向けて,敵例の推定品質スコアを誤解させる双方向損失関数を設計する。そこで我々はNR-IQAモデルに対する効率的かつ効果的なブラックボックス攻撃法を開発した。実験の結果,評価されたNR-IQAモデルはすべて攻撃法に弱いことがわかった。生成された摂動は伝達不可能であり、異なるIQAモデルの特殊性の調査に役立てることができる。 The goal of No-Reference Image Quality Assessment (NR-IQA) is to predict the perceptual quality of an image in line with its subjective evaluation. To put the NR-IQA models into practice, it is essential to study their potential loopholes for model refinement. This paper makes the first attempt to explore the black-box adversarial attacks on NR-IQA models. Specifically, we first formulate the attack problem as maximizing the deviation between the estimated quality scores of original and perturbed images, while restricting the perturbed image distortions for visual quality preservation. Under such formulation, we then design a Bi-directional loss function to mislead the estimated quality scores of adversarial examples towards an opposite direction with maximum deviation. On this basis, we finally develop an efficient and effective black-box attack method against NR-IQA models. Extensive experiments reveal that all the evaluated NR-IQA models are vulnerable to the proposed attack method. And the generated perturbations are not transferable, enabling them to serve the investigation of specialities of disparate IQA models.	翻訳日:2024-02-28 16:06:52 公開日:2024-02-27
# 検索は正確な生成です Retrieval is Accurate Generation ( http://arxiv.org/abs/2402.17532v1 ) ライセンス: Link先を確認	Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi	(参考訳) 標準言語モデルは、固定、有限、スタンドアロンの語彙からトークンを選択してテキストを生成する。本稿では,支援文書の集合から文脈認識句を選択する新しい手法を提案する。このパラダイムシフトの最も重要な課題の1つは、テキストの文字列を様々な方法でセグメント化でき、各セグメントを多数の可能なドキュメントから検索できるため、トレーニングのオラクルを決定することである。そこで本稿では,言語的ヒューリスティックス(Huristics)を用いたオークルの初期化と,反復的自己強化によるオークルのブートストラップを提案する。広範な実験により,我々は知識集約型タスクで標準言語モデルを上回るだけでなく,オープンエンドテキスト生成における生成品質の向上を実証した。例えば、標準言語モデルと比較して、私たちのモデルはOpenbookQAで23.47%から36.27%に精度を上げ、オープンエンドテキスト生成で42.61%から81.58%にMAUVEのスコアを改善する。注目すべきことに,本モデルでは,いくつかの検索拡張ベースラインにおいて,最高の性能と低レイテンシを実現している。結論として,検索はより正確な生成であり,新たなパラダイムシフトのさらなる研究を促進することを願っている。 Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retrieved from numerous possible documents. To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement. Extensive experiments show that our model not only outperforms standard language models on a variety of knowledge-intensive tasks but also demonstrates improved generation quality in open-ended text generation. For instance, compared to the standard language model counterpart, our model raises the accuracy from 23.47% to 36.27% on OpenbookQA, and improves the MAUVE score from 42.61% to 81.58% in open-ended text generation. Remarkably, our model also achieves the best performance and the lowest latency among several retrieval-augmented baselines. 
In conclusion, we assert that retrieval is more accurate generation and hope that our work will encourage further research on this new paradigm shift.	翻訳日:2024-02-28 16:06:32 公開日:2024-02-27

# アナログ回路設計のための機械学習駆動グローバル最適化フレームワーク

Machine Learning Driven Global Optimisation Framework for Analog Circuit Design ( http://arxiv.org/abs/2404.02911v1 )

ライセンス: Link先を確認

Ria Rashid, Komala Krishna, Clint Pazhayidam George, Nandakumar Nambath,

(参考訳) 本稿では,アナログ回路設計のための機械学習による最適化フレームワークを提案する。主な目的は、与えられた仕様セットに対するアナログ回路の最適性能のためのデバイスサイズを決定することである。提案手法では,機械学習モデルとスパイスシミュレーションを用いて,アナログ回路の最適設計に向けて最適化アルゴリズムを誘導する。機械学習に基づくグローバルオフラインサロゲートモデルは、回路設計パラメータを入力として、研究中のアナログ回路の設計空間に構築され、最適化アルゴリズムを導出するために使用される。アナログ回路の設計仕様を予測するために多層パーセプトロンとランダムフォレスト回帰器を用いる。トランジスタの飽和状態はアナログ回路の適切な動作に不可欠であるため、回路内の各トランジスタの飽和状態を予測するために多層パーセプトロン分類器が使用される。スパイスシミュレーションを実行する前に、機械学習モデルを用いて候補解の有効性を検証する。提案手法は,バンドギャップ参照,折り畳まれたカスコード動作増幅器,二段動作増幅器の3つの回路トポロジを用いて検証する。シミュレーションの結果、収束後のフィットネス関数の最適値と標準偏差がより低いことがわかった。最適化手法で提案した機械学習に基づく予測を組み込むことで,本研究で検討した3つのテストケースの標準手法と比較して,スパイスコールが56%,59%,83%削減された。

We propose a machine learning-driven optimisation framework for analog circuit design in this paper. The primary objective is to determine the device sizes for the optimal performance of analog circuits for a given set of specifications. Our methodology entails employing machine learning models and SPICE simulations to direct the optimisation algorithm towards achieving the optimal design for analog circuits. Machine learning-based global offline surrogate models, with the circuit design parameters as the input, are built in the design space for the analog circuits under study and are used to guide the optimisation algorithm, resulting in faster convergence and a reduced number of SPICE simulations. Multi-layer perceptron and random forest regressors are employed to predict the required design specifications of the analog circuit. Since the saturation condition of transistors is vital in the proper working of analog circuits, multi-layer perceptron classifiers are used to predict the saturation condition of each transistor in the circuit. The feasibility of the candidate solutions is verified using machine learning models before invoking SPICE simulations. We validate the proposed framework using three circuit topologies: a bandgap reference, a folded cascode operational amplifier, and a two-stage operational amplifier. The simulation results show better optimum values and lower standard deviations for fitness functions after convergence. Incorporating the machine learning-based predictions proposed in the optimisation method has resulted in the reduction of SPICE calls by 56%, 59%, and 83% when compared with standard approaches in the three test cases considered in the study.

翻訳日:2024-07-01 12:08:31 公開日:2024-02-27
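The idea of screening candidates with a learned model before paying for a SPICE run can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `spice_simulate` and `surrogate_feasible` are hypothetical stand-ins for a real simulator and a trained classifier, the single design variable `w` is a toy, and plain random search replaces whatever optimiser the paper actually uses.

```python
import random

def spice_simulate(w):
    """Hypothetical stand-in for an expensive SPICE run.
    Returns (fitness, in_saturation); lower fitness is better."""
    fitness = (w - 3.0) ** 2          # toy objective with optimum at w = 3
    in_saturation = 1.0 <= w <= 5.0   # toy feasibility region
    return fitness, in_saturation

def surrogate_feasible(w):
    """Hypothetical stand-in for a trained classifier: a cheap,
    imperfect guess at whether the design is feasible."""
    return 0.8 <= w <= 5.2

def optimise(n_candidates=200, seed=0):
    """Random search that only simulates candidates the surrogate accepts."""
    rng = random.Random(seed)
    best_w, best_fit, spice_calls = None, float("inf"), 0
    for _ in range(n_candidates):
        w = rng.uniform(0.0, 10.0)
        if not surrogate_feasible(w):
            continue                  # rejected cheaply, no simulation spent
        fit, ok = spice_simulate(w)
        spice_calls += 1
        if ok and fit < best_fit:
            best_w, best_fit = w, fit
    return best_w, best_fit, spice_calls
```

Calling `optimise()` returns the best design found together with the number of simulated candidates, which is smaller than `n_candidates` whenever the surrogate rejects some of them. The saving in this toy depends entirely on the made-up feasibility region and says nothing about the percentages reported in the abstract.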

# ニューラルネットワークとSMOTE統合アプローチによるクレジットカード不正検出の強化

Enhancing Credit Card Fraud Detection A Neural Network and SMOTE Integrated Approach ( http://arxiv.org/abs/2405.00026v1 )

ライセンス: Link先を確認

Mengran Zhu, Ye Zhang, Yulu Gong, Changxin Xu, Yafei Xiang,

(参考訳) クレジットカード詐欺検出は金融セクターにとって重要な課題であり、不正取引を正確に識別するための高度なアプローチを要求している。本研究では、ニューラルネットワーク(NN)とSMOTE(Synthetic Minority Over-sampling Technique)を組み合わせて検出性能を向上させる革新的な手法を提案する。この研究は、クレジットカード取引データに固有の不均衡に対処し、堅牢で正確な不正検出のための技術的進歩に焦点を当てた。その結果、NNとSMOTEの統合は従来のモデルに比べて精度、リコール、F1スコアが優れており、クレジットカード不正検出シナリオにおいて不均衡なデータセットを扱うための高度なソリューションとしての可能性を強調している。この研究は、不正行為から金融取引を保護するための効果的かつ効率的なメカニズムを開発するための継続的な努力に寄与している。

Credit card fraud detection is a critical challenge in the financial sector, demanding sophisticated approaches to accurately identify fraudulent transactions. This research proposes an innovative methodology combining Neural Networks (NN) and Synthetic Minority Over-sampling Technique (SMOTE) to enhance the detection performance. The study addresses the inherent imbalance in credit card transaction data, focusing on technical advancements for robust and precise fraud detection. Results demonstrate that the integration of NN and SMOTE exhibits superior precision, recall, and F1-score compared to traditional models, highlighting its potential as an advanced solution for handling imbalanced datasets in credit card fraud detection scenarios. This research contributes to the ongoing efforts to develop effective and efficient mechanisms for safeguarding financial transactions from fraudulent activities.

翻訳日:2024-07-01 11:29:30 公開日:2024-02-27
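For readers unfamiliar with SMOTE, the core step is interpolation between a minority-class sample and one of its nearest minority-class neighbours. The sketch below is a simplified pure-Python illustration of that step only; the function name `smote`, its parameters, and the toy data are assumptions made here, not the authors' pipeline or any library's API.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class. In practice, oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic records.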

# 分光光場イメージングのための多次元圧縮センシング

Multidimensional Compressed Sensing for Spectral Light Field Imaging ( http://arxiv.org/abs/2405.00027v1 )

ライセンス: Link先を確認

Wen Cao, Ehsan Miandji, Jonas Unger,

(参考訳) 本稿では, 単一単色センサを用いて, 空間, 角, スペクトル情報を捉えるために, ワンホットのスペクトル符号化マスクとマイクロレンズアレイを用いた圧縮型マルチスペクトル光場カメラモデルを提案する。我々は, 圧縮センシング技術を用いて, アンダーサンプド計測から全マルチスペクトル光場を再構成するモデルを提案する。光場を1次元信号にベクトル化する従来の手法とは異なり、本手法では5次元ベースと新しい5次元計測モデルを用いて、マルチスペクトル光場の固有次元をマッチングする。我々は, 5D と 1D のセンシングモデルの等価性を数学的かつ経験的に示し, 最も重要なことは, メモリのごく一部を必要としながら, 5D フレームワークが桁違いに高速な再構成を実現することである。さらに,新しい多次元センシングモデルにより,効率的な視覚的データ取得アルゴリズムとハードウェアを設計するための新たな研究方向が開かれる。

This paper considers a compressive multi-spectral light field camera model that utilizes a one-hot spectral-coded mask and a microlens array to capture spatial, angular, and spectral information using a single monochrome sensor. We propose a model that employs compressed sensing techniques to reconstruct the complete multi-spectral light field from undersampled measurements. Unlike previous work where a light field is vectorized to a 1D signal, our method employs a 5D basis and a novel 5D measurement model, hence matching the intrinsic dimensionality of multispectral light fields. We mathematically and empirically show the equivalence of 5D and 1D sensing models, and most importantly that the 5D framework achieves orders of magnitude faster reconstruction while requiring a small fraction of the memory. Moreover, our new multidimensional sensing model opens new research directions for designing efficient visual data acquisition algorithms and hardware.

翻訳日:2024-07-01 11:29:30 公開日:2024-02-27
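The claimed equivalence between a 5D and a 1D sensing model can be illustrated with a standard tensor identity. The block below assumes a separable measurement operator with one matrix per mode; the abstract does not say that the paper's operator has this form, and the symbols $\mathcal{X}$, $\mathcal{Y}$, and $\Phi_k$ are introduced here purely for illustration.

```latex
% Light field as a 5-way tensor, one measurement matrix per mode (assumed):
%   \mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_5}, \qquad
%   \Phi_k \in \mathbb{R}^{m_k \times n_k}, \quad k = 1, \dots, 5.
\begin{align}
  \mathcal{Y} &= \mathcal{X} \times_1 \Phi_1 \times_2 \Phi_2 \times_3 \Phi_3
                 \times_4 \Phi_4 \times_5 \Phi_5
  && \text{(5D model, mode-$k$ products)} \\
  \operatorname{vec}(\mathcal{Y}) &=
    \left( \Phi_5 \otimes \Phi_4 \otimes \Phi_3 \otimes \Phi_2 \otimes \Phi_1 \right)
    \operatorname{vec}(\mathcal{X})
  && \text{(equivalent 1D model, column-major vec)}
\end{align}
% Storage: the 5D form keeps \sum_k m_k n_k entries, whereas the explicit
% Kronecker matrix of the 1D form has \prod_k m_k n_k entries.
```

Under this assumption, the two forms produce the same measurements, but the 5D form never materialises the large Kronecker matrix. That is consistent with the memory and speed advantage described in the abstract.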

# Rydberg原子系アンテナのドップラー感度と共振チューニング

Doppler sensitivity and resonant tuning of Rydberg atom-based antennas ( http://arxiv.org/abs/2405.07993v1 )

ライセンス: Link先を確認

Peter B. Weichman,

(参考訳) Rydberg原子蒸気セルをベースとした電波アンテナは、原理的にはどのワイヤアンテナよりも高い感度に到達しうる。特に、波長が長くなるために長いワイヤが必要となる低周波数で有利である。その他の望ましい特徴として、非金属で低背な素子であることが挙げられる。本稿では、Rydbergアンテナの感度に関する詳細な理論的研究を行い、現在テストされている構成よりも2～3桁の感度を累積的に増加させることができるパラメータ領域を解明する。重要な洞察は、2つのよく研究されたアプローチの利点を最適に組み合わせることである。 (i) 3つのレーザーを用いる「2Dスター配置」。レーザーパワーの増大によっても強化され、原子運動に起因するドップラー広がりの補償に役立つ。 (ii) 局所発振器によって入射信号に同調させた、近接縮退した1対のRydberg準位間の共鳴結合。スター配置の利点は微妙である。というのも、これは全体の感度を期待されるドップラー限界値まで回復させるだけであり、異なる運動をする原子集団が正味の信号中で互いに破壊的に干渉することによる追加の非共鳴的な感度低下を補償するものだからである。局所発振器による同調のさらなる独自の利点は、線幅が大幅に狭くなることであり、コア状態の寿命で決まる典型的な約10MHzではなく、Rydberg状態の固有寿命で決まる約10kHzにまで下がる。直感的には、この構成では2つのRydberg状態が独立した高Q共振器として振る舞うように同調でき、この見方はアンテナの共鳴応答の周波数依存性の研究によって裏付けられる。様々な外因的な線幅広がり効果を抑え、この共振器応答を完全に活用するためには、いくつかの実用的な実験上の進歩、特に約1cmのより大きなレーザービーム幅が必要とされる。

Radio frequency antennas based on Rydberg atom vapor cells can in principle reach sensitivities beyond those of any wire antenna, especially at lower frequencies where long wires are needed to accommodate a growing wavelength. They also have other desirable features such as nonmetallic, lower-profile elements. This paper presents a detailed theoretical investigation of Rydberg antenna sensitivity, elucidating parameter regimes that could cumulatively lead to 2-3 orders of magnitude sensitivity increase beyond that of currently tested configurations. The key insight is to optimally combine the advantages of two well-studied approaches: (i) three-laser "2D star configuration" setups that, enhanced also with increased laser power, help compensate for atom motion-induced Doppler broadening, and (ii) resonant coupling between a pair of near-degenerate Rydberg levels, tuned via a local oscillator to the incident signal. The advantage of the star setup is subtle because it only restores overall sensitivity to the expected Doppler-limited value, compensating for additional off-resonance reductions where differently moving atom populations destructively interfere with each other in the net signal. The additional unique advantage of the local oscillator tuning is that it leads to vastly narrower line widths, as low as ~10 kHz set by the intrinsic Rydberg state lifetimes, rather than the typical ~10 MHz set by the core state lifetimes. Intuitively, with this setup the two Rydberg states may be tuned to act as an independent high-Q cavity, a point of view supported through a study of the frequency-dependence of the antenna resonant response. There are a number of practical experimental advances, especially larger ~1 cm laser beam widths, required to suppress various extrinsic line broadening effects and to fully exploit this cavity response.

翻訳日:2024-07-01 08:49:26 公開日:2024-02-27

# 米国におけるAI, ML, 5G技術を用いた森林火災対策と管理の相乗的アプローチ

A Synergistic Approach to Wildfire Prevention and Management Using AI, ML, and 5G Technology in the United States ( http://arxiv.org/abs/2403.14657v1 )

ライセンス: Link先を確認

Stanley Chinedu Okoro, Alexander Lopez, Austine Unuriode,

(参考訳) 過去数年間、山火事は世界的な環境危機となり、自然の生息地に大きな被害を与え、気候変動の加速に寄与した。森林火災管理手法には、予防、対応、回復の努力が含まれる。検出技術の改善にもかかわらず、山火事の発生が増加すると、迅速な識別と効果的な制御のための創造的な解決策が要求される。本研究は、人工知能(AI)、機械学習(ML)、および5G技術を利用して、米国の山火事を検知・処理するための積極的な方法を検討する。本研究の目的は、先進技術を用いた山火事の能動的検出と予防、遠隔センシングと5G技術を利用した信号マッピングによる能動的モニタリングとマッピング、ドローンとIOTデバイスを用いた山火事に対する高度な応答メカニズムについてである。本研究は,政府データベースから収集した二次データに基づいて記述統計を用いて分析した。また、過去の出版物は内容分析を通じてレビューし、物語合成を用いて様々な研究から得られた知見を提示した。その結果,新技術開発は山火事を積極的に検出・管理する機会を与えることがわかった。高度な技術を利用することで命を救うことができ、山火事による経済的損失を防ぐことができる。 AI対応のリモートセンシングや5Gベースのアクティブモニタリングなど、さまざまな方法により、アクティブな山火事の検出と管理が強化される。さらに、超インテリジェントドローンとIOTデバイスは、山火事に対するより安全な応答に使用できる。これは、消防管理機関と政府に対する勧告の中核をなす。

Over the past few years, wildfires have become a worldwide environmental emergency, resulting in substantial harm to natural habitats and playing a part in the acceleration of climate change. Wildfire management methods involve prevention, response, and recovery efforts. Despite improvements in detection techniques, the rising occurrence of wildfires demands creative solutions for prompt identification and effective control. This research investigates proactive methods for detecting and handling wildfires in the United States, utilizing Artificial Intelligence (AI), Machine Learning (ML), and 5G technology. The specific objectives of this research cover proactive detection and prevention of wildfires using advanced technology; active monitoring and mapping with remote sensing and signaling that leverage 5G technology; and advanced response mechanisms to wildfires using drones and IoT devices. This study was based on secondary data collected from government databases and analyzed using descriptive statistics. In addition, past publications were reviewed through content analysis, and narrative synthesis was used to present the observations from various studies. The results showed that developing new technology presents an opportunity to detect and manage wildfires proactively. Utilizing advanced technology could save lives and prevent significant economic losses caused by wildfires. Various methods, such as AI-enabled remote sensing and 5G-based active monitoring, can enhance proactive wildfire detection and management. In addition, super intelligent drones and IoT devices can be used for safer responses to wildfires. This forms the core of the recommendation to fire management agencies and the government.

翻訳日:2024-04-01 03:43:10 公開日:2024-02-27

# 人工知能開発プロセスにおける人間の電位入射の同定

Identifying Potential Inlets of Man in the Artificial Intelligence Development Process ( http://arxiv.org/abs/2403.14658v1 )

ライセンス: Link先を確認

Deja Workman, Christopher L. Dancy,

(参考訳) 本稿では,典型的あるいは標準的な人工知能開発プロセスが,人種化された技術の創造をいかに助長あるいは促進するかを明らかにすることを目的とする。我々は、シルヴィア・ウィンター(Sylvia Wynter)による生物中心的な「マン」というジャンルの定義と、それが黒人性を人間性から排除していることの理解から始める。続いて、AIベースの技術を開発するための典型的なステップと考えられるものを概説し、これを6つの段階に分ける。すなわち、問題の特定、開発プロセスと管理ツールの選択、データセットの開発とデータ処理、モデル開発、デプロイメントとリスクアセスメント、統合と監視である。この論文の目的は、Wynterの生物中心的な「マン」が、AIライフサイクルで生み出されている技術およびライフサイクル自体によって、どのように表現され、強化されているのかをよりよく理解することである。また、黒人性を「理想的な」人間から区別することが、これらの技術による絶え間ない処罰にどのようにつながるのかを特定したい。この開発プロセスを脱構築することによって、一般に人間が優先されてこなかった点や、その影響が周縁化された人々に不均衡に及んでいる様子を特定できる可能性がある。 AI開発サイクルの変更を促進するソリューションを提供したいと思っています。

In this paper we hope to identify how the typical or standard artificial intelligence development process encourages or facilitates the creation of racialized technologies. We begin by understanding Sylvia Wynter's definition of the biocentric Man genre and its exclusion of Blackness from humanness. We follow this by outlining what we consider to be the typical steps for developing an AI-based technology, which we have broken down into 6 stages: identifying a problem, development process and management tool selection, dataset development and data processing, model development, deployment and risk assessment, and integration and monitoring. The goal of this paper is to better understand how Wynter's biocentric Man is being represented and reinforced by the technologies we are producing in the AI lifecycle and by the lifecycle itself; we hope to identify ways in which the distinction of Blackness from the "ideal" human leads to perpetual punishment at the hands of these technologies. By deconstructing this development process, we can potentially identify ways in which humans in general have not been prioritized and how those effects disproportionately affect marginalized people. We hope to offer solutions that will encourage changes in the AI development cycle.

翻訳日:2024-04-01 03:43:10 公開日:2024-02-27

# スペーサー選択によるスパースモデルの効率向上

Enhancing Efficiency in Sparse Models with Sparser Selection ( http://arxiv.org/abs/2403.18926v1 )

ライセンス: Link先を確認

Yuanhang Yang, Shiyi Qi, Wenchao Gu, Chaozheng Wang, Cuiyun Gao, Zenglin Xu,

(参考訳) Sparse Mixture-of-Experts (MoE)モデルを含むスパースモデルは、Transformerモデルをスケールするための効果的なアプローチとして現れている。しかし、多くのパラメータがゼロまたは低いアクティベーション値の乗算によって計算に不要に関わっているため、計算の非効率さに悩まされることが多い。この問題に対処するために,スパースモデルの有効性と効率性を両立させる新しいMoEである XMoE を提案する。 XMoEは小さなエキスパートとしきい値ベースのルータを活用して、トークンが必須パラメータのみを選択的にエンゲージできるようにする。言語モデリングと機械翻訳タスクに関する広範な実験により,性能を犠牲にすることなく,MoE層での計算負荷を50%以上削減し,モデル性能を向上させることができることを示した。さらに,高密度モデルに適用することで,推論時のスパース計算を可能にした。包括的な分析を行い、https://anonymous.4open.science/r/XMoEでコードを利用できるようにします。

Sparse models, including sparse Mixture-of-Experts (MoE) models, have emerged as an effective approach for scaling Transformer models. However, they often suffer from computational inefficiency since a significant number of parameters are unnecessarily involved in computations via multiplying values by zero or low activation values. To address this issue, we present XMoE, a novel MoE designed to enhance both the efficacy and efficiency of sparse MoE models. XMoE leverages small experts and a threshold-based router to enable tokens to selectively engage only essential parameters. Our extensive experiments on language modeling and machine translation tasks demonstrate that XMoE can enhance model performance while decreasing the computation load at MoE layers by over 50% without sacrificing performance. Furthermore, we present the versatility of XMoE by applying it to dense models, enabling sparse computation during inference. We provide a comprehensive analysis and make our code available at https://anonymous.4open.science/r/XMoE.

翻訳日:2024-04-01 02:25:04 公開日:2024-02-27
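One plausible reading of a threshold-based router is that each token activates experts in order of routing probability until their cumulative probability reaches a threshold, so confident tokens use fewer experts. The sketch below illustrates that reading only. The function name `threshold_route` and the default threshold are assumptions, and the abstract does not confirm that the paper's router works exactly this way.

```python
import math

def threshold_route(logits, threshold=0.9):
    """Return expert indices, most probable first, whose cumulative
    softmax probability first reaches `threshold`."""
    m = max(logits)                              # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= threshold:
            break
    return chosen
```

With this rule a token whose routing distribution is sharply peaked engages a single expert, while a token with a flat distribution engages many. Unlike fixed top-k routing, the amount of computation then varies per token.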

# OpenAPI Specification Extended Security Scheme:Broken Object Level Authorizationの頻度を下げる方法

OpenAPI Specification Extended Security Scheme: A method to reduce the prevalence of Broken Object Level Authorization ( http://arxiv.org/abs/2212.06606v2 )

ライセンス: Link先を確認

Rami Haddad, Rim El Malki,

(参考訳) APIは、サービス間通信を達成するための重要な技術になっています。 APIデプロイメントの増加により、セキュリティ標準の欠如に対処する緊急性が高まっている。 API Securityは、OpenAPI標準の標準化された認証がないため、不適切な認証は、既知の脆弱性や未知の脆弱性の可能性を開く。本稿は,API Security: Broken Object Level Authorization (BOLA) における第1の脆弱性について検討し,この脆弱性の頻度を下げるための方法とツールを提案する。 BOLAはさまざまなAPIフレームワークに影響を与えており、私たちのスコープはOpenAPI Specification(OAS)に固定されています。 OASはAPIの記述と実装の標準であり、一般的なOAS実装はFastAPI、Connexion(Flask)などである。これらの実装には、OASsのAPIプロパティに関する知識に関連する長所と短所がある。 Open API Specificationsのセキュリティプロパティは、オブジェクト認証に対処せず、そのようなオブジェクトプロパティを定義するための標準化されたアプローチを提供しない。これにより、オブジェクトレベルのセキュリティは開発者の慈悲に委ねられ、意図しない攻撃ベクタ生成のリスクが増大する。私たちの目標は、この空白に挑戦することです。 1) OAS ESS(OpenAPI Specification Extended Security Scheme)には、OAS(Design-based approach)内のオブジェクトに対する宣言型セキュリティ制御が含まれている。 2) APIサービス(Flask/FastAPI)にインポートして、オブジェクトレベルで認証チェックを実行することができる認証モジュール(開発ベースのアプローチ)。 APIサービスを構築する場合、開発者はAPI設計(仕様)またはそのコードから始めることができる。どちらの場合も、BOLAの頻度を緩和し、削減するために一連のメカニズムが導入される。

APIs have become the prominent technology of choice for achieving inter-service communications. The growth of API deployments has driven the urgency in addressing their lack of security standards. API security is a topic of concern given the absence of standardized authorization in the OpenAPI standard; improper authorization opens the possibility for known and unknown vulnerabilities, which in past years have been exploited by malicious actors, resulting in data loss. This paper examines the number one vulnerability in API security, Broken Object Level Authorization (BOLA), and proposes methods and tools to reduce the prevalence of this vulnerability. BOLA affects various API frameworks; our scope is fixed on the OpenAPI Specification (OAS). The OAS is a standard for describing and implementing APIs; popular OAS implementations include FastAPI, Connexion (Flask), and many more. These implementations carry the pros and cons that are associated with the OAS's knowledge of API properties. The OpenAPI Specification's security properties do not address object authorization and provide no standardized approach to define such object properties. This leaves object-level security at the mercy of developers, which presents an increased risk of unintentionally creating attack vectors. Our aim is to tackle this void by introducing 1) the OAS ESS (OpenAPI Specification Extended Security Scheme), which includes declarative security controls for objects in OAS (design-based approach), and 2) an authorization module that can be imported to API services (Flask/FastAPI) to enforce authorization checks at the object level (development-based approach). When building an API service, a developer can start with the API design (specification) or its code. In both cases, a set of mechanisms is introduced to help developers mitigate and reduce the prevalence of BOLA.

翻訳日:2024-03-19 08:01:36 公開日:2024-02-27
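BOLA arises when an endpoint checks that a caller is authenticated but not that the caller is entitled to the specific object requested. The following framework-free sketch shows the missing check in its simplest form. `Invoice`, `INVOICES`, `Forbidden`, and `get_invoice` are hypothetical names chosen for illustration; they are not part of the OAS ESS, FastAPI, or Flask.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    amount: float

# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(1, owner_id=10, amount=99.0),
    2: Invoice(2, owner_id=20, amount=15.5),
}

class Forbidden(Exception):
    """Raised when the caller is not entitled to the requested object."""

def get_invoice(current_user_id, invoice_id):
    invoice = INVOICES[invoice_id]
    # Object-level check: being authenticated is not enough; the caller
    # must also be entitled to this particular object.
    if invoice.owner_id != current_user_id:
        raise Forbidden(f"user {current_user_id} may not read invoice {invoice_id}")
    return invoice
```

Here user 10 can read invoice 1 but gets `Forbidden` for invoice 2, even though both requests come from an authenticated caller. Declaring such ownership rules once, in the specification or in a shared module, means a developer does not have to remember this check in every handler.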

# BarraCUDA:GPUはDNNの重量をリークする

BarraCUDA: GPUs do Leak DNN Weights ( http://arxiv.org/abs/2312.07783v2 )

ライセンス: Link先を確認

Peter Horvath, Lukasz Chmielewski, Leo Weissbart, Lejla Batina, Yuval Yarom,

(参考訳) 過去10年間で、ニューラルネットワーク(NN)の応用は、私たちの生活のさまざまな側面に広がってきました。多くの企業は、顔認識、機械翻訳、自動運転車といったタスクにニューラルネットワークを使用する製品の開発にビジネスを基盤としている。これらの製品を支える知的財産の多くは、ニューラルネットワークの正確なパラメータに符号化されている。したがって、これらの保護は企業にとって最優先事項である。同時に、これらの製品の多くは、敵対者が製品を物理的に自由に制御できるという強力な脅威モデルの下で動作する必要がある。本研究では,広く使われているNvidia Jetson Nanoデバイス上で動作するニューラルネットワークのパラメータを抽出可能な、汎用グラフィックス処理ユニット(GPU)に対する新たな攻撃であるBarraCUDAを提案する。 BarraCUDAは相関電磁分析を用いて、現実世界の畳み込みニューラルネットワークのパラメータを復元する。

Over the last decade, applications of neural networks (NNs) have spread to various aspects of our lives. A large number of companies base their businesses on building products that use neural networks for tasks such as face recognition, machine translation, and self-driving cars. Much of the intellectual property underpinning these products is encoded in the exact parameters of the neural networks. Consequently, protecting these is of utmost priority to businesses. At the same time, many of these products need to operate under a strong threat model, in which the adversary has unfettered physical control of the product. In this work, we present BarraCUDA, a novel attack on general-purpose Graphics Processing Units (GPUs) that can extract parameters of neural networks running on the popular Nvidia Jetson Nano device. BarraCUDA uses correlation electromagnetic analysis to recover parameters of real-world convolutional neural networks.

翻訳日:2024-03-18 12:26:52 公開日:2024-02-27
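Correlation-based side-channel analysis, in general, scores each guess of a secret value by how well a leakage model computed from that guess correlates with the measured traces. The sketch below shows the scoring step on a toy setting: an 8-bit weight multiplied by known inputs, with the Hamming weight of the low byte of the product as the leakage model. The leakage model, the 8-bit arithmetic, and all function names are assumptions for illustration and do not describe how BarraCUDA itself models GPU emanations.

```python
def hamming_weight(v):
    """Number of set bits in a non-negative integer."""
    return bin(v).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def cpa_scores(inputs, traces):
    """Correlate every 8-bit weight guess with the measured leakage.
    inputs: known values multiplied by the secret weight.
    traces: one leakage measurement per input."""
    scores = {}
    for guess in range(1, 256):
        model = [hamming_weight((guess * x) & 0xFF) for x in inputs]
        scores[guess] = pearson(model, traces)
    return scores
```

The guess with the highest score is the attacker's candidate for the secret weight. Real measurements are noisy and misaligned, so practical attacks need many traces and careful preprocessing before this scoring step becomes informative.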

# 楕円曲線を用いたステルスアドレスプロトコル

Elliptic Curve Pairing Stealth Address Protocols ( http://arxiv.org/abs/2312.12131v2 )

ライセンス: Link先を確認

Marija Mikic, Mihajlo Srbakoski,

(参考訳) トランザクションのプライバシ保護は、ユーザにとって非常に重要です。ステルスアドレスプロトコル(SAP)を使用すると、ユーザはステルスメタアドレスにリンクしないステルスアドレスでアセットを受け取ることができる。 SAPは様々な暗号手法を用いて生成される。 DKSAPは楕円曲線の乗算と共有秘密のハッシュを使用する。もうひとつのアプローチは、双線型マッピングを使用することだ。本稿では楕円曲線ペアリングを暗号解として用いる2つのSAプロトコルを提案する。 ECPDKSAP はペアリングベースのプロトコルであり、ECPSKSAP はペアリングベースのプロトコルであり、消費と視聴キーが導出される単一のキーを使用する。ビュータグを用いたDKSAPよりもECPDKSAPの方が優れた結果が得られることがわかった。 ECPSKSAPは非常に遅いが、1つの秘密鍵しか使わないため、興味深い理論的結果である。

The protection of transaction privacy is extremely important for users. With stealth address protocols (SAP), users can receive assets on stealth addresses that are not linked to their stealth meta-addresses. SAPs can be constructed using various cryptographic approaches. DKSAP uses elliptic curve multiplication and hashing of the resulting shared secret. Another approach is to use a bilinear mapping. The paper presents two stealth address protocols that use elliptic curve pairing as a cryptographic solution. ECPDKSAP is a pairing-based protocol that includes a viewing key and a spending key, while ECPSKSAP is a pairing-based protocol that uses a single key from which the spending and viewing keys are derived. We find that ECPDKSAP performs better than DKSAP with the view tag. ECPSKSAP is significantly slower, but it represents an interesting theoretical result, because it uses only one private key.

翻訳日:2024-03-18 11:47:54 公開日:2024-02-27
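The dual-key idea behind DKSAP (a shared secret is hashed and used to offset the recipient's spending key) can be shown with a toy sketch. To stay self-contained, it replaces elliptic curve point multiplication with modular exponentiation in the multiplicative group modulo a prime. This is an assumption made purely for illustration: the parameters are not secure, the function names are hypothetical, and the code is neither DKSAP as deployed nor either of the pairing-based protocols from the paper.

```python
import hashlib

P = 2**127 - 1   # a Mersenne prime; toy modulus, NOT a secure choice
G = 3            # toy base element

def keygen(secret):
    """Public key for a secret exponent (stands in for secret * generator)."""
    return pow(G, secret, P)

def shared_hash(shared_element):
    """Hash the Diffie-Hellman shared secret to an exponent."""
    digest = hashlib.sha256(str(shared_element).encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1)

def sender_stealth_pub(ephemeral_priv, view_pub, spend_pub):
    """Sender derives a one-time public key from the recipient's two public keys."""
    h = shared_hash(pow(view_pub, ephemeral_priv, P))
    return (spend_pub * pow(G, h, P)) % P

def recipient_stealth_priv(ephemeral_pub, view_priv, spend_priv):
    """Recipient derives the matching one-time private key."""
    h = shared_hash(pow(ephemeral_pub, view_priv, P))
    return (spend_priv + h) % (P - 1)
```

Both sides compute the same shared secret, so the private key derived by the recipient matches the public key derived by the sender. The viewing key alone is enough to recognise incoming payments, while the spending key is additionally required to spend them.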

# リキッドステーキングデリバティブ(LSD)を用いたレバレッジ・ステーキング:機会とリスク

Leverage Staking with Liquid Staking Derivatives (LSDs): Opportunities and Risks ( http://arxiv.org/abs/2401.08610v2 )

ライセンス: Link先を確認

Xihan Xiong, Zhipeng Wang, Xi Chen, William Knottenbelt, Michael Huth,

(参考訳) LidoはEthereum上の主要なLiquid Staking Derivative(LSD)プロバイダで、ユーザは任意の量のETHをステークしてstETHを受け取り、それをAaveのような分散型金融(DeFi)プロトコルと統合することができる。 Lido と Aave のコンポーザビリティにより、ユーザが Lido に ETH をステークして stETH を取得し、stETH を Aave の担保として利用して ETH を借り入れ、借り入れた ETH を Lido に再ステークする、"leverage staking" と呼ばれる新しい戦略が実現される。ユーザは、リスクプロファイルに基づいて、このプロセスを反復的に実行して、潜在的なリターンを最適化することができる。本稿では,レバレッジ・ステーキングに関連する機会とリスクを体系的に研究する。私たちは、Lido-Aaveエコシステム内のレバレッジ・ステーキング戦略を初めて形式化した。実証研究により、Ethereum上の262のレバレッジ・ステーキングのポジションが同定され、ステーク総額は295,243 ETH (482M USD) であった。レバレッジ・ステーキングのポジションの90.13%が従来のステーキングよりも高いリターンを達成したことが判明した。さらに,極端な条件下でレバレッジ・ステーキングがもたらすリスクを評価するため,ストレステストを実施している。我々はレバレッジ・ステーキングが連鎖的な清算のリスクを著しく増幅することを発見した。本稿が,Lido-Aave LSDエコシステムを保護するための堅牢なリスク管理手法の開発を促進することを願っている。

Lido, the leading Liquid Staking Derivative (LSD) provider on Ethereum, allows users to stake an arbitrary amount of ETH to receive stETH, which can be integrated with Decentralized Finance (DeFi) protocols such as Aave. The composability between Lido and Aave enables a novel strategy called "leverage staking", where users stake ETH on Lido to acquire stETH, utilize stETH as collateral on Aave to borrow ETH, and then restake the borrowed ETH on Lido. Users can iteratively execute this process to optimize potential returns based on their risk profile. This paper systematically studies the opportunities and risks associated with leverage staking. We are the first to formalize the leverage staking strategy within the Lido-Aave ecosystem. Our empirical study identifies 262 leverage staking positions on Ethereum, with an aggregated staking amount of 295,243 ETH (482M USD). We discover that 90.13% of leverage staking positions have achieved higher returns than conventional staking. Furthermore, we perform stress tests to evaluate the risk introduced by leverage staking under extreme conditions. We find that leverage staking significantly amplifies the risk of cascading liquidations. We hope this paper can inform and encourage the development of robust risk management approaches to protect the Lido-Aave LSD ecosystem.

翻訳日:2024-03-18 08:46:40 公開日:2024-02-27
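The arithmetic of the stake, borrow, and restake loop can be made concrete with a short sketch. It rests on simplifying assumptions that are not taken from the paper: stETH and ETH trade 1:1, the borrower always draws the full loan-to-value ratio `ltv`, and both rates are constant. Function names and the numbers in the usage note are hypothetical.

```python
def leverage_position(initial_eth, ltv, loops):
    """Total ETH staked and borrowed after `loops` borrow-and-restake rounds."""
    staked, borrowed, tranche = initial_eth, 0.0, initial_eth
    for _ in range(loops):
        tranche *= ltv        # borrow against the stETH just received
        borrowed += tranche
        staked += tranche     # restake the borrowed ETH
    return staked, borrowed

def net_apr(initial_eth, ltv, loops, stake_apr, borrow_apr):
    """Staking yield minus borrowing cost, relative to the initial capital."""
    staked, borrowed = leverage_position(initial_eth, ltv, loops)
    return (staked * stake_apr - borrowed * borrow_apr) / initial_eth
```

With 100 ETH, `ltv=0.5`, and two loops, the position holds 175 ETH staked against 75 ETH borrowed. At an illustrative 4% staking rate and 2% borrowing rate, this gives a net return of 5.5% on the initial capital, compared with 4% unleveraged. The staked total approaches `initial_eth / (1 - ltv)` as the number of loops grows. The same multiplier enlarges losses when stETH falls against ETH, which is the liquidation risk the abstract highlights.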

# スマートグリッド公開鍵基盤のための証明書失効リスト付きハイブリッド・オンライン証明書ステータスプロトコル

Hybrid Online Certificate Status Protocol with Certificate Revocation List for Smart Grid Public Key Infrastructure ( http://arxiv.org/abs/2401.10787v4 )

ライセンス: Link先を確認

Hong-Sheng Huang, Zhe-Yi Jiang, Hsuan-Tung Chen, Hung-Min Sun,

(参考訳) Hsu et al (2022)は、スマートグリッドメーターのセキュリティを強化するために、公開鍵インフラストラクチャ内の暗号スキームを提案した。彼らの提案には、シンプルな認証登録プロトコルを確立するためのCMSメカニズムによる認証管理とセキュアトランスポートプロトコルによる登録の開発が含まれていた。さらに彼らは、証明書のステータスを独立してクエリするために、OCSP(Online Certificate Status Protocol)サービスを実装した。しかし、その実装は単一のOCSPサーバで全てのクエリ要求を処理する。数万以上のエンドメーターを持つスマートグリッドPKI環境における典型的なシナリオを考慮すると、ハイブリッドオンライン認証ステータスプロトコル機構を導入しました。このアプローチは、クライアントからCertificate Revocation Listsと連携したOCSPサーバへのクエリリソースの需要を減少させる。我々のシミュレーションでは、メーターの挙動を模倣して効率を向上し、スマートグリッドメーターのランドスケープに合わせてより堅牢なアーキテクチャを構築しました。

Hsu et al. (2022) proposed a cryptographic scheme within the public key infrastructure to bolster the security of smart grid meters. Their proposal involved developing the Certificate Management over CMS mechanism to establish Simple Certificate Enrollment Protocol and Enrollment over Secure Transport protocol. Additionally, they implemented Online Certificate Status Protocol (OCSP) services to independently query the status of certificates. However, their implementation featured a single OCSP server handling all query requests. Considering that a typical smart grid PKI environment has tens of thousands of end-meters or more, we introduce a Hybrid Online Certificate Status Protocol mechanism. By having OCSP servers work together with Certificate Revocation Lists, this approach reduces the query load that clients place on the OCSP servers. Our simulations, mimicking meter behavior, demonstrated increased efficiency, creating a more robust architecture tailored to the smart grid meter landscape.

翻訳日:2024-03-18 08:36:55 公開日:2024-02-27
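The abstract does not spell out how OCSP and CRLs are combined. The sketch below shows one possible arrangement, offered only as an assumption: a client answers from a locally cached CRL while that CRL is still fresh, and falls back to an OCSP query once it has expired. The class name, its parameters, and the `ocsp_query` callable are hypothetical and do not correspond to any real PKI library.

```python
class HybridRevocationChecker:
    """Answer from a cached CRL while it is fresh; otherwise ask OCSP."""

    def __init__(self, crl_serials, crl_next_update, ocsp_query):
        self.crl_serials = set(crl_serials)     # revoked serials in the cached CRL
        self.crl_next_update = crl_next_update  # time after which the CRL is stale
        self.ocsp_query = ocsp_query            # callable: serial -> "good" / "revoked"
        self.ocsp_calls = 0

    def status(self, serial, now):
        if now < self.crl_next_update:
            # Fresh CRL: answer locally, with no request to the OCSP server.
            return "revoked" if serial in self.crl_serials else "good"
        # Stale CRL: fall back to a live OCSP query.
        self.ocsp_calls += 1
        return self.ocsp_query(serial)
```

In this arrangement the OCSP server only sees requests from clients whose CRL has expired, so its load depends on the CRL validity period rather than on the total number of status checks. The trade-off is that a revocation issued after the CRL was published goes unnoticed until the cached list expires.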

# ピオニアリング研究とイノベーティブ情報理論に基づくフィッシング検出における透明性向上手法

A Pioneering Study and An Innovative Information Theory-based Approach to Enhance The Transparency in Phishing Detection ( http://arxiv.org/abs/2402.17092v1 )

ライセンス: Link先を確認

Van Nguyen, Tingmin Wu, Xingliang Yuan, Marthie Grobler, Surya Nepal, Carsten Rudolph,

(参考訳) フィッシング攻撃は、検出、説明、防衛において深刻で困難な問題となっている。フィッシングに関する10年以上の研究が、技術と非技術の両方を包含しているにもかかわらず、フィッシングは深刻な問題であり続けている。現在、AIベースのフィッシング検出は、データに対する脆弱性(フィッシングか良性か)の予測を提供することによってフィッシング攻撃を防御する最も効果的なソリューションの1つとして注目されている。しかし、データがフィッシングとして分類される原因となる特定の情報を特定するなど、予測に対する包括的な解釈を提供するという点では、説明可能性に欠ける。この目的のために,メール(最も一般的なフィッシング手段)のフィッシング攻撃ローカライゼーションのための革新的なディープラーニングベースのアプローチを提案する。本手法は,メールデータの脆弱性を予測できるだけでなく,各フィッシングメールにおける最も重要なフィッシング関連情報(文)を自動的に特定し,ハイライトする。選択された情報は、フィッシングメールデータの脆弱性に関する有用な説明を示す。 7つの実世界の電子メールデータセットに対する厳密な実験により,提案手法が,フィッシングメール中の最も重要なフィッシング関連情報を正しく特定することで,フィッシングデータの脆弱性に対する包括的な説明を与えるうえで有効かつ先進的であることが示された。最先端のベースラインと比較して,2つの主要な指標であるLabel-AccuracyとCognitive-True-Positiveにおいて,それぞれ約1%から3%,1%から4%高い性能を示した。

Phishing attacks have become a serious and challenging issue for detection, explanation, and defense. Despite more than a decade of research on phishing, encompassing both technical and non-technical remedies, phishing continues to be a serious problem. Nowadays, AI-based phishing detection stands out as one of the most effective solutions for defending against phishing attacks by providing vulnerability (i.e., phishing or benign) predictions for the data. However, it lacks explainability in terms of providing comprehensive interpretations for the predictions, such as identifying the specific information that causes the data to be classified as phishing. To this end, we propose an innovative deep learning-based approach for email (the most common phishing vector) phishing attack localization. Our method can not only predict the vulnerability of the email data but also automatically identify and highlight the most important and phishing-relevant information (i.e., sentences) in each phishing email. The selected information provides useful explanations for the vulnerability of the phishing email data. Rigorous experiments on seven real-world email datasets show the effectiveness and advancement of our proposed method in providing comprehensive explanations (by successfully identifying the most important and phishing-relevant information in phishing emails) for the vulnerability of the corresponding phishing data, with improvements of roughly 1% to 3% and 1% to 4% on the two main measures, Label-Accuracy and Cognitive-True-Positive, respectively, compared to state-of-the-art baselines.

Translated: 2024-03-18 07:09:00 Published: 2024-02-27

# Blockchain for Finance: A Survey ( http://arxiv.org/abs/2402.17219v1 )

License: see the linked page

Hanjie Wu, Qian Yao, Zhenguang Liu, Butian Huang, Yuan Zhuang, Huayun Tang, Erwu Liu,

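(Reference translation) As an innovative technology for enhancing authenticity, security, and risk management, blockchain is being widely adopted in trade and finance systems. Its unique capabilities, such as immutability and transparency, enable new business models for distributed data storage, point-to-point transactions, and decentralized autonomous organizations. This paper focuses on blockchain-based securities trading, where blockchain plays a vital role in financial services because consensus-based verification raises trust and removes the need for third-party verification. We investigate the 12 most popular blockchain platforms and examine in detail the 6 that are related to finance, aiming to provide a panorama of securities trading practices. The survey also gives a comprehensive summary of blockchain-based securities trading applications: we gather numerous practical applications and classify them into four categories. For each category, we introduce a typical example and explain how blockchain helps solve the key problems faced by FinTech companies and researchers. Finally, we offer observations ranging from mainstream blockchain-based financial institutions to the security issues of decentralized finance applications, in order to picture the current blockchain ecosystem in finance.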

As an innovative technology for enhancing authenticity, security, and risk management, blockchain is being widely adopted in trade and finance systems. The unique capabilities of blockchain, such as immutability and transparency, enable new business models of distributed data storage, point-to-point transactions, and decentralized autonomous organizations. In this paper, we focus on blockchain-based securities trading, in which blockchain technology plays a vital role in financial services as it ultimately lifts trust and frees the need for third-party verification by using consensus-based verification. We investigate the 12 most popular blockchain platforms and elaborate on 6 platforms that are related to finance, seeking to provide a panorama of securities trading practices. Meanwhile, this survey provides a comprehensive summary of blockchain-based securities trading applications. We gather numerous practical applications of blockchain-based securities trading and categorize them into four distinct categories. For each category, we introduce a typical example and explain how blockchain contributes to solving the key problems faced by FinTech companies and researchers. Finally, we provide interesting observations ranging from mainstream blockchain-based financial institutions to security issues of decentralized finance applications, aiming to picture the current blockchain ecosystem in finance.

Translated: 2024-03-18 07:09:00 Published: 2024-02-27

# Time-Restricted Double-Spending Attack on PoW-based Blockchains ( http://arxiv.org/abs/2402.17223v1 )

License: see the linked page

Yiming Jiang, Jiangfan Zhang,

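(Reference translation) Many blockchain applications are built around tasks that naturally have finite durations, so a double-spending attack (DSA) on such applications tends to be carried out within a finite timeframe, specifically before the tasks complete. Existing research also suggests that practical attackers prefer to execute a DSA within a finite timeframe because their computational resources are limited. These observations motivate this paper's study of a time-restricted DSA (TR-DSA) model on Proof-of-Work based blockchains. In the TR-DSA model, the attacker mines its branch only within a finite timeframe, and the attack is considered unsuccessful if the attacker's branch has not surpassed the honest miners' branch by the time the honest branch has grown by a specific number of blocks. We first derive a general closed-form expression for the success probability of a TR-DSA. This probability helps evaluate the risk of a DSA on blockchain applications with time-bounded tasks, and it also lets practical attackers with limited computational resources assess the feasibility and expected reward of launching a TR-DSA. We further provide a rigorous proof that the success probability of a TR-DSA is no greater than that of a time-unrestricted DSA, in which the attacker mines its branch indefinitely. This result implies that blockchain applications with time-bounded tasks are less vulnerable to DSAs than applications that give attackers an unlimited timeframe. Finally, we show that the success probability of a TR-DSA is always smaller than one even when the attacker controls more than half of the network's hash rate. This result warns attackers that launching a TR-DSA still carries a risk of failure even if they amass a majority of the hash rate.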

Numerous blockchain applications are designed with tasks that naturally have finite durations, and hence, a double-spending attack (DSA) on such blockchain applications leans towards being conducted within a finite timeframe, specifically before the completion of their tasks. Furthermore, existing research suggests that practical attackers typically favor executing a DSA within a finite timeframe due to their limited computational resources. These observations serve as the impetus for this paper to investigate a time-restricted DSA (TR-DSA) model on Proof-of-Work based blockchains. In this TR-DSA model, an attacker only mines its branch within a finite timeframe, and the TR-DSA is considered unsuccessful if the attacker's branch fails to surpass the honest miners' branch when the honest miners' branch has grown by a specific number of blocks. First, we developed a general closed-form expression for the success probability of a TR-DSA. This developed probability not only can assist in evaluating the risk of a DSA on blockchain applications with timely tasks, but also can enable practical attackers with limited computational resources to assess the feasibility and expected reward of launching a TR-DSA. In addition, we provide rigorous proof that the success probability of a TR-DSA is no greater than that of a time-unrestricted DSA where the attacker indefinitely mines its branch. This result implies that blockchain applications with timely tasks are less vulnerable to DSAs than blockchain applications that provide attackers with an unlimited timeframe for their attacks. Furthermore, we show that the success probability of a TR-DSA is always smaller than one even though the attacker controls more than half of the hash rate in the network. This result alerts attackers that there is still a risk of failure in launching a TR-DSA even if they amass a majority of the hash rate in the network.
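
The race described in the abstract can be illustrated with a small exact computation. The sketch below is not the paper's closed-form expression; it is a simplified model under stated assumptions: each new block belongs to the attacker with probability `q` (its hash-rate share) and to the honest miners otherwise, the attacker must be strictly ahead once the honest branch holds at least `z` blocks (a hypothetical confirmation count), and the attack fails as soon as the honest branch has grown by `n_limit` blocks. The names `q`, `z`, `n_limit`, and `tr_dsa_success_probability` are illustrative and do not come from the paper.

```python
from functools import lru_cache


def tr_dsa_success_probability(q: float, z: int, n_limit: int) -> float:
    """Exact success probability of a time-restricted race (simplified model).

    q        -- attacker's share of the hash rate, 0 < q < 1 (assumption)
    z        -- honest blocks required before the attacker can win (assumption)
    n_limit  -- honest-branch growth at which the attack is declared failed
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if not 0 <= z <= n_limit:
        raise ValueError("require 0 <= z <= n_limit")

    @lru_cache(maxsize=None)
    def win(a: int, h: int) -> float:
        # a = attacker blocks, h = honest blocks since the fork point.
        if h >= z and a > h:
            return 1.0  # attacker's branch is longer after z confirmations
        if h >= n_limit:
            return 0.0  # time window closed without the attacker ahead
        if a > n_limit:
            return 1.0  # attacker cannot be overtaken before the window closes
        # Next block is the attacker's with probability q, honest otherwise.
        return q * win(a + 1, h) + (1.0 - q) * win(a, h + 1)

    return win(0, 0)


if __name__ == "__main__":
    # A longer window never lowers the success probability in this model.
    for n in (2, 5, 10, 20):
        print(n, tr_dsa_success_probability(0.3, 2, n))
    # Even a majority attacker can fail inside a finite window.
    print(tr_dsa_success_probability(0.6, 2, 10))
```

In this toy model the probability is non-decreasing in `n_limit`, which matches the abstract's claim that the time-restricted attack is no more likely to succeed than the unrestricted one, and it stays below one for any `q < 1` because the honest miners can win `n_limit` blocks in a row with positive probability.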

Translated: 2024-03-18 07:09:00 Published: 2024-02-27

# Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities ( http://arxiv.org/abs/2402.17230v1 )

License: see the linked page

Yu Nong, Mohammed Aldeen, Long Cheng, Hongxin Hu, Feng Chen, Haipeng Cai,

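(Reference translation) Security vulnerabilities are increasingly prevalent in modern software and have wide consequences for society. Various approaches to defending against them have been proposed; among these, approaches based on deep learning (DL) avoid major barriers faced by other techniques and have therefore attracted more attention in recent years. However, DL-based approaches face critical challenges, including the lack of sizable, well-labeled task-specific datasets and an inability to generalize well to unseen, real-world scenarios. Recently, large language models (LLMs) have shown impressive potential in various domains by overcoming those challenges, especially through chain-of-thought (CoT) prompting. In this paper, we explore how to leverage LLMs and CoT for three key software vulnerability analysis tasks: identifying vulnerabilities of a given type, discovering vulnerabilities of any type, and patching detected vulnerabilities. We instantiate the general CoT methodology for these tasks through VSP, our unified, vulnerability-semantics-guided prompting approach, and conduct extensive experiments comparing VSP with five baselines on the three tasks across three LLMs and two datasets. The results show that our CoT-inspired prompting is substantially superior to the baselines (553.3%, 36.5%, and 30.8% higher F1 accuracy for vulnerability identification, discovery, and patching, respectively, on CVE datasets). Through in-depth case studies of VSP failures, we also reveal current gaps in LLM/CoT on challenging vulnerability cases, and we propose and validate corresponding improvements.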

Security vulnerabilities are increasingly prevalent in modern software and they are widely consequential to our society. Various approaches to defending against these vulnerabilities have been proposed, among which those leveraging deep learning (DL) avoid major barriers with other techniques hence attracting more attention in recent years. However, DL-based approaches face critical challenges including the lack of sizable and quality-labeled task-specific datasets and their inability to generalize well to unseen, real-world scenarios. Lately, large language models (LLMs) have demonstrated impressive potential in various domains by overcoming those challenges, especially through chain-of-thought (CoT) prompting. In this paper, we explore how to leverage LLMs and CoT to address three key software vulnerability analysis tasks: identifying a given type of vulnerabilities, discovering vulnerabilities of any type, and patching detected vulnerabilities. We instantiate the general CoT methodology in the context of these tasks through VSP, our unified, vulnerability-semantics-guided prompting approach, and conduct extensive experiments assessing VSP versus five baselines for the three tasks against three LLMs and two datasets. Results show substantial superiority of our CoT-inspired prompting (553.3%, 36.5%, and 30.8% higher F1 accuracy for vulnerability identification, discovery, and patching, respectively, on CVE datasets) over the baselines. Through in-depth case studies analyzing VSP failures, we also reveal current gaps in LLM/CoT for challenging vulnerability cases, while proposing and validating respective improvements.
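
As a rough illustration of what few-shot chain-of-thought prompting for the identification task might look like, the sketch below assembles a prompt from worked examples followed by the code under analysis. It is a generic template and not the VSP prompt from the paper; the function name `build_cot_prompt`, the wording of the instructions, and the example snippet are all hypothetical.

```python
def build_cot_prompt(code: str, vuln_type: str,
                     examples: list[tuple[str, str]]) -> str:
    """Assemble a few-shot chain-of-thought prompt (hypothetical template).

    code      -- source snippet the model should analyze
    vuln_type -- vulnerability class to look for, e.g. "buffer overflow"
    examples  -- (snippet, step-by-step reasoning) pairs shown before the query
    """
    parts = [
        f"Task: decide whether the code below contains a {vuln_type} vulnerability."
    ]
    # Each worked example pairs a snippet with explicit reasoning, which is
    # what distinguishes chain-of-thought prompts from plain few-shot prompts.
    for i, (sample, reasoning) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\n{sample}\nReasoning: {reasoning}")
    parts.append(f"Code under analysis:\n{code}")
    parts.append(
        "Reasoning: let's think step by step, "
        "then finish with 'Answer: yes' or 'Answer: no'."
    )
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = build_cot_prompt(
        "char buf[8]; strcpy(buf, s);",
        "buffer overflow",
        [("gets(buf);",
          "gets does not bound the copy, so buf can overflow. Answer: yes")],
    )
    print(demo)
```

The returned string would then be sent to whichever LLM is being evaluated; the model call itself is left out because the abstract does not specify an API.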