Fugu-MT: arXiv paper translations

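This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.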

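The operator of this site makes no guarantee as to the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from the use of this site (including all information and translations).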

Papers with a publication date of 20230723.

Title	Authors	Abstract	Publication date / translation date
# Data Architecture for Digital Object Space Management Service (DOSM) using DAT ( http://arxiv.org/abs/2306.12909v3 ) License: see the linked page	Moamin Abughazala, Henry Muccini	Internet of Things (IoT) data and social media data are two of the fastest-growing data segments. Having high-quality data is crucial for making informed business decisions. The strategic process of leveraging insights from data is known as data-driven decision-making. To achieve this, it is necessary to collect, store, analyze, and protect data in the best ways possible. Data architecture is a complex task that involves describing the flow of data from its source to its destination and creating a blueprint for managing the data to meet business needs for information. In this paper, we utilize the Data Architecture Tool (DAT) to model data for Digital Space Management Service, which was developed as part of the VASARI project. This work focuses on describing the movement of data, data formats, data location, data processing (batch or real-time), data storage technologies, and main operations on the data.	Translated: 2023-10-23 19:06:42 Published: 2023-07-23
# Modeling Data Analytics Architecture for Smart Cities Data-Driven Applications using DAT ( http://arxiv.org/abs/2307.08870v2 ) License: see the linked page	Moamin Abughazala, Henry Muccini	Extracting valuable insights from vast amounts of information is a critical process that involves acquiring, storing, managing, analyzing, and visualizing data. Providing an abstract overview of data analytics applications is crucial to ensure that collected data is transformed into meaningful information. One effective way of achieving this objective is through Data Architecture. This article shares our experiences in developing a Data Analytics Architecture (DAA) using model-driven engineering for Data-Driven Smart Cities applications utilizing DAT.	Translated: 2023-10-23 17:11:13 Published: 2023-07-23
# Challenges in aligning requirements engineering and verification in a large-scale industrial context ( http://arxiv.org/abs/2307.12419v1 ) License: see the linked page	Giedre Sabaliauskaite, Annabella Loconsole, Emelie Engström, Michael Unterkalmsteiner, Björn Regnell, Per Runeson, Tony Gorschek, Robert Feldt	[Context and motivation] When developing software, coordination between different organizational units is essential in order to develop a good quality product, on time and within budget. Particularly, the synchronization between requirements and verification processes is crucial in order to assure that the developed software product satisfies customer requirements. [Question/problem] Our research question is: what are the current challenges in aligning the requirements and verification processes? [Principal ideas/results] We conducted an interview study at a large software development company. This paper presents preliminary findings of these interviews that identify key challenges in aligning requirements and verification processes. [Contribution] The result of this study includes a range of challenges faced by the studied organization grouped into the categories: organization and processes, people, tools, requirements process, testing process, change management, traceability, and measurement. The findings of this study can be used by practitioners as a basis for investigating alignment in their organizations, and by scientists in developing approaches for more efficient and effective management of the alignment between requirements and verification.	Translated: 2023-10-23 16:30:27 Published: 2023-07-23
# Leveraging Deep Learning and Online Source Sentiment for Financial Portfolio Management ( http://arxiv.org/abs/2309.16679v1 ) License: see the linked page	Paraskevi Nousi, Loukia Avramelou, Georgios Rodinos, Maria Tzelepi, Theodoros Manousis, Konstantinos Tsampazis, Kyriakos Stefanidis, Dimitris Spanos, Emmanouil Kirtas, Pavlos Tosidis, Avraam Tsantekidis, Nikolaos Passalis and Anastasios Tefas	Financial portfolio management describes the task of distributing funds and conducting trading operations on a set of financial assets, such as stocks, index funds, foreign exchange or cryptocurrencies, aiming to maximize the profit while minimizing the loss incurred by said operations. Deep Learning (DL) methods have been consistently excelling at various tasks, and automated financial trading is one of the most complex of those. This paper aims to provide insight into various DL methods for financial trading, under both the supervised and reinforcement learning schemes. At the same time, taking into consideration sentiment information regarding the traded assets, we discuss and demonstrate their usefulness through corresponding research studies. Finally, we discuss commonly found problems in training such financial agents and equip the reader with the necessary knowledge to avoid these problems and apply the discussed methods in practice.	Translated: 2023-10-23 05:56:47 Published: 2023-07-23
# Mental Workload Estimation with Electroencephalogram Signals by Combining Multi-Space Deep Models ( http://arxiv.org/abs/2308.02409v1 ) License: see the linked page	Hong-Hai Nguyen, Ngumimi Karen Iyortsuun, Hyung-Jeong Yang, Guee-Sang Lee, and Soo-Hyung Kim	The human brain is in a continuous state of activity during both work and rest. Mental activity is a daily process, and when the brain is overworked, it can have negative effects on human health. In recent years, great attention has been paid to early detection of mental health problems because it can help prevent serious health problems and improve quality of life. Several signals are used to assess mental state, but the electroencephalogram (EEG) is widely used by researchers because of the large amount of information it provides about the brain. This paper aims to classify mental workload into three states and estimate continuum levels. Our method combines multiple dimensions of space to achieve the best results for mental estimation. In the time domain approach, we use Temporal Convolutional Networks, and in the frequency domain, we propose a new architecture called the Multi-Dimensional Residual Block, which combines residual blocks.	Translated: 2023-08-14 01:49:17 Published: 2023-07-23
# Implementing Smart Contracts: The case of NFT-rental with pay-per-like ( http://arxiv.org/abs/2308.02424v1 ) License: see the linked page	Alfred Sopi, Johannes Schneider, Jan vom Brocke	Non-fungible tokens (NFTs) are on the rise. They can represent artworks exhibited for marketing purposes on webpages of companies or online stores, analogously to physical artworks. Lending of NFTs is an attractive form of passive income for owners but comes with risks (e.g., items are not returned) and costs for escrow agents. Similarly, renters have difficulties in anticipating the impact of artworks, e.g., how spectators of NFTs perceive them. To address these challenges, we introduce an NFT rental solution based on a pay-per-like pricing model using blockchain technology, i.e., smart contracts based on the Ethereum chain. We find that blockchain solutions enjoy many advantages also reported for other applications, but interestingly, we also observe dark sides of (large) blockchain fees. Blockchain solutions appear unfair to niche artists and potentially hamper cultural diversity. Furthermore, a trust-cost tradeoff arises to handle fraud caused by manipulation from parties outside the blockchain. All code for the solution is publicly available at: https://github.com/asopi/rental-project	Translated: 2023-08-14 01:39:18 Published: 2023-07-23
# The identification of garbage dumps in the rural areas of Cyprus through the application of deep learning to satellite imagery ( http://arxiv.org/abs/2308.02502v1 ) License: see the linked page	Andrew Keith Wilkinson	Garbage disposal is a challenging problem throughout the developed world. In Cyprus, as elsewhere, illegal "fly-tipping" is a significant issue, especially in rural areas where few legal garbage disposal options exist. However, there is a lack of studies that attempt to measure the scale of this problem, and few resources available to address it. A method of automating the process of identifying garbage dumps would help counter this and provide information to the relevant authorities. The aim of this study was to investigate the degree to which artificial intelligence techniques, together with satellite imagery, can be used to identify illegal garbage dumps in the rural areas of Cyprus. This involved collecting a novel dataset of images that could be categorised as either containing, or not containing, garbage. The collection of such datasets in sufficient raw quantities is time consuming and costly. Therefore, a relatively modest baseline set of images was collected, then data augmentation techniques were used to increase the size of this dataset to a point where useful machine learning could occur. From this set of images an artificial neural network was trained to recognise the presence or absence of garbage in new images. A type of neural network especially suited to this task known as "convolutional neural networks" was used. The efficacy of the resulting model was evaluated using an independently collected dataset of test images. The result was a deep learning model that could correctly identify images containing garbage in approximately 90% of cases. It is envisaged that this model could form the basis of a future system that could systematically analyse the entire landscape of Cyprus to build a comprehensive "garbage" map of the island.	Translated: 2023-08-14 01:29:13 Published: 2023-07-23
# Assessing Intra-class Diversity and Quality of Synthetically Generated Images in a Biomedical and Non-biomedical Setting ( http://arxiv.org/abs/2308.02505v1 ) License: see the linked page	Muhammad Muneeb Saad, Mubashir Husain Rehmani, and Ruairi O'Reilly	In biomedical image analysis, data imbalance is common across several imaging modalities. Data augmentation is one of the key solutions in addressing this limitation. Generative Adversarial Networks (GANs) are increasingly being relied upon for data augmentation tasks. Biomedical image features are sensitive to evaluating the efficacy of synthetic images. These features can have a significant impact on metric scores when evaluating synthetic images across different biomedical imaging modalities. Synthetically generated images can be evaluated by comparing the diversity and quality of real images. Multi-scale Structural Similarity Index Measure and Cosine Distance are used to evaluate intra-class diversity, while Frechet Inception Distance is used to evaluate the quality of synthetic images. Assessing these metrics for biomedical and non-biomedical imaging is important to investigate an informed strategy in evaluating the diversity and quality of synthetic images. In this work, an empirical assessment of these metrics is conducted for the Deep Convolutional GAN in a biomedical and non-biomedical setting. The diversity and quality of synthetic images are evaluated using different sample sizes. This research intends to investigate the variance in diversity and quality across biomedical and non-biomedical imaging modalities. Results demonstrate that the metric scores for diversity and quality vary significantly across biomedical-to-biomedical and biomedical-to-non-biomedical imaging modalities.	Translated: 2023-08-14 01:18:22 Published: 2023-07-23
# MyVoice: Arabic Speech Resource Collaboration Platform ( http://arxiv.org/abs/2308.02503v1 ) License: see the linked page	Yousseif Elshahawy, Yassine El Kheir, Shammur Absar Chowdhury, and Ahmed Ali	We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. This platform offers an opportunity to design large dialectal speech datasets and makes them publicly available. MyVoice allows contributors to select city/country-level fine-grained dialect and record the displayed utterances. Users can switch roles between contributors and annotators. The platform incorporates a quality assurance system that filters out low-quality and spurious recordings before sending them for validation. During the validation phase, contributors can assess the quality of recordings, annotate them, and provide feedback which is then reviewed by administrators. Furthermore, the platform offers flexibility to admin roles to add new data or tasks beyond dialectal speech and word collection, which are displayed to contributors. This enables collaborative efforts in gathering diverse and large Arabic speech data.	Translated: 2023-08-14 01:17:58 Published: 2023-07-23
# AMaizeD: An End to End Pipeline for Automatic Maize Disease Detection ( http://arxiv.org/abs/2308.03766v1 ) License: see the linked page	Anish Mall, Sanchit Kabra, Ankur Lhila and Pawan Ajmera	This research paper presents AMaizeD: An End to End Pipeline for Automatic Maize Disease Detection, an automated framework for early detection of diseases in maize crops using multispectral imagery obtained from drones. A custom hand-collected dataset focusing specifically on maize crops was meticulously gathered by expert researchers and agronomists. The dataset encompasses a diverse range of maize varieties, cultivation practices, and environmental conditions, capturing various stages of maize growth and disease progression. By leveraging multispectral imagery, the framework benefits from improved spectral resolution and increased sensitivity to subtle changes in plant health. The proposed framework employs a combination of convolutional neural networks (CNNs) as feature extractors and segmentation techniques to identify both the maize plants and their associated diseases. Experimental results demonstrate the effectiveness of the framework in detecting a range of maize diseases, including powdery mildew, anthracnose, and leaf blight. The framework achieves state-of-the-art performance on the custom hand-collected dataset and contributes to the field of automated disease detection in agriculture, offering a practical solution for early identification of diseases in maize crops using advanced machine learning techniques and deep learning architectures.	Translated: 2023-08-14 00:39:57 Published: 2023-07-23
# Deployment of Leader-Follower Automated Vehicle Systems for Smart Work Zone Applications with a Queuing-based Traffic Assignment Approach ( http://arxiv.org/abs/2308.03764v1 ) License: see the linked page	Qing Tang, Xianbiao Hu	The emerging technology of the Autonomous Truck Mounted Attenuator (ATMA), a leader-follower style vehicle system, utilizes connected and automated vehicle capabilities to enhance safety during transportation infrastructure maintenance in work zones. However, the speed difference between ATMA vehicles and general vehicles creates a moving bottleneck that reduces capacity and increases queue length, resulting in additional delays. The different routes taken by ATMA cause diverse patterns of time-varying capacity drops, which may affect the user equilibrium traffic assignment and lead to different system costs. This manuscript focuses on optimizing the routing for ATMA vehicles in a network to minimize the system cost associated with the slow-moving operation. To achieve this, a queuing-based traffic assignment approach is proposed to identify the system cost caused by the ATMA system. A queuing-based time-dependent (QBTD) travel time function, considering capacity drop, is introduced and applied in the static user equilibrium traffic assignment problem, which adds dynamic characteristics. Subsequently, we formulate the queuing-based traffic assignment problem and solve it using a modified path-based algorithm. The methodology is validated using a small-size and a large-size network and compared with two benchmark models to analyze the benefit of capacity drop modeling and the QBTD travel time function. Furthermore, the approach is applied to quantify the impact of different routes on the traffic system and identify an optimal route for ATMA vehicles performing maintenance work. Finally, sensitivity analysis is conducted to explore how the impact changes with variations in traffic demand and capacity reduction.	Translated: 2023-08-14 00:39:36 Published: 2023-07-23
# Nature and the Machines ( http://arxiv.org/abs/2308.04440v1 ) License: see the linked page	Huw Price and Matthew Connolly	Does artificial intelligence (AI) pose existential risks to humanity? Some critics feel this question is getting too much attention, and want to push it aside in favour of conversations about the immediate risks of AI. These critics now include the journal Nature, where a recent editorial urges us to 'stop talking about tomorrow's AI doomsday when AI poses risks today.' We argue that this is a serious failure of judgement, on Nature's part. In science, as in everyday life, we expect influential actors to consider the consequences of error. As the world's leading scientific journal, Nature is certainly an influential actor, especially so in the absence of robust global regulation of AI. Yet it has manifestly failed to consider the cost of error in this case.	Translated: 2023-08-14 00:29:06 Published: 2023-07-23
# Multi-Stage Cable Routing through Hierarchical Imitation Learning ( http://arxiv.org/abs/2307.08927v3 ) License: see the linked page	Jianlan Luo, Charles Xu, Xinyang Geng, Gilbert Feng, Kuan Fang, Liam Tan, Stefan Schaal, Sergey Levine	We study the problem of learning to perform multi-stage robotic manipulation tasks, with applications to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and handling extended behaviors consisting of multiple steps that must be executed successfully to complete the entire task. In such settings, learning individual primitives for each stage that succeed with a high enough rate to perform a complete temporally extended task is impractical: if each stage must be completed successfully and has a non-negligible probability of failure, the likelihood of successful completion of the entire task becomes negligible. Therefore, successful controllers for such multi-stage tasks must be able to recover from failure and compensate for imperfections in low-level controllers by smartly choosing which controllers to trigger at any given time, retrying, or taking corrective action as needed. To this end, we describe an imitation learning system that uses vision-based policies trained from demonstrations at both the lower (motor control) and the upper (sequencing) level, present a system for instantiating this method to learn the cable routing task, and perform evaluations showing great performance in generalizing to very challenging clip placement variations. Supplementary videos, datasets, and code can be found at https://sites.google.com/view/cablerouting.	Translated: 2023-08-06 11:36:53 Published: 2023-07-23
# Traffic Flow Simulation for Autonomous Driving ( http://arxiv.org/abs/2307.16762v1 ) License: see the linked page	Junfeng Li, Changqing Yan	A traffic system is a random and complex large system, in which it is difficult to conduct repeated modelling and control research in a real traffic environment. With the development of automatic driving technology, the requirements for testing and evaluating automatic driving technology are getting higher and higher, so the application of computer technology for traffic simulation has become a very effective technical means. Based on micro-traffic flow modelling, this paper adopts the vehicle motion model based on cellular automata and the theory of bicycle intelligence to build the simulation environment of autonomous vehicle flow. The architecture of autonomous vehicles is generally divided into a perception system, a decision system and a control system. The perception system is generally divided into many subsystems, responsible for autonomous vehicle positioning, obstacle recognition, traffic signal detection and recognition, and other tasks. Decision systems are typically divided into many subsystems that are responsible for tasks such as path planning, behavior selection, motion planning, and control. The control system is the basis of the self-driving car. Each control system of the vehicle needs to be connected with the decision-making system through the bus, and can accurately control the acceleration degree, braking degree, steering amplitude, lighting control and other driving actions according to the bus instructions issued by the decision-making system, so as to achieve the autonomous driving of the vehicle.	Translated: 2023-08-06 11:21:02 Published: 2023-07-23
# Validation of a Zero-Shot Learning Natural Language Processing Tool for Data Abstraction from Unstructured Healthcare Data ( http://arxiv.org/abs/2308.00107v1 ) License: see the linked page	Basil Kaufmann, Dallin Busby, Chandan Krushna Das, Neeraja Tillu, Mani Menon, Ashutosh K. Tewari, Michael A. Gorin	Objectives: To describe the development and validation of a zero-shot learning natural language processing (NLP) tool for abstracting data from unstructured text contained within PDF documents, such as those found within electronic health records. Materials and Methods: A data abstraction tool based on the GPT-3.5 model from OpenAI was developed and compared to three physician human abstractors in terms of time to task completion and accuracy for abstracting data on 14 unique variables from a set of 199 de-identified radical prostatectomy pathology reports. The reports were processed by the software tool in vectorized and scanned formats to establish the impact of optical character recognition on data abstraction. The tool was assessed for superiority for data abstraction speed and non-inferiority for accuracy. Results: The human abstractors required a mean of 101 s per report for data abstraction, with times varying from 15 to 284 s. In comparison, the software tool required a mean of 12.8 s to process the vectorized reports and a mean of 15.8 s to process the scanned reports (P < 0.001). The overall accuracies of the three human abstractors were 94.7%, 97.8%, and 96.4% for the combined set of 2786 datapoints. The software tool had an overall accuracy of 94.2% for the vectorized reports, proving to be non-inferior to the human abstractors at a margin of -10% (α = 0.025). The tool had a slightly lower accuracy of 88.7% using the scanned reports, proving to be non-inferior to 2 out of 3 human abstractors. Conclusion: The developed zero-shot learning NLP tool affords researchers comparable levels of accuracy to that of human abstractors, with significant time savings benefits. Because of the lack of need for task-specific model training, the developed tool is highly generalizable and can be used for a wide variety of data abstraction tasks, even outside the field of medicine.	Translated: 2023-08-06 11:12:55 Published: 2023-07-23
# 安全クリティカル自律システムにおける関連性のフレーミング Framing Relevance for Safety-Critical Autonomous Systems ( http://arxiv.org/abs/2307.14355v1 ) ライセンス: Link先を確認	Astrid Rakow	(参考訳) 私たちは、組み込みの信念を持ち、環境を知覚し、情報を交換する複雑な高度に自律的なシステムを構築する過程にあります。これらのシステムはそれぞれの世界観を構築し、それに基づいて将来の行動を計画する。すなわち、起こりうる将来の予測に基づいて、目標を達成するための行動を選択する。通常、これらのシステムは様々な情報源から提供される膨大な情報に直面しており、そのすべてが関連するわけではない。我々の研究の目的は、現在のミッションにおいて安全クリティカルな自律システムに関連するもの、すなわち、ミッション目標を達成するために適切な世界観を構築するのに十分な情報を決定するための形式的なアプローチを開発することである。 We are in the process of building complex highly autonomous systems that have built-in beliefs, perceive their environment and exchange information. These systems construct their respective world view and based on it they plan their future manoeuvres, i.e., they choose their actions in order to establish their goals based on their prediction of the possible futures. Usually these systems face an overwhelming flood of information provided by a variety of sources, of which far from everything is relevant. The goal of our work is to develop a formal approach to determine what is relevant for a safety-critical autonomous system in its current mission, i.e., what information suffices to build an appropriate world view to accomplish its mission goals.	翻訳日:2023-07-28 19:11:34 公開日:2023-07-23
# 実用シナリオにおけるマルチビュークラスタリングにおけるノイズビューの副作用の調査と緩和 Investigating and Mitigating the Side Effects of Noisy Views in Multi-view Clustering in Practical Scenarios ( http://arxiv.org/abs/2303.17245v2 ) ライセンス: Link先を確認	Jie Xu, Gang Niu, Xiaolong Wang, Yazhou Ren, Lei Feng, Xiaoshuang Shi, Zheng Zhang, Heng Tao Shen, Xiaofeng Zhu	(参考訳) マルチビュークラスタリング(MvC)は,ラベルの監督なしに,マルチビューデータのカテゴリ構造を探索することを目的とする。複数のビューは単一のビューよりも多くの情報を提供するので、既存のMvCメソッドは十分なパフォーマンスを得ることができる。しかし、実際のシナリオでは、ビューが騒がしい場合、パフォーマンスが著しく低下する可能性がある。本稿ではまず,まず,ノイズの多い視点の欠点を公式に検討し,その問題に対処するための理論的基盤を持つ深層MvC法(MvCAN)を提案する。具体的には、複数のビューにまたがる非共有パラメータと一貫性のないクラスタリング予測を可能にし、ノイズの多いビューの副作用を低減するための新しいMvC目標を提案する。さらに、複数のビューの有用な情報をマイニングするための堅牢な学習目標を生成するために、非パラメトリック反復プロセスが設計されている。理論的解析により、mvcanはマルチビュー一貫性、相補性、ノイズロバスト性を達成することで機能する。最後に、大規模な公開データセットの実験により、MvCANは最先端の手法よりも優れ、ノイズの多いビューの存在に対して堅牢であることが示された。 Multi-view clustering (MvC) aims at exploring category structures among multi-view data without label supervision. Multiple views provide more information than single views and thus existing MvC methods can achieve satisfactory performance. However, their performance might seriously degenerate when the views are noisy in practical scenarios. In this paper, we first formally investigate the drawback of noisy views and then propose a theoretically grounded deep MvC method (namely MvCAN) to address this issue. Specifically, we propose a novel MvC objective that enables un-shared parameters and inconsistent clustering predictions across multiple views to reduce the side effects of noisy views. Furthermore, a non-parametric iterative process is designed to generate a robust learning target for mining multiple views' useful information. Theoretical analysis reveals that MvCAN works by achieving the multi-view consistency, complementarity, and noise robustness. Finally, experiments on extensive public datasets demonstrate that MvCAN outperforms state-of-the-art methods and is robust against the existence of noisy views.	翻訳日:2023-07-26 21:01:04 公開日:2023-07-23
# 継続的学習を超えた深層学習の予測に関する包括的調査 A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning ( http://arxiv.org/abs/2307.09218v2 ) ライセンス: Link先を確認	Zhenyi Wang, Enneng Yang, Li Shen, Heng Huang	(参考訳) 蓄積とは、以前取得した情報や知識の喪失または劣化を指す。忘れることに関する既存の調査は、主に継続的学習に焦点を当てているが、深層学習における他の様々な研究領域でよく見られる現象である。ジェネレータシフトによる生成モデルや、クライアント間での不均一なデータ分布によるフェデレーション学習などの研究分野におけるフォーミングの現れ。忘れることへの対処には、古いタスク知識の保持と新しいタスクの迅速な学習のバランス、競合する目標とのタスク干渉の管理、プライバシー漏洩の防止など、いくつかの課題が含まれている。さらに、継続学習に関する既存の調査のほとんどは、忘れが常に有害であると暗黙的に仮定している。対照的に、われわれの調査は、忘れは二重刃の剣であり、プライバシー保護シナリオのような特定のケースで有益で望ましいものだと主張している。より広い文脈で忘れることを検討することで、我々はこの現象をより微妙な理解を示し、その潜在的な利点を浮き彫りにする。この包括的な調査を通じて、忘れを扱ったさまざまな分野のアイデアやアプローチを描き出すことで、潜在的な解決策を明らかにすることを目指している。従来の境界を越えて忘れることを調べることで、将来の作業では、実際のアプリケーションにおける忘れを緩和、活用、あるいは受け入れるための新しい戦略の開発を奨励したいと考えています。様々な研究分野における忘れに関する包括的な論文の一覧は、 \url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning} にある。 Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage, etc. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. 
By exploring forgetting in a broader context, we aim to present a more nuanced understanding of this phenomenon and highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing upon ideas and approaches from various fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, in future work, we hope to encourage the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications. A comprehensive list of papers about forgetting in various research fields is available at https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning.	翻訳日:2023-07-26 20:13:36 公開日:2023-07-23
# 同期NPA階層と応用 A synchronous NPA hierarchy with applications ( http://arxiv.org/abs/2105.01555v2 ) ライセンス: Link先を確認	Travis B. Russell	(参考訳) 本稿では,同期相関行列の設定に対するNPA階層の適応について述べる。我々の適応は、より小さな証明書と少ない制約を用いて元のNPA階層を改善するが、同期相関を証明するためにしか適用できない。同期量子可換相関と同期量子相関の集合の特徴付けを復元する。応用として、対称的情報完全な正作用素値測度と相互に偏りのない基底の最大集合の存在を、適応されたNPA階層の2つの証明書だけで検証または反証できることを示す。 We present an adaptation of the NPA hierarchy to the setting of synchronous correlation matrices. Our adaptation improves upon the original NPA hierarchy by using smaller certificates and fewer constraints, although it can only be applied to certify synchronous correlations. We recover characterizations for the sets of synchronous quantum commuting and synchronous quantum correlations. For applications, we show that the existence of symmetric informationally complete positive operator-valued measures and maximal sets of mutually unbiased bases can be verified or invalidated with only two certificates of our adapted NPA hierarchy.	翻訳日:2023-07-26 01:40:03 公開日:2023-07-23
# 実演のない共模倣学習 Co-Imitation Learning without Expert Demonstration ( http://arxiv.org/abs/2103.14823v2 ) ライセンス: Link先を確認	Kun-Peng Ning, Hu Xu, Kun Zhu, Sheng-Jun Huang	(参考訳) 模倣学習は、専門家のデモンストレーションを利用して強化学習の効率を向上させるための主要なアプローチである。しかし、多くの現実のシナリオでは、専門家のデモンストレーションを得るのは非常に高価か、あるいは不可能かもしれない。この課題を克服するために,本稿では,エージェントの過去の優れた経験を専門家のデモンストレーションなしに活用するための,CoIL(Co-Imitation Learning)と呼ばれる新しい学習フレームワークを提案する。具体的には,それぞれのエージェントが交互に環境を探索し,ピアエージェントの経験を生かして,異なるエージェントを訓練する。経験は価値や誤解を招く可能性があるが、我々は各経験の潜在的有用性を価値関数の期待値で見積もることを提案する。これにより、ノイズをフィルタリングしながら、より有用な体験を強調して、エージェント同士を選択的に模倣することができる。様々な課題に対する実験結果から,提案する共励学習フレームワークは,エージェント同士が外部の監督なしに相互に利益を享受できるという有意な優位性を示した。 Imitation learning is a primary approach to improve the efficiency of reinforcement learning by exploiting the expert demonstrations. However, in many real scenarios, obtaining expert demonstrations could be extremely expensive or even impossible. To overcome this challenge, in this paper, we propose a novel learning framework called Co-Imitation Learning (CoIL) to exploit the past good experiences of the agents themselves without expert demonstration. Specifically, we train two different agents via letting each of them alternately explore the environment and exploit the peer agent's experience. While the experiences could be valuable or misleading, we propose to estimate the potential utility of each piece of experience with the expected gain of the value function. Thus the agents can selectively imitate from each other by emphasizing the more useful experiences while filtering out noisy ones. Experimental results on various tasks show significant superiority of the proposed Co-Imitation Learning framework, validating that the agents can benefit from each other without external supervision.	翻訳日:2023-07-26 01:39:19 公開日:2023-07-23
# LAnoBERT: BERT Masked Language Modelに基づくシステムログ異常検出 LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model ( http://arxiv.org/abs/2111.09564v3 ) ライセンス: Link先を確認	Yukyung Lee, Jina Kim and Pilsung Kang	(参考訳) コンピュータシステムで生成されたシステムログは、同時に収集され、エラー、侵入、異常行動を決定する基本データとして使用される大規模データを指す。システムログ異常検出の目的は、人間の介入を最小限に抑えながら異常を迅速に特定することである。従来の研究では,様々なログデータを解析器を用いて標準化テンプレートに変換し,アルゴリズムによる異常検出を行った。特に、ログキー内の情報が失われる可能性のあるすべてのログデータに対して、特定のイベントに対応するテンプレートを事前に定義する必要がある。本研究では,自然言語処理性能に優れたbertモデルを用いたパーザフリーシステムログ異常検出手法であるlanobertを提案する。提案手法であるLAnoBERTは,BERTに基づく事前学習手法であるマスク言語モデリングを用いてモデルを学習し,テスト中にログキー毎のマスク言語モデリング損失関数を用いて教師なし学習に基づく異常検出を行う。さらに,実際のシステムに適用可能なパイプラインを構築するための効率的な推論手法を提案する。 HDFS、BGL、Thunderbirdの3つの有名なログデータセットの実験では、LAnoBERTは教師なし学習ベースのベンチマークモデルよりも高い異常検出性能を示しただけでなく、教師なし学習ベースのベンチマークモデルと同等のパフォーマンスを得た。 The system log generated in a computer system refers to large-scale data that are collected simultaneously and used as the basic data for determining errors, intrusion and abnormal behaviors. The aim of system log anomaly detection is to promptly identify anomalies while minimizing human intervention, which is a critical problem in the industry. Previous studies performed anomaly detection through algorithms after converting various forms of log data into a standardized template using a parser. Particularly, a template corresponding to a specific event should be defined in advance for all the log data using which the information within the log key may get lost. In this study, we propose LAnoBERT, a parser free system log anomaly detection method that uses the BERT model, exhibiting excellent natural language processing performance. The proposed method, LAnoBERT, learns the model through masked language modeling, which is a BERT-based pre-training method, and proceeds with unsupervised learning-based anomaly detection using the masked language modeling loss function per log key during the test process. 
In addition, we also propose an efficient inference process to establish a practically applicable pipeline to the actual system. Experiments on three well-known log datasets, i.e., HDFS, BGL, and Thunderbird, show that not only did LAnoBERT yield a higher anomaly detection performance compared to unsupervised learning-based benchmark models, but also it resulted in a comparable performance with supervised learning-based benchmark models.	翻訳日:2023-07-26 01:31:10 公開日:2023-07-23
# 等価性と推定オントロジーマッチングのための機械学習フレンドリーなバイオメディカルデータセット Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching ( http://arxiv.org/abs/2205.03447v8 ) ライセンス: Link先を確認	Yuan He, Jiaoyan Chen, Hang Dong, Ernesto Jim\'enez-Ruiz, Ali Hadian, Ian Horrocks	(参考訳) オントロジーマッチング(OM)はバイオインフォマティクスやセマンティックウェブなど多くの分野において重要な役割を担い、特に機械学習(ML)技術の適用によってその研究はますます人気が高まっている。オントロジーアライメント評価イニシアチブ(OAEI)は,OMシステムの体系的評価に多大な努力を払っているものの,サブエミッションマッピングの限定的な評価,最適でない参照マッピング,MLベースのシステム評価の限定的なサポートなど,いくつかの制限に悩まされている。これらの制約に対処するために,Mondo と UMLS から抽出したオントロジーを含む5つの新しいバイオメディカル OM タスクを導入する。各タスクは等価性と仮定マッチングの両方を含み、参照マッピングの品質は人間のキュレーションやオントロジープルーニングなどで保証される。 MLベースのOMシステムと非MLベースのOMシステムの両方において,様々な観点からOM性能を測定するための総合評価フレームワークを提案する。我々は,OAEI 2022における新たなBioMLトラックの一部として,これらのリソースの利用状況を示すため,異なるタイプのOMシステムの評価結果を報告する。 Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new BioML track at OAEI 2022.	
翻訳日:2023-07-26 01:21:29 公開日:2023-07-23
# 物理的カスケードイベントにおける推論と行動の学習 Learning to reason about and to act on physical cascading events ( http://arxiv.org/abs/2202.01108v2 ) ライセンス: Link先を確認	Yuval Atzmon, Eli A. Meirom, Shie Mannor, Gal Chechik	(参考訳) 動的環境の推論とインタラクションは、AIの基本的な問題だが、アクションがクロス依存イベントのカスケードをトリガーできると、極めて困難になる。そこで,物理的にシミュレートされた動的シーンの映像をエージェントに提示し,システムが「反実仮想的」な目標に達するように,介入してイベントのカスケードを起動するよう要求する,Cascadeと呼ばれる新しい教師付き学習設定を導入する。例えば、エージェントは「緑のボールを押して、青いボールを赤いボールに当てる」よう依頼される。エージェントの介入は連続空間から引き出され、事象のカスケードはダイナミクスを非常に非線形にする。セマンティックツリー探索とイベント駆動フォワードモデルを組み合わせることで,連続空間におけるセマンティックツリーの探索を学習するアルゴリズムを考案する。提案手法は,これまで見たことのない複雑な場面に介入する命令を効果的に追従することを学ぶ。観測された事象のカスケードが与えられた場合、別の結果についても推論できる。 Reasoning and interacting with dynamic environments is a fundamental problem in AI, but it becomes extremely challenging when actions can trigger cascades of cross-dependent events. We introduce a new supervised learning setup called Cascade where an agent is shown a video of a physically simulated dynamic scene, and is asked to intervene and trigger a cascade of events, such that the system reaches a "counterfactual" goal. For instance, the agent may be asked to "Make the blue ball hit the red one, by pushing the green ball". The agent's intervention is drawn from a continuous space, and cascades of events make the dynamics highly non-linear. We combine semantic tree search with an event-driven forward model and devise an algorithm that learns to search in semantic trees in continuous spaces. We demonstrate that our approach learns to effectively follow instructions to intervene in previously unseen complex scenes. It can also reason about alternative outcomes, when provided an observed cascade of events.	翻訳日:2023-07-26 01:18:11 公開日:2023-07-23
# DSTEA: エンティティ適応型事前トレーニングによる対話状態追跡の改善 DSTEA: Improving Dialogue State Tracking via Entity Adaptive Pre-training ( http://arxiv.org/abs/2207.03858v2 ) ライセンス: Link先を確認	Yukyung Lee, Takyoung Kim, Hoonsang Yoon, Pilsung Kang, Junseong Bang, Misuk Kim	(参考訳) 対話状態追跡(DST)は、ユーザとシステム発話を包括的に解釈するために重要であり、それによって効率的な対話システムの基礎を形成する。過去の研究は、モデル構造の変更やグラフ関係などの追加機能の統合によるDSTパフォーマンスの向上に重点を置いていたが、しばしば外部対話コーパスによる事前学習が必要である。本研究では,対話発話におけるキーエンティティを集中的に訓練することにより,エンコーダを強化可能な,エンティティ適応型事前学習による対話状態追跡を改善するDSTEAを提案する。 DSTEAは、オントロジー情報、名前付き認識、paCy、flairライブラリの4つの異なる方法を用いて、これらの重要なエンティティを入力対話から識別する。その後、モデルを効果的に訓練するために選択的知識マスクを用いる。注目すべきは、DSTEAはDSTモデルに追加知識を直接注入することなく、事前学習を必要とすることだ。このアプローチにより、MultiWOZ 2.0, 2.1, 2.2の4つの堅牢DSTモデルの性能が大幅に向上し、目標精度は2.69%(52.41%から55.10%)まで向上した。 DSTEAの有効性のさらなる検証は、様々なエンティティタイプとマスキング戦略やマスキング率などの異なるエンティティ適応事前学習構成を考慮した比較実験によって行われた。 Dialogue State Tracking (DST) is critical for comprehensively interpreting user and system utterances, thereby forming the cornerstone of efficient dialogue systems. Despite past research efforts focused on enhancing DST performance through alterations to the model structure or integrating additional features like graph relations, they often require additional pre-training with external dialogue corpora. In this study, we propose DSTEA, improving Dialogue State Tracking via Entity Adaptive pre-training, which can enhance the encoder through by intensively training key entities in dialogue utterances. DSTEA identifies these pivotal entities from input dialogues utilizing four different methods: ontology information, named-entity recognition, the spaCy, and the flair library. Subsequently, it employs selective knowledge masking to train the model effectively. Remarkably, DSTEA only requires pre-training without the direct infusion of extra knowledge into the DST model. 
This approach resulted in substantial performance improvements of four robust DST models on MultiWOZ 2.0, 2.1, and 2.2, with joint goal accuracy witnessing an increase of up to 2.69% (from 52.41% to 55.10%). Further validation of DSTEA's efficacy was provided through comparative experiments considering various entity types and different entity adaptive pre-training configurations such as masking strategy and masking rate.	翻訳日:2023-07-26 01:11:10 公開日:2023-07-23
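Joint goal accuracy, the metric quoted above, is commonly computed as the share of dialogue turns whose entire predicted state matches the gold state exactly. The sketch below assumes that common definition. The slot names and values are made up for illustration and are not taken from MultiWOZ or the paper.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns whose full predicted state equals the gold state.

    Each state is a dict mapping a slot name to its value; a turn counts
    as correct only if every slot matches (no partial credit).
    """
    if len(predicted_states) != len(gold_states):
        raise ValueError("predictions and gold states must align turn by turn")
    if not gold_states:
        return 0.0
    correct = sum(1 for pred, gold in zip(predicted_states, gold_states) if pred == gold)
    return correct / len(gold_states)


# Hypothetical three-turn dialogue; slot names are illustrative only.
gold = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "train-day": "monday"},
]
pred = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},  # one wrong slot fails the turn
    {"hotel-area": "north", "hotel-stars": "4", "train-day": "monday"},
]

jga = joint_goal_accuracy(pred, gold)  # 2 of 3 turns fully correct
```

Because a single wrong slot fails the whole turn, the metric is strict, which helps explain why a gain of a few points is reported as substantial.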
# TF-GNN:TensorFlowのグラフニューラルネットワーク TF-GNN: Graph Neural Networks in TensorFlow ( http://arxiv.org/abs/2207.03522v2 ) ライセンス: Link先を確認	Oleksandr Ferludin, Arno Eigenwillig, Martin Blais, Dustin Zelle, Jan Pfeifer, Alvaro Sanchez-Gonzalez, Wai Lok Sibon Li, Sami Abu-El-Haija, Peter Battaglia, Neslihan Bulut, Jonathan Halcrow, Filipe Miguel Gon\c{c}alves de Almeida, Pedro Gonnet, Liangze Jiang, Parth Kothari, Silvio Lattanzi, Andr\'e Linhares, Brandon Mayer, Vahab Mirrokni, John Palowitch, Mihir Paradkar, Jennifer She, Anton Tsitsulin, Kevin Villela, Lisa Wang, David Wong, Bryan Perozzi	(参考訳) TensorFlow-GNN(TF-GNN)は、TensorFlowのグラフニューラルネットワークのためのスケーラブルなライブラリである。これは、今日の情報エコシステムで発生する豊富な異種グラフデータの種類をサポートするために、下から設計されている。機械学習の研究者や高度な開発者を可能にすることに加えて、tf-gnnはグラフ学習の幅広い開発者コミュニティに力を与えるローコードソリューションを提供する。 Googleの多くのプロダクションモデルはTF-GNNを使用しており、最近オープンソースプロジェクトとしてリリースされた。本稿では,tf-gnnデータモデル,kerasメッセージパッシングapi,グラフサンプリングや分散トレーニングといった関連する機能について述べる。 TensorFlow-GNN (TF-GNN) is a scalable library for Graph Neural Networks in TensorFlow. It is designed from the bottom up to support the kinds of rich heterogeneous graph data that occurs in today's information ecosystems. In addition to enabling machine learning researchers and advanced developers, TF-GNN offers low-code solutions to empower the broader developer community in graph learning. Many production models at Google use TF-GNN, and it has been recently released as an open source project. In this paper we describe the TF-GNN data model, its Keras message passing API, and relevant capabilities such as graph sampling and distributed training.	翻訳日:2023-07-26 01:10:48 公開日:2023-07-23
# パーソナライゼーションのハーム:予測におけるグループ属性の利用を再考する When Personalization Harms: Reconsidering the Use of Group Attributes in Prediction ( http://arxiv.org/abs/2206.02058v3 ) ライセンス: Link先を確認	Vinith M. Suriyakumar, Marzyeh Ghassemi, Berk Ustun	(参考訳) マシンラーニングモデルは、保護、機密性、自己報告、あるいは取得コストのかかるカテゴリ属性でパーソナライズされることが多い。本研究では,グループ属性でパーソナライズされたモデルがグループレベルでのパフォーマンスを低下させることを示す。予測タスクにおけるグループ属性の「公平な使用」を保証するための形式的条件として,1つの追加モデルを訓練することを提案する。実験的なリスク最小化における公正な使用を保証するための十分な条件を提示し、モデル開発とデプロイメントの標準プラクティスによる公正な使用違反につながる障害モードを特徴付ける。臨床予測タスクにおける公正な使用に関する総合的な実証研究を行う。本研究は, フェアユース違反の実態を実証し, 害を軽減するための簡単な介入を例示するものである。 Machine learning models are often personalized with categorical attributes that are protected, sensitive, self-reported, or costly to acquire. In this work, we show models that are personalized with group attributes can reduce performance at a group level. We propose formal conditions to ensure the "fair use" of group attributes in prediction tasks by training one additional model -- i.e., collective preference guarantees to ensure that each group who provides personal data will receive a tailored gain in performance in return. We present sufficient conditions to ensure fair use in empirical risk minimization and characterize failure modes that lead to fair use violations due to standard practices in model development and deployment. We present a comprehensive empirical study of fair use in clinical prediction tasks. Our results demonstrate the prevalence of fair use violations in practice and illustrate simple interventions to mitigate their harm.	翻訳日:2023-07-26 01:10:11 公開日:2023-07-23
# 学習型ロボットグラスピングのための物理誘導階層的リワード機構 Physics-Guided Hierarchical Reward Mechanism for Learning-Based Robotic Grasping ( http://arxiv.org/abs/2205.13561v3 ) ライセンス: Link先を確認	Yunsik Jung, Lingfeng Tao, Michael Bowman, Jiucai Zhang, Xiaoli Zhang	(参考訳) 学習に基づく把持は、高い計算効率により、多指ロボットハンドのリアルタイム把持動作計画を可能にする。しかし,学習過程において大きな探索空間を探索するためには,学習に基づく手法が必要となる。検索スペースは学習効率を低下させ、それがその実践的採用の主要な障壁となっている。加えて、トレーニングされたポリシーには、オブジェクトがトレーニングされたオブジェクトと同一でない限り、一般的な結果が欠けている。本研究では,学習効率と学習に基づく自律的把握の一般化性を向上させるために,階層的リワード機構を備えた物理誘導型深層強化学習を開発する。従来の観察に基づくグリップラーニングとは異なり、物理インフォームドメトリクスは手の構造と物体の相関関係を伝達し、学習効率と結果を改善する。さらに、階層的な報酬機構により、ロボットは把握タスクの優先順位付けされたコンポーネントを学習することができる。本手法は3本指MICOロボットアームを用いたロボット把握作業において有効である。その結果,ロボットの把握作業において,標準的なDeep Reinforcement Learning法よりも優れていた。 Learning-based grasping can afford real-time grasp motion planning of multi-fingered robotics hands thanks to its high computational efficiency. However, learning-based methods are required to explore large search spaces during the learning process. The search space causes low learning efficiency, which has been the main barrier to its practical adoption. In addition, the trained policy lacks a generalizable outcome unless objects are identical to the trained objects. In this work, we develop a novel Physics-Guided Deep Reinforcement Learning with a Hierarchical Reward Mechanism to improve learning efficiency and generalizability for learning-based autonomous grasping. Unlike conventional observation-based grasp learning, physics-informed metrics are utilized to convey correlations between features associated with hand structures and objects to improve learning efficiency and outcomes. Further, the hierarchical reward mechanism enables the robot to learn prioritized components of the grasping tasks. Our method is validated in robotic grasping tasks with a 3-finger MICO robot arm. The results show that our method outperformed the standard Deep Reinforcement Learning methods in various robotic grasping tasks.	翻訳日:2023-07-26 01:09:55 公開日:2023-07-23
# EchoGNN: グラフニューラルネットワークによる説明可能な射出差分推定 EchoGNN: Explainable Ejection Fraction Estimation with Graph Neural Networks ( http://arxiv.org/abs/2208.14003v2 ) ライセンス: Link先を確認	Masoud Mokhtari, Teresa Tsang, Purang Abolmaesumi, Renjie Liao	(参考訳) エジェクション分画(EF)は心機能の重要な指標であり、心不全などの心機能障害に起因した患者の識別を可能にする。 EFは、左心室を手動で追跡し、その容積を特定のフレームで推定することにより、心エコー(echo)として知られる心エコービデオから推定される。これらの推定は、マニュアルプロセスとビデオ品質の違いにより、オブザーバ間の可変性が高い。このような不正確さの源泉と迅速な評価の必要性は、信頼性と説明可能な機械学習技術を必要とする。本研究では,グラフニューラルネットワーク(GNN)に基づくモデルであるEchoGNNを導入し,エコービデオからEFを推定する。我々のモデルはまず、1つまたは複数のエコーシン系列のフレームから潜時エコーグラフを推測する。次に、このグラフのノードとエッジの重みを推定し、EF推定に役立つ個々のフレームの重要性を示す。 GNN回帰器はこの重み付きグラフを使用してEFを予測する。我々は,学習グラフの重み付けが,人的介入が必要なタイミングを決定するために,EF推定のためのクリティカルフレームの同定を通じて説明可能性を提供することを示す。 EchoNet-DynamicパブリックEFデータセットでは、EchoGNNは、最先端のEF予測のパフォーマンスを達成し、説明可能性を提供する。 Ejection fraction (EF) is a key indicator of cardiac function, allowing identification of patients prone to heart dysfunctions such as heart failure. EF is estimated from cardiac ultrasound videos known as echocardiograms (echo) by manually tracing the left ventricle and estimating its volume on certain frames. These estimations exhibit high inter-observer variability due to the manual process and varying video quality. Such sources of inaccuracy and the need for rapid assessment necessitate reliable and explainable machine learning techniques. In this work, we introduce EchoGNN, a model based on graph neural networks (GNNs) to estimate EF from echo videos. Our model first infers a latent echo-graph from the frames of one or multiple echo cine series. It then estimates weights over nodes and edges of this graph, indicating the importance of individual frames that aid EF estimation. A GNN regressor uses this weighted graph to predict EF. We show, qualitatively and quantitatively, that the learned graph weights provide explainability through identification of critical frames for EF estimation, which can be used to determine when human intervention is required. 
On the public EchoNet-Dynamic EF dataset, EchoGNN achieves EF prediction performance that is on par with the state of the art and provides explainability, which is crucial given the high inter-observer variability inherent in this task.	翻訳日:2023-07-26 01:01:36 公開日:2023-07-23
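For readers unfamiliar with the quantity being predicted, ejection fraction is conventionally defined from the end-diastolic and end-systolic left-ventricle volumes. The sketch below shows only that standard formula, not the EchoGNN model. The function name and the example volumes are illustrative assumptions.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from left-ventricle volumes.

    edv_ml: end-diastolic volume (ventricle at its fullest), in mL
    esv_ml: end-systolic volume (ventricle after contraction), in mL
    """
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    # Share of the filled volume that is pumped out in one beat
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Illustrative volumes only, not taken from the paper or the dataset
ef = ejection_fraction(edv_ml=120.0, esv_ml=50.0)
```

The manual workflow described in the abstract estimates the two volumes by tracing the ventricle on selected frames, which is where the inter-observer variability enters.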
# Frouros: 機械学習システムにおけるドリフト検出のためのPythonライブラリ Frouros: A Python library for drift detection in machine learning systems ( http://arxiv.org/abs/2208.06868v4 ) ライセンス: Link先を確認	Jaime Céspedes-Sisniega and Álvaro López-García	(参考訳) FrourosはオープンソースのPythonライブラリで、機械学習システムのドリフトを検出することができる。概念ドリフトとデータドリフトの両方について、古典的なアルゴリズムとより最近のアルゴリズムを組み合わせて提供する。私たちは、あらゆる機械学習フレームワークと互換性を持たせ、現実世界のユースケースに容易に適応できるように設計しました。このライブラリは、メンテナンスの容易さと拡張性を確保するために、最良の開発と継続的インテグレーションのプラクティスに従って開発されている。ソースコードは https://github.com/IFCA/frouros で入手できる。 Frouros is an open-source Python library capable of detecting drift in machine learning systems. It provides a combination of classical and more recent algorithms for drift detection: both concept and data drift. We have designed it with the objective of making it compatible with any machine learning framework and easily adaptable to real-world use cases. The library is developed following a set of best development and continuous integration practices to ensure ease of maintenance and extensibility. The source code is available at https://github.com/IFCA/frouros.	翻訳日:2023-07-26 01:00:38 公開日:2023-07-23
# 分離のないスパースモーメント問題の効率的なアルゴリズム Efficient Algorithms for Sparse Moment Problems without Separation ( http://arxiv.org/abs/2207.13008v2 ) ライセンス: Link先を確認	Zhiyuan Fan and Jian Li	(参考訳) 我々は,任意の次元の雑音モーメント情報から高次元空間におけるk$-spike混合の学習におけるスパースモーメント問題を考える。移動距離を用いて学習した混合物の精度を測定する。以前のアルゴリズムは、特定の分離仮定を仮定するか、より多くのリカバリモーメントを使用するか、あるいは(超)指数関数時間で実行する。我々の一次元問題に対するアルゴリズム(スパースハウスドルフモーメント問題とも呼ばれる)は古典的なプロニーの手法の頑健なバージョンであり、我々の貢献は主に解析に関係している。従来の研究(プロニーの手法の中間結果の摂動を解析する)よりも大域的かつより厳密な分析を採用する。有用な技術的要素は、ヴァンダーモンド行列で定義される線形系とシュール多項式の間の接続であり、これは分離とは独立に束縛され、他の文脈で有用である。この高次元問題に取り組むために,まず1次元アルゴリズムと解析を複素数に拡張して2次元問題を解く。高次元の場合のアルゴリズムは、混合の1次元投影とランダムベクトルと混合の2次元投影のセットを整合させることにより、各スパイクの座標を決定する。この結果から,トピックモデルとガウス混合の学習に応用でき,サンプル複雑性の改善や事前作業の時間短縮が期待できる。 We consider the sparse moment problem of learning a $k$-spike mixture in high-dimensional space from its noisy moment information in any dimension. We measure the accuracy of the learned mixtures using transportation distance. Previous algorithms either assume certain separation assumptions, use more recovery moments, or run in (super) exponential time. Our algorithm for the one-dimensional problem (also called the sparse Hausdorff moment problem) is a robust version of the classic Prony's method, and our contribution mainly lies in the analysis. We adopt a global and much tighter analysis than previous work (which analyzes the perturbation of the intermediate results of Prony's method). A useful technical ingredient is a connection between the linear system defined by the Vandermonde matrix and the Schur polynomial, which allows us to provide tight perturbation bound independent of the separation and may be useful in other contexts. To tackle the high-dimensional problem, we first solve the two-dimensional problem by extending the one-dimensional algorithm and analysis to complex numbers. 
Our algorithm for the high-dimensional case determines the coordinates of each spike by aligning a 1d projection of the mixture to a random vector and a set of 2d projections of the mixture. Our results have applications to learning topic models and Gaussian mixtures, implying improved sample complexity results or running time over prior work.	翻訳日:2023-07-26 00:59:15 公開日:2023-07-23
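The classic Prony's method that the one-dimensional algorithm builds on can be sketched for the smallest interesting case, k = 2 spikes, with exact (noise-free) moments. This is only the textbook procedure; it does not reproduce the robust, separation-free analysis that the abstract describes as the paper's contribution. The function name and the example mixture are illustrative assumptions.

```python
from math import sqrt


def prony_two_spikes(m0, m1, m2, m3):
    """Recover a 2-spike mixture from moments m_j = sum_i w_i * a_i**j.

    Textbook Prony's method for k = 2 and noise-free moments.
    Returns ((a1, a2), (w1, w2)) with a1 < a2.
    """
    # The spike locations are the roots of x^2 + c1*x + c0, where
    # [[m0, m1], [m1, m2]] @ [c0, c1] = [-m2, -m3]  (a Hankel system).
    det = m0 * m2 - m1 * m1
    if abs(det) < 1e-12:
        raise ValueError("Hankel matrix is singular: fewer than two distinct spikes")
    c0 = (m1 * m3 - m2 * m2) / det  # Cramer's rule
    c1 = (m1 * m2 - m0 * m3) / det
    disc = c1 * c1 - 4.0 * c0
    if disc <= 0:
        raise ValueError("moments are not consistent with two real, distinct spikes")
    root = sqrt(disc)
    a1, a2 = (-c1 - root) / 2.0, (-c1 + root) / 2.0
    # Weights from the first two moments (a 2x2 Vandermonde system)
    w1 = (m1 - m0 * a2) / (a1 - a2)
    w2 = m0 - w1
    return (a1, a2), (w1, w2)


# Illustrative mixture: weight 0.3 at 0.2 and weight 0.7 at 0.7
true_locs, true_wts = [0.2, 0.7], [0.3, 0.7]
moments = [sum(w * a**j for a, w in zip(true_locs, true_wts)) for j in range(4)]
rec_locs, rec_wts = prony_two_spikes(*moments)
```

The division by the Hankel determinant and by `a1 - a2` shows why closely spaced spikes make the naive method unstable, which is the regime the paper's analysis targets.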
# GMA3D: シーンフローの蓄積した動きを推定するローカル・グローバル・アテンション学習 GMA3D: Local-Global Attention Learning to Estimate Occluded Motions of Scene Flow ( http://arxiv.org/abs/2210.03296v2 ) ライセンス: Link先を確認	Zhiyang Lu, Ming Cheng	(参考訳) シーンフローは、3dポイント雲内の各ポイントの動き情報を表す。モーションセグメンテーションやオブジェクトトラッキングなど、多くのタスクに適用される、重要な下流手法である。しかしながら、2つの連続した点雲の間には常に閉塞点があり、スパーシティデータサンプリングや実世界の閉塞からである。本稿では,移動物体のセマンティックな自己相似性と動きの整合性によるシーンフローのオクルージョン問題に焦点をあてる。本稿では, 局所的および大域的セマンティックな類似性を利用して, 局所的および大域的非包含点の運動情報から包含点の運動情報を推定し, オフセットアグリゲータを用いてそれらを集約するGMA3Dモジュールを提案する。我々のモジュールは、最初にトランスフォーマーベースのアーキテクチャを適用して、点雲上のシーンフロー閉塞問題を測定する。実験により,GMA3Dはシーンフロー,特に実シーンにおける閉塞問題を解くことができることがわかった。提案手法は,ポイントクラウドデータセットのオクルードバージョンで評価し,実シーンkittiデータセットで最新の結果を得た。また,GMA3Dが非閉塞シーンフローに対してまだ有効であることを示すために,非閉塞バージョンデータセットの実験を行い,FlyThings3DとKITTIで有望な性能を達成した。コードはhttps://anonymous.4open.science/r/gma3d-e100で入手できる。 Scene flow represents the motion information of each point in the 3D point clouds. It is a vital downstream method applied to many tasks, such as motion segmentation and object tracking. However, there are always occlusion points between two consecutive point clouds, whether from the sparsity data sampling or real-world occlusion. In this paper, we focus on addressing occlusion issues in scene flow by the semantic self-similarity and motion consistency of the moving objects. We propose a GMA3D module based on the transformer framework, which utilizes local and global semantic similarity to infer the motion information of occluded points from the motion information of local and global non-occluded points respectively, and then uses an offset aggregator to aggregate them. Our module is the first to apply the transformer-based architecture to gauge the scene flow occlusion problem on point clouds. Experiments show that our GMA3D can solve the occlusion problem in the scene flow, especially in the real scene. We evaluated the proposed method on the occluded version of point cloud datasets and get state-of-the-art results on the real scene KITTI dataset. 
To verify that GMA3D is still beneficial to non-occluded scene flow, we also conducted experiments on non-occluded versions of the datasets and achieved promising performance on FlyingThings3D and KITTI. The code is available at https://anonymous.4open.science/r/GMA3D-E100.	翻訳日:2023-07-26 00:52:41 公開日:2023-07-23
# データ拡張によるグラフ異常検出モデルの一般化性の向上 Improving Generalizability of Graph Anomaly Detection Models via Data Augmentation ( http://arxiv.org/abs/2209.10168v2 ) ライセンス: Link先を確認	Shuang Zhou, Xiao Huang, Ninghao Liu, Fu-Lai Chung, Long-Kai Huang	(参考訳) グラフ異常検出(GAD)は、少数の異常でさえ、良心的なユーザーに大きな脅威をもたらす可能性があるため、重要なタスクである。従来の知識として利用可能なラベルを効果的に活用できる最近の半教師付きGAD法は、教師なし手法よりも優れた性能を実現している。実際には、人々はビジネスを確保するために新しい(サブ)グラフ上の異常を識別する必要があるが、効果的な検出モデルをトレーニングするラベルが欠落している可能性がある。自然なアイデアのひとつは、トレーニング済みのgadモデルをテスト用の新しい(サブ)グラフに直接導入することだ。しかし、既存の半教師付きGAD法は一般化の問題に悩まされており、例えば、よく訓練されたモデルは、同じグラフの見えない領域(つまり、トレーニングではアクセスできない)ではうまく機能しない。それは大きなトラブルを引き起こすかもしれない。本稿では,この現象を基礎として,学習領域グラフと未発見テストグラフの両方の異常を効果的に識別し,潜在的な危険を解消することを目的とした,一般化グラフ異常検出の一般的かつ新しい研究問題を提案する。それでも、限られたラベルしか利用できないため、通常のバックグラウンドはトレーニングとテストデータの違いがあるため、難しい作業です。そこで本研究では,学習データを充実させ,GADモデルの一般化性を高めるために,textit{AugAN} (\uline{Aug}mentation for \uline{A}nomaly and \uline{N}ormal distributions) というデータ拡張手法を提案する。モデル一般化性向上における本手法の有効性を検証する。 Graph anomaly detection (GAD) is a vital task since even a few anomalies can pose huge threats to benign users. Recent semi-supervised GAD methods, which can effectively leverage the available labels as prior knowledge, have achieved superior performances than unsupervised methods. In practice, people usually need to identify anomalies on new (sub)graphs to secure their business, but they may lack labels to train an effective detection model. One natural idea is to directly adopt a trained GAD model to the new (sub)graph for testing. However, we find that existing semi-supervised GAD methods suffer from poor generalization issue, i.e., well-trained models could not perform well on an unseen area (i.e., not accessible in training) of the same graph. It may cause great troubles. In this paper, we base on the phenomenon and propose a general and novel research problem of generalized graph anomaly detection that aims to effectively identify anomalies on both the training-domain graph and unseen testing graph to eliminate potential dangers. 
Nevertheless, it is a challenging task since only limited labels are available, and the normal background may differ between training and testing data. Accordingly, we propose a data augmentation method named \textit{AugAN} (\uline{Aug}mentation for \uline{A}nomaly and \uline{N}ormal distributions) to enrich training data and boost the generalizability of GAD models. Experiments verify the effectiveness of our method in improving model generalizability.	翻訳日:2023-07-26 00:51:10 公開日:2023-07-23
# 画像セグメンテーションのロバスト化に向けて Towards Robust Referring Image Segmentation ( http://arxiv.org/abs/2209.09554v2 ) ライセンス: Link先を確認	Jianzong Wu, Xiangtai Li, Xia Li, Henghui Ding, Yunhai Tong, Dacheng Tao	(参考訳) Referring Image Segmentation (RIS)は、テキスト記述に基づいてオブジェクトマスクを出力する基本的な視覚言語タスクである。様々な融合法の設計を含む多くの研究がRISでかなりの進歩を遂げた。本研究では,「もしテキスト記述が間違っていたり誤解を招いたりしたらどうするか」という本質的な質問を探索する。私たちはそのような文を否定的な文と呼ぶ。しかし、RISの既存のソリューションはそのような設定を扱えない。この目的のために,ロバスト参照画像セグメンテーション (R-RIS) という新しいRISの定式化を提案する。正のテキスト入力以外に負の文入力も考慮している。この新しいタスクを容易にするために,既存のrisデータセットを負の文で拡張し,両方の入力を統一的に評価するための新しい指標を提案する。さらに,トークンベースのビジョンと言語融合モジュールを備えたRefSegformerと呼ばれるトランスフォーマーモデルを提案する。我々の設計は、余分な空白トークンを追加することでR-RIS設定に容易に拡張できる。提案したRefSegformerは、RISとR-RISの両方のデータセットで最先端の結果を達成し、両方の設定にしっかりとしたベースラインを確立する。プロジェクトページは \url{https://github.com/jianzongwu/robust-ref-seg} にある。 Referring Image Segmentation (RIS) is a fundamental vision-language task that outputs object masks based on text descriptions. Many works have achieved considerable progress for RIS, including different fusion method designs. In this work, we explore an essential question, ``What if the text description is wrong or misleading?'' For example, the described objects are not in the image. We term such a sentence as a negative sentence. However, existing solutions for RIS cannot handle such a setting. To this end, we propose a new formulation of RIS, named Robust Referring Image Segmentation (R-RIS). It considers the negative sentence inputs besides the regular positive text inputs. To facilitate this new task, we create three R-RIS datasets by augmenting existing RIS datasets with negative sentences and propose new metrics to evaluate both types of inputs in a unified manner. Furthermore, we propose a new transformer-based model, called RefSegformer, with a token-based vision and language fusion module. Our design can be easily extended to our R-RIS setting by adding extra blank tokens. 
Our proposed RefSegformer achieves state-of-the-art results on both RIS and R-RIS datasets, establishing a solid baseline for both settings. Our project page is at \url{https://github.com/jianzongwu/robust-ref-seg}.	翻訳日:2023-07-26 00:50:42 公開日:2023-07-23
# ヘテロスケダスティック分布におけるニューラル能動学習 Neural Active Learning on Heteroskedastic Distributions ( http://arxiv.org/abs/2211.00928v2 ) ライセンス: Link先を確認	Savya Khosla, Chew Kin Whye, Jordan T. Ash, Cyril Zhang, Kenji Kawaguchi, Alex Lamb	(参考訳) 最高品質のトレーニングデータを能動的に探し出せるモデルは、より正確で適応性が高く、効率的な機械学習を実現する可能性を秘めている。能動学習の手法は、分類が最も難しい例を好む傾向がある。これは均質なデータセットではうまく機能するが、ラベルノイズやヘテロスケダスティック性の程度が異なる複数の分布に対して実行すると、破滅的な失敗を引き起こす可能性がある。これらの能動学習アルゴリズムは、たとえその例が有益な構造を持たない場合(ランダムなラベルを持つ単色画像など)でも、よりノイズの多い分布からサンプルを引くことを強く好む。そこで本研究では、ヘテロスケダスティック分布におけるこれらの能動学習アルゴリズムの破滅的な失敗を実証し、この失敗を軽減するための微調整に基づくアプローチを提案する。さらに、データポイントごとにモデル差スコアリング関数を組み込んだ新しいアルゴリズムを提案する。これはノイズの多い例を除外し、精度を最大化するクリーンな例をサンプリングするもので、ヘテロスケダスティックなデータセットにおいて既存の能動学習手法を上回る。これらの観察と手法が実践者にとってすぐに役立ち、能動学習アルゴリズムの設計における共通の仮定に疑問を投げかける助けとなることを願っている。 Models that can actively seek out the best quality training data hold the promise of more accurate, adaptable, and efficient machine learning. Active learning techniques often tend to prefer examples that are the most difficult to classify. While this works well on homogeneous datasets, we find that it can lead to catastrophic failures when performed on multiple distributions with different degrees of label noise or heteroskedasticity. These active learning algorithms strongly prefer to draw from the distribution with more noise, even if their examples have no informative structure (such as solid color images with random labels). To this end, we demonstrate the catastrophic failure of these active learning algorithms on heteroskedastic distributions and propose a fine-tuning-based approach to mitigate these failures. Further, we propose a new algorithm that incorporates a model difference scoring function for each data point to filter out the noisy examples and sample clean examples that maximize accuracy, outperforming the existing active learning techniques on the heteroskedastic datasets. 
We hope these observations and techniques are immediately helpful to practitioners and can help to challenge common assumptions in the design of active learning algorithms.	翻訳日:2023-07-26 00:41:42 公開日:2023-07-23
# 対数線形ガードネスとその意義 Log-linear Guardedness and its Implications ( http://arxiv.org/abs/2210.10012v3 ) ライセンス: Link先を確認	Shauli Ravfogel, Yoav Goldberg, Ryan Cotterell	(参考訳) 線形性を仮定してニューラル表現から人間が解釈可能な概念を消去する方法は、扱いやすく有用であることが判明している。しかし、この除去が、修正された表現で訓練された下流分類器の挙動に与える影響は、完全には理解されていない。本研究では、対数線形ガードネスの概念を、敵対者が表現から直接その概念を予測できないこととして形式的に定義し、その意味について検討する。二値の場合、ある仮定の下では、下流の対数線形モデルは消去された概念を復元できないことを示す。しかし、場合によっては概念を間接的に復元するマルチクラス対数線形モデルを構成\emph{できる}ことを示し、下流バイアス緩和手法としての対数線形ガードネスの本質的な限界を指摘する。これらの結果は線形消去法の理論的限界に光を当て、ニューラルモデルにおける内因的バイアスと外因的バイアスの関係についてさらなる研究の必要性を強調する。 Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful. However, the impact of this removal on the behavior of downstream classifiers trained on the modified representations is not fully understood. In this work, we formally define the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation, and study its implications. We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept. However, we demonstrate that a multiclass log-linear model \emph{can} be constructed that indirectly recovers the concept in some cases, pointing to the inherent limitations of log-linear guardedness as a downstream bias mitigation technique. These findings shed light on the theoretical limitations of linear erasure methods and highlight the need for further research on the connections between intrinsic and extrinsic bias in neural models.	翻訳日:2023-07-26 00:39:09 公開日:2023-07-23
# ソフトコントラスト学習とオールインワン分類器によるドメイン横断の新規カテゴリ発見の促進 Boosting Novel Category Discovery Over Domains with Soft Contrastive Learning and All-in-One Classifier ( http://arxiv.org/abs/2211.11262v3 ) ライセンス: Link先を確認	Zelin Zang, Lei Shang, Senqiao Yang, Fei Wang, Baigui Sun, Xuansong Xie, Stan Z. Li	(参考訳) 教師なしドメイン適応(UDA)は、ラベルが豊富なソースドメインからラベルが乏しいターゲットドメインへの知識の転送に非常に効果的であることが証明されている。しかし、ターゲットドメインに新規カテゴリが追加で存在することから、open-set domain adaptation (ODA) と universal domain adaptation (UNDA) が開発された。既存のODAおよびUNDA手法は、すべての新規カテゴリを単一の統一された未知クラスとして扱い、トレーニング中にそれを検出しようとする。しかし、ドメインの分散は教師なしデータ拡張においてより大きなビューノイズを引き起こし、これがコントラスト学習(CL)の有効性に影響して、新規カテゴリ発見においてモデルが過信する原因となりうることがわかった。これらの課題に対処するため、ODAおよびUNDAタスクに対して、Soft-contrastive All-in-one Network (SAN) というフレームワークを提案する。 SANには、特徴転送のためにバックボーンを微調整する、データ拡張に基づく新しいソフトコントラスト学習(SCL)損失と、新規クラス発見能力を向上させる、より人間の直感に沿った分類器が含まれている。 SCL損失は、ドメイン転送タスクで増幅されるデータ拡張のビューノイズ問題の悪影響を弱める。 All-in-One(AIO)分類器は、現在主流の閉集合および開集合分類器の過信問題を克服する。可視化およびアブレーション実験は、提案した工夫の有効性を示す。さらに、ODAとUNDAにおける広範な実験結果から、SANは既存の最先端手法よりも優れていることが示された。 Unsupervised domain adaptation (UDA) has proven to be highly effective in transferring knowledge from a label-rich source domain to a label-scarce target domain. However, the presence of additional novel categories in the target domain has led to the development of open-set domain adaptation (ODA) and universal domain adaptation (UNDA). Existing ODA and UNDA methods treat all novel categories as a single, unified unknown class and attempt to detect it during training. However, we found that domain variance can lead to more significant view-noise in unsupervised data augmentation, which affects the effectiveness of contrastive learning (CL) and causes the model to be overconfident in novel category discovery. To address these issues, a framework named Soft-contrastive All-in-one Network (SAN) is proposed for ODA and UNDA tasks. 
SAN includes a novel data-augmentation-based soft contrastive learning (SCL) loss to fine-tune the backbone for feature transfer and a more human-intuitive classifier to improve new class discovery capability. The SCL loss weakens the adverse effects of the data augmentation view-noise problem which is amplified in domain transfer tasks. The All-in-One (AIO) classifier overcomes the overconfidence problem of current mainstream closed-set and open-set classifiers. Visualization and ablation experiments demonstrate the effectiveness of the proposed innovations. Furthermore, extensive experiment results on ODA and UNDA show that SAN outperforms existing state-of-the-art methods.	翻訳日:2023-07-26 00:32:37 公開日:2023-07-23
# ホモフレンドリグラフとヘテロフレンドリグラフのためのシングルパスコントラスト学習 Single-Pass Contrastive Learning Can Work for Both Homophilic and Heterophilic Graph ( http://arxiv.org/abs/2211.10890v3 ) ライセンス: Link先を確認	Haonan Wang, Jieyu Zhang, Qi Zhu, Wei Huang, Kenji Kawaguchi, Xiaokui Xiao	(参考訳) 既存のグラフコントラスト学習(gcl)技術では、1つのインスタンスでコントラスト損失を構築するために2つのフォワードパスが必要であり、ノードの特徴の低周波信号を捉えるのに有効である。このような二重パス設計はホモ親和グラフにおいて経験的成功を示しているが、直結したノードが通常異なるラベルを持つヘテロ親和グラフの有効性は分かっていない。加えて、既存のgclアプローチは強力なパフォーマンス保証を提供しない。異種グラフに対するGCLアプローチの不予測性と相まって、実世界の文脈における適用性は限定的である。そして、自然な疑問が生まれます: 性能保証のあるホモフィルグラフとヘテロフィルグラフの両方で機能するGCL法を設計できますか? そこで本研究では,近傍集計により得られた特徴の集中特性について理論的に検討し,その特性に基づく単パスグラフのコントラスト学習損失を導入し,下流課題における損失の最小化のための性能保証を提供する。分析の結果,Single-Pass Graph Contrastive Learning法(SP-GCL)を実装した。経験的に、14のベンチマークデータセットにおいて、sp-gclによって得られた機能は、既存の強力なベースラインと非常に少ない計算オーバーヘッドでマッチしたり、性能を上回ったりすることができる。 Existing graph contrastive learning (GCL) techniques typically require two forward passes for a single instance to construct the contrastive loss, which is effective for capturing the low-frequency signals of node features. Such a dual-pass design has shown empirical success on homophilic graphs, but its effectiveness on heterophilic graphs, where directly connected nodes typically have different labels, is unknown. In addition, existing GCL approaches fail to provide strong performance guarantees. Coupled with the unpredictability of GCL approaches on heterophilic graphs, their applicability in real-world contexts is limited. Then, a natural question arises: Can we design a GCL method that works for both homophilic and heterophilic graphs with a performance guarantee? To answer this question, we theoretically study the concentration property of features obtained by neighborhood aggregation on homophilic and heterophilic graphs, introduce the single-pass graph contrastive learning loss based on the property, and provide performance guarantees for the minimizer of the loss on downstream tasks. 
As a direct consequence of our analysis, we implement the Single-Pass Graph Contrastive Learning method (SP-GCL). Empirically, on 14 benchmark datasets with varying degrees of homophily, the features learned by the SP-GCL can match or outperform existing strong baselines with significantly less computational overhead, which demonstrates the usefulness of our findings in real-world cases.	翻訳日:2023-07-26 00:32:08 公開日:2023-07-23
# 絶対軌道誤差って何が悪いの? What's Wrong with the Absolute Trajectory Error? ( http://arxiv.org/abs/2212.05376v3 ) ライセンス: Link先を確認	Seong Hun Lee, Javier Civera	(参考訳) 一般的に用いられる絶対軌道誤差(ATE)の限界の一つは、外れ値に対する感度が高いことである。その結果、少数の外れ値が存在するだけで、インライア軌道誤差や外れ値の数が変化しても、それに伴う精度の変化を反映できないことが多い。本研究では、再構成されたカメラ軌跡の精度を評価するための代替誤差指標を提案する。この指標はDTE (Discernible Trajectory Error) と名付けられ、次の5つのステップで計算される。 (1) 真値の軌道と推定軌道を、両者の幾何中央値が原点に位置するようにシフトする。 (2) 対応するカメラ姿勢間の測地距離の和を最小化するように推定軌道を回転させる。 (3) カメラから幾何中央値までの距離の中央値が真値のものと同じになるように推定軌道をスケールする。 (4) 対応するカメラ間の距離を計算し、ウィンザライズして正規化する。 (5) 得られた距離の平均と二乗平均平方根(RMS)の平均を取ることでDTEを得る。この指標は、インライア軌道誤差や外れ値の数の変化に伴う軌道精度の変化を識別できるという点で、ATEの魅力的な代替手段である。また、同様の考え方を用いて、DTEと同様の利点を持つ Discernible Rotation Error (DRE) という新しい回転誤差指標を提案する。さらに、これらの指標の計算に必要なカメラ対マーカー回転を校正するための、簡易かつ効果的な手法を提案する。我々の手法は広範なシミュレーションによって検証される。 One of the limitations of the commonly used Absolute Trajectory Error (ATE) is that it is highly sensitive to outliers. As a result, in the presence of just a few outliers, it often fails to reflect the varying accuracy as the inlier trajectory error or the number of outliers varies. In this work, we propose an alternative error metric for evaluating the accuracy of the reconstructed camera trajectory. Our metric, named Discernible Trajectory Error (DTE), is computed in five steps: (1) Shift the ground-truth and estimated trajectories such that both of their geometric medians are located at the origin. (2) Rotate the estimated trajectory such that it minimizes the sum of geodesic distances between the corresponding camera orientations. (3) Scale the estimated trajectory such that the median distance of the cameras to their geometric median is the same as that of the ground truth. (4) Compute, winsorize and normalize the distances between the corresponding cameras. (5) Obtain the DTE by taking the average of the mean and the root-mean-square (RMS) of the resulting distances. 
This metric is an attractive alternative to the ATE, in that it is capable of discerning the varying trajectory accuracy as the inlier trajectory error or the number of outliers varies. Using the similar idea, we also propose a novel rotation error metric, named Discernible Rotation Error (DRE), which has similar advantages to the DTE. Furthermore, we propose a simple yet effective method for calibrating the camera-to-marker rotation, which is needed for the computation of our metrics. Our methods are verified through extensive simulations.	翻訳日:2023-07-26 00:21:16 公開日:2023-07-23
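The five DTE steps listed in the abstract above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It works on camera positions only and skips step (2), so it assumes the estimated trajectory is already rotationally aligned. The abstract does not say how the distances are winsorized and normalized, so the cap (`cap_factor` times the ground-truth median radius) and the division by that cap are hypothetical choices. The names `geometric_median` and `dte_sketch` are invented here.

```python
import numpy as np


def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld iteration: the point minimizing the sum of Euclidean distances."""
    median = points.mean(axis=0)
    for _ in range(iters):
        dist = np.linalg.norm(points - median, axis=1)
        weights = 1.0 / np.maximum(dist, eps)  # guard against division by zero
        new_median = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_median - median) < eps:
            return new_median
        median = new_median
    return median


def dte_sketch(gt, est, cap_factor=3.0):
    """Positions-only sketch of DTE; gt and est are (N, 3) arrays of camera centers."""
    # Step 1: shift both trajectories so their geometric medians sit at the origin.
    gt_c = gt - geometric_median(gt)
    est_c = est - geometric_median(est)

    # Step 2 (rotation alignment over camera orientations) is NOT implemented;
    # the estimate is assumed to be rotationally aligned already.

    # Step 3: scale the estimate so its median distance to the geometric median
    # matches that of the ground truth.
    gt_scale = np.median(np.linalg.norm(gt_c, axis=1))
    est_scale = np.median(np.linalg.norm(est_c, axis=1))
    if gt_scale == 0.0 or est_scale == 0.0:
        raise ValueError("degenerate trajectory: median radius is zero")
    est_c = est_c * (gt_scale / est_scale)

    # Step 4: compute distances, winsorize (clip) them at a cap, and normalize.
    # ASSUMPTION: the cap and the normalization are illustrative, not from the paper.
    cap = cap_factor * gt_scale
    dist = np.minimum(np.linalg.norm(gt_c - est_c, axis=1), cap) / cap

    # Step 5: average of the mean and the RMS of the resulting distances.
    return 0.5 * (dist.mean() + np.sqrt((dist ** 2).mean()))
```

With this particular normalization every distance lies in [0, 1], so a single gross outlier can raise the score by at most a bounded amount, which matches the robustness the abstract claims for DTE over ATE.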
# cellmix:病理画像分類のためのデータ拡張のための汎用インスタンス関係ベース手法 CellMix: A General Instance Relationship based Method for Data Augmentation Towards Pathology Image Classification ( http://arxiv.org/abs/2301.11513v2 ) ライセンス: Link先を確認	Tianyi Zhang, Zhiling Yan, Chunhui Li, Nan Ying, Yanli Lei, Yunlu Feng, Yu Zhao, Guanglei Zhang	(参考訳) 病理画像解析では、高品質な注釈付きサンプルの取得と維持は非常に労働集約的な作業である。この課題を克服するために、従来の前処理データ拡張技術に代わる効果的な方法として混合方式が登場した。しかしながら、これらの手法は、局所特異性、グローバル分布、内部/外部インスタンス関係など、病理画像のユニークな特徴を完全に考慮できていない。これらの特徴をよりよく理解し、貴重な擬似サンプルを作成するために、新しい分布指向インプレースシャッフル手法であるCellMixフレームワークを提案する。病理インスタンスの粒度に基づいてイメージをパッチに分割し、同じバッチ内でシャッフルすることで、新しいサンプルを生成する際にインスタンス間の絶対的な関係を効果的に保存することができる。さらに,学習にインスパイアされた損失駆動型学習戦略を開発し,学習中に摂動や分布関連ノイズを処理し,モデルが拡張データに適応的に適合できるようにする。病理画像分類タスクにおける実験は、7つの異なるデータセット上での最先端(SOTA)性能を示す。このイノベーティブなインスタンス関係中心の手法は、病理画像分類のための一般的なデータ拡張アプローチを通知する可能性がある。関連コードはhttps://github.com/sagizty/cellmixで入手できる。 In pathology image analysis, obtaining and maintaining high-quality annotated samples is an extremely labor-intensive task. To overcome this challenge, mixing-based methods have emerged as effective alternatives to traditional preprocessing data augmentation techniques. Nonetheless, these methods fail to fully consider the unique features of pathology images, such as local specificity, global distribution, and inner/outer-sample instance relationships. To better comprehend these characteristics and create valuable pseudo samples, we propose the CellMix framework, which employs a novel distribution-oriented in-place shuffle approach. By dividing images into patches based on the granularity of pathology instances and shuffling them within the same batch, the absolute relationships between instances can be effectively preserved when generating new samples. Moreover, we develop a curriculum learning-inspired, loss-driven strategy to handle perturbations and distribution-related noise during training, enabling the model to adaptively fit the augmented data. 
Our experiments in pathology image classification tasks demonstrate state-of-the-art (SOTA) performance on 7 distinct datasets. This innovative instance relationship-centered method has the potential to inform general data augmentation approaches for pathology image classification. The associated codes are available at https://github.com/sagizty/CellMix.	翻訳日:2023-07-26 00:14:13 公開日:2023-07-23
# DetectGPT:確率曲率を用いたゼロショット機械生成テキスト検出 DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature ( http://arxiv.org/abs/2301.11305v2 ) ライセンス: Link先を確認	Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn	(参考訳) 大規模言語モデル(LLM)の流暢さの向上と普及により、LLM生成テキストの検出を支援するツールが望まれている。本稿では、そのような検出に有用な、LLMの確率関数の構造の性質を明らかにする。具体的には、LLMからサンプリングされたテキストが、モデルの対数確率関数の負の曲率領域を占める傾向があることを示す。この観察を生かして、ある文章が与えられたLLMから生成されたかどうかを判断するための、新しい曲率ベースの基準を定義する。このアプローチは DetectGPT と呼ばれ、個別の分類器を訓練したり、実際の文章や生成された文章のデータセットを収集したり、生成されたテキストに明示的に透かしを入れたりする必要がない。対象モデルによって計算される対数確率と、別の汎用的な事前訓練済み言語モデル(例えばT5)による文章のランダムな摂動のみを使用する。DetectGPTは、モデルサンプル検出のための既存のゼロショット手法よりも識別力が高く、特に20BパラメータのGPT-NeoXが生成した偽ニュース記事の検出を、最強のゼロショットベースラインの 0.81 AUROC から DetectGPT の 0.95 AUROC に改善した。コード、データ、その他のプロジェクト情報については https://ericmitchell.ai/detectgpt を参照。 The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. 
See https://ericmitchell.ai/detectgpt for code, data, and other project information.	翻訳日:2023-07-26 00:13:49 公開日:2023-07-23
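The curvature idea in the DetectGPT abstract above can be illustrated with a small Python sketch. It is one natural reading of "log probabilities plus random perturbations", not the authors' code: the score compares the log probability of a passage with the average log probability of its perturbed versions, so a passage sitting near a local maximum (a negative-curvature region) gets a large positive score. The function `perturbation_discrepancy` and its parameters are hypothetical names. In a real system `log_prob` would be the language model under test and `perturb` would be a mask-filling model such as T5; here both are toy stand-ins on numbers.

```python
import random
from statistics import fmean


def perturbation_discrepancy(x, log_prob, perturb, n_perturbations=100, seed=0):
    """Log probability of x minus the mean log probability of perturbed copies of x.

    log_prob: callable mapping a sample to its log probability (assumed interface).
    perturb:  callable (sample, rng) -> slightly modified sample (assumed interface).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    perturbed_scores = [log_prob(perturb(x, rng)) for _ in range(n_perturbations)]
    return log_prob(x) - fmean(perturbed_scores)


# Toy stand-ins: a "model" whose log probability peaks at 0 and is locally
# linear away from the peak, and a perturbation that adds Gaussian noise.
def toy_log_prob(value):
    return -abs(value)


def toy_perturb(value, rng):
    return value + rng.gauss(0.0, 1.0)


score_at_peak = perturbation_discrepancy(0.0, toy_log_prob, toy_perturb)
score_off_peak = perturbation_discrepancy(5.0, toy_log_prob, toy_perturb)
```

In this toy setting the sample at the peak scores higher than the sample on the linear slope, because every perturbation of the peak lowers the log probability while perturbations on the slope roughly cancel. A detector would then threshold this score; the abstract does not give the threshold or any normalization, so none is assumed here.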
# 石油・ガス運転におけるその場水質モニタリング In-situ Water quality monitoring in Oil and Gas operations ( http://arxiv.org/abs/2301.08800v2 ) ライセンス: Link先を確認	Satish Kumar, Rui Kou, Henry Hill, Jake Lempges, Eric Qian, and Vikram Jayaram	(参考訳) 農業から鉱業、エネルギーに至るまで、水質モニタリングは重要な課題である。石油・ガス事業者が淡水の消費を減らすために活動する中、長期にわたって生鮮・非フレッシュの水資源を積極的に管理することが重要となる。大規模なモニタリングのためには、多くの場所で手動のサンプリングが時間がかかりすぎて持続不可能になり、多くの分散した池、小さな湖、プレイア、湿地が広い範囲に分散している。したがって、衛星による環境モニタリングは大きな可能性を秘めている。既存の衛星ベースの監視研究の多くは、川や海などの大きな水域を監視するためにインデックスベースの手法を使用している。しかし,小池の観測では,小型水域から受信した反射信号が弱すぎて検出できなかった。この課題に対処するために, 反射率の弱い水域における汚染レベルを推定できる新しい水質指標(WQEI)モデルを提案する。私たちの結果は 1)wqeiは,実験室で測定した1200試料の水濁度を示す良質な指標である。 2) 一般に利用可能な衛星データ(LandSat8など)に本手法を適用することにより, 広域で高精度な水質モニタリングを実現することができる。これは、水面貯水池に蓄えられた水の品質を最適化し、非フレッシュ水の即応性と可用性を高めるためのツールを提供する。 From agriculture to mining, to energy, surface water quality monitoring is an essential task. As oil and gas operators work to reduce the consumption of freshwater, it is increasingly important to actively manage fresh and non-fresh water resources over the long term. For large-scale monitoring, manual sampling at many sites has become too time-consuming and unsustainable, given the sheer number of dispersed ponds, small lakes, playas, and wetlands over a large area. Therefore, satellite-based environmental monitoring presents great potential. Many existing satellite-based monitoring studies utilize index-based methods to monitor large water bodies such as rivers and oceans. However, these existing methods fail when monitoring small ponds-the reflectance signal received from small water bodies is too weak to detect. To address this challenge, we propose a new Water Quality Enhanced Index (WQEI) Model, which is designed to enable users to determine contamination levels in water bodies with weak reflectance patterns. 
Our results show that 1) WQEI is a good indicator of water turbidity validated with 1200 water samples measured in the laboratory, and 2) by applying our method to commonly available satellite data (e.g. LandSat8), one can achieve high accuracy water quality monitoring efficiently in large regions. This provides a tool for operators to optimize the quality of water stored within surface storage ponds and increasing the readiness and availability of non-fresh water.	翻訳日:2023-07-26 00:12:51 公開日:2023-07-23
# clipter: シーンのテキスト認識で大きな画像を見る CLIPTER: Looking at the Bigger Picture in Scene Text Recognition ( http://arxiv.org/abs/2301.07464v2 ) ライセンス: Link先を確認	Aviad Aberdam, David Bensa\"id, Alona Golts, Roy Ganz, Oren Nuriel, Royee Tichauer, Shai Mazor, Ron Litman	(参考訳) 現実世界のシナリオでテキストを読むには、周囲の状況を理解する必要がある。しかし、現在のシーンのテキスト認識者は、切り抜かれたテキスト画像を操作するとき、より大きな画像に気づいていない。本研究では,CLIPのような現代視覚言語モデルの代表的能力を利用して,作物認識者にシーンレベルの情報を提供する。視覚言語モデルから得られた画像全体のリッチな表現と,ゲート型クロスアテンション機構による認識者単語レベルの特徴を融合することにより,これを実現する。このコンポーネントは徐々にコンテキスト強調表現に移行し、事前訓練された認識器の安定した微調整を可能にする。本稿では,モデル非依存のフレームワークであるclipter (clip text recognition) の有効性を示し,複数のベンチマークで最新の結果を得る。さらに,語彙外単語に対するロバスト性の向上と,低データ体制における一般化の強化も強調した。 Reading text in real-world scenarios often requires understanding the context surrounding it, especially when dealing with poor-quality text. However, current scene text recognizers are unaware of the bigger picture as they operate on cropped text images. In this study, we harness the representative capabilities of modern vision-language models, such as CLIP, to provide scene-level information to the crop-based recognizer. We achieve this by fusing a rich representation of the entire image, obtained from the vision-language model, with the recognizer word-level features via a gated cross-attention mechanism. This component gradually shifts to the context-enhanced representation, allowing for stable fine-tuning of a pretrained recognizer. We demonstrate the effectiveness of our model-agnostic framework, CLIPTER (CLIP TExt Recognition), on leading text recognition architectures and achieve state-of-the-art results across multiple benchmarks. Furthermore, our analysis highlights improved robustness to out-of-vocabulary words and enhanced generalization in low-data regimes.	翻訳日:2023-07-26 00:12:29 公開日:2023-07-23
# キャプションで裏切られた:open vocabularyインスタンスセグメンテーションのための共同キャプショングラウンドと生成 Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation ( http://arxiv.org/abs/2301.00805v2 ) ライセンス: Link先を確認	Jianzong Wu, Xiangtai Li, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy	(参考訳) 本研究では,オープン語彙のインスタンスセグメンテーションに着目し,セグメンテーションモデルを拡張して,インスタンスレベルの新規カテゴリを分類・分割する。従来のアプローチでは、大量のキャプションデータセットと複雑なパイプラインを使用して、キャプション内の画像領域と単語間の1対1のマッピングを確立してきた。しかし、このような手法は、形容詞や動詞などの画像領域に非可視的な単語をマッチングすることで、ノイズの多い監視を構築する。一方、文脈語は、新しいカテゴリーと高い相関関係を示すため、新しい対象の存在を推測する上でも重要である。このような制約を克服するため、学習効率を向上させるために、一致したオブジェクト名詞にのみ焦点をあてる新しい接地損失を取り入れた、共同で \textbf{Caption Grounding and Generation (CGG) フレームワークを考案した。また,接地損失の補足として,追加の監督と文脈モデリングを可能にするキャプション生成ヘッドを導入する。解析と結果から,新たな授業のセグメンテーション性能を大幅に向上させ,グラウンドディングとジェネレーションコンポーネントが相互に補完することを示す。 OVIS(Open Vocabulary Instance Segmentation)とOSPS(Open Set Panoptic Segmentation)の2つの設定によるCOCOデータセットの実験は、CGGの優位性を示している。特に、CGGはOVISタスクの余分なデータなしで新規クラスの6.8% mAPを大幅に改善し、OSPSベンチマークでは新しいクラスの15%のPQ改善を実現している。 In this work, we focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories. Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and words in captions. However, such methods build noisy supervision by matching non-visible words to image regions, such as adjectives and verbs. Meanwhile, context words are also important for inferring the existence of novel objects as they show high inter-correlations with novel categories. To overcome these limitations, we devise a joint \textbf{Caption Grounding and Generation (CGG)} framework, which incorporates a novel grounding loss that only focuses on matching object nouns to improve learning efficiency. We also introduce a caption generation head that enables additional supervision and contextual modeling as a complementation to the grounding loss. 
Our analysis and results demonstrate that grounding and generation components complement each other, significantly enhancing the segmentation performance for novel classes. Experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS) demonstrate the superiority of the CGG. Specifically, CGG achieves a substantial improvement of 6.8% mAP for novel classes without extra data on the OVIS task and 15% PQ improvements for novel classes on the OSPS benchmark.	翻訳日:2023-07-26 00:10:50 公開日:2023-07-23
# グラフ生成からグラフ分類へ From Graph Generation to Graph Classification ( http://arxiv.org/abs/2302.07989v3 ) ライセンス: Link先を確認	Oliver Schulte	(参考訳) 本稿では、グラフ生成モデル(GGM)を活用した新しいグラフ分類手法について述べる。グラフとそのクラスラベル上の同時確率分布を定義するGGMを仮定して、グラフが与えられたときのクラスラベルの確率に対する分類公式を導出する。新しい条件付きELBOは、識別のための生成グラフオートエンコーダモデルを訓練するために使用できる。生成モデルを分類に活用することは、非関係的な i.i.d. データについてはよく研究されているが、我々の知る限り、グラフ分類に対しては新しいアプローチである。 This note describes a new approach to classifying graphs that leverages graph generative models (GGM). Assuming a GGM that defines a joint probability distribution over graphs and their class labels, I derive classification formulas for the probability of a class label given a graph. A new conditional ELBO can be used to train a generative graph auto-encoder model for discrimination. While leveraging generative models for classification has been well explored for non-relational i.i.d. data, to our knowledge it is a novel approach to graph classification.	翻訳日:2023-07-26 00:02:31 公開日:2023-07-23
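The step from a joint model over graphs and labels to a classifier, as described in the abstract above, rests on the definition of conditional probability. The identity below is a generic sketch with notation chosen here ($G$ for a graph, $y$ for a class label, $P$ for the joint distribution defined by the GGM); the note's own classification formulas are not given in the abstract and may take a different form.

```latex
P(y \mid G) \;=\; \frac{P(G, y)}{\sum_{y'} P(G, y')}
```

The predicted label is then the $y$ that maximizes $P(G, y)$, since the denominator does not depend on $y$.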
# 測定交替イジング量子臨界 Measurement-altered Ising quantum criticality ( http://arxiv.org/abs/2302.04325v3 ) ライセンス: Link先を確認	Sara Murciano, Pablo Sala, Yue Liu, Roger S. K. Mong and Jason Alicea	(参考訳) 量子臨界系は、摂動に自然に敏感なため、新しい測定によって引き起こされる現象を探索するための魅力的なプラットフォームを構成する。数値測定が量子臨界鎖のパラダイム的イジングに与える影響を明示的なプロトコルを用いて検討し,相関したアンシラが臨界鎖と絡み合って投影的に測定されることを示した。広範囲な数値シミュレーションによって支持される摂動解析フレームワークを用いて, 測定値がエンタングゲートの選択, 基数, 測定結果, 基数に依存する方法で, 長距離相関を定性的に変化させることができることを実証した。測定結果における相関の挙動を定量的に予測し,測定平均値における測定交替イジング臨界性を検出するための2つの手法を同定した。まず、測定結果に対するオーダーパラメータ期待値の2乗平均化は、平均的なオーダーパラメータ自体が消滅しても、一定の測定結果で発芽したオーダーパラメータ凝縮の記憶を保持する。第二に、ある場合において、異なる対称性分野に属する測定結果よりも観測可能度を個別に評価できることを示し、これらの「対称性解決平均」は、標準線形平均化観測可能度を考慮しても測定効果を明らかにする。対称解法の平均値とポスト選択法を実験的に合理的に追求できる相補的レジームを同定し,前者は十分弱いアンシラ臨界鎖の絡み合いの限界において後者よりも優れていた。我々のフレームワークは自然に、よりエキゾチックな量子臨界点に適応し、NISQハードウェアやRydberg配列での実験的な実現の可能性を強調する。 Quantum critical systems constitute appealing platforms for the exploration of novel measurement-induced phenomena due to their innate sensitivity to perturbations. We study the impact of measurement on paradigmatic Ising quantum critical chains using an explicit protocol, whereby correlated ancilla are entangled with the critical chain and then projectively measured. Using a perturbative analytic framework supported by extensive numerical simulations, we demonstrate that measurements can qualitatively alter long-distance correlations in a manner dependent on the choice of entangling gate, ancilla measurement basis, measurement outcome, and nature of ancilla correlations. We derive numerous quantitative predictions for the behavior of correlations in select measurement outcomes, and also identify two strategies for detecting measurement-altered Ising criticality in measurement-averaged quantities. First, averaging the square of the order-parameter expectation value over measurement outcomes retains memory of order parameter condensation germinated in fixed measurement outcomes -- even though on average the order parameter itself vanishes. 
Second, we show that, in certain cases, observables can be averaged separately over measurement outcomes residing in distinct symmetry sectors, and that these `symmetry-resolved averages' reveal measurement effects even when considering standard linearly averaged observables. We identify complementary regimes in which symmetry-resolved averages and post-selection can be pursued reasonably efficiently in experiment, with the former generically outperforming the latter in the limit of sufficiently weak ancilla-critical chain entanglement. Our framework naturally adapts to more exotic quantum critical points and highlights opportunities for potential experimental realization in NISQ hardware and in Rydberg arrays.	翻訳日:2023-07-26 00:01:50 公開日:2023-07-23
# 効率的な勾配ベース価値推定に向けて Toward Efficient Gradient-Based Value Estimation ( http://arxiv.org/abs/2301.13757v3 ) ライセンス: Link先を確認	Arsalan Sharifnassab, Richard Sutton	(参考訳) 強化学習における勾配ベースの価値推定法は良好な安定性を持つが、時間差(TD)学習法よりもかなり遅いのが一般的である。この遅さの根本原因を考察し、平均二乗ベルマン誤差(MSBE)が、そのヘッセ行列の条件数が大きいという意味で悪条件の損失関数であることを示す。勾配ベース法に対するMSBEの悪条件の悪影響を解決するため、ガウス・ニュートン方向にほぼ従い、パラメータ化に対して漸近的にロバストな、低複雑性のバッチフリー近接法を提案する。 RANSと呼ばれる主要アルゴリズムは、計算複雑性がほぼ同じでありながら残差勾配法よりもかなり高速であるという意味で効率的であり、テストした古典的問題においてTDと競合する。 Gradient-based methods for value estimation in reinforcement learning have favorable stability properties, but they are typically much slower than Temporal Difference (TD) learning methods. We study the root causes of this slowness and show that Mean Square Bellman Error (MSBE) is an ill-conditioned loss function in the sense that its Hessian has large condition-number. To resolve the adverse effect of poor conditioning of MSBE on gradient based methods, we propose a low complexity batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization. Our main algorithm, called RANS, is efficient in the sense that it is significantly faster than the residual gradient methods while having almost the same computational complexity, and is competitive with TD on the classic problems that we tested.	翻訳日:2023-07-26 00:00:02 公開日:2023-07-23
# MenuCraft: 大規模言語モデルによるインタラクティブなメニューシステム設計 MenuCraft: Interactive Menu System Design with Large Language Models ( http://arxiv.org/abs/2303.04496v2 ) ライセンス: Link先を確認	Amir Hossein Kargaran, Nafiseh Nikeghbal, Abbas Heydarnoori and Hinrich Schütze	(参考訳) メニューシステム設計は、多くの設計オプションとさまざまなヒューマンファクターを伴う難しい課題である。例えば、デザイナーが考慮する必要がある重要な要素の一つは、メニューコマンドの意味的かつ体系的な関係である。しかし、利用可能なリソースが限られているため、これらの関係を捉えることは難しい場合がある。ニューラル言語モデルの進歩により、大規模言語モデルは、既存の膨大な知識をメニューシステムの設計と改良に活用できる。本稿では、デザイナーと対話システムが協調してメニューを設計できるようにする、メニュー設計のためのAI支援デザイナー MenuCraft を提案する。 MenuCraftは、メニュー設計プロセスを簡素化し、設計オプションを容易にカスタマイズできる、対話的な言語ベースのメニュー設計ツールを提供する。 MenuCraftは、ダイアログを通じてさまざまなインタラクションをサポートし、ゼロ/フューショット学習を可能にする。 Menu system design is a challenging task involving many design options and various human factors. For example, one crucial factor that designers need to consider is the semantic and systematic relation of menu commands. However, capturing these relations can be challenging due to limited available resources. With the advancement of neural language models, large language models can utilize their vast pre-existing knowledge in designing and refining menu systems. In this paper, we propose MenuCraft, an AI-assisted designer for menu design that enables collaboration between the designer and a dialogue system to design menus. MenuCraft offers an interactive language-based menu design tool that simplifies the menu design process and enables easy customization of design options. MenuCraft supports a variety of interactions through dialog that allows performing zero/few-shot learning.	翻訳日:2023-07-25 23:53:33 公開日:2023-07-23
# 3次元点群におけるオープン語彙アフォーダンス検出 Open-Vocabulary Affordance Detection in 3D Point Clouds ( http://arxiv.org/abs/2303.02401v5 ) ライセンス: Link先を確認	Toan Nguyen, Minh Nhat Vu, An Vuong, Dzung Nguyen, Thieu Vo, Ngan Le, Anh Nguyen	(参考訳) アフォーダンス検出は、さまざまなロボット応用を持つ難しい問題である。従来のアフォーダンス検出手法は、あらかじめ定義されたアフォーダンスラベルの集合に制限されており、複雑で動的な環境における知能ロボットの適応性を制限する可能性がある。本稿では、3次元点群内で数に制限のないアフォーダンスを検出できる Open-Vocabulary Affordance Detection (OpenAD) 法を提案する。 OpenADは、アフォーダンスのテキストと点特徴を同時に学習することで、アフォーダンス間の意味的関係をうまく活用する。したがって、提案手法はゼロショット検出を可能にし、アノテーション例を1つも使わずに、これまで見たことのないアフォーダンスを検出できる。集中的な実験結果から、OpenADは幅広いアフォーダンス検出の設定で効果的に機能し、他のベースラインを大きなマージンで上回ることが示された。さらに、高速な推論速度(約100ms)を持つ実世界のロボットアプリケーションにおいて、提案するOpenADの実用性を示す。私たちのプロジェクトは https://openad2023.github.io で利用可能である。 Affordance detection is a challenging problem with a wide variety of robotic applications. Traditional affordance detection methods are limited to a predefined set of affordance labels, hence potentially restricting the adaptability of intelligent robots in complex and dynamic environments. In this paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method, which is capable of detecting an unbounded number of affordances in 3D point clouds. By simultaneously learning the affordance text and the point feature, OpenAD successfully exploits the semantic relationships between affordances. Therefore, our proposed method enables zero-shot detection and can be able to detect previously unseen affordances without a single annotation example. Intensive experimental results show that OpenAD works effectively on a wide range of affordance detection setups and outperforms other baselines by a large margin. Additionally, we demonstrate the practicality of the proposed OpenAD in real-world robotic applications with a fast inference speed (~100ms). Our project is available at https://openad2023.github.io.	翻訳日:2023-07-25 23:53:03 公開日:2023-07-23
# 散逸キャットキュービット用高忠実性ゼノゲートの設計 Designing High-Fidelity Zeno Gates for Dissipative Cat Qubits ( http://arxiv.org/abs/2303.00760v3 ) ライセンス: Link先を確認	Ronan Gautier, Mazyar Mirrahimi, Alain Sarlette	(参考訳) 誘導二光子散逸で安定化されたボソニック・キャット量子ビットは指数的にバイアスのあるノイズを持つシステムであり、低オーバーヘッド、フォールトトレラント、普遍量子コンピューティングへの扉を開く。しかし、そのような量子ビットに対する現在のゲート提案は、関連する実験パラメータによるスケーリングが不十分な非保護型のノイズをかなり引き起こす。そこで本研究では,2光子偏光の設計に用いるリザーバモードを再考し,ゲート誘起誤差の軽減にどのように活用できるかを示すことにより,放散猫量子ビットに対する新たな視点を提案する。そこで我々は,高忠実度および偏りを保った猫キュービットゲートの4つの新しい設計を導入し,これらを一般的なゲート方式と比較した。これら4つの設計は、異なる相補的なアイデアを持つ散逸系のためのゲートエンジニアリングの概要を提供する。特に,すでに達成可能な低エラーゲート設計と長期実装を提案する。 Bosonic cat qubits stabilized with a driven two-photon dissipation are systems with exponentially biased noise, opening the door to low-overhead, fault-tolerant and universal quantum computing. However, current gate proposals for such qubits induce substantial noise of the unprotected type, whose poor scaling with the relevant experimental parameters limits their practical use. In this work, we provide a new perspective on dissipative cat qubits by reconsidering the reservoir mode used to engineer the tailored two-photon dissipation, and show how it can be leveraged to mitigate gate-induced errors. Doing so, we introduce four new designs of high-fidelity and bias-preserving cat qubit gates, and compare them to the prevalent gate methods. These four designs should give a broad overview of gate engineering for dissipative systems with different and complementary ideas. In particular, we propose both already achievable low-error gate designs and longer-term implementations.	翻訳日:2023-07-25 23:52:45 公開日:2023-07-23
# 知識コンパイルによるニューラルネットワーク分類器のShap説明スコアの効率的な計算 Efficient Computation of Shap Explanation Scores for Neural Network Classifiers via Knowledge Compilation ( http://arxiv.org/abs/2303.06516v3 ) ライセンス: Link先を確認	Leopoldo Bertossi and Jorge E. Leon	(参考訳) Shapスコアは、Explainable AIで広く使われている。しかし、特にニューラルネットワークのようなブラックボックスの分類器を対象とする場合、その計算は一般には難解である。最近の研究では、Shapを効率的に計算できるオープンボックスブール回路分類器のクラスが明らかにされている。効率的なShap計算のために,二値ニューラルネットワークをそれらの回路に変換する方法を示し,その際に論理に基づく知識コンパイル手法を用いる。実験で示すように、性能の向上は非常に大きい。 The use of Shap scores has become widespread in Explainable AI. However, their computation is in general intractable, in particular when done with a black-box classifier, such as a neural network. Recent research has unveiled classes of open-box Boolean Circuit classifiers for which Shap can be computed efficiently. We show how to transform binary neural networks into those circuits for efficient Shap computation. We use logic-based knowledge compilation techniques. The performance gain is huge, as we show in the light of our experiments.	翻訳日:2023-07-25 23:40:41 公開日:2023-07-23
# PDPP:教育ビデオにおけるプロシージャ計画のための拡散計画 PDPP:Projected Diffusion for Procedure Planning in Instructional Videos ( http://arxiv.org/abs/2303.14676v2 ) ライセンス: Link先を確認	Hanlin Wang, Yilu Wu, Sheng Guo, Limin Wang	(参考訳) 本稿では,非構造化映像における現状の視覚的観察から目標指向の計画を作成することを目的とした,指導ビデオにおける手順計画の問題について検討する。以前の研究は、この問題をシーケンス計画問題として位置づけ、重い中間視覚観察または自然言語指示を監督として活用し、複雑な学習スキームと高価なアノテーションコストを生み出した。対照的に,この問題は分布適合問題として扱われる。この意味では, 拡散モデル(pdpp)を用いて, 中間動作列分布全体をモデル化し, この分布から計画問題をサンプリングプロセスに変換する。さらに,コストのかかる中間監督を除去し,代わりに指導ビデオからのタスクラベルを監督として使用する。我々のモデルはU-Netに基づく拡散モデルであり、学習した分布からのアクションシーケンスを与えられた開始と終了の観測で直接サンプリングする。さらに,学習およびサンプリング過程において,モデルに対して正確な条件付きガイドを提供するための効率的なプロジェクション手法を適用した。異なるスケールの3つのデータセットで実験したところ、PDPPモデルはタスクの監督なしに複数のメトリクスで最先端のパフォーマンスを達成できることがわかった。コードとトレーニングされたモデルはhttps://github.com/MCG-NJU/PDPPで入手できる。 In this paper, we study the problem of procedure planning in instructional videos, which aims to make goal-directed plans given the current visual observations in unstructured real-life videos. Previous works cast this problem as a sequence planning problem and leverage either heavy intermediate visual observations or natural language instructions as supervision, resulting in complex learning schemes and expensive annotation costs. In contrast, we treat this problem as a distribution fitting problem. In this sense, we model the whole intermediate action sequence distribution with a diffusion model (PDPP), and thus transform the planning problem to a sampling process from this distribution. In addition, we remove the expensive intermediate supervision, and simply use task labels from instructional videos as supervision instead. Our model is a U-Net based diffusion model, which directly samples action sequences from the learned distribution with the given start and end observations. Furthermore, we apply an efficient projection method to provide accurate conditional guides for our model during the learning and sampling process. 
Experiments on three datasets with different scales show that our PDPP model can achieve the state-of-the-art performance on multiple metrics, even without the task supervision. Code and trained models are available at https://github.com/MCG-NJU/PDPP.	翻訳日:2023-07-25 23:34:04 公開日:2023-07-23
# ハイブリッドCNN-RNNの重ね合わせを用いた構造振動信号復調 Structural Vibration Signal Denoising Using Stacking Ensemble of Hybrid CNN-RNN ( http://arxiv.org/abs/2303.11413v4 ) ライセンス: Link先を確認	Youzhi Liang, Wen Liang, Jianguo Jia	(参考訳) 振動信号は, 構造的健康モニタリング, 故障診断, 損傷検出など, 様々な工学的目的に利用され, 構造物の状態や整合性に関する貴重な情報を提供するようになっている。近年,生物工学の分野では振動信号の利用が増加している。活動誘発構造振動、特にフットステップによる信号は、人体や動物などの生体系の運動を分析するのに役立ち、個人の歩行、体重、姿勢に関する貴重な情報を提供し、健康モニタリング、セキュリティ、人間とコンピュータの相互作用のための魅力的なツールとなる。しかし、様々なノイズの存在は、フットステップによる信号解析の精度を損なう可能性がある。本稿では、複数の信号のアンサンブルと、再帰的および畳み込み型ニューラルネットワークの予測の両方を利用する新しいアンサンブルモデルを提案する。提案モデルは,前処理,ハイブリッドモデリング,アンサンブルの3段階からなる。プリプロセッシング段階では、高速フーリエ変換とウェーブレット変換を用いて特徴を抽出し、系の物理に支配されたダイナミクスを捉え、空間的および時間的特徴を抽出する。ハイブリッドモデリング段階では、fft結果と連結されたノイズ信号に双方向lstmを用い、cnnを用いて信号の凝縮特徴表現を得る。アンサンブル段階では、完全に接続されたニューラルネットワークの3つの層を用いて最終識別信号を生成する。提案モデルでは,PSNR,SNR,WMAPEを用いて,広帯域の雑音レベルのアルゴリズムよりも優れる構造振動信号に関する課題に対処する。 Vibration signals have been increasingly utilized in various engineering fields for analysis and monitoring purposes, including structural health monitoring, fault diagnosis and damage detection, where vibration signals can provide valuable information about the condition and integrity of structures. In recent years, there has been a growing trend towards the use of vibration signals in the field of bioengineering. Activity-induced structural vibrations, particularly footstep-induced signals, are useful for analyzing the movement of biological systems such as the human body and animals, providing valuable information regarding an individual's gait, body mass, and posture, making them an attractive tool for health monitoring, security, and human-computer interaction. However, the presence of various types of noise can compromise the accuracy of footstep-induced signal analysis. In this paper, we propose a novel ensemble model that leverages both the ensemble of multiple signals and of recurrent and convolutional neural network predictions. The proposed model consists of three stages: preprocessing, hybrid modeling, and ensemble. 
In the preprocessing stage, features are extracted using the Fast Fourier Transform and wavelet transform to capture the underlying physics-governed dynamics of the system and extract spatial and temporal features. In the hybrid modeling stage, a bi-directional LSTM is used to denoise the noisy signal concatenated with FFT results, and a CNN is used to obtain a condensed feature representation of the signal. In the ensemble stage, three layers of a fully-connected neural network are used to produce the final denoised signal. The proposed model addresses the challenges associated with structural vibration signals, which outperforms the prevailing algorithms for a wide range of noise levels, evaluated using PSNR, SNR, and WMAPE.	翻訳日:2023-07-25 23:31:35 公開日:2023-07-23
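The abstract above reports results in terms of PSNR, SNR, and WMAPE without defining them. The following is a minimal sketch of one common set of definitions, assuming a clean reference signal and a denoised estimate of the same length; the function names are hypothetical, and the exact formulas used in the paper may differ (for example in how the peak value is chosen).

```python
import math


def snr_db(clean, denoised):
    """Signal-to-noise ratio in dB: signal energy over residual-error energy."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - d) ** 2 for c, d in zip(clean, denoised))
    return 10.0 * math.log10(signal_power / noise_power)


def psnr_db(clean, denoised):
    """Peak SNR in dB, taking the peak as the largest absolute clean sample (an assumption)."""
    mse = sum((c - d) ** 2 for c, d in zip(clean, denoised)) / len(clean)
    peak = max(abs(c) for c in clean)
    return 10.0 * math.log10(peak * peak / mse)


def wmape(clean, denoised):
    """Weighted mean absolute percentage error: total absolute error over total absolute signal."""
    total_error = sum(abs(c - d) for c, d in zip(clean, denoised))
    total_signal = sum(abs(c) for c in clean)
    return total_error / total_signal
```

Higher SNR and PSNR and lower WMAPE indicate better denoising. A perfect reconstruction makes the error terms zero, so callers of this sketch would need to guard against division by zero in the two dB metrics.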
# Sparse から Precise へ:心内エコー分割術の実際的編集法 From Sparse to Precise: A Practical Editing Approach for Intracardiac Echocardiography Segmentation ( http://arxiv.org/abs/2303.11041v2 ) ライセンス: Link先を確認	Ahmed H. Shahin, Yan Zhuang, Noha El-Zehiry	(参考訳) 心房細動に対する正確なカテーテル・アブレーション法は心内エコー画像(ICE)で心構造を正確に区分けする必要がある。従来の研究では、ICEトランスデューサからの3次元幾何情報を用いて、3次元グリッドに2次元フレームを配置することで、スパースICEボリュームを作成する手法が提案されている。しかし、これらのモデルから得られた3dマスクは不正確であり、氷データやフレームのずれ、心臓の運動による深刻な臨床合併症を引き起こす可能性がある。この問題に対処するために,ユーザが2次元フレームにスクリブルを描画することでセグメンテーション出力を編集できるインタラクティブな編集フレームワークを提案する。ユーザインタラクションを3Dグリッドにマッピングして、前のセグメンテーションをインタラクションから離れて保存しながら、インタラクションの近傍のセグメンテーションを変更する編集ステップを実行する。さらに,従来の編集を妥協することなく,セグメンテーション出力に複数の編集を順次対応させる。本稿では,新しい損失関数と編集専用に設計された新しい評価指標を提案する。クロスバリデーションとテストの結果から,提案する損失関数は,セグメンテーション品質およびユーザ入力後の標準損失およびトレーニング戦略を上回っていることが示唆された。さらに,通常のセグメント化損失とは対照的に,その後の編集が従来の編集を損なわないことを定量的に定性的に示す。全体としては,ユーザのインタラクションから望ましくない変更を回避しつつ,事前に編集した領域の品質を損なうことなくセグメント化の精度を高め,患者の予後を改善する。 Accurate and safe catheter ablation procedures for patients with atrial fibrillation require precise segmentation of cardiac structures in Intracardiac Echocardiography (ICE) imaging. Prior studies have suggested methods that employ 3D geometry information from the ICE transducer to create a sparse ICE volume by placing 2D frames in a 3D grid, enabling training of 3D segmentation models. However, the resulting 3D masks from these models can be inaccurate and may lead to serious clinical complications due to the sparse sampling in ICE data, frames misalignment, and cardiac motion. To address this issue, we propose an interactive editing framework that allows users to edit segmentation output by drawing scribbles on a 2D frame. The user interaction is mapped to the 3D grid and utilized to execute an editing step that modifies the segmentation in the vicinity of the interaction while preserving the previous segmentation away from the interaction. 
Furthermore, our framework accommodates multiple edits to the segmentation output in a sequential manner without compromising previous edits. This paper presents a novel loss function and a novel evaluation metric specifically designed for editing. Results from cross-validation and testing indicate that our proposed loss function outperforms standard losses and training strategies in terms of segmentation quality and following user input. Additionally, we show quantitatively and qualitatively that subsequent edits do not compromise previous edits when using our method, as opposed to standard segmentation losses. Overall, our approach enhances the accuracy of the segmentation while avoiding undesired changes away from user interactions and without compromising the quality of previously edited regions, leading to better patient outcomes.	翻訳日:2023-07-25 23:31:07 公開日:2023-07-23
# 推定テーマ最適化と統合推定最適化とサンプル平均近似:確率的優位性の観点から Estimate-Then-Optimize versus Integrated-Estimation-Optimization versus Sample Average Approximation: A Stochastic Dominance Perspective ( http://arxiv.org/abs/2304.06833v2 ) ライセンス: Link先を確認	Adam N. Elmachtoub, Henry Lam, Haofeng Zhang, Yunfan Zhao	(参考訳) データ駆動確率最適化では、最適化タスクに加えて、基盤となる分布のモデルパラメータをデータから推定する必要がある。近年の文献では、最高の経験的客観的性能につながるモデルパラメータを選択することによって、推定と最適化のプロセスを統合することを考える。統合推定最適化(ieo)と呼ばれるこの統合アプローチは、モデルが誤って特定された場合、単純な推定最適化(eto)を上回ることが容易に示せる。本稿では,モデルクラスが十分に特定され,十分なデータがある場合に,逆挙動が現れることを示す。具体的には, 一般の非線形確率最適化問題に対して, モデルクラスが基底真理をカバーしている場合, 単純ETOがIEOの漸近的に優れていることを示す。つまり、後悔の分布全体、すなわち平均や他の瞬間だけでなく、IEOと比べて常にETOの方が良い。結果はまた、決定が観測された特徴に依存する制約付き文脈最適化問題にも適用できる。また, 標準サンプル平均近似 (saa) が, モデルクラスが後悔の観点でよく特定され, 誤特定された場合に最善の場合には, いかに最悪かを実証する。最後に、理論的比較を裏付ける実験結果を提供し、洞察が有限サンプル状態および様々な誤識別の下でいつ保持されるかを示す。 In data-driven stochastic optimization, model parameters of the underlying distribution need to be estimated from data in addition to the optimization task. Recent literature considers integrating the estimation and optimization processes by selecting model parameters that lead to the best empirical objective performance. This integrated approach, which we call integrated-estimation-optimization (IEO), can be readily shown to outperform simple estimate-then-optimize (ETO) when the model is misspecified. In this paper, we show that a reverse behavior appears when the model class is well-specified and there is sufficient data. Specifically, for a general class of nonlinear stochastic optimization problems, we show that simple ETO outperforms IEO asymptotically when the model class covers the ground truth, in the strong sense of stochastic dominance of the regret. Namely, the entire distribution of the regret, not only its mean or other moments, is always better for ETO compared to IEO. Our results also apply to constrained, contextual optimization problems where the decision depends on observed features. 
Whenever applicable, we also demonstrate how standard sample average approximation (SAA) performs the worst when the model class is well-specified in terms of regret, and best when it is misspecified. Finally, we provide experimental results to support our theoretical comparisons and illustrate when our insights hold in finite-sample regimes and under various degrees of misspecification.	翻訳日:2023-07-25 23:23:28 公開日:2023-07-23
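The comparison above is stated in terms of stochastic dominance of the regret, which is stronger than comparing mean regret. The sketch below illustrates that notion on finite samples, assuming two hypothetical lists of observed regrets (smaller is better); it is an empirical check only and does not reproduce the paper's asymptotic analysis.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F(t) = fraction of sample values <= t."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n


def regret_dominates(regret_a, regret_b):
    """True if method A has stochastically smaller regret than method B.

    First-order dominance for a loss-like quantity: F_A(t) >= F_B(t) for all t.
    Empirical CDFs only change at sample points, so checking those is enough.
    """
    cdf_a = empirical_cdf(regret_a)
    cdf_b = empirical_cdf(regret_b)
    grid = sorted(set(regret_a) | set(regret_b))
    return all(cdf_a(t) >= cdf_b(t) for t in grid)
```

Two samples can have the same mean regret while neither dominates the other (for instance `[0.0, 1.0]` versus `[0.4, 0.6]`), which is why a dominance result says more than a comparison of averages.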
# 変動境界付近で観測可能な場 Field observables near a fluctuating boundary ( http://arxiv.org/abs/2304.05992v2 ) ライセンス: Link先を確認	Federico Armata, Salvatore Butera, Federico Montalbano, Roberto Passante and Lucia Rizzuto	(参考訳) 本稿では,有限質量の可動導電壁を有するキャビティ内の無質量スカラー場の閉じ込めに関するいくつかの側面について検討し,高調波ポテンシャルによって結合される平衡位置を自由に移動でき,その力学的自由度を量子力学的に記述する。この系は、その平衡位置から可動壁の小さな変位に対して、場とミラーの間の効果的な相互作用ハミルトニアン、場作用素における二次、ミラー作用素における線形によって記述することができる。相互作用,すなわち服装,基底状態において,まず場エネルギー密度などの局所場観測性について考察し,固定壁の場合に対する可動壁を有するキャビティ内の場エネルギー密度の変化と,2つの壁の間の通常のカシミール力の補正について検討する。次に、有限質量の可動壁によって分離された2つの1次元キャビティのケースと、2つのキャビティで定義された2つのマスレススカラー場について検討する。この場合, 2つのキャビティの正方形場間の相関は, 可動壁を媒介とし, 固定壁の場合と異なっていた。 We review several aspects related to the confinement of a massless scalar field in a cavity with a movable conducting wall of finite mass, free to move around its equilibrium position to which it is bound by a harmonic potential, and whose mechanical degrees of freedom are described quantum mechanically. This system, for small displacements of the movable wall from its equilibrium position, can be described by an effective interaction Hamiltonian between the field and the mirror, quadratic in the field operators and linear in the mirror operators. In the interacting, i.e. dressed, ground state, we first consider local field observables such as the field energy density: we evaluate changes of the field energy density in the cavity with the movable wall with respect to the case of a fixed wall, and corrections to the usual Casimir forces between the two walls. We then investigate the case of two one-dimensional cavities separated by a movable wall of finite mass, with two massless scalar fields defined in the two cavities. We show that in this case correlations between the squared fields in the two cavities exist, mediated by the movable wall, at variance with the fixed-wall case.	翻訳日:2023-07-25 23:23:02 公開日:2023-07-23
# ELVIS:モーダル内類似性を考慮した視覚言語事前学習の局所性向上 ELVIS: Empowering Locality of Vision Language Pre-training with Intra-modal Similarity ( http://arxiv.org/abs/2304.05303v2 ) ライセンス: Link先を確認	Sumin Seo, JaeWoong Shin, Jaewoo Kang, Tae Soo Kim, Thijs Kooi	(参考訳) 深層学習は胸部X線画像(CXR)の読影において放射線技師を支援する大きな可能性を示しているが、パフォーマンス向上のための高価なアノテーションの必要性は、広く臨床応用を妨げている。視覚言語事前学習(VLP)は、大量の無線画像とペア形式(画像テキストペア)の定期的なレポートを活用することで、アノテーションの負担とコストを軽減することができる。さらに、CXRにおけるコンピュータ支援診断(CAD)の異常の正確な局在化の必要性に対処するために、ローカライズ対応VLPの拡張も提案されている。しかし, 局所性を考慮したVLP文献による定式化は, 下流の局所化作業に必要な空間的関係の喪失につながることがわかった。そこで本研究では,VLP の局所性をモダル内類似性に富む ELVIS を提案し,モダル内類似性を認識した VLP を用いて,X線写真やレポート内の局所性をよりよく保存し,テキストレポートにおける位置参照の理解能力を高める。我々の局所性認識型VLP法は,複数のセグメンテーションタスクとMS-CXRフレーズグラウンドタスクにおいて,最先端のアートベースラインを著しく上回る。 ELVISは,従来の手法と比較して,レポートテキストに記述された関心領域によく焦点が当てられており,解釈可能性の向上が期待できる。 Deep learning has shown great potential in assisting radiologists in reading chest X-ray (CXR) images, but its need for expensive annotations for improving performance prevents widespread clinical application. Visual language pre-training (VLP) can alleviate the burden and cost of annotation by leveraging routinely generated reports for radiographs, which exist in large quantities as well as in paired form (image-text pairs). Additionally, extensions to localization-aware VLPs are being proposed to address the needs for accurate localization of abnormalities for computer-aided diagnosis (CAD) in CXR. However, we find that the formulation proposed by locality-aware VLP literature actually leads to a loss in spatial relationships required for downstream localization tasks. Therefore, we propose Empowering Locality of VLP with Intra-modal Similarity, ELVIS, a VLP aware of intra-modal locality, to better preserve the locality within radiographs or reports, which enhances the ability to comprehend location references in text reports. Our locality-aware VLP method significantly outperforms state-of-the art baselines in multiple segmentation tasks and the MS-CXR phrase grounding task. 
Qualitatively, we show that ELVIS focuses well on regions of interest described in the report text compared to prior approaches, allowing for enhanced interpretability.	翻訳日:2023-07-25 23:22:22 公開日:2023-07-23
# キャビティ・マグノン・オプトメカニクスにおける非相反エンタングルメント Nonreciprocal entanglement in cavity-magnon optomechanics ( http://arxiv.org/abs/2305.03325v2 ) ライセンス: Link先を確認	Jiaojiao Chen, Xiao-Gang Fan, Wei Xiong, Dong Wang, Liu Ye	(参考訳) マクロな量子効果を研究するための有望なプラットフォームであるキャビティオプトメカニクスは、サニャック効果による非相反エンタングルメントの研究に広く用いられている。本稿では,マグノンカー効果を用いるハイブリッドキャビティ-マグノン・オプトメカニクス系において,マグノン,光子,フォノン間の非相反エンタングルメントを実現する別の方法を提案する。我々はカー効果がマグノン周波数シフトと追加の2マグノン効果をもたらすことを示す。どちらも磁場の方向を変えることで正から負まで調整でき、非相反性に繋がる。マグノン周波数デチューニングや2マグノン効果の係数などのシステムパラメータを調整することにより、二者間および三者間のエンタングルメントを非相反的に増強することができる。定義した双方向コントラスト比をさらに調べることで, システム内の非相反性はオン/オフ可能であり, 浴の温度で制御できることがわかった。本提案は,マグノンカー効果による非相反エンタングルメントを実証する潜在的な経路を提供するだけでなく,非線形効果を持つハイブリッドキャビティ-マグノン・オプトメカニクス系における多様な非相反デバイスの設計への道を開く。 Cavity optomechanics, a promising platform to investigate macroscopic quantum effects, has been widely used to study nonreciprocal entanglement with the Sagnac effect. Here we propose an alternative way to realize nonreciprocal entanglement among magnons, photons, and phonons in hybrid cavity-magnon optomechanics, where the magnon Kerr effect is used. We show that the Kerr effect gives rise to a magnon frequency shift and an additional two-magnon effect. Both of them can be tuned from positive to negative via tuning the magnetic field direction, leading to nonreciprocity. By tuning system parameters such as magnon frequency detuning or the coefficient of the two-magnon effect, bipartite and tripartite entanglements can be nonreciprocally enhanced. By further studying the defined bidirectional contrast ratio, we find that nonreciprocity in our system can be switched on and off, and can be engineered by the bath temperature. Our proposal not only provides a potential path to demonstrate nonreciprocal entanglement with the magnon Kerr effect, but also opens a direction to engineer and design diverse nonreciprocal devices in hybrid cavity-magnon optomechanics with nonlinear effects.	翻訳日:2023-07-25 23:03:35 公開日:2023-07-23
# メタバースにおける意味コミュニケーションとAI生成コンテンツの統合フレームワーク A Unified Framework for Integrating Semantic Communication and AI-Generated Content in Metaverse ( http://arxiv.org/abs/2305.11911v2 ) ライセンス: Link先を確認	Yijing Lin, Zhipeng Gao, Hongyang Du, Dusit Niyato, Jiawen Kang, Abbas Jamalipour, Xuemin Sherman Shen	(参考訳) Metaverseが成長を続けるにつれて、効率的なコミュニケーションとインテリジェントなコンテンツ生成の必要性がますます重要になっている。セマンティックコミュニケーションはユーザ入力から意味と理解を伝えることに焦点を当て、AI生成コンテンツは人工知能を使用してデジタルコンテンツと体験を作成する。統合セマンティックコミュニケーションとAI生成コンテンツ(ISGC)は最近多くの注目を集めており、ユーザ入力から意味情報を転送し、デジタルコンテンツを生成し、Metaverseのグラフィックを描画する。本稿では,isgcの資源割当を最適化するための統合ゲインと,目標指向の高品質コンテンツ生成のための協調ゲインと,コミュニケーションとコンテンツの両方の観点からの没入性を改善するための統合フレームワークを提案する。また,既存のisgcソリューションを分類し,isgcの主要コンポーネントを分析し,いくつかのユースケースを示す。次に,拡散モデルに基づくケーススタディを構築し,メタバースにおける意味抽出,コンテンツ生成,グラフィックレンダリングを行うための最適なリソース割当戦略を同定する。最後に,いくつかのオープン研究課題について議論し,isgcとその関連応用の可能性についてさらに検討する。 As the Metaverse continues to grow, the need for efficient communication and intelligent content generation becomes increasingly important. Semantic communication focuses on conveying meaning and understanding from user inputs, while AI-Generated Content utilizes artificial intelligence to create digital content and experiences. Integrated Semantic Communication and AI-Generated Content (ISGC) has attracted a lot of attentions recently, which transfers semantic information from user inputs, generates digital content, and renders graphics for Metaverse. In this paper, we introduce a unified framework that captures ISGC two primary benefits, including integration gain for optimized resource allocation and coordination gain for goal-oriented high-quality content generation to improve immersion from both communication and content perspectives. We also classify existing ISGC solutions, analyze the major components of ISGC, and present several use cases. 
We then construct a case study based on the diffusion model to identify an optimal resource allocation strategy for performing semantic extraction, content generation, and graphic rendering in the Metaverse. Finally, we discuss several open research issues, encouraging further exploring the potential of ISGC and its related applications in the Metaverse.	翻訳日:2023-07-25 21:16:44 公開日:2023-07-23
# 視覚的接地・自己教師あり音声モデルにおける音節発見と言語間一般化 Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model ( http://arxiv.org/abs/2305.11435v2 ) ライセンス: Link先を確認	Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath	(参考訳) 本稿では,視覚的に接地された訓練目標を用いて自己教師あり音声モデルを訓練すると,音節単位を捉えた表現が出現することを示す。マスク付き言語モデリング損失で訓練されたほぼ同一のモデルアーキテクチャ(HuBERT)が、このような能力を示さないことを実証し、この現象の出現に視覚的接地の目的関数が関与していることを示す。本研究では,音声中の音節境界を自動的に予測する最小カットアルゴリズムと,同一音節をグループ化する2段階クラスタリング法を提案する。我々のモデルは、訓練された言語(英語)で最先端の音節セグメンテーション法を上回っているだけでなく、ゼロショット方式でエストニア語にも一般化している。最後に,Zerospeech Challengeの他の4言語に対する単語分割タスクに対して,同じモデルでゼロショットの一般化が可能であり,場合によっては従来の最先端を上回ることを示す。 In this paper, we show that representations capturing syllabic units emerge when training a self-supervised speech model with a visually-grounded training objective. We demonstrate that a nearly identical model architecture (HuBERT) trained with a masked language modeling loss does not exhibit this same ability, suggesting that the visual grounding objective is responsible for the emergence of this phenomenon. We propose the use of a minimum cut algorithm to automatically predict syllable boundaries in speech, followed by a 2-stage clustering method to group identical syllables together. We show that our model not only outperforms a state-of-the-art syllabic segmentation method on the language it was trained on (English), but also generalizes in a zero-shot fashion to Estonian. Finally, we show that the same model is capable of zero-shot generalization for a word segmentation task on 4 other languages from the Zerospeech Challenge, in some cases beating the previous state-of-the-art.	翻訳日:2023-07-25 21:16:00 公開日:2023-07-23
# MGR:マルチジェネレータに基づく合理化 MGR: Multi-generator Based Rationalization ( http://arxiv.org/abs/2305.04492v8 ) ライセンス: Link先を確認	Wei Liu, Haozhao Wang, Jun Wang, Ruixuan Li, Xinyang Li, Yuankai Zhang, Yang Qiu	(参考訳) 合理化は、ジェネレータと予測器を用いて、ジェネレータが入力テキストの人間の知性の部分集合を次の予測器に選択する自己説明型NLPモデルを構築することである。しかし、合理化には2つの重要な課題、すなわち、スプリアス相関とデジェネレーションがあり、予測器は、未熟な訓練済みジェネレータによって選択されたスプリアスまたは無意味なピースを過剰に適合させ、ジェネレータを劣化させる。 2つの課題に対処するために多くの研究が提案されているが、通常は個別に設計されており、どちらも考慮していない。本稿では,この2つの問題を同時に解くために,MGRというシンプルな手法を提案する。 MGRの鍵となる考え方は、実際の部品の発生安定性を改善し、より有意義な部品を予測者に届けるように複数の発電機を採用することである。実験により,MGRは最先端手法と比較してF1スコアを最大20.9%改善することがわかった。コードはhttps://github.com/jugechengzi/Rationalization-MGRで公開されている。 Rationalization is to employ a generator and a predictor to construct a self-explaining NLP model in which the generator selects a subset of human-intelligible pieces of the input text to the following predictor. However, rationalization suffers from two key challenges, i.e., spurious correlation and degeneration, where the predictor overfits the spurious or meaningless pieces solely selected by the not-yet well-trained generator and in turn deteriorates the generator. Although many studies have been proposed to address the two challenges, they are usually designed separately and do not take both of them into account. In this paper, we propose a simple yet effective method named MGR to simultaneously solve the two problems. The key idea of MGR is to employ multiple generators such that the occurrence stability of real pieces is improved and more meaningful pieces are delivered to the predictor. Empirically, we show that MGR improves the F1 score by up to 20.9% as compared to state-of-the-art methods. Codes are available at https://github.com/jugechengzi/Rationalization-MGR .	翻訳日:2023-07-25 21:15:10 公開日:2023-07-23
# 分子ドッキングと機械学習回帰法を用いたCOVID-19 3CLプロテアーゼを標的とした薬物再利用 Drug Repurposing Targeting COVID-19 3CL Protease using Molecular Docking and Machine Learning Regression Approach ( http://arxiv.org/abs/2305.18088v5 ) ライセンス: Link先を確認	Imra Aqeel and Abdul Majid	(参考訳) 新型コロナウイルス(COVID-19)のパンデミックは世界的な健康危機を招き、効果的な治療法が緊急に必要とされている。薬物の再利用は、時間、コスト、労力を節約できるので、有望な解決策として現れてきた。しかし、COVID-19治療のために特定された再利用薬物の数はまだ限られており、より効率的で包括的な薬物再利用アプローチが必要である。本研究では,分子ドッキング法と機械学習回帰法を組み合わせた薬物再利用により,COVID-19治療の潜在的な候補を同定することを目的とした。ウイルスの複製における重要な酵素であるSARS-CoV-2の主プロテアーゼ3CLを標的として、世界で承認された5903の薬剤をスクリーニングするために,Zincデータベースを利用した。薬物の主プロテアーゼ3CLへの結合親和性を評価するために分子ドッキングを行い、QSARモデリングに複数の機械学習回帰アプローチを用いて高い結合親和性を持つ薬物を同定した。以上の結果から,決定木回帰モデル (DTR) が R2 と RMSE の統計指標で最も優れており,-15 kcal/mol から -13 kcal/mol の範囲で 6 種類の有望な薬剤を候補として絞り込んだ。これらの薬剤は、他の研究で既に同定されている1つの抗ウイルス性化合物ZINC203757351を除いて、新規な再利用の可能性を有する。我々はさらに、これらの上位にランクされた薬剤の物理化学的および薬物動態的性質と、標的プロテアーゼ3CLproに対する最適な結合相互作用を解析した。本研究は、COVID-19に対する薬物再利用のための効率的な枠組みを提供し、分子ドッキングと機械学習回帰アプローチを組み合わせることによって、潜在的な治療候補の同定を加速できる可能性を実証する。この結果は、世界的な健康上の重要な課題であるCOVID-19の効果的な治療法を見つけるという大きな目標に寄与する。 The COVID-19 pandemic has created a global health crisis, with an urgent need for effective treatments. Drug repurposing has emerged as a promising solution, as it can save time, cost, and labor. However, the number of identified repurposed drugs for COVID-19 treatment remains limited, and there is a need for more efficient and comprehensive drug repurposing approaches. In this study, we aimed to identify potential therapeutic candidates for COVID-19 treatment through drug repurposing using a combination of molecular docking and machine learning regression approaches. We utilized the Zinc database to screen 5903 world-approved drugs for their potential to target the main protease 3CL of SARS-CoV-2, which is a key enzyme in the replication of the virus. 
We performed molecular docking to evaluate the binding affinity of the drugs to the main protease 3CL, and used several machine learning regression approaches for QSAR modeling to identify drugs with high binding affinity. Our results showed that the Decision Tree Regression (DTR) model had the best statistical measures of R2 and RMSE, and we shortlisted six promising drugs within the range of -15 kcal/mol to -13 kcal/mol. These drugs have novel repurposing potential, except for one antiviral ZINC203757351 compound that has already been identified in other studies. We further analyzed the physiochemical and pharmacokinetic properties of these top-ranked selected drugs and their best binding interaction for specific target protease 3CLpro. Our study provides an efficient framework for drug repurposing against COVID-19, and demonstrates the potential of combining molecular docking with machine learning regression approaches to accelerate the identification of potential therapeutic candidates. Our findings contribute to the larger goal of finding effective treatments for COVID-19, which is a critical global health challenge.	翻訳日:2023-07-25 21:06:46 公開日:2023-07-23
# 長文のニューラル自然言語処理:最新技術に関する調査 Neural Natural Language Processing for Long Texts: A Survey of the State-of-the-Art ( http://arxiv.org/abs/2305.16259v5 ) ライセンス: Link先を確認	Dimitrios Tsirmpas, Ioannis Gkionis, Ioannis Mademlis, Georgios Papadopoulos	(参考訳) ディープニューラルネットワーク(DNN)の採用は、過去10年間で自然言語処理(NLP)に大きな恩恵を受けている。しかし、長文解析の要求は短いテキストの要求とはかなり異なるが、オンラインにアップロードされた文書のサイズが増大すると、長文の自動理解が重要な問題となる。関連するアプリケーションは、自動化されたWebマイニング、法的文書レビュー、医療記録分析、財務報告分析、契約管理、環境影響評価、ニュース集約などである。長い文書を解析するための効率的なアルゴリズムが近年開発されているにもかかわらず、この分野の実践的ツールは現在盛んである。この記事では、この動的ドメインのエントリポイントとして機能し、2つの目的を達成することを目的としています。まず、関連するニューラルネットワーク構築ブロックの概要を提供し、フィールドの簡潔なチュートリアルとして機能する。第二に、ドキュメント分類と文書要約という2つの重要なタスクを中心に、ロングドキュメントnlpにおける現在の最先端の簡単な検証を提供する。典型的には文書分類の特定の事例として扱われるので、長文の感性分析もカバーされている。そこで本稿では,文書レベルの分析の序文として,主な課題,課題,既存ソリューションについて述べる。最後に、この記事は、この分野のさらなる研究を促進するために利用可能な注釈付きデータセットを提示している。 The adoption of Deep Neural Networks (DNNs) has greatly benefited Natural Language Processing (NLP) during the past decade. However, the demands of long document analysis are quite different from those of shorter texts, while the ever increasing size of documents uploaded on-line renders automated understanding of lengthy texts a critical issue. Relevant applications include automated Web mining, legal document review, medical records analysis, financial reports analysis, contract management, environmental impact assessment, news aggregation, etc. Despite the relatively recent development of efficient algorithms for analyzing long documents, practical tools in this field are currently flourishing. This article serves as an entry point into this dynamic domain and aims to achieve two objectives. Firstly, it provides an overview of the relevant neural building blocks, serving as a concise tutorial for the field. Secondly, it offers a brief examination of the current state-of-the-art in long document NLP, with a primary focus on two key tasks: document classification and document summarization. 
Sentiment analysis for long texts is also covered, since it is typically treated as a particular case of document classification. Consequently, this article presents an introductory exploration of document-level analysis, addressing the primary challenges, concerns, and existing solutions. Finally, the article presents publicly available annotated datasets that can facilitate further research in this area.	翻訳日:2023-07-25 21:05:34 公開日:2023-07-23
# マイクロ波ドレッシングによるRydberg状態dc分極率の低減 Reducing Rydberg state dc polarizability by microwave dressing ( http://arxiv.org/abs/2305.15200v3 ) ライセンス: Link先を確認	J.C. Bohorquez, R. Chinnarasu, J. Isaacs, D. Booth, M. Beck, R. McDermott, and M. Saffman	(参考訳) マイクロ波電場ドレッシングを用いて、77K環境下でセシウム原子Rydberg状態のdc分極率を低減できることを実証した。特に、5.35 GHzで$51D_{5/2}$への共鳴を持つ$52P_{3/2}$状態の分極率を低減する。これは、低温環境下でRydberg原子を超伝導共振器と接続するのに適している。磁気光学トラップ(MOT)損失分光法を用いてRydberg状態の分極率を測定する。 $52P_{3/2}$と$51D_{5/2}$を結合する非共鳴の無線周波数(RF)ドレッシング場を用いて、$52P_{3/2}$状態のdc分極率を80%以上低減できることを実証する。実験結果は、Shirley-Floquet形式を用いて構築した原子-ドレッシング場系の数値モデルとよく一致している。また,dc分極率の低減は強い異方性を持ち,dc場とドレッシング場が平行な場合にはほぼ完全に打ち消すことができるが,直交する場合には分極率は2分の1程度にしか低減されないことを示す。これらの結果は、表面近傍に存在する変動するdc場に対してRydberg共鳴を安定化するのに役立ち、Rydberg原子と超伝導共振器のハイブリッド量子ゲートの開発を前進させる可能性がある。 We demonstrate reduction of the dc polarizability of Cesium atom Rydberg states in a 77 K environment utilizing microwave field dressing. In particular, we reduce the polarizability of $52P_{3/2}$ states which have resonances at 5.35 GHz to $51D_{5/2}$, suitable for interfacing Rydberg atoms to superconducting resonators in a cryogenic environment. We measure the polarizability of the Rydberg states using Magneto-Optical-Trap (MOT) loss spectroscopy. Using an off-resonant radio-frequency (RF) dressing field coupling $52P_{3/2}$ and $51D_{5/2}$ we demonstrate a reduction in dc polarizability of the $52P_{3/2}$ states of over 80%. Experimental findings are in good agreement with a numerical model of the atom-dressing field system developed using the Shirley-Floquet formalism. We also demonstrate that the dc polarizability reduction is highly anisotropic, with near total nulling possible when the dc and dressing fields are aligned, but only a factor of two reduction in polarizability when the fields are orthogonal. These results may aid in stabilizing Rydberg resonances against varying dc fields present near surfaces, enabling advancement in the development of hybrid Rydberg atom - superconducting resonator quantum gates.	翻訳日:2023-07-25 21:04:42 公開日:2023-07-23
# TriMLP: シーケンシャルレコメンデーションにおけるMLPライクなアーキテクチャの逆襲 TriMLP: Revenge of a MLP-like Architecture in Sequential Recommendation ( http://arxiv.org/abs/2305.14675v2 ) ライセンス: Link先を確認	Yiheng Jiang, Yuanbo Xu, Yongjian Yang, Funing Yang, Pengyang Wang and Hui Xiong	(参考訳) シーケンシャルレコメンデーション(Sequential recommendation)は、動的な嗜好の推論を改善するために、過去のユーザとアイテムのインタラクション行動(トークンとも呼ばれる)のシーケンスをモデル化する。 RNN、CNN、Transformerといった改良されたニューラルネットワークアーキテクチャによって、この分野はここ数年で急速に性能が向上した。オールMLPモデルの最近の進歩は、過去の行動間の変換パターンを学習するための、より計算量の少ない効率的な方法であるトークン混合MLPに光を当てている。しかし,制約のないクロストークン通信を許容し,時系列順序を無視する本質的な全結合設計のため,トークン混合MLPを逐次レコメンデーションに直接適用すると性能が低下することがわかった。本稿では、修正されたMLPがトークンに順序付きの相互作用を付与する新しいTriangular Mixerを備えた、純粋なMLPベースのシーケンシャルレコメンデーションアーキテクチャTriMLPを提案する。 MLPのクロストークン相互作用は実際には行列の乗算であるので、Triangular Mixerは重み行列内の下三角ニューロンを落とし、将来のトークンからの接続をブロックすることで、情報漏洩を防ぎ、標準の自己回帰訓練方式で予測能力を向上させる。細粒度で長期および短期の嗜好を更にモデル化するため、ミキサーは、上述のMLPに基づくデュアルブランチ構造、すなわちグローバルミキシングとローカルミキシングを採用し、シーケンシャルな長距離依存性と局所パターンを別々に捉える。 MovieLens、Amazon、Tenrecを含む、さまざまなベンチマークの9つの異なるスケールのデータセット(50K~20Mの行動を含む)に関する実証的研究は、TriMLPが有望で安定した精度/効率のトレードオフを実現していることを実証している。 Sequential recommendation models sequences of historical user-item interactive behaviors (also referred to as tokens) to better infer dynamic preferences. Fueled by the improved neural network architectures such as RNN, CNN and Transformer, this field has enjoyed a rapid performance boost in the past years. Recent progress on all-MLP models sheds light on an efficient method with less intensive computation, token-mixing MLP, to learn the transformation patterns among historical behaviors. However, due to the inherent fully-connected design that allows unrestricted cross-token communication and ignores the chronological order, we find that directly applying token-mixing MLP to sequential recommendation leads to subpar performance. 
In this paper, we present a purely MLP-based sequential recommendation architecture TriMLP with a novel \underline{Tri}angular Mixer where the modified \underline{MLP} endows tokens with ordered interactions. As the cross-token interaction in MLP is actually matrix multiplication, Triangular Mixer drops the lower-triangle neurons in the weight matrix and thus blocks the connections from future tokens, which prevents information leakage and improves prediction capability under the standard auto-regressive training fashion. To further model long and short-term preferences on fine-grained level, the mixer adopts a dual-branch structure based on the delicate MLP described above, namely global and local mixing, to separately capture the sequential long-range dependencies and local patterns. Empirical study on 9 different scale datasets (contain 50K\textasciitilde20M behaviors) of various benchmarks, including MovieLens, Amazon and Tenrec, demonstrates that TriMLP attains promising and stable accuracy/efficiency trade-off, i.e., averagely surpasses several state-of-the-art baselines by 5.32\% and saves 8.44\% inference time cost.	翻訳日:2023-07-25 21:04:17 公開日:2023-07-23
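The masking idea in the TriMLP abstract above can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a layout in which the input `x` has shape (channels, tokens) and is multiplied on the right by a token-mixing weight `w` of shape (tokens, tokens), so that zeroing the entries below the diagonal leaves output position j depending only on input positions i <= j. The function name `triangular_mix` and all shapes are hypothetical, chosen only for illustration.

```python
import numpy as np


def triangular_mix(x, w):
    """Token mixing with a triangular mask (illustrative sketch only).

    x: array of shape (d, n), d channels by n tokens in chronological order.
    w: array of shape (n, n), the token-mixing weight matrix.
    Output column j is sum_i x[:, i] * w[i, j]; zeroing the entries with
    i > j means position j never reads from a later (future) token.
    """
    n = x.shape[1]
    if w.shape != (n, n):
        raise ValueError("w must have shape (n, n) where n = x.shape[1]")
    w_masked = np.triu(w)  # keep the diagonal and above, zero the lower triangle
    return x @ w_masked


rng = np.random.default_rng(0)
d, n = 4, 6
x = rng.normal(size=(d, n))
w = rng.normal(size=(n, n))
y = triangular_mix(x, w)

# Perturb only the "future" tokens (positions 3 and later).
x_future = x.copy()
x_future[:, 3:] += 10.0
y2 = triangular_mix(x_future, w)

# Outputs at positions 0..2 are unchanged, so no information leaks backwards.
print(np.allclose(y[:, :3], y2[:, :3]))  # True
```

With the transposed layout (tokens as rows, weight applied on the left) the same constraint would keep the lower triangle instead, so which triangle is dropped depends on the convention in use.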
# CLIPSonic: 未ラベルビデオと事前学習言語ビジョンモデルによる音声合成 CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models ( http://arxiv.org/abs/2306.09635v2 ) ライセンス: Link先を確認	Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serr\`a, Taylor Berg-Kirkpatrick, Julian McAuley	(参考訳) 近年,大量のテキスト音声データを用いた音声合成の研究が行われている。しかし,高品質なテキストアノテーションを用いた音声記録は入手が困難である。本研究では,未ラベルビデオと事前学習言語ビジョンモデルを用いて音声合成を行う。視覚モダリティを橋梁として活用し,所望のテキスト音声対応を学習することを提案する。我々は,事前学習されたコントラスト言語画像前訓練(clip)モデルで符号化されたビデオフレームに対して,映像の音声トラックを生成するための条件拡散モデルを訓練する。テスト時には,まずゼロショットモダリティ転送を行い,クリップエンコードされたテキストクエリで拡散モデルを条件付けする。しかし,画像クエリに対する顕著な性能低下が観察された。このギャップを埋めるために,事前学習した拡散事前モデルを採用し,クリップテキスト埋め込みによりクリップ画像埋め込みを生成する。その結果,提案手法の有効性が示され,事前学習した拡散前処理によりモーダリティ伝達ギャップを低減できることがわかった。音声合成に注目する一方で,提案モデルでは画像クエリから音声を生成することが可能であり,主観的聞き取りテストにおいて最先端の画像音声合成モデルと競合する性能を示す。本研究は,ビデオにおける自然な音声-視覚対応と事前学習された言語-視覚モデルのパワーを活用する,テキスト-音声合成への新たな方向を提供する。 Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pretrained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge. We train a conditional diffusion model to generate the audio track of a video, given a video frame encoded by a pretrained contrastive language-image pretraining (CLIP) model. At test time, we first explore performing a zero-shot modality transfer and condition the diffusion model with a CLIP-encoded text query. However, we observe a noticeable performance drop with respect to image queries. To close this gap, we further adopt a pretrained diffusion prior model to generate a CLIP image embedding given a CLIP text embedding. Our results show the effectiveness of the proposed method, and that the pretrained diffusion prior can reduce the modality transfer gap. 
While we focus on text-to-audio synthesis, the proposed model can also generate audio from image queries, and it shows competitive performance against a state-of-the-art image-to-audio synthesis model in a subjective listening test. This study offers a new direction of approaching text-to-audio synthesis that leverages the naturally-occurring audio-visual correspondence in videos and the power of pretrained language-vision models.	翻訳日:2023-07-25 20:57:20 公開日:2023-07-23
# 密度に基づくクラスタリング手法の検討 A Survey of Some Density Based Clustering Techniques ( http://arxiv.org/abs/2306.09256v2 ) ライセンス: Link先を確認	Rupanka Bhuyan and Samarjeet Borah	(参考訳) 密度ベースのクラスタリングは、データセットから未知のパターンを抽出するためにデータマイニングで使用されるクラスタリングの一種である。 DBSCAN、OPTICS、DENCLUE、VDBSCAN、DVBSCAN、DBCLASD、ST-DBSCANなどの密度ベースのクラスタリング手法がある。本稿では,これらの手法について,その特性,長所,短所,そして最も重要な点として,有用かつ適切なパターンをマイニングするための異なる種類のデータセットへの適用性について検討する。 Density Based Clustering are a type of Clustering methods using in data mining for extracting previously unknown patterns from data sets. There are a number of density based clustering methods such as DBSCAN, OPTICS, DENCLUE, VDBSCAN, DVBSCAN, DBCLASD and ST-DBSCAN. In this paper, a study of these methods is done along with their characteristics, advantages and disadvantages and most importantly, their applicability to different types of data sets to mine useful and appropriate patterns.	翻訳日:2023-07-25 20:56:28 公開日:2023-07-23
# 南フロリダにおける水ステージ予測のための深層学習モデル Deep Learning Models for Water Stage Predictions in South Florida ( http://arxiv.org/abs/2306.15907v2 ) ライセンス: Link先を確認	Jimeng Shi, Zeda Yin, Rukmangadh Myana, Khandker Ishtiaq, Anupama John, Jayantha Obeysekera, Arturo Leon, Giri Narasimhan	(参考訳) 河川システムにおける水位シミュレーションと予測は,洪水警報,水理操作,洪水軽減に不可欠である。工学分野では、HEC-RAS、MIKE、SWMMといったツールを使用して、詳細な物理に基づく水理・水理計算モデルを構築し、流域全体をシミュレートし、システム内の任意の時点での水ステージを予測する。しかし、これらの物理学に基づくモデルは、特に大きな流域やより長いシミュレーションのために、計算集約的である。この問題を克服するために,我々は複数の深層学習モデル(DL)を代理モデルとして使用し,水ステージを迅速に予測する。南フロリダのマイアミ川の下流は,本論文の事例研究として選択されている。データセットは2010年1月1日から2020年12月31日まで、南フロリダ水管理地区(SFWMD)のDBHYDROデータベースからダウンロードされる。大規模な実験により、DLモデルの性能は極度の降水条件(熱帯嵐)においても物理学に基づくモデルの性能に匹敵することが示された。さらに,予測長の増加に伴うDLモデルの予測精度の低下について検討した。今後の水ステージを予測するため,我々のDLモデルでは,近年の河川系の測定変数と,近い将来に確実に予測できる共変量を用いている。要約すると、ディープラーニングモデルは、物理ベースのモデルと比較して、少なくとも1000倍のスピードアップで、同等またはより良いエラー率を達成する。 Simulating and predicting water levels in river systems is essential for flood warnings, hydraulic operations, and flood mitigations. In the engineering field, tools such as HEC-RAS, MIKE, and SWMM are used to build detailed physics-based hydrological and hydraulic computational models to simulate the entire watershed, thereby predicting the water stage at any point in the system. However, these physics-based models are computationally intensive, especially for large watersheds and for longer simulations. To overcome this problem, we train several deep learning (DL) models for use as surrogate models to rapidly predict the water stage. The downstream stage of the Miami River in South Florida is chosen as a case study for this paper. The dataset is from January 1, 2010, to December 31, 2020, downloaded from the DBHYDRO database of the South Florida Water Management District (SFWMD). Extensive experiments show that the performance of the DL models is comparable to that of the physics-based models, even during extreme precipitation conditions (i.e., tropical storms). 
Furthermore, we study the decline in prediction accuracy of the DL models with an increase in prediction lengths. In order to predict the water stage in the future, our DL models use measured variables of the river system from the recent past as well as covariates that can be reliably predicted in the near future. In summary, the deep learning models achieve comparable or better error rates with at least 1000x speedup in comparison to the physics-based models.	翻訳日:2023-07-25 20:47:16 公開日:2023-07-23
# オープンボキャブラリ学習に向けて:調査 Towards Open Vocabulary Learning: A Survey ( http://arxiv.org/abs/2306.15880v3 ) ライセンス: Link先を確認	Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, Dacheng Tao	(参考訳) 視覚シーン理解の分野では、ディープニューラルネットワークはセグメンテーション、トラッキング、検出など、さまざまなコアタスクにおいて驚くべき進歩を遂げている。しかし、ほとんどのアプローチはクローズセットの仮定に基づいており、トレーニングセットに存在する事前定義されたカテゴリのみを識別できる。近年、視覚言語事前学習の急速な進歩により、オープンな語彙設定が提案されている。これらの新しいアプローチは、注釈付きラベル空間を超えてカテゴリを見つけ、認識することを目指している。オープン語彙のアプローチは、弱教師付きおよびゼロショット設定に比べて、より一般的で実用的で効果的である。本稿では,その分野における最近の発展を要約し,分析し,オープンな語彙学習の徹底的なレビューを行う。特に,ゼロショット学習,オープンセット認識,分布外検出といった関連する概念と比較することから始める。次に, セグメンテーションと検出に関して, ロングテール問題, 少数ショット設定, ゼロショット設定など, 密接に関連するタスクをいくつか検討する。手法の調査では,まず,予備知識としてクローズセットにおける検出とセグメンテーションの基本的な知識を提示する。次に,オープン語彙学習を用いた様々なシナリオについて検討し,共通設計要素とコアアイデアを同定する。次に、一般的なデータセットとベンチマークにおける最近の検出とセグメンテーションのアプローチを比較した。最後に,今後の研究方向性に関する洞察,課題,議論をまとめる。私たちの知る限り、オープンな語彙学習に関する総合的な文献レビューはこれが初めてである。関連する研究をhttps://github.com/jianzongwu/Awesome-Open-Vocabularyで追跡しています。 In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the close-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective compared to weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. 
Then, we review several closely related tasks in the case of segmentation and detection, including long-tail problems, few-shot, and zero-shot settings. For the method survey, we first present the basic knowledge of detection and segmentation in close-set as the preliminary knowledge. Next, we examine various scenarios in which open vocabulary learning is used, identifying common design elements and core ideas. Then, we compare the recent detection and segmentation approaches in commonly used datasets and benchmarks. Finally, we conclude with insights, issues, and discussions regarding future research directions. To our knowledge, this is the first comprehensive literature review of open vocabulary learning. We keep tracing related works at https://github.com/jianzongwu/Awesome-Open-Vocabulary.	翻訳日:2023-07-25 20:46:55 公開日:2023-07-23
# Magic123: 2Dおよび3D拡散事前分布を用いた1枚の画像からの高品質3Dオブジェクト生成 Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors ( http://arxiv.org/abs/2306.17843v2 ) ライセンス: Link先を確認	Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem	(参考訳) Magic123は、2Dと3Dの両方の事前分布(prior)を用いて、姿勢情報のない実環境の単一画像から高品質なテクスチャ付き3Dメッシュを生成する、2段階のcoarse-to-fineアプローチである。第1段階では、ニューラル放射場を最適化して粗い形状を生成する。第2段階では、メモリ効率の良い微分可能なメッシュ表現を採用し、視覚的に魅力的なテクスチャを持つ高解像度メッシュを生成する。いずれの段階でも、参照ビューの教師信号と、2Dおよび3Dの拡散事前分布の組み合わせによって導かれる新規ビューを通じて3Dコンテンツが学習される。生成される形状の探索(より想像的)と活用(より正確)を制御するために、2Dと3Dの事前分布の間の単一のトレードオフパラメータを導入する。さらに、テキストインバージョンと単眼深度正則化を用いて、それぞれビュー間で一貫した外観を促し、縮退解を防止する。 Magic123は、合成ベンチマークと多様な実世界の画像に関する広範な実験を通じて検証され、従来の画像から3Dへの技術よりも大幅に改善されている。私たちのコード、モデル、生成された3Dアセットは、https://github.com/guochengqian/Magic123で利用可能です。 We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images. 
Our code, models, and generated 3D assets are available at https://github.com/guochengqian/Magic123.	翻訳日:2023-07-25 20:38:09 公開日:2023-07-23
# ロジウムフォメートパドルホイール錯体の$^{103}$Rh NMR分光法と緩和測定 The $^{103}$Rh NMR Spectroscopy and Relaxometry of the Rhodium Formate Paddlewheel Complex ( http://arxiv.org/abs/2306.17457v2 ) ライセンス: Link先を確認	Harry Harbor Collins, Mohamed Sabba, Gamal Moustafa, Bonifac Legrady, Murari Soundararajan, Markus Leutzsch, Malcolm H. Levitt	(参考訳) 磁気回転比の低いスピン-1/2核のNMR分光は、NMR信号強度が低いため困難である。ロジウムフォメート「パドルホイール」錯体$\mathrm{Rh_2(HCO_2)_4}$を例に、$^{103}$Rh NMRパラメータを迅速に取得する手法を示す。この手法は、$^{1}$H核からの偏極移動によって$^{103}$Rh信号強度を増大させ、また、低$\gamma$核を直接観測する際の共通の障害であるリンギングアーティファクトからの干渉を大幅に低減する。 $^{103}$Rh緩和時定数$T_1$と$T_2$は、$^{1}$H検出実験を用いて20分以内に測定される。さらに、$^{103}$Rh $T_1$の磁場依存性を測定する。高磁場での緩和は化学シフト異方性(CSA)機構によって支配される。 $^{103}$Rh遮蔽異方性は非常に大きく、$\|\Delta\sigma\|=9900\pm540\mathrm{\,ppm}$である。この推定値は密度汎関数理論計算と比較される。 The NMR spectroscopy of spin-1/2 nuclei with low gyromagnetic ratio is challenging, due to the low NMR signal strength. Methodology for the rapid acquisition of $^{103}$Rh NMR parameters is demonstrated for the case of the rhodium formate "paddlewheel" complex $\mathrm{Rh_2(HCO_2)_4}$. A scheme is described for enhancing the $^{103}$Rh signal strength by polarization transfer from $^{1}$H nuclei and which also greatly reduces the interference from ringing artifacts, a common hurdle for the direct observation of low-$\gamma$ nuclei. The $^{103}$Rh relaxation time constants $T_1$ and $T_2$ are measured within 20 minutes using $^{1}$H-detected experiments. The field-dependence of the $^{103}$Rh $T_1$ is measured. The high-field relaxation is dominated by the chemical shift anisotropy (CSA) mechanism. The $^{103}$Rh shielding anisotropy is found to be very large: $\|\Delta\sigma\|=9900\pm540\mathrm{\,ppm}$. This estimate is compared with density functional theory calculations.	翻訳日:2023-07-25 20:37:34 公開日:2023-07-23
# ManimML: アニメーションによる機械学習アーキテクチャのコミュニケーション ManimML: Communicating Machine Learning Architectures with Animation ( http://arxiv.org/abs/2306.17108v2 ) ライセンス: Link先を確認	Alec Helbling and Duen Horng Chau	(参考訳) 近年、機械学習(ML)への関心が爆発的に高まっている。しかし、ML技術が進歩するにつれて、新しいMLアルゴリズムの説明と視覚化ツールが遅れている。アニメーションは、時間とともに動的に変化するシステムのエンゲージメントな視覚化を実現する強力なツールであることが示されており、MLアルゴリズムの通信タスクに適している。しかし、MLアルゴリズムをアニメーションする現在のアプローチは、特定のアルゴリズムをハイライトするアプリケーションや複雑な一般化されたアニメーションソフトウェアを使用するハンドクラフトである。我々は,コードから直接MLアルゴリズムのアニメーションを生成するオープンソースPythonライブラリManimMLを開発した。我々は,複雑なアニメーションソフトウェアを学習するよりも,ML実践者の既存のプログラミング知識を活用することを試みた。 ManimMLには、Pytorchのような人気のあるディープラーニングフレームワークを模倣するニューラルネットワークを指定するための、よく知られた構文がある。ユーザは、既存のニューラルネットワークアーキテクチャを使用して、manimmlでアニメーションの仕様を簡単に記述することができ、システムのさまざまなコンポーネントのアニメーションをニューラルネットワーク全体の最終的なアニメーションに自動生成する。 ManimMLはオープンソースでhttps://github.com/helblazer811/ManimMLで入手できる。 There has been an explosion in interest in machine learning (ML) in recent years due to its applications to science and engineering. However, as ML techniques have advanced, tools for explaining and visualizing novel ML algorithms have lagged behind. Animation has been shown to be a powerful tool for making engaging visualizations of systems that dynamically change over time, which makes it well suited to the task of communicating ML algorithms. However, the current approach to animating ML algorithms is to handcraft applications that highlight specific algorithms or use complex generalized animation software. We developed ManimML, an open-source Python library for easily generating animations of ML algorithms directly from code. We sought to leverage ML practitioners' preexisting knowledge of programming rather than requiring them to learn complex animation software. ManimML has a familiar syntax for specifying neural networks that mimics popular deep learning frameworks like Pytorch. 
A user can take a preexisting neural network architecture and easily write a specification for an animation in ManimML, which will then automatically compose animations for different components of the system into a final animation of the entire neural network. ManimML is open source and available at https://github.com/helblazer811/ManimML.	翻訳日:2023-07-25 20:37:14 公開日:2023-07-23
# LaunchpadGPT:Launchpad上の音楽可視化デザイナとしての言語モデル LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad ( http://arxiv.org/abs/2307.04827v2 ) ライセンス: Link先を確認	Siting Xu, Yunlong Tang, Feng Zheng	(参考訳) Launchpadは、照明付きのボタンを押すことで、ユーザーが音楽を作り、演奏できる楽器だ。 launchpadライトエフェクトの設計を補助し、さらに初心者がこの楽器を使って音楽のビジュアライゼーションを行えるようにするために、launchpadgptモデルを提案し、自動的にlaunchpad上での音楽のビジュアライゼーションデザインを生成する。生成能力に優れた言語モデルに基づいて,提案したLaunchpadGPTは音声を入力として,ビデオ形式でLaunchpad-playingの照明効果を出力する(Launchpad-playing video)。我々はLaunchpadプレイングビデオを収集し、それらを処理して音楽とそれに対応するLaunchpadプレイングの動画フレームをプロンプト・コンプリートペアとして取得し、言語モデルを訓練する。実験結果から,提案手法はランダム生成法よりも優れた音楽可視化を実現し,幅広い音楽可視化応用の可能性を示す。私たちのコードはhttps://github.com/yunlong10/LaunchpadGPT/で利用可能です。 Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of the Launchpad light effect, and provide a more accessible approach for beginners to create music visualization with this instrument, we proposed the LaunchpadGPT model to generate music visualization designs on Launchpad automatically. Based on the language model with excellent generation ability, our proposed LaunchpadGPT takes an audio piece of music as input and outputs the lighting effects of Launchpad-playing in the form of a video (Launchpad-playing video). We collect Launchpad-playing videos and process them to obtain music and corresponding video frame of Launchpad-playing as prompt-completion pairs, to train the language model. The experiment result shows the proposed method can create better music visualization than random generation methods and hold the potential for a broader range of music visualization applications. Our code is available at https://github.com/yunlong10/LaunchpadGPT/.	翻訳日:2023-07-25 20:27:40 公開日:2023-07-23
# AppleとAppleの比較: ユーザレビューからアスペクト対応の比較文を生成する Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews ( http://arxiv.org/abs/2307.03691v2 ) ライセンス: Link先を確認	Jessica Echterhoff, An Yan, Julian McAuley	(参考訳) 多くの類似の選択肢の中で最良の製品を見つけるのに時間がかかります。比較文は、目立った項目の重要な特徴を強調する方法で、ある項目と他の項目を対比するのに役立ちます。 1つまたは複数の項目のレビューと関連する項目の特徴を考慮し、比較レビュー文を生成し、ユーザーが最適な項目を見つけるのに役立つ。具体的には,変換器内の3つの連続成分からなるモデルについて述べる。 (i)比較対象品目を符号化する商品符号化モジュール (ii)自己回帰的な比較文を生成する比較生成モジュール (iii)ユーザパーソナライズのための新しい復号化方法我々のパイプラインは、流動的で多様な比較文を生成する。我々は、人間の評価研究において、生成した文の関連性と忠実性に関する実験を行い、アルゴリズムが関連する真理のある比較レビュー文を作成することを発見した。 It is time-consuming to find the best product among many similar alternatives. Comparative sentences can help to contrast one item from others in a way that highlights important features of an item that stand out. Given reviews of one or multiple items and relevant item features, we generate comparative review sentences to aid users to find the best fit. Specifically, our model consists of three successive components in a transformer: (i) an item encoding module to encode an item for comparison, (ii) a comparison generation module that generates comparative sentences in an autoregressive manner, (iii) a novel decoding method for user personalization. We show that our pipeline generates fluent and diverse comparative sentences. We run experiments on the relevance and fidelity of our generated sentences in a human evaluation study and find that our algorithm creates comparative review sentences that are relevant and truthful.	翻訳日:2023-07-25 20:27:09 公開日:2023-07-23
# テンソル分解と制御理論の出会い:線形力学系の一般混合の学習 Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems ( http://arxiv.org/abs/2307.06538v2 ) ライセンス: Link先を確認	Ainesh Bakshi, Allen Liu, Ankur Moitra, Morris Yau	(参考訳) 最近、ChenとPoorは線形力学系の混合の学習に関する研究を始めた。線形力学系はすでに時系列データのモデリングに広範囲の応用があるが、混合モデルを用いることで、より良い適合や、データに表れる下位の部分集団のより豊かな理解につながる可能性がある。本研究では、テンソル分解に基づく線形力学系の混合を学習するための新しいアプローチを提案する。その結果、本アルゴリズムは成分に対する強い分離条件なしに成功し、軌道のベイズ最適クラスタリングと競合することができる。さらに本アルゴリズムは、困難な部分観測の設定でも動作する。我々の出発点は、古典的なHo-Kalmanアルゴリズムが、潜在変数モデルを学習するための現代のテンソル分解法と近縁であるという、単純だが強力な観察である。これにより、より複雑な生成モデルで動作するように拡張するための指針が得られる。 Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.	翻訳日:2023-07-25 20:18:09 公開日:2023-07-23
# NetGPT: パーソナライズされた生成サービスの提供を超えて、ネイティブAIネットワークアーキテクチャ NetGPT: A Native-AI Network Architecture Beyond Provisioning Personalized Generative Services ( http://arxiv.org/abs/2307.06148v2 ) ライセンス: Link先を確認	Yuxuan Chen, Rongpeng Li, Zhifeng Zhao, Chenghui Peng, Jianjun Wu, Ekram Hossain, and Honggang Zhang	(参考訳) 大規模言語モデル(LLM)は、生成情報による日常生活の活性化に大きく成功し、LLMのパーソナライゼーションは、人間の意図との整合性の向上により、その応用にさらに貢献する可能性がある。パーソナライズされた生成サービスに向けて、コラボレーティブなクラウドエッジ方法論は有望に思える。異種分散通信とコンピューティングリソースの効率的なオーケストレーションを促進する。本稿では,複数のクラウドエッジコラボレーション技術の長所と短所を議論した後,そのコンピューティング能力に応じて,適切なllmをエッジとクラウドに適切にデプロイするためにnetgptを展開する。さらに、エッジllmは、パーソナライズされたプロンプト完了のためにロケーションベースの情報を効率的に活用することができ、クラウドllmとのインタラクションの恩恵を受ける。エッジとクラウドに代表的オープンソースLLM(例えばGPT-2ベースとLLaMAモデル)をデプロイした後、低ランク適応に基づく軽量微調整に基づくNetGPTの実現可能性を示す。続いて、ネイティブ人工知能(AI)ネットワークアーキテクチャがNetGPTに必要となる重要な変更を強調し、特に通信とコンピューティングリソースのより深い統合と論理的AIワークフローの慎重な校正に焦点を当てた。さらに,NetGPT の副産物的メリットとして,エッジ LLM がトレンドを予測し,意図を推測する驚くべき能力を備えている。簡単に言うと、NetGPTはパーソナライズされた生成サービスをプロビジョニングする以上の、有望なネイティブAIネットワークアーキテクチャである、ということです。 Large language models (LLMs) have triggered tremendous success to empower daily life by generative information, and the personalization of LLMs could further contribute to their applications due to better alignment with human intents. Towards personalized generative services, a collaborative cloud-edge methodology sounds promising, as it facilitates the effective orchestration of heterogeneous distributed communication and computing resources. In this article, after discussing the pros and cons of several candidate cloud-edge collaboration techniques, we put forward NetGPT to capably deploy appropriate LLMs at the edge and the cloud in accordance with their computing capacity. In addition, edge LLMs could efficiently leverage location-based information for personalized prompt completion, thus benefiting the interaction with cloud LLMs. 
After deploying representative open-source LLMs (e.g., GPT-2-base and LLaMA model) at the edge and the cloud, we present the feasibility of NetGPT on the basis of low-rank adaptation-based light-weight fine-tuning. Subsequently, we highlight substantial essential changes required for a native artificial intelligence (AI) network architecture towards NetGPT, with special emphasis on deeper integration of communications and computing resources and careful calibration of logical AI workflow. Furthermore, we demonstrate several by-product benefits of NetGPT, given edge LLM's astonishing capability to predict trends and infer intents, which possibly leads to a unified solution for intelligent network management \& orchestration. In a nutshell, we argue that NetGPT is a promising native-AI network architecture beyond provisioning personalized generative services.	翻訳日:2023-07-25 20:17:28 公開日:2023-07-23
# RepViT: ViTの視点からモバイルCNNを再考 RepViT: Revisiting Mobile CNN From ViT Perspective ( http://arxiv.org/abs/2307.09283v2 ) ライセンス: Link先を確認	Ao Wang, Hui Chen, Zijia Lin, Hengjun Pu, Guiguang Ding	(参考訳) 近年、軽量Vision Transformer(ViT)は、リソース制約のあるモバイルデバイスにおいて、軽量畳み込みニューラルネットワーク(CNN)と比較して優れた性能と低レイテンシを示している。この改善は通常、モデルがグローバル表現を学習できるようにするマルチヘッド自己注意モジュールによるものとされる。しかし、軽量ViTと軽量CNNのアーキテクチャ上の差異は十分に検討されていない。本研究では、軽量CNNの効率的な設計を再考し、モバイルデバイスにおけるその可能性を強調する。我々は、軽量ViTの効率的なアーキテクチャ選択を統合することで、標準的な軽量CNN、特にMobileNetV3のモバイルフレンドリ性を段階的に強化する。最終的に、純粋な軽量CNNの新しいファミリーであるRepViTが得られる。大規模な実験によると、RepViTは既存の最先端の軽量ViTよりも優れており、様々なビジョンタスクにおいて良好なレイテンシを示している。 ImageNetでは、RepViTはiPhone 12上で1ms近いレイテンシで80%以上のトップ1精度を達成しており、これは我々の知る限り軽量モデルとして初めてである。我々の最大のモデルであるRepViT-M3は、わずか1.3msのレイテンシで81.4%の精度を得る。コードとトレーニングされたモデルは https://github.com/jameslahm/RepViT で入手できる。 Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on resource-constrained mobile devices. This improvement is usually attributed to the multi-head self-attention module, which enables the model to learn global representations. However, the architectural disparities between lightweight ViTs and lightweight CNNs have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs and emphasize their potential for mobile devices. We incrementally enhance the mobile-friendliness of a standard lightweight CNN, specifically MobileNetV3, by integrating the efficient architectural choices of lightweight ViTs. This ends up with a new family of pure lightweight CNNs, namely RepViT. Extensive experiments show that RepViT outperforms existing state-of-the-art lightweight ViTs and exhibits favorable latency in various vision tasks. On ImageNet, RepViT achieves over 80% top-1 accuracy with nearly 1ms latency on an iPhone 12, which is the first time for a lightweight model, to the best of our knowledge. 
Our largest model, RepViT-M3, obtains 81.4% accuracy with only 1.3ms latency. The code and trained models are available at https://github.com/jameslahm/RepViT.	翻訳日:2023-07-25 20:07:34 公開日:2023-07-23
# TokenFlow: 一貫性のあるビデオ編集機能 TokenFlow: Consistent Diffusion Features for Consistent Video Editing ( http://arxiv.org/abs/2307.10373v2 ) ライセンス: Link先を確認	Michal Geyer and Omer Bar-Tal and Shai Bagon and Tali Dekel	(参考訳) 生成的AI革命は、最近ビデオにまで拡大した。それでも、現在の最先端のビデオモデルは、生成したコンテンツの視覚的品質とユーザコントロールの観点から、画像モデルに遅れを取っている。本稿では,テキストから画像への拡散モデルのパワーをテキスト駆動ビデオ編集のタスクに活用するフレームワークを提案する。具体的には、ソースビデオとターゲットテキストプロンプトを与えられた場合、入力ビデオの空間レイアウトと動きを維持しながら、対象テキストに準拠した高品質な映像を生成する。本手法は, 拡散特徴空間の一貫性を強制することにより, 編集映像の一貫性が得られることを示す。モデルで容易に利用できるフレーム間対応に基づいて拡散特徴を明示的に伝播することにより、これを実現できる。したがって,本フレームワークはトレーニングや微調整を一切必要とせず,市販のテキスト画像編集手法と併用できる。実世界の様々なビデオで最先端の編集結果を示す。 Webページ: https://diffusion-tokenflow.github.io/ The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we present a framework that harnesses the power of a text-to-image diffusion model for the task of text-driven video editing. Specifically, given a source video and a target text-prompt, our method generates a high-quality video that adheres to the target text, while preserving the spatial layout and motion of the input video. Our method is based on a key observation that consistency in the edited video can be obtained by enforcing consistency in the diffusion feature space. We achieve this by explicitly propagating diffusion features based on inter-frame correspondences, readily available in the model. Thus, our framework does not require any training or fine-tuning, and can work in conjunction with any off-the-shelf text-to-image editing method. We demonstrate state-of-the-art editing results on a variety of real-world videos. Webpage: https://diffusion-tokenflow.github.io/	翻訳日:2023-07-25 19:58:19 公開日:2023-07-23
# SentimentGPT: 高度な感情分析のためのGPTの活用と現在の機械学習からの脱却 SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its Departure from Current Machine Learning ( http://arxiv.org/abs/2307.10234v2 ) ライセンス: Link先を確認	Kiana Kheiri and Hamid Karimi	(参考訳) 本研究では,感情分析におけるGPT(Generative Pretrained Transformer)の方法論について,特にSemEval 2017データセットのタスク4の文脈で詳細に検討する。主な戦略は3つある。 1)先進的なGPT-3.5 Turboを用いたプロンプトエンジニアリング 2)GPTモデルの微調整、及び 3)埋め込み分類への独創的なアプローチ。この研究は、これらの戦略と個々のGPTモデル間の詳細な比較洞察をもたらし、その特有の強みと潜在的な限界を明らかにする。さらに、この研究では、これらのGPTベースの方法論を、以前同じデータセットで使用されていた他の高性能モデルと比較した。その結果, GPT手法は予測性能において, F1スコアで最先端と比較して22%以上の有意な優位性を示した。さらに、文脈理解や皮肉の検出など、感情分析タスクにおける共通の課題について考察する。これらの複雑さを効果的に扱うGPTモデルの強化された能力を強調している。これらの知見は、感情分析におけるGPTモデルの可能性を強調し、今後の研究の舞台となる。コードはhttps://github.com/DSAatUSU/SentimentGPTで見ることができる。 This study presents a thorough examination of various Generative Pretrained Transformer (GPT) methodologies in sentiment analysis, specifically in the context of Task 4 on the SemEval 2017 dataset. Three primary strategies are employed: 1) prompt engineering using the advanced GPT-3.5 Turbo, 2) fine-tuning GPT models, and 3) an inventive approach to embedding classification. The research yields detailed comparative insights among these strategies and individual GPT models, revealing their unique strengths and potential limitations. Additionally, the study compares these GPT-based methodologies with other current, high-performing models previously used with the same dataset. The results illustrate the significant superiority of the GPT approaches in terms of predictive performance, more than 22% in F1-score compared to the state-of-the-art. Further, the paper sheds light on common challenges in sentiment analysis tasks, such as understanding context and detecting sarcasm. It underscores the enhanced capabilities of the GPT models to effectively handle these complexities. 
Taken together, these findings highlight the promising potential of GPT models in sentiment analysis, setting the stage for future research in this field. The code can be found at https://github.com/DSAatUSU/SentimentGPT	翻訳日:2023-07-25 19:58:04 公開日:2023-07-23
# 心筋SPECT画像再構成のためのトランスフォーマーベースデュアルドメインネットワーク Transformer-based Dual-domain Network for Few-view Dedicated Cardiac SPECT Image Reconstructions ( http://arxiv.org/abs/2307.09624v2 ) ライセンス: Link先を確認	Huidong Xie, Bo Zhou, Xiongchao Chen, Xueqi Guo, Stephanie Thorn, Yi-Hwa Liu, Ge Wang, Albert Sinusas, Chi Liu	(参考訳) 心臓血管疾患(CVD)は世界中で死因の主要な疾患であり, SPECTを用いた心筋灌流像はCVDの診断に広く用いられている。 GE 530/570c専用心筋SPECTスキャナは静止形状を採用し、19個の投射を同時に取得して感度を高め、ダイナミックイメージングを実現する。しかし、角サンプリングの限られた量は画質に悪影響を及ぼす。静止データから高品質な画像を生成するディープラーニング手法を実装できる。これは本質的には数ビューの撮像問題である。本研究では,高品質3d心筋spect画像再構成のための新しい3dトランスフォーマーベースのデュアルドメインネットワークtip-netを提案する。本手法は,プロジェクション・ツー・イメージ・ドメイン・トランスフォーマーのカスタマイズにより,投影データから直接3次元SPECT画像を再構成することを目的としている。そして、その復元出力と元の少数視点再構成を考慮し、画像ドメイン再構築ネットワークを用いて再構成をさらに洗練する。 fda 510(k)-cleared clinical softwareによって定量化された心臓カテーテル画像、核心科医からの診断解釈、および欠陥サイズによって検証された本手法は、ヒト研究において従来の基準法と比較して高い心不全コントラストを有する画像を生成し、静止数ビュー専用心筋spectスキャナを用いて高品質の欠陥可視化を可能にする。 Cardiovascular disease (CVD) is the leading cause of death worldwide, and myocardial perfusion imaging using SPECT has been widely used in the diagnosis of CVDs. The GE 530/570c dedicated cardiac SPECT scanners adopt a stationary geometry to simultaneously acquire 19 projections to increase sensitivity and achieve dynamic imaging. However, the limited amount of angular sampling negatively affects image quality. Deep learning methods can be implemented to produce higher-quality images from stationary data. This is essentially a few-view imaging problem. In this work, we propose a novel 3D transformer-based dual-domain network, called TIP-Net, for high-quality 3D cardiac SPECT image reconstructions. Our method aims to first reconstruct 3D cardiac SPECT images directly from projection data without the iterative reconstruction process by proposing a customized projection-to-image domain transformer. Then, given its reconstruction output and the original few-view reconstruction, we further refine the reconstruction using an image-domain reconstruction network. 
Validated by cardiac catheterization images, diagnostic interpretations from nuclear cardiologists, and defect size quantified by an FDA 510(k)-cleared clinical software, our method produced images with higher cardiac defect contrast on human studies compared with previous baseline methods, potentially enabling high-quality defect visualization using stationary few-view dedicated cardiac SPECT scanners.	翻訳日:2023-07-25 19:56:01 公開日:2023-07-23
# モナディック深層学習 Monadic Deep Learning ( http://arxiv.org/abs/2307.12187v1 ) ライセンス: Link先を確認	Bo Yang, Zhihao Zhang Kirisame Marisa and Kai Shi	(参考訳) JavaとScalaコミュニティは、非常に成功したビッグデータエコシステムを構築しました。しかし、その上で動作するニューラルネットワークのほとんどは動的型付けプログラミング言語でモデル化されている。これらの動的型付きディープラーニングフレームワークは、ニューラルネットワークを多くのトレーニング可能な変数を含む微分可能な表現として扱い、トレーニング時にそれらの表現を自動微分する。 2019年まで、静的型付け言語における学習フレームワークは、従来のフレームワークの表現力を提供していなかった。ユーザは、ハードコードされたバックプロパゲーションのために多くの定型コードを作成しない限り、カスタムアルゴリズムを使用できない。 DeepLearning.scala 2でこの問題を解決しました。私たちの貢献は次のとおりです。 1. 複数のトレーニング可能な変数を含む静的型付き関数に対して,逆モードで自動微分を行う新しい手法を発見し,メタ言語と自由に相互運用できるようにした。 2. 動的ニューラルネットワークを表現するモナド表現をユーザが作成できるように,モナドとモナド変換器のセットを設計した。 3. これらのモナドとともに、複数の計算を並列に行うためのアプリカティブ関手を提供する。これらの機能により、DeepLearning.scalaのユーザは、直感的で簡潔な方法で複雑なニューラルネットワークを作成でき、型安全性を維持できた。 The Java and Scala community has built a very successful big data ecosystem. However, most of the neural networks running on it are modeled in dynamically typed programming languages. These dynamically typed deep learning frameworks treat neural networks as differentiable expressions that contain many trainable variables, and perform automatic differentiation on those expressions when training them. Until 2019, none of the learning frameworks in statically typed languages provided the expressive power of traditional frameworks. Their users are not able to use custom algorithms unless they write plenty of boilerplate code for hard-coded back-propagation. We solved this problem in DeepLearning.scala 2. Our contributions are: 1. We discovered a novel approach to perform automatic differentiation in reverse mode for statically typed functions that contain multiple trainable variables and can interoperate freely with the metalanguage. 2. We designed a set of monads and monad transformers, which allow users to create monadic expressions that represent dynamic neural networks. 3. Along with these monads, we provide some applicative functors, to perform multiple calculations in parallel. 
With these features, users of DeepLearning.scala were able to create complex neural networks in an intuitive and concise way, and still maintain type safety.	翻訳日:2023-07-25 17:30:17 公開日:2023-07-23
# FATRER:高精度かつロバストな会話感情認識のためのフルアテンショントピック正規化器 FATRER: Full-Attention Topic Regularizer for Accurate and Robust Conversational Emotion Recognition ( http://arxiv.org/abs/2307.12221v1 ) ライセンス: Link先を確認	Yuzhao Mao, Di Lu, Xiaojie Wang, Yang Zhang	(参考訳) 本稿では,会話発話における対話者の感情の理解に焦点をあてる。この文献における先行研究は、主により正確な感情予測に焦点をあて、一方で、局所的な文脈が敵の攻撃によって破壊されるときのモデル堅牢性を無視している。正確性を確保しつつ頑健性を維持するため,会話中のローカルコンテキストをモデル化する際の感情関連グローバルビューを可能にする,フルアテンショントピック正規化器によって強化された感情認識器を提案する。表現と損失の両方の観点から正規化を実装するために,共同トピックモデリング戦略を導入する。過剰規則化を避けるため、従来のトピックモデリングに存在する事前分布の制約を廃止し、アテンションアライメントに基づく確率的近似を行う。実験により,我々のモデルは最先端モデルよりも好適な結果が得られ,3種類の敵攻撃による強靭性が得られることが示された。 This paper concentrates on the understanding of interlocutors' emotions evoked in conversational utterances. Previous studies in this literature mainly focus on more accurate emotional predictions, while ignoring model robustness when the local context is corrupted by adversarial attacks. To maintain robustness while ensuring accuracy, we propose an emotion recognizer augmented by a full-attention topic regularizer, which enables an emotion-related global view when modeling the local context in a conversation. A joint topic modeling strategy is introduced to implement regularization from both representation and loss perspectives. To avoid over-regularization, we drop the constraints on prior distributions that exist in traditional topic modeling and perform probabilistic approximations based entirely on attention alignment. Experiments show that our models obtain more favorable results than state-of-the-art models, and gain convincing robustness under three types of adversarial attacks.	翻訳日:2023-07-25 17:22:03 公開日:2023-07-23
# プログレッシブ・レネエント・監督による高解像度リモートセンシング画像からの建物足跡分割の迅速化 Expediting Building Footprint Segmentation from High-resolution Remote Sensing Images via progressive lenient supervision ( http://arxiv.org/abs/2307.12220v1 ) ライセンス: Link先を確認	Haonan Guo, Bo Du, Chen Wu, Xin Su, Liangpei Zhang	(参考訳) リモートセンシング画像からのビルフットプリントセグメンテーションの有効性は,モデル転送の有効性によって阻害されている。既存の多くのビルセグメンテーション手法は、イメージネットで事前学習された新しく開発されたバックボーンネットワークからエンコーダを微調整したu-netのエンコーダ-デコーダアーキテクチャに基づいて開発された。しかし、既存のデコーダ設計の重い計算負荷は、これらの現代のエンコーダネットワークをリモートセンシングタスクに移すのを妨げている。広く採用されている深層監視戦略でさえ、フォアグラウンドと背景画素が混在するハイブリッド領域において、これらの課題を軽減できない。本稿では,既存のデコーダネットワークの設計を包括的に評価し,学習効率と有効性を高めるためにbfsegと呼ばれる効率的な枠組みを提案する。具体的には,大規模にまたがる簡易かつ高速な特徴融合を容易にする高密結合型特徴融合デコーダネットワークを提案する。さらに,深層監視過程におけるダウンサンプリング・グラウンド真理におけるハイブリッド領域の無効性を考慮して,ネットワークが深層監視から適切な知識を学習できる寛大な深層監視・蒸留戦略を提案する。これらの進歩を基盤として、我々は、広範囲の新規開発エンコーダネットワークにまたがる性能と効率の優れた先行研究を一貫して超越した、建築セグメンテーションネットワークの新たなファミリーを開発した。コードはhttps://github.com/HaonanGuo/BFSeg-Efficient-Building-Footprint-Segmentation-Frameworkでリリースされる。 The efficacy of building footprint segmentation from remotely sensed images has been hindered by model transfer effectiveness. Many existing building segmentation methods were developed upon the encoder-decoder architecture of U-Net, in which the encoder is finetuned from the newly developed backbone networks that are pre-trained on ImageNet. However, the heavy computational burden of the existing decoder designs hampers the successful transfer of these modern encoder networks to remote sensing tasks. Even the widely-adopted deep supervision strategy fails to mitigate these challenges due to its invalid loss in hybrid regions where foreground and background pixels are intermixed. In this paper, we conduct a comprehensive evaluation of existing decoder network designs for building footprint segmentation and propose an efficient framework denoted as BFSeg to enhance learning efficiency and effectiveness. 
Specifically, a densely-connected coarse-to-fine feature fusion decoder network that facilitates easy and fast feature fusion across scales is proposed. Moreover, considering the invalidity of hybrid regions in the down-sampled ground truth during the deep supervision process, we present a lenient deep supervision and distillation strategy that enables the network to learn proper knowledge from deep supervision. Building upon these advancements, we have developed a new family of building segmentation networks, which consistently surpass prior works with outstanding performance and efficiency across a wide range of newly developed encoder networks. The code will be released on https://github.com/HaonanGuo/BFSeg-Efficient-Building-Footprint-Segmentation-Framework.	翻訳日:2023-07-25 17:21:47 公開日:2023-07-23
# 生成補間による分類器の分散外ロバスト性の向上 Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation ( http://arxiv.org/abs/2307.12219v1 ) ライセンス: Link先を確認	Haoyue Bai, Ceyuan Yang, Yinghao Xu, S.-H. Gary Chan, Bolei Zhou	(参考訳) ディープニューラルネットワークは、独立分散(i.i.d.)データから学習する上で優れた性能を達成する。しかし、トレーニングとテストが異なる分布から引き出される、od(out-of-distribution)データを扱う場合、その性能は著しく低下する。本稿では,生成モデルをデータ拡張源として活用して,ニューラル分類器の分布外ロバスト性を改善することを検討する。具体的には,多様なOoDサンプルを合成するために,複数のドメインから学習した生成モデルを融合させるジェネレーション補間法を開発した。ソースドメイン上で生成モデルをトレーニングする場合、モード崩壊に悩まされ、時にはデータバイアスを増幅する。代わりに、まず1つのソースドメイン上でStyleGANモデルをトレーニングし、それから他のドメインで微調整します。次に、ジェネレータのモデルパラメータを線形に補間し、新しいジェネレータセットを生成する。このような補間されたジェネレータは、分類器を訓練する余分なデータ拡張ソースとして使用される。補間係数は、増大方向及び強度を柔軟に制御することができる。また, 生成したoodサンプルの多様性をさらに向上させるために, スタイル混合機構を適用した。実験の結果,提案手法はトレーニング領域の多様性を明示的に向上し,データセット間のベースラインの整合性の向上と複数の分散シフトを実現する。 Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data. However, their performance deteriorates significantly when handling out-of-distribution (OoD) data, where the training and test are drawn from different distributions. In this paper, we explore utilizing the generative models as a data augmentation source for improving out-of-distribution robustness of neural classifiers. Specifically, we develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples. Training a generative model directly on the source domains tends to suffer from mode collapse and sometimes amplifies the data bias. Instead, we first train a StyleGAN model on one source domain and then fine-tune it on the other domains, resulting in many correlated generators where their model parameters have the same initialization thus are aligned. We then linearly interpolate the model parameters of the generators to spawn new sets of generators. 
Such interpolated generators are used as an extra data augmentation source to train the classifiers. The interpolation coefficients can flexibly control the augmentation direction and strength. In addition, a style-mixing mechanism is applied to further improve the diversity of the generated OoD samples. Our experiments show that the proposed method explicitly increases the diversity of training domains and achieves consistent improvements over baselines across datasets and multiple different distribution shifts.	翻訳日:2023-07-25 17:21:21 公開日:2023-07-23
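The parameter interpolation step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `interpolate_params`, the toy parameter dictionaries, and the use of flat Python lists in place of real StyleGAN tensors are all assumptions made for illustration. It only shows the linear blend of two aligned parameter sets, where the coefficient `alpha` controls the direction and strength of the blend.

```python
def interpolate_params(params_a, params_b, alpha):
    """Blend two aligned parameter dicts as (1 - alpha) * a + alpha * b.

    Hypothetical helper: each dict maps a parameter name to a flat list of
    floats, standing in for the weight tensors of two fine-tuned generators.
    """
    if params_a.keys() != params_b.keys():
        raise ValueError("generators must share the same parameter names")
    blended = {}
    for name, values_a in params_a.items():
        values_b = params_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        blended[name] = [
            (1.0 - alpha) * a + alpha * b for a, b in zip(values_a, values_b)
        ]
    return blended


# Toy "generators" (assumed values, not real model weights).
gen_domain_1 = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
gen_domain_2 = {"layer.weight": [4.0, 2.0], "layer.bias": [3.0]}

# alpha = 0.5 gives a generator halfway between the two domains.
midpoint = interpolate_params(gen_domain_1, gen_domain_2, 0.5)
```

The blend is only meaningful when the two parameter sets are aligned, which the abstract attributes to fine-tuning all generators from the same initialization.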
# 人工知能規制政策の総合的レビューと体系的分析 A Comprehensive Review and Systematic Analysis of Artificial Intelligence Regulation Policies ( http://arxiv.org/abs/2307.12218v1 ) ライセンス: Link先を確認	Weiyue Wu, Shaoshan Liu	(参考訳) 世界中の国々の文化と統治の相違により、現在、グローバルなAI規制分野に混乱をもたらしたAI規制ポリシーの提案が幅広い範囲に存在する。AI技術を適切に規制することは、法的制約と技術開発の間の微妙なバランスを必要とするため、非常に難しい。本稿では、まず、異なる地理的な場所と文化的背景からAI規制の提案を包括的にレビューする。そして、歴史的教訓から、AI規制提案の徹底的な分析を容易にする枠組みを開発する。最後に、これらのAI規制提案を体系的に分析し、それぞれの提案がどのように失敗しうるかを理解する。この研究は、歴史的教訓と分析方法を含むもので、統治機関が分割統治的なやり方でAI規制の混乱を解きほぐすのを支援することを目的としている。 Due to the cultural and governance differences of countries around the world, there currently exists a wide spectrum of AI regulation policy proposals that have created chaos in the global AI regulatory space. Properly regulating AI technologies is extremely challenging, as it requires a delicate balance between legal restrictions and technological developments. In this article, we first present a comprehensive review of AI regulation proposals from different geographical locations and cultural backgrounds. Then, drawing from historical lessons, we develop a framework to facilitate a thorough analysis of AI regulation proposals. Finally, we perform a systematic analysis of these AI regulation proposals to understand how each proposal may fail. This study, containing historical lessons and analysis methods, aims to help governing bodies untangle the AI regulatory chaos in a divide-and-conquer manner.	翻訳日:2023-07-25 17:20:56 公開日:2023-07-23
# LoLep: 局所学習平面と自己認識オクルージョン推論を用いた単一ビュービュー合成 LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference ( http://arxiv.org/abs/2307.12217v1 ) ライセンス: Link先を確認	Cong Wang, Yu-Ping Wang, Dinesh Manocha	(参考訳) 本稿では,1枚のRGB画像から局所学習平面を回帰してシーンを正確に表現するLoLepを提案する。深度情報がなければ、適切な平面位置の後退は難しい問題である。この問題を解決するために、各ビンの複数の平面に対する局所オフセットを回帰する分散サンプリング器を設計し、各ビンに分散空間を分割する。しかし,そのようなサンプルを用いただけでネットワークは収束しない。さらに,データセットの異なる分散分布と組み合わせた2つの最適化戦略を提案し,簡易かつ効果的な幾何的監督手法として,オクルージョン認識の再投影損失を提案する。また、オクルージョン推論を改善する自己注意機構を導入し、大きな特徴マップに自己意識を適用する問題に対処するブロックサンプリング自己意識(BS-SA)モジュールを提案する。提案手法の有効性を実証し,異なるデータセットで最新の結果を生成する。 MINEと比較してLPIPSは4.8%-9.0%、RVは83.1%-84.7%である。また,実世界の画像における性能評価を行い,その効果を示す。 We propose a novel method, LoLep, which regresses Locally-Learned planes from a single RGB image to represent scenes accurately, thus generating better novel views. Without the depth information, regressing appropriate plane locations is a challenging problem. To solve this issue, we pre-partition the disparity space into bins and design a disparity sampler to regress local offsets for multiple planes in each bin. However, only using such a sampler makes the network not convergent; we further propose two optimizing strategies that combine with different disparity distributions of datasets and propose an occlusion-aware reprojection loss as a simple yet effective geometric supervision technique. We also introduce a self-attention mechanism to improve occlusion inference and present a Block-Sampling Self-Attention (BS-SA) module to address the problem of applying self-attention to large feature maps. We demonstrate the effectiveness of our approach and generate state-of-the-art results on different datasets. Compared to MINE, our approach has an LPIPS reduction of 4.8%-9.0% and an RV reduction of 83.1%-84.7%. We also evaluate the performance on real-world images and demonstrate the benefits.	翻訳日:2023-07-25 17:20:43 公開日:2023-07-23
# DeepCL: 距離空間におけるリモートセンシング画像によるディープラーニング機能学習 DeepCL: Deep Change Feature Learning on Remote Sensing Images in the Metric Space ( http://arxiv.org/abs/2307.12208v1 ) ライセンス: Link先を確認	Haonan Guo, Bo Du, Chen Wu, Chengxi Han, Liangpei Zhang	(参考訳) 変化検出(CD)は、地球表面のダイナミクスを監視するための地球観測分野において重要な課題である。ディープラーニング技術の出現は、最近、自動CDを技術革新へと駆り立てている。それでも、ディープラーニングベースのCDメソッドは、2つの主要な問題に悩まされている。 1)時間的関係モデリングの不十分 2)擬似変更誤分類これらの課題に対処するために、計量学習の強い時間的モデリング能力とセグメンテーションの顕著な適合性を補完し、堅牢で説明可能なCDのためのDeep Change Feature Learning(DeepCL)フレームワークを提案する。まず,ハードサンプルとシンプルなサンプルの重要性を強調する,ハードサンプル認識型コントラスト損失の設計を行った。この損失により、双方向リモートセンシング画像間の時間的相関を明示的にモデル化することができる。さらに、モデル化された時間関係を、変化領域を検出するセグメンテーションプロセスを導く前に知識として活用する。 deepclフレームワークは理論的にも実験的にも徹底的に評価され、優れた特徴判別性、擬似変更に対する弾力性、様々なcdアルゴリズムへの適応性を示している。広範な比較実験は、最先端cdアプローチにおけるdeepclの量的・質的優位性を実証するものである。 Change detection (CD) is an important yet challenging task in the Earth observation field for monitoring Earth surface dynamics. The advent of deep learning techniques has recently propelled automatic CD into a technological revolution. Nevertheless, deep learning-based CD methods are still plagued by two primary issues: 1) insufficient temporal relationship modeling and 2) pseudo-change misclassification. To address these issues, we complement the strong temporal modeling ability of metric learning with the prominent fitting ability of segmentation and propose a deep change feature learning (DeepCL) framework for robust and explainable CD. Firstly, we designed a hard sample-aware contrastive loss, which reweights the importance of hard and simple samples. This loss allows for explicit modeling of the temporal correlation between bi-temporal remote sensing images. Furthermore, the modeled temporal relations are utilized as knowledge prior to guide the segmentation process for detecting change regions. 
The DeepCL framework is thoroughly evaluated both theoretically and experimentally, demonstrating its superior feature discriminability, resilience against pseudo changes, and adaptability to a variety of CD algorithms. Extensive comparative experiments substantiate the quantitative and qualitative superiority of DeepCL over state-of-the-art CD approaches.	翻訳日:2023-07-25 17:20:22 公開日:2023-07-23
# 不聴音声起動装置を攻撃するための敵エージェント Adversarial Agents For Attacking Inaudible Voice Activated Devices ( http://arxiv.org/abs/2307.12204v1 ) ライセンス: Link先を確認	Forrest McKee and David Noever	(参考訳) NIST National Vulnerability Database (NVD) が独立に収集したセキュリティ上の重大な脆弱性を裏付ける。我々のベースラインネットワークモデルは、攻撃者が不正な音声コマンドを使用してセキュアなラップトップ上の機密情報に無許可でアクセスするシナリオを示す。このベースラインネットワークモデル上で多くの攻撃シナリオをシミュレートし,ハードウェアの追加やデバイススキルの強化を伴わずに,物理的アクセスを通じて特権情報を発見し,所有する可能性を明らかにする。 microsoftのcyberbattlesimフレームワークを使用して、6つの強化学習アルゴリズムを評価し、悪用によるディープq学習が最適であることが分かり、より少ないステップですべてのノードの迅速なオーナシップにつながった。特にモバイルデバイス、音声のアクティベーション、および悪意あるアクターがほぼ超音域または非音域で盗聴攻撃を行っていることを特徴とする非線形マイクが特徴である。 2024年までに、この新たな攻撃面は、地球上の人々よりも多くのデジタル音声アシスタントを含んでいるが、従来のパッチやファームウェアの修正よりも少ない修正を提供する。 Our analysis of inaudible attacks on voice-activated devices confirms the alarming risk factor of 7.6 out of 10, underlining significant security vulnerabilities scored independently by NIST National Vulnerability Database (NVD). Our baseline network model showcases a scenario in which an attacker uses inaudible voice commands to gain unauthorized access to confidential information on a secured laptop. We simulated many attack scenarios on this baseline network model, revealing the potential for mass exploitation of interconnected devices to discover and own privileged information through physical access without adding new hardware or amplifying device skills. Using Microsoft's CyberBattleSim framework, we evaluated six reinforcement learning algorithms and found that Deep-Q learning with exploitation proved optimal, leading to rapid ownership of all nodes in fewer steps. Our findings underscore the critical need for understanding non-conventional networks and new cybersecurity measures in an ever-expanding digital landscape, particularly those characterized by mobile devices, voice activation, and non-linear microphones susceptible to malicious actors operating stealth attacks in the near-ultrasound or inaudible ranges. 
By 2024, this new attack surface might encompass more digital voice assistants than people on the planet yet offer fewer remedies than conventional patching or firmware fixes since the inaudible attacks arise inherently from the microphone design and digital signal processing.	翻訳日:2023-07-25 17:20:02 公開日:2023-07-23
# ncart: 表データのための神経分類と回帰木 NCART: Neural Classification and Regression Tree for Tabular Data ( http://arxiv.org/abs/2307.12198v1 ) ライセンス: Link先を確認	Jiaqi Luo, Shixin Xu	(参考訳) 深層学習モデルは、決定木の限界に対処し、半教師付き学習、オンライン学習、転帰学習といった貴重な応用を可能にするため、表形式のデータ分析で人気がある。しかし、これらのディープラーニングアプローチはしばしばトレードオフに遭遇する。一方、大規模なデータセットや高次元データセットを扱う場合、計算コストが高い場合がある。一方、解釈性に欠ける可能性があり、小規模なデータセットには適さない可能性がある。本研究では,これらの課題を克服するために,ニューラル分類と回帰木(NCART)と呼ばれる新しい解釈可能なニューラルネットワークを提案する。 ncartは残差ネットワークの修正版で、完全接続層を複数の可微分可換決定木に置き換える。アーキテクチャに決定木を統合することで、NCARTはニューラルネットワークのエンドツーエンド機能の恩恵を受けながら、解釈可能性を維持している。 NCARTアーキテクチャの単純さにより、さまざまなサイズのデータセットに適しており、最先端のディープラーニングモデルと比較して計算コストを削減できる。広範な数値実験により、ncartは既存のディープラーニングモデルよりも優れた性能を示し、木ベースのモデルとの強力な競合として確立された。 Deep learning models have become popular in the analysis of tabular data, as they address the limitations of decision trees and enable valuable applications like semi-supervised learning, online learning, and transfer learning. However, these deep-learning approaches often encounter a trade-off. On one hand, they can be computationally expensive when dealing with large-scale or high-dimensional datasets. On the other hand, they may lack interpretability and may not be suitable for small-scale datasets. In this study, we propose a novel interpretable neural network called Neural Classification and Regression Tree (NCART) to overcome these challenges. NCART is a modified version of Residual Networks that replaces fully-connected layers with multiple differentiable oblivious decision trees. By integrating decision trees into the architecture, NCART maintains its interpretability while benefiting from the end-to-end capabilities of neural networks. The simplicity of the NCART architecture makes it well-suited for datasets of varying sizes and reduces computational costs compared to state-of-the-art deep learning models. 
Extensive numerical experiments demonstrate the superior performance of NCART compared to existing deep learning models, establishing it as a strong competitor to tree-based models.	翻訳日:2023-07-25 17:19:34 公開日:2023-07-23
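To make the idea of a differentiable oblivious decision tree concrete, here is a small sketch of one common way such a tree can be softened. It is an assumption-laden illustration, not NCART's actual formulation: the function `soft_oblivious_tree`, the sigmoid gating, and all parameter values are hypothetical. An oblivious tree applies the same split at every node of a given depth, so a depth-d tree needs only d splits; replacing each hard split with a sigmoid makes the output a smooth function of the split parameters.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_oblivious_tree(x, feature_weights, thresholds, leaf_values):
    """Evaluate a soft oblivious tree on one input vector x.

    feature_weights[k] and thresholds[k] define the single split shared by
    all nodes at depth k; leaf_values has 2 ** depth entries, ordered by the
    left/right path encoded as a binary number (left = 0, right = 1).
    """
    # Probability of going right at each level.
    go_right = [
        sigmoid(sum(w * xi for w, xi in zip(ws, x)) - t)
        for ws, t in zip(feature_weights, thresholds)
    ]
    output = 0.0
    for leaf_index, path in enumerate(product((0, 1), repeat=len(go_right))):
        # Probability of reaching this leaf is the product over levels.
        prob = 1.0
        for p_right, turn in zip(go_right, path):
            prob *= p_right if turn else 1.0 - p_right
        output += prob * leaf_values[leaf_index]
    return output


# Depth-2 toy tree: the input sits exactly on both split boundaries,
# so every leaf is reached with probability 0.25.
prediction = soft_oblivious_tree(
    x=[1.0, 2.0],
    feature_weights=[[1.0, 0.0], [0.0, 1.0]],
    thresholds=[1.0, 2.0],
    leaf_values=[0.0, 1.0, 2.0, 3.0],
)
```

Because the leaf probabilities always sum to one, the prediction is a weighted average of the leaf values, and the learned splits and leaf values can still be read off directly.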
# LIST:シングルビュー3次元再構成のための空間変換器からの学習 LIST: Learning Implicitly from Spatial Transformers for Single-View 3D Reconstruction ( http://arxiv.org/abs/2307.12194v1 ) ライセンス: Link先を確認	Mohammad Samiul Arshad and William J. Beksi	(参考訳) 単一の2d画像から3dオブジェクトの幾何学的および位相的詳細を正確に再構築することは、コンピュータビジョンにおける根本的な課題である。既存の明示的・単純解法は、自閉幾何を復元したり、位相的構造を忠実に再構築するのに苦労する。このジレンマを解決するために,局所的および大域的画像特徴を利用した新しいニューラルアーキテクチャであるLISTを導入し,単一の画像から3次元物体の幾何学的および位相的構造を正確に再構築する。対象物体の粗い形状を予測するためにグローバル2次元特徴を用い,高分解能復元のための基盤として利用する。画像からの局所的な2次元特徴と粗い予測からの3次元特徴の両方を活用することで、任意の点とターゲット表面の間の符号付き距離を、暗黙の予測器で高精度に予測できる。さらに,このモデルではカメラ推定や画素アライメントは不要である。インプットビュー方向からの影響のない再構築を提供する。定性的かつ定量的な分析により,合成画像と実世界画像の両方から3次元オブジェクトを再構成する際のモデルの有用性を示す。 Accurate reconstruction of both the geometric and topological details of a 3D object from a single 2D image embodies a fundamental challenge in computer vision. Existing explicit/implicit solutions to this problem struggle to recover self-occluded geometry and/or faithfully reconstruct topological shape structures. To resolve this dilemma, we introduce LIST, a novel neural architecture that leverages local and global image features to accurately reconstruct the geometric and topological structure of a 3D object from a single image. We utilize global 2D features to predict a coarse shape of the target object and then use it as a base for higher-resolution reconstruction. By leveraging both local 2D features from the image and 3D features from the coarse prediction, we can predict the signed distance between an arbitrary point and the target surface via an implicit predictor with great accuracy. Furthermore, our model does not require camera estimation or pixel alignment. It provides an uninfluenced reconstruction from the input-view direction. Through qualitative and quantitative analysis, we show the superiority of our model in reconstructing 3D objects from both synthetic and real-world images against the state of the art.	翻訳日:2023-07-25 17:19:15 公開日:2023-07-23
# 機械的相互作用と輸送を考慮したスピン量子ビットに基づくプログラマブル量子プロセッサ Programmable Quantum Processors based on Spin Qubits with Mechanically-Mediated Interactions and Transport ( http://arxiv.org/abs/2307.12193v1 ) ライセンス: Link先を確認	F. Fung, E. Rosenfeld, J. D. Schaefer, A. Kabcenell, J. Gieseler, T. X. Zhou, T. Madhavan, N. Aslam, A. Yacoby, M. D. Lukin	(参考訳) 固体スピン量子ビットは量子情報処理の候補として期待できるが、大規模なマルチキュービットシステムにおける制御された相互作用と絡み合いは、現時点では達成が難しい。本稿では、ダイヤモンドナノピラー内の窒素空孔(nv)中心を走査プローブ配置で磁気機能化した窒化ケイ素メカニカル共振器に結合したマルチ量子ビットスピン系のプログラマブル制御法について述べる。量子ビットはナノメカニカル共振器との相互作用によって絡み合い、プログラマブル接続はナノピラー内の量子ビットの機械的輸送によって実現される。この方法の実現可能性を示すために,ナノビーム共振器に設置したマイクロマグネットの力学特性と磁場勾配を特徴付ける。さらに、核スピンメモリを利用して、近位スピン量子ビットのコヒーレントな操作と機械的輸送を示し、nvセンターを用いて振動するマイクロマグネットから時変磁場を検出し、7.7(9)hzのスピンメカニカルカップリングを抽出する。現実的な改善により、高協力性体制に到達でき、スピン量子ビットによるスケーラブルな量子情報処理への新たな道のりを提供する。 Solid state spin qubits are promising candidates for quantum information processing, but controlled interactions and entanglement in large, multi-qubit systems are currently difficult to achieve. We describe a method for programmable control of multi-qubit spin systems, in which individual nitrogen-vacancy (NV) centers in diamond nanopillars are coupled to magnetically functionalized silicon nitride mechanical resonators in a scanning probe configuration. Qubits can be entangled via interactions with nanomechanical resonators while programmable connectivity is realized via mechanical transport of qubits in nanopillars. To demonstrate the feasibility of this approach, we characterize both the mechanical properties and the magnetic field gradients around the micromagnet placed on the nanobeam resonator. Furthermore, we show coherent manipulation and mechanical transport of a proximal spin qubit by utilizing nuclear spin memory, and use the NV center to detect the time-varying magnetic field from the oscillating micromagnet, extracting a spin-mechanical coupling of 7.7(9) Hz. 
With realistic improvements the high-cooperativity regime can be reached, offering a new avenue towards scalable quantum information processing with spin qubits.	翻訳日:2023-07-25 17:18:55 公開日:2023-07-23
# ResWCAE:Residual Wavelet-Conditioned Autoencoderを用いた生体パターン画像デノーミング ResWCAE: Biometric Pattern Image Denoising Using Residual Wavelet-Conditioned Autoencoder ( http://arxiv.org/abs/2307.12255v1 ) ライセンス: Link先を確認	Youzhi Liang, Wen Liang	(参考訳) パターン画像による生体認証の利用は、IoT(Internet of Things)デバイスでますます普及している。しかし,このようなシステムの信頼性は,特に高レベルのノイズが存在する場合,画質の問題によって損なわれる可能性がある。汎用的な画像推論のために設計された最先端のディープラーニングアルゴリズムは有望だが、その多数のパラメータとユニークなバイオメトリックパターン検索の最適化の欠如は、これらのデバイスやシナリオに適さない。これらの課題に対応するために,本論文では,指紋の識別に特化して設計されたKLD(Kulback-Leibler divergence)正規化を備えたResidual Wavelet-Conditioned Convolutional Autoencoder(Res-WCAE)を提案する。 res-wcaeはイメージエンコーダとウェーブレットエンコーダという2つのエンコーダと、1つのデコーダからなる。画像エンコーダとデコーダ間の残差接続を利用して、ウェーブレットエンコーダから得られた特徴の圧縮表現に基づくボトルネック層をウェーブレット変換ドメインの近似および細部サブイメージを用いて保存する。 res-wcaeの有効性は最先端のデノイジング法に比較して評価され,res-wcaeは,高レベルのノイズが存在する場合において,特に高度に劣化した指紋画像において,これらの手法よりも優れていることが実証された。全体として、Res-WCAEは、コンパクトIoTデバイスの生体認証システムで直面する課題に対する解決策として、Promiseを示している。 The utilization of biometric authentication with pattern images is increasingly popular in compact Internet of Things (IoT) devices. However, the reliability of such systems can be compromised by image quality issues, particularly in the presence of high levels of noise. While state-of-the-art deep learning algorithms designed for generic image denoising have shown promise, their large number of parameters and lack of optimization for unique biometric pattern retrieval make them unsuitable for these devices and scenarios. In response to these challenges, this paper proposes a lightweight and robust deep learning architecture, the Residual Wavelet-Conditioned Convolutional Autoencoder (Res-WCAE) with a Kullback-Leibler divergence (KLD) regularization, designed specifically for fingerprint image denoising. Res-WCAE comprises two encoders - an image encoder and a wavelet encoder - and one decoder. 
Residual connections between the image encoder and decoder are leveraged to preserve fine-grained spatial features, while the bottleneck layer is conditioned on the compressed representation of features obtained from the wavelet encoder using approximation and detail subimages in the wavelet-transform domain. The effectiveness of Res-WCAE is evaluated against several state-of-the-art denoising methods, and the experimental results demonstrate that Res-WCAE outperforms these methods, particularly for heavily degraded fingerprint images in the presence of high levels of noise. Overall, Res-WCAE shows promise as a solution to the challenges faced by biometric authentication systems in compact IoT devices.	翻訳日:2023-07-25 17:11:32 公開日:2023-07-23
# 頭部運動パターンによる説明可能な抑うつ検出 Explainable Depression Detection via Head Motion Patterns ( http://arxiv.org/abs/2307.12241v1 ) ライセンス: Link先を確認	Monika Gahalawat, Raul Fernandez Rojas, Tanaya Guha, Ramanathan Subramanian, Roland Goecke	(参考訳) うつ病はマルチモーダルな非言語行動手段によって研究されているが、頭部運動行動はバイオマーカーとしてはあまり注目されていない。本研究は,2つの異なるアプローチを採用し,特徴を生かした抑うつ検出のための基本的な頭部運動単位であるキネム(kinemes)の有用性を示す。 (a)うつ病患者と健常者の両方に対応する頭部運動データからキネムを発見し、 (b) 健常なコントロールからのみキネムパターンを学習し, 患者とコントロールクラスの両方の再構成誤差から得られた統計計算を行った。機械学習手法を用いて,BlackDog と AVEC2013 データセットの抑うつ分類性能を評価する。その結果,(1)頭部運動パターンは抑うつ症状の検出に有効なバイオマーカーであり,(2)前報と一致した説明的キネムパターンは2つのクラスで観察できることがわかった。全体として,エピソード的な thin-slices に対する2値分類では BlackDog と AVEC2013 でそれぞれピーク F1 スコア 0.79 と 0.82 を達成し,AVEC2013 の動画単位ではピーク F1 は 0.72 である。 While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker. This study demonstrates the utility of fundamental head-motion units, termed kinemes, for depression detection by adopting two distinct approaches, and employing distinctive features: (a) discovering kinemes from head motion data corresponding to both depressed patients and healthy controls, and (b) learning kineme patterns only from healthy controls, and computing statistics derived from reconstruction errors for both the patient and control classes. Employing machine learning methods, we evaluate depression classification performance on the BlackDog and AVEC2013 datasets. Our findings indicate that: (1) head motion patterns are effective biomarkers for detecting depressive symptoms, and (2) explanatory kineme patterns consistent with prior findings can be observed for the two classes. Overall, we achieve peak F1 scores of 0.79 and 0.82, respectively, over BlackDog and AVEC2013 for binary classification over episodic thin-slices, and a peak F1 of 0.72 over videos for AVEC2013.	翻訳日:2023-07-25 17:10:58 公開日:2023-07-23
# DQ-Det: トランスフォーマーに基づくオブジェクト検出とセグメンテーションのための動的クエリの組み合わせ学習 DQ-Det: Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ( http://arxiv.org/abs/2307.12239v1 ) ライセンス: Link先を確認	Yiming Cui, Linjie Yang, Haichao Yu	(参考訳) Transformerベースの検出とセグメンテーション方法は、学習した検出クエリのリストを使用して、トランスフォーマーネットワークから情報を取得し、各クエリから特定のオブジェクトの位置とカテゴリを予測する。学習したクエリの無作為な凸結合は、対応するモデルに対して依然として有効であることを実証的に見出した。次に,画像の高レベルなセマンティクスに基づいて,動的係数との凸結合を学習することを提案する。生成された動的クエリ、名前付き変調クエリは、異なる画像内のオブジェクトの位置やカテゴリの事前分布をよりよくキャプチャする。変調クエリにより、オブジェクト検出、インスタンスセグメンテーション、パノプティックセグメンテーション、ビデオインスタンスセグメンテーションを含む複数のタスクにおいて、広範囲のDETRベースのモデルが一貫性と優れたパフォーマンスを達成する。 Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query. We empirically find that random convex combinations of the learned queries are still good for the corresponding models. We then propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image. The generated dynamic queries, named modulated queries, better capture the prior of object locations and categories in the different images. Equipped with our modulated queries, a wide range of DETR-based models achieve consistent and superior performance across multiple tasks including object detection, instance segmentation, panoptic segmentation, and video instance segmentation.	翻訳日:2023-07-25 17:10:32 公開日:2023-07-23
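A convex combination of learned queries, as mentioned in the abstract above, can be sketched in a few lines. This is only an illustration under stated assumptions: the names `softmax` and `modulated_query`, the toy query vectors, and the choice of a softmax to turn image-dependent scores into non-negative weights that sum to one are not taken from the paper, which does not specify these details in the abstract.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_query(base_queries, scores):
    """Convex combination of base queries.

    base_queries: list of equal-length vectors (the learned queries).
    scores: one score per base query; in a real model these would be
    predicted from image features (assumed here, not shown).
    """
    weights = softmax(scores)
    dim = len(base_queries[0])
    return [
        sum(w * q[d] for w, q in zip(weights, base_queries))
        for d in range(dim)
    ]


# Two toy 2-D queries with equal scores: the result is their average.
queries = [[0.0, 1.0], [2.0, 3.0]]
blended = modulated_query(queries, [0.0, 0.0])
```

Because the weights are non-negative and sum to one, the blended query always stays inside the convex hull of the learned queries.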
# ソフトウェアシステムにおける応答時間に基づくRemaining Useful Life (RUL)予測の実証 Demonstration of a Response Time Based Remaining Useful Life (RUL) Prediction for Software Systems ( http://arxiv.org/abs/2307.12237v1 ) ライセンス: Link先を確認	Ray Islam (Mohammad Rubyet Islam), Peter Sandborn	(参考訳) Prognostic and Health Management (PHM) は、電子工学や非エレクトロニクス分野のハードウェアシステムに広く応用されているが、ソフトウェアには適用されていない。ソフトウェアは時間とともに崩壊しないが、リリースサイクルで劣化する可能性がある。ソフトウェア健康管理は問題を特定する診断アセスメントに限られるが、予後アセスメントは将来問題が有害になる可能性を示唆している。ソフトウェア欠陥予測、ソフトウェア信頼性予測、ソフトウェアの予測メンテナンス、ソフトウェア劣化予測、ソフトウェアパフォーマンス予測といった関連する研究分野は存在するが、これら全ては歴史的データに基づいて構築された診断モデルであり、ソフトウェアに対するRULを予測することはできない。本稿では,故障予測とRUL推定のためのソフトウェアシステムへのPHMの概念の適用について述べる。具体的には,バージョン更新やアップグレード,モジュール変更,システム再設計,若化,メンテナンススケジューリング,予算編成,完全な廃棄といったソフトウェアシステムの意思決定に,PHMをどのように活用するかについて述べる。本稿では,利用パラメータ(例えば,リリース数とカテゴリ)と性能パラメータ(例えば応答時間)に基づいて,ソフトウェアシステムのRULを確率的かつ連続的に予測する手法を提案する。開発したモデルは、予測モデルによって生成された結果と実際のデータを比較して検証された。統計的検証(回帰検証、k-fold Cross Validation)も行われている。 Bugzillaアプリケーション用の公開データに基づくケーススタディが紹介されている。このケーススタディは、PHMの概念をソフトウェアシステムに適用し、RULを計算してシステム管理の意思決定を行うことを示した。 Prognostic and Health Management (PHM) has been widely applied to hardware systems in the electronics and non-electronics domains but has not been explored for software. While software does not decay over time, it can degrade over release cycles. Software health management is confined to diagnostic assessments that identify problems, whereas prognostic assessment potentially indicates when in the future a problem will become detrimental. Relevant research areas such as software defect prediction, software reliability prediction, predictive maintenance of software, software degradation, and software performance prediction, exist, but all of these represent diagnostic models built upon historical data, none of which can predict an RUL for software. This paper addresses the application of PHM concepts to software systems for fault predictions and RUL estimation. 
Specifically, this paper addresses how PHM can be used to make decisions for software systems such as version update and upgrade, module changes, system reengineering, rejuvenation, maintenance scheduling, budgeting, and total abandonment. This paper presents a method to prognostically and continuously predict the RUL of a software system based on usage parameters (e.g., the numbers and categories of releases) and performance parameters (e.g., response time). The model developed has been validated by comparing actual data with the results generated by predictive models. Statistical validation (regression validation and k-fold cross validation) has also been carried out. A case study based on publicly available data for the Bugzilla application is presented. This case study demonstrates that PHM concepts can be applied to software systems and RUL can be calculated to make system management decisions.	翻訳日:2023-07-25 17:10:17 公開日:2023-07-23
# オンラインストリーミングにおけるゲームスキル評価のためのマルチモーダル機械学習:CS:GOを事例として Multi-Modal Machine Learning for Assessing Gaming Skills in Online Streaming: A Case Study with CS:GO ( http://arxiv.org/abs/2307.12236v1 ) ライセンス: Link先を確認	Longxiang Zhang, Wenping Wang	(参考訳) オンラインストリーミングは、多くの注目を集める新興市場だ。ビデオからゲームスキルを評価することは、ストリーミングサービスプロバイダが才能あるゲーマーを見つけるための重要なタスクである。サービス提供者は、顧客にカスタマイズされたレコメンデーションとサービスプロモーションを提供するためにこの情報を必要とする。一方で、オンラインストリーミングはビジョン、オーディオ、テキストのモダリティを組み合わせるため、これは重要なマルチモーダル機械学習タスクでもある。本研究では、まずデータセットの欠陥を特定し、手動できれいにすることから始める。次に,複数のモダリティの結合表現を学ぶために,最新のエンド・ツー・エンドモデルのいくつかの変種を提案する。広範な実験を通じて,提案の有効性を実証する。さらに,提案モデルでは,意味のある表現を学習する代わりに,ユーザを識別する傾向がある。この問題への対処は今後の課題とする。 Online streaming is an emerging market that attracts much attention. Assessing gaming skills from videos is an important task for streaming service providers to discover talented gamers. Service providers require the information to offer customized recommendation and service promotion to their customers. Meanwhile, this is also an important multi-modal machine learning task since online streaming combines vision, audio and text modalities. In this study we begin by identifying flaws in the dataset and proceed to clean it manually. Then we propose several variants of the latest end-to-end models to learn a joint representation of multiple modalities. Through our extensive experimentation, we demonstrate the efficacy of our proposals. Moreover, we identify that our proposed models are prone to identifying users instead of learning meaningful representations. We leave addressing this issue to future work.	翻訳日:2023-07-25 17:09:51 公開日:2023-07-23
# MARS: 適応型マルチアクセラレータシステムにおけるDNNワークロードのためのマルチレベル並列処理 MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems ( http://arxiv.org/abs/2307.12234v1 ) ライセンス: Link先を確認	Guan Shen, Jieru Zhao, Zeke Wang, Zhe Lin, Wenchao Ding, Chentao Wu, Quan Chen, Minyi Guo	(参考訳) ディープニューラルネットワークの急速な進化とともに、ハードウェアシステムも急速に発展している。高いスケーラビリティと低い製造コストを達成する有望なソリューションとして、データセンター、クラウドプラットフォーム、SoCにマルチアクセラレータシステムが広く存在する。したがって、マルチアクセラレータシステムでは、利用可能な設計からアクセラレーションの適切な組み合わせを選択し、効率的なDNNマッピング戦略を探すという、困難な問題が発生する。この目的のために,計算対応アクセラレータ選択が可能な新しいマッピングフレームワークMARSを提案し,通信対応シャーディング戦略を適用して並列性を最大化する。実験の結果、MARSはベースラインと比較して典型的なDNNワークロードの平均で32.2%のレイテンシ削減を実現でき、59.4%のレイテンシ削減を実現している。 Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly. As a promising solution achieving high scalability and low manufacturing cost, multi-accelerator systems widely exist in data centers, cloud platforms, and SoCs. Thus, a challenging problem arises in multi-accelerator systems: selecting a proper combination of accelerators from available designs and searching for efficient DNN mapping strategies. To this end, we propose MARS, a novel mapping framework that can perform computation-aware accelerator selection, and apply communication-aware sharding strategies to maximize parallelism. Experimental results show that MARS can achieve 32.2% latency reduction on average for typical DNN workloads compared to the baseline, and 59.4% latency reduction on heterogeneous models compared to the corresponding state-of-the-art method.	翻訳日:2023-07-25 17:09:38 公開日:2023-07-23
# 自己教師付き学習表現による音声分離と認識の統合の探索 Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation ( http://arxiv.org/abs/2307.12231v1 ) ライセンス: Link先を確認	Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe	(参考訳) ニューラル音声分離は目覚ましい進歩を遂げており、自動音声認識(ASR)との統合は、マルチスピーカASRの実現に向けた重要な方向である。本研究は,asrフロントエンドとして残響および雑音残響シナリオにおける音声分離に関する洞察的考察を提供する。本稿では,マルチチャネル分離法,マスクベースのビームフォーミング,複雑なスペクトルマッピング,およびASRバックエンドモデルで使用する最良の特徴について検討する。本稿では,最近の自己教師付き学習表現(sslr)を特徴とし,フィルタバンク機能の場合の認識性能を向上させる。マルチ話者認識性能をさらに向上させるため,音声認識とSSLRの統合を念頭に設計したトレーニング戦略を提案する。 TF-GridNet ベースの複素スペクトルマッピングと WavLM ベースのSSLR は、残響 WHAMR! テストセットの2.5% ワードエラー率を実現し、既存のマスクベースの MVDR ビームフォーミングとフィルタバンク統合(28.9%)を大幅に上回った。 Neural speech separation has made remarkable progress and its integration with automatic speech recognition (ASR) is an important direction towards realizing multi-speaker ASR. This work provides an insightful investigation of speech separation in reverberant and noisy-reverberant scenarios as an ASR front-end. In detail, we explore multi-channel separation methods, mask-based beamforming and complex spectral mapping, as well as the best features to use in the ASR back-end model. We employ the recent self-supervised learning representation (SSLR) as a feature and improve the recognition performance from the case with filterbank features. To further improve multi-speaker recognition performance, we present a carefully designed training strategy for integrating speech separation and recognition with SSLR. The proposed integration using TF-GridNet-based complex spectral mapping and WavLM-based SSLR achieves a 2.5% word error rate in reverberant WHAMR! test set, significantly outperforming an existing mask-based MVDR beamforming and filterbank integration (28.9%).	翻訳日:2023-07-25 17:09:20 公開日:2023-07-23
# EchoGLAD:心エコー図における左室ランドマーク検出のための階層型グラフニューラルネットワーク EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms ( http://arxiv.org/abs/2307.12229v1 ) ライセンス: Link先を確認	Masoud Mokhtari, Mobina Mahdavi, Hooman Vaseli, Christina Luong, Purang Abolmaesumi, Teresa S. M. Tsang, Renjie Liao	(参考訳) 左心室の機能評価には,4つの目印位置を検出し,左心室の内部次元と周囲の筋肉の近似質量を測定する必要がある。このタスクを機械学習で自動化する鍵となる課題は、臨床ラベルの空間性、すなわち高次元画像のいくつかのランドマークピクセルだけが注釈付けされており、多くの先行研究が等方性ラベルの平滑化に大きく依存している。しかし、そのようなラベルの平滑化戦略は画像の解剖情報を無視し、偏見を生じさせる。この課題に対処するために、左室ランドマーク検出(EchoGLAD)のための心エコーを用いた階層グラフニューラルネットワーク(GNN)を導入する。私たちの主な貢献は 1)GNNによるマルチ解像度ランドマーク検出のための階層グラフ表現学習フレームワーク 2)多層的損失を用いた粒度の異なる階層的監視を行った。我々は,本モデルについて,分布内(ID)および分布外(OOD)設定下で,パブリックおよびプライベートデータセット上で評価する。 ID設定では、2つのデータセット上で1.46mmと1.86mmの最先端平均絶対誤差(MAE)を達成する。また,本モデルでは,従来の4.3mmの試験MAEよりもOODの一般化が優れていた。 The functional assessment of the left ventricle chamber of the heart requires detecting four landmark locations and measuring the internal dimension of the left ventricle and the approximate mass of the surrounding muscle. The key challenge of automating this task with machine learning is the sparsity of clinical labels, i.e., only a few landmark pixels in a high-dimensional image are annotated, leading many prior works to heavily rely on isotropic label smoothing. However, such a label smoothing strategy ignores the anatomical information of the image and induces some bias. To address this challenge, we introduce an echocardiogram-based, hierarchical graph neural network (GNN) for left ventricle landmark detection (EchoGLAD). Our main contributions are: 1) a hierarchical graph representation learning framework for multi-resolution landmark detection via GNNs; 2) induced hierarchical supervision at different levels of granularity using a multi-level loss. We evaluate our model on a public and a private dataset under the in-distribution (ID) and out-of-distribution (OOD) settings. 
For the ID setting, we achieve the state-of-the-art mean absolute errors (MAEs) of 1.46 mm and 1.86 mm on the two datasets. Our model also shows better OOD generalization than prior works with a testing MAE of 4.3 mm.	翻訳日:2023-07-25 17:09:00 公開日:2023-07-23
# 事前学習モデルに対する幾何認識適応 Geometry-Aware Adaptation for Pretrained Models ( http://arxiv.org/abs/2307.12226v1 ) ライセンス: Link先を確認	Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala	(参考訳) 著名なゼロショットモデルを含む機械学習モデルは、ラベルがより大きなラベル空間のごく一部に過ぎないデータセットでトレーニングされることが多い。そのような空間は、ラベルを距離で関連付けるメトリクスを備えている。我々は、トレーニングされたモデルを使って新しいクラスを確実に予測したり、ゼロショット予測の場合、追加のトレーニングなしでパフォーマンスを改善するための単純なアプローチを提案する。我々の手法は標準予測規則のドロップイン置換であり、argmaxをfr\'echet平均に置き換える。このアプローチを包括的に理論的に分析し (i)ラベル空間の直径、サンプルの複雑さ、モデル次元を交換する学習理論的結果 (ii)観測されていないクラスを予測できるシナリオの全範囲の特徴、および (iii)非観察クラス全体の予測ができない場合に最適なトレーニングクラスを得るための最適アクティブラーニング型次類選択手順。経験的に、簡単に利用できる外部メトリクスを使用することで、提案手法であるlokiは、imagenetのsimclrよりも29.7%改善され、数十万のクラスにスケールできる。そのようなメトリクスが利用できない場合、Lokiはクラス埋め込みから自己派生メトリクスを使用でき、CLIPのような事前訓練されたゼロショットモデルで10.5%改善される。 Machine learning models -- including prominent zero-shot models -- are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes -- or, in the case of zero-shot prediction, to improve its performance -- without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping argmax with the Fr\'echet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes for when it is not possible to predict the entire range of unobserved classes. 
Empirically, using easily-available external metrics, our proposed approach, Loki, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, Loki can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.	翻訳日:2023-07-25 17:08:39 公開日:2023-07-23
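The prediction rule described in the Loki abstract (replacing argmax with a Fréchet mean over a label metric space) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `frechet_mean_predict`, the use of squared distances, and the toy three-label metric are all assumptions made for the example.

```python
import numpy as np

def frechet_mean_predict(probs, dist):
    """Pick the label minimizing the expected squared distance.

    probs: shape (k,), model probabilities over the k observed classes.
    dist:  shape (n, k), dist[y, j] is the metric distance from candidate
           label y (observed or not) to observed class j.
    Returns the index of the candidate label with the lowest cost.
    """
    probs = np.asarray(probs, dtype=float)
    dist = np.asarray(dist, dtype=float)
    cost = (dist ** 2) @ probs  # expected squared distance per candidate
    return int(np.argmin(cost))

# Hypothetical toy metric: labels 0, 1, 2 lie on a line, and the model was
# trained only on labels 0 and 2 (label 1 is unobserved).
toy_dist = [[0, 2], [1, 1], [2, 0]]
print(frechet_mean_predict([0.5, 0.5], toy_dist))  # 1: the unobserved middle label
print(frechet_mean_predict([0.9, 0.1], toy_dist))  # 0: a confident model keeps its class
```

With a plain argmax the unobserved label 1 could never be returned; the metric is what lets an evenly split prediction land on it.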
# ASCON:低用量CTデノーミングのための解剖学的監視型コントラスト学習フレームワーク ASCON: Anatomy-aware Supervised Contrastive Learning Framework for Low-dose CT Denoising ( http://arxiv.org/abs/2307.12225v1 ) ライセンス: Link先を確認	Zhihao Chen, Qi Gao, Yi Zhang, Hongming Shan	(参考訳) 低線量ct(low-dose computed tomography)では,様々な深層学習法が提案されているが,そのほとんどが正常線量ct画像を用いてデノージングプロセスを監視する。これらの方法は通常、単一のct画像、特に人間の組織の解剖学的意味論における固有の相関を無視し、分別過程における解釈可能性に欠ける。本稿では,低用量ctデノーミングのための解剖学的意味論を探索し,解剖学的解釈可能性を提供しながら,教師付きコントラスト学習フレームワークasconを提案する。提案したASCONは、効率的な自己注意に基づくU-Net(ESAU-Net)とマルチスケールの解剖学的コントラストネットワーク(MAC-Net)の2つの新しい設計で構成されている。まず,グローバルな対話をよりよく捉え,高分解能な入力に適応させるために,チャネルワイド自己認識機構を用いて効率的なESAU-Netを導入する。第2に、MAC-Netは固有の解剖情報を取得するパッチワイド非競合モジュールと、固有の解剖学的一貫性を維持するピクセルワイドコントラストモジュールを組み込んでいる。 2つの公開低用量CTデノゲーションデータセットの大規模な実験結果から,ASCONの最先端モデルよりも優れた性能を示した。特筆すべきは,ASCONが低用量CTに初めて解剖学的解釈性を提供することだ。ソースコードはhttps://github.com/hao1635/ASCONで入手できる。 While various deep learning methods have been proposed for low-dose computed tomography (CT) denoising, most of them leverage the normal-dose CT images as the ground-truth to supervise the denoising process. These methods typically ignore the inherent correlation within a single CT image, especially the anatomical semantics of human tissues, and lack the interpretability on the denoising process. In this paper, we propose a novel Anatomy-aware Supervised CONtrastive learning framework, termed ASCON, which can explore the anatomical semantics for low-dose CT denoising while providing anatomical interpretability. The proposed ASCON consists of two novel designs: an efficient self-attention-based U-Net (ESAU-Net) and a multi-scale anatomical contrastive network (MAC-Net). First, to better capture global-local interactions and adapt to the high-resolution input, an efficient ESAU-Net is introduced by using a channel-wise self-attention mechanism. 
Second, MAC-Net incorporates a patch-wise non-contrastive module to capture inherent anatomical information and a pixel-wise contrastive module to maintain intrinsic anatomical consistency. Extensive experimental results on two public low-dose CT denoising datasets demonstrate superior performance of ASCON over state-of-the-art models. Remarkably, our ASCON provides anatomical interpretability for low-dose CT denoising for the first time. Source code is available at https://github.com/hao1635/ASCON.	翻訳日:2023-07-25 17:08:15 公開日:2023-07-23
# コンセンサス指向マルチエージェント通信による分散適応形成 Decentralized Adaptive Formation via Consensus-Oriented Multi-Agent Communication ( http://arxiv.org/abs/2307.12287v1 ) ライセンス: Link先を確認	Yuming Xiang, Sizhao Li, Rongpeng Li, Zhifeng Zhao and Honggang Zhang	(参考訳) アダプティブ・マルチエージェント形成制御は、エージェントの量変化を分散的に柔軟に調整する必要があるが、特に通信制限下では、マルチエージェントシステムにおいて最も困難な問題の1つである。本稿では,Consensus-based Decentralized Adaptive Formation (Cons-DecAF) フレームワークを提案する。具体的には,コンセンサス指向のマルチエージェント通信(ConsMAC)という新しいマルチエージェント強化学習手法を開発し,エージェントがグローバルな情報を認識し,近隣のメッセージを効果的に集約することで,地域のコンセンサスを確立する。その後,政策蒸留を利用して適応形成調整を行う。一方,剤の特定の位置を事前に割り当てる代わりに,ハウスドルフ距離による変位に基づく形成を行い,その形成効率を大幅に向上させる。シミュレーションによる実験結果から,提案手法は速度と安定性の両面において優れた性能を示した。 Adaptive multi-agent formation control, which requires the formation to flexibly adjust along with the quantity variations of agents in a decentralized manner, belongs to one of the most challenging issues in multi-agent systems, especially under communication-limited constraints. In this paper, we propose a novel Consensus-based Decentralized Adaptive Formation (Cons-DecAF) framework. Specifically, we develop a novel multi-agent reinforcement learning method, Consensus-oriented Multi-Agent Communication (ConsMAC), to enable agents to perceive global information and establish the consensus from local states by effectively aggregating neighbor messages. Afterwards, we leverage policy distillation to accomplish the adaptive formation adjustment. Meanwhile, instead of pre-assigning specific positions of agents, we employ a displacement-based formation by Hausdorff distance to significantly improve the formation efficiency. The experimental results through extensive simulations validate that the proposed method has achieved outstanding performance in terms of both speed and stability.	翻訳日:2023-07-25 17:02:03 公開日:2023-07-23
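The Cons-DecAF abstract measures formations with the Hausdorff distance rather than pre-assigned agent positions. A minimal sketch of that distance between two point sets is given below; how the paper turns it into a reward or objective is not stated in the abstract, so the function name `hausdorff_distance` and the toy coordinates are illustrative assumptions only.

```python
import numpy as np

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two point sets.

    a: shape (m, d), e.g. current agent positions.
    b: shape (n, d), e.g. target formation points.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # pairwise[i, j] = Euclidean distance between a[i] and b[j]
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = pairwise.min(axis=1).max()  # farthest a-point from its nearest b-point
    b_to_a = pairwise.min(axis=0).max()  # farthest b-point from its nearest a-point
    return float(max(a_to_b, b_to_a))

agents = [[0.0, 0.0], [1.0, 0.0]]
target = [[1.0, 1.0], [0.0, 1.0]]  # same shape, shifted up by 1, listed in another order
print(hausdorff_distance(agents, target))  # 1.0
```

Because the distance compares sets, the order in which agents or target points are listed does not matter, which is why no agent needs a fixed slot in the formation.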
# Milimili: クラウドソーシングによる並列データ収集 Milimili. Collecting Parallel Data via Crowdsourcing ( http://arxiv.org/abs/2307.12282v1 ) ライセンス: Link先を確認	Alexander Antonov	(参考訳) 本稿では,クラウドソーシングによる並列コーパスの収集手法を提案する。この手法は,品質を犠牲にするものの,プロの翻訳者を雇うよりも費用対効果が高い。さらに,Chechen-Russian と Fula-English の言語ペアについて収集した実験的な並列データを公開した。 We present a methodology for gathering a parallel corpus through crowdsourcing, which is more cost-effective than hiring professional translators, albeit at the expense of quality. Additionally, we have made available experimental parallel data collected for Chechen-Russian and Fula-English language pairs.	翻訳日:2023-07-25 17:01:47 公開日:2023-07-23
# ダウンストリーム・アグノスティック・アドバーサリの例 Downstream-agnostic Adversarial Examples ( http://arxiv.org/abs/2307.12280v1 ) ライセンス: Link先を確認	Ziqi Zhou, Shengshan Hu, Ruizhi Zhao, Qian Wang, Leo Yu Zhang, Junhui Hou, Hai Jin	(参考訳) 自己教師付き学習は、通常、大量の未ラベルデータを使用してエンコーダを事前訓練するが、これは汎用的な特徴抽出器として使用することができるため、下流のユーザは「大規模モデル」の利点を享受するためにのみ微調整を行う必要がある。この有望な見通しにもかかわらず、プリトレーニングエンコーダのセキュリティは、特にプリトレーニングエンコーダが商用に利用可能である場合に、まだ完全には調査されていない。本稿では,事前学習したエンコーダに基づいて,下流非依存の普遍的逆例を生成する最初のフレームワークであるadvencoderを提案する。 advencoderは、被害者が事前学習したエンコーダを継承する下流タスクをすべて騙すことのできる、一連の自然画像に対する普遍的な敵対的摂動またはパッチを構築することを目的としている。従来の逆行例とは異なり、プリトレーニングエンコーダはラベルの分類ではなく特徴ベクトルのみを出力する。そこで,我々はまず,画像の高周波成分情報を利用して,敵対例の生成を導く。次に,攻撃サロゲートデータセットの分布を学習し,攻撃成功率と伝達性を改善することにより,攻撃側摂動・パッチを構築するための生成攻撃フレームワークを設計する。その結果、攻撃者はトレーニング済みのデータセットや下流のデータセットを知らずにダウンストリームタスクを攻撃できることがわかった。また,プリトレーニングエンコーダに対する4つの防御を調整し,アドベンコーダの攻撃能力をさらに証明した。 Self-supervised learning usually uses a large amount of unlabeled data to pre-train an encoder which can be used as a general-purpose feature extractor, such that downstream users only need to perform fine-tuning operations to enjoy the benefit of "large model". Despite this promising prospect, the security of pre-trained encoder has not been thoroughly investigated yet, especially when the pre-trained encoder is publicly available for commercial use. In this paper, we propose AdvEncoder, the first framework for generating downstream-agnostic universal adversarial examples based on the pre-trained encoder. AdvEncoder aims to construct a universal adversarial perturbation or patch for a set of natural images that can fool all the downstream tasks inheriting the victim pre-trained encoder. Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors rather than classification labels. Therefore, we first exploit the high frequency component information of the image to guide the generation of adversarial examples. 
Then we design a generative attack framework to construct adversarial perturbations/patches by learning the distribution of the attack surrogate dataset to improve their attack success rates and transferability. Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset. We also tailor four defenses for pre-trained encoders, the results of which further prove the attack ability of AdvEncoder.	翻訳日:2023-07-25 17:01:42 公開日:2023-07-23
# FDCT: 透明物体の高速深度補完 FDCT: Fast Depth Completion for Transparent Objects ( http://arxiv.org/abs/2307.12274v1 ) ライセンス: Link先を確認	Tianan Li, Zhehan Chen, Huan Liu, Chen Wang	(参考訳) 深さの完成は、自律運転や3D再構築、操作といった多くのロボット作業にとって不可欠である。著しい進歩にもかかわらず、既存の手法は計算集約的であり、しばしば低消費電力ロボットプラットフォームのリアルタイム要求を満たさない。加えて、ほとんどのメソッドは不透明なオブジェクトのために設計されており、反射と屈折の特別な特性のために透明なオブジェクトに苦しむ。これらの課題に対処するため,我々は,オブジェクトポーズ推定などの下流タスクにも有効である透過的オブジェクト(fdct)のための高速深さ補完フレームワークを提案する。地域情報を活用し,グローバル情報と統合する際の過剰フィッティングを回避するために,新しい融合ブランチとショートカットを設計し,低レベル機能と損失関数を活用し、過剰フィッティングを抑制する。これにより,RGB-D画像のみからの深度推定を再現する,高精度でユーザフレンドリな深度補正フレームワークが実現される。広範な実験により、fdctは最先端の手法よりも高い精度で約70fpsで動作できることが示されている。また,fdctは対象把握タスクにおけるポーズ推定を改善できることを実証する。ソースコードはhttps://github.com/Nonmy/FDCTで入手できる。 Depth completion is crucial for many robotic tasks such as autonomous driving, 3-D reconstruction, and manipulation. Despite the significant progress, existing methods remain computationally intensive and often fail to meet the real-time requirements of low-power robotic platforms. Additionally, most methods are designed for opaque objects and struggle with transparent objects due to the special properties of reflection and refraction. To address these challenges, we propose a Fast Depth Completion framework for Transparent objects (FDCT), which also benefits downstream tasks like object pose estimation. To leverage local information and avoid overfitting issues when integrating it with global information, we design a new fusion branch and shortcuts to exploit low-level features and a loss function to suppress overfitting. This results in an accurate and user-friendly depth rectification framework which can recover dense depth estimation from RGB-D images alone. Extensive experiments demonstrate that FDCT can run about 70 FPS with a higher accuracy than the state-of-the-art methods. We also demonstrate that FDCT can improve pose estimation in object grasping tasks. 
The source code is available at https://github.com/Nonmy/FDCT.	翻訳日:2023-07-25 17:01:18 公開日:2023-07-23
# 線欠陥によるトポロジカル保護ヘリカルエッジ状態の後方散乱 Backscattering of topologically protected helical edge states by line defects ( http://arxiv.org/abs/2307.12271v1 ) ライセンス: Link先を確認	Mohadese Karimi, Mohsen Amini, Morteza Soltani, and Mozhgan Sadeghizadeh	(参考訳) 非磁性点欠陥の存在下での伝導の量子化は、2次元トポロジカル絶縁体におけるヘリカルエッジ状態のトポロジカル保護とスピンモーメントロックの結果である。この保護は、システムの量子ホール相におけるヘリカルエッジモードの後方散乱がないことを保証する。しかし,本研究は,この保護を損なう新たなアプローチを検討することに焦点を当てている。オンサイト不純物の線形配置は,ケイン・ミールモデルにおけるエッジ状態の位相的保護を効果的に高めることができる。この現象を調べるために,その幅にまたがるライン欠陥を含むアームチェアリボンについて検討する。タイト結合モデルと非平衡グリーン関数法を用いて,システムの伝送係数を計算する。その結果, 正のオンサイト電位に対するバルクギャップ下端近傍のエネルギーのコンダクタンスの抑制が明らかになった。この挙動をさらに理解するため,解析計算を行い,不純物チャネルの形成について議論する。このチャネルは、リボンの下端と上端をつなぐガップ内結合状態の重なりによって生じ、結果として後方散乱が容易になる。我々の説明は不純物の位置に近い場所における状態の局所密度の分析によって裏付けられている。 The quantization of conductance in the presence of non-magnetic point defects is a consequence of topological protection and the spin-momentum locking of helical edge states in two-dimensional topological insulators. This protection ensures the absence of backscattering of helical edge modes in the quantum Hall phase of the system. However, our study focuses on exploring a novel approach to disrupt this protection. We propose that a linear arrangement of on-site impurities can effectively lift the topological protection of edge states in the Kane-Mele model. To investigate this phenomenon, we consider an armchair ribbon containing a line defect spanning its width. Utilizing the tight-binding model and non-equilibrium Green's function method, we calculate the transmission coefficient of the system. Our results reveal a suppression of conductance at energies near the lower edge of the bulk gap for positive on-site potentials. To further comprehend this behavior, we perform analytical calculations and discuss the formation of an impurity channel. This channel arises due to the overlap of in-gap bound states, linking the bottom edge of the ribbon to its top edge, consequently facilitating backscattering. 
Our explanation is supported by the analysis of the local density of states at sites near the position of impurities.	翻訳日:2023-07-25 17:01:01 公開日:2023-07-23
# シーンテキスト認識のためのコンテキスト知覚並列デコーダ Context Perception Parallel Decoder for Scene Text Recognition ( http://arxiv.org/abs/2307.12270v1 ) ライセンス: Link先を確認	Yongkun Du and Zhineng Chen and Caiyan Jia and Xiaoting Yin and Chenxia Li and Yuning Du and Yu-Gang Jiang	(参考訳) Scene Text Recognition (STR) 法は高い精度と高速な推論速度を達成するのに苦労している。自己回帰(AR)ベースのSTRモデルは、事前に認識された文字を使って次の文字を反復的に復号する。精度の点で優位性を示す。しかし、この反復により推論速度も遅くなる。あるいは、並列デコード(PD)ベースのSTRモデルは、すべての文字を1つのデコードパスで推測する。推論速度の面では利点があるが、そのようなパスで堅牢な認識コンテキストを構築するのは難しいため、精度が悪くなる。本稿では,STRにおけるARデコーディングの実証的研究について述べる。また,ARデコーダの精度向上に加えて,ARデコーダの成功は,既存の研究で主張されている言語モデリングよりも,視覚的文脈認識のガイダンスを提供することにも寄与していることがわかった。その結果,1つのPDパスで文字列を復号化するためのコンテキスト知覚並列デコーダ (CPPD) を提案する。 CPPDは文字カウントモジュールと文字順序モジュールを考案する。テキストインスタンスが与えられた場合、前者は各文字の発生回数を推定し、後者は文字読み順序とプレースホルダーを推定する。キャラクタ予測タスクと合わせて、キャラクタシーケンスとキャラクタの出現場所をロバストに指示するコンテキストを構築し、arデコードによって伝達されるコンテキストをよく模倣する。英語と中国語のベンチマークの実験は、CPPDモデルが高い競争精度を達成することを示した。さらに、ARよりも約7倍高速で動作し、最も高速な認識器の1つである。コードはまもなくリリースされる。 Scene text recognition (STR) methods have struggled to attain high accuracy and fast inference speed. Autoregressive (AR)-based STR model uses the previously recognized characters to decode the next character iteratively. It shows superiority in terms of accuracy. However, the inference speed is slow also due to this iteration. Alternatively, parallel decoding (PD)-based STR model infers all the characters in a single decoding pass. It has advantages in terms of inference speed but worse accuracy, as it is difficult to build a robust recognition context in such a pass. In this paper, we first present an empirical study of AR decoding in STR. In addition to constructing a new AR model with the top accuracy, we find out that the success of AR decoder lies also in providing guidance on visual context perception rather than language modeling as claimed in existing studies. As a consequence, we propose Context Perception Parallel Decoder (CPPD) to decode the character sequence in a single PD pass. 
CPPD devises a character counting module and a character ordering module. Given a text instance, the former infers the occurrence count of each character, while the latter deduces the character reading order and placeholders. Together with the character prediction task, they construct a context that robustly tells what the character sequence is and where the characters appear, well mimicking the context conveyed by AR decoding. Experiments on both English and Chinese benchmarks demonstrate that CPPD models achieve highly competitive accuracy. Moreover, they run approximately 7x faster than their AR counterparts, and are also among the fastest recognizers. The code will be released soon.	翻訳日:2023-07-25 17:00:42 公開日:2023-07-23
# 教育における人間-AIハイブリッドエッセイの自動境界検出に向けて Towards Automatic Boundary Detection for Human-AI Hybrid Essay in Education ( http://arxiv.org/abs/2307.12267v1 ) ライセンス: Link先を確認	Zijie Zeng, Lele Sha, Yuheng Li, Kaixun Yang, Dragan Ga\v{s}evi\'c, Guanliang Chen	(参考訳) 現代の大規模言語モデル(LLM)、例えばChatGPTの助けを借りて、人間とAIの協調的な記述が大幅に促進された。技術進歩によってもたらされる利便性を認める一方で、教育者は、学生がLLMを利用して部分的に執筆課題を完了し、人間とAIのハイブリッドテキストを自分の作品として提出するのではないかと懸念している。そこで本研究では,人間が書いたコンテンツとAI生成コンテンツ間の遷移点を識別する境界検出問題として,ハイブリッドテキスト検出を形式化した。学生が書いたエッセイから文章を部分的に取り除き,不完全なエッセイを補うようChatGPTに指示することで,ハイブリッドエッセイデータセットを構築した。次に,(1)埋め込み学習過程においてAI生成コンテンツを人間が書いたコンテンツから分離し,(2)隣り合う2つのプロトタイプ間の距離(プロトタイプは埋め込み空間におけるハイブリッドテキストからの連続文の集合の平均)を計算して,互いに最も離れた2つのプロトタイプの間に境界が存在すると仮定する,2段階の検出手法を提案した。広範な実験を通じて,以下の主な知見を得た。(1)提案手法は,異なる実験設定においてベースライン手法を一貫して上回った。(2)埋め込み学習プロセス(ステップ1)は,提案手法の性能を大幅に向上させる。(3)単一境界ハイブリッドエッセイの境界を検出する場合,比較的大きなプロトタイプサイズを採用することで提案手法の性能が向上し,(2番目に良いベースライン手法に対して)ドメイン内設定で22%,ドメイン外設定で18%向上した。 Human-AI collaborative writing has been greatly facilitated with the help of modern large language models (LLMs), e.g., ChatGPT. While admitting the convenience brought by technology advancement, educators also have concerns that students might leverage LLMs to partially complete their writing assignment and pass off the human-AI hybrid text as their original work. Driven by such concerns, in this study, we investigated the automatic detection of human-AI hybrid text in education, where we formalized the hybrid text detection as a boundary detection problem, i.e., identifying the transition points between human-written content and AI-generated content. We constructed a hybrid essay dataset by partially removing sentences from the original student-written essays and then instructing ChatGPT to fill in for the incomplete essays. 
Then we proposed a two-step detection approach where we (1) separated AI-generated content from human-written content during the embedding learning process; and (2) calculated the distances between every two adjacent prototypes (a prototype is the mean of a set of consecutive sentences from the hybrid text in the embedding space) and assumed that the boundaries exist between the two prototypes that have the furthest distance from each other. Through extensive experiments, we summarized the following main findings: (1) the proposed approach consistently outperformed the baseline methods across different experiment settings; (2) the embedding learning process (i.e., step 1) can significantly boost the performance of the proposed approach; (3) when detecting boundaries for single-boundary hybrid essays, the performance of the proposed approach could be enhanced by adopting a relatively large prototype size, leading to a 22% improvement (against the second-best baseline method) in the in-domain setting and an 18% improvement in the out-of-domain setting.	翻訳日:2023-07-25 17:00:16 公開日:2023-07-23
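Step (2) of the two-step approach above, picking the boundary where adjacent prototypes are furthest apart, can be sketched as below. This is a simplified single-boundary illustration under stated assumptions: sentence embeddings are taken as given (the learned encoder from step 1 is not reproduced), prototypes are formed from sliding windows on either side of each candidate position, and the name `detect_boundary` is hypothetical.

```python
import numpy as np

def detect_boundary(embeddings, prototype_size=2):
    """Return the sentence index at which the second segment is assumed to start.

    embeddings: shape (n, d), one embedding per sentence, in document order.
    prototype_size: number of consecutive sentences averaged into a prototype.
    """
    emb = np.asarray(embeddings, dtype=float)
    best_pos, best_dist = None, -1.0
    for pos in range(1, len(emb)):
        # Prototype = mean embedding of the sentences just before / after pos.
        left = emb[max(0, pos - prototype_size):pos].mean(axis=0)
        right = emb[pos:pos + prototype_size].mean(axis=0)
        dist = float(np.linalg.norm(left - right))
        if dist > best_dist:  # keep the furthest-apart adjacent pair
            best_pos, best_dist = pos, dist
    return best_pos

# Toy embeddings: three "human" sentences near (0, 0), three "AI" ones near (1, 1).
toy = [[0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [1, 1]]
print(detect_boundary(toy, prototype_size=2))  # 3
```

Larger prototypes average over more sentences, which smooths out per-sentence noise; that is one plausible reading of why the abstract reports gains from a relatively large prototype size.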
# テキスト意味コミュニケーションのためのトランスフォーマベースジョイントソースチャネル符号化 Transformer-based Joint Source Channel Coding for Textual Semantic Communication ( http://arxiv.org/abs/2307.12266v1 ) ライセンス: Link先を確認	Shicong Liu, Zhen Gao, Gaojie Chen, Yu Su, Lu Peng	(参考訳) Space-Air-Ground-Sea統合ネットワークは、妨害に対するより堅牢でセキュアな送信技術を要求する。本稿では,文のモデル化とエンコードに先進的な自然言語処理技術を利用する,ロバスト伝送のためのテキスト意味伝達フレームワークを提案する。具体的には、テキスト文をワードピースアルゴリズムを用いてトークンに分割し、トランスフォーマベースのエンコーダによる意味抽出のためのトークンベクトルに埋め込む。符号化されたデータは、伝送のための固定長バイナリシーケンスに量子化され、バイナリ消去、対称、削除チャネルが検討される。受信されたバイナリシーケンスは、変換器デコーダによってさらに復号化され、文再構成に用いられるトークンとなる。提案手法は,ニューラルネットワークのパワーと注意機構を利用して,難易度の高い無線環境におけるテキストデータの信頼性と効率的な通信を実現する。 The Space-Air-Ground-Sea integrated network calls for more robust and secure transmission techniques against jamming. In this paper, we propose a textual semantic transmission framework for robust transmission, which utilizes the advanced natural language processing techniques to model and encode sentences. Specifically, the textual sentences are firstly split into tokens using wordpiece algorithm, and are embedded to token vectors for semantic extraction by Transformer-based encoder. The encoded data are quantized to a fixed length binary sequence for transmission, where binary erasure, symmetric, and deletion channels are considered for transmission. The received binary sequences are further decoded by the transformer decoders into tokens used for sentence reconstruction. Our proposed approach leverages the power of neural networks and attention mechanism to provide reliable and efficient communication of textual data in challenging wireless environments, and simulation results on semantic similarity and bilingual evaluation understudy prove the superiority of the proposed model in semantic transmission.	翻訳日:2023-07-25 16:59:40 公開日:2023-07-23
# マンダリン音声認識における高速アクセント領域拡張のためのメタ学習方式 A meta learning scheme for fast accent domain expansion in Mandarin speech recognition ( http://arxiv.org/abs/2307.12262v1 ) ライセンス: Link先を確認	Ziwei Zhu, Changhao Shan, Bihong Zhang, Jian Yu	(参考訳) 音声言語は、マンダリンとアクセントに大きな変化を示す。マンダリン自動音声認識(ASR)の性能は高いが,アクセントASRは依然として課題である。本稿では,マンダリンasrの性能を損なうことなくアクセントの分野を拡大する,マンダリン音声認識におけるアクセント領域の高速拡張のためのメタラーニング手法を提案する。メタラーニング(meta-learning)やlearn-to-learn(learning-to-learn)は、特定のドメインをオーバーフィットするだけでなく、複数のドメインで一般的な関係を学ぶことができる。そこでドメイン拡張タスクでメタラーニングを選択する。このより本質的な学習はアクセントドメイン拡張タスクのパフォーマンスを改善する。モデルパラメータのメタ学習と凍結の手法を組み合わせることで、異なるケースで認識性能がより安定し、トレーニングが約20%高速になる。本手法はアクセント領域拡張タスクにおいて,他の手法を約3%上回っている。ベースラインモデルと比較して、マンダリン試験セットが変化しない条件下では比較的37%改善する。さらに,この手法はアクセントテストセット上での相対的な性能改善を4%とした大量のデータに対して有効であることを示した。 Spoken languages show significant variation across Mandarin and its accents. Despite the high performance of Mandarin automatic speech recognition (ASR), accented ASR is still a challenging task. In this paper, we introduce meta-learning techniques for fast accent domain expansion in Mandarin speech recognition, which expand the range of accents without degrading the performance of Mandarin ASR. Meta-learning, or learning-to-learn, can learn general relations across multiple domains rather than over-fitting a specific domain, so we select meta-learning for the domain expansion task. This more essential learning improves performance on accent domain expansion tasks. We combine meta-learning with freezing of model parameters, which makes the recognition performance more stable in different cases and the training about 20% faster. Our approach outperforms other methods by about 3% relative in the accent domain expansion task. Compared to the baseline model, it achieves a 37% relative improvement under the condition that the Mandarin test set remains unchanged. In addition, this method also proved effective on a large amount of data, with a relative performance improvement of 4% on the accent test set.	翻訳日:2023-07-25 16:59:26 公開日:2023-07-23
# クロスインタラクションによるリモートセンシング画像からのビルディング・ロード協調抽出 Building-road Collaborative Extraction from Remotely Sensed Images via Cross-Interaction ( http://arxiv.org/abs/2307.12256v1 ) ライセンス: Link先を確認	Haonan Guo, Xin Su, Chen Wu, Bo Du, Liangpei Zhang	(参考訳) 建物は社会生産と人間生活の基本的な担体であり、道路はソーシャルネットワークを繋ぐリンクである。建築・道路情報は、地域連携開発、防災、自動運転等のフロンティア分野において重要な応用価値を有する。超高解像度(VHR)リモートセンシング画像からの建物や道路のマッピングがホットな研究トピックとなっている。しかし、既存の手法は道路と建物の間の強い空間的相関を無視し、孤立して抽出することが多い。建物と道路の相補的な利点をフル活用するために,マルチタスクとクロススケール機能インタラクションに基づくビル-ロード協調抽出手法を提案し,両タスクの精度を補完的に向上させる。マルチタスク学習におけるシーソー現象に対処する,タスク間での情報交換と各タスクのユニークな情報保存のためのマルチタスクインタラクションモジュールを提案する。建物と道路の外観や構造の変化を考慮し,異なるタスクに対する最適受信場を自動的に学習するクロススケール相互作用モジュールを設計する。個別にタスクを訓練する既存の多くの方法と比較して,提案手法は,タスク間および大規模機能間相互作用によって,建物と道路の相補的優位性を活用でき,タスクごとに最適な受信フィールドを自動的に選択できる。都市・農村の幅広いシナリオにおける実験により,提案アルゴリズムは優れた性能と効率でビルディングロード抽出を実現できることを示した。 Buildings are the basic carrier of social production and human life; roads are the links that interconnect social networks. Building and road information has important application value in the frontier fields of regional coordinated development, disaster prevention, auto-driving, etc. Mapping buildings and roads from very high-resolution (VHR) remote sensing images have become a hot research topic. However, the existing methods often ignore the strong spatial correlation between roads and buildings and extract them in isolation. To fully utilize the complementary advantages between buildings and roads, we propose a building-road collaborative extraction method based on multi-task and cross-scale feature interaction to improve the accuracy of both tasks in a complementary way. A multi-task interaction module is proposed to interact information across tasks and preserve the unique information of each task, which tackle the seesaw phenomenon in multitask learning. 
By considering the variation in appearance and structure between buildings and roads, a cross-scale interaction module is designed to automatically learn the optimal reception field for different tasks. Compared with many existing methods that train each task individually, the proposed collaborative extraction method can utilize the complementary advantages between buildings and roads by the proposed inter-task and inter-scale feature interactions, and automatically select the optimal reception field for different tasks. Experiments on a wide range of urban and rural scenarios show that the proposed algorithm can achieve building-road extraction with outstanding performance and efficiency.	翻訳日:2023-07-25 16:59:08 公開日:2023-07-23
# 物理インフォームドニューラルネットワークによる次元の呪いへの取り組み Tackling the Curse of Dimensionality with Physics-Informed Neural Networks ( http://arxiv.org/abs/2307.12306v1 ) ライセンス: Link先を確認	Zheyuan Hu, Khemraj Shukla, George Em Karniadakis, Kenji Kawaguchi	(参考訳) 次元の呪い (CoD) は計算資源に重い負担をかけ、次元が大きくなるにつれて計算コストが指数関数的に増加する。これは60年以上前にRichard Bellman氏が最初に指摘したように、高次元PDEを解決する上で大きな課題となる。近年、偏微分方程式(PDE)を高次元で数値的に解くことにある程度成功しているが、そのような計算は法外に高価であり、一般的な非線形PDEの高次元への真のスケーリングは達成されていない。本稿では,任意の高次元PDEを解くために,物理インフォームドニューラルネットワーク(PINN)をスケールアップする新しい手法を提案する。新しい手法はStochastic Dimension Gradient Descent (SDGD)と呼ばれ、PDEの勾配を異なる次元に対応するピースに分解し、トレーニングPINNの各イテレーションでこれらの次元のサブセットをランダムにサンプリングする。提案手法の収束保証とその他の望ましい性質を理論的に証明する。提案手法は,Hamilton-Jacobi-Bellman や Schr\"{o}dinger 方程式など,非常に難しい数千次元の高次元 PDE を,PINN のメッシュフリーアプローチを用いて,単一のGPU上で非常に高速に解けることを示す。例えば、非自明な非線形PDE(HJB-Lin方程式とBSB方程式)を、PINNを用いたSDGDを用いて1つのGPU上で6時間で100,000次元で解く。 SDGD は PINN の一般的な訓練手法であるため、SDGD は任意の高次元 PDE に対してスケールアップするために、現在および将来の PINN のどの変種にも適用することができる。 The curse-of-dimensionality (CoD) taxes computational resources heavily with exponentially increasing computational cost as the dimension increases. This poses great challenges in solving high-dimensional PDEs as Richard Bellman first pointed out over 60 years ago. While there has been some recent success in solving numerically partial differential equations (PDEs) in high dimensions, such computations are prohibitively expensive, and true scaling of general nonlinear PDEs to high dimensions has never been achieved. In this paper, we develop a new method of scaling up physics-informed neural networks (PINNs) to solve arbitrary high-dimensional PDEs. The new method, called Stochastic Dimension Gradient Descent (SDGD), decomposes a gradient of PDEs into pieces corresponding to different dimensions and samples randomly a subset of these dimensional pieces in each iteration of training PINNs. We theoretically prove the convergence guarantee and other desired properties of the proposed method. 
We experimentally demonstrate that the proposed method allows us to solve many notoriously hard high-dimensional PDEs, including the Hamilton-Jacobi-Bellman and the Schrödinger equations in thousands of dimensions very fast on a single GPU using the PINNs mesh-free approach. For example, we solve nontrivial nonlinear PDEs (the HJB-Lin equation and the BSB equation) in 100,000 dimensions in 6 hours on a single GPU using SDGD with PINNs. Since SDGD is a general training methodology of PINNs, SDGD can be applied to any current and future variants of PINNs to scale them up for arbitrary high-dimensional PDEs.	翻訳日:2023-07-25 16:51:19 公開日:2023-07-23
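The dimension-sampling idea in the SDGD abstract above can be illustrated with a toy example. The sketch below is not the authors' implementation: it only shows, under simplifying assumptions, how a sum over all d dimensions (here a Laplacian) can be replaced by a rescaled sum over a random subset of dimensions that is unbiased on average. The field `u`, the finite-difference helper, and all function names are hypothetical and chosen for illustration.

```python
import random


def u(x):
    """Toy scalar field u(x) = sum_i x_i^3 (hypothetical stand-in for a PINN output)."""
    return sum(xi ** 3 for xi in x)


def second_derivative(f, x, i, h=1e-2):
    """Central finite-difference estimate of d^2 f / dx_i^2 at the point x."""
    x_plus = list(x)
    x_minus = list(x)
    x_plus[i] += h
    x_minus[i] -= h
    return (f(x_plus) - 2.0 * f(x) + f(x_minus)) / (h * h)


def full_laplacian(f, x):
    """Sum the second derivative over every dimension (cost grows with d)."""
    return sum(second_derivative(f, x, i) for i in range(len(x)))


def sampled_laplacian(f, x, k, rng):
    """Estimate the Laplacian from k randomly chosen dimensions.

    The partial sum is rescaled by d / k so that its expectation over the
    random choice of dimensions equals the full sum.
    """
    d = len(x)
    dims = rng.sample(range(d), k)
    return (d / k) * sum(second_derivative(f, x, i) for i in dims)


rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(20)]

exact = full_laplacian(u, x)
estimates = [sampled_laplacian(u, x, 5, rng) for _ in range(4000)]
mean_estimate = sum(estimates) / len(estimates)

print("full Laplacian:          ", exact)
print("mean of sampled estimate:", mean_estimate)
```

Each sampled estimate touches only 5 of the 20 dimensions, so a single evaluation is cheaper but noisier; averaging many of them recovers the full sum. In the paper the sampling is applied to the gradient of the PINN loss during training, which this sketch does not attempt to reproduce.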
# アルゴンガス駆動型溶融プールダイナミクスの物理インフォームド機械学習 Physics-Informed Machine Learning of Argon Gas-Driven Melt Pool Dynamics ( http://arxiv.org/abs/2307.12304v1 ) ライセンス: Link先を確認	R. Sharma, W. Grace Guo, M. Raissi, Y.B. Guo	(参考訳) 金属添加物製造(AM)における溶融プールダイナミクスは, 印刷材料の安定性, 微細構造形成, 最終特性の処理に重要である。計算流体力学(CFD)を含む物理シミュレーションは, 溶融プール力学を予測する主要な手法である。しかし、物理ベースのシミュレーションアプローチは、計算コストが非常に高いという本質的な問題に苦しむ。本稿では,ニューラルネットワークと制御物理法則を統合し,温度,速度,圧力などの溶融プールのダイナミクスを速度に関するトレーニングデータを用いることなく予測する物理インフォームド機械学習(PIML)手法を提案する。このアプローチは、非常に非線形なナビエ-ストークス方程式を数値的に解くことを避け、計算コストを大幅に削減する。溶融プールの制御方程式の決定が難しいモデル定数は、データ駆動による発見によっても推測できる。さらに、物理インフォームドニューラルネットワーク(PINN)アーキテクチャは、効率的なモデルトレーニングのために最適化されている。データ効率のよいPINNモデルは、制御偏微分方程式(PDE)、初期条件、PINNモデルの境界条件を組み込むことによって、ソフトペナルティに起因している。 Melt pool dynamics in metal additive manufacturing (AM) is critical to process stability, microstructure formation, and final properties of the printed materials. Physics-based simulation including computational fluid dynamics (CFD) is the dominant approach to predict melt pool dynamics. However, the physics-based simulation approaches suffer from the inherent issue of very high computational cost. This paper provides a physics-informed machine learning (PIML) method by integrating neural networks with the governing physical laws to predict the melt pool dynamics such as temperature, velocity, and pressure without using any training data on velocity. This approach avoids solving the highly non-linear Navier-Stokes equation numerically, which significantly reduces the computational cost. The difficult-to-determine model constants of the governing equations of the melt pool can also be inferred through data-driven discovery. In addition, the physics-informed neural network (PINN) architecture has been optimized for efficient model training. The data-efficient PINN model is attributed to the soft penalty by incorporating governing partial differential equations (PDEs), initial conditions, and boundary conditions in the PINN model.	
翻訳日:2023-07-25 16:50:48 公開日:2023-07-23
# RANSAC-NN:RANSACを用いた教師なし画像異常検出 RANSAC-NN: Unsupervised Image Outlier Detection using RANSAC ( http://arxiv.org/abs/2307.12301v1 ) ライセンス: Link先を確認	Chen-Han Tsai, Yu-Shao Peng	(参考訳) 画像異常検出(OD)は、コンピュータビジョンタスクで使用される画像データセットの品質と精度を保証するために重要である。しかし、ODアルゴリズムの大部分は画像データを対象としていない。したがって、そのようなアルゴリズムを画像に適用する結果はしばしば最適ではない。本研究では,画像に特化して設計された新しい教師なしODアルゴリズムであるRANSAC-NNを提案する。 RANSACに基づくアプローチで画像を比較することにより、トレーニングやラベル情報なしで各画像の外れ値を自動的に予測する。 RANSAC-NNを15種類のデータセット上の最先端ODアルゴリズムに対して評価する。 RANSAC-NNは、ハイパーパラメータチューニングがなければ、ほぼすべてのデータセットカテゴリの他のアルゴリズムとは対照的に、一貫して好意的に機能する。さらに、RANSAC-NNの各コンポーネントを理解するための詳細な分析を行い、画像誤ラベル検出におけるその可能性を示す。 RANSAC-NNのコードはhttps://github.com/mxtsai/ransac-nnで公開されている。 Image outlier detection (OD) is crucial for ensuring the quality and accuracy of image datasets used in computer vision tasks. The majority of OD algorithms, however, have not been targeted toward image data. Consequently, the results of applying such algorithms to images are often suboptimal. In this work, we propose RANSAC-NN, a novel unsupervised OD algorithm specifically designed for images. By comparing images in a RANSAC-based approach, our algorithm automatically predicts the outlier score of each image without additional training or label information. We evaluate RANSAC-NN against state-of-the-art OD algorithms on 15 diverse datasets. Without any hyperparameter tuning, RANSAC-NN consistently performs favorably in contrast to other algorithms in almost every dataset category. Furthermore, we provide a detailed analysis to understand each RANSAC-NN component, and we demonstrate its potential applications in image mislabeled detection. Code for RANSAC-NN is provided at https://github.com/mxtsai/ransac-nn	翻訳日:2023-07-25 16:50:31 公開日:2023-07-23
# hybrid-csr : 皮質表面再構成のための明示的および暗黙的形状表現 Hybrid-CSR: Coupling Explicit and Implicit Shape Representation for Cortical Surface Reconstruction ( http://arxiv.org/abs/2307.12299v1 ) ライセンス: Link先を確認	Shanlin Sun, Thanh-Tung Le, Chenyu You, Hao Tang, Kun Han, Haoyu Ma, Deying Kong, Xiangyi Yan, Xiaohui Xie	(参考訳) 我々は,皮質表面再構成のための明示的および暗黙的な形状表現を組み合わせた幾何学的深層学習モデルであるHybrid-CSRを提案する。具体的には、Hybrid-CSRはテンプレートメッシュの明示的な変形から始まり、粗い再構成された皮質表面を得る。これにより,明示的(指向的点雲)と暗黙的(インジケータ関数)皮質表面再構成を統一する。明示的な表現ベース手法と比較すると,このハイブリッド手法は詳細な構造を捉えるのに好適であり,暗黙的な表現ベース手法と比較すると,メッシュベースの変形モジュールを用いたエンドツーエンドのトレーニングによりトポロジを認識できる。トポロジー欠陥に対処するために,最適化に基づく微分曲面登録に依存する新しいトポロジー補正パイプラインを提案する。 3つの脳データセットによる実験結果から,従来の暗黙的および明示的な皮質表面再構成法を精度,規則性,一貫性の点で超えた。 We present Hybrid-CSR, a geometric deep-learning model that combines explicit and implicit shape representations for cortical surface reconstruction. Specifically, Hybrid-CSR begins with explicit deformations of template meshes to obtain coarsely reconstructed cortical surfaces, based on which the oriented point clouds are estimated for the subsequent differentiable poisson surface reconstruction. By doing so, our method unifies explicit (oriented point clouds) and implicit (indicator function) cortical surface reconstruction. Compared to explicit representation-based methods, our hybrid approach is more friendly to capture detailed structures, and when compared with implicit representation-based methods, our method can be topology aware because of end-to-end training with a mesh-based deformation module. In order to address topology defects, we propose a new topology correction pipeline that relies on optimization-based diffeomorphic surface registration. Experimental results on three brain datasets show that our approach surpasses existing implicit and explicit cortical surface reconstruction methods in numeric metrics in terms of accuracy, regularity, and consistency.	翻訳日:2023-07-25 16:50:19 公開日:2023-07-23
# 超伝導カーキャット量子ビットの安定化と散逸情報伝達 Stabilization and Dissipative Information Transfer of a Superconducting Kerr-Cat Qubit ( http://arxiv.org/abs/2307.12298v1 ) ライセンス: Link先を確認	Ufuk Korkmaz, Deniz Türkpençe	(参考訳) 今日では量子コンピュータの競争が続き、ハードウェアにおける量子ビットの数は急速に増加している。しかし、このプロセスに伴う量子ノイズはアルゴリズムアプリケーションの性能を低下させるため、量子コンピュータアーキテクチャやアルゴリズムの実装における別の方法が議論されている。これらの方法の1つは、回路ベースの量子コンピューティングモデルと散逸ベースのコンピューティングモデルとのハイブリッド化である。ここでの目標は、量子回路モデルに量子アドバンテージを提供するアルゴリズムの一部と、ノイズの影響が少ない散逸モデルに残りの部分を適用することである。このスキームは、非常に反復的なプロセスを含む量子機械学習アルゴリズムにおいて重要であり、ノイズの影響を受けやすい。本研究では,cat-qubit と呼ばれる qubit モデルへの散逸情報転送について検討する。このモデルは、量子機械学習アルゴリズムの基本的な処理単位である二項量子分類の散逸ベースのバージョンで特に重要である。一方、キャット量子ビットアーキテクチャは、その豊富な物理性のため、人工ニューラルネットワークにアクティベーションのような機能を簡単に実装できる可能性があり、量子人工ニューラルネットワークの代替ハードウェアの機会を提供する。数値計算は、繰り返し相互作用に基づく散逸的スキームによる貯水池キュービットからの量子情報の転送に成功したことを示す。 Today, the competition to build a quantum computer continues, and the number of qubits in hardware is increasing rapidly. However, the quantum noise that comes with this process reduces the performance of algorithmic applications, so alternative ways in quantum computer architecture and implementation of algorithms are discussed on the one hand. One of these alternative ways is the hybridization of the circuit-based quantum computing model with the dissipative-based computing model. Here, the goal is to apply the part of the algorithm that provides the quantum advantage with the quantum circuit model, and the remaining part with the dissipative model, which is less affected by noise. This scheme is of importance to quantum machine learning algorithms that involve highly repetitive processes and are thus susceptible to noise. In this study, we examine dissipative information transfer to a qubit model called Cat-Qubit. This model is especially important for the dissipative-based version of the binary quantum classification, which is the basic processing unit of quantum machine learning algorithms. 
On the other hand, Cat-Qubit architecture, which has the potential to easily implement activation-like functions in artificial neural networks due to its rich physics, also offers an alternative hardware opportunity for quantum artificial neural networks. Numerical calculations exhibit successful transfer of quantum information from reservoir qubits by a repeated-interactions-based dissipative scheme.	翻訳日:2023-07-25 16:50:00 公開日:2023-07-23
# 複数フレームからの同時温度推定と非均一性補正 Simultaneous temperature estimation and nonuniformity correction from multiple frames ( http://arxiv.org/abs/2307.12297v1 ) ライセンス: Link先を確認	Navot Oz, Omri Berman, Nir Sochen, David Mendelovich, Iftach Klapp	(参考訳) 赤外線カメラは農業、医療、セキュリティなど様々な用途で温度測定に広く利用されている。しかし、低コストのマイクロボロメーターベースの赤外線カメラは、空間的に変化する非均一性や温度測定のドリフトが起こりやすいため、実用的なシナリオでは使用性に制限がある。これらの制約に対処するために, 低コストのマイクロボロメータベースIRカメラで撮影した複数フレームの温度推定と非均一性補正を同時に行う新しい手法を提案する。我々は、カメラの物理的画像取得モデルを利用し、カーネル推定ネットワーク(KPN)と呼ばれるディープラーニングアーキテクチャに組み込む。また,環境温度をモデルに組み込んだ新しいオフセットブロックを提案し,温度推定の重要な要因であるカメラのオフセットを推定する。その結果,フレームの数は温度推定の精度や不均一性補正に有意な影響を及ぼすことがわかった。さらに,本手法はオフセットブロックにより,バニラKPNに比べて性能が大幅に向上した。この手法は、UAVに搭載された低コストの赤外線カメラによって収集された実データに基づいてテストされ、コストの高い科学的グレードのラジオメトリックカメラと比較して、わずか$0.27^\circ C-0.54^\circ C$の誤差しか示さなかった。本手法は, 温度推定と非一様性補正を同時に行うための精度と効率のよい解を提供する。 Infrared (IR) cameras are widely used for temperature measurements in various applications, including agriculture, medicine, and security. Low-cost IR cameras have an immense potential to replace expensive radiometric cameras in these applications; however, low-cost microbolometer-based IR cameras are prone to spatially-variant nonuniformity and to drift in temperature measurements, which limits their usability in practical scenarios. To address these limitations, we propose a novel approach for simultaneous temperature estimation and nonuniformity correction from multiple frames captured by low-cost microbolometer-based IR cameras. We leverage the physical image acquisition model of the camera and incorporate it into a deep learning architecture called kernel estimation networks (KPN), which enables us to combine multiple frames despite imperfect registration between them. We also propose a novel offset block that incorporates the ambient temperature into the model and enables us to estimate the offset of the camera, which is a key factor in temperature estimation. 
Our findings demonstrate that the number of frames has a significant impact on the accuracy of temperature estimation and nonuniformity correction. Moreover, our approach achieves a significant improvement in performance compared to vanilla KPN, thanks to the offset block. The method was tested on real data collected by a low-cost IR camera mounted on a UAV, showing only a small average error of $0.27^\circ C-0.54^\circ C$ relative to costly scientific-grade radiometric cameras. Our method provides an accurate and efficient solution for simultaneous temperature estimation and nonuniformity correction, which has important implications for a wide range of practical applications.	翻訳日:2023-07-25 16:49:40 公開日:2023-07-23
# 分類法と早期糖尿病の比較分析 Comparative analysis using classification methods versus early stage diabetes ( http://arxiv.org/abs/2307.12296v1 ) ライセンス: Link先を確認	Alca-Vilca Gabriel Anthony, Carpio-Vargas Eloy	(参考訳) 本研究では, 早期糖尿病の有無を判断するために, 判別分析やロジスティック回帰などの分類法を用いて比較分析を行った。この目的のために、2020年のUC IRVINEプラットフォーム(英語版)のデータベースを用いており、糖尿病に影響を与える特定の変数がより良い結果を得るために使用された。方法論的にも同様に、3つの分類法それぞれについて対応する解析を行い、比較表に含め、得られた結果を解析した。最後に, 分類法を疾患に適用した研究の大部分は, ロジスティック回帰分類法に一定のアタッチメントがあり, さらなる利用が期待できるが, その結果, 適用された2つの分類法に関して有意な差がみられ, 最終的な結論を導く上で貴重な情報となった。 In this research work, a comparative analysis was carried out using classification methods such as: Discriminant Analysis and Logistic Regression to subsequently predict whether a person may have the presence of early stage diabetes. For this purpose, use was made of a database of the UC IRVINE platform of the year 2020 where specific variables that influence diabetes were used for a better result. Likewise in terms of methodology, the corresponding analysis was performed for each of the 3 classification methods and then take them to a comparative table and analyze the results obtained. Finally we can add that the majority of the studies carried out applying the classification methods to the diseases can be clearly seen that there is a certain attachment and more use of the logistic regression classification method, on the other hand, in the results we could see significant differences in terms of the 2 classification methods that were applied, which was valuable information for later drawing final conclusions.	翻訳日:2023-07-25 16:49:10 公開日:2023-07-23
# 量子分類器の散逸学習 Dissipative learning of a quantum classifier ( http://arxiv.org/abs/2307.12293v1 ) ライセンス: Link先を確認	Ufuk Korkmaz, Deniz Türkpençe	(参考訳) 量子計算が機械学習アルゴリズムにパフォーマンスの利点をもたらすかもしれないという期待は、ニューラルネットワークの量子バージョンの開発を動機付けている。本研究では,標準量子回路モデルの代替となるオープン量子システムとして機能する量子分類器モデルの学習力学を解析する。得られた結果から、勾配降下(GD)に基づくアルゴリズムを用いて、モデルをうまく訓練することができる。これらの最適化プロセスが連続ダイナミクスで得られたという事実は、分類器モデルの微分可能活性化関数の開発に有望であることを示している。 The expectation that quantum computation might bring performance advantages in machine learning algorithms motivates the work on the quantum versions of artificial neural networks. In this study, we analyze the learning dynamics of a quantum classifier model that works as an open quantum system which is an alternative to the standard quantum circuit model. According to the obtained results, the model can be successfully trained with a gradient descent (GD) based algorithm. The fact that these optimization processes have been obtained with continuous dynamics, shows promise for the development of a differentiable activation function for the classifier model.	翻訳日:2023-07-25 16:48:55 公開日:2023-07-23
# TransHuman: 汎用型ニューラルヒューマンレンダリングのためのトランスフォーマーに基づく人間表現 TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering ( http://arxiv.org/abs/2307.12291v1 ) ライセンス: Link先を確認	Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang	(参考訳) 本稿では,異なる文字のマルチビュー映像から条件付きニューラルレイディアンス場(NeRF)を訓練する,一般化可能なニューラルヒューマンレンダリングの課題に焦点を当てる。ダイナミックな人間の動きを扱うために、従来の手法は主にSparseConvNet(SPC)ベースの人間の表現を使用して、塗装されたSMPLを処理する。しかし、そのようなSPCベースの表現は、i) 変動しやすい観測空間の下で最適化されるため、トレーニング段階と推論段階の間でポーズの不整合が生じ、ii) 不完全な塗装されたSMPLの処理に不可欠な人体部位間のグローバルな関係を欠いている。これらの問題に対処するため,トランスヒューマン(TransHuman)という新しいフレームワークを提案する。このフレームワークは,塗装されたSMPLを標準空間下で学習し,トランスフォーマーによる人間の世界的関係を捉える。具体的には、TransHumanは主にTransformerベースのHuman Encoding(TransHE)、Deformable Partial Radiance Fields(DPaRF)、Fine-grained Detail Integration(FDI)で構成されている。 TransHEはまず、塗られたSMPLを変換器を介して標準的な空間下で処理し、人間の部分間のグローバルな関係を捉える。そして、DPaRFは、各出力トークンを、観測空間下でクエリポイントを符号化する変形可能な放射場にバインドする。最後に、FDIを使用して参照画像からのきめ細かい情報をさらに統合する。 ZJU-MoCapとH36Mの大規模な実験により、我々のTransHumanは、高い効率で最先端のパフォーマンスを著しく向上することを示した。プロジェクトページ: https://pansanity666.github.io/TransHuman/ In this paper, we focus on the task of generalizable neural human rendering which trains conditional Neural Radiance Fields (NeRF) from multi-view videos of different characters. To handle the dynamic human motion, previous methods have primarily used a SparseConvNet (SPC)-based human representation to process the painted SMPL. However, such SPC-based representation i) optimizes under the volatile observation space which leads to the pose-misalignment between training and inference stages, and ii) lacks the global relationships among human parts that are critical for handling the incomplete painted SMPL. Tackling these issues, we present a brand-new framework named TransHuman, which learns the painted SMPL under the canonical space and captures the global relationships between human parts with transformers. 
Specifically, TransHuman is mainly composed of Transformer-based Human Encoding (TransHE), Deformable Partial Radiance Fields (DPaRF), and Fine-grained Detail Integration (FDI). TransHE first processes the painted SMPL under the canonical space via transformers for capturing the global relationships between human parts. Then, DPaRF binds each output token with a deformable radiance field for encoding the query point under the observation space. Finally, the FDI is employed to further integrate fine-grained information from reference images. Extensive experiments on ZJU-MoCap and H36M show that our TransHuman achieves a significantly new state-of-the-art performance with high efficiency. Project page: https://pansanity666.github.io/TransHuman/	翻訳日:2023-07-25 16:48:46 公開日:2023-07-23
# 時系列ゲームのためのコントローラ合成 Controller Synthesis for Timeline-based Games ( http://arxiv.org/abs/2307.12289v1 ) ライセンス: Link先を確認	Renato Acampora and Luca Geatti and Nicola Gigante and Angelo Montanari and Valentino Picotti	(参考訳) 計画に対するタイムラインベースのアプローチでは、状態変数(タイムライン)の集合の時間的経過は、時間的制約の集合によって制御される。従来のタイムラインベースの計画システムは、時間的不確実性を扱うことによって計画と実行の統合に優れている。一般の非決定性を扱うために、タイムラインベースのゲームの概念が最近導入された。このようなゲームに勝利戦略が存在するかどうかが2EXPTIME完全であることが証明されている。しかし、そのような戦略を実装するコントローラを合成する具体的なアプローチは欠落している。本稿では,このギャップを埋めるために,タイムラインベースのゲームに対して,効果的かつ計算的に最適なコントローラ合成手法を提案する。 In the timeline-based approach to planning, the evolution over time of a set of state variables (the timelines) is governed by a set of temporal constraints. Traditional timeline-based planning systems excel at the integration of planning with execution by handling temporal uncertainty. In order to handle general nondeterminism as well, the concept of timeline-based games has been recently introduced. It has been proved that finding whether a winning strategy exists for such games is 2EXPTIME-complete. However, a concrete approach to synthesize controllers implementing such strategies is missing. This paper fills this gap, by providing an effective and computationally optimal approach to controller synthesis for timeline-based games.	翻訳日:2023-07-25 16:48:19 公開日:2023-07-23
# 時間的ネットワーク分析:Rを用いた導入,方法,詳細なチュートリアル Temporal network analysis: Introduction, methods and detailed tutorial with R ( http://arxiv.org/abs/2307.12339v1 ) ライセンス: Link先を確認	Mohammed Saqr	(参考訳) 学習には、関係、相互作用、学習者、教師、世界全体とのつながりが含まれる。このような相互作用は本質的に時間的かつ時間的展開である。しかし、研究者は分析フレームワークに2つの側面(時間的側面と関係的側面)を組み合わせることはめったにない。時間的ネットワークは、きめ細かな動的解析を通じて、活動、コミュニティ、社会プロセスの出現と流れといった時間的学習プロセスのモデル化を可能にする。これは知識の共構築、情報の流れ、関係構築のような現象に関する洞察を与えることができる。本章では、時間的ネットワークの基本概念、その種類、技術を紹介する。本章では,ネットワークの構築,可視化,ノードおよびグラフレベルでの数学的解析から始める,時間的ネットワーク解析の詳細なガイドを紹介する。分析は実世界のデータセットで実行される。議論の章では、技術に関する知識を広げたい興味のあるユーザに、追加のリソースを提供している。 Learning involves relations, interactions and connections between learners, teachers and the world at large. Such interactions are essentially temporal and unfold in time. Yet, researchers have rarely combined the two aspects (the temporal and relational aspects) in an analytics framework. Temporal networks allow modeling of the temporal learning processes i.e., the emergence and flow of activities, communities, and social processes through fine-grained dynamic analysis. This can provide insights into phenomena like knowledge co-construction, information flow, and relationship building. This chapter introduces the basic concepts of temporal networks, their types and techniques. A detailed guide of temporal network analysis is introduced in this chapter, that starts with building the network, visualization, mathematical analysis on the node and graph level. The analysis is performed with a real-world dataset. The discussion chapter offers some extra resources for interested users who want to expand their knowledge of the technique.	翻訳日:2023-07-25 16:43:08 公開日:2023-07-23
# TabADM:拡散モデルによる教師なし表形式異常検出 TabADM: Unsupervised Tabular Anomaly Detection with Diffusion Models ( http://arxiv.org/abs/2307.12336v1 ) ライセンス: Link先を確認	Guy Zamberg and Moshe Salhov and Ofir Lindenbaum and Amir Averbuch	(参考訳) テーブルは、あらゆる科学分野のユースケースを持つ豊富な形式のデータである。実世界のデータセットは、下流の分析に悪影響を及ぼす可能性のある異常なサンプルを含むことが多い。本研究では,汚染データへのアクセスを想定し,非教師あり異常検出に有効な拡散に基づく確率モデルを提案する。本モデルでは, 特異な拒絶スキームを用いて, 正常試料の密度分布を学習し, 異常が密度推定に与える影響を弱めるように訓練した。推論時には、低密度領域のサンプルとして異常を同定する。実データを用いて,本手法がベースラインよりも検出能力を向上させることを示す。さらに,本手法はデータ次元に対して比較的安定であり,広範囲なハイパーパラメータチューニングを必要としない。 Tables are an abundant form of data with use cases across all scientific fields. Real-world datasets often contain anomalous samples that can negatively affect downstream analysis. In this work, we only assume access to contaminated data and present a diffusion-based probabilistic model effective for unsupervised anomaly detection. Our model is trained to learn the density of normal samples by utilizing a unique rejection scheme to attenuate the influence of anomalies on the density estimation. At inference, we identify anomalies as samples in low-density regions. We use real data to demonstrate that our method improves detection capabilities over baselines. Furthermore, our method is relatively stable to the dimension of the data and does not require extensive hyperparameter tuning.	翻訳日:2023-07-25 16:42:51 公開日:2023-07-23
# セマンティックマップによるナビゲーション視覚表現の学習 Learning Navigational Visual Representations with Semantic Map Supervision ( http://arxiv.org/abs/2307.12335v1 ) ライセンス: Link先を確認	Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan	(参考訳) 家庭用ロボットの視覚的ナビゲーションには,環境の意味や空間構造を知覚できることが不可欠である。しかし、既存のほとんどの作品は、独立した分類のための画像や、屋内ナビゲーション領域に適応するための自己教師付き学習手法で事前訓練された視覚的バックボーンのみを用いており、ナビゲーションの学習に不可欠な空間的関係を無視している。本稿では,人間が自然に脳に意味的かつ空間的に有意味な認知地図を構築する行動に着想を得て,エージェントの自己中心的視点と意味的地図(ego$^2$-map)を対比して,新たなナビゲーション固有視覚表現学習法を提案する。バックボーンエンコーダとしてビジュアルトランスフォーマーを適用し,大規模Habitat-Matterport3D環境から収集したデータを用いてモデルを訓練する。 Ego$^2$-Map学習は、オブジェクト、構造、遷移などのコンパクトでリッチな情報を、ナビゲーションのためのエージェントのエゴセントリックな表現に転送する。実験の結果,学習した目標ナビゲーション表現を用いたエージェントは,近年の視覚前訓練法よりも優れていた。さらに,高レベルかつ低レベルなアクション空間の連続環境における視覚・言語ナビゲーションを著しく改善し,テストサーバ上での47%のSRと41%のSPLの新たな最先端結果を実現した。 Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot. However, most existing works only employ visual backbones pre-trained either with independent images for classification or with self-supervised learning methods to adapt to the indoor navigation domain, neglecting the spatial relationships that are essential to the learning of navigation. Inspired by the behavior that humans naturally build semantically and spatially meaningful cognitive maps in their brains during navigation, in this paper, we propose a novel navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps (Ego$^2$-Map). We apply the visual transformer as the backbone encoder and train the model with data collected from the large-scale Habitat-Matterport3D environments. Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation. 
Experiments show that agents using our learned representations on object-goal navigation outperform recent visual pre-training methods. Moreover, our representations significantly improve vision-and-language navigation in continuous environments for both high-level and low-level action spaces, achieving new state-of-the-art results of 47% SR and 41% SPL on the test server.	翻訳日:2023-07-25 16:42:35 公開日:2023-07-23
# 深部ニューラルネットワークの公理化PDEモデル An axiomatized PDE model of deep neural networks ( http://arxiv.org/abs/2307.12333v1 ) ライセンス: Link先を確認	Tangjun Wang, Wenqi Tao, Chenglong Bao, Zuoqiang Shi	(参考訳) ディープニューラルネットワーク (DNN) と偏微分方程式 (PDE) の関係に着想を得て, ディープニューラルネットワークのPDEモデルの一般形について検討する。この目的を達成するために、単純なベースモデルからDNNを進化演算子として定式化する。いくつかの合理的な仮定に基づいて、進化作用素が実際に対流拡散方程式によって決定されることを示す。この対流拡散方程式モデルは、いくつかの有効なネットワークの数学的説明を与える。さらに,対流拡散モデルによりロバスト性が向上し,Rademacherの複雑性が低下することを示す。対流拡散方程式に基づいて,ResNetsの新しいトレーニング手法を設計する。提案手法の性能を検証する実験を行った。 Inspired by the relation between deep neural network (DNN) and partial differential equations (PDEs), we study the general form of the PDE models of deep neural networks. To achieve this goal, we formulate DNN as an evolution operator from a simple base model. Based on several reasonable assumptions, we prove that the evolution operator is actually determined by convection-diffusion equation. This convection-diffusion equation model gives mathematical explanation for several effective networks. Moreover, we show that the convection-diffusion model improves the robustness and reduces the Rademacher complexity. Based on the convection-diffusion equation, we design a new training method for ResNets. Experiments validate the performance of the proposed method.	翻訳日:2023-07-25 16:41:36 公開日:2023-07-23
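For orientation, a generic convection-diffusion evolution of the kind the abstract above refers to can be written as follows. This is a textbook form given under assumptions, not an equation quoted from the paper: the symbols $u$ (evolving feature or output function), $v$ (velocity field), $\sigma$ (diffusion strength), $f$ (initial base model), the horizon $T$, and the sign convention of the convection term are all illustrative choices.

```latex
\begin{aligned}
\frac{\partial u}{\partial t}(x,t) &= v(x,t)\cdot\nabla u(x,t) + \sigma^{2}\,\Delta u(x,t),
  \qquad x\in\mathbb{R}^{d},\ t\in(0,T],\\
u(x,0) &= f(x).
\end{aligned}
```

Read this way, the convection term $v\cdot\nabla u$ transports features along a learned flow, while the diffusion term $\sigma^{2}\Delta u$ smooths the solution, which is consistent with the abstract's claim that the diffusion component is tied to improved robustness.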
# フェイクニュース検出のためのX-CapsNet X-CapsNet For Fake News Detection ( http://arxiv.org/abs/2307.12332v1 ) ライセンス: Link先を確認	Mohammad Hadi Goldani, Reza Safabakhsh, and Saeedeh Momtazi	(参考訳) ウェブベースのフォーラムやソーシャルメディアの普及に伴い、ニュースの消費は大幅に増加した。これは、人々に誤解を与え、混乱させる舞台となる。ユーザの健康関連判断やその他の意図に対する誤った情報の影響を減らすために、フェイクニュースを自動的に検出し、対処するための機械学習モデルが望まれる。本稿では, X-CapsNet と呼ばれる Capsule Neural Networks (CapsNet) を用いたトランスフォーマーモデルを提案する。このモデルには、短くて長いフェイクニュース文を検出するサイズベースの分類器と並列化した動的ルーティングアルゴリズムを備えたCapsNetが含まれている。 2つのサイズベースの分類器と、長い偽ニュース文を検出するディープ畳み込みニューラルネットワーク(DCNN)と、短いニュース文を検出する多層パーセプトロン(MLP)を使用する。短いニュース文の表現の問題を解決するために、ニュース話者プロファイルのベクトルと、ニュース文の極性、感情、カウントのベクトルを連結した間接的なニュース特徴を用いる。提案するアーキテクチャを評価するために、Covid-19とLiarデータセットを使用する。 Covid-19データセットのF1スコアとLiarデータセットの精度から見ると、モデルは最先端のベースラインよりも優れたパフォーマンスを示している。 News consumption has significantly increased with the growing popularity and use of web-based forums and social media. This sets the stage for misinforming and confusing people. To help reduce the impact of misinformation on users' potential health-related decisions and other intents, it is desired to have machine learning models to detect and combat fake news automatically. This paper proposes a novel transformer-based model using Capsule Neural Networks (CapsNet) called X-CapsNet. This model includes a CapsNet with dynamic routing algorithm parallelized with a size-based classifier for detecting short and long fake news statements. We use two size-based classifiers, a Deep Convolutional Neural Network (DCNN) for detecting long fake news statements and a Multi-Layer Perceptron (MLP) for detecting short news statements. To resolve the problem of representing short news statements, we use indirect features of news created by concatenating the vector of news speaker profiles and a vector of polarity, sentiment, and counting words of news statements. For evaluating the proposed architecture, we use the Covid-19 and the Liar datasets. 
The results in terms of the F1-score for the Covid-19 dataset and accuracy for the Liar dataset show that models perform better than the state-of-the-art baselines.	翻訳日:2023-07-25 16:41:18 公開日:2023-07-23
# ES2Net:ハイパースペクトル画像変化検出のための効率的なスペクトル空間ネットワーク ES2Net: An Efficient Spectral-Spatial Network for Hyperspectral Image Change Detection ( http://arxiv.org/abs/2307.12327v1 ) ライセンス: Link先を確認	Qingren Yao, Yuan Zhou, and Wei Xiang	(参考訳) ハイパースペクトル画像変化検出(HSI-CD)は,両眼的HSIの違いを特定することを目的としている。スペクトルの冗長性を緩和し,特徴変化の識別性を向上するため,CDの帯域選択に帯域選択技術を導入した手法もある。しかし、これらの手法は、深層学習に基づく特徴抽出器によるエンドツーエンドの訓練ができないことと、バンド間の複雑な非線形関係を考慮していないことによる制限がある。本稿では,これらの問題に対処するためのスペクトル空間変化検出ネットワーク(ES2Net)を提案する。具体的には,CDに習熟したバンドを自動選択する学習可能なバンド選択モジュールを考案した。特徴抽出ネットワークと共同で最適化し、バンド間の複雑な非線形関係を捉えることができる。さらに,異なる帯域間の空間的特徴分布の相違を考慮し,各バンドに空間的注意因子を割り当てるクラスタ単位の空間的注意機構を設計し,各バンドの特徴識別性を個別に改善する。 3つの広く使われているHSI-CDデータセットの実験は、他の最先端手法と比較して、この手法の有効性と優位性を示している。 Hyperspectral image change detection (HSI-CD) aims to identify the differences in bitemporal HSIs. To mitigate spectral redundancy and improve the discriminativeness of changing features, some methods introduced band selection technology to select bands conducive for CD. However, these methods are limited by the inability to end-to-end training with the deep learning-based feature extractor and lack considering the complex nonlinear relationship among bands. In this paper, we propose an end-to-end efficient spectral-spatial change detection network (ES2Net) to address these issues. Specifically, we devised a learnable band selection module to automatically select bands conducive to CD. It can be jointly optimized with a feature extraction network and capture the complex nonlinear relationships among bands. Moreover, considering the large spatial feature distribution differences among different bands, we design the cluster-wise spatial attention mechanism that assigns a spatial attention factor to each individual band to individually improve the feature discriminativeness for each band. Experiments on three widely used HSI-CD datasets demonstrate the effectiveness and superiority of this method compared with other state-of-the-art methods.	翻訳日:2023-07-25 16:40:56 公開日:2023-07-23
# 分散VQEのための多層HEA間の単一絡み合い接続アーキテクチャ Single Entanglement Connection Architecture between Multi-Layer HEA for Distributed VQE ( http://arxiv.org/abs/2307.12323v1 ) ライセンス: Link先を確認	Shikun Zhang, Zheng Qin, Yang Zhou, Rui Li, Chunxiao Du, Zhisong Xiao	(参考訳) 現在のノイズの多い中規模量子(NISQ)デバイス上での大規模量子コンピューティングの実現は、短期的な量子優位を達成する鍵となる。本稿では、VQEにおける多層ハードウェア効率アンサツ(HEA)のための単一絡み合い接続アーキテクチャ(SECA)を提案し、ゲート切断技術と組み合わせて分散VQE(DVQE)を構築し、低オーバーヘッド下でNISQデバイスのサイズを効率的に拡張する。 2次元イジングモデルとハイゼンベルクモデルを用いたシミュレーション実験を行った。数値計算の結果,SECAの表現性,安定性,計算性能は,完全絡み合い接続アーキテクチャ (FECA) と比較して,絡み合い能力の損失が少なかった場合に優れていた。さらに,DVQEがFECAよりも有効性が高いことを示す。最後に, シミュレーション実験に現れる興味深い現象を用いて, 表現可能性, 絡み込み能力, 計算性能の関係について考察する。 Realization of large-scale quantum computing on current noisy intermediate-scale quantum (NISQ) devices is the key to achieving near-term quantum advantage. In this work, we propose the single entanglement connection architecture (SECA) for the multi-layer hardware-efficient ansatz (HEA) in VQE and combine it with the gate cutting technology to construct distributed VQE (DVQE) which can efficiently expand the size of NISQ devices under low overheads. Simulation experiments with the two-dimensional Ising model as well as Heisenberg model are conducted. Our numerical results indicate a superiority of SECA in expressibility, stability and computational performance at the cost of a little loss in entangling capability compared with the full entanglement connection architecture (FECA). Furthermore, we find evidence that the DVQE also outperforms the FECA in terms of effectiveness. Finally, we discuss the open question about the relationship among expressibility, entangling capability and computational performance with some interesting phenomena appearing in simulation experiments.	翻訳日:2023-07-25 16:40:38 公開日:2023-07-23
# 古典, 量子, 閉, 開システムの作用 Action for classical, quantum, closed and open systems ( http://arxiv.org/abs/2307.12320v1 ) ライセンス: Link先を確認	Janos Polonyi	(参考訳) 作用汎関数は、古典力学、量子力学、閉力学、開力学を、それぞれ、変分原理の一般化と、古典力学および量子力学における経路積分形式論で定義するのに使うことができる。これらのスキームは異常な特徴、すなわち自由度の形式的な二重化に基づいている。このような二重化を動機付ける5つの議論が、そのような形式主義が自然であることを示すために提出される。異なる議論の共通の要素は因果時間矢印である。デコヒーレンス、散逸、古典極限に関するいくつかの教訓も言及されている。 The action functional can be used to define classical, quantum, closed, and open dynamics in a generalization of the variational principle and in the path integral formalism in classical and quantum dynamics, respectively. These schemes are based on an unusual feature, a formal redoubling of the degrees of freedom. Five arguments to motivate such a redoubling are put forward to demonstrate that such a formalism is natural. The common element of the different arguments is the causal time arrow. Some lessons concerning decoherence, dissipation and the classical limit are mentioned, too.	翻訳日:2023-07-25 16:40:22 公開日:2023-07-23
# 3つの異なるディープラーニングモデルを組み合わせた心膜脂肪数画像の開発 Development of pericardial fat count images using a combination of three different deep-learning models ( http://arxiv.org/abs/2307.12316v1 ) ライセンス: Link先を確認	Takaaki Matsunaga, Atsushi Kono, Hidetoshi Matsuo, Kaoru Kitagawab, Mizuho Nishio, Hiromi Hashimura, Yu Izawa, Takayoshi Toba, Kazuki Ishikawab, Akie Katsuki, Kazuyuki Ohmura, Takamichi Murakami	(参考訳) Rationale and Objectives: 心臓を囲む胸部内臓脂肪である心膜脂肪(PF)は、冠動脈の炎症を誘発することにより、冠動脈疾患の発生を促進する。本研究の目的は,胸部X線写真(CXR)から心膜脂肪数画像(PFCI)を専用のディープラーニングモデルを用いて生成することであった。資料と方法:冠動脈ctを施行した269例について検討した。金属インプラント,胸水,胸腔内手術歴,悪性腫瘍は除外された。対象は191例であった。 PFCIは3次元CT像の投影から生成され, 脂肪蓄積は高ピクセル値で表現された。 CXRからPFCIを生成するために,CycleGANを含む3つの異なるディープラーニングモデルを組み合わせた。提案手法との比較のために,CXRからPFCIを生成するために,CycleGANをベースとした単一モデルを用いた。生成されたPFCIの画像品質、構造類似度指標(SSIM)、平均二乗誤差(MSE)、平均絶対誤差(MAE)を評価する。 i)提案手法を用いて生成されたPFCI及び (II) 単一モデルを用いて生成されたPFCIを比較した。結果: 平均SSIM, MSE, MAEはそれぞれ0.856, 0.0128, 0.0357, それぞれ0.762, 0.0198, 0.0504であった。結論: 提案モデルを用いてCXRから生成されたPFCIは, 単一モデルよりも優れた性能を示した。提案手法ではCTのないPFCI評価が可能である。 Rationale and Objectives: Pericardial fat (PF), the thoracic visceral fat surrounding the heart, promotes the development of coronary artery disease by inducing inflammation of the coronary arteries. For evaluating PF, this study aimed to generate pericardial fat count images (PFCIs) from chest radiographs (CXRs) using a dedicated deep-learning model. Materials and Methods: The data of 269 consecutive patients who underwent coronary computed tomography (CT) were reviewed. Patients with metal implants, pleural effusion, history of thoracic surgery, or that of malignancy were excluded. Thus, the data of 191 patients were used. PFCIs were generated from the projection of three-dimensional CT images, where fat accumulation was represented by a high pixel value. Three different deep-learning models, including CycleGAN, were combined in the proposed method to generate PFCIs from CXRs. 
A single CycleGAN-based model was used to generate PFCIs from CXRs for comparison with the proposed method. To evaluate the image quality of the generated PFCIs, structural similarity index measure (SSIM), mean squared error (MSE), and mean absolute error (MAE) of (i) the PFCI generated using the proposed method and (ii) the PFCI generated using the single model were compared. Results: The mean SSIM, MSE, and MAE were as follows: 0.856, 0.0128, and 0.0357, respectively, for the proposed model; and 0.762, 0.0198, and 0.0504, respectively, for the single CycleGAN-based model. Conclusion: PFCIs generated from CXRs with the proposed model showed better performance than those with the single model. PFCI evaluation without CT may be possible with the proposed method.	翻訳日:2023-07-25 16:40:13 公開日:2023-07-23
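The abstract above compares generated and reference images by MSE and MAE (and SSIM, which is omitted here because it needs windowed local statistics). As a minimal sketch, assuming images are given as nested lists of pixel values scaled to [0, 1], the two simpler metrics could be computed as follows. The function names and the toy 2x2 images are illustrative assumptions, not taken from the paper.

```python
from typing import List

Image = List[List[float]]  # rows of pixel values, assumed scaled to [0, 1]


def _flatten(image: Image) -> List[float]:
    """Return all pixel values of an image as one flat list."""
    return [value for row in image for value in row]


def mean_squared_error(reference: Image, generated: Image) -> float:
    """Average of squared per-pixel differences."""
    ref, gen = _flatten(reference), _flatten(generated)
    if len(ref) != len(gen):
        raise ValueError("images must have the same number of pixels")
    return sum((r - g) ** 2 for r, g in zip(ref, gen)) / len(ref)


def mean_absolute_error(reference: Image, generated: Image) -> float:
    """Average of absolute per-pixel differences."""
    ref, gen = _flatten(reference), _flatten(generated)
    if len(ref) != len(gen):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(r - g) for r, g in zip(ref, gen)) / len(ref)


# Hypothetical toy images, only to exercise the functions.
reference_image = [[0.0, 0.5], [1.0, 0.25]]
generated_image = [[0.1, 0.5], [0.8, 0.25]]
mse = mean_squared_error(reference_image, generated_image)
mae = mean_absolute_error(reference_image, generated_image)
```

Lower values mean the generated image is closer to the reference, which is the direction of improvement reported in the abstract.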
# 不確実性認識ネットワークによるリモートセンシング画像からのビルディング抽出 Building Extraction from Remote Sensing Images via an Uncertainty-Aware Network ( http://arxiv.org/abs/2307.12309v1 ) ライセンス: Link先を確認	Wei He, Jiepan Li, Weinan Cao, Liangpei Zhang, Hongyan Zhang	(参考訳) ビルディング抽出はリモートセンシング画像から画素を分割することを目的としており、都市計画や都市動態モニタリングといった多くの用途において重要な役割を担っている。近年,エンコーダ・デコーダアーキテクチャを用いたディープラーニング手法は,その強力な特徴表現能力により,優れた性能を発揮している。しかし、建物の規模や様式が様々であるため、従来のディープラーニングモデルは常に不確実な予測に悩まされており、建物の完全な足跡と地上の複雑な分布を正確に区別できないため、多くの欠落や委任が生じる。本稿では,不確実な予測の重要性を認識し,この問題を緩和するための新規かつ簡単な不確実性認識ネットワーク(UANet)を提案する。提案したUANetの性能を検証するため、WHUビルディングデータセット、マサチューセッツビルディングデータセット、Inria空中画像データセットを含む3つのパブリックビルディングデータセットに対して広範な実験を行った。その結果、提案したUANetは、他の最先端アルゴリズムよりも大きなマージンで優れていることが示された。 Building extraction aims to segment building pixels from remote sensing images and plays an essential role in many applications, such as city planning and urban dynamic monitoring. Over the past few years, deep learning methods with encoder-decoder architectures have achieved remarkable performance due to their powerful feature representation capability. Nevertheless, due to the varying scales and styles of buildings, conventional deep learning models always suffer from uncertain predictions and cannot accurately distinguish the complete footprints of the building from the complex distribution of ground objects, leading to a large degree of omission and commission. In this paper, we realize the importance of uncertain prediction and propose a novel and straightforward Uncertainty-Aware Network (UANet) to alleviate this problem. To verify the performance of our proposed UANet, we conduct extensive experiments on three public building datasets, including the WHU building dataset, the Massachusetts building dataset, and the Inria aerial image dataset. Results demonstrate that the proposed UANet outperforms other state-of-the-art algorithms by a large margin.	翻訳日:2023-07-25 16:39:38 公開日:2023-07-23
# 大規模言語モデルにおける文脈学習はラベル関係を学習するが、従来の学習ではない In-Context Learning in Large Language Models Learns Label Relationships but Is Not Conventional Learning ( http://arxiv.org/abs/2307.12375v1 ) ライセンス: Link先を確認	Jannik Kossen, Tom Rainforth, Yarin Gal	(参考訳) 下流タスクにおけるLarge Language Models (LLM) の性能は、文脈における入力-ラベル関係の例を含むと、しばしば著しく改善される。しかし、LLMのこのインコンテキスト学習(ICL)能力がどのように機能するかについては、現在のところコンセンサスが得られていない。例えば、Xie et al. (2021)は、ICLを汎用学習アルゴリズムに例えたが、Min et al. (2022b)は、ICLはインコンテキストの例からラベル関係を学ばないと主張している。本稿では,(1)テキスト内サンプルのラベルが予測にどのように影響するか,(2)事前学習中に学習したラベル関係がテキスト内サンプルとどのように相互作用するか,(3)ICLがテキスト内サンプル間でラベル情報を集約する方法について検討する。この結果から, LLM はテキスト内ラベルからの情報を通常含んでいるが, 事前学習とテキスト内ラベルの関係は異なる扱いがなされており, モデルがすべてのテキスト内情報を等しく考慮していないことが示唆された。私たちの結果は、LLMの動作の理解と調整に関する洞察を与えます。 The performance of Large Language Models (LLMs) on downstream tasks often improves significantly when including examples of the input-label relationship in the context. However, there is currently no consensus about how this in-context learning (ICL) ability of LLMs works: for example, while Xie et al. (2021) liken ICL to a general-purpose learning algorithm, Min et al. (2022b) argue ICL does not even learn label relationships from in-context examples. In this paper, we study (1) how labels of in-context examples affect predictions, (2) how label relationships learned during pre-training interact with input-label examples provided in-context, and (3) how ICL aggregates label information across in-context examples. Our findings suggest LLMs usually incorporate information from in-context labels, but that pre-training and in-context label relationships are treated differently, and that the model does not consider all in-context information equally. Our results give insights into understanding and aligning LLM behavior.	翻訳日:2023-07-25 16:32:39 公開日:2023-07-23
# 対話要約における感情ニュアンスの評価 Evaluating Emotional Nuances in Dialogue Summarization ( http://arxiv.org/abs/2307.12371v1 ) ライセンス: Link先を確認	Yongxin Zhou, Fabien Ringeval, Fran\c{c}ois Portet	(参考訳) 自動対話要約は、人間の会話から最も重要なコンテンツを識別し、短いテキスト要約を作成することを目的とした、十分に確立されたタスクである。この分野の最近の進歩にもかかわらず、研究の大部分は事実情報の要約に重点を置いており、人間のインタラクションの分析、監視、支援に有用な情報を伝達できる情緒的な内容は別として残されている。本稿では,対話要約にどれだけの感情が保存されているかを定量化するために,$PEmo$の一連の尺度を提案し,評価する。その結果, 要約モデルでは, 要約中の感情的内容がよく保存されないことがわかった。また,学習セットを感情対話のみに還元することで,生成した要約文に感情内容が保存され,最も有意義な事実情報を保存できることを示した。 Automatic dialogue summarization is a well-established task that aims to identify the most important content from human conversations to create a short textual summary. Despite recent progress in the field, we show that most of the research has focused on summarizing the factual information, leaving aside the affective content, which can yet convey useful information to analyse, monitor, or support human interactions. In this paper, we propose and evaluate a set of measures $PEmo$, to quantify how much emotion is preserved in dialog summaries. Results show that, summarization models of the state-of-the-art do not preserve well the emotional content in the summaries. We also show that by reducing the training set to only emotional dialogues, the emotional content is better preserved in the generated summaries, while conserving the most salient factual information.	翻訳日:2023-07-25 16:32:18 公開日:2023-07-23
# 米軍退役軍人の縦断的電子健康記録からの症状を利用したアルツハイマー病の早期予知 Early Prediction of Alzheimers Disease Leveraging Symptom Occurrences from Longitudinal Electronic Health Records of US Military Veterans ( http://arxiv.org/abs/2307.12369v1 ) ライセンス: Link先を確認	Rumeng Li, Xun Wang, Dan Berlowitz, Brian Silver, Wen Hu, Heather Keating, Raelene Goodwin, Weisong Liu, Honghuang Lin, Hong Yu	(参考訳) アルツハイマー病(AD)の早期予測は、時間的介入と治療に不可欠である。本研究は,ad患者の縦断的電子健康記録(ehrs)を分析し,早期に発症を予測できる徴候や症状を識別するために,機械学習を用いて行う。 2004年から2021年まで、米国退役軍人健康管理局(VHA)の縦型EHRを用いたケースコントロール設計を行った。 ICD-10-CMコードに基づいて1/1/2016後にADと診断されたVHA患者は、年齢、性別、臨床利用の順に1:9と一致した。我々は,AD関連キーワードのパネルと,患者の縦 EHR における時間的変化を,4つの機械学習モデルを用いたAD予測の予測因子として使用した。年齢・性別・人種・民族によるサブグループ分析を行い, ホールドアウトおよび「見えない」VHA局群でモデルを検証した。モデル判別,キャリブレーション,その他の関連する指標は,ICDによる診断の最大10年前に報告された。調査対象者は16,701例,39,097例であった。診断が近づいた症例では、広告関連キーワード(例えば「集中」や「話し」)の平均数が10から40以上と急速に増加し、一方、コントロールについては10以上にとどまった。最良のモデルは、ICDベースの診断より10年以上前のデータを用いた予測において高い判別精度(ROCAUC 0.997)を達成した。このモデルは、65歳未満の患者(rocauc 0.746)を除いて、年齢、性別、人種/民族のサブグループ間で一貫性がある(hosmer-lemeshow goodness-of-fit p-value = 0.99)。 EHRノートから同定されたAD関連キーワードを用いた機械学習モデルは、将来のAD診断を予測することができる。 Early prediction of Alzheimer's disease (AD) is crucial for timely intervention and treatment. This study aims to use machine learning approaches to analyze longitudinal electronic health records (EHRs) of patients with AD and identify signs and symptoms that can predict AD onset earlier. We used a case-control design with longitudinal EHRs from the U.S. Department of Veterans Affairs Veterans Health Administration (VHA) from 2004 to 2021. Cases were VHA patients with AD diagnosed after 1/1/2016 based on ICD-10-CM codes, matched 1:9 with controls by age, sex and clinical utilization with replacement. We used a panel of AD-related keywords and their occurrences over time in a patient's longitudinal EHRs as predictors for AD prediction with four machine learning models. 
We performed subgroup analyses by age, sex, and race/ethnicity, and validated the model in a hold-out and "unseen" VHA stations group. Model discrimination, calibration, and other relevant metrics were reported for predictions up to ten years before ICD-based diagnosis. The study population included 16,701 cases and 39,097 matched controls. The average number of AD-related keywords (e.g., "concentration", "speaking") per year increased rapidly for cases as diagnosis approached, from around 10 to over 40, while remaining flat at 10 for controls. The best model achieved high discriminative accuracy (ROCAUC 0.997) for predictions using data from at least ten years before ICD-based diagnoses. The model was well-calibrated (Hosmer-Lemeshow goodness-of-fit p-value = 0.99) and consistent across subgroups of age, sex and race/ethnicity, except for patients younger than 65 (ROCAUC 0.746). Machine learning models using AD-related keywords identified from EHR notes can predict future AD diagnoses, suggesting their potential use for identifying AD risk from EHR notes and offering an affordable approach to early screening of large populations.	翻訳日:2023-07-25 16:32:04 公開日:2023-07-23
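The abstract reports model discrimination as ROC AUC. As a rough illustration of what that number measures, the sketch below computes it as the fraction of (case, control) pairs in which the case receives the higher score, with ties counted as one half. The function name and the toy labels and scores are assumptions made for illustration and are unrelated to the study data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = case, 0 = control.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic and only suitable for small examples; on cohort-sized data a rank-based computation would normally be used instead.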
# ゆらぎ定理と期待効用仮説 Fluctuation theorems and expected utility hypothesis ( http://arxiv.org/abs/2307.12358v1 ) ライセンス: Link先を確認	Gianluca Francica, Luca Dell'Anna	(参考訳) 期待された効用仮説は経済学において一般的な概念であり、支払いが不確実な場合に決定を下すのに役立つ。本稿では,予測効用理論における揺らぎ定理の影響について考察する。特に、エントロピーがギャンブルのガイドラインになるかどうか疑問である。我々は、生成するエントロピーに依存する確実性同値を含む境界の存在を証明する。次に,非平衡初期状態からの作業抽出など,特定の状況に着目し,エントロピーに等価な確実性の依存性について検討する。 The expected utility hypothesis is a popular concept in economics that is useful for making decisions when the payoff is uncertain. In this paper, we investigate the implications of a fluctuation theorem in the theory of expected utility. In particular, we wonder whether entropy could serve as a guideline for gambling. We prove the existence of a bound involving the certainty equivalent which depends on the entropy produced. Then, we examine the dependence of the certainty equivalent on the entropy by looking at specific situations, for instance, the work extraction from a non-equilibrium initial state.	翻訳日:2023-07-25 16:31:34 公開日:2023-07-23
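The central quantity in the abstract above is the certainty equivalent, the sure payoff that a decision maker values exactly as much as an uncertain one. The sketch below computes it for a discrete lottery under an exponential (constant absolute risk aversion) utility. That utility, the parameter name `risk_aversion`, and the toy lottery are assumptions chosen for illustration; the paper's entropy-dependent bound is not reproduced here.

```python
import math
from typing import Sequence


def certainty_equivalent(payoffs: Sequence[float],
                         probabilities: Sequence[float],
                         risk_aversion: float) -> float:
    """Certainty equivalent for u(x) = 1 - exp(-a * x), with a = risk_aversion > 0.

    Solving u(CE) = E[u(X)] gives CE = -(1 / a) * ln E[exp(-a * X)].
    """
    if risk_aversion <= 0:
        raise ValueError("risk_aversion must be positive")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    expected_exp = sum(p * math.exp(-risk_aversion * x)
                       for x, p in zip(payoffs, probabilities))
    return -math.log(expected_exp) / risk_aversion


# Hypothetical fair coin flip between a payoff of 0 and a payoff of 2.
lottery_payoffs = [0.0, 2.0]
lottery_probs = [0.5, 0.5]
expected_payoff = sum(p * x for x, p in zip(lottery_payoffs, lottery_probs))
ce = certainty_equivalent(lottery_payoffs, lottery_probs, risk_aversion=1.0)
```

Because this utility is concave, the certainty equivalent lies below the expected payoff, and the gap shrinks as `risk_aversion` approaches zero.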
# ComPtr: 単純かつ汎用的なコンバータによる双方向Dense予測タスクの実現 ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple yet General Complementary Transformer ( http://arxiv.org/abs/2307.12349v1 ) ライセンス: Link先を確認	Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu	(参考訳) ディープラーニング(DL)は、密集予測の分野を前進させ、異なるタスク間の固有の障壁を徐々に解消した。しかし、既存の作品の多くはアーキテクチャの設計と、dlパラダイムによってもたらされる潜在的な均一性を無視した特定のタスクのための視覚的な手がかりの構築に焦点を当てている。本稿では,多種多様なbi-source高密度予測タスクのための新規な \underline{ComP}lementary \underline{tr}ansformer, \textbf{ComPtr} の構築を試みる。具体的には、単一のタスクやタスクのサブセットで過剰に特殊化する既存の方法とは異なり、ComPtrはより一般的な二ソース密集予測の概念から始まる。情報相補性に対する基本的依存に基づいて,ComPtrが様々なタスクのために,様々な画像ソースから重要な視覚的意味的手がかりを抽出・収集する,一貫性の強化と差分認識コンポーネントを提案する。 ComPtrは異なる入力を等しく扱い、変換器上にシーケンス・ツー・シーケンスの形で効率的な密な相互作用モデルを構築する。このタスクジェネリック設計は、様々な双方向情報を同時に処理できる統一モデルを構築するためのスムーズな基盤を提供する。リモートセンシングによる変化検出,RGB-T集団カウント,RGB-D/Tサルエントオブジェクト検出,RGB-Dセマンティックセマンティックセマンティックセグメンテーションなど,複数の代表的な視覚課題に対する広範な実験において,提案手法は一貫して良好な性能を得る。コードは \url{https://github.com/lartpang/ComPtr} で入手できる。 Deep learning (DL) has advanced the field of dense prediction, while gradually dissolving the inherent barriers between different tasks. However, most existing works focus on designing architectures and constructing visual cues only for the specific task, which ignores the potential uniformity introduced by the DL paradigm. In this paper, we attempt to construct a novel \underline{ComP}lementary \underline{tr}ansformer, \textbf{ComPtr}, for diverse bi-source dense prediction tasks. Specifically, unlike existing methods that over-specialize in a single task or a subset of tasks, ComPtr starts from the more general concept of bi-source dense prediction. Based on the basic dependence on information complementarity, we propose consistency enhancement and difference awareness components with which ComPtr can evacuate and collect important visual semantic cues from different image sources for diverse tasks, respectively. 
ComPtr treats different inputs equally and builds an efficient dense interaction model in the form of sequence-to-sequence on top of the transformer. This task-generic design provides a smooth foundation for constructing the unified model that can simultaneously deal with various bi-source information. In extensive experiments across several representative vision tasks, i.e. remote sensing change detection, RGB-T crowd counting, RGB-D/T salient object detection, and RGB-D semantic segmentation, the proposed method consistently obtains favorable performance. The code will be available at \url{https://github.com/lartpang/ComPtr}.	翻訳日:2023-07-25 16:31:26 公開日:2023-07-23
# ResShift: 残差シフトによる画像超解像の効率的な拡散モデル ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting ( http://arxiv.org/abs/2307.12348v1 ) ライセンス: Link先を確認	Zongsheng Yue, Jianyi Wang, Chen Change Loy	(参考訳) 拡散に基づく画像超解像法(SR)は主に、数百から数千のサンプリングステップの要求により、低い推論速度によって制限される。既存の加速サンプリング技術は必然的に性能を犠牲にし、過度なSR結果をもたらす。そこで本稿では,srの新しい効率的な拡散モデルを提案する。拡散ステップ数を大幅に削減し,推論時の高速化の必要性をなくし,それに伴う性能劣化を解消する。本手法では,高分解能画像と低分解能画像との間で残差を移動させ,遷移効率を大幅に向上させるマルコフ連鎖を構築する。また、拡散過程におけるシフト速度と騒音強度を柔軟に制御する精巧なノイズスケジュールを開発する。実験の結果,提案手法は,15段階のサンプリングでも,合成と実世界の両方のデータセットにおいて,現在の最先端手法よりも優れた,あるいは少なくとも同等の性能が得られることが示された。私たちのコードとモデルはhttps://github.com/zsyoaoa/resshiftで利用可能です。 Diffusion-based image super-resolution (SR) methods are mainly limited by the low inference speed due to the requirements of hundreds or even thousands of sampling steps. Existing acceleration sampling techniques inevitably sacrifice performance to some extent, leading to over-blurry SR results. To address this issue, we propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps, thereby eliminating the need for post-acceleration during inference and its associated performance deterioration. Our method constructs a Markov chain that transfers between the high-resolution image and the low-resolution image by shifting the residual between them, substantially improving the transition efficiency. Additionally, an elaborate noise schedule is developed to flexibly control the shifting speed and the noise strength during the diffusion process. Extensive experiments demonstrate that the proposed method obtains superior or at least comparable performance to current state-of-the-art methods on both synthetic and real-world datasets, even only with 15 sampling steps. Our code and model are available at https://github.com/zsyOAOA/ResShift.	翻訳日:2023-07-25 16:30:53 公開日:2023-07-23
# 誤った理由で正しい:解釈可能なML技術はスプリアス相関を検出できるか? Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations? ( http://arxiv.org/abs/2307.12344v1 ) ライセンス: Link先を確認	Susu Sun, Lisa M. Koch, Christian F. Baumgartner	(参考訳) ディープニューラルネットワークモデルは、比類のない分類性能を提供するが、データ内のスプリアス相関を学習する傾向がある。テストデータがトレーニングデータと同じ分布から来ている場合、交絡情報へのそのような依存をパフォーマンスメトリクスを使って検出することは困難である。ポストホックな説明や本質的に解釈可能な分類器のような解釈可能なMLメソッドは、欠陥のあるモデル推論を特定することを約束する。しかし、これらの技法の多くが実際にそれを行えるかどうかについては、証拠が分かれている。本稿では,説明手法のスプリアス相関を正しく識別する能力を評価するための厳密な評価手法を提案する。この戦略を用いて,胸部X線診断タスクにおいて人工的に付加した3種類の交絡因子を検出する能力について,5つのポストホックな説明手法と本質的に解釈可能な1つの手法を評価した。ポストホックな手法であるSHAPと本質的に解釈可能なAttri-Netは、最高の性能を提供し、欠陥のあるモデルの振る舞いを確実に識別するために使用できる。 While deep neural network models offer unmatched classification performance, they are prone to learning spurious correlations in the data. Such dependencies on confounding information can be difficult to detect using performance metrics if the test data comes from the same distribution as the training data. Interpretable ML methods such as post-hoc explanations or inherently interpretable classifiers promise to identify faulty model reasoning. However, there is mixed evidence whether many of these techniques are actually able to do so. In this paper, we propose a rigorous evaluation strategy to assess an explanation technique's ability to correctly identify spurious correlations. Using this strategy, we evaluate five post-hoc explanation techniques and one inherently interpretable method for their ability to detect three types of artificially added confounders in a chest x-ray diagnosis task. We find that the post-hoc technique SHAP, as well as the inherently interpretable Attri-Net provide the best performance and can be used to reliably identify faulty model behavior.	翻訳日:2023-07-25 16:30:35 公開日:2023-07-23
# 音声に基づく感情認識のための自己教師あり学習 Self-Supervised Learning for Audio-Based Emotion Recognition ( http://arxiv.org/abs/2307.12343v1 ) ライセンス: Link先を確認	Peranut Nimitsurachat and Peter Washington	(参考訳) 音声入力データを用いた感情認識モデルは、メンタルヘルス、マーケティング、ゲーム、ソーシャルメディア分析のアプリケーションを含む対話型システムの開発を可能にする。オーディオデータを用いた情緒的コンピューティングの分野は豊富だが、一貫した高性能モデルを達成するための大きな障壁は、利用可能なトレーニングラベルのpaucityである。自己教師付き学習 (SSL) は、データ自体の特性を予測することによって、教師付きラベルの不足にもかかわらず学習できる手法のファミリーである。音声に基づく感情認識における自己教師あり学習の有用性を理解するため,cmu-moseiの音響モダリティから感情の分類に自己教師あり学習前学習を適用した。生の音響データを実験した先行論文とは異なり,本手法は符号化音響データに適用されている。我々のモデルはまず、音響データのランダムにマスクされたタイムスタンプを明らかにするために事前学習される。事前学習されたモデルは、注釈付きデータの小さなサンプルを使って微調整される。最終モデルの性能は、同じバックボーンアーキテクチャを持つベースラインディープラーニングモデルに対して、いくつかの評価指標によって評価される。自己教師型学習は、すべてのメトリクスにわたるモデルの性能を一貫して改善する。本研究は,感情コンピューティングのための自己教師付き学習の有用性を示し,学習例の数が小さい場合,自己教師付き学習が最も有用であること,幸福,悲しみ,怒りなどの分類が容易な感情に対して最も顕著であることを示す。この研究は、生の入力空間で事前学習する従来のアプローチではなく、組み込み特徴表現に適用すると、自己教師付き学習が機能することを示す。 Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio data is rich, a major barrier to achieve consistently high-performance models is the paucity of available training labels. Self-supervised learning (SSL) is a family of methods which can learn despite a scarcity of supervised labels by predicting properties of the data itself. To understand the utility of self-supervised learning for audio-based emotion recognition, we have applied self-supervised learning pre-training to the classification of emotions from the CMU- MOSEI's acoustic modality. Unlike prior papers that have experimented with raw acoustic data, our technique has been applied to encoded acoustic data. Our model is first pretrained to uncover the randomly-masked timestamps of the acoustic data. The pre-trained model is then fine-tuned using a small sample of annotated data. 
The performance of the final model is then evaluated via several evaluation metrics against a baseline deep learning model with an identical backbone architecture. We find that self-supervised learning consistently improves the performance of the model across all metrics. This work shows the utility of self-supervised learning for affective computing, demonstrating that self-supervised learning is most useful when the number of training examples is small, and that the effect is most pronounced for emotions which are easier to classify such as happy, sad and anger. This work further demonstrates that self-supervised learning works when applied to embedded feature representations rather than the traditional approach of pre-training on the raw input space.	翻訳日:2023-07-25 16:30:18 公開日:2023-07-23
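The pre-training step described above hides randomly chosen time steps of an encoded acoustic sequence and asks the model to recover them. The sketch below shows only the data side of that idea: selecting time steps to mask and scoring a reconstruction on the masked positions. The zero-fill masking, the mean-squared-error objective, and all names are assumptions made for illustration; the abstract does not specify these details.

```python
import random
from typing import List, Sequence, Tuple

Frame = List[float]  # one encoded acoustic feature vector per time step


def mask_timesteps(sequence: Sequence[Frame], mask_ratio: float,
                   rng: random.Random) -> Tuple[List[Frame], List[int]]:
    """Zero out a random subset of time steps and return the masked indices."""
    count = max(1, round(mask_ratio * len(sequence)))
    masked_indices = sorted(rng.sample(range(len(sequence)), count))
    hidden = set(masked_indices)
    masked = [[0.0] * len(frame) if i in hidden else list(frame)
              for i, frame in enumerate(sequence)]
    return masked, masked_indices


def masked_reconstruction_loss(predicted: Sequence[Frame],
                               target: Sequence[Frame],
                               masked_indices: Sequence[int]) -> float:
    """Mean squared error computed only over the masked time steps."""
    total, n = 0.0, 0
    for i in masked_indices:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            n += 1
    return total / n


# Hypothetical sequence: 10 time steps with 3 features each.
rng = random.Random(0)
features = [[float(t + 1), float(t + 2), float(t + 3)] for t in range(10)]
masked_features, masked_idx = mask_timesteps(features, mask_ratio=0.3, rng=rng)
loss_if_perfect = masked_reconstruction_loss(features, features, masked_idx)
loss_if_unfilled = masked_reconstruction_loss(masked_features, features, masked_idx)
```

In an actual pipeline the `predicted` argument would come from the network being pre-trained; here it is replaced by the target itself and by the unfilled masked input to show the two extremes of the loss.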
# ジェネリックと制御可能なオブジェクト検出攻撃に向けて Towards Generic and Controllable Attacks Against Object Detection ( http://arxiv.org/abs/2307.12342v1 ) ライセンス: Link先を確認	Guopeng Li, Yue Xu, Jian Ding, Gui-Song Xia	(参考訳) 既存のObject Detector(OD)に対する攻撃には、2つの固有の制限がある。まず、odsは複雑なメタ構造設計を持っているため、odsの最も先進的な攻撃は特定の検出器-インタリン構造への攻撃に集中しているため、他の検出器への取り組みが困難であり、odsに対する汎用的な攻撃を設計する動機付けになっている。第二に、ODに対するほとんどの研究は、分類から検出までのイメージレベルの攻撃を一般化し、意味論的に意味のない領域(背景など)で冗長な計算と摂動をもたらし、ODに対する制御可能な攻撃を求める緊急性をもたらす。この目的のために,制御可能な摂動を持つ主流物体検出器を目立たせるための汎用的なホワイトボックス攻撃であるlgp(local perturbation with adaptively global attack)を提案する。検出器に依存しない攻撃の場合、lgpは高品質の提案を追跡し、3つの不均一な損失を同時に最適化する。このようにして、特定の構造の制限なしに、出力の一部でODの重要なコンポーネントを騙すことができる。制御性に関しては,前景と後景の分離を適応的に活用し,前景への摂動の付着を誘導するオブジェクト指向制約を確立する。実験的に提案されたLGPは、MS-COCOおよびDOTAデータセット上の16の最先端物体検出器を攻撃し、有望な不可避性と伝達性を得た。コードはhttps://github.com/liguopeng0923/LGP.gitで公開されている。 Existing adversarial attacks against Object Detectors (ODs) suffer from two inherent limitations. Firstly, ODs have complicated meta-structure designs, hence most advanced attacks for ODs concentrate on attacking specific detector-intrinsic structures, which makes it hard for them to work on other detectors and motivates us to design a generic attack against ODs. Secondly, most works against ODs make Adversarial Examples (AEs) by generalizing image-level attacks from classification to detection, which brings redundant computations and perturbations in semantically meaningless areas (e.g., backgrounds) and leads to an emergency for seeking controllable attacks for ODs. To this end, we propose a generic white-box attack, LGP (local perturbations with adaptively global attacks), to blind mainstream object detectors with controllable perturbations. For a detector-agnostic attack, LGP tracks high-quality proposals and optimizes three heterogeneous losses simultaneously. In this way, we can fool the crucial components of ODs with a part of their outputs without the limitations of specific structures. 
Regarding controllability, we establish an object-wise constraint that exploits foreground-background separation adaptively to induce the attachment of perturbations to foregrounds. Experimentally, the proposed LGP successfully attacked sixteen state-of-the-art object detectors on MS-COCO and DOTA datasets, with promising imperceptibility and transferability obtained. Code is publicly released at https://github.com/liguopeng0923/LGP.git	翻訳日:2023-07-25 16:29:53 公開日:2023-07-23
# NIR分光法による土壌炭酸塩の迅速検出、深層学習法および粉末X線回折による相定量 Rapid detection of soil carbonates by means of NIR spectroscopy, deep learning methods and phase quantification by powder Xray diffraction ( http://arxiv.org/abs/2307.12341v1 ) ライセンス: Link先を確認	Lykourgos Chiniadis, Petros Tamvakis	(参考訳) 土壌のnirスペクトル吸収・反射性ライブラリーは農業生産の改善と農業的バランスと環境持続可能性の重要な前提条件である土壌特性の分析に活用されている。特に炭酸塩は土壌の性質を表しており、気候変動による温暖な環境の変化によっても影響を受ける。本研究では,FT NIR反射分光法と深層学習法を用いて土壌中の炭酸塩濃度を迅速かつ効率的に予測する方法を提案する。我々は、次のような複数の機械学習手法を利用した。 1)MLPレシーバ及び 2) cnnをplsr、cubist、svmといった従来のmlアルゴリズムと比較すると、kssl(usda)、全国的に収集された土壌サンプル反射率スペクトルのデータセット、およびeu全域の土壌サンプル吸収スペクトルを含むlucas topsoil(ヨーロッパ土壌ライブラリ)の2つのnirスペクトルライブラリの複合データセット上で、それらのパフォーマンスが比較される。 KSSLおよびTopSoilスペクトルライブラリの土壌試料はvisNIRのスペクトル領域で得られたが,本研究ではNIRスペクトル領域のみが利用された。 X線回折による炭酸塩の定量は, 体積法, MLP予測とよく一致している。本研究は, 土壌試料中の炭酸塩濃度の迅速予測に寄与する。 1)ボリュームメソッドは使用できません。 2)NIRスペクトル吸収データのみが利用可能である。これまで、私たちの知る限りでは、このような広範囲なデータセットでトレーニングされた予測モデルが、目に見えないデータに対して有望な結果をもたらし、深層学習モデルが土壌の炭酸化に優れた予測ツールを提供するという概念を確実に支持する研究は、他にない。 Soil NIR spectral absorbance/reflectance libraries are utilized towards improving agricultural production and analysis of soil properties which are key prerequisite for agroecological balance and environmental sustainability. Carbonates in particular, represent a soil property which is mostly affected even by mild, let alone extreme, changes of environmental conditions during climate change. In this study we propose a rapid and efficient way to predict carbonates content in soil by means of FT NIR reflectance spectroscopy and by use of deep learning methods. 
We applied multiple machine learning methods, namely 1) an MLP regressor and 2) a CNN, and compared their performance with traditional ML algorithms such as PLSR, Cubist and SVM on the combined dataset of two NIR spectral libraries: KSSL (USDA), a dataset of soil sample reflectance spectra collected nationwide, and LUCAS TopSoil (European Soil Library), which contains soil sample absorbance spectra from all over the European Union. The models were then used to predict carbonate content in previously unseen soil samples. Soil samples in the KSSL and TopSoil spectral libraries were acquired in the visNIR spectral region; in this study, however, only the NIR region was used. Quantification of carbonates by X-ray diffraction is in good agreement with the volumetric method and the MLP prediction. Our work contributes to rapid prediction of carbonate content in soil samples in cases where 1) no volumetric method is available and 2) only NIR absorbance spectra are available. To the best of our knowledge, no other study presents a prediction model trained on such an extensive dataset with such promising results on unseen data, which supports the notion that deep learning models are effective prediction tools for soil carbonate content.	翻訳日:2023-07-25 16:29:26 公開日:2023-07-23
# 商業用5Gスタンドアローン(SA)アップリンクスループット予測 Practical Commercial 5G Standalone (SA) Uplink Throughput Prediction ( http://arxiv.org/abs/2307.12417v1 ) ライセンス: Link先を確認	Kasidis Arunruangsirilert, Jiro Katto	(参考訳) 5Gニューラジオ(NR)ネットワークはアップリンクスループットの大幅なアップリフトを約束するが、ユーザ機器(UE)が高周波ミリ波(mmWave)帯域に接続されている場合にのみ改善が見られる。 UHD 4K/8Kビデオのリアルタイム伝送やバーチャルリアリティ(VR)/拡張現実(AR)コンテンツなどのアップリンク集約型スマートフォンアプリケーションの増加に伴い、アップリンクスループット予測はユーザー体験の質(QoE)を最大化する上で大きな役割を果たす。本稿では,過去のアップリンクスループットとRFパラメータに基づく将来のアップリンクスループットを予測するために,ConvLSTMベースのニューラルネットワークを提案する。このネットワークは、様々な周波数帯、ハンドオーバ、盲点を考慮した通勤列車に乗りながら、商用の5G SAネットワーク上の実世界のドライブテストのデータを用いて訓練されている。モデルの実装を確実にするために,Android API経由で利用可能な情報のみを使用するようにモデルを制限し,通勤電車や他の交通手段からのデータを用いてモデルを評価する。その結果,我々のモデルの平均予測精度は98.9\%に達し,平均RMSEは1.80Mbpsであることがわかった。 While the 5G New Radio (NR) network promises a huge uplift of the uplink throughput, the improvement can only be seen when the User Equipment (UE) is connected to the high-frequency millimeter wave (mmWave) band. With the rise of uplink-intensive smartphone applications such as the real-time transmission of UHD 4K/8K videos, and Virtual Reality (VR)/Augmented Reality (AR) contents, uplink throughput prediction plays a huge role in maximizing the users' quality of experience (QoE). In this paper, we propose using a ConvLSTM-based neural network to predict the future uplink throughput based on past uplink throughput and RF parameters. The network is trained using the data from real-world drive tests on commercial 5G SA networks while riding commuter trains, which accounted for various frequency bands, handover, and blind spots. To make sure our model can be practically implemented, we then limited our model to only use the information available via Android API, then evaluate our model using the data from both commuter trains and other methods of transportation. The results show that our model reaches an average prediction accuracy of 98.9\% with an average RMSE of 1.80 Mbps across all unseen evaluation scenarios.	翻訳日:2023-07-25 16:22:53 公開日:2023-07-23
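Predicting future uplink throughput from past measurements is typically framed as supervised learning over sliding windows, with RMSE as the error measure quoted in the abstract. The sketch below builds such windows from a throughput series and scores a naive "repeat the last value" baseline. The window sizes, the baseline, and the toy series are illustrative assumptions; the ConvLSTM model and the RF parameters from the paper are not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], history: int,
                 horizon: int) -> List[Tuple[List[float], float]]:
    """Pair each run of `history` past samples with the value `horizon` steps ahead."""
    windows = []
    for start in range(len(series) - history - horizon + 1):
        past = list(series[start:start + history])
        target = series[start + history + horizon - 1]
        windows.append((past, target))
    return windows


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))


# Hypothetical throughput samples in Mbps, one per measurement interval.
throughput_mbps = [1.0, 2.0, 3.0, 4.0, 5.0]
windows = make_windows(throughput_mbps, history=3, horizon=1)

# Naive baseline: predict that the next value equals the most recent one.
baseline_predictions = [past[-1] for past, _ in windows]
targets = [target for _, target in windows]
baseline_rmse = rmse(baseline_predictions, targets)
```

A learned model would replace `baseline_predictions`; comparing its RMSE against this baseline is a common sanity check on whether the model has learned anything beyond persistence.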
# 2段階適応ロバスト最適化のための機械学習アプローチ A Machine Learning Approach to Two-Stage Adaptive Robust Optimization ( http://arxiv.org/abs/2307.12409v1 ) ライセンス: Link先を確認	Dimitris Bertsimas, Cheol Woo Kim	(参考訳) 本稿では,2段線形適応ロバスト最適化(ARO)問題と2段連立変数と多面的不確実性集合を機械学習で解く手法を提案する。最適な現在決定、最適な現在決定に関連する最悪のシナリオ、そして我々が戦略と呼ぶものに最適な待ち時間決定をエンコードします。カラムと制約生成アルゴリズムを用いて,複数の類似AROインスタンスを事前に解決し,最適戦略を抽出し,トレーニングセットを生成する。私たちは、現在決定のための高品質な戦略、最適な現在決定に関連する最悪のシナリオ、そして待ち行列決定を予測する機械学習モデルをトレーニングします。また、機械学習アルゴリズムをトレーニングするために必要な異なるターゲットクラス数を削減できるアルゴリズムも導入する。提案手法を施設立地,多項目在庫管理,ユニットコミットメント問題に適用する。提案手法は,最先端のアルゴリズムよりも高精度でARO問題を解く。 We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization (ARO) problems with binary here-and-now variables and polyhedral uncertainty sets. We encode the optimal here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the optimal wait-and-see decisions into what we denote as the strategy. We solve multiple similar ARO instances in advance using the column and constraint generation algorithm and extract the optimal strategies to generate a training set. We train a machine learning model that predicts high-quality strategies for the here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the wait-and-see decisions. We also introduce an algorithm to reduce the number of different target classes the machine learning algorithm needs to be trained on. We apply the proposed approach to the facility location, the multi-item inventory control and the unit commitment problems. Our approach solves ARO problems drastically faster than the state-of-the-art algorithms with high accuracy.	翻訳日:2023-07-25 16:22:32 公開日:2023-07-23
# マルチクラス流体キューネットワークの最適制御:機械学習によるアプローチ Optimal Control of Multiclass Fluid Queueing Networks: A Machine Learning Approach ( http://arxiv.org/abs/2307.12405v1 ) ライセンス: Link先を確認	Dimitris Bertsimas, Cheol Woo Kim	(参考訳) 本稿では,明示的かつ洞察に富んだ制御ポリシーを提供するマルチクラス流体待ち行列ネットワーク(mfqnets)の最適制御のための機械学習手法を提案する。しきい値曲線が原点を通る超平面であるMFQNET制御問題に対して、しきい値型最適ポリシーが存在することを示す。超平面分割(oct-h)を持つ最適分類木を用いてmfqnetの最適制御方針を学習する。我々は,mfqnet制御問題の数値解をトレーニングセットとして使用し,oct-hを用いて明示的な制御方針を学習する。最大33台のサーバと99のクラスで実験結果を報告し、学習したポリシーがテストセット上で100\%の精度を達成することを実証した。 OCT-Hのオフライントレーニングは大規模なネットワークで数日かかるが、オンラインアプリケーションはミリ秒かかる。 We propose a machine learning approach to the optimal control of multiclass fluid queueing networks (MFQNETs) that provides explicit and insightful control policies. We prove that a threshold type optimal policy exists for MFQNET control problems, where the threshold curves are hyperplanes passing through the origin. We use Optimal Classification Trees with hyperplane splits (OCT-H) to learn an optimal control policy for MFQNETs. We use numerical solutions of MFQNET control problems as a training set and apply OCT-H to learn explicit control policies. We report experimental results with up to 33 servers and 99 classes that demonstrate that the learned policies achieve 100\% accuracy on the test set. While the offline training of OCT-H can take days in large networks, the online application takes milliseconds.	翻訳日:2023-07-25 16:22:14 公開日:2023-07-23
# TransNet:カテゴリーレベルポーズ推定による透明オブジェクト操作 TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation ( http://arxiv.org/abs/2307.12400v1 ) ライセンス: Link先を確認	Huijie Zhang, Anthony Opipari, Xiaotong Chen, Jiyue Zhu, Zeren Yu, Odest Chadwicke Jenkins	(参考訳) 透明物体は視覚知覚システムに複数の異なる課題を示す。まず、視覚的な特徴を区別できないため、透明なオブジェクトは不透明なオブジェクトよりも検出やローカライズが難しくなる。人間でさえ、ガラスのドアのように鏡面の反射や屈折がほとんどなく、知覚が難しい透明な表面を見つける。第二の課題は、通常不透明物体の知覚に使用される深度センサーは、そのユニークな反射特性のために透明表面の正確な深度測定を得ることができないことである。これらの課題から、コップのような同じカテゴリ内の透明なオブジェクトインスタンスが、同じカテゴリの通常の不透明なオブジェクトよりも互いに似通っていることを観察した。本稿では,この観察から,インスタンスレベルのポーズ推定ではなく,カテゴリレベルの透明なオブジェクトポーズ推定の可能性について検討する。本研究では,局所的深さ補完と表面正規推定を用いてカテゴリレベルの透明物体ポーズを推定する2段階パイプラインである \textit{\textbf{transnet}} を提案する。 TransNetは、大規模透明オブジェクトデータセット上でのポーズ推定精度を評価し、最先端のカテゴリレベルのポーズ推定手法と比較する。この比較の結果,transnetは透明物体のポーズ推定精度の向上を実現した。さらに,TransNetを用いて,ロボットピック・アンド・プレイスと注ぐ作業のための自律的透明物体操作システムを構築する。 Transparent objects present multiple distinct challenges to visual perception systems. First, their lack of distinguishing visual features makes transparent objects harder to detect and localize than opaque objects. Even humans find certain transparent surfaces with little specular reflection or refraction, like glass doors, difficult to perceive. A second challenge is that depth sensors typically used for opaque object perception cannot obtain accurate depth measurements on transparent surfaces due to their unique reflective properties. Stemming from these challenges, we observe that transparent object instances within the same category, such as cups, look more similar to each other than to ordinary opaque objects of that same category. Given this observation, the present paper explores the possibility of category-level transparent object pose estimation rather than instance-level pose estimation. We propose \textit{\textbf{TransNet}}, a two-stage pipeline that estimates category-level transparent object pose using localized depth completion and surface normal estimation. 
TransNet is evaluated in terms of pose estimation accuracy on a large-scale transparent object dataset and compared to a state-of-the-art category-level pose estimation approach. Results from this comparison demonstrate that TransNet achieves improved pose estimation accuracy on transparent objects. Moreover, we use TransNet to build an autonomous transparent object manipulation system for robotic pick-and-place and pouring tasks.	翻訳日:2023-07-25 16:22:01 公開日:2023-07-23
# 依存的革新を伴う高次元線形過程に対する濃度 Concentration for high-dimensional linear processes with dependent innovations ( http://arxiv.org/abs/2307.12395v1 ) ライセンス: Link先を確認	Eduardo Fonseca Mendes, Fellipe Lopes	(参考訳) Weibull 尾を持つ混合配列上のベクトル線型過程の $l_\infty$ ノルムに対する濃度不等式を開発する。これらの不等式はベベリッジ・ネルソン分解を利用しており、ベクター・ミキシングールのsup-ノルムあるいはその重み付け和の濃度に問題を還元する。この不等式は、線形過程の lag-$h$ 自己共分散行列の最大エントリーワイドノルムに対して有界な濃度を得るために用いられる。これらの結果は、$l_1$正規化を用いた高次元ベクトル自己回帰過程の推定境界、時系列用高次元ガウスブートストラップ、長期共分散行列推定に有用である。 We develop concentration inequalities for the $l_\infty$ norm of a vector linear processes on mixingale sequences with sub-Weibull tails. These inequalities make use of the Beveridge-Nelson decomposition, which reduces the problem to concentration for sup-norm of a vector-mixingale or its weighted sum. This inequality is used to obtain a concentration bound for the maximum entrywise norm of the lag-$h$ autocovariance matrices of linear processes. These results are useful for estimation bounds for high-dimensional vector-autoregressive processes estimated using $l_1$ regularisation, high-dimensional Gaussian bootstrap for time series, and long-run covariance matrix estimation.	翻訳日:2023-07-25 16:21:37 公開日:2023-07-23
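The concentration bound in the abstract concerns the maximum entrywise norm of lag-$h$ autocovariance matrices. As a small sketch of the object being bounded, the code below computes a sample lag-$h$ autocovariance matrix of a vector time series and its largest absolute entry. The normalisation by the series length $T$ (rather than $T-h$) and the toy series are assumptions made for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def lag_autocovariance(series: Sequence[Sequence[float]], lag: int) -> Matrix:
    """Sample lag-h autocovariance: (1/T) * sum_t (x_{t+h} - mean)(x_t - mean)^T."""
    length, dim = len(series), len(series[0])
    if not 0 <= lag < length:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = [sum(row[j] for row in series) / length for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for t in range(length - lag):
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += (series[t + lag][i] - mean[i]) * (series[t][j] - mean[j])
    return [[value / length for value in row] for row in cov]


def max_entrywise_norm(matrix: Matrix) -> float:
    """Largest absolute entry of a matrix."""
    return max(abs(value) for row in matrix for value in row)


# Hypothetical 2-dimensional series that alternates in its first coordinate.
toy_series = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
gamma_0 = lag_autocovariance(toy_series, lag=0)
gamma_1 = lag_autocovariance(toy_series, lag=1)
gamma_1_norm = max_entrywise_norm(gamma_1)
```

The quantity of interest in the abstract is the deviation of such a sample matrix from its population counterpart, measured in this same entrywise norm.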
# Masked Reference based Centerpoint Supervision を用いた反復的ロバスト視覚接地 Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision ( http://arxiv.org/abs/2307.12392v1 ) ライセンス: Link先を確認	Menghao Li, Chunlei Wang, Wenquan Feng, Shuchang Lyu, Guangliang Cheng, Xiangtai Li, Binghao Liu, Qi Zhao	(参考訳) 視覚グラウンディング(vg)は、与えられた表現に基づく画像から対象オブジェクトをローカライズすることを目的としており、検出および視覚トランスフォーマの開発において大きな進歩を遂げている。しかしながら、既存のVG法は、不正確な記述や無関係な記述が提示されたときに偽アラームオブジェクトを生成する傾向がある。さらに、既存の手法では、画像全体とテキスト記述から、きめ細かい特徴、正確な局所化、および十分なコンテキスト理解を捉えることができない。両問題に対処するため,Masked Reference Based Centerpoint Supervision (MRCS) を用いたIR-VG (Iterative Robust Visual Grounding) フレームワークを提案する。このフレームワークは、アライメントを改善するために反復的多段階視覚言語融合(IMVF)を導入している。 MRCSを用いて,より正確な位置推定を行う。次に,VGのロバスト性を改善するために,不正確な表現を提示した場合の偽アラーム生成を防止するために,多段階の偽アラームセンシティブデコーダ(MFSD)を提案する。提案フレームワークは5つの正規VGデータセットと2つの新たに構築された堅牢VGデータセットで評価される。広汎な実験により、IR-VGは、新たに提案された2つの堅牢なVGデータセットに対する既存のSOTAアプローチと比較して、25\%と10\%の改善により、新しい最先端(SOTA)結果を達成することが示された。さらに,提案フレームワークが5つの正規vgデータセット上で有効であることも確認した。コードとモデルは https://github.com/cv516Buaa/IR-VG で公開される。 Visual Grounding (VG) aims at localizing target objects from an image based on given expressions and has made significant progress with the development of detection and vision transformers. However, existing VG methods tend to generate false-alarm objects when presented with inaccurate or irrelevant descriptions, which commonly occur in practical applications. Moreover, existing methods fail to capture fine-grained features, accurate localization, and sufficient context comprehension from the whole image and textual descriptions. To address both issues, we propose an Iterative Robust Visual Grounding (IR-VG) framework with Masked Reference based Centerpoint Supervision (MRCS). The framework introduces iterative multi-level vision-language fusion (IMVF) for better alignment. We use MRCS to achieve more accurate localization with point-wise feature supervision. 
Then, to improve the robustness of VG, we also present a multi-stage false-alarm sensitive decoder (MFSD) to prevent the generation of false-alarm objects when presented with inaccurate expressions. The proposed framework is evaluated on five regular VG datasets and two newly constructed robust VG datasets. Extensive experiments demonstrate that IR-VG achieves new state-of-the-art (SOTA) results, with improvements of 25\% and 10\% compared to existing SOTA approaches on the two newly proposed robust VG datasets. Moreover, the proposed framework is also verified to be effective on five regular VG datasets. Code and models will be publicly available at https://github.com/cv516Buaa/IR-VG.	翻訳日:2023-07-25 16:21:23 公開日:2023-07-23
# 交通信号制御のためのSim-to-Real転送に向けた不確実な接地行動変換 Uncertainty-aware Grounded Action Transformation towards Sim-to-Real Transfer for Traffic Signal Control ( http://arxiv.org/abs/2307.12388v1 ) ライセンス: Link先を確認	Longchao Da, Hao Mei, Romir Sharma and Hua Wei	(参考訳) 交通信号制御(tsc)は、数百万人の日常生活に影響を与える複雑で重要なタスクである。強化学習(rl)は交通信号制御の最適化に有望な結果を示しているが、現在のrlベースのtsc法は主にシミュレーションで訓練され、シミュレーションと実世界のパフォーマンスギャップに苦しむ。本稿では, シミュレーション中の動作を不確実性で動的に変換することで, シミュレーション環境から実世界環境へ学習した学習方針を伝達し, 遷移力学の領域ギャップを緩和する, UGAT と呼ばれるシミュレーションから実世界への移行手法を提案する。本手法をシミュレーションした交通環境において評価し,実環境におけるトランスファーrlポリシーの性能を著しく向上させることを示す。 Traffic signal control (TSC) is a complex and important task that affects the daily lives of millions of people. Reinforcement Learning (RL) has shown promising results in optimizing traffic signal control, but current RL-based TSC methods are mainly trained in simulation and suffer from the performance gap between simulation and the real world. In this paper, we propose a simulation-to-real-world (sim-to-real) transfer approach called UGAT, which transfers a learned policy trained from a simulated environment to a real-world environment by dynamically transforming actions in the simulation with uncertainty to mitigate the domain gap of transition dynamics. We evaluate our method on a simulated traffic environment and show that it significantly improves the performance of the transferred RL policy in the real world.	翻訳日:2023-07-25 16:20:53 公開日:2023-07-23
# 開量子系における共鳴支配光力学的絡み合い Resonance-dominant optomechanical entanglement in open quantum systems ( http://arxiv.org/abs/2307.12383v1 ) ライセンス: Link先を確認	Cheng Shang and Hongchao Li	(参考訳) 絡み合い保護に動機づけられ,共振効果を用いてコヒーレント状態表現における光力学的絡み合いを高める。熱-機械的モードと周辺熱浴との間の高周波数結合成分を弱結合限界内でフィルタするフィルタモデルを提案する。連続変数の絡み合い保護は、重要なデチューン成分に関連する自由度を取り除き、デコヒーレンスに抵抗することを含む。本研究では, フィルタモデルの非線形ランゲヴィン方程式を構築し, フィルタモデルが熱雑音や機械減衰に対して, 定常的な最大最適エンタングルメントのロバスト性を2倍にすることを示す。さらに、これらの結果を1つの振動するエンドミラーを持つ光学キャビティアレイに一般化し、長距離最適オプティメカニカルエンタングルメント転送について検討する。本研究は, 量子系をデコヒーレンスから保護し, 大規模量子情報処理と量子ネットワーク構築の可能性を高めるために, 共鳴効果を適用した新たな基盤を打破する。 Motivated by entanglement protection, our work utilizes a resonance effect to enhance optomechanical entanglement in the coherent-state representation. We propose a filtering model to filter out the highly frequency-detuned coupling components between a thermal-mechanical mode and its surrounding heat baths within the weak-coupling limit. We reveal that continuous-variable entanglement protection involves the elimination of degrees of freedom associated with significant detuning components, thereby resisting decoherence. We construct a nonlinear Langevin equation of the filtering model and numerically show that the filtering model doubles the robustness of a stationary maximum optomechanical entanglement with respect to thermal noise and mechanical damping. Furthermore, we generalize these results to an optical cavity array with one oscillating end-mirror to investigate the long-distance optimal optomechanical entanglement transfer. Our study breaks new ground for applying the resonance effect to protect the quantum system from decoherence and advancing the possibilities for large-scale quantum information processing and quantum network construction.	翻訳日:2023-07-25 16:20:38 公開日:2023-07-23
# CommonsenseVIS: 自然言語モデルのコモンセンス推論能力の可視化と理解 CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models ( http://arxiv.org/abs/2307.12382v1 ) ライセンス: Link先を確認	Xingbo Wang, Renfei Huang, Zhihua Jin, Tianqing Fang, and Huamin Qu	(参考訳) 近年、大きな事前学習された言語モデルは、commonsenseベンチマークで説得力のあるパフォーマンスを達成している。それにもかかわらず、モデルがどんな常識知識を学んでいるのか、スプリアスパターンのみを利用するのかは不明だ。特徴属性は、モデル出力の重要な入力概念を特定する一般的な説明可能性技術である。しかし、コモンセンス知識は暗黙的であり、入力に明示的に表されることは稀である。これらの手法は、言及された概念に対するモデルの暗黙的推論を推論することはできない。本稿では,外部コモンセンス知識ベースを用いた視覚的説明システムであるCommonsenseVISについて述べる。具体的には,モデル行動と人間の知識を整合させるための参考として,入力中の共通意味知識を抽出する。本システムでは,異なる概念とその基盤となる関係について,多段階の可視化とインタラクティブなモデル探索と編集を行う。ユーザスタディを通じて,NLPの専門家が,異なる状況における概念に対するモデルリレーショナル推論の体系的かつスケーラブルな視覚分析を行う上で,CommonsenseVISが有効であることを示す。 Recently, large pretrained language models have achieved compelling performance on commonsense benchmarks. Nevertheless, it is unclear what commonsense knowledge the models learn and whether they solely exploit spurious patterns. Feature attributions are popular explainability techniques that identify important input concepts for model outputs. However, commonsense knowledge tends to be implicit and rarely explicitly presented in inputs. These methods cannot infer models' implicit reasoning over mentioned concepts. We present CommonsenseVIS, a visual explanatory system that utilizes external commonsense knowledge bases to contextualize model behavior for commonsense question-answering. Specifically, we extract relevant commonsense knowledge in inputs as references to align model behavior with human knowledge. Our system features multi-level visualization and interactive model probing and editing for different concepts and their underlying relations. Through a user study, we show that CommonsenseVIS helps NLP experts conduct a systematic and scalable visual analysis of models' relational reasoning over concepts in different situations.	翻訳日:2023-07-25 16:20:20 公開日:2023-07-23
# H$_2^+$分子イオンにおける高次高調波発生の量子光学的解析 Quantum optical analysis of high-order harmonic generation in H$_2^+$ molecular ions ( http://arxiv.org/abs/2307.12381v1 ) ライセンス: Link先を確認	J. Rivera-Dean, P. Stammer, A. S. Maxwell, Th. Lamprou, E. Pisanty, P. Tzallas, M. Lewenstein and M. F. Ciappina	(参考訳) 量子光学系におけるh$_2^+$分子イオンの高次高調波発生に関する包括的理論的研究を行う。本研究は,2中心分子におけるhhgの汎用性に着目し,様々な量子光学および量子情報計測を特徴付けることに焦点を当てた。レーザー・マター相互作用後の電子状態と光状態の絡み合いの出現を示す。また、特定の電子量子状態の条件付けにより、ターゲット周波数モードにおける光の非古典的状態を得る可能性も確認し、これは異なる調和モードの集合間の高古典的非古典的絡み合い状態の生成に不可欠であることが判明した。本研究は,分子系における強レーザー場駆動相互作用の研究の道を開き,量子技術への応用の可能性を提案する。 We present a comprehensive theoretical investigation of high-order harmonic generation in H$_2^+$ molecular ions within a quantum optical framework. Our study focuses on characterizing various quantum optical and quantum information measures, highlighting the versatility of HHG in two-center molecules towards quantum technology applications. We demonstrate the emergence of entanglement between electron and light states after the laser-matter interaction. We also identify the possibility of obtaining non-classical states of light in targeted frequency modes by conditioning on specific electronic quantum states, which turn out to be crucial in the generation of highly non-classical entangled states between distinct sets of harmonic modes. Our findings open up avenues for studying strong-laser field-driven interactions in molecular systems, and suggest their applicability to quantum technology applications.	翻訳日:2023-07-25 16:20:05 公開日:2023-07-23
# ProtoFL: 原型蒸留による教師なしフェデレーション学習 ProtoFL: Unsupervised Federated Learning via Prototypical Distillation ( http://arxiv.org/abs/2307.12450v1 ) ライセンス: Link先を確認	Hansol Kim, Youngjun Kwak, Minyoung Jung, Jinho Shin, Youngsung Kim, Changick Kim	(参考訳) フェデレートラーニング(FL)は、特に認証システムにおいて、データのプライバシ保護を強化するための有望なアプローチである。しかしながら、ラウンドコミュニケーションの制限、表現の不足、スケーラビリティは、デプロイメントに重大な課題をもたらし、その潜在能力を完全に阻害する。本稿では,グローバルモデルの表現力を高め,ラウンドコミュニケーションコストを削減するために,教師なしフェデレーション学習に基づく原型的表現蒸留法である「protofl」を提案する。さらに,正規化フローに基づく局所的な一クラス分類器を導入し,データ制限による性能向上を図る。本研究は,FLを用いた一級分類性能向上のための最初の研究である。我々は,MNIST, CIFAR-10, CIFAR-100, ImageNet-30, Keystroke-Dynamicsの5つの広く利用されているベンチマークにおいて,従来の手法よりも優れた性能を示した。 Federated learning (FL) is a promising approach for enhancing data privacy preservation, particularly for authentication systems. However, limited round communications, scarce representation, and scalability pose significant challenges to its deployment, hindering its full potential. In this paper, we propose 'ProtoFL', Prototypical Representation Distillation based unsupervised Federated Learning to enhance the representation power of a global model and reduce round communication costs. Additionally, we introduce a local one-class classifier based on normalizing flows to improve performance with limited data. Our study represents the first investigation of using FL to improve one-class classification performance. We conduct extensive experiments on five widely used benchmarks, namely MNIST, CIFAR-10, CIFAR-100, ImageNet-30, and Keystroke-Dynamics, to demonstrate the superior performance of our proposed framework over previous methods in the literature.	翻訳日:2023-07-25 16:12:30 公開日:2023-07-23
# WEPRO:ハイブリッド量子古典アルゴリズムの効率的な最適化のための重み予測 WEPRO: Weight Prediction for Efficient Optimization of Hybrid Quantum-Classical Algorithms ( http://arxiv.org/abs/2307.12449v1 ) ライセンス: Link先を確認	Satwik Kundu, Debarshi Kundu and Swaroop Ghosh	(参考訳) 古典機械上での量子シミュレータの指数的実行時間と待ち行列深度、および実量子デバイスの高コストは、量子ニューラルネットワーク(QNN)、変分量子固有解法(VQE)、量子近似最適化アルゴリズム(QAOA)などの変分量子アルゴリズム(VQA)の効果的なトレーニングにおいて大きな課題となる。これらの制約に対処するため、パラメータ重みの規則的傾向を利用してVQAの収束を加速する新しい手法、WEPRO(Weight Prediction)を提案する。本稿では,最適予測性能のための2つの手法,naive prediction(nap)とadaptive prediction(adap)を提案する。様々なデータセット上の複数のQNNモデルの広範な実験とトレーニングを通じて、WEPROは標準的なトレーニング手法と比較して約2.25\times$のスピードアップを提供し、ストレージと計算オーバーヘッドの少ない精度(最大2.3\%$以上)と損失(最大6.1\%$以下)を提供する。また,分子基底エネルギー推定のためのVQEとグラフMaxCutのQAOAにおけるWEPROの有効性を評価した。その結果、WEPROは従来の最適化手法と比較して最大3.1\times$VQEと2.91\times$QAOAの速度改善を実現し、トレーニングイテレーションあたりのショット数(繰り返し回路実行)を最大3.3\times$に削減した。 The exponential run time of quantum simulators on classical machines and long queue depths and high costs of real quantum devices present significant challenges in the effective training of Variational Quantum Algorithms (VQAs) like Quantum Neural Networks (QNNs), Variational Quantum Eigensolver (VQE) and Quantum Approximate Optimization Algorithm (QAOA). To address these limitations, we propose a new approach, WEPRO (Weight Prediction), which accelerates the convergence of VQAs by exploiting regular trends in the parameter weights. We introduce two techniques for optimal prediction performance namely, Naive Prediction (NaP) and Adaptive Prediction (AdaP). Through extensive experimentation and training of multiple QNN models on various datasets, we demonstrate that WEPRO offers a speedup of approximately $2.25\times$ compared to standard training methods, while also providing improved accuracy (up to $2.3\%$ higher) and loss (up to $6.1\%$ lower) with low storage and computational overheads. We also evaluate WEPRO's effectiveness in VQE for molecular ground-state energy estimation and in QAOA for graph MaxCut. 
Our results show that WEPRO leads to speed improvements of up to $3.1\times$ for VQE and $2.91\times$ for QAOA, compared to traditional optimization techniques, while using up to $3.3\times$ fewer shots (i.e., repeated circuit executions) per training iteration.	翻訳日:2023-07-25 16:12:13 公開日:2023-07-23
# SCRAPS:音響空間と音声空間の音声コントラスト表現 SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces ( http://arxiv.org/abs/2307.12445v1 ) ライセンス: Link先を確認	Ivan Vall\'es-P\'erez, Grzegorz Beringer, Piotr Bilinski, Gary Cook, Roberto Barra-Chicote	(参考訳) 論文の多くの例は、ディープラーニングモデルがマルチモーダルデータとうまく連携できることを証明した。最近、CLIPは、画像とテキスト記述間の共有潜在空間をディープラーニングシステムで学習できるようにし、下流タスクではゼロまたは少数ショットの結果が卓越している。本稿では,CLIPが提案したのと同じアイデアを,音声空間と音響空間が共存する音声領域に適用する。音声空間と音響空間の共有表現を学習するために,CLIPに基づくモデルを訓練する。その結果,提案モデルは音素の20%をランダムに置き換える際に91%のスコアが低下し,異なる種類の雑音に対してかなりの頑健性が得られ,ガウス雑音の75%と混合した場合のパフォーマンスが10%低下した。また,結果の埋め込みが,明瞭度評価や音声生成タスクにおける豊富な事前学習音声埋め込みの活用など,下流のさまざまなアプリケーションにとって有用であることを示す実証的証拠を提供する。最後に、音声生成と認識分野に興味深い意味を持つ潜在的な応用について論じる。 Numerous examples in the literature proved that deep learning models have the ability to work well with multimodal data. Recently, CLIP has enabled deep learning systems to learn shared latent spaces between images and text descriptions, with outstanding zero- or few-shot results in downstream tasks. In this paper we explore the same idea proposed by CLIP but applied to the speech domain, where the phonetic and acoustic spaces usually coexist. We train a CLIP-based model with the aim to learn shared representations of phonetic and acoustic spaces. The results show that the proposed model is sensitive to phonetic changes, with a 91% score drop when replacing 20% of the phonemes at random, while providing substantial robustness against different kinds of noise, with a 10% performance drop when mixing the audio with 75% of Gaussian noise. We also provide empirical evidence showing that the resulting embeddings are useful for a variety of downstream applications, such as intelligibility evaluation and the ability to leverage rich pre-trained phonetic embeddings in speech generation tasks. Finally, we discuss potential applications with interesting implications for the speech generation and recognition fields.	翻訳日:2023-07-25 16:11:42 公開日:2023-07-23
# EnTri: 説明可能なシーン認識のための3レベル表現によるアンサンブル学習 EnTri: Ensemble Learning with Tri-level Representations for Explainable Scene Recognition ( http://arxiv.org/abs/2307.12442v1 ) ライセンス: Link先を確認	Amirhossein Aminimehr, Amirali Molaei, Erik Cambria	(参考訳) 深層学習に基づくシーン認識は大きな進歩を遂げているが,クラス間類似性やクラス内類似性による課題により,その性能にはまだ限界がある。さらに、先行研究は主に分類精度の向上に重点を置いているが、解釈可能な正確なシーン分類を達成することにはあまり注意を向けていない。そこで我々は,視覚特徴の階層構造を用いたアンサンブル学習を利用したアンサンブルシーン認識フレームワークであるEnTriを提案する。 entriはピクセルレベル、セマンティクスセグメンテーションレベル、オブジェクトクラス、周波数レベルという3つの異なる詳細レベルで機能を表現する。複雑さの異なる特徴符号化方式を取り入れ,アンサンブル戦略を活用することにより,視覚的・テキスト的説明による透明性と解釈性を向上し,分類精度の向上を目指す。解釈可能性を達成するために,カテゴリの最終予測に寄与する所定のシーンの様々な特性を強調する視覚とテキストの両方の説明を生成する拡張アルゴリズムを考案した。これには、オブジェクト、統計、空間レイアウト、テキストの詳細に関する情報が含まれる。ベンチマークシーン分類データセットの実験を通じて、entriは認識精度の面で優位を示し、mit67、sun397、uiuc8のデータセットで87.69%、75.56%、99.17%の精度で最先端のアプローチと比較して競争力を達成した。 Scene recognition based on deep-learning has made significant progress, but there are still limitations in its performance due to challenges posed by inter-class similarities and intra-class dissimilarities. Furthermore, prior research has primarily focused on improving classification accuracy, yet it has given less attention to achieving interpretable, precise scene classification. Therefore, we are motivated to propose EnTri, an ensemble scene recognition framework that employs ensemble learning using a hierarchy of visual features. EnTri represents features at three distinct levels of detail: pixel-level, semantic segmentation-level, and object class and frequency level. By incorporating distinct feature encoding schemes of differing complexity and leveraging ensemble strategies, our approach aims to improve classification accuracy while enhancing transparency and interpretability via visual and textual explanations. To achieve interpretability, we devised an extension algorithm that generates both visual and textual explanations highlighting various properties of a given scene that contribute to the final prediction of its category. 
This includes information about objects, statistics, spatial layout, and textural details. Through experiments on benchmark scene classification datasets, EnTri has demonstrated superiority in terms of recognition accuracy, achieving competitive performance compared to state-of-the-art approaches, with an accuracy of 87.69%, 75.56%, and 99.17% on the MIT67, SUN397, and UIUC8 datasets, respectively.	翻訳日:2023-07-25 16:11:21 公開日:2023-07-23
# 対称正定値行列の多様体上の回帰による多値共分散推定 Multifidelity Covariance Estimation via Regression on the Manifold of Symmetric Positive Definite Matrices ( http://arxiv.org/abs/2307.12438v1 ) ライセンス: Link先を確認	Aimee Maurais, Terrence Alsup, Benjamin Peherstorfer, Youssef Marzouk	(参考訳) 対称正定値行列多様体上の回帰問題の解として定式化された共分散行列の多値性推定器を導入する。推定器は構成によって正定値であり、マハラノビス距離は最小限に抑えられ、実用的な計算を可能にする性質を持つ。多様体回帰多元性(mrmf)共分散推定器は、多様体接空間上のある誤差モデルの下で最大確率推定器であることを示す。より広範に、我々のリーマン回帰フレームワークは、制御変数から構築された既存の多値共分散推定器を含むことを示す。数値的な実例から,この推定器は,単一忠実度および他の複数忠実度共分散推定器に対する2乗推定誤差において,最大1桁の大幅な減少をもたらすことを証明した。さらに、正定性の保存は、この性質が不可欠であるデータ同化やメトリック学習のような下流タスクと推定器が互換性があることを保証する。 We introduce a multifidelity estimator of covariance matrices formulated as the solution to a regression problem on the manifold of symmetric positive definite matrices. The estimator is positive definite by construction, and the Mahalanobis distance minimized to obtain it possesses properties which enable practical computation. We show that our manifold regression multifidelity (MRMF) covariance estimator is a maximum likelihood estimator under a certain error model on manifold tangent space. More broadly, we show that our Riemannian regression framework encompasses existing multifidelity covariance estimators constructed from control variates. We demonstrate via numerical examples that our estimator can provide significant decreases, up to one order of magnitude, in squared estimation error relative to both single-fidelity and other multifidelity covariance estimators. Furthermore, preservation of positive definiteness ensures that our estimator is compatible with downstream tasks, such as data assimilation and metric learning, in which this property is essential.	翻訳日:2023-07-25 16:10:53 公開日:2023-07-23
# 物理制約ニューラルネットワークを用いた一般化シュワルツ型非重複領域分解法 A Generalized Schwarz-type Non-overlapping Domain Decomposition Method using Physics-constrained Neural Networks ( http://arxiv.org/abs/2307.12435v1 ) ライセンス: Link先を確認	Shamsulhaq Basir, Inanc Senocak	(参考訳) ニューラルネットワークを用いたメッシュレスシュワルツ型非重複領域分解法を提案し、偏微分方程式(PDE)を含む前方および逆問題の解法を提案する。近隣のサブドメイン間の解の整合性を確保するため、各サブドメインに独自のRobinパラメータを割り当てる一般化されたRobin型インタフェース条件を採用する。これらのサブドメイン固有のRobinパラメータは、Robinインターフェース条件のミスマッチを最小限に抑え、トレーニング中の効率的な情報交換を容易にするために学習される。この方法はラプラス方程式とヘルムホルツ方程式の両方に適用できる。これは、ラグランジアン形式を拡張して境界条件とインターフェース条件を厳格に強制しながら、支配的PDEの損失を最小限に抑えるために訓練された独立したニューラルネットワークモデルによる局所解を表す。提案手法の重要な強みは,各サブドメインのRobinパラメータを学習し,隣接するサブドメインとの情報交換を強化することである。学習したRobinパラメータは、ソリューションの局所的挙動、ドメイン分割、およびドメイン全体に対するサブドメイン位置に適応する。クロスポイントを用いた一方向および二方向の分解を含む前方および逆問題に関する広範な実験は,提案手法の汎用性と性能を示している。 We present a meshless Schwarz-type non-overlapping domain decomposition method based on artificial neural networks for solving forward and inverse problems involving partial differential equations (PDEs). To ensure the consistency of solutions across neighboring subdomains, we adopt a generalized Robin-type interface condition, assigning unique Robin parameters to each subdomain. These subdomain-specific Robin parameters are learned to minimize the mismatch on the Robin interface condition, facilitating efficient information exchange during training. Our method is applicable to both the Laplace's and Helmholtz equations. It represents local solutions by an independent neural network model which is trained to minimize the loss on the governing PDE while strictly enforcing boundary and interface conditions through an augmented Lagrangian formalism. A key strength of our method lies in its ability to learn a Robin parameter for each subdomain, thereby enhancing information exchange with its neighboring subdomains. We observe that the learned Robin parameters adapt to the local behavior of the solution, domain partitioning and subdomain location relative to the overall domain. 
Extensive experiments on forward and inverse problems, including one-way and two-way decompositions with crosspoints, demonstrate the versatility and performance of our proposed approach.	翻訳日:2023-07-25 16:10:38 公開日:2023-07-23
# SwIPE : 急激なパッチ埋め込みによる効率的かつロバストな医用画像分割 SwIPE: Efficient and Robust Medical Image Segmentation with Implicit Patch Embeddings ( http://arxiv.org/abs/2307.12429v1 ) ライセンス: Link先を確認	Yejia Zhang, Pengfei Gu, Nishchal Sapkota, Danny Z. Chen	(参考訳) 現代の医用画像分割法は、主にラスタ化マスクの形で離散表現を用いて特徴を学習し、予測を生成する。効果はあるものの、このパラダイムは空間的に非フレキシブルであり、高解像度の画像にはスケールが悪く、物体の形状を直接理解できない。これらの制限に対処するため、最近の研究では暗黙のニューラル表現(INR)を使用してセグメンテーションの連続表現を学習している。しかし、これらの手法は3次元形状復元のために設計された部品を直接採用することが多い。 More importantly, these formulations were also constrained to either point-based or global contexts, lacking contextual understanding or local fine-grained details, respectively--both critical for accurate segmentation. To remedy this, we propose a novel approach, SwIPE (Segmentation with Implicit Patch Embeddings), that leverages the advantages of INRs and predicts shapes at the patch level--rather than at the point level or image level--to enable both accurate local boundary delineation and global shape coherence. 2つの課題(2次元ポリープ分割と3次元腹部臓器分割)の広範囲な評価は、SwIPEが最近の暗黙的アプローチよりも著しく改善し、10倍以上のパラメータで最先端の離散的手法より優れていることを示している。また,画像解像度とデータセット間のデータシフトに対して,データ効率の向上とロバスト性の向上も示す。コードはgithubで入手できる。 Modern medical image segmentation methods primarily use discrete representations in the form of rasterized masks to learn features and generate predictions. Although effective, this paradigm is spatially inflexible, scales poorly to higher-resolution images, and lacks direct understanding of object shapes. To address these limitations, some recent works utilized implicit neural representations (INRs) to learn continuous representations for segmentation. However, these methods often directly adopted components designed for 3D shape reconstruction. More importantly, these formulations were also constrained to either point-based or global contexts, lacking contextual understanding or local fine-grained details, respectively--both critical for accurate segmentation. 
To remedy this, we propose a novel approach, SwIPE (Segmentation with Implicit Patch Embeddings), that leverages the advantages of INRs and predicts shapes at the patch level--rather than at the point level or image level--to enable both accurate local boundary delineation and global shape coherence. Extensive evaluations on two tasks (2D polyp segmentation and 3D abdominal organ segmentation) show that SwIPE significantly improves over recent implicit approaches and outperforms state-of-the-art discrete methods with over 10x fewer parameters. Our method also demonstrates superior data efficiency and improved robustness to data shifts across image resolutions and datasets. Code is available on GitHub.	翻訳日:2023-07-25 16:10:17 公開日:2023-07-23
# Augmented Box Replay: インクリメンタルオブジェクト検出のための前景シフトの克服 Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection ( http://arxiv.org/abs/2307.12427v1 ) ライセンス: Link先を確認	Liu Yuyang, Cong Yang, Goswami Dipam, Liu Xialei, Joost van de Weijer	(参考訳) 漸進的な学習では、過去のタスクから格納されたサンプルを現在のタスクサンプルと共に再生することが、破滅的な忘れに対処する最も効率的なアプローチの1つである。しかし、インクリメンタル分類とは異なり、画像再生はインクリメンタルオブジェクト検出(iod)にうまく適用されていない。本稿では、この主な理由として、前景シフトの見落とされがちな問題を特定する。前景シフトは、以前のタスクのイメージを再生する際にのみ発生し、その背景に現在のタスクの前景オブジェクトが含まれているという事実を指す。この問題を解決するために,前景オブジェクトのみを記憶・再生し,前景シフト問題を回避できる新規かつ効率的な拡張ボックスリプレイ(abr)法を開発した。さらに,関心領域(RoI)特徴からの空間的注意を生かし,従来のモデルから最も重要な情報に焦点を絞るために電流モデルを制約する,革新的なRoI蒸留損失を提案する。 ABRは、現在のクラスで高い可塑性を維持しながら、以前のクラスの忘れを著しく減少させる。さらに、標準的な画像再生と比較してストレージ要求を大幅に削減する。 Pascal-VOCおよびCOCOデータセットに関する総合実験は、我々のモデルの最先端性能をサポートする。 In incremental learning, replaying stored samples from previous tasks together with current task samples is one of the most efficient approaches to address catastrophic forgetting. However, unlike incremental classification, image replay has not been successfully applied to incremental object detection (IOD). In this paper, we identify the overlooked problem of foreground shift as the main reason for this. Foreground shift only occurs when replaying images of previous tasks and refers to the fact that their background might contain foreground objects of the current task. To overcome this problem, a novel and efficient Augmented Box Replay (ABR) method is developed that only stores and replays foreground objects and thereby circumvents the foreground shift problem. In addition, we propose an innovative Attentive RoI Distillation loss that uses spatial attention from region-of-interest (RoI) features to constrain the current model to focus on the most important information from the old model. ABR significantly reduces forgetting of previous classes while maintaining high plasticity in current classes. 
Moreover, it considerably reduces the storage requirements when compared to standard image replay. Comprehensive experiments on Pascal-VOC and COCO datasets support the state-of-the-art performance of our model.	翻訳日:2023-07-25 16:09:53 公開日:2023-07-23
# 対話応答生成におけるオフラインRLの有効性について On the Effectiveness of Offline RL for Dialogue Response Generation ( http://arxiv.org/abs/2307.12425v1 ) ライセンス: Link先を確認	Paloma Sodhi, Felix Wu, Ethan R. Elenberg, Kilian Q. Weinberger, Ryan McDonald	(参考訳) 言語モデルの一般的な訓練技法は、教師強制(TF)である。 TFは、同じ意味を異なる方法で表現できるにもかかわらず、人間の言語を正確に一致させようとする。これは対話応答生成のためのシーケンスレベルの目的の使用を動機付ける。本稿では,これらの目的を最大化するための様々なオフライン強化学習(rl)手法の有効性について検討する。複数のデータセット、モデル、メトリクスにわたって包括的な評価を行う。オフラインRLは、トレーニング不安定を誘発したり、実践的なトレーニング予算を犠牲にすることなく、教師の強制よりも明確なパフォーマンス向上を示す。 A common training technique for language models is teacher forcing (TF). TF attempts to match human language exactly, even though identical meanings can be expressed in different ways. This motivates use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods to maximize such objectives. We present a comprehensive evaluation across multiple datasets, models, and metrics. Offline RL shows a clear performance improvement over teacher forcing while not inducing training instability or sacrificing practical training budgets.	翻訳日:2023-07-25 16:09:33 公開日:2023-07-23
# ポリシーに対するヘイトなスピーチをテストする Testing Hateful Speeches against Policies ( http://arxiv.org/abs/2307.12418v1 ) ライセンス: Link先を確認	Jiangrui Zheng, Xueqing Liu, Girish Budhrani, Wei Yang, Ravishka Rathnasuriya	(参考訳) 近年、多くのソフトウェアシステムがAI技術、特にディープラーニング技術を採用しています。そのブラックボックスの性質から、aiベースのシステムはトレーサビリティに課題をもたらした。aiシステムの振る舞いはモデルとデータに基づいているのに対して、要件やポリシーは自然言語やプログラミング言語の形式で規則になっているからだ。私たちの知る限りでは、AIとディープニューラルネットワークベースのシステムは、ルールベースの要件/政策に対してどのように振る舞うか、という研究は限られています。本稿では、自然言語ポリシーに記述された規則に基づく要求に対する深いニューラルネットワークの挙動について検討する。特に、AIベースのコンテンツモデレーションソフトウェアをコンテンツモデレーションポリシーに対してチェックするケーススタディに焦点を当てる。 First, using crowdsourcing, we collect natural language test cases which match each moderation policy, and we name this dataset HateModerate; second, using the test cases in HateModerate, we test the failure rates of state-of-the-art hate speech detection software, and we find that these models have high failure rates for certain policies; finally, since manual labeling is costly, we further propose an automated approach to augment HateModerate by finetuning OpenAI's large language models to automatically match new examples to policies. この作業のデータセットとコードは、匿名のwebサイトにある: \url{https://sites.google.com/view/content-moderation-project}。 In recent years, many software systems have adopted AI techniques, especially deep learning techniques. Due to their black-box nature, AI-based systems brought challenges to traceability, because AI system behaviors are based on models and data, whereas the requirements or policies are rules in the form of natural or programming language. To the best of our knowledge, there is a limited number of studies on how AI and deep neural network-based systems behave against rule-based requirements/policies. This experience paper examines deep neural network behaviors against rule-based requirements described in natural language policies. In particular, we focus on a case study to check AI-based content moderation software against content moderation policies. 
First, using crowdsourcing, we collect natural language test cases which match each moderation policy, and we name this dataset HateModerate; second, using the test cases in HateModerate, we test the failure rates of state-of-the-art hate speech detection software, and we find that these models have high failure rates for certain policies; finally, since manual labeling is costly, we further propose an automated approach to augment HateModerate by finetuning OpenAI's large language models to automatically match new examples to policies. The dataset and code of this work can be found on our anonymous website: \url{https://sites.google.com/view/content-moderation-project}.	翻訳日:2023-07-25 16:09:25 公開日:2023-07-23
# 不確実性におけるテストデータ感度の情報理論解析 Information-theoretic Analysis of Test Data Sensitivity in Uncertainty ( http://arxiv.org/abs/2307.12456v1 ) ライセンス: Link先を確認	Futoshi Futami, Tomoharu Iwata	(参考訳) ベイズ推論は不確実な定量化タスクにしばしば用いられる。 xu と raginsky 2022 による最近の分析では、ベイズ推論の予測の不確実性は、データ生成過程に固有のランダム性を表す aleatoric と epistemic uncertainties と呼ばれる2つの不確実性に厳密に分解された。彼らはこれらの不確実性を情報理論的に分析し、モデルが適切に特定され、モデルのパラメータを潜在変数として扱うと仮定した。しかし、既存の不確実性の情報理論分析では、テストデータとトレーニングデータの感度として知られる、不確実性の広く信じられている性質を説明できない。テストデータが何らかの意味でトレーニングデータと類似している場合、認識の不確実性は小さくなるはずである。本研究では, 予測の不確かさに対する新しい分解法を用いて, 不確かさの感度について検討する。我々の分析は情報理論量を用いてそのような感度をうまく定義する。さらに,ベイズ的メタラーニングの既存分析を拡張し,タスク間の新たな感性を示す。 Bayesian inference is often utilized for uncertainty quantification tasks. A recent analysis by Xu and Raginsky 2022 rigorously decomposed the predictive uncertainty in Bayesian inference into two uncertainties, called aleatoric and epistemic uncertainties, which represent the inherent randomness in the data-generating process and the variability due to insufficient data, respectively. They analyzed those uncertainties in an information-theoretic way, assuming that the model is well-specified and treating the model's parameters as latent variables. However, the existing information-theoretic analysis of uncertainty cannot explain the widely believed property of uncertainty, known as the sensitivity between the test and training data. It implies that when test data are similar to training data in some sense, the epistemic uncertainty should become small. In this work, we study such uncertainty sensitivity using our novel decomposition method for the predictive uncertainty. Our analysis successfully defines such sensitivity using information-theoretic quantities. Furthermore, we extend the existing analysis of Bayesian meta-learning and show the novel sensitivities among tasks for the first time.	翻訳日:2023-07-25 16:01:28 公開日:2023-07-23
# 高速ベイズトモグラフィーによる非マルコフ量子過程のキャラクタリゼーション Characterizing non-Markovian Quantum Processes by Fast Bayesian Tomography ( http://arxiv.org/abs/2307.12452v1 ) ライセンス: Link先を確認	R. Y. Su, J. Y. Huang, N. Dumoulin. Stuyck, M. K. Feng, W. Gilbert, T. J. Evans, W. H. Lim, F. E. Hudson, K. W. Chan, W. Huang, Kohei M. Itoh, R. Harper, S. D. Bartlett, C. H. Yang, A. Laucht, A. Saraiva, T. Tanttu and A. S. Dzurak	(参考訳) 量子誤り訂正のしきい値を超えるレベルにゲート性能をプッシュするには、量子ゲートに発生するエラーソースを特徴付けることが重要である。しかし、非マルコフ誤差の特性は、現在の量子プロセストモグラフィー技術に挑戦している。 Fast Bayesian Tomography (FBT) は自己整合性ゲートセットトモグラフィプロトコルであり、初期の特徴的知識からブートストラップし、任意のゲートシーケンスでリアルタイムで更新できる。ここでは、FBTが鍵となる非マルコフ的誤り過程のキャラクタリゼーションを実現する方法を示す。シリコン量子ドット上の2量子ビット系の非マルコフ的挙動を診断するためのFBTの実験プロトコルを2つ導入する。実験分析ループの効率性とスケーラビリティを向上させるため,オンラインFBTソフトウェアスタックを開発した。実験コストと解析時間を削減するため,本研究では,本手法と温かいブート戦略も導入する。以上の結果から,FBTは量子コンピューティングにおけるフォールトトレラント演算の究極的実現に寄与する非マルコフ誤差の探索に有用であることが示された。 To push gate performance to levels beyond the thresholds for quantum error correction, it is important to characterize the error sources occurring on quantum gates. However, the characterization of non-Markovian error poses a challenge to current quantum process tomography techniques. Fast Bayesian Tomography (FBT) is a self-consistent gate set tomography protocol that can be bootstrapped from earlier characterization knowledge and be updated in real-time with arbitrary gate sequences. Here we demonstrate how FBT allows for the characterization of key non-Markovian error processes. We introduce two experimental protocols for FBT to diagnose the non-Markovian behavior of two-qubit systems on silicon quantum dots. To increase the efficiency and scalability of the experiment-analysis loop, we develop an online FBT software stack. To reduce experiment cost and analysis time, we also introduce a native readout method and warm boot strategy. 
Our results demonstrate that FBT is a useful tool for probing non-Markovian errors that can be detrimental to the ultimate realization of fault-tolerant operation on quantum computing.	翻訳日:2023-07-25 16:01:08 公開日:2023-07-23
# DiAMoNDBack: Cαタンパク質トレースの非決定論的バックマッピングのための拡散デノイジング自己回帰モデル DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces ( http://arxiv.org/abs/2307.12451v1 ) ライセンス: Link先を確認	Michael S. Jones and Kirill Shmilovich and Andrew L. Ferguson	(参考訳) タンパク質の粗視化分子モデルは、全原子モデルでは達成できない長さと時間スケールへのアクセスと、凝集や折り畳みなどの長時間スケールで起こるプロセスのシミュレーションを可能にする。分解能の低下は計算の高速化を実現するが、機構の詳細を完全に理解するには原子論的な表現が不可欠となり得る。バックマッピングは、粗視化分子モデルに全原子分解能を復元するプロセスである。本研究では,Cα座標のみを保持する粗視化タンパク質表現に全原子の詳細を復元する、自己回帰型のデノイジング拡散確率モデルとしてDiAMoNDBack(Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping)を報告する。自己回帰生成過程は、Cαトレースと、局所近傍で既にバックマップされたバックボーンおよび側鎖原子を条件として、タンパク質のN末端からC末端へ残基ごとに進行する。我々のモデルの局所的かつ自己回帰的な性質は、タンパク質間での転用を可能にする。デノイジング拡散過程の確率的性質は、モデルが粗視化Cαトレースと整合するバックボーンと側鎖の全原子配置の現実的なアンサンブルを生成することを意味する。 Protein Data Bank (PDB) の65k以上の構造でDiAMoNDBackを訓練し, ホールドアウトしたPDBテストセット, Protein Ensemble Database (PED) の天然変性タンパク質構造, DE Shaw Researchによる高速フォールディングミニタンパク質の分子動力学シミュレーション, および粗視化シミュレーションデータへの適用で検証した。我々は, 正しい結合形成, 側鎖衝突の回避, 生成された側鎖配置状態の多様性の観点から, 最先端の再構築性能を実現する。 DiAMoNDBackモデルをフリーでオープンソースのPythonパッケージとして公開している。 Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long-time scales such as aggregation and folding. The reduced resolution realizes computational accelerations but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. 
The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically-disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side chain clashes, and diversity of the generated side chain configurational states. We make DiAMoNDBack model publicly available as a free and open source Python package.	翻訳日:2023-07-25 16:00:49 公開日:2023-07-23
# 無効な論理、同等のゲイン:言語モデルのプロンプトにおける推論の奇妙な性質 Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting ( http://arxiv.org/abs/2307.10573v2 ) ライセンス: Link先を確認	Rylan Schaeffer, Kateryna Pistunova, Samar Khanna, Sarthak Consul, Sanmi Koyejo	(参考訳) 言語モデルは、パフォーマンスを大幅に向上させる方法で問題を通じて推論するよう促すことができる。しかし、なぜこのようなプロンプトがパフォーマンスを改善するのかは明らかではない。最近の研究では、論理的に無効なChain-of-Thought (CoT) プロンプトを用いても、論理的に有効なCoTプロンプトとほぼ同程度にパフォーマンスが向上すること、またCoTプロンプトを編集して問題固有の情報を抽象的な情報や分布外の情報に置き換えても、通常は性能が損なわれないことが示された。批評家は、これらの発見は意味のある結論を導き出すにはあまりにも少数で、かつ容易に解けすぎるタスクに基づいていると反論している。この論争を解決するために、論理的に無効なCoTプロンプトが、BIG-Bench Hard(BBH)と呼ばれるBIG-Benchベンチマークの最も難しいタスクにおいて、論理的に有効なプロンプトと同じレベルのパフォーマンスゲインを提供するかどうかをテストする。論理的に無効な推論プロンプトは、BBHタスクにおいて、論理的に有効な推論プロンプトと同様のパフォーマンスゲインを確かに達成する。また、先行研究で使われたCoTプロンプトの一部には論理的なエラーが含まれていることもわかった。これは、論理的に妥当な推論以外の共変量がパフォーマンス改善の要因であることを示唆している。 Language models can be prompted to reason through problems in a manner that significantly improves performance. However, why such prompting improves performance is unclear. Recent work showed that using logically invalid Chain-of-Thought (CoT) prompting improves performance almost as much as logically valid CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easily solved tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically invalid reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. 
This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.	翻訳日:2023-07-25 11:24:55 公開日:2023-07-23
# 検索強化による大規模言語モデルの事実知識境界の検討 Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation ( http://arxiv.org/abs/2307.11019v2 ) ライセンス: Link先を確認	Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang	(参考訳) 知識集約的なタスク(例えば、オープンドメイン質問応答(QA))は、かなりの量の事実知識を必要とし、しばしば援助のために外部情報に依存する。最近の大規模言語モデル(例えばchatgpt)は、知識集約的なタスクを含む、世界的知識による幅広いタスクの解決において印象的な能力を示している。しかし、LLMが実際の知識境界、特に検索強化を取り入れた場合の行動をどのように認識できるかは、まだ不明である。本研究では,オープンドメインQA上でのLLMの実態知識境界と検索の増大がLLMに与える影響について,初期分析を行った。特に,3つの主要な研究課題に焦点をあて,QA評価,事前判定,後部判定による分析を行った。 llmが質問に対する回答能力と回答の正確性に不当な自信を持っている証拠を示す。さらに,検索の強化は,llmsの知識境界に対する意識向上に有効なアプローチであることが証明され,その判断能力が向上した。さらに, LLMは, 回答の定式化に際し, 提案した検索結果に依存する傾向があり, これらの結果の質がそれらの信頼性に大きく影響することがわかった。この作業を再現するコードはhttps://github.com/RUCAIBox/LLM-Knowledge-Boundaryで公開されている。 Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT), have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specially, we focus on three primary research questions and analyze them by examining QA performance, priori judgement and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries, thereby improving their judgemental abilities. 
Additionally, we also find that LLMs have a propensity to rely on the provided retrieval results when formulating answers, while the quality of these results significantly impacts their reliance. The code to reproduce this work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.	翻訳日:2023-07-25 11:12:53 公開日:2023-07-23
# シャープネス最小化アルゴリズムは、シャープネスの最小化だけでより良い一般化を達成しているわけではない Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization ( http://arxiv.org/abs/2307.11007v2 ) ライセンス: Link先を確認	Kaiyue Wen, Zhiyuan Li, Tengyu Ma	(参考訳) 広範な研究にもかかわらず、過剰パラメータ化されたニューラルネットワークが一般化できる理由については、いまだに解明されていない。既存の理論では、一般的な確率最適化器は訓練損失のより平坦な最小化器を好むことが示されており、従って平坦性が一般化を意味するというのが自然な説明の候補となる。この研究はこの説明を批判的に検証する。理論的および実証的な調査を通じて、2層ReLUネットワークについて次の3つのシナリオを特定する。 (1) 平坦性が証明可能な形で一般化を意味する、(2) 一般化しない最も平坦なモデルが存在し、シャープネス最小化アルゴリズムは一般化に失敗する、(3) もっとも驚くべきことに、一般化しない最も平坦なモデルが存在するが、シャープネス最小化アルゴリズムは依然として一般化する。以上の結果から,シャープネスと一般化の関係はデータ分布とモデルアーキテクチャに微妙に依存し,シャープネス最小化アルゴリズムはシャープネスの最小化だけによってより良い一般化を達成しているわけではないことが示唆された。これにより、過剰パラメータ化されたニューラルネットワークの一般化に対する他の説明の探索が求められる。 Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize, and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures and sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for the search for other explanations for the generalization of over-parameterized neural networks.	翻訳日:2023-07-25 11:12:29 公開日:2023-07-23

# DATを用いたデジタルオブジェクト空間管理サービス(DOSM)のデータアーキテクチャ

Data Architecture for Digital Object Space Management Service (DOSM) using DAT ( http://arxiv.org/abs/2306.12909v3 )

ライセンス: Link先を確認

Moamin Abughazala, Henry Muccini

(参考訳) IoT(Internet of Things)データとソーシャルメディアデータは、急成長するデータセグメントの2つだ。高品質なデータを持つことは、インフォームドビジネスの決定に不可欠です。データからの洞察を活用する戦略的プロセスは、データ駆動意思決定として知られている。これを達成するためには、データの収集、保存、分析、保護を可能な限り最善の方法で行う必要があります。データアーキテクチャは、ソースから目的地へのデータフローを記述し、情報に対するビジネスニーズを満たすためにデータを管理するブループリントを作成する複雑なタスクである。本稿では,データアーキテクチャツール(Data Architecture Tool, DAT)を用いて,VASARIプロジェクトの一部として開発されたDigital Space Management Serviceのデータモデリングを行う。本研究は、データ移動、データフォーマット、データロケーション、データ処理(バッチまたはリアルタイム)、データストレージ技術、データに関する主要な操作を記述することに焦点を当てる。

The Internet of Things (IoT) data and social media data are two of the fastest-growing data segments. Having high-quality data is crucial for making informed business decisions. The strategic process of leveraging insights from data is known as data-driven decision-making. To achieve this, it is necessary to collect, store, analyze, and protect data in the best ways possible. Data architecture is a complex task that involves describing the flow of data from its source to its destination and creating a blueprint for managing the data to meet business needs for information. In this paper, we utilize the Data Architecture Tool (DAT) to model data for Digital Space Management Service, which was developed as part of the VASARI project. This work focuses on describing the movement of data, data formats, data location, data processing (batch or real-time), data storage technologies, and main operations on the data.

翻訳日:2023-10-23 19:06:42 公開日:2023-07-23

# DATを用いたスマートシティデータ駆動アプリケーションのためのデータ分析アーキテクチャのモデリング

Modeling Data Analytics Architecture for Smart Cities Data-Driven Applications using DAT ( http://arxiv.org/abs/2307.08870v2 )

ライセンス: Link先を確認

Moamin Abughazala, Henry Muccini

(参考訳) 大量の情報から貴重な洞察を抽出することは、データの取得、保存、管理、分析、視覚化を含む重要なプロセスである。データ分析アプリケーションの抽象的な概要を提供することは、収集されたデータが有意義な情報に変換されることを保証するために重要である。この目標を達成する効果的な方法の1つはデータアーキテクチャである。本稿では,データ駆動型スマートシティアプリケーションのためのモデル駆動設計を用いたデータ分析アーキテクチャ(daa)の開発経験について紹介する。

Extracting valuable insights from vast amounts of information is a critical process that involves acquiring, storing, managing, analyzing, and visualizing data. Providing an abstract overview of data analytics applications is crucial to ensure that collected data is transformed into meaningful information. One effective way of achieving this objective is through Data Architecture. This article shares our experiences in developing a Data Analytics Architecture (DAA) using model-driven engineering for Data-Driven Smart Cities applications utilizing DAT.

翻訳日:2023-10-23 17:11:13 公開日:2023-07-23

# 大規模産業における要求工学と検証の整合化の課題

Challenges in aligning requirements engineering and verification in a large-scale industrial context ( http://arxiv.org/abs/2307.12419v1 )

ライセンス: Link先を確認

Giedre Sabaliauskaite, Annabella Loconsole, Emelie Engström, Michael Unterkalmsteiner, Björn Regnell, Per Runeson, Tony Gorschek, Robert Feldt

(参考訳) [状況とモチベーション] ソフトウェア開発では,予算内かつ期限内に高品質な製品を開発するために,組織単位間の調整が不可欠である。特に、開発したソフトウェア製品が顧客の要求を満たすことを保証するためには、要求プロセスと検証プロセスの同期が不可欠である。 [質問/問題] 我々の研究課題は、要求プロセスと検証プロセスの整合化における現在の課題は何か、である。 [主なアイデア/結果] 大規模ソフトウェア開発会社でインタビュー研究を行った。本稿では,要求プロセスと検証プロセスの整合化における重要な課題を明らかにする、これらのインタビューの予備的知見について述べる。 [貢献] 本研究の結果は,組織とプロセス,人,ツール,要求プロセス,テストプロセス,変更管理,トレーサビリティ,測定というカテゴリに分類された,研究対象の組織が直面するさまざまな課題を含む。本研究の成果は,実践者が自組織におけるアライメントを調査するための基盤として,また科学者が要求と検証のアライメントをより効率的かつ効果的に管理するためのアプローチを開発する際に利用することができる。

[Context and motivation] When developing software, coordination between different organizational units is essential in order to develop a good quality product, on time and within budget. Particularly, the synchronization between requirements and verification processes is crucial in order to assure that the developed software product satisfies customer requirements. [Question/problem] Our research question is: what are the current challenges in aligning the requirements and verification processes? [Principal ideas/results] We conducted an interview study at a large software development company. This paper presents preliminary findings of these interviews that identify key challenges in aligning requirements and verification processes. [Contribution] The result of this study includes a range of challenges faced by the studied organization grouped into the categories: organization and processes, people, tools, requirements process, testing process, change management, traceability, and measurement. The findings of this study can be used by practitioners as a basis for investigating alignment in their organizations, and by scientists in developing approaches for more efficient and effective management of the alignment between requirements and verification.

翻訳日:2023-10-23 16:30:27 公開日:2023-07-23

# 金融ポートフォリオ管理のためのディープラーニングとオンラインソース感の活用

Leveraging Deep Learning and Online Source Sentiment for Financial Portfolio Management ( http://arxiv.org/abs/2309.16679v1 )

ライセンス: Link先を確認

Paraskevi Nousi, Loukia Avramelou, Georgios Rodinos, Maria Tzelepi, Theodoros Manousis, Konstantinos Tsampazis, Kyriakos Stefanidis, Dimitris Spanos, Emmanouil Kirtas, Pavlos Tosidis, Avraam Tsantekidis, Nikolaos Passalis and Anastasios Tefas

(参考訳) ファイナンシャル・ポートフォリオ・マネジメント(英: financial portfolio management)とは、株式、インデックスファンド、外国為替、暗号通貨などの一連の金融資産において、当該事業の損失を最小化しつつ利益を最大化することを目的とした、資金の分配及び取引業務を行う業務をいう。ディープラーニング(DL)メソッドは、さまざまなタスクにおいて一貫して優れており、自動化された金融取引はその中のひとつです。本稿では,金融取引における様々なdl手法について,監督学習と強化学習の両面で見識を提供することを目的としている。同時に、取引資産に関する感情情報を考慮し、対応する研究研究を通してそれらの有用性を議論し、実証する。最後に、このような金融エージェントの訓練においてよく見られる問題について議論し、これらの問題を避けるために必要な知識を読者に与え、実際に議論する方法を適用する。

Financial portfolio management describes the task of distributing funds and conducting trading operations on a set of financial assets, such as stocks, index funds, foreign exchange or cryptocurrencies, aiming to maximize the profit while minimizing the loss incurred by said operations. Deep Learning (DL) methods have been consistently excelling at various tasks, and automated financial trading is one of the most complex of those. This paper aims to provide insight into various DL methods for financial trading, under both the supervised and reinforcement learning schemes. At the same time, taking into consideration sentiment information regarding the traded assets, we discuss and demonstrate their usefulness through corresponding research studies. Finally, we discuss commonly found problems in training such financial agents and equip the reader with the necessary knowledge to avoid these problems and apply the discussed methods in practice.

翻訳日:2023-10-23 05:56:47 公開日:2023-07-23

# 多空間深層モデルを用いた脳波信号によるメンタルワークロード推定

Mental Workload Estimation with Electroencephalogram Signals by Combining Multi-Space Deep Models ( http://arxiv.org/abs/2308.02409v1 )

ライセンス: Link先を確認

Hong-Hai Nguyen, Ngumimi Karen Iyortsuun, Hyung-Jeong Yang, Guee-Sang Lee, and Soo-Hyung Kim

(参考訳) 人間の脳は、仕事と休息の間、継続的な活動状態にある。精神活動は日常的なプロセスであり、脳が過剰に働くと人間の健康に悪影響を及ぼす可能性がある。近年,深刻な健康問題の発生防止と生活の質向上に寄与するため,精神疾患の早期発見に注目が集まっている。いくつかの信号は精神状態を評価するために使用されるが、脳波(EEG)は脳に関する大量の情報を提供するため、研究者によって広く用いられている。本稿では,メンタルワーク負荷を3つの状態に分類し,連続レベルを推定することを目的とした。本手法は,複数次元の空間を組み合わせ,心的推定に最適な結果を得る。時間領域アプローチでは、時間的畳み込みネットワークを使用し、周波数領域では、残留ブロックを組み合わせた多次元残留ブロックと呼ばれる新しいアーキテクチャを提案する。

The human brain is in a continuous state of activity during both work and rest. Mental activity is a daily process, and when the brain is overworked, it can have negative effects on human health. In recent years, great attention has been paid to early detection of mental health problems because it can help prevent serious health problems and improve quality of life. Several signals are used to assess mental state, but the electroencephalogram (EEG) is widely used by researchers because of the large amount of information it provides about the brain. This paper aims to classify mental workload into three states and estimate continuum levels. Our method combines multiple dimensions of space to achieve the best results for mental estimation. In the time domain approach, we use Temporal Convolutional Networks, and in the frequency domain, we propose a new architecture called the Multi-Dimensional Residual Block, which combines residual blocks.

翻訳日:2023-08-14 01:49:17 公開日:2023-07-23

# スマートコントラクトの実装:ペイパーライクなNFT-rentalの場合

Implementing Smart Contracts: The case of NFT-rental with pay-per-like ( http://arxiv.org/abs/2308.02424v1 )

ライセンス: Link先を確認

Alfred Sopi, Johannes Schneider, Jan vom Brocke

(参考訳) 非代替性トークン(NFT)が増加している。NFTは、物理的な芸術作品と同様に、企業やオンラインストアのウェブページでマーケティング目的に展示されるアートワークを表現できる。 NFTの貸与は所有者にとって魅力的な受動的収入形態であるが、リスク(例えば、アイテムが返却されない)とエスクローエージェントのコストが伴う。同様に、借り手はアートワークの影響(例えば、NFTの観客がそれをどう受け止めるか)を予測することが難しい。これらの課題に対処するため、ブロックチェーン技術、すなわちEthereumチェーン上のスマートコントラクトを用いた、ペイパーライクな価格モデルに基づくNFTレンタルソリューションを導入する。ブロックチェーンソリューションは、他のアプリケーションでも報告されている多くの利点を享受するが、興味深いことに、(高額な)ブロックチェーン手数料の負の側面も観察される。ブロックチェーンソリューションはニッチなアーティストにとって不公平に見え、文化的多様性を阻害する可能性がある。さらに、ブロックチェーン外の当事者による操作に起因する不正に対処するために、信頼とコストのトレードオフが発生する。ソリューションのすべてのコードは、https://github.com/asopi/rental-project で公開されている。

Non-fungible tokens (NFTs) are on the rise. They can represent artworks exhibited for marketing purposes on webpages of companies or online stores -- analogously to physical artworks. Lending of NFTs is an attractive form of passive income for owners but comes with risks (e.g., items are not returned) and costs for escrow agents. Similarly, renters have difficulties in anticipating the impact of artworks, e.g., how spectators of NFTs perceive them. To address these challenges, we introduce an NFT rental solution based on a pay-per-like pricing model using blockchain technology, i.e., smart contracts based on the Ethereum chain. We find that blockchain solutions enjoy many advantages also reported for other applications, but interestingly, we also observe dark sides of (large) blockchain fees. Blockchain solutions appear unfair to niche artists and potentially hamper cultural diversity. Furthermore, a trust-cost tradeoff arises to handle fraud caused by manipulation from parties outside the blockchain. All code for the solution is publicly available at: https://github.com/asopi/rental-project

翻訳日:2023-08-14 01:39:18 公開日:2023-07-23

# 衛星画像へのディープラーニングの適用によるキプロス農村地域のゴミ捨て場の同定

The identification of garbage dumps in the rural areas of Cyprus through the application of deep learning to satellite imagery ( http://arxiv.org/abs/2308.02502v1 )

ライセンス: Link先を確認

Andrew Keith Wilkinson

(参考訳) ごみ処理は先進国全体で難しい問題である。キプロスでも他の地域と同様に、不法投棄は重大な問題であり、特に合法的なごみ処理手段がほとんどない農村部で顕著である。しかし、この問題の規模を測定しようとする研究は少なく、対処に利用できる資源も限られている。ごみ捨て場の特定を自動化する手法があれば、この状況の改善に役立ち、関係当局に情報を提供できる。本研究の目的は、人工知能技術と衛星画像を組み合わせることで、キプロス農村部の不法なごみ捨て場をどの程度特定できるかを調べることである。そのために、ごみを含むか含まないかに分類できる新しい画像データセットを収集した。このようなデータセットを十分な量だけ収集するには時間と費用がかかる。そこで、比較的小規模なベースライン画像セットを収集し、データ拡張技術を用いて、有用な機械学習が可能な規模までデータセットを拡大した。この画像セットを用いて、新しい画像におけるごみの有無を認識するように人工ニューラルネットワークを訓練した。このタスクに特に適した「畳み込みニューラルネットワーク」と呼ばれる種類のニューラルネットワークを用いた。得られたモデルの有効性は、独立に収集したテスト画像のデータセットを用いて評価した。その結果、約90%のケースでごみを含む画像を正しく識別できるディープラーニングモデルが得られた。このモデルは、キプロスの景観全体を体系的に分析し、島の包括的な「ごみ」マップを構築する将来のシステムの基礎となり得ると考えられる。

Garbage disposal is a challenging problem throughout the developed world. In Cyprus, as elsewhere, illegal "fly-tipping" is a significant issue, especially in rural areas where few legal garbage disposal options exist. However, there is a lack of studies that attempt to measure the scale of this problem, and few resources available to address it. A method of automating the process of identifying garbage dumps would help counter this and provide information to the relevant authorities. The aim of this study was to investigate the degree to which artificial intelligence techniques, together with satellite imagery, can be used to identify illegal garbage dumps in the rural areas of Cyprus. This involved collecting a novel dataset of images that could be categorised as either containing, or not containing, garbage. The collection of such datasets in sufficient raw quantities is time consuming and costly. Therefore a relatively modest baseline set of images was collected, then data augmentation techniques used to increase the size of this dataset to a point where useful machine learning could occur. From this set of images an artificial neural network was trained to recognise the presence or absence of garbage in new images. A type of neural network especially suited to this task known as "convolutional neural networks" was used. The efficacy of the resulting model was evaluated using an independently collected dataset of test images. The result was a deep learning model that could correctly identify images containing garbage in approximately 90% of cases. It is envisaged that this model could form the basis of a future system that could systematically analyse the entire landscape of Cyprus to build a comprehensive "garbage" map of the island.

翻訳日:2023-08-14 01:29:13 公開日:2023-07-23
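
The abstract above mentions enlarging a modest baseline image set through data augmentation but does not say which transformations were used. As a rough illustration only, the sketch below assumes simple geometric augmentation (90-degree rotations and horizontal flips), which is a common choice for overhead imagery; the function names and the nested-list image representation are hypothetical and not taken from the paper.

```python
def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate an image (a list of pixel rows) clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return 8 variants: 4 rotations, each with and without a horizontal flip.

    Assumption: these geometric transforms stand in for whatever
    augmentation the study actually applied.
    """
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate_90(current)
    return variants


if __name__ == "__main__":
    tile = [[1, 2], [3, 4]]  # toy 2x2 "image"
    for variant in augment(tile):
        print(variant)
```

Under this assumption each source tile yields up to eight training samples. Symmetric tiles produce repeated variants, so the effective gain can be smaller.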

# バイオメディカルおよび非バイオメディカル環境における合成画像のクラス内多様性と品質評価

Assessing Intra-class Diversity and Quality of Synthetically Generated Images in a Biomedical and Non-biomedical Setting ( http://arxiv.org/abs/2308.02505v1 )

ライセンス: Link先を確認

Muhammad Muneeb Saad, Mubashir Husain Rehmani, and Ruairi O'Reilly

(参考訳) 生体医用画像解析において、データの不均衡は複数の画像モダリティに共通である。データ拡張はこの制限に対処する上で重要なソリューションのひとつである。 Generative Adversarial Networks (GANs) は、データ拡張タスクにますます利用されている。生体画像の特徴は合成画像の有効性の評価に敏感である。これらの特徴は、異なる生体画像モダリティ間で合成画像を評価する際に、指標スコアに大きな影響を及ぼす可能性がある。合成画像は、実画像の多様性および品質と比較することで評価できる。 Multi-scale Structural Similarity Index Measure とコサイン距離はクラス内多様性の評価に使用され、Frechet Inception Distance は合成画像の品質評価に使用される。バイオメディカルおよび非バイオメディカルイメージングについてこれらの指標を評価することは、合成画像の多様性と品質を評価するための、情報に基づいた戦略を検討する上で重要である。本研究では, バイオメディカルおよび非バイオメディカルな設定において, Deep Convolutional GAN に対するこれらの指標の実証的評価を行った。異なるサンプルサイズを用いて合成画像の多様性と品質を評価する。本研究は,バイオメディカルおよび非バイオメディカルイメージングモダリティにおける多様性と品質のばらつきについて検討することを目的とする。その結果,多様性と品質の指標スコアは,バイオメディカル同士,およびバイオメディカルと非バイオメディカルのイメージングモダリティ間で著しく異なることがわかった。

In biomedical image analysis, data imbalance is common across several imaging modalities. Data augmentation is one of the key solutions in addressing this limitation. Generative Adversarial Networks (GANs) are increasingly being relied upon for data augmentation tasks. Biomedical image features are sensitive to evaluating the efficacy of synthetic images. These features can have a significant impact on metric scores when evaluating synthetic images across different biomedical imaging modalities. Synthetically generated images can be evaluated by comparing the diversity and quality of real images. Multi-scale Structural Similarity Index Measure and Cosine Distance are used to evaluate intra-class diversity, while Frechet Inception Distance is used to evaluate the quality of synthetic images. Assessing these metrics for biomedical and non-biomedical imaging is important to investigate an informed strategy in evaluating the diversity and quality of synthetic images. In this work, an empirical assessment of these metrics is conducted for the Deep Convolutional GAN in a biomedical and non-biomedical setting. The diversity and quality of synthetic images are evaluated using different sample sizes. This research intends to investigate the variance in diversity and quality across biomedical and non-biomedical imaging modalities. Results demonstrate that the metrics scores for diversity and quality vary significantly across biomedical-to-biomedical and biomedical-to-non-biomedical imaging modalities.

翻訳日:2023-08-14 01:18:22 公開日:2023-07-23
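
The abstract above names cosine distance as one of the measures of intra-class diversity. The sketch below shows one plausible way to turn that into a single score, by averaging the cosine distance over all pairs of flattened images in a class. The pairwise averaging, the flattening, and the function names are assumptions made for illustration, not the authors' published procedure.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_cosine_distance(images):
    """Average cosine distance over all unordered pairs of flattened images.

    Assumption: each image is already flattened to a list of floats.
    A higher value suggests more diversity within the class.
    """
    pairs = list(combinations(images, 2))
    if not pairs:
        raise ValueError("need at least two images")
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    synthetic_class = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(mean_pairwise_cosine_distance(synthetic_class))
```

A score near zero would indicate that the generated images in a class are nearly identical in direction, which is one symptom of mode collapse. Comparing the score for synthetic images against the score for real images of the same class gives it a reference point.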

# MyVoice: アラビア語音声リソースコラボレーションプラットフォーム

MyVoice: Arabic Speech Resource Collaboration Platform ( http://arxiv.org/abs/2308.02503v1 )

ライセンス: Link先を確認

Yousseif Elshahawy, Yassine El Kheir, Shammur Absar Chowdhury, and Ahmed Ali

(参考訳) MyVoiceはアラビア語の音声を収集して方言の音声技術を強化するためのクラウドソーシングプラットフォームである。このプラットフォームは、大きな方言の音声データセットを設計する機会を提供し、それらを一般公開する。 MyVoiceを使えば、コントリビュータは都市や州レベルのきめ細かい方言を選択して、表示された発話を記録することができる。ユーザーはコントリビュータとアノテーションを切り替えることができる。このプラットフォームには品質保証システムがあり、品質の低い録音をフィルタリングし、検証のために送信する。検証フェーズの間、コントリビュータは録音の品質を評価し、注釈を付け、フィードバックを提供し、管理者によってレビューされる。さらに、このプラットフォームは、管理者の役割に柔軟性を提供し、方言の音声や単語の収集以外の新しいデータやタスクを追加し、コントリビュータに表示する。したがって、多種多様なアラビア語の音声データを収集する共同作業を可能にする。

We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. This platform offers an opportunity to design large dialectal speech datasets and make them publicly available. MyVoice allows contributors to select a city/country-level fine-grained dialect and record the displayed utterances. Users can switch roles between contributors and annotators. The platform incorporates a quality assurance system that filters out low-quality and spurious recordings before sending them for validation. During the validation phase, contributors can assess the quality of recordings, annotate them, and provide feedback, which is then reviewed by administrators. Furthermore, the platform offers flexibility to admin roles to add new data or tasks beyond dialectal speech and word collection, which are displayed to contributors, thus enabling collaborative efforts in gathering diverse and large Arabic speech data.

翻訳日:2023-08-14 01:17:58 公開日:2023-07-23

# AMaizeD: 自動トウモロコシ病検出のためのエンド・トゥ・エンドパイプライン

AMaizeD: An End to End Pipeline for Automatic Maize Disease Detection ( http://arxiv.org/abs/2308.03766v1 )

ライセンス: Link先を確認

Anish Mall, Sanchit Kabra, Ankur Lhila and Pawan Ajmera

(参考訳) 本研究は,ドローンから取得したマルチスペクトル画像を用いてトウモロコシ作物の病害を早期に検出する自動フレームワークであるAMaizeD(An End to End Pipeline for Automatic Maize Disease Detection)を提案する。トウモロコシ作物に特化した手作業収集のカスタムデータセットは、専門の研究者や農学者によって慎重に収集された。このデータセットは様々なトウモロコシ品種、栽培慣行、環境条件を含み、トウモロコシの成長と病気の進行の様々な段階を捉えている。マルチスペクトル画像を活用することで、スペクトル分解能が向上し、植物の健康状態の微妙な変化に対する感度が向上する。提案するフレームワークは,特徴抽出器としての畳み込みニューラルネットワーク(CNN)とセグメンテーション技術を組み合わせて,トウモロコシの植物とその関連疾患を同定する。実験により, うどんこ病, 炭疽病, 葉枯病など, 各種のトウモロコシ病の検出に有効であることが示された。このフレームワークは、手作業収集のカスタムデータセットにおいて最先端の性能を達成し、農業における自動病害検出の分野に貢献し、トウモロコシ作物の病気を早期に識別するための実用的なソリューションを提供する。

This research paper presents AMaizeD: An End to End Pipeline for Automatic Maize Disease Detection, an automated framework for early detection of diseases in maize crops using multispectral imagery obtained from drones. A custom hand-collected dataset focusing specifically on maize crops was meticulously gathered by expert researchers and agronomists. The dataset encompasses a diverse range of maize varieties, cultivation practices, and environmental conditions, capturing various stages of maize growth and disease progression. By leveraging multispectral imagery, the framework benefits from improved spectral resolution and increased sensitivity to subtle changes in plant health. The proposed framework employs a combination of convolutional neural networks (CNNs) as feature extractors and segmentation techniques to identify both the maize plants and their associated diseases. Experimental results demonstrate the effectiveness of the framework in detecting a range of maize diseases, including powdery mildew, anthracnose, and leaf blight. The framework achieves state-of-the-art performance on the custom hand-collected dataset and contributes to the field of automated disease detection in agriculture, offering a practical solution for early identification of diseases in maize crops using advanced machine learning techniques and deep learning architectures.

翻訳日:2023-08-14 00:39:57 公開日:2023-07-23

# キューイング型トラヒック割当手法を用いたスマートワークゾーンアプリケーションのためのリーダ追従型自動車両システムの配置

Deployment of Leader-Follower Automated Vehicle Systems for Smart Work Zone Applications with a Queuing-based Traffic Assignment Approach ( http://arxiv.org/abs/2308.03764v1 )

ライセンス: Link先を確認

Qing Tang, Xianbiao Hu

(参考訳) ATMA(Autonomous Truck Mounted Attenuator)の新たな技術は、ワークゾーンにおける交通インフラのメンテナンス中の安全性を高めるために、コネクテッドおよびオートマチックな車両機能を活用している。しかし、ATMA車両と一般車両の速度差は、キャパシティを減少させ、待ち時間を増加させる移動ボトルネックを生じさせ、さらなる遅延をもたらす。 atmaによって取られた異なる経路は、ユーザの平衡トラフィック割り当てに影響し、異なるシステムコストにつながる可能性がある、時間変動容量低下の多様なパターンを引き起こす。本書は,ネットワーク上でのATMA車両の経路最適化に焦点をあて,低速動作に伴うシステムコストを最小化する。これを実現するために,atmaシステムによるシステムコストを特定するため,待ち行列に基づくトラヒック割当手法を提案する。キャパシティ低下を考慮した待ち時間依存旅行時間関数(QBTD)を導入し,動的特性を付加した結果,静的ユーザ平衡トラフィック割り当て問題に適用した。その後、待ち行列に基づくトラフィック割り当て問題を定式化し、修正パスベースのアルゴリズムを用いて解決する。本手法は,小型ネットワークと大規模ネットワークを用いて検証し,キャパシティドロップモデリングとQBTD走行時間関数の利点を分析するための2つのベンチマークモデルと比較した。さらに、異なる経路の交通システムへの影響を定量化し、保守作業を行うatma車両の最適経路を特定するためのアプローチを適用した。最後に,交通需要の変動やキャパシティの低下に伴う影響について,感度解析を行った。

The emerging technology of the Autonomous Truck Mounted Attenuator (ATMA), a leader-follower style vehicle system, utilizes connected and automated vehicle capabilities to enhance safety during transportation infrastructure maintenance in work zones. However, the speed difference between ATMA vehicles and general vehicles creates a moving bottleneck that reduces capacity and increases queue length, resulting in additional delays. The different routes taken by ATMA cause diverse patterns of time-varying capacity drops, which may affect the user equilibrium traffic assignment and lead to different system costs. This manuscript focuses on optimizing the routing for ATMA vehicles in a network to minimize the system cost associated with the slow-moving operation. To achieve this, a queuing-based traffic assignment approach is proposed to identify the system cost caused by the ATMA system. A queuing-based time-dependent (QBTD) travel time function that accounts for capacity drop is introduced and applied to the static user equilibrium traffic assignment problem, which adds dynamic characteristics to it. Subsequently, we formulate the queuing-based traffic assignment problem and solve it using a modified path-based algorithm. The methodology is validated on a small and a large network and compared with two benchmark models to analyze the benefits of capacity drop modeling and the QBTD travel time function. Furthermore, the approach is applied to quantify the impact of different routes on the traffic system and identify an optimal route for ATMA vehicles performing maintenance work. Finally, sensitivity analysis is conducted to explore how the impact changes with variations in traffic demand and capacity reduction.
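
The link between a temporary capacity drop, queue length, and added delay can be illustrated with a deterministic point-queue calculation. This is only a minimal sketch under stated assumptions: it is not the QBTD travel time function, whose form the abstract does not give, and the function name `queue_delay`, the demand and capacity values, and the one-minute time step are all hypothetical.

```python
def queue_delay(demand, capacity, dt=1.0):
    """Deterministic point queue on a single link (illustrative only).

    demand, capacity: flows per time step (e.g. veh/min), same length.
    Returns the queue length after each step and the total delay,
    measured as vehicle-time spent in the queue.
    """
    queue = 0.0
    queues = []
    total_delay = 0.0
    for d, c in zip(demand, capacity):
        # The queue grows when demand exceeds capacity and drains otherwise.
        queue = max(0.0, queue + (d - c) * dt)
        queues.append(queue)
        total_delay += queue * dt
    return queues, total_delay


# Hypothetical numbers: constant demand of 30 veh/min; capacity is
# 40 veh/min but drops to 20 veh/min for two minutes while a slow
# maintenance convoy occupies the link.
demand = [30, 30, 30, 30, 30, 30]
with_drop = [40, 20, 20, 40, 40, 40]
no_drop = [40] * 6

queues, delay = queue_delay(demand, with_drop)
# queues == [0.0, 10.0, 20.0, 10.0, 0.0, 0.0], delay == 40.0 veh*min
_, baseline = queue_delay(demand, no_drop)
# baseline == 0.0
```

Even in this toy setting the delay outlasts the capacity drop itself, because the queue needs time to dissipate after capacity is restored.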

翻訳日:2023-08-14 00:39:36 公開日:2023-07-23

# 自然と機械

Nature and the Machines ( http://arxiv.org/abs/2308.04440v1 )

ライセンス: Link先を確認

Huw Price and Matthew Connolly

(参考訳) 人工知能(AI)は人間に現実的なリスクをもたらすか? 一部の批評家は、この疑問があまりに注目を集めていると感じており、AIの即時的なリスクに関する議論を後押ししたいと考えている。この雑誌では、最近の論説で「今日のAIがリスクを冒す明日のAIの運命について話すのをやめよう」と促されている。我々は、これは本質的な判断の重大な失敗であると主張する。科学では、日常生活と同様に、影響のある俳優が誤りの結果を考えることを期待する。世界有数の科学雑誌として、自然は間違いなく影響力のある俳優であり、特にaiの堅牢な国際規制が欠如している。しかし、このケースでエラーのコストを考慮できなかったことは明らかです。

Does artificial intelligence (AI) pose existential risks to humanity? Some critics feel this question is getting too much attention, and want to push it aside in favour of conversations about the immediate risks of AI. These critics now include the journal Nature, where a recent editorial urges us to 'stop talking about tomorrow's AI doomsday when AI poses risks today.' We argue that this is a serious failure of judgement, on Nature's part. In science, as in everyday life, we expect influential actors to consider the consequences of error. As the world's leading scientific journal, Nature is certainly an influential actor, especially so in the absence of robust global regulation of AI. Yet it has manifestly failed to consider the cost of error in this case.

翻訳日:2023-08-14 00:29:06 公開日:2023-07-23

# 階層型模倣学習による多段ケーブルルーティング

Multi-Stage Cable Routing through Hierarchical Imitation Learning ( http://arxiv.org/abs/2307.08927v3 )

ライセンス: Link先を確認

Jianlan Luo, Charles Xu, Xinyang Geng, Gilbert Feng, Kuan Fang, Liam Tan, Stefan Schaal, Sergey Levine

(参考訳) 本研究では,複数段階のロボット操作タスクを学習し,ケーブルルーティングに適用するために,ロボットが一連のクリップを通してケーブルをルーティングしなければならない問題について検討する。この設定では、変形可能なオブジェクトの処理、視覚知覚のループのクローズ、タスク全体の完了に成功して実行しなければならない複数のステップからなる拡張動作の処理など、複雑な多段階ロボット操作シナリオを代表する課題が提示される。このような状況下では、時間的に拡張されたタスクを実行するのに十分な割合で成功する各ステージの個々のプリミティブを学習することは、実用的ではない:もし各ステージが成功し、失敗の不可解な確率を持つなら、タスク全体の完了の可能性は無視できる。したがって、このようなマルチステージタスクで成功したコントローラは、障害から回復し、低レベルのコントローラの欠陥を補うために、任意のタイミングでどのコントローラをトリガーするかをスマートに選択したり、リトライしたり、必要に応じて修正アクションを取るかを選択する必要がある。そこで本研究では,下方(運動制御)と上方(シーケンス)の両方のレベルのデモンストレーションから訓練された視覚に基づくポリシーを用いた模倣学習システムについて述べるとともに,この手法をインスタンス化してケーブルルーティングタスクを学習するシステムを提案し,非常に困難なクリップ配置変動に一般化する上で,優れた性能を示す評価を行う。補足ビデオ、データセット、コードはhttps://sites.google.com/view/cableroutingで見ることができる。

We study the problem of learning to perform multi-stage robotic manipulation tasks, with applications to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and handling extended behaviors consisting of multiple steps that must be executed successfully to complete the entire task. In such settings, learning individual primitives for each stage that succeed with a high enough rate to perform a complete temporally extended task is impractical: if each stage must be completed successfully and has a non-negligible probability of failure, the likelihood of successful completion of the entire task becomes negligible. Therefore, successful controllers for such multi-stage tasks must be able to recover from failure and compensate for imperfections in low-level controllers by smartly choosing which controllers to trigger at any given time, retrying, or taking corrective action as needed. To this end, we describe an imitation learning system that uses vision-based policies trained from demonstrations at both the lower (motor control) and the upper (sequencing) level, present a system for instantiating this method to learn the cable routing task, and perform evaluations showing great performance in generalizing to very challenging clip placement variations. Supplementary videos, datasets, and code can be found at https://sites.google.com/view/cablerouting.
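
The argument that per-stage failures compound can be made concrete with a short calculation. The sketch below assumes that stages succeed independently with the same probability and that a failed stage can be retried from a recoverable state; these are simplifying assumptions for illustration, and `task_success` is a hypothetical helper, not part of the authors' system.

```python
def task_success(p_stage, n_stages, max_attempts=1):
    """Probability of finishing all stages of a sequential task.

    p_stage: success probability of one attempt at one stage.
    max_attempts: attempts allowed per stage (1 means no retry).
    Assumes independent attempts and that failures are recoverable.
    """
    # A stage fails only if every one of its attempts fails.
    per_stage = 1.0 - (1.0 - p_stage) ** max_attempts
    return per_stage ** n_stages


no_retry = task_success(0.9, 10)                     # 0.9 ** 10, about 0.35
with_retry = task_success(0.9, 10, max_attempts=3)   # about 0.99
```

With ten stages at 90% each, an open-loop sequence completes only about a third of the time, while allowing up to three attempts per stage lifts the completion probability to roughly 99% under these assumptions. This is the motivation for a higher-level policy that can detect failure and retry.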

翻訳日:2023-08-06 11:36:53 公開日:2023-07-23

# 自動運転のための交通流シミュレーション

Traffic Flow Simulation for Autonomous Driving ( http://arxiv.org/abs/2307.16762v1 )

ライセンス: Link先を確認

Junfeng Li, Changqing Yan

(参考訳) 交通システムはランダムで複雑な大規模システムであり、実際の交通環境において繰り返しモデリングや制御研究を行うことは困難である。自動運転技術の発展に伴い、自動運転技術の試験・評価の要件がますます高くなってきているため、交通シミュレーションにおけるコンピュータ技術の応用が極めて有効な技術手段となっている。本稿では,マイクロトラフィックフローモデリングに基づいて,セルオートマトンに基づく車両運動モデルと自転車知能理論を採用し,自律車両流れのシミュレーション環境を構築する。自動運転車のアーキテクチャは一般的に認識システム、意思決定システム、制御システムに分けられる。認識システムは一般に多くのサブシステムに分けられ、自動運転車の位置決め、障害物認識、信号の検出と認識、その他のタスクに責任を負う。意思決定システムは通常、経路計画、経路計画、行動選択、行動計画、制御などのタスクに責任を持つ多くのサブシステムに分割される。制御システムは、自動運転車の基礎であり、車両の各制御システムは、バスを介して意思決定システムに接続される必要があり、車両の自律運転を実現するために、意思決定システムによって発行されたバス指示に従って、加速度、ブレーキ度、ステアリング振幅、照明制御その他の運転動作を正確に制御することができる。

A traffic system is a random and complex large-scale system, which makes it difficult to conduct repeated modelling and control research in a real traffic environment. With the development of automatic driving technology, the requirements for testing and evaluating it are becoming ever higher, so applying computer technology to traffic simulation has become a very effective technical means. Based on microscopic traffic flow modelling, this paper adopts a vehicle motion model based on cellular automata and the theory of bicycle intelligence to build a simulation environment for autonomous vehicle flow. The architecture of autonomous vehicles is generally divided into a perception system, a decision system, and a control system. The perception system is generally divided into many subsystems, responsible for autonomous vehicle positioning, obstacle recognition, traffic signal detection and recognition, and other tasks. Decision systems are typically divided into many subsystems that are responsible for tasks such as path planning, behavior selection, motion planning, and control. The control system is the basis of the self-driving car: each control system of the vehicle needs to be connected to the decision-making system through the bus, and must accurately control acceleration, braking, steering amplitude, lighting, and other driving actions according to the bus instructions issued by the decision-making system, so as to achieve autonomous driving of the vehicle.
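
A cellular-automaton vehicle model can be sketched in a few lines. The abstract does not say which update rule the authors use, so the example below substitutes the classic Nagel-Schreckenberg rule on a single-lane ring road as a stand-in; the function name `nasch_step` and all parameter values are assumptions made for illustration.

```python
import random


def nasch_step(positions, speeds, road_len, v_max=5, p_slow=0.3, rng=random):
    """One parallel Nagel-Schreckenberg update on a ring road.

    positions: distinct cell indices in [0, road_len), one per car.
    speeds: current speed of each car in cells per step.
    Returns new (positions, speeds) lists in the same car order.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    new_pos, new_spd = list(positions), list(speeds)
    for k, i in enumerate(order):
        ahead = order[(k + 1) % len(order)]
        # Number of empty cells between this car and the one in front.
        gap = (positions[ahead] - positions[i] - 1) % road_len
        v = min(speeds[i] + 1, v_max)   # 1. accelerate
        v = min(v, gap)                 # 2. brake to avoid a collision
        if v > 0 and rng.random() < p_slow:
            v -= 1                      # 3. random slowdown
        new_pos[i] = (positions[i] + v) % road_len  # 4. move
        new_spd[i] = v
    return new_pos, new_spd


# Four cars on a 30-cell ring, seeded for reproducibility.
rng = random.Random(0)
pos, spd = [0, 3, 10, 11], [0, 0, 0, 0]
for _ in range(50):
    pos, spd = nasch_step(pos, spd, road_len=30, rng=rng)
```

Because every car's speed is capped by the gap measured before anyone moves, two cars can never end up in the same cell, which is the property that makes this kind of model convenient for repeated simulation runs.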

翻訳日:2023-08-06 11:21:02 公開日:2023-07-23

# 非構造化医療データからのデータ抽象化のためのゼロショット学習自然言語処理ツールの検証

Validation of a Zero-Shot Learning Natural Language Processing Tool for Data Abstraction from Unstructured Healthcare Data ( http://arxiv.org/abs/2308.00107v1 )

ライセンス: Link先を確認

Basil Kaufmann, Dallin Busby, Chandan Krushna Das, Neeraja Tillu, Mani Menon, Ashutosh K. Tewari, Michael A. Gorin

(参考訳) 目的: 電子健康記録などのpdf文書に含まれる構造化されていないテキストからデータを抽象化するゼロショット学習自然言語処理(nlp)ツールの開発と検証を記述する。材料と方法: openai の gpt-3.5 モデルに基づくデータ抽象化ツールを開発し、199 個の非同定根治的前立腺切除病理報告から 14 個の特異変数のデータ抽象化を行うための時間からタスク完了までの時間と正確性の観点から3 つの医師の人間抽象化ツールと比較した。レポートは、ベクトル化およびスキャンされたフォーマットでソフトウェアツールによって処理され、データ抽象化に対する光学的文字認識の影響を確立する。このツールは、データの抽象化速度と精度の非偽性に優れていると評価された。結果: 人間の抽象化者は,データ抽象化に1レポートあたり平均101秒を必要とし,その時間は15～284秒であった。比較として、ソフトウェアツールはベクトル化されたレポートを処理するのに平均12.8秒、スキャンされたレポートを処理する平均15.8秒を必要とした(p < 0.001)。 3つの抽象概念の全体としての精度は94.7%、97.8%、96.4%であった。このソフトウェアツールは、ベクトル化されたレポートの全体的な精度は94.2%であり、人間の抽象論者に対して-10%(=0.025ドル)の差で非競合であることが証明された。このツールの精度はスキャンされたレポートで88.7%とわずかに低く、人間の3つのうち2つに非偽性であることが判明した。結論: 開発したゼロショット学習NLPツールは、研究者が人間の抽象体と同等の精度で、かなりの時間を節約できる。タスク固有のモデルトレーニングの必要性がないため、開発されたツールは高度に一般化でき、医学の分野以外でも、さまざまなデータ抽象化タスクに使用できる。

Objectives: To describe the development and validation of a zero-shot learning natural language processing (NLP) tool for abstracting data from unstructured text contained within PDF documents, such as those found within electronic health records. Materials and Methods: A data abstraction tool based on the GPT-3.5 model from OpenAI was developed and compared to three physician human abstractors in terms of time to task completion and accuracy for abstracting data on 14 unique variables from a set of 199 de-identified radical prostatectomy pathology reports. The reports were processed by the software tool in vectorized and scanned formats to establish the impact of optical character recognition on data abstraction. The tool was assessed for superiority for data abstraction speed and non-inferiority for accuracy. Results: The human abstractors required a mean of 101 s per report for data abstraction, with times varying from 15 to 284 s. In comparison, the software tool required a mean of 12.8 s to process the vectorized reports and a mean of 15.8 s to process the scanned reports (P < 0.001). The overall accuracies of the three human abstractors were 94.7%, 97.8%, and 96.4% for the combined set of 2786 datapoints. The software tool had an overall accuracy of 94.2% for the vectorized reports, proving to be non-inferior to the human abstractors at a margin of -10% (α = 0.025). The tool had a slightly lower accuracy of 88.7% using the scanned reports, proving to be non-inferior to 2 out of 3 human abstractors. Conclusion: The developed zero-shot learning NLP tool affords researchers comparable levels of accuracy to that of human abstractors, with significant time savings benefits. Because of the lack of need for task-specific model training, the developed tool is highly generalizable and can be used for a wide variety of data abstraction tasks, even outside the field of medicine.
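
The non-inferiority comparison can be illustrated with the figures reported in the abstract. The sketch below is not the authors' analysis: it assumes a one-sided Wald interval for a difference of two independent proportions, whereas the abstract does not state which test was used and the tool and the abstractors actually rated the same reports. The function name `noninferior` is hypothetical. The sample size follows from 199 reports times 14 variables, which gives the 2786 datapoints quoted above.

```python
from math import sqrt
from statistics import NormalDist


def noninferior(p_tool, p_human, n_tool, n_human, margin=0.10, alpha=0.025):
    """One-sided Wald check that p_tool is not worse than p_human by `margin`.

    Returns (is_noninferior, lower_confidence_bound_of_difference).
    Treats the two accuracy estimates as independent samples.
    """
    diff = p_tool - p_human
    se = sqrt(p_tool * (1 - p_tool) / n_tool
              + p_human * (1 - p_human) / n_human)
    z = NormalDist().inv_cdf(1 - alpha)
    lower = diff - z * se
    return lower > -margin, lower


n = 199 * 14  # 2786 datapoints
humans = [0.947, 0.978, 0.964]

vectorized = [noninferior(0.942, h, n, n)[0] for h in humans]
scanned = [noninferior(0.887, h, n, n)[0] for h in humans]
# vectorized == [True, True, True]
# scanned == [True, False, True]
```

Under these simplifying assumptions the calculation agrees with the pattern reported in the abstract: the vectorized reports pass against all three abstractors, while the scanned reports fail only against the most accurate one (97.8%), where the lower bound falls just below the -10% margin.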

翻訳日:2023-08-06 11:12:55 公開日:2023-07-23

# 安全クリティカル自律システムのフレーミング関連

Framing Relevance for Safety-Critical Autonomous Systems ( http://arxiv.org/abs/2307.14355v1 )

ライセンス: Link先を確認

Astrid Rakow

(参考訳) 私たちは、構築された信念を持ち、環境を知覚し、情報を交換する複雑な高度に自律的なシステムを構築する過程にあります。これらのシステムはそれぞれの世界観を構築し、それに基づいて将来の計画、すなわち、将来の予測に基づいて目標を確立するために行動を選択する。通常、これらのシステムは、すべてが関連しない様々な情報源によって提供される膨大な情報に直面している。我々の研究の目的は、現在のミッションにおいて安全クリティカルな自律システムに関連するものを決定するための公式なアプローチを開発することであり、すなわち、ミッション目標を達成するために適切な世界観を構築するのに十分な情報である。

We are in the process of building complex, highly autonomous systems that have built-in beliefs, perceive their environment, and exchange information. These systems construct their respective world views and, based on them, plan their future manoeuvres, i.e., they choose their actions in order to achieve their goals based on their predictions of the possible futures. Usually these systems face an overwhelming flood of information provided by a variety of sources, of which far from everything is relevant. The goal of our work is to develop a formal approach to determine what is relevant for a safety-critical autonomous system in its current mission, i.e., what information suffices to build an appropriate world view to accomplish its mission goals.

翻訳日:2023-07-28 19:11:34 公開日:2023-07-23

# 実用シナリオにおけるマルチビュークラスタリングにおけるノイズビューの副作用の調査と緩和

Investigating and Mitigating the Side Effects of Noisy Views in Multi-view Clustering in Practical Scenarios ( http://arxiv.org/abs/2303.17245v2 )

ライセンス: Link先を確認

Jie Xu, Gang Niu, Xiaolong Wang, Yazhou Ren, Lei Feng, Xiaoshuang Shi, Zheng Zhang, Heng Tao Shen, Xiaofeng Zhu

(参考訳) マルチビュークラスタリング(MvC)は,ラベルの監督なしに,マルチビューデータのカテゴリ構造を探索することを目的とする。複数のビューは単一のビューよりも多くの情報を提供するので、既存のMvCメソッドは十分なパフォーマンスを得ることができる。しかし、実際のシナリオでは、ビューが騒がしい場合、パフォーマンスが著しく低下する可能性がある。本稿ではまず,まず,ノイズの多い視点の欠点を公式に検討し,その問題に対処するための理論的基盤を持つ深層MvC法(MvCAN)を提案する。具体的には、複数のビューにまたがる非共有パラメータと一貫性のないクラスタリング予測を可能にし、ノイズの多いビューの副作用を低減するための新しいMvC目標を提案する。さらに、複数のビューの有用な情報をマイニングするための堅牢な学習目標を生成するために、非パラメトリック反復プロセスが設計されている。理論的解析により、mvcanはマルチビュー一貫性、相補性、ノイズロバスト性を達成することで機能する。最後に、大規模な公開データセットの実験により、MvCANは最先端の手法よりも優れ、ノイズの多いビューの存在に対して堅牢であることが示された。

Multi-view clustering (MvC) aims at exploring category structures among multi-view data without label supervision. Multiple views provide more information than single views and thus existing MvC methods can achieve satisfactory performance. However, their performance might seriously degenerate when the views are noisy in practical scenarios. In this paper, we first formally investigate the drawback of noisy views and then propose a theoretically grounded deep MvC method (namely MvCAN) to address this issue. Specifically, we propose a novel MvC objective that enables un-shared parameters and inconsistent clustering predictions across multiple views to reduce the side effects of noisy views. Furthermore, a non-parametric iterative process is designed to generate a robust learning target for mining multiple views' useful information. Theoretical analysis reveals that MvCAN works by achieving the multi-view consistency, complementarity, and noise robustness. Finally, experiments on extensive public datasets demonstrate that MvCAN outperforms state-of-the-art methods and is robust against the existence of noisy views.

翻訳日:2023-07-26 21:01:04 公開日:2023-07-23

# 継続的学習を超えた深層学習の予測に関する包括的調査

A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning ( http://arxiv.org/abs/2307.09218v2 )

ライセンス: Link先を確認

Zhenyi Wang, Enneng Yang, Li Shen, Heng Huang

(参考訳) 蓄積とは、以前取得した情報や知識の喪失または劣化を指す。忘れることに関する既存の調査は、主に継続的学習に焦点を当てているが、深層学習における他の様々な研究領域でよく見られる現象である。ジェネレータシフトによる生成モデルや、クライアント間での不均一なデータ分布によるフェデレーション学習などの研究分野におけるフォーミングの現れ。忘れることへの対処には、古いタスク知識の保持と新しいタスクの迅速な学習のバランス、競合する目標とのタスク干渉の管理、プライバシー漏洩の防止など、いくつかの課題が含まれている。さらに、継続学習に関する既存の調査のほとんどは、忘れが常に有害であると暗黙的に仮定している。対照的に、われわれの調査は、忘れは二重刃の剣であり、プライバシー保護シナリオのような特定のケースで有益で望ましいものだと主張している。より広い文脈で忘れることを検討することで、我々はこの現象をより微妙な理解を示し、その潜在的な利点を浮き彫りにする。この包括的な調査を通じて、忘れを扱ったさまざまな分野のアイデアやアプローチを描き出すことで、潜在的な解決策を明らかにすることを目指している。従来の境界を越えて忘れることを調べることで、将来の作業では、実際のアプリケーションにおける忘れを緩和、活用、あるいは受け入れるための新しい戦略の開発を奨励したいと考えています。様々な研究分野における忘れに関する包括的な論文の一覧は、 \url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning} にある。

Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context, we aim to present a more nuanced understanding of this phenomenon and highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing upon ideas and approaches from various fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, we hope to encourage future work to develop novel strategies for mitigating, harnessing, or even embracing forgetting in real applications. A comprehensive list of papers about forgetting in various research fields is available at https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning.

翻訳日:2023-07-26 20:13:36 公開日:2023-07-23

# 同期NPA階層と応用

A synchronous NPA hierarchy with applications ( http://arxiv.org/abs/2105.01555v2 )

ライセンス: Link先を確認

Travis B. Russell

(参考訳) 本稿では,同期相関行列の設定に対するnpa階層の適応について述べる。我々の適応は、より小さな証明書と少ない制約を用いて元のnpa階層を改善するが、同期相関を証明するためにしか適用できない。同期量子交換と同期量子相関の集合の特性を復元する。応用として、対称的完全正の演算子値測度と相互に偏りのない基底の最大集合の存在を、適応されたNPA階層の2つの証明書で検証または無効化できることを示す。

We present an adaptation of the NPA hierarchy to the setting of synchronous correlation matrices. Our adaptation improves upon the original NPA hierarchy by using smaller certificates and fewer constraints, although it can only be applied to certify synchronous correlations. We recover characterizations for the sets of synchronous quantum commuting and synchronous quantum correlations. For applications, we show that the existence of symmetric informationally complete positive operator-valued measures and maximal sets of mutually unbiased bases can be verified or invalidated with only two certificates of our adapted NPA hierarchy.

翻訳日:2023-07-26 01:40:03 公開日:2023-07-23

# 実演のない共模倣学習

Co-Imitation Learning without Expert Demonstration ( http://arxiv.org/abs/2103.14823v2 )

ライセンス: Link先を確認

Kun-Peng Ning, Hu Xu, Kun Zhu, Sheng-Jun Huang

(参考訳) 模倣学習は、専門家のデモンストレーションを利用して強化学習の効率を向上させるための主要なアプローチである。しかし、多くの現実のシナリオでは、専門家のデモンストレーションを得るのは非常に高価か、あるいは不可能かもしれない。この課題を克服するために,本稿では,エージェントの過去の優れた経験を専門家のデモンストレーションなしに活用するための,CoIL(Co-Imitation Learning)と呼ばれる新しい学習フレームワークを提案する。具体的には,それぞれのエージェントが交互に環境を探索し,ピアエージェントの経験を生かして,異なるエージェントを訓練する。経験は価値や誤解を招く可能性があるが、我々は各経験の潜在的有用性を価値関数の期待値で見積もることを提案する。これにより、ノイズをフィルタリングしながら、より有用な体験を強調して、エージェント同士を選択的に模倣することができる。様々な課題に対する実験結果から,提案する共励学習フレームワークは,エージェント同士が外部の監督なしに相互に利益を享受できるという有意な優位性を示した。

Imitation learning is a primary approach to improve the efficiency of reinforcement learning by exploiting the expert demonstrations. However, in many real scenarios, obtaining expert demonstrations could be extremely expensive or even impossible. To overcome this challenge, in this paper, we propose a novel learning framework called Co-Imitation Learning (CoIL) to exploit the past good experiences of the agents themselves without expert demonstration. Specifically, we train two different agents via letting each of them alternately explore the environment and exploit the peer agent's experience. While the experiences could be valuable or misleading, we propose to estimate the potential utility of each piece of experience with the expected gain of the value function. Thus the agents can selectively imitate from each other by emphasizing the more useful experiences while filtering out noisy ones. Experimental results on various tasks show significant superiority of the proposed Co-Imitation Learning framework, validating that the agents can benefit from each other without external supervision.

翻訳日:2023-07-26 01:39:19 公開日:2023-07-23

# LAnoBERT: BERT Masked Language Modelに基づくシステムログ異常検出

LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model ( http://arxiv.org/abs/2111.09564v3 )

ライセンス: Link先を確認

Yukyung Lee, Jina Kim and Pilsung Kang

(参考訳) コンピュータシステムで生成されたシステムログは、同時に収集され、エラー、侵入、異常行動を決定する基本データとして使用される大規模データを指す。システムログ異常検出の目的は、人間の介入を最小限に抑えながら異常を迅速に特定することである。従来の研究では,様々なログデータを解析器を用いて標準化テンプレートに変換し,アルゴリズムによる異常検出を行った。特に、ログキー内の情報が失われる可能性のあるすべてのログデータに対して、特定のイベントに対応するテンプレートを事前に定義する必要がある。本研究では,自然言語処理性能に優れたbertモデルを用いたパーザフリーシステムログ異常検出手法であるlanobertを提案する。提案手法であるLAnoBERTは,BERTに基づく事前学習手法であるマスク言語モデリングを用いてモデルを学習し,テスト中にログキー毎のマスク言語モデリング損失関数を用いて教師なし学習に基づく異常検出を行う。さらに,実際のシステムに適用可能なパイプラインを構築するための効率的な推論手法を提案する。 HDFS、BGL、Thunderbirdの3つの有名なログデータセットの実験では、LAnoBERTは教師なし学習ベースのベンチマークモデルよりも高い異常検出性能を示しただけでなく、教師なし学習ベースのベンチマークモデルと同等のパフォーマンスを得た。

The system log generated in a computer system refers to large-scale data that are collected simultaneously and used as the basic data for determining errors, intrusion and abnormal behaviors. The aim of system log anomaly detection is to promptly identify anomalies while minimizing human intervention, which is a critical problem in the industry. Previous studies performed anomaly detection through algorithms after converting various forms of log data into a standardized template using a parser. In particular, a template corresponding to each specific event must be defined in advance for all the log data, and in this process the information within the log key may be lost. In this study, we propose LAnoBERT, a parser-free system log anomaly detection method that uses the BERT model, which exhibits excellent natural language processing performance. The proposed method, LAnoBERT, trains the model through masked language modeling, which is a BERT-based pre-training method, and performs unsupervised learning-based anomaly detection using the masked language modeling loss function per log key during the test process. In addition, we propose an efficient inference process to establish a pipeline that is practically applicable to actual systems. Experiments on three well-known log datasets, i.e., HDFS, BGL, and Thunderbird, show that LAnoBERT not only yielded higher anomaly detection performance than unsupervised learning-based benchmark models, but also achieved performance comparable to supervised learning-based benchmark models.
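
The idea of scoring a log line by its masked language modeling loss can be sketched without a trained model. In the example below the per-token probabilities are hypothetical inputs standing in for what a masked language model would assign to the true token when that position is masked, and averaging the `top_k` largest losses is an assumed aggregation rule; the abstract does not specify how the per-key loss is aggregated.

```python
from math import log


def anomaly_score(true_token_probs, top_k=3):
    """Score one log line from masked-token prediction probabilities.

    true_token_probs: for each masked position, the probability the
    model gave to the token that was actually there (hypothetical input).
    Returns the mean of the top_k largest negative log-likelihoods,
    so poorly predicted tokens dominate the score.
    """
    if not true_token_probs:
        return 0.0
    # Clamp to avoid log(0) for tokens the model considered impossible.
    losses = sorted((-log(max(p, 1e-12)) for p in true_token_probs),
                    reverse=True)
    worst = losses[:top_k]
    return sum(worst) / len(worst)


# Made-up probabilities: a routine line is predicted well everywhere,
# an unusual line contains tokens the model finds surprising.
routine_line = [0.90, 0.95, 0.80, 0.99]
unusual_line = [0.90, 0.01, 0.80, 0.02]
is_more_anomalous = anomaly_score(unusual_line) > anomaly_score(routine_line)
```

A threshold on this score would then separate normal from anomalous lines. Because the model is trained only on normal logs, no labeled anomalies are needed, which is what makes the approach unsupervised.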

翻訳日:2023-07-26 01:31:10 公開日:2023-07-23

# 等価性と推定オントロジーマッチングのための機械学習フレンドリーなバイオメディカルデータセット

Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching ( http://arxiv.org/abs/2205.03447v8 )

ライセンス: Link先を確認

Yuan He, Jiaoyan Chen, Hang Dong, Ernesto Jiménez-Ruiz, Ali Hadian, Ian Horrocks

(参考訳) オントロジーマッチング(OM)はバイオインフォマティクスやセマンティックウェブなど多くの分野において重要な役割を担い、特に機械学習(ML)技術の適用によってその研究はますます人気が高まっている。オントロジーアライメント評価イニシアチブ(OAEI)は,OMシステムの体系的評価に多大な努力を払っているものの,サブエミッションマッピングの限定的な評価,最適でない参照マッピング,MLベースのシステム評価の限定的なサポートなど,いくつかの制限に悩まされている。これらの制約に対処するために,Mondo と UMLS から抽出したオントロジーを含む5つの新しいバイオメディカル OM タスクを導入する。各タスクは等価性と仮定マッチングの両方を含み、参照マッピングの品質は人間のキュレーションやオントロジープルーニングなどで保証される。 MLベースのOMシステムと非MLベースのOMシステムの両方において,様々な観点からOM性能を測定するための総合評価フレームワークを提案する。我々は,OAEI 2022における新たなBioMLトラックの一部として,これらのリソースの利用状況を示すため,異なるタイプのOMシステムの評価結果を報告する。

Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new BioML track at OAEI 2022.

翻訳日:2023-07-26 01:21:29 公開日:2023-07-23

# 物理的カスケードイベントにおける推論と行動の学習

Learning to reason about and to act on physical cascading events ( http://arxiv.org/abs/2202.01108v2 )

ライセンス: Link先を確認

Yuval Atzmon, Eli A. Meirom, Shie Mannor, Gal Chechik

(参考訳) 動的環境の推論とインタラクションは、AIの基本的な問題だが、アクションがクロス依存イベントのカスケードをトリガーできると、極めて困難になる。そこで,エージェントが物理的にシミュレートされた動的シーンの映像を提示し,システムが"国的"な目標に達するように,イベントのカスケードを介入して起動するように要求する,"em cascade"と呼ばれる新しい教師付き学習設定を導入する。例えば、エージェントは「青いボールが緑色のボールを押して赤いボールを打つように」依頼される。エージェントの介入は連続空間から引き出され、事象のカスケードはダイナミクスを非常に非線形にする。セマンティックツリー探索とイベント駆動フォワードモデルを組み合わせることで,連続空間におけるセマンティックツリーの探索を学習するアルゴリズムを考案する。提案手法は,これまで見つからなかった複雑な場面に介入する命令を効果的に追従することを学ぶ。観測された事象のカスケードを提供する場合、別の結果も推論できる。

Reasoning about and interacting with dynamic environments is a fundamental problem in AI, but it becomes extremely challenging when actions can trigger cascades of cross-dependent events. We introduce a new supervised learning setup called Cascade, where an agent is shown a video of a physically simulated dynamic scene and is asked to intervene and trigger a cascade of events such that the system reaches a "counterfactual" goal. For instance, the agent may be asked to "Make the blue ball hit the red one, by pushing the green ball". The agent's intervention is drawn from a continuous space, and cascades of events make the dynamics highly non-linear. We combine semantic tree search with an event-driven forward model and devise an algorithm that learns to search in semantic trees in continuous spaces. We demonstrate that our approach learns to effectively follow instructions to intervene in previously unseen complex scenes. It can also reason about alternative outcomes when provided with an observed cascade of events.

翻訳日:2023-07-26 01:18:11 公開日:2023-07-23

# DSTEA: エンティティ適応型事前トレーニングによる対話状態追跡の改善

DSTEA: Improving Dialogue State Tracking via Entity Adaptive Pre-training ( http://arxiv.org/abs/2207.03858v2 )

ライセンス: Link先を確認

Yukyung Lee, Takyoung Kim, Hoonsang Yoon, Pilsung Kang, Junseong Bang, Misuk Kim

(参考訳) 対話状態追跡(DST)は、ユーザとシステム発話を包括的に解釈するために重要であり、それによって効率的な対話システムの基礎を形成する。過去の研究は、モデル構造の変更やグラフ関係などの追加機能の統合によるDSTパフォーマンスの向上に重点を置いていたが、しばしば外部対話コーパスによる事前学習が必要である。本研究では,対話発話におけるキーエンティティを集中的に訓練することにより,エンコーダを強化可能な,エンティティ適応型事前学習による対話状態追跡を改善するDSTEAを提案する。 DSTEAは、オントロジー情報、名前付き認識、paCy、flairライブラリの4つの異なる方法を用いて、これらの重要なエンティティを入力対話から識別する。その後、モデルを効果的に訓練するために選択的知識マスクを用いる。注目すべきは、DSTEAはDSTモデルに追加知識を直接注入することなく、事前学習を必要とすることだ。このアプローチにより、MultiWOZ 2.0, 2.1, 2.2の4つの堅牢DSTモデルの性能が大幅に向上し、目標精度は2.69%(52.41%から55.10%)まで向上した。 DSTEAの有効性のさらなる検証は、様々なエンティティタイプとマスキング戦略やマスキング率などの異なるエンティティ適応事前学習構成を考慮した比較実験によって行われた。

Dialogue State Tracking (DST) is critical for comprehensively interpreting user and system utterances, thereby forming the cornerstone of efficient dialogue systems. Past research has focused on enhancing DST performance by altering the model structure or integrating additional features such as graph relations, but these approaches often require additional pre-training with external dialogue corpora. In this study, we propose DSTEA, improving Dialogue State Tracking via Entity Adaptive pre-training, which enhances the encoder by intensively training on key entities in dialogue utterances. DSTEA identifies these pivotal entities from input dialogues using four different methods: ontology information, named-entity recognition, the spaCy library, and the flair library. Subsequently, it employs selective knowledge masking to train the model effectively. Remarkably, DSTEA only requires pre-training without the direct infusion of extra knowledge into the DST model. This approach resulted in substantial performance improvements for four robust DST models on MultiWOZ 2.0, 2.1, and 2.2, with joint goal accuracy increasing by up to 2.69% (from 52.41% to 55.10%). Further validation of DSTEA's efficacy was provided through comparative experiments considering various entity types and different entity adaptive pre-training configurations such as masking strategy and masking rate.
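
One plausible reading of selective masking is that tokens identified as key entities are masked for pre-training while other tokens are left alone. The sketch below illustrates that reading only; `mask_entities`, the whitespace tokenization, the example utterance, and the entity set are hypothetical, and the actual masking strategy and rates studied in the paper may differ.

```python
import random


def mask_entities(tokens, entities, mask_rate=0.5,
                  mask_token="[MASK]", rng=random):
    """Mask only tokens that belong to a set of key entities.

    tokens: list of word tokens from a dialogue utterance.
    entities: set of lowercase entity strings (e.g. from an ontology).
    Returns (masked_tokens, labels); labels hold the original token at
    masked positions and None elsewhere, as a masked-LM target would.
    """
    masked, labels = [], []
    for tok in tokens:
        if tok.lower() in entities and rng.random() < mask_rate:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


utterance = "i need a cheap hotel in cambridge".split()
key_entities = {"cheap", "hotel", "cambridge"}
masked, labels = mask_entities(utterance, key_entities, mask_rate=1.0)
# masked == ['i', 'need', 'a', '[MASK]', '[MASK]', 'in', '[MASK]']
```

Compared with masking tokens uniformly at random, this concentrates the pre-training signal on slot values and domain terms, which are the tokens a dialogue state tracker ultimately has to recover.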

翻訳日:2023-07-26 01:11:10 公開日:2023-07-23

# TF-GNN:TensorFlowのグラフニューラルネットワーク

TF-GNN: Graph Neural Networks in TensorFlow ( http://arxiv.org/abs/2207.03522v2 )

ライセンス: Link先を確認

Oleksandr Ferludin, Arno Eigenwillig, Martin Blais, Dustin Zelle, Jan Pfeifer, Alvaro Sanchez-Gonzalez, Wai Lok Sibon Li, Sami Abu-El-Haija, Peter Battaglia, Neslihan Bulut, Jonathan Halcrow, Filipe Miguel Gonçalves de Almeida, Pedro Gonnet, Liangze Jiang, Parth Kothari, Silvio Lattanzi, André Linhares, Brandon Mayer, Vahab Mirrokni, John Palowitch, Mihir Paradkar, Jennifer She, Anton Tsitsulin, Kevin Villela, Lisa Wang, David Wong, Bryan Perozzi

(参考訳) TensorFlow-GNN(TF-GNN)は、TensorFlowのグラフニューラルネットワークのためのスケーラブルなライブラリである。これは、今日の情報エコシステムで発生する豊富な異種グラフデータの種類をサポートするために、下から設計されている。機械学習の研究者や高度な開発者を可能にすることに加えて、tf-gnnはグラフ学習の幅広い開発者コミュニティに力を与えるローコードソリューションを提供する。 Googleの多くのプロダクションモデルはTF-GNNを使用しており、最近オープンソースプロジェクトとしてリリースされた。本稿では,tf-gnnデータモデル,kerasメッセージパッシングapi,グラフサンプリングや分散トレーニングといった関連する機能について述べる。

TensorFlow-GNN (TF-GNN) is a scalable library for Graph Neural Networks in TensorFlow. It is designed from the bottom up to support the kinds of rich heterogeneous graph data that occur in today's information ecosystems. In addition to enabling machine learning researchers and advanced developers, TF-GNN offers low-code solutions to empower the broader developer community in graph learning. Many production models at Google use TF-GNN, and it was recently released as an open source project. In this paper we describe the TF-GNN data model, its Keras message passing API, and relevant capabilities such as graph sampling and distributed training.

翻訳日:2023-07-26 01:10:48 公開日:2023-07-23

# パーソナライゼーションのハーム:予測におけるグループ属性の利用を再考する

When Personalization Harms: Reconsidering the Use of Group Attributes in Prediction ( http://arxiv.org/abs/2206.02058v3 )

ライセンス: Link先を確認

Vinith M. Suriyakumar, Marzyeh Ghassemi, Berk Ustun

(参考訳) マシンラーニングモデルは、保護、機密性、自己報告、あるいは取得コストのかかるカテゴリ属性でパーソナライズされることが多い。本研究では,グループ属性でパーソナライズされたモデルがグループレベルでのパフォーマンスを低下させることを示す。予測タスクにおけるグループ属性の「公平な使用」を保証するための形式的条件として,1つの追加モデルを訓練することを提案する。実験的なリスク最小化における公正な使用を保証するための十分な条件を提示し、モデル開発とデプロイメントの標準プラクティスによる公正な使用違反につながる障害モードを特徴付ける。臨床予測タスクにおける公正な使用に関する総合的な実証研究を行う。本研究は, フェアユース違反の実態を実証し, 害を軽減するための簡単な介入を例示するものである。

Machine learning models are often personalized with categorical attributes that are protected, sensitive, self-reported, or costly to acquire. In this work, we show that models personalized with group attributes can reduce performance at a group level. We propose formal conditions to ensure the "fair use" of group attributes in prediction tasks, which can be checked by training one additional model. These conditions are collective preference guarantees, which ensure that each group that provides personal data receives a tailored gain in performance in return. We present sufficient conditions to ensure fair use in empirical risk minimization and characterize failure modes that lead to fair use violations due to standard practices in model development and deployment. We present a comprehensive empirical study of fair use in clinical prediction tasks. Our results demonstrate the prevalence of fair use violations in practice and illustrate simple interventions to mitigate their harm.
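
The observation that personalization can help on average while hurting a particular group can be checked with a simple per-group comparison. The sketch below is an illustration rather than the paper's formal conditions: it compares the error of a generic model (trained without the group attribute) against a personalized one for each group, and the function name `fair_use_violations` and the error values are hypothetical.

```python
def fair_use_violations(generic_error, personalized_error, tolerance=0.0):
    """List groups that do worse under the personalized model.

    generic_error, personalized_error: dicts mapping group name to the
    error rate of each model measured on that group.
    A group is flagged when personalization raises its error by more
    than `tolerance`.
    """
    return sorted(
        group for group in generic_error
        if personalized_error[group] > generic_error[group] + tolerance
    )


# Hypothetical per-group error rates.
generic = {"A": 0.20, "B": 0.25, "C": 0.30}
personalized = {"A": 0.12, "B": 0.27, "C": 0.22}

harmed = fair_use_violations(generic, personalized)
# harmed == ['B']
```

In this made-up example the personalized model lowers error for groups A and C, yet group B, which also supplied its attribute, ends up worse off. An aggregate accuracy report would hide this, which is why the comparison has to be made group by group.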

翻訳日:2023-07-26 01:10:11 公開日:2023-07-23

# 学習型ロボットグラスピングのための物理誘導階層的リワード機構

Physics-Guided Hierarchical Reward Mechanism for Learning-Based Robotic Grasping ( http://arxiv.org/abs/2205.13561v3 )

ライセンス: Link先を確認

Yunsik Jung, Lingfeng Tao, Michael Bowman, Jiucai Zhang, Xiaoli Zhang

(参考訳) 学習に基づく把持は、高い計算効率により、多指ロボットハンドのリアルタイム把持動作計画を可能にする。しかし,学習過程において大きな探索空間を探索するためには,学習に基づく手法が必要となる。検索スペースは学習効率を低下させ、それがその実践的採用の主要な障壁となっている。加えて、トレーニングされたポリシーには、オブジェクトがトレーニングされたオブジェクトと同一でない限り、一般的な結果が欠けている。本研究では,学習効率と学習に基づく自律的把握の一般化性を向上させるために,階層的リワード機構を備えた物理誘導型深層強化学習を開発する。従来の観察に基づくグリップラーニングとは異なり、物理インフォームドメトリクスは手の構造と物体の相関関係を伝達し、学習効率と結果を改善する。さらに、階層的な報酬機構により、ロボットは把握タスクの優先順位付けされたコンポーネントを学習することができる。本手法は3本指MICOロボットアームを用いたロボット把握作業において有効である。その結果,ロボットの把握作業において,標準的なDeep Reinforcement Learning法よりも優れていた。

Learning-based grasping can afford real-time grasp motion planning of multi-fingered robotics hands thanks to its high computational efficiency. However, learning-based methods are required to explore large search spaces during the learning process. The search space causes low learning efficiency, which has been the main barrier to its practical adoption. In addition, the trained policy lacks a generalizable outcome unless objects are identical to the trained objects. In this work, we develop a novel Physics-Guided Deep Reinforcement Learning with a Hierarchical Reward Mechanism to improve learning efficiency and generalizability for learning-based autonomous grasping. Unlike conventional observation-based grasp learning, physics-informed metrics are utilized to convey correlations between features associated with hand structures and objects to improve learning efficiency and outcomes. Further, the hierarchical reward mechanism enables the robot to learn prioritized components of the grasping tasks. Our method is validated in robotic grasping tasks with a 3-finger MICO robot arm. The results show that our method outperformed the standard Deep Reinforcement Learning methods in various robotic grasping tasks.

翻訳日:2023-07-26 01:09:55 公開日:2023-07-23

# EchoGNN: グラフニューラルネットワークによる説明可能な駆出率推定

EchoGNN: Explainable Ejection Fraction Estimation with Graph Neural Networks ( http://arxiv.org/abs/2208.14003v2 )

ライセンス: Link先を確認

Masoud Mokhtari, Teresa Tsang, Purang Abolmaesumi, Renjie Liao

(参考訳) 駆出率(EF)は心機能の重要な指標であり、心不全などの心機能障害を起こしやすい患者の識別を可能にする。EFは、心エコー図(echo)として知られる心臓超音波動画から、左心室を手動でトレースし、特定のフレームでその容積を推定することにより推定される。これらの推定は、手動のプロセスと動画品質のばらつきにより、観察者間の変動が大きい。このような不正確さの要因と迅速な評価の必要性から、信頼性が高く説明可能な機械学習技術が求められる。本研究では、グラフニューラルネットワーク(GNN)に基づくモデルであるEchoGNNを導入し、エコー動画からEFを推定する。我々のモデルはまず、1つまたは複数のエコーシネ系列のフレームから潜在エコーグラフを推論する。次に、このグラフのノードとエッジの重みを推定し、EF推定に役立つ個々のフレームの重要性を示す。GNN回帰器はこの重み付きグラフを使用してEFを予測する。我々は、学習されたグラフの重みが、EF推定に重要なフレームの同定を通じて説明可能性を提供し、人的介入が必要なタイミングの判断に利用できることを、定性的かつ定量的に示す。EchoNet-Dynamic公開EFデータセットにおいて、EchoGNNは最先端と同等のEF予測性能を達成し、かつ説明可能性を提供する。これは、このタスクに内在する観察者間変動の大きさを考えると極めて重要である。

Ejection fraction (EF) is a key indicator of cardiac function, allowing identification of patients prone to heart dysfunctions such as heart failure. EF is estimated from cardiac ultrasound videos known as echocardiograms (echo) by manually tracing the left ventricle and estimating its volume on certain frames. These estimations exhibit high inter-observer variability due to the manual process and varying video quality. Such sources of inaccuracy and the need for rapid assessment necessitate reliable and explainable machine learning techniques. In this work, we introduce EchoGNN, a model based on graph neural networks (GNNs) to estimate EF from echo videos. Our model first infers a latent echo-graph from the frames of one or multiple echo cine series. It then estimates weights over nodes and edges of this graph, indicating the importance of individual frames that aid EF estimation. A GNN regressor uses this weighted graph to predict EF. We show, qualitatively and quantitatively, that the learned graph weights provide explainability through identification of critical frames for EF estimation, which can be used to determine when human intervention is required. On EchoNet-Dynamic public EF dataset, EchoGNN achieves EF prediction performance that is on par with state of the art and provides explainability, which is crucial given the high inter-observer variability inherent in this task.

翻訳日:2023-07-26 01:01:36 公開日:2023-07-23

# Frouros: 機械学習システムにおけるドリフト検出のためのPythonライブラリ

Frouros: A Python library for drift detection in machine learning systems ( http://arxiv.org/abs/2208.06868v4 )

ライセンス: Link先を確認

Jaime Céspedes-Sisniega and Álvaro López-García

(参考訳) FrourosはオープンソースのPythonライブラリで、機械学習システムのドリフトを検出することができる。ドリフト検出のための古典的なアルゴリズムとより最近のアルゴリズムを組み合わせて提供し、概念ドリフトとデータドリフトの両方を対象とする。私たちは、あらゆる機械学習フレームワークと互換性を持たせ、現実世界のユースケースに容易に適応できるように設計しました。このライブラリは、メンテナンスの容易さと拡張性を確保するために、開発と継続的インテグレーションのベストプラクティスに従って開発されている。ソースコードはhttps://github.com/IFCA/frourosで入手できる。

Frouros is an open-source Python library capable of detecting drift in machine learning systems. It provides a combination of classical and more recent algorithms for drift detection: both concept and data drift. We have designed it with the objective of making it compatible with any machine learning framework and easily adaptable to real-world use cases. The library is developed following a set of best development and continuous integration practices to ensure ease of maintenance and extensibility. The source code is available at https://github.com/IFCA/frouros.

翻訳日:2023-07-26 01:00:38 公開日:2023-07-23

# 分離のないスパースモーメント問題の効率的なアルゴリズム

Efficient Algorithms for Sparse Moment Problems without Separation ( http://arxiv.org/abs/2207.13008v2 )

ライセンス: Link先を確認

Zhiyuan Fan and Jian Li

(参考訳) 我々は,任意の次元の雑音を含むモーメント情報から高次元空間における$k$-spike混合を学習するスパースモーメント問題を考える。学習した混合の精度は輸送距離を用いて測定する。従来のアルゴリズムは,特定の分離条件を仮定するか,復元により多くのモーメントを用いるか,あるいは(超)指数時間で動作する。一次元問題(スパースハウスドルフモーメント問題とも呼ばれる)に対する我々のアルゴリズムは古典的なProny法の頑健版であり,我々の貢献は主にその解析にある。我々は,(Prony法の中間結果の摂動を解析する)従来研究よりも大域的ではるかにタイトな解析を採用する。有用な技術的要素は,ヴァンデルモンド行列で定義される線形系とシューア多項式との関係であり,これにより分離に依存しないタイトな摂動上界が得られ,他の文脈でも有用でありうる。高次元問題に取り組むために,まず一次元のアルゴリズムと解析を複素数に拡張して二次元問題を解く。高次元の場合のアルゴリズムは,ランダムベクトルへの混合の1次元射影と,混合の2次元射影の集合とを整合させることにより,各スパイクの座標を決定する。我々の結果はトピックモデルとガウス混合の学習に応用でき,従来研究に対するサンプル複雑性または実行時間の改善を意味する。

We consider the sparse moment problem of learning a $k$-spike mixture in high-dimensional space from its noisy moment information in any dimension. We measure the accuracy of the learned mixtures using transportation distance. Previous algorithms either assume certain separation assumptions, use more recovery moments, or run in (super) exponential time. Our algorithm for the one-dimensional problem (also called the sparse Hausdorff moment problem) is a robust version of the classic Prony's method, and our contribution mainly lies in the analysis. We adopt a global and much tighter analysis than previous work (which analyzes the perturbation of the intermediate results of Prony's method). A useful technical ingredient is a connection between the linear system defined by the Vandermonde matrix and the Schur polynomial, which allows us to provide tight perturbation bound independent of the separation and may be useful in other contexts. To tackle the high-dimensional problem, we first solve the two-dimensional problem by extending the one-dimensional algorithm and analysis to complex numbers. Our algorithm for the high-dimensional case determines the coordinates of each spike by aligning a 1d projection of the mixture to a random vector and a set of 2d projections of the mixture. Our results have applications to learning topic models and Gaussian mixtures, implying improved sample complexity results or running time over prior work.
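As a rough illustration of the classical (non-robust) Prony's method that the abstract builds on, the sketch below recovers a 2-spike mixture on the real line from its first four moments. The function name `prony_two_spikes`, the restriction to k = 2, and the assumption of exact, noise-free moments are simplifications made here for brevity; the paper's robust variant and its perturbation analysis are not reproduced.

```python
import math

def prony_two_spikes(m0, m1, m2, m3):
    """Recover (locations, weights) of a 2-spike mixture from the moments
    m_j = w1 * a1**j + w2 * a2**j for j = 0..3 (assumed exact)."""
    # The spike locations are the roots of x^2 + c1*x + c0, where
    # m_{j+2} + c1*m_{j+1} + c0*m_j = 0 for j = 0, 1 (a 2x2 Hankel system).
    det = m0 * m2 - m1 * m1
    if abs(det) < 1e-12:
        raise ValueError("degenerate moments: fewer than two distinct spikes")
    c0 = (m1 * m3 - m2 * m2) / det
    c1 = (m1 * m2 - m0 * m3) / det
    disc = c1 * c1 - 4.0 * c0
    if disc <= 0.0:
        raise ValueError("no two distinct real roots")
    root = math.sqrt(disc)
    a1, a2 = (-c1 - root) / 2.0, (-c1 + root) / 2.0
    # The weights solve the 2x2 Vandermonde system:
    #   w1 + w2 = m0,  w1*a1 + w2*a2 = m1.
    w1 = (m1 - m0 * a2) / (a1 - a2)
    w2 = m0 - w1
    return (a1, a2), (w1, w2)
```

Both divisions above (by the Hankel determinant and by `a1 - a2`) become ill-conditioned as the two spikes approach each other, which is the separation dependence that the abstract's analysis is designed to avoid.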

翻訳日:2023-07-26 00:59:15 公開日:2023-07-23

# GMA3D: シーンフローの遮蔽された動きを推定するローカル・グローバル・アテンション学習

GMA3D: Local-Global Attention Learning to Estimate Occluded Motions of Scene Flow ( http://arxiv.org/abs/2210.03296v2 )

ライセンス: Link先を確認

Zhiyang Lu, Ming Cheng

(参考訳) シーンフローは、3D点群内の各点の動き情報を表す。モーションセグメンテーションやオブジェクトトラッキングなど、多くのタスクに適用される重要な下流手法である。しかしながら、連続する2つの点群の間には、データサンプリングのスパース性または実世界の遮蔽に起因する遮蔽点が常に存在する。本稿では、移動物体のセマンティックな自己相似性と動きの整合性を用いて、シーンフローにおける遮蔽問題に取り組む。本稿では、トランスフォーマーの枠組みに基づくGMA3Dモジュールを提案する。このモジュールは、局所的および大域的なセマンティック類似性を利用して、それぞれ局所的および大域的な非遮蔽点の動き情報から遮蔽点の動き情報を推定し、オフセットアグリゲータを用いてそれらを集約する。我々のモジュールは、点群上のシーンフロー遮蔽問題にトランスフォーマーベースのアーキテクチャを適用した最初のものである。実験により、GMA3Dはシーンフロー、特に実シーンにおける遮蔽問題を解決できることがわかった。提案手法を点群データセットの遮蔽ありバージョンで評価し、実シーンのKITTIデータセットで最先端の結果を得た。また、GMA3Dが非遮蔽のシーンフローに対しても有効であることを示すために、遮蔽なしバージョンのデータセットでも実験を行い、FlyThings3DとKITTIで有望な性能を達成した。コードはhttps://anonymous.4open.science/r/GMA3D-E100で入手できる。

Scene flow represents the motion information of each point in the 3D point clouds. It is a vital downstream method applied to many tasks, such as motion segmentation and object tracking. However, there are always occlusion points between two consecutive point clouds, whether from the sparsity data sampling or real-world occlusion. In this paper, we focus on addressing occlusion issues in scene flow by the semantic self-similarity and motion consistency of the moving objects. We propose a GMA3D module based on the transformer framework, which utilizes local and global semantic similarity to infer the motion information of occluded points from the motion information of local and global non-occluded points respectively, and then uses an offset aggregator to aggregate them. Our module is the first to apply the transformer-based architecture to gauge the scene flow occlusion problem on point clouds. Experiments show that our GMA3D can solve the occlusion problem in the scene flow, especially in the real scene. We evaluated the proposed method on the occluded version of point cloud datasets and get state-of-the-art results on the real scene KITTI dataset. To testify that GMA3D is still beneficial to non-occluded scene flow, we also conducted experiments on non-occluded version datasets and achieved promising performance on FlyThings3D and KITTI. The code is available at https://anonymous.4open.science/r/GMA3D-E100.

翻訳日:2023-07-26 00:52:41 公開日:2023-07-23

# データ拡張によるグラフ異常検出モデルの一般化性の向上

Improving Generalizability of Graph Anomaly Detection Models via Data Augmentation ( http://arxiv.org/abs/2209.10168v2 )

ライセンス: Link先を確認

Shuang Zhou, Xiao Huang, Ninghao Liu, Fu-Lai Chung, Long-Kai Huang

(参考訳) グラフ異常検出(GAD)は、少数の異常でさえ良性のユーザーに大きな脅威をもたらす可能性があるため、重要なタスクである。利用可能なラベルを事前知識として効果的に活用できる最近の半教師付きGAD法は、教師なし手法よりも優れた性能を実現している。実際には、人々はビジネスを守るために新しい(サブ)グラフ上の異常を識別する必要があるが、効果的な検出モデルを訓練するためのラベルが不足している可能性がある。自然なアイデアのひとつは、訓練済みのGADモデルを新しい(サブ)グラフに直接適用してテストすることだ。しかし、既存の半教師付きGAD法は汎化性能の低さに悩まされており、よく訓練されたモデルであっても、同じグラフの未知の領域(つまり、訓練時にアクセスできない領域)ではうまく機能しないことがわかった。これは大きな問題を引き起こしうる。本稿では、この現象に基づき、訓練領域のグラフと未知のテストグラフの両方で異常を効果的に識別し、潜在的な危険を排除することを目的とした、一般化グラフ異常検出という一般的かつ新しい研究問題を提案する。しかし、利用可能なラベルが限られており、訓練データとテストデータで正常な背景が異なる可能性があるため、これは難しいタスクである。そこで本研究では、訓練データを充実させ、GADモデルの一般化性を高めるために、AugAN(Augmentation for Anomaly and Normal distributions)というデータ拡張手法を提案する。実験により、モデルの一般化性向上における本手法の有効性を検証する。

Graph anomaly detection (GAD) is a vital task since even a few anomalies can pose huge threats to benign users. Recent semi-supervised GAD methods, which can effectively leverage the available labels as prior knowledge, have achieved better performance than unsupervised methods. In practice, people usually need to identify anomalies on new (sub)graphs to secure their business, but they may lack labels to train an effective detection model. One natural idea is to directly apply a trained GAD model to the new (sub)graph for testing. However, we find that existing semi-supervised GAD methods suffer from poor generalization, i.e., well-trained models do not perform well on an unseen area (i.e., one not accessible in training) of the same graph, which can cause serious problems. Building on this observation, we propose a general and novel research problem, generalized graph anomaly detection, which aims to effectively identify anomalies on both the training-domain graph and an unseen testing graph to eliminate potential dangers. Nevertheless, it is a challenging task since only limited labels are available, and the normal background may differ between training and testing data. Accordingly, we propose a data augmentation method named AugAN (Augmentation for Anomaly and Normal distributions) to enrich training data and boost the generalizability of GAD models. Experiments verify the effectiveness of our method in improving model generalizability.

翻訳日:2023-07-26 00:51:10 公開日:2023-07-23

# ロバストな参照画像セグメンテーションに向けて

Towards Robust Referring Image Segmentation ( http://arxiv.org/abs/2209.09554v2 )

ライセンス: Link先を確認

Jianzong Wu, Xiangtai Li, Xia Li, Henghui Ding, Yunhai Tong, Dacheng Tao

(参考訳) Referring Image Segmentation (RIS)は、テキスト記述に基づいてオブジェクトマスクを出力する基本的な視覚言語タスクである。様々な融合法の設計を含む多くの研究が、RISでかなりの進歩を遂げてきた。本研究では、「もしテキスト記述が間違っていたり誤解を招くものだったりしたらどうなるか」という本質的な問いを探究する。例えば、記述された物体が画像内に存在しない場合である。私たちはそのような文を否定的な文(negative sentence)と呼ぶ。しかし、RISの既存のソリューションはそのような設定を扱えない。この目的のために、ロバスト参照画像セグメンテーション(R-RIS)という新しいRISの定式化を提案する。これは、通常の肯定的なテキスト入力に加えて、否定的な文の入力も考慮する。この新しいタスクを容易にするために、既存のRISデータセットを否定的な文で拡張して3つのR-RISデータセットを作成し、両方の種類の入力を統一的に評価するための新しい指標を提案する。さらに、トークンベースの視覚・言語融合モジュールを備えた、RefSegformerと呼ばれる新しいトランスフォーマーベースのモデルを提案する。我々の設計は、追加の空白トークンを加えることでR-RIS設定に容易に拡張できる。提案したRefSegformerは、RISとR-RISの両方のデータセットで最先端の結果を達成し、両方の設定に対する堅実なベースラインを確立する。プロジェクトページは https://github.com/jianzongwu/robust-ref-seg にある。

Referring Image Segmentation (RIS) is a fundamental vision-language task that outputs object masks based on text descriptions. Many works have achieved considerable progress for RIS, including different fusion method designs. In this work, we explore an essential question, "What if the text description is wrong or misleading?" For example, the described objects are not in the image. We term such a sentence as a negative sentence. However, existing solutions for RIS cannot handle such a setting. To this end, we propose a new formulation of RIS, named Robust Referring Image Segmentation (R-RIS). It considers the negative sentence inputs besides the regular positive text inputs. To facilitate this new task, we create three R-RIS datasets by augmenting existing RIS datasets with negative sentences and propose new metrics to evaluate both types of inputs in a unified manner. Furthermore, we propose a new transformer-based model, called RefSegformer, with a token-based vision and language fusion module. Our design can be easily extended to our R-RIS setting by adding extra blank tokens. Our proposed RefSegformer achieves state-of-the-art results on both RIS and R-RIS datasets, establishing a solid baseline for both settings. Our project page is at https://github.com/jianzongwu/robust-ref-seg.

翻訳日:2023-07-26 00:50:42 公開日:2023-07-23

# ヘテロスケダスティック分布におけるニューラル能動学習

Neural Active Learning on Heteroskedastic Distributions ( http://arxiv.org/abs/2211.00928v2 )

ライセンス: Link先を確認

Savya Khosla, Chew Kin Whye, Jordan T. Ash, Cyril Zhang, Kenji Kawaguchi, Alex Lamb

(参考訳) 最高品質の訓練データを能動的に探し出せるモデルは、より正確で適応性が高く効率的な機械学習を実現する可能性を秘めている。能動学習の手法は、分類が最も難しい例を好む傾向がある。これは均質なデータセットではうまく機能するが、ラベルノイズやヘテロスケダスティック性の程度が異なる複数の分布に対して実行すると、破滅的な失敗を引き起こしうることがわかった。これらの能動学習アルゴリズムは、サンプルに有益な構造がない場合(例えばランダムなラベルを持つ単色画像)でも、ノイズの多い分布からサンプルを引くことを強く好む。そこで本研究では、ヘテロスケダスティック分布におけるこれらの能動学習アルゴリズムの破滅的な失敗を実証し、この失敗を軽減するためのファインチューニングに基づくアプローチを提案する。さらに、データポイントごとのモデル差スコアリング関数を組み込むことで、ノイズの多いサンプルを除外し、精度を最大化するクリーンなサンプルを選択する新しいアルゴリズムを提案する。このアルゴリズムは、ヘテロスケダスティックなデータセットにおいて既存の能動学習手法を上回る。これらの観察と手法が実践者にとってすぐに役立ち、能動学習アルゴリズムの設計における一般的な仮定を問い直す助けとなることを願っている。

Models that can actively seek out the best quality training data hold the promise of more accurate, adaptable, and efficient machine learning. Active learning techniques often tend to prefer examples that are the most difficult to classify. While this works well on homogeneous datasets, we find that it can lead to catastrophic failures when performed on multiple distributions with different degrees of label noise or heteroskedasticity. These active learning algorithms strongly prefer to draw from the distribution with more noise, even if their examples have no informative structure (such as solid color images with random labels). To this end, we demonstrate the catastrophic failure of these active learning algorithms on heteroskedastic distributions and propose a fine-tuning-based approach to mitigate these failures. Further, we propose a new algorithm that incorporates a model difference scoring function for each data point to filter out the noisy examples and sample clean examples that maximize accuracy, outperforming the existing active learning techniques on the heteroskedastic datasets. We hope these observations and techniques are immediately helpful to practitioners and can help to challenge common assumptions in the design of active learning algorithms.
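The failure mode described above can be made concrete with a minimal sketch of entropy-based uncertainty sampling, one common "hardest examples first" acquisition rule. The abstract does not name a specific rule, so the choice of entropy, the function names, and the toy pool format are all assumptions made here for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_most_uncertain(pool, k):
    """Return the ids of the k pool items with the highest predictive entropy.

    pool: list of (example_id, class_probabilities) pairs, where the
    probabilities are a model's predictions on unlabeled examples.
    """
    ranked = sorted(pool, key=lambda item: entropy(item[1]), reverse=True)
    return [example_id for example_id, _ in ranked[:k]]
```

If the pool mixes a clean distribution (confident predictions) with one whose labels are pure noise (predictions near uniform), this rule spends its whole labeling budget on the noisy examples, even though labeling them teaches the model nothing.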

翻訳日:2023-07-26 00:41:42 公開日:2023-07-23

# 対数線形ガードネスとその意義

Log-linear Guardedness and its Implications ( http://arxiv.org/abs/2210.10012v3 )

ライセンス: Link先を確認

Shauli Ravfogel, Yoav Goldberg, Ryan Cotterell

(参考訳) 線形性を仮定して神経表現から人間に解釈可能な概念を消去する手法は、扱いやすく有用であることがわかっている。しかし、この除去が、修正された表現で訓練された下流の分類器の挙動に与える影響は、完全には理解されていない。本研究では、対数線形ガードネスの概念を、敵対者が表現から直接その概念を予測できないこととして形式的に定義し、その含意を検討する。二値の場合、ある仮定の下では、下流の対数線形モデルは消去された概念を復元できないことを示す。しかし、場合によっては間接的に概念を復元する多クラス対数線形モデルを構成できることを示し、下流のバイアス緩和手法としての対数線形ガードネスに本質的な限界があることを指摘する。これらの結果は線形消去法の理論的限界に光を当て、神経モデルにおける内在的バイアスと外在的バイアスの関係についてさらなる研究の必要性を強調する。

Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful. However, the impact of this removal on the behavior of downstream classifiers trained on the modified representations is not fully understood. In this work, we formally define the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation, and study its implications. We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept. However, we demonstrate that a multiclass log-linear model can be constructed that indirectly recovers the concept in some cases, pointing to the inherent limitations of log-linear guardedness as a downstream bias mitigation technique. These findings shed light on the theoretical limitations of linear erasure methods and highlight the need for further research on the connections between intrinsic and extrinsic bias in neural models.

翻訳日:2023-07-26 00:39:09 公開日:2023-07-23

# ソフトコントラスト学習とオールインワン分類器を用いた新しいカテゴリー発見の促進

Boosting Novel Category Discovery Over Domains with Soft Contrastive Learning and All-in-One Classifier ( http://arxiv.org/abs/2211.11262v3 )

ライセンス: Link先を確認

Zelin Zang, Lei Shang, Senqiao Yang, Fei Wang, Baigui Sun, Xuansong Xie, Stan Z. Li

(参考訳) 教師なしドメイン適応(UDA)は、ラベルが豊富なソースドメインからラベルが乏しいターゲットドメインへの知識の転移に非常に効果的であることが証明されている。しかし、ターゲットドメインに新たなカテゴリが存在することから、open-set domain adaptation (ODA) と universal domain adaptation (UNDA) が開発された。既存のODAおよびUNDA手法は、すべての新しいカテゴリを単一の統一された未知クラスとして扱い、訓練中にそれを検出しようとする。しかし、ドメイン間の分散は教師なしデータ拡張においてより大きなビューノイズをもたらし、それがコントラスト学習(CL)の有効性を損ない、新規カテゴリ発見においてモデルを過信させることがわかった。これらの課題に対処するため、ODAおよびUNDAタスクに対して、Soft-contrastive All-in-one Network (SAN) というフレームワークを提案する。SANには、特徴転移のためにバックボーンを微調整する新しいデータ拡張ベースのソフトコントラスト学習(SCL)損失と、新規クラス発見能力を改善する、より人間の直感に沿った分類器が含まれている。SCL損失は、ドメイン転移タスクで増幅されるデータ拡張のビューノイズ問題の悪影響を弱める。All-in-One(AIO)分類器は、現在主流の閉集合および開集合分類器の過信問題を克服する。可視化およびアブレーション実験は、提案した工夫の有効性を示す。さらに、ODAとUNDAに関する広範な実験結果から、SANは既存の最先端手法よりも優れていることが示された。

Unsupervised domain adaptation (UDA) has proven to be highly effective in transferring knowledge from a label-rich source domain to a label-scarce target domain. However, the presence of additional novel categories in the target domain has led to the development of open-set domain adaptation (ODA) and universal domain adaptation (UNDA). Existing ODA and UNDA methods treat all novel categories as a single, unified unknown class and attempt to detect it during training. However, we found that domain variance can lead to more significant view-noise in unsupervised data augmentation, which affects the effectiveness of contrastive learning (CL) and causes the model to be overconfident in novel category discovery. To address these issues, a framework named Soft-contrastive All-in-one Network (SAN) is proposed for ODA and UNDA tasks. SAN includes a novel data-augmentation-based soft contrastive learning (SCL) loss to fine-tune the backbone for feature transfer and a more human-intuitive classifier to improve new class discovery capability. The SCL loss weakens the adverse effects of the data augmentation view-noise problem which is amplified in domain transfer tasks. The All-in-One (AIO) classifier overcomes the overconfidence problem of current mainstream closed-set and open-set classifiers. Visualization and ablation experiments demonstrate the effectiveness of the proposed innovations. Furthermore, extensive experiment results on ODA and UNDA show that SAN outperforms existing state-of-the-art methods.

翻訳日:2023-07-26 00:32:37 公開日:2023-07-23

# シングルパスコントラスト学習はホモフィリックグラフとヘテロフィリックグラフの両方で機能する

Single-Pass Contrastive Learning Can Work for Both Homophilic and Heterophilic Graph ( http://arxiv.org/abs/2211.10890v3 )

ライセンス: Link先を確認

Haonan Wang, Jieyu Zhang, Qi Zhu, Wei Huang, Kenji Kawaguchi, Xiaokui Xiao

(参考訳) 既存のグラフコントラスト学習(GCL)技術は通常、1つのインスタンスに対してコントラスト損失を構築するために2回のフォワードパスを必要とし、これはノード特徴の低周波信号を捉えるのに有効である。このような二重パス設計はホモフィリックグラフにおいて経験的な成功を示しているが、直接接続されたノードが通常異なるラベルを持つヘテロフィリックグラフでの有効性はわかっていない。加えて、既存のGCLアプローチは強力な性能保証を提供しない。ヘテロフィリックグラフに対するGCLアプローチの予測不可能性と相まって、実世界の文脈での適用性は限られている。そこで、自然な疑問が生じる。性能保証を持ち、ホモフィリックグラフとヘテロフィリックグラフの両方で機能するGCL法を設計できるだろうか。この問いに答えるために、ホモフィリックグラフとヘテロフィリックグラフにおいて近傍集約により得られる特徴の集中特性を理論的に研究し、その特性に基づくシングルパスグラフコントラスト学習損失を導入し、下流タスクにおける損失の最小化解に対する性能保証を与える。この分析の直接的な帰結として、Single-Pass Graph Contrastive Learning法(SP-GCL)を実装した。経験的には、ホモフィリーの度合いが異なる14のベンチマークデータセットにおいて、SP-GCLによって学習された特徴は、はるかに少ない計算オーバーヘッドで既存の強力なベースラインと同等以上の性能を示し、我々の知見が実世界の事例で有用であることを示している。

Existing graph contrastive learning (GCL) techniques typically require two forward passes for a single instance to construct the contrastive loss, which is effective for capturing the low-frequency signals of node features. Such a dual-pass design has shown empirical success on homophilic graphs, but its effectiveness on heterophilic graphs, where directly connected nodes typically have different labels, is unknown. In addition, existing GCL approaches fail to provide strong performance guarantees. Coupled with the unpredictability of GCL approaches on heterophilic graphs, their applicability in real-world contexts is limited. Then, a natural question arises: Can we design a GCL method that works for both homophilic and heterophilic graphs with a performance guarantee? To answer this question, we theoretically study the concentration property of features obtained by neighborhood aggregation on homophilic and heterophilic graphs, introduce the single-pass graph contrastive learning loss based on the property, and provide performance guarantees for the minimizer of the loss on downstream tasks. As a direct consequence of our analysis, we implement the Single-Pass Graph Contrastive Learning method (SP-GCL). Empirically, on 14 benchmark datasets with varying degrees of homophily, the features learned by the SP-GCL can match or outperform existing strong baselines with significantly less computational overhead, which demonstrates the usefulness of our findings in real-world cases.

翻訳日:2023-07-26 00:32:08 公開日:2023-07-23

# 絶対軌道誤差って何が悪いの?

What's Wrong with the Absolute Trajectory Error? ( http://arxiv.org/abs/2212.05376v3 )

ライセンス: Link先を確認

Seong Hun Lee, Javier Civera

(参考訳) 一般的に用いられる絶対軌道誤差(ATE)の限界の一つは、外れ値に対する感度が高いことである。その結果、少数の外れ値が存在するだけで、インライア軌道誤差や外れ値の数の変化に応じた精度の違いを反映できないことが多い。本研究では、再構成されたカメラ軌跡の精度を評価するための代替の誤差指標を提案する。我々の指標はDiscernible Trajectory Error (DTE)と名付けられ、次の5つのステップで計算される。 (1) 真値の軌道と推定軌道を、両者の幾何中央値が原点に位置するように平行移動する。 (2) 対応するカメラ姿勢間の測地距離の和を最小化するように推定軌道を回転させる。 (3) カメラからその幾何中央値までの距離の中央値が真値のそれと同じになるように推定軌道をスケールする。 (4) 対応するカメラ間の距離を計算し、ウィンザライズし、正規化する。 (5) 得られた距離の平均と二乗平均平方根(RMS)の平均をとることによりDTEを得る。この指標は、インライア軌道誤差や外れ値の数の変化に応じた軌跡精度の違いを識別できるという点で、ATEの魅力的な代替手段である。また、同様の考え方を用いて、DTEと同様の利点を持つDiscernible Rotation Error (DRE)という新しい回転誤差指標を提案する。さらに、我々の指標の計算に必要なカメラ・マーカー間回転を校正するための、簡易かつ効果的な手法を提案する。我々の手法は広範なシミュレーションによって検証されている。

One of the limitations of the commonly used Absolute Trajectory Error (ATE) is that it is highly sensitive to outliers. As a result, in the presence of just a few outliers, it often fails to reflect the varying accuracy as the inlier trajectory error or the number of outliers varies. In this work, we propose an alternative error metric for evaluating the accuracy of the reconstructed camera trajectory. Our metric, named Discernible Trajectory Error (DTE), is computed in five steps: (1) Shift the ground-truth and estimated trajectories such that both of their geometric medians are located at the origin. (2) Rotate the estimated trajectory such that it minimizes the sum of geodesic distances between the corresponding camera orientations. (3) Scale the estimated trajectory such that the median distance of the cameras to their geometric median is the same as that of the ground truth. (4) Compute, winsorize and normalize the distances between the corresponding cameras. (5) Obtain the DTE by taking the average of the mean and the root-mean-square (RMS) of the resulting distances. This metric is an attractive alternative to the ATE, in that it is capable of discerning the varying trajectory accuracy as the inlier trajectory error or the number of outliers varies. Using the similar idea, we also propose a novel rotation error metric, named Discernible Rotation Error (DRE), which has similar advantages to the DTE. Furthermore, we propose a simple yet effective method for calibrating the camera-to-marker rotation, which is needed for the computation of our metrics. Our methods are verified through extensive simulations.
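Parts of the five-step recipe above can be sketched in plain Python. The code below covers only step (1) (geometric-median centering), step (3) (median-distance scaling) and the mean/RMS aggregation of step (5). It is not the authors' implementation: the rotation alignment of step (2) is skipped (the two trajectories are assumed to be rotationally aligned already), and the winsorizing and normalization of step (4) are omitted because the abstract does not give their parameters. The function names and the Weiszfeld solver are choices made here.

```python
import math
from statistics import median

def geometric_median(points, iters=200, eps=1e-12):
    """Approximate the geometric median of 3D points by Weiszfeld iteration."""
    n = len(points)
    y = [sum(p[i] for p in points) / n for i in range(3)]  # start at centroid
    for _ in range(iters):
        num, den = [0.0, 0.0, 0.0], 0.0
        for p in points:
            w = 1.0 / max(math.dist(p, y), eps)  # guard against zero distance
            den += w
            for i in range(3):
                num[i] += w * p[i]
        y = [c / den for c in num]
    return y

def center_on_geometric_median(points):
    """Step (1): shift the points so their geometric median is the origin."""
    m = geometric_median(points)
    return [[p[i] - m[i] for i in range(3)] for p in points]

def simplified_trajectory_error(gt, est):
    """Steps (1), (3), (5) only; no rotation alignment, no winsorizing."""
    gt_c = center_on_geometric_median(gt)
    est_c = center_on_geometric_median(est)
    # Step (3): match the median distance to the geometric median (the origin).
    scale = (median(math.hypot(*p) for p in gt_c)
             / median(math.hypot(*p) for p in est_c))
    est_s = [[scale * c for c in p] for p in est_c]
    # Distances between corresponding camera positions (raw, un-normalized).
    d = [math.dist(a, b) for a, b in zip(gt_c, est_s)]
    mean = sum(d) / len(d)
    rms = math.sqrt(sum(x * x for x in d) / len(d))
    return 0.5 * (mean + rms)  # step (5): average of the mean and the RMS
```

Because both the centering and the scaling use medians rather than means, a handful of wildly wrong camera positions should shift the alignment much less than a least-squares alignment would.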

翻訳日:2023-07-26 00:21:16 公開日:2023-07-23

# CellMix: 病理画像分類のためのデータ拡張に向けた汎用インスタンス関係ベース手法

CellMix: A General Instance Relationship based Method for Data Augmentation Towards Pathology Image Classification ( http://arxiv.org/abs/2301.11513v2 )

ライセンス: Link先を確認

Tianyi Zhang, Zhiling Yan, Chunhui Li, Nan Ying, Yanli Lei, Yunlu Feng, Yu Zhao, Guanglei Zhang

(参考訳) 病理画像解析では、高品質な注釈付きサンプルの取得と維持は非常に労働集約的な作業である。この課題を克服するために、従来の前処理型データ拡張技術に代わる効果的な方法として、混合(mixing)ベースの手法が登場した。しかしながら、これらの手法は、局所的な特異性、大域的な分布、サンプル内/サンプル間のインスタンス関係など、病理画像に固有の特徴を十分に考慮できていない。これらの特徴をよりよく把握し、価値のある擬似サンプルを作成するために、新しい分布指向のインプレースシャッフル手法を用いるCellMixフレームワークを提案する。病理インスタンスの粒度に基づいて画像をパッチに分割し、同じバッチ内でシャッフルすることで、新しいサンプルを生成する際にインスタンス間の絶対的な関係を効果的に保存できる。さらに、カリキュラム学習に着想を得た損失駆動型の戦略を開発し、訓練中の摂動や分布に関連するノイズを処理して、モデルが拡張データに適応的に適合できるようにする。病理画像分類タスクにおける実験では、7つの異なるデータセットで最先端(SOTA)の性能を示した。このインスタンス関係を中心とした革新的な手法は、病理画像分類のための一般的なデータ拡張アプローチに示唆を与える可能性がan。関連コードはhttps://github.com/sagizty/CellMixで入手できる。

In pathology image analysis, obtaining and maintaining high-quality annotated samples is an extremely labor-intensive task. To overcome this challenge, mixing-based methods have emerged as effective alternatives to traditional preprocessing data augmentation techniques. Nonetheless, these methods fail to fully consider the unique features of pathology images, such as local specificity, global distribution, and inner/outer-sample instance relationships. To better comprehend these characteristics and create valuable pseudo samples, we propose the CellMix framework, which employs a novel distribution-oriented in-place shuffle approach. By dividing images into patches based on the granularity of pathology instances and shuffling them within the same batch, the absolute relationships between instances can be effectively preserved when generating new samples. Moreover, we develop a curriculum learning-inspired, loss-driven strategy to handle perturbations and distribution-related noise during training, enabling the model to adaptively fit the augmented data. Our experiments in pathology image classification tasks demonstrate state-of-the-art (SOTA) performance on 7 distinct datasets. This innovative instance relationship-centered method has the potential to inform general data augmentation approaches for pathology image classification. The associated codes are available at https://github.com/sagizty/CellMix.
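One plausible reading of the "in-place shuffle" described above is sketched below: every image in a batch is cut into a grid of patches, and the patches at each grid position are permuted across the samples of the batch, so a patch never leaves its original location. This interpretation, the function name `in_place_patch_shuffle`, and the nested-list image format are assumptions made here; label mixing and the loss-driven curriculum strategy are left out.

```python
import random

def in_place_patch_shuffle(batch, patch, seed=0):
    """Shuffle same-position patches across the images of one batch.

    batch: list of images, each a list of rows (H x W nested lists).
    patch: side length of the square patches.
    Returns new images; the input batch is left unmodified.
    """
    rng = random.Random(seed)
    n = len(batch)
    h, w = len(batch[0]), len(batch[0][0])
    out = [[row[:] for row in img] for img in batch]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Draw an independent permutation of samples per patch position.
            perm = list(range(n))
            rng.shuffle(perm)
            for dst, src in enumerate(perm):
                for r in range(top, min(top + patch, h)):
                    out[dst][r][left:left + patch] = batch[src][r][left:left + patch]
    return out
```

Because each patch keeps its grid position, the absolute location of every instance is preserved while the mixture of source images varies from patch to patch.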

翻訳日:2023-07-26 00:14:13 公開日:2023-07-23

# DetectGPT: 確率曲率を用いたゼロショット機械生成テキスト検出

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature ( http://arxiv.org/abs/2301.11305v2 )

ライセンス: Link先を確認

Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn

(参考訳) 大規模言語モデル(LLM)の流暢さの向上と普及により、LLM生成テキストの検出を支援するツールへの期待が高まっている。本稿では、そのような検出に有用な、LLMの確率関数の構造に関する性質を明らかにする。具体的には、LLMからサンプリングされたテキストが、モデルの対数確率関数の負の曲率領域を占める傾向があることを示す。この観察を生かして、ある文章が所与のLLMから生成されたかどうかを判定するための、新しい曲率ベースの基準を定義する。このアプローチはDetectGPTと呼ばれ、個別の分類器を訓練したり、実文や生成文のデータセットを収集したり、生成されたテキストに明示的に透かしを入れたりする必要がない。対象モデルが計算する対数確率と、別の汎用的な事前訓練済み言語モデル(例えばT5)による文章のランダムな摂動のみを使用する。DetectGPTは、モデルサンプル検出のための既存のゼロショット手法よりも識別力が高く、特に20BパラメータのGPT-NeoXが生成した偽ニュース記事の検出を、最強のゼロショットベースラインの0.81 AUROCからDetectGPTの0.95 AUROCに改善した。コード、データ、その他のプロジェクト情報についてはhttps://ericmitchell.ai/detectgptを参照してください。

The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See https://ericmitchell.ai/detectgpt for code, data, and other project information.
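The curvature criterion above can be approximated by comparing a passage's log probability with the average log probability of randomly perturbed copies of it. The sketch below shows only that comparison. The callables `log_prob` and `perturb` are hypothetical stand-ins for the scoring model and for a mask-filling model such as T5; the abstract does not specify any normalization or decision threshold, so none is included.

```python
import random

def perturbation_discrepancy(text, log_prob, perturb, n_perturbations=20, seed=0):
    """Log probability of `text` minus the mean log probability of its perturbations.

    log_prob: callable text -> float (hypothetical scoring-model wrapper).
    perturb:  callable (text, rng) -> text (hypothetical perturbation function).
    A large positive value suggests the text sits near a local maximum of the
    model's log probability, i.e. in a region of negative curvature.
    """
    rng = random.Random(seed)
    original = log_prob(text)
    perturbed = [log_prob(perturb(text, rng)) for _ in range(n_perturbations)]
    return original - sum(perturbed) / len(perturbed)
```

In practice the score would be compared against a threshold chosen on held-out data; a passage scoring above it would be flagged as likely generated by the scoring model.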

翻訳日:2023-07-26 00:13:49 公開日:2023-07-23

# 石油・ガス運転におけるその場水質モニタリング

In-situ Water quality monitoring in Oil and Gas operations ( http://arxiv.org/abs/2301.08800v2 )

ライセンス: Link先を確認

Satish Kumar, Rui Kou, Henry Hill, Jake Lempges, Eric Qian, and Vikram Jayaram

(参考訳) 農業から鉱業、エネルギーに至るまで、地表水の水質モニタリングは不可欠な課題である。石油・ガス事業者が淡水の消費削減に取り組む中、淡水および非淡水の水資源を長期にわたって能動的に管理することがますます重要になっている。大規模なモニタリングでは、広い範囲に分散した池、小さな湖、プラヤ、湿地の数が非常に多いため、多数の地点での手動サンプリングは時間がかかりすぎ、持続不可能になっている。したがって、衛星による環境モニタリングは大きな可能性を秘めている。既存の衛星ベースのモニタリング研究の多くは、川や海などの大きな水域を監視するためにインデックスベースの手法を使用している。しかし、小さな池を監視する場合、小さな水域から受信される反射信号は弱すぎて検出できないため、これらの既存手法は機能しない。この課題に対処するために、反射パターンの弱い水域における汚染レベルを判定できるように設計された、新しいWater Quality Enhanced Index (WQEI)モデルを提案する。我々の結果は次のことを示している。 (1) WQEIは、実験室で測定した1200の水試料で検証された、水の濁度の良好な指標である。 (2) 一般に利用可能な衛星データ(LandSat8など)に本手法を適用することにより、広域で高精度な水質モニタリングを効率的に実現できる。これは、地表の貯水池に蓄えられた水の品質を最適化し、非淡水の即応性と可用性を高めるためのツールを事業者に提供する。

From agriculture to mining, to energy, surface water quality monitoring is an essential task. As oil and gas operators work to reduce the consumption of freshwater, it is increasingly important to actively manage fresh and non-fresh water resources over the long term. For large-scale monitoring, manual sampling at many sites has become too time-consuming and unsustainable, given the sheer number of dispersed ponds, small lakes, playas, and wetlands over a large area. Therefore, satellite-based environmental monitoring presents great potential. Many existing satellite-based monitoring studies utilize index-based methods to monitor large water bodies such as rivers and oceans. However, these existing methods fail when monitoring small ponds: the reflectance signal received from small water bodies is too weak to detect. To address this challenge, we propose a new Water Quality Enhanced Index (WQEI) Model, which is designed to enable users to determine contamination levels in water bodies with weak reflectance patterns. Our results show that 1) WQEI is a good indicator of water turbidity validated with 1200 water samples measured in the laboratory, and 2) by applying our method to commonly available satellite data (e.g. LandSat8), one can achieve high-accuracy water quality monitoring efficiently in large regions. This provides a tool for operators to optimize the quality of water stored within surface storage ponds and to increase the readiness and availability of non-fresh water.

翻訳日:2023-07-26 00:12:51 公開日:2023-07-23

# CLIPTER: シーンテキスト認識でより大きな全体像を見る

CLIPTER: Looking at the Bigger Picture in Scene Text Recognition ( http://arxiv.org/abs/2301.07464v2 )

ライセンス: Link先を確認

Aviad Aberdam, David Bensa\"id, Alona Golts, Roy Ganz, Oren Nuriel, Royee Tichauer, Shai Mazor, Ron Litman

(参考訳) 現実世界のシナリオでテキストを読むには、特に低品質なテキストを扱う場合、周囲の文脈を理解する必要があることが多い。しかし、現在のシーンテキスト認識器は、切り抜かれたテキスト画像上で動作するため、より大きな全体像を認識していない。本研究では,CLIPのような現代の視覚言語モデルの表現能力を利用して,クロップベースの認識器にシーンレベルの情報を提供する。視覚言語モデルから得られた画像全体のリッチな表現と,認識器の単語レベルの特徴を,ゲート付きクロスアテンション機構によって融合することにより,これを実現する。このコンポーネントは徐々に文脈強化表現へと移行し、事前訓練された認識器の安定した微調整を可能にする。本稿では,モデル非依存のフレームワークであるCLIPTER (CLIP TExt Recognition) の有効性を主要なテキスト認識アーキテクチャ上で示し,複数のベンチマークで最先端の結果を得る。さらに,語彙外単語に対するロバスト性の向上と,低データ領域における一般化の強化も分析により示された。

Reading text in real-world scenarios often requires understanding the context surrounding it, especially when dealing with poor-quality text. However, current scene text recognizers are unaware of the bigger picture as they operate on cropped text images. In this study, we harness the representative capabilities of modern vision-language models, such as CLIP, to provide scene-level information to the crop-based recognizer. We achieve this by fusing a rich representation of the entire image, obtained from the vision-language model, with the recognizer word-level features via a gated cross-attention mechanism. This component gradually shifts to the context-enhanced representation, allowing for stable fine-tuning of a pretrained recognizer. We demonstrate the effectiveness of our model-agnostic framework, CLIPTER (CLIP TExt Recognition), on leading text recognition architectures and achieve state-of-the-art results across multiple benchmarks. Furthermore, our analysis highlights improved robustness to out-of-vocabulary words and enhanced generalization in low-data regimes.
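
The gated fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single-head attention, the `tanh` gate starting at zero, and all names (`gated_cross_attention`, `w_q`, `w_k`, `w_v`, `gate`) are assumptions chosen for illustration. The point it shows is that a closed gate leaves the recognizer's word-level features untouched, so fine-tuning can start from the pretrained behaviour and move gradually toward the context-enhanced representation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(word_feats, image_feats, w_q, w_k, w_v, gate):
    """Fuse word-level features (T, d) with scene-level features (N, d).

    Hypothetical single-head layer: the word features act as queries over
    the image features, and a scalar gate scales the attended context
    before it is added back to the word features.
    """
    q = word_feats @ w_q
    k = image_feats @ w_k
    v = image_feats @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (T, N) attention weights
    context = attn @ v                              # (T, d) scene context per word feature
    return word_feats + np.tanh(gate) * context     # gate = 0 gives the identity

rng = np.random.default_rng(0)
d = 8
word_feats = rng.normal(size=(5, d))    # stand-in for recognizer features
image_feats = rng.normal(size=(10, d))  # stand-in for vision-language features
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

out_closed = gated_cross_attention(word_feats, image_feats, w_q, w_k, w_v, gate=0.0)
out_open = gated_cross_attention(word_feats, image_feats, w_q, w_k, w_v, gate=1.0)
```

With `gate=0.0` the output equals `word_feats`; as the gate opens, the scene context contributes more.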

翻訳日:2023-07-26 00:12:29 公開日:2023-07-23

# キャプションで裏切られた:open vocabularyインスタンスセグメンテーションのための共同キャプショングラウンドと生成

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation ( http://arxiv.org/abs/2301.00805v2 )

ライセンス: Link先を確認

Jianzong Wu, Xiangtai Li, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy

(参考訳) 本研究では,オープン語彙のインスタンスセグメンテーションに着目し,セグメンテーションモデルを拡張して,インスタンスレベルの新規カテゴリを分類・分割する。従来のアプローチでは、大量のキャプションデータセットと複雑なパイプラインを使用して、画像領域とキャプション内の単語との間に1対1のマッピングを確立してきた。しかし、このような手法は、形容詞や動詞などの視覚的に対応しない単語を画像領域にマッチングすることで、ノイズの多い教師信号を構築してしまう。一方、文脈語は、新規カテゴリと高い相関を示すため、新規オブジェクトの存在を推測する上でも重要である。このような制約を克服するため、学習効率を向上させるためにオブジェクト名詞のマッチングのみに焦点をあてる新しいグラウンディング損失を取り入れた、Caption Grounding and Generation (CGG) 共同フレームワークを考案した。また,グラウンディング損失の補完として,追加の教師信号と文脈モデリングを可能にするキャプション生成ヘッドを導入する。解析と結果から,グラウンディングと生成のコンポーネントが相互に補完し,新規クラスのセグメンテーション性能を大幅に向上させることを示す。 OVIS(Open Vocabulary Instance Segmentation)とOSPS(Open Set Panoptic Segmentation)の2つの設定によるCOCOデータセットの実験は、CGGの優位性を示している。特に、CGGはOVISタスクにおいて追加データなしで新規クラスの6.8% mAPの大幅な改善を達成し、OSPSベンチマークでは新規クラスで15%のPQ改善を実現している。

In this work, we focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories. Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and words in captions. However, such methods build noisy supervision by matching non-visible words to image regions, such as adjectives and verbs. Meanwhile, context words are also important for inferring the existence of novel objects as they show high inter-correlations with novel categories. To overcome these limitations, we devise a joint Caption Grounding and Generation (CGG) framework, which incorporates a novel grounding loss that only focuses on matching object nouns to improve learning efficiency. We also introduce a caption generation head that enables additional supervision and contextual modeling as a complement to the grounding loss. Our analysis and results demonstrate that grounding and generation components complement each other, significantly enhancing the segmentation performance for novel classes. Experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS) demonstrate the superiority of the CGG. Specifically, CGG achieves a substantial improvement of 6.8% mAP for novel classes without extra data on the OVIS task and 15% PQ improvements for novel classes on the OSPS benchmark.

翻訳日:2023-07-26 00:10:50 公開日:2023-07-23

# グラフ生成からグラフ分類へ

From Graph Generation to Graph Classification ( http://arxiv.org/abs/2302.07989v3 )

ライセンス: Link先を確認

Oliver Schulte

(参考訳) 本稿では,グラフ生成モデル(GGM)を利用したグラフ分類の新しいアプローチについて述べる。グラフとそのクラスラベル上の同時確率分布を定義するGGMを仮定し、グラフが与えられたときのクラスラベルの確率に対する分類公式を導出する。新しい条件付きELBOは、識別のための生成グラフオートエンコーダモデルを訓練するために使用できる。生成モデルを分類に活用することは、非関係的なi.i.d.データに対してはよく研究されているが、我々の知る限り、グラフ分類に対しては新しいアプローチである。

This note describes a new approach to classifying graphs that leverages graph generative models (GGM). Assuming a GGM that defines a joint probability distribution over graphs and their class labels, I derive classification formulas for the probability of a class label given a graph. A new conditional ELBO can be used to train a generative graph auto-encoder model for discrimination. While leveraging generative models for classification has been well explored for non-relational i.i.d. data, to our knowledge it is a novel approach to graph classification.
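
As a reading aid, the generic step from a joint distribution to a class posterior is Bayes' rule. The notation below ($G$ for a graph, $y$ for a class label, $P_\theta$ for the joint defined by the GGM) is assumed for illustration; the note's own classification formulas and its conditional ELBO are not reproduced here.

```latex
P_\theta(y \mid G) \;=\; \frac{P_\theta(G, y)}{\sum_{y'} P_\theta(G, y')},
\qquad
\hat{y}(G) \;=\; \arg\max_{y} P_\theta(G, y).
```

The denominator does not depend on $y$, so the predicted label only requires comparing the joint probabilities $P_\theta(G, y)$ across classes.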

翻訳日:2023-07-26 00:02:31 公開日:2023-07-23

# 測定交替イジング量子臨界

Measurement-altered Ising quantum criticality ( http://arxiv.org/abs/2302.04325v3 )

ライセンス: Link先を確認

Sara Murciano, Pablo Sala, Yue Liu, Roger S. K. Mong and Jason Alicea

(参考訳) 量子臨界系は、摂動に自然に敏感なため、新しい測定によって引き起こされる現象を探索するための魅力的なプラットフォームを構成する。数値測定が量子臨界鎖のパラダイム的イジングに与える影響を明示的なプロトコルを用いて検討し,相関したアンシラが臨界鎖と絡み合って投影的に測定されることを示した。広範囲な数値シミュレーションによって支持される摂動解析フレームワークを用いて, 測定値がエンタングゲートの選択, 基数, 測定結果, 基数に依存する方法で, 長距離相関を定性的に変化させることができることを実証した。測定結果における相関の挙動を定量的に予測し,測定平均値における測定交替イジング臨界性を検出するための2つの手法を同定した。まず、測定結果に対するオーダーパラメータ期待値の2乗平均化は、平均的なオーダーパラメータ自体が消滅しても、一定の測定結果で発芽したオーダーパラメータ凝縮の記憶を保持する。第二に、ある場合において、異なる対称性分野に属する測定結果よりも観測可能度を個別に評価できることを示し、これらの「対称性解決平均」は、標準線形平均化観測可能度を考慮しても測定効果を明らかにする。対称解法の平均値とポスト選択法を実験的に合理的に追求できる相補的レジームを同定し,前者は十分弱いアンシラ臨界鎖の絡み合いの限界において後者よりも優れていた。我々のフレームワークは自然に、よりエキゾチックな量子臨界点に適応し、NISQハードウェアやRydberg配列での実験的な実現の可能性を強調する。

Quantum critical systems constitute appealing platforms for the exploration of novel measurement-induced phenomena due to their innate sensitivity to perturbations. We study the impact of measurement on paradigmatic Ising quantum critical chains using an explicit protocol, whereby correlated ancilla are entangled with the critical chain and then projectively measured. Using a perturbative analytic framework supported by extensive numerical simulations, we demonstrate that measurements can qualitatively alter long-distance correlations in a manner dependent on the choice of entangling gate, ancilla measurement basis, measurement outcome, and nature of ancilla correlations. We derive numerous quantitative predictions for the behavior of correlations in select measurement outcomes, and also identify two strategies for detecting measurement-altered Ising criticality in measurement-averaged quantities. First, averaging the square of the order-parameter expectation value over measurement outcomes retains memory of order parameter condensation germinated in fixed measurement outcomes, even though on average the order parameter itself vanishes. Second, we show that, in certain cases, observables can be averaged separately over measurement outcomes residing in distinct symmetry sectors, and that these 'symmetry-resolved averages' reveal measurement effects even when considering standard linearly averaged observables. We identify complementary regimes in which symmetry-resolved averages and post-selection can be pursued reasonably efficiently in experiment, with the former generically outperforming the latter in the limit of sufficiently weak ancilla-critical chain entanglement. Our framework naturally adapts to more exotic quantum critical points and highlights opportunities for potential experimental realization in NISQ hardware and in Rydberg arrays.

翻訳日:2023-07-26 00:01:50 公開日:2023-07-23

# 効率の良い勾配値推定に向けて

Toward Efficient Gradient-Based Value Estimation ( http://arxiv.org/abs/2301.13757v3 )

ライセンス: Link先を確認

Arsalan Sharifnassab, Richard Sutton

(参考訳) 強化学習における勾配に基づく値推定法は安定性の面で好ましい性質を持つが,時間差(TD)学習法よりもかなり遅いのが一般的である。この遅さの根本原因を考察し,平均二乗ベルマン誤差 (MSBE) が,そのヘッセ行列の条件数が大きいという意味で悪条件の損失関数であることを示した。勾配に基づく手法におけるMSBEの悪条件性の悪影響を解決するため,ガウス・ニュートン方向にほぼ従い,パラメータ化に対して漸近的にロバストな低複雑性バッチフリー近接法を提案する。 RANSと呼ばれる本アルゴリズムは, 計算複雑性がほぼ同じでありながら, 残差勾配法よりもかなり高速であるという意味で効率的であり, テストした古典的問題においてTDと競合する。

Gradient-based methods for value estimation in reinforcement learning have favorable stability properties, but they are typically much slower than Temporal Difference (TD) learning methods. We study the root causes of this slowness and show that Mean Square Bellman Error (MSBE) is an ill-conditioned loss function in the sense that its Hessian has a large condition number. To resolve the adverse effect of poor conditioning of MSBE on gradient-based methods, we propose a low-complexity batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization. Our main algorithm, called RANS, is efficient in the sense that it is significantly faster than the residual gradient methods while having almost the same computational complexity, and is competitive with TD on the classic problems that we tested.
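
The ill-conditioning claim can be checked on a toy case. The sketch below is not the paper's RANS algorithm. It assumes a tabular value function on a deterministic 4-state cycle (a hypothetical example chosen because its spectrum is known in closed form) and computes the condition number of the MSBE Hessian.

```python
import numpy as np

def msbe_hessian(P, gamma):
    """Hessian of MSBE(v) = (1/n) * ||(I - gamma * P) v - r||^2 for tabular v.

    The loss is quadratic in v, so the Hessian is constant and does not
    depend on the reward vector r.
    """
    n = P.shape[0]
    A = np.eye(n) - gamma * P
    return (2.0 / n) * A.T @ A

n = 4
P = np.roll(np.eye(n), 1, axis=1)  # deterministic cycle: state i moves to (i + 1) mod n
gamma = 0.99

H = msbe_hessian(P, gamma)
cond = np.linalg.cond(H)           # ratio of largest to smallest singular value
print(f"condition number of the MSBE Hessian: {cond:.1f}")
```

For this even-length cycle the singular values of `I - gamma * P` range from `1 - gamma` to `1 + gamma`, so the Hessian's condition number is `((1 + gamma) / (1 - gamma)) ** 2`, roughly 39601 at `gamma = 0.99`. Plain gradient descent on such a loss needs a step size set by the largest curvature and therefore makes very slow progress along the flattest direction.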

翻訳日:2023-07-26 00:00:02 公開日:2023-07-23

# MenuCraft: 大規模言語モデルによるインタラクティブなメニューシステム設計

MenuCraft: Interactive Menu System Design with Large Language Models ( http://arxiv.org/abs/2303.04496v2 )

ライセンス: Link先を確認

Amir Hossein Kargaran, Nafiseh Nikeghbal, Abbas Heydarnoori and Hinrich Schütze

(参考訳) メニューシステム設計は多くの設計オプションと様々なヒューマンファクターを含む難しい課題である。例えば、デザイナーが考慮する必要がある重要な要素はメニューコマンドの意味的かつ体系的な関係である。しかし、利用可能なリソースが限られているため、これらの関係を捉えることは困難である。ニューラル言語モデルの進歩により、大規模言語モデルは既存の膨大な知識をメニューシステムの設計と改良に活用できる。本稿では,デザイナーと対話システムとの協調によるメニュー設計を可能にする,メニューデザインのためのAI支援デザイナーMenuCraftを提案する。 MenuCraftはインタラクティブな言語ベースのメニューデザインツールで、メニューデザインプロセスをシンプルにし、デザインオプションを簡単にカスタマイズできる。 MenuCraftは対話を通じてさまざまなインタラクションをサポートし、ゼロ/少数ショット学習を実行できる。

Menu system design is a challenging task involving many design options and various human factors. For example, one crucial factor that designers need to consider is the semantic and systematic relation of menu commands. However, capturing these relations can be challenging due to limited available resources. With the advancement of neural language models, large language models can utilize their vast pre-existing knowledge in designing and refining menu systems. In this paper, we propose MenuCraft, an AI-assisted designer for menu design that enables collaboration between the designer and a dialogue system to design menus. MenuCraft offers an interactive language-based menu design tool that simplifies the menu design process and enables easy customization of design options. MenuCraft supports a variety of interactions through dialogue, which allows it to perform zero/few-shot learning.

翻訳日:2023-07-25 23:53:33 公開日:2023-07-23

# 3次元点群におけるオープン語彙アフォーダンス検出

Open-Vocabulary Affordance Detection in 3D Point Clouds ( http://arxiv.org/abs/2303.02401v5 )

ライセンス: Link先を確認

Toan Nguyen, Minh Nhat Vu, An Vuong, Dzung Nguyen, Thieu Vo, Ngan Le, Anh Nguyen

(参考訳) アフォーダンス検出は、幅広いロボット応用を持つ難しい問題である。従来のアフォーダンス検出手法は、予め定義されたアフォーダンスラベルの集合に制限されており、複雑で動的な環境における知能ロボットの適応性を制限する可能性がある。そこで,本稿では,3次元点群内の無制限の数のアフォーダンスを検出できるOpen-Vocabulary Affordance Detection (OpenAD) 法を提案する。 OpenADは、アフォーダンスのテキストと点特徴を同時に学習することで、アフォーダンス間の意味的関係をうまく活用する。したがって,提案手法はゼロショット検出が可能であり,アノテーション例を一つも使わずに,未知のアフォーダンスを検出できる。集中的な実験結果から,OpenADは幅広いアフォーダンス検出設定で効果的に機能し,他のベースラインを大きなマージンで上回ることが示された。さらに,高速な推論速度(約100ms)を持つ実世界のロボットアプリケーションにおいて,提案するOpenADの実用性を示す。私たちのプロジェクトはhttps://openad2023.github.ioで利用可能です。

Affordance detection is a challenging problem with a wide variety of robotic applications. Traditional affordance detection methods are limited to a predefined set of affordance labels, hence potentially restricting the adaptability of intelligent robots in complex and dynamic environments. In this paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method, which is capable of detecting an unbounded number of affordances in 3D point clouds. By simultaneously learning the affordance text and the point feature, OpenAD successfully exploits the semantic relationships between affordances. Therefore, our proposed method enables zero-shot detection and can detect previously unseen affordances without a single annotation example. Intensive experimental results show that OpenAD works effectively on a wide range of affordance detection setups and outperforms other baselines by a large margin. Additionally, we demonstrate the practicality of the proposed OpenAD in real-world robotic applications with a fast inference speed (~100ms). Our project is available at https://openad2023.github.io.

翻訳日:2023-07-25 23:53:03 公開日:2023-07-23

# 散逸キャットキュービット用高忠実性ゼノゲートの設計

Designing High-Fidelity Zeno Gates for Dissipative Cat Qubits ( http://arxiv.org/abs/2303.00760v3 )

ライセンス: Link先を確認

Ronan Gautier, Mazyar Mirrahimi, Alain Sarlette

(参考訳) 誘導二光子散逸で安定化されたボソニック・キャット量子ビットは指数的にバイアスのあるノイズを持つシステムであり、低オーバーヘッド、フォールトトレラント、普遍量子コンピューティングへの扉を開く。しかし、そのような量子ビットに対する現在のゲート提案は、関連する実験パラメータによるスケーリングが不十分な非保護型のノイズをかなり引き起こす。そこで本研究では,2光子偏光の設計に用いるリザーバモードを再考し,ゲート誘起誤差の軽減にどのように活用できるかを示すことにより,放散猫量子ビットに対する新たな視点を提案する。そこで我々は,高忠実度および偏りを保った猫キュービットゲートの4つの新しい設計を導入し,これらを一般的なゲート方式と比較した。これら4つの設計は、異なる相補的なアイデアを持つ散逸系のためのゲートエンジニアリングの概要を提供する。特に,すでに達成可能な低エラーゲート設計と長期実装を提案する。

Bosonic cat qubits stabilized with a driven two-photon dissipation are systems with exponentially biased noise, opening the door to low-overhead, fault-tolerant and universal quantum computing. However, current gate proposals for such qubits induce substantial noise of the unprotected type, whose poor scaling with the relevant experimental parameters limits their practical use. In this work, we provide a new perspective on dissipative cat qubits by reconsidering the reservoir mode used to engineer the tailored two-photon dissipation, and show how it can be leveraged to mitigate gate-induced errors. Doing so, we introduce four new designs of high-fidelity and bias-preserving cat qubit gates, and compare them to the prevalent gate methods. These four designs should give a broad overview of gate engineering for dissipative systems with different and complementary ideas. In particular, we propose both already achievable low-error gate designs and longer-term implementations.

翻訳日:2023-07-25 23:52:45 公開日:2023-07-23

# 知識コンパイルによるニューラルネットワーク分類器のシャップ説明スコアの効率的な計算

Efficient Computation of Shap Explanation Scores for Neural Network Classifiers via Knowledge Compilation ( http://arxiv.org/abs/2303.06516v3 )

ライセンス: Link先を確認

Leopoldo Bertossi and Jorge E. Leon

(参考訳) Shapスコアの使用は、Explainable AIで広く使われている。しかし、特にニューラルネットワークのようなブラックボックスの分類器で処理された場合、計算は一般には難解である。最近の研究では、Shapを効率的に計算できるオープンボックスブール回路分類器のクラスが明らかにされている。効率的なシェープ計算のために,二進ニューラルネットワークをそれらの回路に変換する方法を示し,論理に基づく知識コンパイル手法を用いる。私たちの実験で示しているように、パフォーマンスの向上は巨大です。

The use of Shap scores has become widespread in Explainable AI. However, their computation is in general intractable, in particular when done with a black-box classifier, such as a neural network. Recent research has unveiled classes of open-box Boolean Circuit classifiers for which Shap can be computed efficiently. We show how to transform binary neural networks into those circuits for efficient Shap computation. We use logic-based knowledge compilation techniques. The performance gain is huge, as we show in the light of our experiments.
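
For context, the standard Shapley-value form of the Shap score makes the source of the intractability visible. The notation is assumed here rather than taken from the paper: $F$ is the feature set, $e$ the entity being explained, $L$ the classifier, and $e_S$ the restriction of $e$ to the features in $S$.

```latex
\mathrm{Shap}(F, e, f) \;=\; \sum_{S \subseteq F \setminus \{f\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
  \Bigl( \phi_e\bigl(S \cup \{f\}\bigr) - \phi_e(S) \Bigr),
\qquad
\phi_e(S) \;=\; \mathbb{E}\bigl[\, L(e') \,\bigm|\, e'_S = e_S \,\bigr].
```

The sum ranges over exponentially many subsets $S$, and each $\phi_e(S)$ is itself an expectation over entities that agree with $e$ on $S$, which is why the structure of the classifier matters for efficient computation.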

翻訳日:2023-07-25 23:40:41 公開日:2023-07-23

# PDPP:教育ビデオにおけるプロシージャ計画のための拡散計画

PDPP: Projected Diffusion for Procedure Planning in Instructional Videos ( http://arxiv.org/abs/2303.14676v2 )

ライセンス: Link先を確認

Hanlin Wang, Yilu Wu, Sheng Guo, Limin Wang

(参考訳) 本稿では,非構造化映像における現状の視覚的観察から目標指向の計画を作成することを目的とした,指導ビデオにおける手順計画の問題について検討する。以前の研究は、この問題をシーケンス計画問題として位置づけ、重い中間視覚観察または自然言語指示を監督として活用し、複雑な学習スキームと高価なアノテーションコストを生み出した。対照的に,この問題は分布適合問題として扱われる。この意味では, 拡散モデル(pdpp)を用いて, 中間動作列分布全体をモデル化し, この分布から計画問題をサンプリングプロセスに変換する。さらに,コストのかかる中間監督を除去し,代わりに指導ビデオからのタスクラベルを監督として使用する。我々のモデルはU-Netに基づく拡散モデルであり、学習した分布からのアクションシーケンスを与えられた開始と終了の観測で直接サンプリングする。さらに,学習およびサンプリング過程において,モデルに対して正確な条件付きガイドを提供するための効率的なプロジェクション手法を適用した。異なるスケールの3つのデータセットで実験したところ、PDPPモデルはタスクの監督なしに複数のメトリクスで最先端のパフォーマンスを達成できることがわかった。コードとトレーニングされたモデルはhttps://github.com/MCG-NJU/PDPPで入手できる。

In this paper, we study the problem of procedure planning in instructional videos, which aims to make goal-directed plans given the current visual observations in unstructured real-life videos. Previous works cast this problem as a sequence planning problem and leverage either heavy intermediate visual observations or natural language instructions as supervision, resulting in complex learning schemes and expensive annotation costs. In contrast, we treat this problem as a distribution fitting problem. In this sense, we model the whole intermediate action sequence distribution with a diffusion model (PDPP), and thus transform the planning problem to a sampling process from this distribution. In addition, we remove the expensive intermediate supervision, and simply use task labels from instructional videos as supervision instead. Our model is a U-Net based diffusion model, which directly samples action sequences from the learned distribution with the given start and end observations. Furthermore, we apply an efficient projection method to provide accurate conditional guides for our model during the learning and sampling process. Experiments on three datasets with different scales show that our PDPP model can achieve the state-of-the-art performance on multiple metrics, even without the task supervision. Code and trained models are available at https://github.com/MCG-NJU/PDPP.

翻訳日:2023-07-25 23:34:04 公開日:2023-07-23

# ハイブリッドCNN-RNNの重ね合わせを用いた構造振動信号復調

Structural Vibration Signal Denoising Using Stacking Ensemble of Hybrid CNN-RNN ( http://arxiv.org/abs/2303.11413v4 )

ライセンス: Link先を確認

Youzhi Liang, Wen Liang, Jianguo Jia

(参考訳) 振動信号は, 構造的健康モニタリング, 故障診断, 損傷検出など, 様々な工学的目的に利用され, 構造物の状態や整合性に関する貴重な情報を提供するようになっている。近年,生物工学の分野では振動信号の利用が増加している。活動誘発構造振動、特にフットステップによる信号は、人体や動物などの生体系の運動を分析するのに役立ち、個人の歩行、体重、姿勢に関する貴重な情報を提供し、健康モニタリング、セキュリティ、人間とコンピュータの相互作用のための魅力的なツールとなる。しかし、様々なノイズの存在は、フットステップによる信号解析の精度を損なう可能性がある。本稿では、複数の信号のアンサンブルと、再帰的および畳み込み型ニューラルネットワークの予測の両方を利用する新しいアンサンブルモデルを提案する。提案モデルは,前処理,ハイブリッドモデリング,アンサンブルの3段階からなる。プリプロセッシング段階では、高速フーリエ変換とウェーブレット変換を用いて特徴を抽出し、系の物理に支配されたダイナミクスを捉え、空間的および時間的特徴を抽出する。ハイブリッドモデリング段階では、fft結果と連結されたノイズ信号に双方向lstmを用い、cnnを用いて信号の凝縮特徴表現を得る。アンサンブル段階では、完全に接続されたニューラルネットワークの3つの層を用いて最終識別信号を生成する。提案モデルでは,PSNR,SNR,WMAPEを用いて,広帯域の雑音レベルのアルゴリズムよりも優れる構造振動信号に関する課題に対処する。

Vibration signals have been increasingly utilized in various engineering fields for analysis and monitoring purposes, including structural health monitoring, fault diagnosis and damage detection, where vibration signals can provide valuable information about the condition and integrity of structures. In recent years, there has been a growing trend towards the use of vibration signals in the field of bioengineering. Activity-induced structural vibrations, particularly footstep-induced signals, are useful for analyzing the movement of biological systems such as the human body and animals, providing valuable information regarding an individual's gait, body mass, and posture, making them an attractive tool for health monitoring, security, and human-computer interaction. However, the presence of various types of noise can compromise the accuracy of footstep-induced signal analysis. In this paper, we propose a novel ensemble model that leverages both the ensemble of multiple signals and of recurrent and convolutional neural network predictions. The proposed model consists of three stages: preprocessing, hybrid modeling, and ensemble. In the preprocessing stage, features are extracted using the Fast Fourier Transform and wavelet transform to capture the underlying physics-governed dynamics of the system and extract spatial and temporal features. In the hybrid modeling stage, a bi-directional LSTM is used to denoise the noisy signal concatenated with FFT results, and a CNN is used to obtain a condensed feature representation of the signal. In the ensemble stage, three layers of a fully-connected neural network are used to produce the final denoised signal. The proposed model addresses the challenges associated with structural vibration signals and outperforms the prevailing algorithms for a wide range of noise levels, evaluated using PSNR, SNR, and WMAPE.
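
The three evaluation metrics named at the end of the abstract can be sketched as follows. These are common textbook definitions and are an assumption here, since the abstract does not state the exact formulas (for example, which peak value PSNR uses); the function names and the synthetic signal are hypothetical.

```python
import numpy as np

def snr_db(clean, estimate):
    # Signal-to-noise ratio in dB: signal energy over residual energy.
    residual = clean - estimate
    return 10.0 * np.log10(np.sum(clean**2) / np.sum(residual**2))

def psnr_db(clean, estimate):
    # Peak SNR in dB, taking the peak as the largest absolute clean sample.
    mse = np.mean((clean - estimate) ** 2)
    peak = np.max(np.abs(clean))
    return 10.0 * np.log10(peak**2 / mse)

def wmape(clean, estimate):
    # Weighted mean absolute percentage error: total absolute error
    # normalised by the total absolute magnitude of the clean signal.
    return np.sum(np.abs(clean - estimate)) / np.sum(np.abs(clean))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)
clean = np.sin(2 * np.pi * 5 * t) * np.exp(-3 * t)  # decaying oscillation as a stand-in signal
noise = 0.1 * rng.normal(size=t.shape)

noisy = clean + noise             # raw measurement
half_noise = clean + 0.5 * noise  # stand-in for a denoiser that halves the noise

snr_gain = snr_db(clean, half_noise) - snr_db(clean, noisy)
wmape_ratio = wmape(clean, half_noise) / wmape(clean, noisy)
```

Halving the residual amplitude raises SNR by `20 * log10(2)` dB (about 6.02 dB) and halves WMAPE, which gives a quick sanity check on any implementation of these metrics.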

翻訳日:2023-07-25 23:31:35 公開日:2023-07-23

# Sparse から Precise へ:心内エコー分割術の実際的編集法

From Sparse to Precise: A Practical Editing Approach for Intracardiac Echocardiography Segmentation ( http://arxiv.org/abs/2303.11041v2 )

ライセンス: Link先を確認

Ahmed H. Shahin, Yan Zhuang, Noha El-Zehiry

(参考訳) 心房細動に対する正確で安全なカテーテルアブレーションには、心内エコー(ICE)画像における心構造の正確なセグメンテーションが必要である。従来の研究では、ICEトランスデューサからの3次元幾何情報を用いて、3次元グリッドに2次元フレームを配置することで、スパースICEボリュームを作成する手法が提案されており、これにより3次元セグメンテーションモデルの訓練が可能になる。しかし、これらのモデルから得られた3Dマスクは、ICEデータのスパースなサンプリング、フレームのずれ、心臓の運動のために不正確になりうるうえ、深刻な臨床合併症を引き起こす可能性がある。この問題に対処するために,ユーザが2次元フレームにスクリブルを描画することでセグメンテーション出力を編集できるインタラクティブな編集フレームワークを提案する。ユーザインタラクションを3Dグリッドにマッピングして、インタラクションから離れた部分では以前のセグメンテーションを保持しながら、インタラクションの近傍のセグメンテーションを変更する編集ステップを実行する。さらに,従来の編集を損なうことなく,セグメンテーション出力に対する複数の編集に順次対応する。本稿では,新しい損失関数と編集専用に設計された新しい評価指標を提案する。クロスバリデーションとテストの結果から,提案する損失関数は,セグメンテーション品質およびユーザ入力への追従の面で,標準的な損失およびトレーニング戦略を上回っていることが示唆された。さらに,通常のセグメンテーション損失とは対照的に,その後の編集が従来の編集を損なわないことを定量的かつ定性的に示す。全体として,本手法は,ユーザのインタラクションから離れた場所での望ましくない変更を回避し,以前に編集した領域の品質を損なうことなくセグメンテーションの精度を高め,患者の予後の改善につながる。

Accurate and safe catheter ablation procedures for patients with atrial fibrillation require precise segmentation of cardiac structures in Intracardiac Echocardiography (ICE) imaging. Prior studies have suggested methods that employ 3D geometry information from the ICE transducer to create a sparse ICE volume by placing 2D frames in a 3D grid, enabling training of 3D segmentation models. However, the resulting 3D masks from these models can be inaccurate and may lead to serious clinical complications due to the sparse sampling in ICE data, frame misalignment, and cardiac motion. To address this issue, we propose an interactive editing framework that allows users to edit segmentation output by drawing scribbles on a 2D frame. The user interaction is mapped to the 3D grid and utilized to execute an editing step that modifies the segmentation in the vicinity of the interaction while preserving the previous segmentation away from the interaction. Furthermore, our framework accommodates multiple edits to the segmentation output in a sequential manner without compromising previous edits. This paper presents a novel loss function and a novel evaluation metric specifically designed for editing. Results from cross-validation and testing indicate that our proposed loss function outperforms standard losses and training strategies in terms of segmentation quality and following user input. Additionally, we show quantitatively and qualitatively that subsequent edits do not compromise previous edits when using our method, as opposed to standard segmentation losses. Overall, our approach enhances the accuracy of the segmentation while avoiding undesired changes away from user interactions and without compromising the quality of previously edited regions, leading to better patient outcomes.

翻訳日:2023-07-25 23:31:07 公開日:2023-07-23

# 推定後最適化と統合推定最適化とサンプル平均近似の比較:確率的優位性の観点から

Estimate-Then-Optimize versus Integrated-Estimation-Optimization versus Sample Average Approximation: A Stochastic Dominance Perspective ( http://arxiv.org/abs/2304.06833v2 )

ライセンス: Link先を確認

Adam N. Elmachtoub, Henry Lam, Haofeng Zhang, Yunfan Zhao

(参考訳) データ駆動確率最適化では、最適化タスクに加えて、基盤となる分布のモデルパラメータをデータから推定する必要がある。近年の文献では、最良の経験的目的関数性能につながるモデルパラメータを選択することによって、推定と最適化のプロセスを統合することが検討されている。統合推定最適化(IEO)と呼ばれるこの統合アプローチは、モデルが誤って特定された場合、単純な推定後最適化(ETO)を上回ることが容易に示せる。本稿では,モデルクラスが適切に特定され,十分なデータがある場合に,逆の挙動が現れることを示す。具体的には, 一般的な非線形確率最適化問題のクラスに対して, モデルクラスが真の分布を含む場合, 単純なETOが,リグレットの確率的優位性という強い意味で,漸近的にIEOより優れていることを示す。つまり、平均や他のモーメントだけでなく、リグレットの分布全体が、IEOと比べて常にETOの方が良い。結果はまた、決定が観測された特徴に依存する制約付き文脈最適化問題にも適用できる。また, 適用可能な場合には,標準的なサンプル平均近似 (SAA) が,リグレットの観点で,モデルクラスが適切に特定されている場合には最悪となり, 誤特定されている場合には最良となることを示す。最後に、理論的比較を裏付ける実験結果を提供し、我々の洞察が有限サンプル領域および様々な程度の誤特定の下でいつ成り立つかを示す。

In data-driven stochastic optimization, model parameters of the underlying distribution need to be estimated from data in addition to the optimization task. Recent literature considers integrating the estimation and optimization processes by selecting model parameters that lead to the best empirical objective performance. This integrated approach, which we call integrated-estimation-optimization (IEO), can be readily shown to outperform simple estimate-then-optimize (ETO) when the model is misspecified. In this paper, we show that a reverse behavior appears when the model class is well-specified and there is sufficient data. Specifically, for a general class of nonlinear stochastic optimization problems, we show that simple ETO outperforms IEO asymptotically when the model class covers the ground truth, in the strong sense of stochastic dominance of the regret. Namely, the entire distribution of the regret, not only its mean or other moments, is always better for ETO compared to IEO. Our results also apply to constrained, contextual optimization problems where the decision depends on observed features. Whenever applicable, we also demonstrate how standard sample average approximation (SAA) performs the worst when the model class is well-specified in terms of regret, and best when it is misspecified. Finally, we provide experimental results to support our theoretical comparisons and illustrate when our insights hold in finite-sample regimes and under various degrees of misspecification.

翻訳日:2023-07-25 23:23:28 公開日:2023-07-23

# 変動境界付近で観測可能な場

Field observables near a fluctuating boundary ( http://arxiv.org/abs/2304.05992v2 )

ライセンス: Link先を確認

Federico Armata, Salvatore Butera, Federico Montalbano, Roberto Passante and Lucia Rizzuto

(参考訳) 本稿では,有限質量の可動導電壁を有するキャビティ内の無質量スカラー場の閉じ込めに関するいくつかの側面について検討し,高調波ポテンシャルによって結合される平衡位置を自由に移動でき,その力学的自由度を量子力学的に記述する。この系は、その平衡位置から可動壁の小さな変位に対して、場とミラーの間の効果的な相互作用ハミルトニアン、場作用素における二次、ミラー作用素における線形によって記述することができる。相互作用,すなわち服装,基底状態において,まず場エネルギー密度などの局所場観測性について考察し,固定壁の場合に対する可動壁を有するキャビティ内の場エネルギー密度の変化と,2つの壁の間の通常のカシミール力の補正について検討する。次に、有限質量の可動壁によって分離された2つの1次元キャビティのケースと、2つのキャビティで定義された2つのマスレススカラー場について検討する。この場合, 2つのキャビティの正方形場間の相関は, 可動壁を媒介とし, 固定壁の場合と異なっていた。

We review several aspects related to the confinement of a massless scalar field in a cavity with a movable conducting wall of finite mass, free to move around its equilibrium position to which it is bound by a harmonic potential, and whose mechanical degrees of freedom are described quantum mechanically. This system, for small displacements of the movable wall from its equilibrium position, can be described by an effective interaction Hamiltonian between the field and the mirror, quadratic in the field operators and linear in the mirror operators. In the interacting, i.e. dressed, ground state, we first consider local field observables such as the field energy density: we evaluate changes of the field energy density in the cavity with the movable wall with respect to the case of a fixed wall, and corrections to the usual Casimir forces between the two walls. We then investigate the case of two one-dimensional cavities separated by a movable wall of finite mass, with two massless scalar fields defined in the two cavities. We show that in this case correlations between the squared fields in the two cavities exist, mediated by the movable wall, at variance with the fixed-wall case.

翻訳日:2023-07-25 23:23:02 公開日:2023-07-23

# ELVIS:モーダル内類似性を考慮した視覚言語事前学習の局所性向上

ELVIS: Empowering Locality of Vision Language Pre-training with Intra-modal Similarity ( http://arxiv.org/abs/2304.05303v2 )

ライセンス: Link先を確認

Sumin Seo, JaeWoong Shin, Jaewoo Kang, Tae Soo Kim, Thijs Kooi

(参考訳) 深層学習は胸部X線画像(CXR)の読影において放射線技師を支援する大きな可能性を示しているが、パフォーマンス向上のための高価なアノテーションの必要性は、広く臨床応用を妨げている。視覚言語事前学習(VLP)は、大量の無線画像とペア形式(画像テキストペア)の定期的なレポートを活用することで、アノテーションの負担とコストを軽減することができる。さらに、CXRにおけるコンピュータ支援診断(CAD)の異常の正確な局在化の必要性に対処するために、ローカライズ対応VLPの拡張も提案されている。しかし, 局所性を考慮したVLP文献による定式化は, 下流の局所化作業に必要な空間的関係の喪失につながることがわかった。そこで本研究では,VLP の局所性をモダル内類似性に富む ELVIS を提案し,モダル内類似性を認識した VLP を用いて,X線写真やレポート内の局所性をよりよく保存し,テキストレポートにおける位置参照の理解能力を高める。我々の局所性認識型VLP法は,複数のセグメンテーションタスクとMS-CXRフレーズグラウンドタスクにおいて,最先端のアートベースラインを著しく上回る。 ELVISは,従来の手法と比較して,レポートテキストに記述された関心領域によく焦点が当てられており,解釈可能性の向上が期待できる。

Deep learning has shown great potential in assisting radiologists in reading chest X-ray (CXR) images, but its need for expensive annotations for improving performance prevents widespread clinical application. Visual language pre-training (VLP) can alleviate the burden and cost of annotation by leveraging routinely generated reports for radiographs, which exist in large quantities as well as in paired form (image-text pairs). Additionally, extensions to localization-aware VLPs are being proposed to address the needs for accurate localization of abnormalities for computer-aided diagnosis (CAD) in CXR. However, we find that the formulation proposed by locality-aware VLP literature actually leads to a loss in spatial relationships required for downstream localization tasks. Therefore, we propose Empowering Locality of VLP with Intra-modal Similarity, ELVIS, a VLP aware of intra-modal locality, to better preserve the locality within radiographs or reports, which enhances the ability to comprehend location references in text reports. Our locality-aware VLP method significantly outperforms state-of-the-art baselines in multiple segmentation tasks and the MS-CXR phrase grounding task. Qualitatively, we show that ELVIS focuses well on regions of interest described in the report text compared to prior approaches, allowing for enhanced interpretability.

翻訳日:2023-07-25 23:22:22 公開日:2023-07-23

# 空洞磁気光学における非相互絡み合い

Nonreciprocal entanglement in cavity-magnon optomechanics ( http://arxiv.org/abs/2305.03325v2 )

ライセンス: Link先を確認

Jiaojiao Chen, Xiao-Gang Fan, Wei Xiong, Dong Wang, Liu Ye

(参考訳) マクロな量子効果を研究するための有望なプラットフォームであるキャビティ・オプトメカニクスは、サニャック効果による非相互エンタングルメントの研究に広く用いられている。本稿では,マグノンカー効果を用いるハイブリッドキャビティ・マグノン・オプトメカニクス系において,マグノン,光子,フォノン間の非相互エンタングルメントを実現する別の方法を提案する。我々はカー効果がマグノン周波数シフトと追加の2マグノン効果をもたらすことを示す。どちらも磁場の方向を調整することで正から負まで調整でき、非相互性につながる。マグノン周波数デチューニングや2マグノン効果の係数などのシステムパラメータを調整することにより、二者間および三者間のエンタングルメントを非相互的に増強できる。定義した双方向コントラスト比をさらに調べることで, 本系の非相互性はオン/オフの切り替えが可能であり, 浴の温度によって制御できることがわかった。本提案は,マグノンカー効果による非相互エンタングルメントを実証する潜在的な経路を提供するだけでなく,非線形効果を持つハイブリッドキャビティ・マグノン・オプトメカニクスにおいて多様な非相互デバイスを設計する方向性を開く。

Cavity optomechanics, a promising platform to investigate macroscopic quantum effects, has been widely used to study nonreciprocal entanglement with the Sagnac effect. Here we propose an alternative way to realize nonreciprocal entanglement among magnons, photons, and phonons in a hybrid cavity-magnon optomechanical system, where the magnon Kerr effect is used. We show that the Kerr effect gives rise to a magnon frequency shift and an additional two-magnon effect. Both of them can be tuned from positive to negative via tuning the magnetic field direction, leading to nonreciprocity. By tuning system parameters such as magnon frequency detuning or the coefficient of the two-magnon effect, bipartite and tripartite entanglements can be nonreciprocally enhanced. By further studying the defined bidirectional contrast ratio, we find that nonreciprocity in our system can be switched on and off, and can be engineered by the bath temperature. Our proposal not only provides a potential path to demonstrate nonreciprocal entanglement with the magnon Kerr effect, but also opens a direction to engineer and design diverse nonreciprocal devices in hybrid cavity-magnon optomechanics with nonlinear effects.

翻訳日:2023-07-25 23:03:35 公開日:2023-07-23

# メタバースにおける意味コミュニケーションとAI生成コンテンツの統合フレームワーク

A Unified Framework for Integrating Semantic Communication and AI-Generated Content in Metaverse ( http://arxiv.org/abs/2305.11911v2 )

ライセンス: Link先を確認

Yijing Lin, Zhipeng Gao, Hongyang Du, Dusit Niyato, Jiawen Kang, Abbas Jamalipour, Xuemin Sherman Shen

(参考訳) Metaverseが成長を続けるにつれて、効率的なコミュニケーションとインテリジェントなコンテンツ生成の必要性がますます重要になっている。セマンティックコミュニケーションはユーザ入力から意味と理解を伝えることに焦点を当て、AI生成コンテンツは人工知能を使用してデジタルコンテンツと体験を作成する。統合セマンティックコミュニケーションとAI生成コンテンツ(ISGC)は最近多くの注目を集めており、ユーザ入力から意味情報を転送し、デジタルコンテンツを生成し、Metaverseのグラフィックを描画する。本稿では,ISGCの2つの主要な利点,すなわち資源割当を最適化するための統合ゲインと,目標指向の高品質コンテンツ生成のための協調ゲインを捉え,コミュニケーションとコンテンツの両方の観点から没入性を改善する統合フレームワークを提案する。また,既存のISGCソリューションを分類し,ISGCの主要コンポーネントを分析し,いくつかのユースケースを示す。次に,拡散モデルに基づくケーススタディを構築し,メタバースにおける意味抽出,コンテンツ生成,グラフィックレンダリングを行うための最適なリソース割当戦略を同定する。最後に,いくつかのオープンな研究課題について議論し,ISGCとその関連応用の可能性のさらなる探究を促す。

As the Metaverse continues to grow, the need for efficient communication and intelligent content generation becomes increasingly important. Semantic communication focuses on conveying meaning and understanding from user inputs, while AI-Generated Content utilizes artificial intelligence to create digital content and experiences. Integrated Semantic Communication and AI-Generated Content (ISGC), which transfers semantic information from user inputs, generates digital content, and renders graphics for the Metaverse, has attracted a lot of attention recently. In this paper, we introduce a unified framework that captures ISGC's two primary benefits: integration gain for optimized resource allocation and coordination gain for goal-oriented high-quality content generation, improving immersion from both communication and content perspectives. We also classify existing ISGC solutions, analyze the major components of ISGC, and present several use cases. We then construct a case study based on the diffusion model to identify an optimal resource allocation strategy for performing semantic extraction, content generation, and graphic rendering in the Metaverse. Finally, we discuss several open research issues, encouraging further exploration of the potential of ISGC and its related applications in the Metaverse.

翻訳日:2023-07-25 21:16:44 公開日:2023-07-23

# 視覚的接地・自己監督音声モデルにおけるシラブル発見と言語間一般化

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model ( http://arxiv.org/abs/2305.11435v2 )

ライセンス: Link先を確認

Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath

(参考訳) 本稿では,視座訓練目標を用いた自己教師あり音声モデルの訓練において,音節単位を捉えた表現が出現することを示す。マスク付き言語モデリング損失で訓練されたほぼ同一のモデルアーキテクチャ(HuBERT)が、このような能力を示していないことを実証し、この現象の出現に視覚的基盤が関与していることを示す。本研究では,音声中の音節境界を自動的に予測する最小カットアルゴリズムと,同一音節をグループ化する2段階クラスタリング法を提案する。我々のモデルは、訓練された言語(英語)で最先端の音節セグメンテーション法を上回っているだけでなく、ゼロショット方式でエストニア語に一般化している。最後に,Zerospeech Challengeの他の4言語に対する単語分割タスクに対して,同じモデルでゼロショットの一般化が可能であることを示す。

In this paper, we show that representations capturing syllabic units emerge when training a self-supervised speech model with a visually-grounded training objective. We demonstrate that a nearly identical model architecture (HuBERT) trained with a masked language modeling loss does not exhibit this same ability, suggesting that the visual grounding objective is responsible for the emergence of this phenomenon. We propose the use of a minimum cut algorithm to automatically predict syllable boundaries in speech, followed by a 2-stage clustering method to group identical syllables together. We show that our model not only outperforms a state-of-the-art syllabic segmentation method on the language it was trained on (English), but also generalizes in a zero-shot fashion to Estonian. Finally, we show that the same model is capable of zero-shot generalization for a word segmentation task on 4 other languages from the Zerospeech Challenge, in some cases beating the previous state-of-the-art.

翻訳日:2023-07-25 21:16:00 公開日:2023-07-23

# MGR:マルチジェネレータに基づく合理化

MGR: Multi-generator Based Rationalization ( http://arxiv.org/abs/2305.04492v8 )

ライセンス: Link先を確認

Wei Liu, Haozhao Wang, Jun Wang, Ruixuan Li, Xinyang Li, Yuankai Zhang, Yang Qiu

(参考訳) 合理化は、ジェネレータと予測器を用いて、ジェネレータが入力テキストの人間の知性の部分集合を次の予測器に選択する自己説明型NLPモデルを構築することである。しかし、合理化には2つの重要な課題、すなわち、スプリアス相関とデジェネレーションがあり、予測器は、未熟な訓練済みジェネレータによって選択されたスプリアスまたは無意味なピースを過剰に適合させ、ジェネレータを劣化させる。 2つの課題に対処するために多くの研究が提案されているが、通常は個別に設計されており、どちらも考慮していない。本稿では,この2つの問題を同時に解くために,MGRというシンプルな手法を提案する。 MGRの鍵となる考え方は、実際の部品の発生安定性を改善し、より有意義な部品を予測者に届けるように複数の発電機を採用することである。実験により,MGRは最先端手法と比較してF1スコアを最大20.9%改善することがわかった。コードはhttps://github.com/jugechengzi/Rationalization-MGRで公開されている。

Rationalization is to employ a generator and a predictor to construct a self-explaining NLP model in which the generator selects a subset of human-intelligible pieces of the input text to the following predictor. However, rationalization suffers from two key challenges, i.e., spurious correlation and degeneration, where the predictor overfits the spurious or meaningless pieces solely selected by the not-yet well-trained generator and in turn deteriorates the generator. Although many studies have been proposed to address the two challenges, they are usually designed separately and do not take both of them into account. In this paper, we propose a simple yet effective method named MGR to simultaneously solve the two problems. The key idea of MGR is to employ multiple generators such that the occurrence stability of real pieces is improved and more meaningful pieces are delivered to the predictor. Empirically, we show that MGR improves the F1 score by up to 20.9% as compared to state-of-the-art methods. Codes are available at https://github.com/jugechengzi/Rationalization-MGR .

翻訳日:2023-07-25 21:15:10 公開日:2023-07-23

# 分子ドッキングと機械学習回帰法を用いたCOVID-19 3CLプロテアーゼを標的とした薬物再利用

Drug Repurposing Targeting COVID-19 3CL Protease using Molecular Docking and Machine Learning Regression Approach ( http://arxiv.org/abs/2305.18088v5 )

ライセンス: Link先を確認

Imra Aqeel and Abdul Majid

(参考訳) 新型コロナウイルス(COVID-19)のパンデミックは世界的な健康危機を招き、効果的な治療法が緊急に必要とされている。薬物の再利用は、時間、コスト、労力を節約できるので、有望な解決策として現れてきた。しかし、新型コロナウイルス治療のために特定された再利用薬物の数はまだ限られており、より効率的で包括的な薬物再利用アプローチが必要である。本研究では,分子ドッキング法と機械学習回帰法を組み合わせた薬物再利用により,COVID-19治療の候補を同定することを目的とした。ウイルスの複製における重要な酵素であるSARS-CoV-2の主プロテアーゼ3CLを標的とした5903の薬剤のスクリーニングに,Zincデータベースを利用した。薬物の主プロテアーゼ3CLへの結合親和性を評価するために分子ドッキングを行い、QSARモデリングに複数の機械学習回帰アプローチを用いて高い結合親和性を持つ薬物を同定した。その結果,決定木回帰(DTR)モデルが R2 と RMSE の統計指標において最も優れており,-15 kcal/mol から -13 kcal/mol の範囲にある 6 種類の有望な薬剤を絞り込んだ。これらの薬剤は、他の研究で既に同定されている1つの抗ウイルス性ZINC203757351化合物を除いて、新規な再利用の可能性を有する。我々はさらに、これらの上位選択薬の物理化学的および薬物動態的性質と、特定の標的プロテアーゼ3CLproに対する最適な結合相互作用を解析した。本研究は、COVID-19に対する薬物再利用のための効率的な枠組みを提供し、分子ドッキングと機械学習回帰アプローチを組み合わせることによって、潜在的な治療候補の同定を加速する可能性を実証する。この結果は、世界的な健康上の重要な課題である新型コロナウイルスの効果的な治療法を見つけるという大きな目標に寄与する。

The COVID-19 pandemic has created a global health crisis, with an urgent need for effective treatments. Drug repurposing has emerged as a promising solution, as it can save time, cost, and labor. However, the number of identified repurposed drugs for COVID-19 treatment remains limited, and there is a need for more efficient and comprehensive drug repurposing approaches. In this study, we aimed to identify potential therapeutic candidates for COVID-19 treatment through drug repurposing using a combination of molecular docking and machine learning regression approaches. We utilized the Zinc database to screen 5903 World-approved drugs for their potential to target the main protease 3CL of SARS-CoV-2, which is a key enzyme in the replication of the virus. We performed molecular docking to evaluate the binding affinity of the drugs to the main protease 3CL, and used several machine learning regression approaches for QSAR modeling to identify drugs with high binding affinity. Our results showed that the Decision Tree Regression (DTR) model had the best statistical measures of R2 and RMSE, and we shortlisted six promising drugs within the range of -15 kcal/mol to -13 kcal/mol. These drugs have novel repurposing potential, except for one antiviral ZINC203757351 compound that has already been identified in other studies. We further analyzed the physicochemical and pharmacokinetic properties of these top-ranked selected drugs and their best binding interaction for the specific target protease 3CLpro. Our study provides an efficient framework for drug repurposing against COVID-19, and demonstrates the potential of combining molecular docking with machine learning regression approaches to accelerate the identification of potential therapeutic candidates. Our findings contribute to the larger goal of finding effective treatments for COVID-19, which is a critical global health challenge.
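
The two statistical measures named in the abstract, R2 and RMSE, and the shortlisting by a binding-energy window can be sketched as follows. This is a minimal illustration only: the function names and the example scores are hypothetical and are not taken from the paper, and the regression model itself is left out.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def shortlist(scores, low=-15.0, high=-13.0):
    """Keep the names whose score (kcal/mol) lies in [low, high]."""
    return [name for name, s in scores.items() if low <= s <= high]


# Hypothetical docking scores and model predictions (illustrative values only).
docking = [-14.2, -13.5, -9.8, -7.1]
predicted = [-13.9, -13.0, -10.1, -7.5]
print("RMSE:", rmse(docking, predicted))
print("R2:", r2_score(docking, predicted))
print(shortlist({"drug_a": -14.2, "drug_b": -9.8, "drug_c": -13.0}))
```

A lower RMSE and an R2 closer to 1 indicate a better fit, which is the sense in which one regression model can be said to have the best measures among several.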

翻訳日:2023-07-25 21:06:46 公開日:2023-07-23

# 長文のニューラル自然言語処理:最新技術に関する調査

Neural Natural Language Processing for Long Texts: A Survey of the State-of-the-Art ( http://arxiv.org/abs/2305.16259v5 )

ライセンス: Link先を確認

Dimitrios Tsirmpas, Ioannis Gkionis, Ioannis Mademlis, Georgios Papadopoulos

(参考訳) ディープニューラルネットワーク(DNN)の採用は、過去10年間で自然言語処理(NLP)に大きな恩恵を受けている。しかし、長文解析の要求は短いテキストの要求とはかなり異なるが、オンラインにアップロードされた文書のサイズが増大すると、長文の自動理解が重要な問題となる。関連するアプリケーションは、自動化されたWebマイニング、法的文書レビュー、医療記録分析、財務報告分析、契約管理、環境影響評価、ニュース集約などである。長い文書を解析するための効率的なアルゴリズムが近年開発されているにもかかわらず、この分野の実践的ツールは現在盛んである。この記事では、この動的ドメインのエントリポイントとして機能し、2つの目的を達成することを目的としています。まず、関連するニューラルネットワーク構築ブロックの概要を提供し、フィールドの簡潔なチュートリアルとして機能する。第二に、ドキュメント分類と文書要約という2つの重要なタスクを中心に、ロングドキュメントnlpにおける現在の最先端の簡単な検証を提供する。典型的には文書分類の特定の事例として扱われるので、長文の感性分析もカバーされている。そこで本稿では,文書レベルの分析の序文として,主な課題,課題,既存ソリューションについて述べる。最後に、この記事は、この分野のさらなる研究を促進するために利用可能な注釈付きデータセットを提示している。

The adoption of Deep Neural Networks (DNNs) has greatly benefited Natural Language Processing (NLP) during the past decade. However, the demands of long document analysis are quite different from those of shorter texts, while the ever increasing size of documents uploaded on-line renders automated understanding of lengthy texts a critical issue. Relevant applications include automated Web mining, legal document review, medical records analysis, financial reports analysis, contract management, environmental impact assessment, news aggregation, etc. Despite the relatively recent development of efficient algorithms for analyzing long documents, practical tools in this field are currently flourishing. This article serves as an entry point into this dynamic domain and aims to achieve two objectives. Firstly, it provides an overview of the relevant neural building blocks, serving as a concise tutorial for the field. Secondly, it offers a brief examination of the current state-of-the-art in long document NLP, with a primary focus on two key tasks: document classification and document summarization. Sentiment analysis for long texts is also covered, since it is typically treated as a particular case of document classification. Consequently, this article presents an introductory exploration of document-level analysis, addressing the primary challenges, concerns, and existing solutions. Finally, the article presents publicly available annotated datasets that can facilitate further research in this area.

翻訳日:2023-07-25 21:05:34 公開日:2023-07-23

# マイクロ波ドレッシングによるRydberg状態dc分極率の低減

Reducing Rydberg state dc polarizability by microwave dressing ( http://arxiv.org/abs/2305.15200v3 )

ライセンス: Link先を確認

J.C. Bohorquez, R. Chinnarasu, J. Isaacs, D. Booth, M. Beck, R. McDermott, and M. Saffman

(参考訳) マイクロ波電場ドレッシングを用いて、77 K環境におけるセシウム原子Rydberg状態のdc分極率の低減を実証する。特に、$51D_{5/2}$への共鳴を5.35 GHzに持ち、低温環境下でRydberg原子を超伝導共振器と接続するのに適した$52P_{3/2}$状態の分極率を低減する。磁気光学トラップ(MOT)損失分光法を用いてRydberg状態の分極率を測定する。$52P_{3/2}$と$51D_{5/2}$を結合する非共鳴の無線周波数(RF)ドレッシング場を用いて、$52P_{3/2}$状態のdc分極率が80%以上低減することを実証する。実験結果は、Shirley-Floquet形式を用いて構築した原子-ドレッシング場系の数値モデルとよく一致している。また、dc分極率の低減は強い異方性を持ち、dc場とドレッシング場が平行な場合にはほぼ完全に打ち消すことができるが、直交する場合には2分の1の低減にとどまることを示す。これらの結果は、表面近傍に存在する変動するdc場に対してRydberg共鳴を安定化するのに役立ち、ハイブリッドなRydberg原子-超伝導共振器量子ゲートの開発を前進させる可能性がある。

We demonstrate reduction of the dc polarizability of cesium atom Rydberg states in a 77 K environment utilizing microwave field dressing. In particular we reduce the polarizability of $52P_{3/2}$ states which have resonances at 5.35 GHz to $51D_{5/2}$, suitable for interfacing Rydberg atoms to superconducting resonators in a cryogenic environment. We measure the polarizability of the Rydberg states using Magneto-Optical-Trap (MOT) loss spectroscopy. Using an off-resonant radio-frequency (RF) dressing field coupling $52P_{3/2}$ and $51D_{5/2}$ we demonstrate a reduction in dc polarizability of the $52P_{3/2}$ states of over 80$\%$. Experimental findings are in good agreement with a numerical model of the atom-dressing field system developed using the Shirley-Floquet formalism. We also demonstrate that the dc polarizability reduction is highly anisotropic, with near total nulling possible when the dc and dressing fields are aligned, but only a factor of two reduction in polarizability when the fields are orthogonal. These results may aid in stabilizing Rydberg resonances against varying dc fields present near surfaces, enabling advancement in the development of hybrid Rydberg atom - superconducting resonator quantum gates.

翻訳日:2023-07-25 21:04:42 公開日:2023-07-23

# TriMLP: シーケンスレコメンデーションにおけるMLPのようなアーキテクチャの回避

TriMLP: Revenge of a MLP-like Architecture in Sequential Recommendation ( http://arxiv.org/abs/2305.14675v2 )

ライセンス: Link先を確認

Yiheng Jiang, Yuanbo Xu, Yongjian Yang, Funing Yang, Pengyang Wang and Hui Xiong

(参考訳) シーケンシャルレコメンデーション(Sequential recommendation)は、動的な嗜好の推論を改善するために、過去のユーザ・アイテム間のインタラクション行動(トークンとも呼ばれる)のシーケンスをモデル化する。RNN、CNN、Transformerといった改良されたニューラルネットワークアーキテクチャによって、この分野はここ数年で急速にパフォーマンスが向上した。オールMLPモデルの最近の進歩は、過去の行動間の変換パターンを学習するための、計算量の少ない効率的な方法であるトークン混合MLPに光を当てている。しかし,制約のないクロストークン通信を許容し,時系列順序を無視する固有の全結合設計のため,トークン混合MLPを逐次レコメンデーションに直接適用すると性能が低下することがわかった。本稿では、修正されたMLPがトークンに順序付き相互作用を付与する新しいTriangular Mixerを備えた、純粋なMLPベースのシーケンシャルレコメンデーションアーキテクチャTriMLPを提案する。MLPのクロストークン相互作用は実際には行列の乗算であるので、Triangular Mixerは重み行列内の下三角ニューロンを落とし、将来のトークンからの接続をブロックすることで、情報漏洩を防ぎ、標準の自己回帰訓練方式での予測能力を向上させる。細粒度での長期および短期の嗜好を更にモデル化するため、ミキサーは、上述のMLPに基づくデュアルブランチ構造、すなわちグローバルミキシングとローカルミキシングを採用し、シーケンシャルな長距離依存性と局所パターンを別々に捉える。MovieLens、Amazon、Tenrecを含む、さまざまなベンチマークの9つの異なるスケールのデータセット(50K~20Mの行動を含む)に関する実証的研究は、TriMLPが有望で安定した精度/効率のトレードオフを実現すること、すなわち、いくつかの最先端ベースラインを平均5.32%上回り、推論時間コストを8.44%削減することを実証している。

Sequential recommendation models sequences of historical user-item interactive behaviors (also referred to as tokens) to better infer dynamic preferences. Fueled by improved neural network architectures such as RNN, CNN and Transformer, this field has enjoyed a rapid performance boost in the past years. Recent progress on all-MLP models sheds light on an efficient method with less intensive computation, token-mixing MLP, to learn the transformation patterns among historical behaviors. However, due to the inherent fully-connected design that allows unrestricted cross-token communication and ignores the chronological order, we find that directly applying token-mixing MLP to sequential recommendation leads to subpar performance. In this paper, we present a purely MLP-based sequential recommendation architecture, TriMLP, with a novel Triangular Mixer where the modified MLP endows tokens with ordered interactions. As the cross-token interaction in MLP is actually matrix multiplication, Triangular Mixer drops the lower-triangle neurons in the weight matrix and thus blocks the connections from future tokens, which prevents information leakage and improves prediction capability under the standard auto-regressive training fashion. To further model long and short-term preferences at a fine-grained level, the mixer adopts a dual-branch structure based on the delicate MLP described above, namely global and local mixing, to separately capture the sequential long-range dependencies and local patterns. Empirical study on 9 datasets of different scales (containing 50K~20M behaviors) from various benchmarks, including MovieLens, Amazon and Tenrec, demonstrates that TriMLP attains a promising and stable accuracy/efficiency trade-off, i.e., on average it surpasses several state-of-the-art baselines by 5.32% and saves 8.44% inference time cost.
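
The masking idea described in the abstract can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the function names are hypothetical, and the orientation is an assumption in which output token `j` is computed as the sum over `i` of `W[i][j] * x[i]`, so that zeroing the lower triangle (`i > j`) removes every connection from a later token to an earlier output.

```python
def triangular_mask(weights):
    """Zero the lower-triangle entries (i > j) of a square token-mixing matrix."""
    n = len(weights)
    return [[weights[i][j] if i <= j else 0.0 for j in range(n)] for i in range(n)]


def mix_tokens(tokens, weights):
    """Mix a sequence of n token vectors with a masked n x n weight matrix.

    Output j is sum_i W[i][j] * tokens[i]; after masking, only tokens with
    index i <= j (the past and the present) contribute to output j.
    """
    n = len(tokens)
    dim = len(tokens[0])
    masked = triangular_mask(weights)
    return [
        [sum(masked[i][j] * tokens[i][k] for i in range(n)) for k in range(dim)]
        for j in range(n)
    ]


# Toy sequence of three 2-dimensional tokens and an all-ones mixing matrix.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [[1.0] * 3 for _ in range(3)]
print(mix_tokens(tokens, weights))
```

With the mask in place, changing the last token leaves the first two outputs unchanged, which is the property that prevents leakage from future tokens during auto-regressive training.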

翻訳日:2023-07-25 21:04:17 公開日:2023-07-23

# CLIPSonic: 未ラベルビデオと事前学習言語ビジョンモデルによる音声合成

CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models ( http://arxiv.org/abs/2306.09635v2 )

ライセンス: Link先を確認

Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley

(参考訳) 近年,大量のテキスト・音声ペアデータを用いたテキストからの音声合成(text-to-audio)の研究が行われている。しかし,高品質なテキストアノテーションを伴う音声記録は入手が困難である。本研究では,未ラベルビデオと事前学習言語ビジョンモデルを用いてテキストからの音声合成に取り組む。視覚モダリティを橋渡しとして活用し,所望のテキスト・音声対応を学習することを提案する。我々は,事前学習されたCLIP(contrastive language-image pretraining)モデルで符号化されたビデオフレームを条件として,映像の音声トラックを生成する条件付き拡散モデルを訓練する。テスト時には,まずゼロショットモダリティ転送を試み,CLIPで符号化されたテキストクエリで拡散モデルを条件付けする。しかし,画像クエリの場合と比べて顕著な性能低下が観察された。このギャップを埋めるために,事前学習した拡散事前モデルを採用し,CLIPテキスト埋め込みからCLIP画像埋め込みを生成する。その結果,提案手法の有効性が示され,事前学習した拡散事前モデルによりモダリティ転送ギャップを低減できることがわかった。テキストからの音声合成に注目する一方で,提案モデルでは画像クエリから音声を生成することも可能であり,主観的聞き取りテストにおいて最先端の画像音声合成モデルと競合する性能を示す。本研究は,ビデオにおける自然な音声-視覚対応と事前学習された言語-視覚モデルのパワーを活用する,テキストからの音声合成への新たな方向を提供する。

Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pretrained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge. We train a conditional diffusion model to generate the audio track of a video, given a video frame encoded by a pretrained contrastive language-image pretraining (CLIP) model. At test time, we first explore performing a zero-shot modality transfer and condition the diffusion model with a CLIP-encoded text query. However, we observe a noticeable performance drop with respect to image queries. To close this gap, we further adopt a pretrained diffusion prior model to generate a CLIP image embedding given a CLIP text embedding. Our results show the effectiveness of the proposed method, and that the pretrained diffusion prior can reduce the modality transfer gap. While we focus on text-to-audio synthesis, the proposed model can also generate audio from image queries, and it shows competitive performance against a state-of-the-art image-to-audio synthesis model in a subjective listening test. This study offers a new direction of approaching text-to-audio synthesis that leverages the naturally-occurring audio-visual correspondence in videos and the power of pretrained language-vision models.

翻訳日:2023-07-25 20:57:20 公開日:2023-07-23

# 密度に基づくクラスタリング手法の検討

A Survey of Some Density Based Clustering Techniques ( http://arxiv.org/abs/2306.09256v2 )

ライセンス: Link先を確認

Rupanka Bhuyan and Samarjeet Borah

(参考訳) 密度ベースのクラスタリングは、データセットから未知のパターンを抽出するためにデータマイニングで使用されるクラスタリングの一種である。 DBSCAN、OPTICS、DENCLUE、VDBSCAN、DVBSCAN、DBCLASD、ST-DBSCANなどの密度ベースのクラスタリング手法がある。本稿では,これらの手法について,その特性,長所,短所,そして最も重要な点として,有用かつ適切なパターンをマイニングするための異なる種類のデータセットへの適用性について検討する。

Density-based clustering is a type of clustering method used in data mining for extracting previously unknown patterns from data sets. There are a number of density-based clustering methods such as DBSCAN, OPTICS, DENCLUE, VDBSCAN, DVBSCAN, DBCLASD and ST-DBSCAN. In this paper, these methods are studied along with their characteristics, advantages and disadvantages and, most importantly, their applicability to different types of data sets to mine useful and appropriate patterns.
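
As a concrete reference point for the first method in that list, the following is a minimal, unoptimized sketch of the classic DBSCAN procedure in plain Python. The names `dbscan`, `region_query`, `eps` and `min_pts` are chosen here for illustration, the neighbor search is a brute-force scan, and the sketch is not drawn from the survey itself.

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be re-labeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

The two dense groups receive separate cluster ids and the isolated point is marked as noise, which illustrates the main appeal of density-based methods: the number of clusters does not have to be fixed in advance.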

翻訳日:2023-07-25 20:56:28 公開日:2023-07-23

# 南フロリダにおける水ステージ予測のための深層学習モデル

Deep Learning Models for Water Stage Predictions in South Florida ( http://arxiv.org/abs/2306.15907v2 )

ライセンス: Link先を確認

Jimeng Shi, Zeda Yin, Rukmangadh Myana, Khandker Ishtiaq, Anupama John, Jayantha Obeysekera, Arturo Leon, Giri Narasimhan

(参考訳) 河川システムにおける水位シミュレーションと予測は,洪水警報,水理操作,洪水軽減に不可欠である。工学分野では、HEC-RAS、MIKE、SWMMといったツールを使用して、詳細な物理に基づく水理・水理計算モデルを構築し、流域全体をシミュレートし、システム内の任意の時点での水ステージを予測する。しかし、これらの物理学に基づくモデルは、特に大きな流域やより長いシミュレーションのために、計算集約的である。この問題を克服するために,我々は複数の深層学習モデル(DL)を代理モデルとして使用し,水ステージを迅速に予測する。南フロリダのマイアミ川の下流は,本論文の事例研究として選択されている。データセットは2010年1月1日から2020年12月31日まで、南フロリダ水管理地区(SFWMD)のDBHYDROデータベースからダウンロードされる。大規模な実験により、DLモデルの性能は極度の降水条件(熱帯嵐)においても物理学に基づくモデルの性能に匹敵することが示された。さらに,予測長の増加に伴うDLモデルの予測精度の低下について検討した。今後の水ステージを予測するため,我々のDLモデルでは,近年の河川系の測定変数と,近い将来に確実に予測できる共変量を用いている。要約すると、ディープラーニングモデルは、物理ベースのモデルと比較して、少なくとも1000倍のスピードアップで、同等またはより良いエラー率を達成する。

Simulating and predicting water levels in river systems is essential for flood warnings, hydraulic operations, and flood mitigations. In the engineering field, tools such as HEC-RAS, MIKE, and SWMM are used to build detailed physics-based hydrological and hydraulic computational models to simulate the entire watershed, thereby predicting the water stage at any point in the system. However, these physics-based models are computationally intensive, especially for large watersheds and for longer simulations. To overcome this problem, we train several deep learning (DL) models for use as surrogate models to rapidly predict the water stage. The downstream stage of the Miami River in South Florida is chosen as a case study for this paper. The dataset is from January 1, 2010, to December 31, 2020, downloaded from the DBHYDRO database of the South Florida Water Management District (SFWMD). Extensive experiments show that the performance of the DL models is comparable to that of the physics-based models, even during extreme precipitation conditions (i.e., tropical storms). Furthermore, we study the decline in prediction accuracy of the DL models with an increase in prediction lengths. In order to predict the water stage in the future, our DL models use measured variables of the river system from the recent past as well as covariates that can be reliably predicted in the near future. In summary, the deep learning models achieve comparable or better error rates with at least 1000x speedup in comparison to the physics-based models.
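
The input layout described in the abstract (recent measurements plus covariates that are assumed to be known for the near future) can be sketched as a sliding-window sample builder. This is an assumption-laden illustration: the function name, the single covariate series, and the window lengths are hypothetical, and the paper's actual features and models are not reproduced here.

```python
def make_samples(stage, covariate, past_len, horizon):
    """Build (past_stage, past_cov, future_cov, target) tuples from two series.

    stage     : measured water stage per time step
    covariate : a covariate per time step, assumed predictable over the horizon
    past_len  : number of past steps fed to the model
    horizon   : number of future steps to predict
    """
    samples = []
    for t in range(past_len, len(stage) - horizon + 1):
        past_stage = stage[t - past_len:t]
        past_cov = covariate[t - past_len:t]
        future_cov = covariate[t:t + horizon]   # known or forecast ahead of time
        target = stage[t:t + horizon]           # what a surrogate model would predict
        samples.append((past_stage, past_cov, future_cov, target))
    return samples


# Toy series: ten time steps of stage and one covariate (illustrative values only).
stage = [float(x) for x in range(10)]
covariate = [10.0 * x for x in range(10)]
for sample in make_samples(stage, covariate, past_len=3, horizon=2)[:2]:
    print(sample)
```

Increasing `horizon` in such a setup is the natural way to examine how prediction accuracy declines with longer prediction lengths.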

翻訳日:2023-07-25 20:47:16 公開日:2023-07-23

# オープンボキャブラリ学習に向けて:調査

Towards Open Vocabulary Learning: A Survey ( http://arxiv.org/abs/2306.15880v3 )

ライセンス: Link先を確認

Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, Dacheng Tao

(参考訳) 視覚シーン理解の分野では、ディープニューラルネットワークはセグメンテーション、トラッキング、検出など、さまざまなコアタスクにおいて驚くべき進歩を遂げている。しかし、ほとんどのアプローチはクローズセットの仮定に基づいており、トレーニングセットに存在する事前定義されたカテゴリのみを識別できる。近年、視覚言語事前学習の急速な進歩により、オープンな語彙設定が提案されている。これらの新しいアプローチは、注釈付きラベル空間を超えてカテゴリを見つけ、認識することを目指している。オープン語彙のアプローチは、弱教師付きおよびゼロショット設定に比べて、より一般的で実用的で効果的である。本稿では,その分野における最近の発展を要約し,分析し,オープンな語彙学習の徹底的なレビューを行う。特に,ゼロショット学習,オープンセット認識,分散検出といった関連する概念と比較することから始める。次に, セグメンテーションと検出に関して, ロングテール問題, 少数ショット設定, ゼロショット設定など, 密接に関連するタスクをいくつか検討する。本研究は,まず,事前知識としてクローズセットにおける検出とセグメンテーションの基本的な知識を提示する。次に,オープン語彙学習を用いた様々なシナリオについて検討し,共通設計要素とコアアイデアを同定する。次に、一般的なデータセットとベンチマークにおける最近の検出とセグメンテーションのアプローチを比較した。最後に,今後の研究方向性に関する洞察,課題,議論をまとめる。私たちの知る限り、オープンな語彙学習に関する総合的な文献レビューはこれが初めてである。関連する作業を https://github.com/jianzongwu/Awesome-Open-Vocabulary で追跡しています。

In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the close-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective compared to weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. Then, we review several closely related tasks in the case of segmentation and detection, including long-tail problems, few-shot, and zero-shot settings. For the method survey, we first present the basic knowledge of detection and segmentation in close-set as the preliminary knowledge. Next, we examine various scenarios in which open vocabulary learning is used, identifying common design elements and core ideas. Then, we compare the recent detection and segmentation approaches in commonly used datasets and benchmarks. Finally, we conclude with insights, issues, and discussions regarding future research directions. To our knowledge, this is the first comprehensive literature review of open vocabulary learning. We keep tracing related works at https://github.com/jianzongwu/Awesome-Open-Vocabulary.

翻訳日:2023-07-25 20:46:55 公開日:2023-07-23

# Magic123: 2次元および3次元拡散事前分布を用いた1枚の画像からの高品質3Dオブジェクト生成

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors ( http://arxiv.org/abs/2306.17843v2 )

ライセンス: Link先を確認

Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem

(参考訳) Magic123は、2Dと3Dの両方の事前分布を用いて、姿勢情報のない1枚の実世界画像から高品質でテクスチャ付きの3Dメッシュを生成する、2段階のcoarse-to-fine(粗から密へ)アプローチである。第1段階では、神経放射場を最適化して粗い幾何を生成する。第2段階では、メモリ効率の良い微分可能なメッシュ表現を採用し、視覚的に魅力的なテクスチャを持つ高分解能メッシュを生成する。いずれの段階でも、参照ビューの監督と、2D拡散事前分布と3D拡散事前分布の組み合わせによって導かれる新規ビューによって3Dコンテンツが学習される。生成した幾何の探索(より想像力のある)と利用(より正確な)を制御するために, 2D と 3D の事前分布の間に単一のトレードオフパラメータを導入する。さらに,テキストインバージョンと単眼深度正則化を用いて,それぞれビュー間の一貫した外観を促し,解の退化を防止する。Magic123は、合成ベンチマークと多様な実世界の画像に関する広範な実験を通じて検証され、従来の画像から3Dへの技術よりも大幅に改善されている。私たちのコード、モデル、生成された3Dアセットは、https://github.com/guochengqian/Magic123 で利用可能です。

We present Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images. Our code, models, and generated 3D assets are available at https://github.com/guochengqian/Magic123.

翻訳日:2023-07-25 20:38:09 公開日:2023-07-23

# ロジウムフォメートパドルホイール錯体の$^{103}$Rh NMR分光法と緩和測定

The $^{103}$Rh NMR Spectroscopy and Relaxometry of the Rhodium Formate Paddlewheel Complex ( http://arxiv.org/abs/2306.17457v2 )

ライセンス: Link先を確認

Harry Harbor Collins, Mohamed Sabba, Gamal Moustafa, Bonifac Legrady, Murari Soundararajan, Markus Leutzsch, Malcolm H. Levitt

(参考訳) 磁気回転比の低いスピン1/2核のNMR分光は、NMR信号強度が低いため困難である。ロジウムフォメート「パドルホイール」錯体$\mathrm{Rh_2(HCO_2)_4}$を例に、$^{103}$Rh NMRパラメータを迅速に取得する手法を示す。この手法は、$^{1}$H核からの偏極移動によって$^{103}$Rh信号強度を増大させるとともに、低$\gamma$核の直接観測における共通の障害であるリンギングアーティファクトからの干渉を大幅に低減する。$^{103}$Rhの緩和時定数$T_1$と$T_2$は、$^{1}$H検出実験を用いて20分以内に測定される。さらに、$^{103}$Rhの$T_1$の磁場依存性を測定する。高磁場での緩和は化学シフト異方性(CSA)機構によって支配される。$^{103}$Rhの遮蔽異方性は非常に大きく、$|\Delta\sigma|=9900\pm540\mathrm{\,ppm}$である。この推定値は密度汎関数理論計算と比較される。

The NMR spectroscopy of spin-1/2 nuclei with low gyromagnetic ratio is challenging, due to the low NMR signal strength. Methodology for the rapid acquisition of $^{103}$Rh NMR parameters is demonstrated for the case of the rhodium formate "paddlewheel" complex $\mathrm{Rh_2(HCO_2)_4}$. A scheme is described for enhancing the $^{103}$Rh signal strength by polarization transfer from $^{1}$H nuclei and which also greatly reduces the interference from ringing artifacts, a common hurdle for the direct observation of low-$\gamma$ nuclei. The $^{103}$Rh relaxation time constants $T_1$ and $T_2$ are measured within 20 minutes using $^{1}$H-detected experiments. The field-dependence of the $^{103}$Rh $T_1$ is measured. The high-field relaxation is dominated by the chemical shift anisotropy (CSA) mechanism. The $^{103}$Rh shielding anisotropy is found to be very large: $|\Delta\sigma|=9900\pm540\mathrm{\,ppm}$. This estimate is compared with density functional theory calculations.

翻訳日:2023-07-25 20:37:34 公開日:2023-07-23

# ManimML: アニメーションによる機械学習アーキテクチャのコミュニケーション

ManimML: Communicating Machine Learning Architectures with Animation ( http://arxiv.org/abs/2306.17108v2 )

ライセンス: Link先を確認

Alec Helbling and Duen Horng Chau

(参考訳) 近年、機械学習(ML)への関心が爆発的に高まっている。しかし、ML技術が進歩するにつれて、新しいMLアルゴリズムの説明と視覚化ツールが遅れている。アニメーションは、時間とともに動的に変化するシステムのエンゲージメントな視覚化を実現する強力なツールであることが示されており、MLアルゴリズムの通信タスクに適している。しかし、MLアルゴリズムをアニメーションする現在のアプローチは、特定のアルゴリズムをハイライトするアプリケーションや複雑な一般化されたアニメーションソフトウェアを使用するハンドクラフトである。我々は,コードから直接MLアルゴリズムのアニメーションを生成するオープンソースPythonライブラリManimMLを開発した。我々は,複雑なアニメーションソフトウェアを学習するよりも,ML実践者の既存のプログラミング知識を活用することを試みた。 ManimMLには、Pytorchのような人気のあるディープラーニングフレームワークを模倣するニューラルネットワークを指定するための、よく知られた構文がある。ユーザは、既存のニューラルネットワークアーキテクチャを使用して、manimmlでアニメーションの仕様を簡単に記述することができ、システムのさまざまなコンポーネントのアニメーションをニューラルネットワーク全体の最終的なアニメーションに自動生成する。 ManimMLはオープンソースでhttps://github.com/helblazer811/ManimMLで入手できる。

There has been an explosion in interest in machine learning (ML) in recent years due to its applications to science and engineering. However, as ML techniques have advanced, tools for explaining and visualizing novel ML algorithms have lagged behind. Animation has been shown to be a powerful tool for making engaging visualizations of systems that dynamically change over time, which makes it well suited to the task of communicating ML algorithms. However, the current approach to animating ML algorithms is to handcraft applications that highlight specific algorithms or use complex generalized animation software. We developed ManimML, an open-source Python library for easily generating animations of ML algorithms directly from code. We sought to leverage ML practitioners' preexisting knowledge of programming rather than requiring them to learn complex animation software. ManimML has a familiar syntax for specifying neural networks that mimics popular deep learning frameworks like Pytorch. A user can take a preexisting neural network architecture and easily write a specification for an animation in ManimML, which will then automatically compose animations for different components of the system into a final animation of the entire neural network. ManimML is open source and available at https://github.com/helblazer811/ManimML.

翻訳日:2023-07-25 20:37:14 公開日:2023-07-23

# LaunchpadGPT:Launchpad上の音楽可視化デザイナとしての言語モデル

LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad ( http://arxiv.org/abs/2307.04827v2 )

ライセンス: Link先を確認

Siting Xu, Yunlong Tang, Feng Zheng

(参考訳) Launchpadは、照明付きのボタンを押すことで、ユーザーが音楽を作り、演奏できる楽器だ。 launchpadライトエフェクトの設計を補助し、さらに初心者がこの楽器を使って音楽のビジュアライゼーションを行えるようにするために、launchpadgptモデルを提案し、自動的にlaunchpad上での音楽のビジュアライゼーションデザインを生成する。生成能力に優れた言語モデルに基づいて,提案したLaunchpadGPTは音声を入力として,ビデオ形式でLaunchpad-playingの照明効果を出力する(Launchpad-playing video)。我々はLaunchpadプレイングビデオを収集し、それらを処理して音楽とそれに対応するLaunchpadプレイングの動画フレームをプロンプト・コンプリートペアとして取得し、言語モデルを訓練する。実験結果から,提案手法はランダム生成法よりも優れた音楽可視化を実現し,幅広い音楽可視化応用の可能性を示す。私たちのコードはhttps://github.com/yunlong10/LaunchpadGPT/で利用可能です。

Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of the Launchpad light effect, and provide a more accessible approach for beginners to create music visualization with this instrument, we proposed the LaunchpadGPT model to generate music visualization designs on Launchpad automatically. Based on the language model with excellent generation ability, our proposed LaunchpadGPT takes an audio piece of music as input and outputs the lighting effects of Launchpad-playing in the form of a video (Launchpad-playing video). We collect Launchpad-playing videos and process them to obtain music and corresponding video frame of Launchpad-playing as prompt-completion pairs, to train the language model. The experiment result shows the proposed method can create better music visualization than random generation methods and hold the potential for a broader range of music visualization applications. Our code is available at https://github.com/yunlong10/LaunchpadGPT/.

翻訳日:2023-07-25 20:27:40 公開日:2023-07-23

# AppleとAppleの比較: ユーザレビューからアスペクト対応の比較文を生成する

Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews ( http://arxiv.org/abs/2307.03691v2 )

ライセンス: Link先を確認

Jessica Echterhoff, An Yan, Julian McAuley

(参考訳) 多くの類似の選択肢の中で最良の製品を見つけるのに時間がかかります。比較文は、目立った項目の重要な特徴を強調する方法で、ある項目と他の項目を対比するのに役立ちます。 1つまたは複数の項目のレビューと関連する項目の特徴を考慮し、比較レビュー文を生成し、ユーザーが最適な項目を見つけるのに役立つ。具体的には,変換器内の3つの連続成分からなるモデルについて述べる。 (i)比較対象品目を符号化する商品符号化モジュール (ii)自己回帰的な比較文を生成する比較生成モジュール (iii)ユーザパーソナライズのための新しい復号化方法我々のパイプラインは、流動的で多様な比較文を生成する。我々は、人間の評価研究において、生成した文の関連性と忠実性に関する実験を行い、アルゴリズムが関連する真理のある比較レビュー文を作成することを発見した。

It is time-consuming to find the best product among many similar alternatives. Comparative sentences can help by contrasting one item with others in a way that highlights the important features that make it stand out. Given reviews of one or multiple items and relevant item features, we generate comparative review sentences to help users find the best fit. Specifically, our model consists of three successive components in a transformer: (i) an item encoding module to encode an item for comparison, (ii) a comparison generation module that generates comparative sentences in an autoregressive manner, and (iii) a novel decoding method for user personalization. We show that our pipeline generates fluent and diverse comparative sentences. We run experiments on the relevance and fidelity of our generated sentences in a human evaluation study and find that our algorithm creates comparative review sentences that are relevant and truthful.

翻訳日:2023-07-25 20:27:09 公開日:2023-07-23

# テンソル分解と制御理論の出会い:線形力学系の一般混合の学習

Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems ( http://arxiv.org/abs/2307.06538v2 )

ライセンス: Link先を確認

Ainesh Bakshi, Allen Liu, Ankur Moitra, Morris Yau

(参考訳) 最近、ChenとPoorは線形力学系の混合学習の研究を始めた。線形力学系はすでに時系列データのモデリングに広範囲の応用があるが、混合モデルを用いることで、データへの当てはまりが良くなり、さらにはデータに表される下位のサブポピュレーションのよりリッチな理解につながる可能性がある。本研究では、テンソル分解に基づく線形力学系の混合を学習するための新しいアプローチを提案する。その結果、本アルゴリズムは成分に対する強い分離条件なしに成功し、軌道のベイズ最適クラスタリングと競合することができる。さらに本アルゴリズムは、困難な部分観測の設定でも動作する。我々の出発点は、古典的なHo-Kalmanアルゴリズムが、潜在変数モデルを学習するための現代のテンソル分解法と密接な関係にあるという、単純だが強力な観察である。これにより、より複雑な生成モデルで動作するように拡張するためのプレイブックが得られる。

Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.
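The Ho-Kalman connection mentioned above rests on a standard fact: the Hankel matrix built from the Markov parameters C A^k B of a linear dynamical system has rank equal to the state dimension. The following is a minimal sketch of that fact only, not the paper's tensor-decomposition algorithm. The 2-state system and all helper names are hypothetical and chosen for illustration.

```python
# Illustrative sketch (not the paper's algorithm): Markov parameters of a
# small linear dynamical system and the low-rank Hankel matrix they form.

def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def markov_parameters(A, B, C, count):
    """Return the scalars C A^k B for k = 0..count-1 (single input/output)."""
    params = []
    AkB = B  # holds A^k B, starting with k = 0
    for _ in range(count):
        params.append(mat_mul(C, AkB)[0][0])
        AkB = mat_mul(A, AkB)
    return params

def hankel(params, size):
    """Build the size x size Hankel matrix H[i][j] = params[i + j]."""
    return [[params[i + j] for j in range(size)] for i in range(size)]

def det(M):
    """Determinant by cofactor expansion (adequate for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

# Hypothetical 2-state system, chosen only for illustration.
A = [[0.5, 0.0], [0.0, -0.3]]
B = [[1.0], [1.0]]
C = [[1.0, 1.0]]

h = markov_parameters(A, B, C, 5)
H2 = hankel(h, 2)  # full rank: the state dimension is 2
H3 = hankel(h, 3)  # rank-deficient: rank cannot exceed the state dimension

print("det of 2x2 Hankel:", det(H2))
print("det of 3x3 Hankel:", det(H3))
```

The 2x2 Hankel matrix is invertible while the 3x3 one is singular up to rounding, which is the rank structure that Ho-Kalman-style methods exploit to recover a state-space realization.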

翻訳日:2023-07-25 20:18:09 公開日:2023-07-23

# NetGPT: パーソナライズされた生成サービスの提供を超えて、ネイティブAIネットワークアーキテクチャ

NetGPT: A Native-AI Network Architecture Beyond Provisioning Personalized Generative Services ( http://arxiv.org/abs/2307.06148v2 )

ライセンス: Link先を確認

Yuxuan Chen, Rongpeng Li, Zhifeng Zhao, Chenghui Peng, Jianjun Wu, Ekram Hossain, and Honggang Zhang

(参考訳) 大規模言語モデル(LLM)は、生成情報によって日常生活を支援することに大きな成功を収めており、LLMのパーソナライゼーションは、人間の意図との整合性の向上により、その応用にさらに貢献する可能性がある。パーソナライズされた生成サービスに向けて、クラウドとエッジの協調は有望な方法論に思える。異種分散通信とコンピューティングリソースの効率的なオーケストレーションを促進するからである。本稿では、複数のクラウドエッジ協調技術の長所と短所を議論した後、計算能力に応じて適切なLLMをエッジとクラウドにデプロイするNetGPTを提案する。さらに、エッジLLMは、パーソナライズされたプロンプト補完のために位置情報を効率的に活用することができ、クラウドLLMとのインタラクションに利益をもたらす。エッジとクラウドに代表的なオープンソースLLM(例えばGPT-2-baseとLLaMAモデル)をデプロイした後、低ランク適応に基づく軽量な微調整によってNetGPTの実現可能性を示す。続いて、NetGPTに向けたネイティブ人工知能(AI)ネットワークアーキテクチャに必要となる重要な変更を強調し、特に通信とコンピューティングリソースのより深い統合と論理的AIワークフローの慎重な校正に焦点を当てる。さらに、エッジLLMがトレンドを予測し意図を推測する驚くべき能力を備えていることから、NetGPTの副産物的なメリットをいくつか示す。これは、インテリジェントなネットワーク管理とオーケストレーションのための統一的なソリューションにつながる可能性がある。簡単に言うと、NetGPTはパーソナライズされた生成サービスのプロビジョニングを超えた、有望なネイティブAIネットワークアーキテクチャであると主張する。

Large language models (LLMs) have been tremendously successful in empowering daily life with generative information, and the personalization of LLMs could further contribute to their applications due to better alignment with human intents. Towards personalized generative services, a collaborative cloud-edge methodology sounds promising, as it facilitates the effective orchestration of heterogeneous distributed communication and computing resources. In this article, after discussing the pros and cons of several candidate cloud-edge collaboration techniques, we put forward NetGPT to capably deploy appropriate LLMs at the edge and the cloud in accordance with their computing capacity. In addition, edge LLMs could efficiently leverage location-based information for personalized prompt completion, thus benefiting the interaction with cloud LLMs. After deploying representative open-source LLMs (e.g., GPT-2-base and LLaMA model) at the edge and the cloud, we demonstrate the feasibility of NetGPT on the basis of low-rank adaptation-based light-weight fine-tuning. Subsequently, we highlight the substantial changes required for a native artificial intelligence (AI) network architecture towards NetGPT, with special emphasis on deeper integration of communications and computing resources and careful calibration of logical AI workflow. Furthermore, we demonstrate several by-product benefits of NetGPT, given edge LLMs' astonishing capability to predict trends and infer intents, which possibly leads to a unified solution for intelligent network management & orchestration. In a nutshell, we argue that NetGPT is a promising native-AI network architecture beyond provisioning personalized generative services.
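The abstract relies on low-rank adaptation (LoRA) for light-weight fine-tuning. As a rough sketch of the general LoRA idea, not of the NetGPT implementation, the frozen weight W is augmented by a trainable low-rank product scaled by alpha / r. The matrices, sizes, and function names below are hypothetical.

```python
# Minimal sketch of a low-rank adaptation (LoRA) update on one weight matrix.
# Shapes: W is d_out x d_in (frozen), A is r x d_in, B is d_out x r.

def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_forward(W, A, B, x, alpha):
    """Compute (W + (alpha / r) * B A) x without forming B A explicitly."""
    r = len(A)  # rank = number of rows of A
    Wx = mat_mul(W, x)
    BAx = mat_mul(B, mat_mul(A, x))
    scale = alpha / r
    return [[Wx[i][0] + scale * BAx[i][0]] for i in range(len(Wx))]

def trainable_fraction(d_out, d_in, r):
    """Share of parameters that are trained when only A and B are updated."""
    return r * (d_in + d_out) / (d_out * d_in)

# Hypothetical toy values, for illustration only.
W = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
A = [[0.1, 0.2, 0.3]]          # rank r = 1
x = [[1.0], [1.0], [1.0]]

B_init = [[0.0], [0.0]]        # B starts at zero, so the adapter is a no-op
B_tuned = [[1.0], [2.0]]       # pretend values after some fine-tuning

base_output = lora_forward(W, A, B_init, x, alpha=2.0)
tuned_output = lora_forward(W, A, B_tuned, x, alpha=2.0)
fraction = trainable_fraction(4096, 4096, 8)
```

Because only A and B are trained, the number of updated parameters grows with r * (d_in + d_out) rather than d_in * d_out, which is what makes this kind of fine-tuning plausible on edge hardware.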

翻訳日:2023-07-25 20:17:28 公開日:2023-07-23

# RepViT: ViTの視点からモバイルCNNを再考

RepViT: Revisiting Mobile CNN From ViT Perspective ( http://arxiv.org/abs/2307.09283v2 )

ライセンス: Link先を確認

Ao Wang, Hui Chen, Zijia Lin, Hengjun Pu, Guiguang Ding

(参考訳) 近年、軽量Vision Transformer(ViT)は、リソース制約のあるモバイルデバイス上で、軽量畳み込みニューラルネットワーク(CNN)と比較して優れた性能と低レイテンシを示している。この改善は通常、モデルがグローバル表現を学習できるようにするマルチヘッド自己注意モジュールによるものとされる。しかし、軽量ViTと軽量CNNのアーキテクチャ上の差異は十分に検討されていない。本研究では、軽量CNNの効率的な設計を再考し、モバイルデバイスにおけるその可能性を強調する。我々は、軽量ViTの効率的なアーキテクチャ選択を統合することで、標準的な軽量CNN、具体的にはMobileNetV3のモバイルフレンドリ性を段階的に強化する。最終的に、純粋な軽量CNNの新しいファミリーであるRepViTが得られる。大規模な実験によると、RepViTは既存の最先端の軽量ViTよりも優れており、様々なビジョンタスクにおいて好ましいレイテンシを示している。ImageNetでは、RepViTはiPhone 12上で約1msのレイテンシで80%以上のトップ1精度を達成しており、我々の知る限り、これは軽量モデルとして初めてである。我々の最大のモデルであるRepViT-M3は、わずか1.3msのレイテンシで81.4%の精度を得る。コードとトレーニング済みモデルは https://github.com/jameslahm/RepViT で入手できる。

Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on resource-constrained mobile devices. This improvement is usually attributed to the multi-head self-attention module, which enables the model to learn global representations. However, the architectural disparities between lightweight ViTs and lightweight CNNs have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs and emphasize their potential for mobile devices. We incrementally enhance the mobile-friendliness of a standard lightweight CNN, specifically MobileNetV3, by integrating the efficient architectural choices of lightweight ViTs. This results in a new family of pure lightweight CNNs, namely RepViT. Extensive experiments show that RepViT outperforms existing state-of-the-art lightweight ViTs and exhibits favorable latency in various vision tasks. On ImageNet, RepViT achieves over 80% top-1 accuracy with nearly 1 ms latency on an iPhone 12, which is the first time for a lightweight model, to the best of our knowledge. Our largest model, RepViT-M3, obtains 81.4% accuracy with only 1.3 ms latency. The code and trained models are available at https://github.com/jameslahm/RepViT.

翻訳日:2023-07-25 20:07:34 公開日:2023-07-23

# TokenFlow: 一貫したビデオ編集のための一貫した拡散特徴

TokenFlow: Consistent Diffusion Features for Consistent Video Editing ( http://arxiv.org/abs/2307.10373v2 )

ライセンス: Link先を確認

Michal Geyer and Omer Bar-Tal and Shai Bagon and Tali Dekel

(参考訳) 生成的AI革命は、最近ビデオにまで拡大した。それでも、現在の最先端のビデオモデルは、生成したコンテンツの視覚的品質とユーザコントロールの観点から、画像モデルに遅れを取っている。本稿では,テキストから画像への拡散モデルのパワーをテキスト駆動ビデオ編集のタスクに活用するフレームワークを提案する。具体的には、ソースビデオとターゲットテキストプロンプトを与えられた場合、入力ビデオの空間レイアウトと動きを維持しながら、対象テキストに準拠した高品質な映像を生成する。本手法は, 拡散特徴空間の一貫性を強制することにより, 編集映像の一貫性が得られることを示す。モデルで容易に利用できるフレーム間対応に基づいて拡散特徴を明示的に伝播することにより、これを実現できる。したがって,本フレームワークはトレーニングや微調整を一切必要とせず,市販のテキスト画像編集手法と併用できる。実世界の様々なビデオで最先端の編集結果を示す。 Webページ: https://diffusion-tokenflow.github.io/

The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we present a framework that harnesses the power of a text-to-image diffusion model for the task of text-driven video editing. Specifically, given a source video and a target text-prompt, our method generates a high-quality video that adheres to the target text, while preserving the spatial layout and motion of the input video. Our method is based on a key observation that consistency in the edited video can be obtained by enforcing consistency in the diffusion feature space. We achieve this by explicitly propagating diffusion features based on inter-frame correspondences, readily available in the model. Thus, our framework does not require any training or fine-tuning, and can work in conjunction with any off-the-shelf text-to-image editing method. We demonstrate state-of-the-art editing results on a variety of real-world videos. Webpage: https://diffusion-tokenflow.github.io/
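The key idea above is propagating diffusion features along inter-frame correspondences. The abstract does not spell out how correspondences are computed, so the sketch below assumes a simple nearest-neighbor match between the source features of a frame and those of a keyframe. The function names and toy feature vectors are hypothetical and only illustrate the propagation step.

```python
# Hedged sketch: copy edited keyframe features to another frame by following
# nearest-neighbor correspondences computed on the ORIGINAL (source) features.

def nearest_neighbor(query, candidates):
    """Index of the candidate vector closest to query (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for i, cand in enumerate(candidates):
        dist = sum((q - c) ** 2 for q, c in zip(query, cand))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index

def propagate_features(src_key, src_frame, edited_key):
    """For each token of the frame, reuse the edited feature of its match."""
    correspondences = [nearest_neighbor(tok, src_key) for tok in src_frame]
    return [edited_key[i] for i in correspondences]

# Toy 2-D "features" for three tokens per frame (illustrative values only).
src_key = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]      # keyframe, before editing
src_frame = [[0.9, 0.1], [0.1, 0.9], [0.0, 0.1]]    # other frame, before editing
edited_key = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]  # keyframe, after editing

edited_frame = propagate_features(src_key, src_frame, edited_key)
```

Since every frame draws its edited features from the same keyframe tokens, tokens that corresponded before editing still agree after editing, which is the consistency property the abstract describes.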

翻訳日:2023-07-25 19:58:19 公開日:2023-07-23

# SentimentGPT:高度な感情分析のためのGPTの活用と現在の機械学習からの脱却

SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its Departure from Current Machine Learning ( http://arxiv.org/abs/2307.10234v2 )

ライセンス: Link先を確認

Kiana Kheiri and Hamid Karimi

(参考訳) 本研究では,感情分析におけるGPT(Generative Pretrained Transformer)の方法論について,特にSemEval 2017データセットのタスク4の文脈で詳細に検討する。主な戦略は3つある。 1)先進gpt-3.5ターボを用いたプロンプトエンジニアリング 2)微調整GPTモデル、及び 3)埋め込み分類への創発的アプローチ。この研究は、これらの戦略と個々のgptモデル間の詳細な比較洞察をもたらし、その特異な強みと潜在的な限界を明らかにする。さらに、この研究では、これらのGPTベースの方法論と、以前同じデータセットで使用されていた他の高性能モデルと比較した。その結果, GPT手法の予測性能において, F1スコアの22%以上において, 最先端と比較して有意な優位性を示した。さらに、文脈理解や皮肉の検出など、感情分析タスクにおける共通の課題について考察する。これらの複雑さを効果的に扱うため、GPTモデルの強化機能を強調している。これらの知見は、感情分析におけるGPTモデルの可能性を強調し、今後の研究の舞台となる。コードはhttps://github.com/DSAatUSU/SentimentGPTで見ることができる。

This study presents a thorough examination of various Generative Pretrained Transformer (GPT) methodologies in sentiment analysis, specifically in the context of Task 4 on the SemEval 2017 dataset. Three primary strategies are employed: 1) prompt engineering using the advanced GPT-3.5 Turbo, 2) fine-tuning GPT models, and 3) an inventive approach to embedding classification. The research yields detailed comparative insights among these strategies and individual GPT models, revealing their unique strengths and potential limitations. Additionally, the study compares these GPT-based methodologies with other current, high-performing models previously used with the same dataset. The results illustrate the significant superiority of the GPT approaches in terms of predictive performance, with an improvement of more than 22% in F1-score over the state of the art. Further, the paper sheds light on common challenges in sentiment analysis tasks, such as understanding context and detecting sarcasm. It underscores the enhanced capabilities of the GPT models to effectively handle these complexities. Taken together, these findings highlight the promising potential of GPT models in sentiment analysis, setting the stage for future research in this field. The code can be found at https://github.com/DSAatUSU/SentimentGPT

翻訳日:2023-07-25 19:58:04 公開日:2023-07-23

# 心筋SPECT画像再構成のためのトランスフォーマーベースデュアルドメインネットワーク

Transformer-based Dual-domain Network for Few-view Dedicated Cardiac SPECT Image Reconstructions ( http://arxiv.org/abs/2307.09624v2 )

ライセンス: Link先を確認

Huidong Xie, Bo Zhou, Xiongchao Chen, Xueqi Guo, Stephanie Thorn, Yi-Hwa Liu, Ge Wang, Albert Sinusas, Chi Liu

(参考訳) 心血管疾患(CVD)は世界中で死因の第1位であり、SPECTを用いた心筋灌流イメージングはCVDの診断に広く用いられている。GE 530/570c心臓専用SPECTスキャナは静止型のジオメトリを採用し、19個の投影を同時に取得することで感度を高め、ダイナミックイメージングを実現する。しかし、角度サンプリング数が限られているため画質に悪影響が生じる。静止データから高品質な画像を生成するためにディープラーニング手法を用いることができる。これは本質的には少数ビューのイメージング問題である。本研究では、高品質な3D心臓SPECT画像再構成のための新しい3Dトランスフォーマーベースのデュアルドメインネットワーク TIP-Net を提案する。本手法はまず、カスタマイズした投影領域から画像領域へのトランスフォーマーにより、反復再構成プロセスを経ずに投影データから直接3D心臓SPECT画像を再構成する。次に、その再構成出力と元の少数ビュー再構成を入力として、画像領域の再構成ネットワークを用いて再構成をさらに洗練する。心臓カテーテル画像、核医学循環器科医による診断所見、およびFDA 510(k)認可済みの臨床ソフトウェアで定量化した欠損サイズによって検証した結果、本手法はヒトを対象とした検討において従来のベースライン手法よりも心筋欠損のコントラストが高い画像を生成した。これにより、静止型の少数ビュー心臓専用SPECTスキャナを用いた高品質な欠損の可視化が可能になると期待される。

Cardiovascular disease (CVD) is the leading cause of death worldwide, and myocardial perfusion imaging using SPECT has been widely used in the diagnosis of CVDs. The GE 530/570c dedicated cardiac SPECT scanners adopt a stationary geometry to simultaneously acquire 19 projections to increase sensitivity and achieve dynamic imaging. However, the limited amount of angular sampling negatively affects image quality. Deep learning methods can be implemented to produce higher-quality images from stationary data. This is essentially a few-view imaging problem. In this work, we propose a novel 3D transformer-based dual-domain network, called TIP-Net, for high-quality 3D cardiac SPECT image reconstructions. Our method aims to first reconstruct 3D cardiac SPECT images directly from projection data without the iterative reconstruction process by proposing a customized projection-to-image domain transformer. Then, given its reconstruction output and the original few-view reconstruction, we further refine the reconstruction using an image-domain reconstruction network. Validated by cardiac catheterization images, diagnostic interpretations from nuclear cardiologists, and defect size quantified by an FDA 510(k)-cleared clinical software, our method produced images with higher cardiac defect contrast on human studies compared with previous baseline methods, potentially enabling high-quality defect visualization using stationary few-view dedicated cardiac SPECT scanners.

翻訳日:2023-07-25 19:56:01 公開日:2023-07-23

# モナディック深層学習

Monadic Deep Learning ( http://arxiv.org/abs/2307.12187v1 )

ライセンス: Link先を確認

Bo Yang, Zhihao Zhang, Kirisame Marisa, and Kai Shi

(参考訳) JavaとScalaコミュニティは、非常に成功したビッグデータエコシステムを構築しました。しかし、それ上で動作するニューラルネットワークのほとんどは動的型付けプログラミング言語でモデル化されている。これらの動的型付きディープラーニングフレームワークは、ニューラルネットワークを多くのトレーニング可能な変数を含む微分可能な表現として扱い、トレーニング時にそれらの表現を自動微分する。 2019年まで、静的型付け言語における学習フレームワークは、従来のフレームワークの表現力を提供していなかった。ユーザは、ハードコードされたバックプロパゲーションのために多くの定型コードを作成しない限り、カスタムアルゴリズムを使用できない。 DeepLearning.scalaでこの問題を解決しました。 1. 複数のトレーニング可能な変数を含む静的型付き関数に対して,逆モードで自動微分を行う新しい手法を発見し,メタ言語と自由に相互運用できるようにした。 2. 動的ニューラルネットワークを表現するモナド表現をユーザが作成できるように,モナドとモナド変換器のセットを設計した。 3 これらのモナドとともに、複数の計算を並列に行うための応用的関手を提供する。これらの機能により、DeepLearning.scalaのユーザは、直感的で簡潔な方法で複雑なニューラルネットワークを作成でき、型安全性を維持できた。

The Java and Scala community has built a very successful big data ecosystem. However, most of the neural networks running on it are modeled in dynamically typed programming languages. These dynamically typed deep learning frameworks treat neural networks as differentiable expressions that contain many trainable variables, and perform automatic differentiation on those expressions when training them. Until 2019, none of the learning frameworks in statically typed languages provided the expressive power of traditional frameworks. Their users were not able to use custom algorithms without writing plenty of boilerplate code for hard-coded back-propagation. We solved this problem in DeepLearning.scala 2. Our contributions are: 1. We discovered a novel approach to perform automatic differentiation in reverse mode for statically typed functions that contain multiple trainable variables and can interoperate freely with the metalanguage. 2. We designed a set of monads and monad transformers, which allow users to create monadic expressions that represent dynamic neural networks. 3. Along with these monads, we provide some applicative functors to perform multiple calculations in parallel. With these features, users of DeepLearning.scala were able to create complex neural networks in an intuitive and concise way while still maintaining type safety.
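Reverse-mode automatic differentiation over expressions with several trainable variables is central to the abstract above. The sketch below shows the generic technique in plain Python. It is not the monadic, statically typed design of DeepLearning.scala, and the `Var` class and `backward` function are hypothetical names used only for illustration.

```python
# Generic reverse-mode automatic differentiation on scalar expressions.
# Each Var remembers its parents and the local derivative with respect to each.

class Var:
    """A scalar value that records how to push gradients back to its inputs."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # sequence of (parent_var, local_derivative)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backward(output):
    """Accumulate d(output)/d(var) into var.grad for every reachable var."""
    # Build a topological order so that a node's gradient is complete
    # before it is propagated to its parents.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

# Two trainable variables (w, b) and one input (x); loss = (w * x + b)^2.
w, b, x = Var(3.0), Var(1.0), Var(2.0)
y = w * x + b
loss = y * y
backward(loss)
```

After `backward(loss)`, `w.grad` and `b.grad` hold the derivatives of the loss with respect to each trainable variable, so a training step can update both from a single backward pass.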

翻訳日:2023-07-25 17:30:17 公開日:2023-07-23

# FATRER:高精度かつロバストな会話感情認識のためのフルアテンショントピック正規化器

FATRER: Full-Attention Topic Regularizer for Accurate and Robust Conversational Emotion Recognition ( http://arxiv.org/abs/2307.12221v1 )

ライセンス: Link先を確認

Yuzhao Mao, Di Lu, Xiaojie Wang, Yang Zhang

(参考訳) 本稿では,会話発話における対話者の感情の理解に焦点をあてる。この文献における先行研究は、主により正確な感情予測に焦点をあて、一方で、局所的な文脈が敵の攻撃によって破壊されるときのモデル堅牢性を無視している。正確性を確保しつつ頑健性を維持するため,会話中のローカルコンテキストをモデル化する際の感情関連グローバルビューを可能にする,フルアテンショントピック正規化器によって強化された感情認識器を提案する。表現と損失の両方の観点から正規化を実装するために,共同トピックモデリング戦略を導入する。過剰規則化を避けるため、従来のトピックモデリングに存在する事前分布の制約を廃止し、アテンションアライメントに基づく確率的近似を行う。実験により,我々のモデルは最先端モデルよりも好適な結果が得られ,3種類の敵攻撃による強靭性が得られることが示された。

This paper concentrates on the understanding of interlocutors' emotions evoked in conversational utterances. Previous studies in this literature mainly focus on more accurate emotional predictions, while ignoring model robustness when the local context is corrupted by adversarial attacks. To maintain robustness while ensuring accuracy, we propose an emotion recognizer augmented by a full-attention topic regularizer, which enables an emotion-related global view when modeling the local context in a conversation. A joint topic modeling strategy is introduced to implement regularization from both representation and loss perspectives. To avoid over-regularization, we drop the constraints on prior distributions that exist in traditional topic modeling and perform probabilistic approximations based entirely on attention alignment. Experiments show that our models obtain more favorable results than state-of-the-art models, and gain convincing robustness under three types of adversarial attacks.

翻訳日:2023-07-25 17:22:03 公開日:2023-07-23

# 漸進的で寛容な教師信号による高解像度リモートセンシング画像からの建物フットプリント分割の高速化

Expediting Building Footprint Segmentation from High-resolution Remote Sensing Images via progressive lenient supervision ( http://arxiv.org/abs/2307.12220v1 )

ライセンス: Link先を確認

Haonan Guo, Bo Du, Chen Wu, Xin Su, Liangpei Zhang

(参考訳) リモートセンシング画像からのビルフットプリントセグメンテーションの有効性は,モデル転送の有効性によって阻害されている。既存の多くのビルセグメンテーション手法は、イメージネットで事前学習された新しく開発されたバックボーンネットワークからエンコーダを微調整したu-netのエンコーダ-デコーダアーキテクチャに基づいて開発された。しかし、既存のデコーダ設計の重い計算負荷は、これらの現代のエンコーダネットワークをリモートセンシングタスクに移すのを妨げている。広く採用されている深層監視戦略でさえ、フォアグラウンドと背景画素が混在するハイブリッド領域において、これらの課題を軽減できない。本稿では,既存のデコーダネットワークの設計を包括的に評価し,学習効率と有効性を高めるためにbfsegと呼ばれる効率的な枠組みを提案する。具体的には,大規模にまたがる簡易かつ高速な特徴融合を容易にする高密結合型特徴融合デコーダネットワークを提案する。さらに,深層監視過程におけるダウンサンプリング・グラウンド真理におけるハイブリッド領域の無効性を考慮して,ネットワークが深層監視から適切な知識を学習できる寛大な深層監視・蒸留戦略を提案する。これらの進歩を基盤として、我々は、広範囲の新規開発エンコーダネットワークにまたがる性能と効率の優れた先行研究を一貫して超越した、建築セグメンテーションネットワークの新たなファミリーを開発した。コードはhttps://github.com/HaonanGuo/BFSeg-Efficient-Building-Footprint-Segmentation-Frameworkでリリースされる。

The efficacy of building footprint segmentation from remotely sensed images has been hindered by model transfer effectiveness. Many existing building segmentation methods were developed upon the encoder-decoder architecture of U-Net, in which the encoder is finetuned from the newly developed backbone networks that are pre-trained on ImageNet. However, the heavy computational burden of the existing decoder designs hampers the successful transfer of these modern encoder networks to remote sensing tasks. Even the widely-adopted deep supervision strategy fails to mitigate these challenges due to its invalid loss in hybrid regions where foreground and background pixels are intermixed. In this paper, we conduct a comprehensive evaluation of existing decoder network designs for building footprint segmentation and propose an efficient framework denoted as BFSeg to enhance learning efficiency and effectiveness. Specifically, a densely-connected coarse-to-fine feature fusion decoder network that facilitates easy and fast feature fusion across scales is proposed. Moreover, considering the invalidity of hybrid regions in the down-sampled ground truth during the deep supervision process, we present a lenient deep supervision and distillation strategy that enables the network to learn proper knowledge from deep supervision. Building upon these advancements, we have developed a new family of building segmentation networks, which consistently surpass prior works with outstanding performance and efficiency across a wide range of newly developed encoder networks. The code will be released on https://github.com/HaonanGuo/BFSeg-Efficient-Building-Footprint-Segmentation-Framework.

翻訳日:2023-07-25 17:21:47 公開日:2023-07-23

# 生成補間による分類器の分散外ロバスト性の向上

Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation ( http://arxiv.org/abs/2307.12219v1 )

ライセンス: Link先を確認

Haoyue Bai, Ceyuan Yang, Yinghao Xu, S.-H. Gary Chan, Bolei Zhou

(参考訳) ディープニューラルネットワークは、独立同分布(i.i.d.)のデータからの学習において優れた性能を達成する。しかし、トレーニングデータとテストデータが異なる分布から引き出される分布外(OoD)データを扱う場合、その性能は著しく低下する。本稿では、生成モデルをデータ拡張源として活用して、ニューラル分類器の分布外ロバスト性を改善することを検討する。具体的には、多様なOoDサンプルを合成するために、複数のドメインから学習した生成モデルを融合させる、単純かつ効果的な生成補間(Generative Interpolation)法を開発した。ソースドメイン上で生成モデルを直接トレーニングすると、モード崩壊に悩まされやすく、時にはデータバイアスを増幅する。代わりに、まず1つのソースドメイン上でStyleGANモデルをトレーニングし、それを他のドメインで微調整する。これにより、同じ初期化を共有するためモデルパラメータが整合した、互いに相関する多数のジェネレータが得られる。次に、ジェネレータのモデルパラメータを線形に補間し、新しいジェネレータの集合を生成する。このような補間されたジェネレータは、分類器を訓練するための追加のデータ拡張源として使用される。補間係数によって、拡張の方向と強度を柔軟に制御することができる。また、生成したOoDサンプルの多様性をさらに向上させるために、スタイル混合機構を適用する。実験の結果、提案手法はトレーニングドメインの多様性を明示的に高め、複数のデータセットと複数の異なる分布シフトにわたってベースラインに対する一貫した改善を達成することが示された。

Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data. However, their performance deteriorates significantly when handling out-of-distribution (OoD) data, where the training and test data are drawn from different distributions. In this paper, we explore using generative models as a data augmentation source for improving the out-of-distribution robustness of neural classifiers. Specifically, we develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples. Training a generative model directly on the source domains tends to suffer from mode collapse and sometimes amplifies the data bias. Instead, we first train a StyleGAN model on one source domain and then fine-tune it on the other domains, resulting in many correlated generators whose model parameters share the same initialization and are therefore aligned. We then linearly interpolate the model parameters of the generators to spawn new sets of generators. Such interpolated generators are used as an extra data augmentation source to train the classifiers. The interpolation coefficients can flexibly control the augmentation direction and strength. In addition, a style-mixing mechanism is applied to further improve the diversity of the generated OoD samples. Our experiments show that the proposed method explicitly increases the diversity of training domains and achieves consistent improvements over baselines across datasets and multiple different distribution shifts.
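The parameter interpolation step described above can be sketched in a few lines. This is a minimal illustration that assumes each generator is represented as a dictionary of flat parameter lists with identical names and shapes. The parameter names and values are hypothetical, and real StyleGAN weights would be tensors.

```python
# Minimal sketch of linear interpolation between two aligned generators.

def interpolate_parameters(params_a, params_b, coeff):
    """Return (1 - coeff) * params_a + coeff * params_b, name by name."""
    if params_a.keys() != params_b.keys():
        raise ValueError("generators must share the same parameter names")
    return {
        name: [(1.0 - coeff) * a + coeff * b
               for a, b in zip(params_a[name], params_b[name])]
        for name in params_a
    }

# Hypothetical toy parameters: a generator trained on one source domain and
# a copy fine-tuned on another domain (same initialization, hence aligned).
gen_source = {"layer1": [0.0, 2.0], "layer2": [1.0]}
gen_finetuned = {"layer1": [1.0, 4.0], "layer2": [3.0]}

# The coefficient controls how far the new generator moves toward the
# fine-tuned domain; several coefficients yield a family of generators.
gen_quarter = interpolate_parameters(gen_source, gen_finetuned, 0.25)
gen_family = [interpolate_parameters(gen_source, gen_finetuned, c)
              for c in (0.0, 0.5, 1.0)]
```

Interpolating weights is only meaningful here because the generators share an initialization. Independently trained networks generally have no such parameter alignment.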

翻訳日:2023-07-25 17:21:21 公開日:2023-07-23

# 人工知能規制政策の総合的レビューと体系的分析

A Comprehensive Review and Systematic Analysis of Artificial Intelligence Regulation Policies ( http://arxiv.org/abs/2307.12218v1 )

ライセンス: Link先を確認

Weiyue Wu, Shaoshan Liu

(参考訳) 世界各国の文化と統治の違いにより、現在、AI規制政策の提案は多岐にわたっており、グローバルなAI規制の分野に混乱をもたらしている。AI技術を適切に規制することは、法的制約と技術開発の間の微妙なバランスを必要とするため、非常に難しい。本稿では、まず、異なる地理的な場所と文化的背景からのAI規制の提案を包括的にレビューする。次に、歴史的教訓を踏まえて、AI規制提案の徹底的な分析を容易にする枠組みを開発する。最後に、これらのAI規制提案を体系的に分析し、それぞれの提案がどのように失敗しうるかを理解する。歴史的教訓と分析方法を含む本研究は、統治機関が分割統治的な方法でAI規制の混乱を解きほぐす一助となることを目的としている。

Due to the cultural and governance differences among countries around the world, there currently exists a wide spectrum of AI regulation policy proposals, which has created chaos in the global AI regulatory space. Properly regulating AI technologies is extremely challenging, as it requires a delicate balance between legal restrictions and technological developments. In this article, we first present a comprehensive review of AI regulation proposals from different geographical locations and cultural backgrounds. Then, drawing from historical lessons, we develop a framework to facilitate a thorough analysis of AI regulation proposals. Finally, we perform a systematic analysis of these AI regulation proposals to understand how each proposal may fail. This study, containing historical lessons and analysis methods, aims to help governing bodies untangle the AI regulatory chaos in a divide-and-conquer manner.

翻訳日:2023-07-25 17:20:56 公開日:2023-07-23

# LoLep: 局所学習平面と自己認識オクルージョン推論を用いた単一ビュービュー合成

LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference ( http://arxiv.org/abs/2307.12217v1 )

ライセンス: Link先を確認

Cong Wang, Yu-Ping Wang, Dinesh Manocha

(参考訳) 本稿では、1枚のRGB画像から局所学習平面(Locally-Learned planes)を回帰してシーンを正確に表現し、より良い新規ビューを生成する新しい手法LoLepを提案する。深度情報がない場合、適切な平面位置を回帰することは難しい問題である。この問題を解決するために、視差空間をあらかじめビンに分割し、各ビン内の複数の平面に対する局所オフセットを回帰する視差サンプラーを設計する。しかし、このようなサンプラーを用いるだけではネットワークは収束しない。そこで、データセットごとに異なる視差分布と組み合わせる2つの最適化戦略を提案し、さらに単純かつ効果的な幾何学的教師信号として、オクルージョンを考慮した再投影損失を提案する。また、オクルージョン推論を改善する自己注意機構を導入し、大きな特徴マップに自己注意を適用する際の問題に対処するBlock-Sampling Self-Attention(BS-SA)モジュールを提案する。提案手法の有効性を実証し、異なるデータセットで最先端の結果を得る。MINEと比較して、LPIPSは4.8%-9.0%、RVは83.1%-84.7%減少する。また、実世界の画像における性能を評価し、その効果を示す。

We propose a novel method, LoLep, which regresses Locally-Learned planes from a single RGB image to represent scenes accurately, thus generating better novel views. Without depth information, regressing appropriate plane locations is a challenging problem. To solve this issue, we pre-partition the disparity space into bins and design a disparity sampler to regress local offsets for multiple planes in each bin. However, using only such a sampler prevents the network from converging; we therefore propose two optimization strategies that combine with the different disparity distributions of datasets, and propose an occlusion-aware reprojection loss as a simple yet effective geometric supervision technique. We also introduce a self-attention mechanism to improve occlusion inference and present a Block-Sampling Self-Attention (BS-SA) module to address the problem of applying self-attention to large feature maps. We demonstrate the effectiveness of our approach and generate state-of-the-art results on different datasets. Compared to MINE, our approach has an LPIPS reduction of 4.8%-9.0% and an RV reduction of 83.1%-84.7%. We also evaluate the performance on real-world images and demonstrate the benefits.
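The binning idea above (pre-partition the disparity space, then regress a local offset for each plane inside its bin) can be illustrated as follows. This is a sketch under stated assumptions: uniform bins and a sigmoid that maps an unbounded network output to an offset in (0, 1) are illustrative choices, not details taken from the abstract, and all names are hypothetical.

```python
import math

# Hedged sketch: place planes inside pre-partitioned disparity bins using
# per-plane local offsets (uniform bins and sigmoid squashing are assumed).

def bin_edges(d_min, d_max, num_bins):
    """Uniformly split [d_min, d_max] into num_bins bins; return the edges."""
    width = (d_max - d_min) / num_bins
    return [d_min + i * width for i in range(num_bins + 1)]

def plane_disparities(d_min, d_max, num_bins, raw_offsets):
    """raw_offsets[b] lists unbounded regressed values for the planes in bin b."""
    edges = bin_edges(d_min, d_max, num_bins)
    width = edges[1] - edges[0]
    planes = []
    for b, offsets in enumerate(raw_offsets):
        for raw in offsets:
            local = 1.0 / (1.0 + math.exp(-raw))  # squash to an offset in (0, 1)
            planes.append(edges[b] + local * width)
    return planes

# Four bins over disparity [0, 1]; the third bin holds two planes.
raw_offsets = [[0.0], [0.0], [-2.0, 2.0], [0.0]]
planes = plane_disparities(0.0, 1.0, 4, raw_offsets)
```

Constraining each plane to its own bin keeps the planes ordered and spread over the whole disparity range, while the regressed offsets let them adapt to the scene within each bin.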

翻訳日:2023-07-25 17:20:43 公開日:2023-07-23

# DeepCL: 距離空間におけるリモートセンシング画像の深層変化特徴学習

DeepCL: Deep Change Feature Learning on Remote Sensing Images in the Metric Space ( http://arxiv.org/abs/2307.12208v1 )

ライセンス: Link先を確認

Haonan Guo, Bo Du, Chen Wu, Chengxi Han, Liangpei Zhang

(参考訳) 変化検出(CD)は、地球表面のダイナミクスを監視するための地球観測分野において重要な課題である。ディープラーニング技術の出現は、最近、自動CDを技術革新へと駆り立てている。それでも、ディープラーニングベースのCDメソッドは、2つの主要な問題に悩まされている。 1)時間的関係モデリングの不十分 2)擬似変更誤分類これらの課題に対処するために、計量学習の強い時間的モデリング能力とセグメンテーションの顕著な適合性を補完し、堅牢で説明可能なCDのためのDeep Change Feature Learning(DeepCL)フレームワークを提案する。まず,ハードサンプルとシンプルなサンプルの重要性を強調する,ハードサンプル認識型コントラスト損失の設計を行った。この損失により、双方向リモートセンシング画像間の時間的相関を明示的にモデル化することができる。さらに、モデル化された時間関係を、変化領域を検出するセグメンテーションプロセスを導く前に知識として活用する。 deepclフレームワークは理論的にも実験的にも徹底的に評価され、優れた特徴判別性、擬似変更に対する弾力性、様々なcdアルゴリズムへの適応性を示している。広範な比較実験は、最先端cdアプローチにおけるdeepclの量的・質的優位性を実証するものである。

Change detection (CD) is an important yet challenging task in the Earth observation field for monitoring Earth surface dynamics. The advent of deep learning techniques has recently propelled automatic CD into a technological revolution. Nevertheless, deep learning-based CD methods are still plagued by two primary issues: 1) insufficient temporal relationship modeling and 2) pseudo-change misclassification. To address these issues, we complement the strong temporal modeling ability of metric learning with the prominent fitting ability of segmentation and propose a deep change feature learning (DeepCL) framework for robust and explainable CD. Firstly, we designed a hard sample-aware contrastive loss, which reweights the importance of hard and simple samples. This loss allows for explicit modeling of the temporal correlation between bi-temporal remote sensing images. Furthermore, the modeled temporal relations are utilized as knowledge prior to guide the segmentation process for detecting change regions. The DeepCL framework is thoroughly evaluated both theoretically and experimentally, demonstrating its superior feature discriminability, resilience against pseudo changes, and adaptability to a variety of CD algorithms. Extensive comparative experiments substantiate the quantitative and qualitative superiority of DeepCL over state-of-the-art CD approaches.

翻訳日:2023-07-25 17:20:22 公開日:2023-07-23

# 不可聴音声で音声起動デバイスを攻撃する敵対的エージェント

Adversarial Agents For Attacking Inaudible Voice Activated Devices ( http://arxiv.org/abs/2307.12204v1 )

ライセンス: Link先を確認

Forrest McKee and David Noever

(参考訳) 音声起動デバイスに対する不可聴攻撃の分析により、10点満点中7.6という憂慮すべきリスク係数が確認された。これは、NIST National Vulnerability Database(NVD)が独立に評価した重大なセキュリティ脆弱性を裏付けるものである。我々のベースラインネットワークモデルは、攻撃者が不可聴の音声コマンドを使用して、セキュアなラップトップ上の機密情報に不正にアクセスするシナリオを示す。このベースラインネットワークモデル上で多くの攻撃シナリオをシミュレートし、新たなハードウェアの追加やデバイススキルの増強を伴わずに、物理的アクセスを通じて、相互接続されたデバイスを大規模に悪用して特権情報を発見・掌握できる可能性を明らかにした。MicrosoftのCyberBattleSimフレームワークを使用して6つの強化学習アルゴリズムを評価したところ、活用(exploitation)を伴うDeep-Q学習が最適であり、より少ないステップで全ノードを迅速に掌握できることが分かった。我々の知見は、拡大を続けるデジタル環境において、非従来型のネットワークと新たなサイバーセキュリティ対策を理解する必要性が極めて高いことを強調する。特に、モバイルデバイス、音声起動、および近超音波域または不可聴域でステルス攻撃を行う悪意あるアクターの影響を受けやすい非線形マイクを特徴とする環境が該当する。2024年までに、この新たな攻撃面は地球上の人口よりも多くのデジタル音声アシスタントを含む可能性があるが、不可聴攻撃はマイクの設計とデジタル信号処理に本質的に起因するため、従来のパッチやファームウェア修正に比べて対策が少ない。

Our analysis of inaudible attacks on voice-activated devices confirms the alarming risk factor of 7.6 out of 10, underlining significant security vulnerabilities scored independently by NIST National Vulnerability Database (NVD). Our baseline network model showcases a scenario in which an attacker uses inaudible voice commands to gain unauthorized access to confidential information on a secured laptop. We simulated many attack scenarios on this baseline network model, revealing the potential for mass exploitation of interconnected devices to discover and own privileged information through physical access without adding new hardware or amplifying device skills. Using Microsoft's CyberBattleSim framework, we evaluated six reinforcement learning algorithms and found that Deep-Q learning with exploitation proved optimal, leading to rapid ownership of all nodes in fewer steps. Our findings underscore the critical need for understanding non-conventional networks and new cybersecurity measures in an ever-expanding digital landscape, particularly those characterized by mobile devices, voice activation, and non-linear microphones susceptible to malicious actors operating stealth attacks in the near-ultrasound or inaudible ranges. By 2024, this new attack surface might encompass more digital voice assistants than people on the planet yet offer fewer remedies than conventional patching or firmware fixes since the inaudible attacks arise inherently from the microphone design and digital signal processing.

翻訳日:2023-07-25 17:20:02 公開日:2023-07-23

# NCART: 表形式データのためのニューラル分類・回帰木

NCART: Neural Classification and Regression Tree for Tabular Data ( http://arxiv.org/abs/2307.12198v1 )

ライセンス: Link先を確認

Jiaqi Luo, Shixin Xu

(参考訳) 深層学習モデルは、決定木の限界に対処し、半教師付き学習、オンライン学習、転移学習といった有用な応用を可能にするため、表形式データの分析で人気が高まっている。しかし、これらの深層学習アプローチはしばしばトレードオフに直面する。一方では、大規模なデータセットや高次元のデータセットを扱う場合、計算コストが高くなることがある。他方では、解釈可能性に欠け、小規模なデータセットには適さない可能性がある。本研究では、これらの課題を克服するために、ニューラル分類・回帰木(NCART)と呼ばれる新しい解釈可能なニューラルネットワークを提案する。NCARTは残差ネットワークの修正版であり、全結合層を複数の微分可能なoblivious決定木に置き換える。アーキテクチャに決定木を統合することで、NCARTはニューラルネットワークのエンドツーエンド機能の恩恵を受けながら、解釈可能性を維持する。NCARTアーキテクチャは単純であるため、さまざまなサイズのデータセットに適しており、最先端の深層学習モデルと比較して計算コストを削減できる。広範な数値実験により、NCARTは既存の深層学習モデルよりも優れた性能を示し、木ベースのモデルに対する強力な競合手法として確立された。

Deep learning models have become popular in the analysis of tabular data, as they address the limitations of decision trees and enable valuable applications like semi-supervised learning, online learning, and transfer learning. However, these deep-learning approaches often encounter a trade-off. On one hand, they can be computationally expensive when dealing with large-scale or high-dimensional datasets. On the other hand, they may lack interpretability and may not be suitable for small-scale datasets. In this study, we propose a novel interpretable neural network called Neural Classification and Regression Tree (NCART) to overcome these challenges. NCART is a modified version of Residual Networks that replaces fully-connected layers with multiple differentiable oblivious decision trees. By integrating decision trees into the architecture, NCART maintains its interpretability while benefiting from the end-to-end capabilities of neural networks. The simplicity of the NCART architecture makes it well-suited for datasets of varying sizes and reduces computational costs compared to state-of-the-art deep learning models. Extensive numerical experiments demonstrate the superior performance of NCART compared to existing deep learning models, establishing it as a strong competitor to tree-based models.
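The abstract does not spell out how NCART's trees are parameterized, so the following is only a generic sketch of a differentiable ("soft") oblivious decision tree, not the authors' implementation. It assumes one shared feature and threshold per level and a sigmoid gate with a hypothetical `temperature` parameter; all names are illustrative.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_oblivious_tree(x, feature_idx, thresholds, leaf_values, temperature=1.0):
    """Evaluate one soft oblivious tree on a single input vector x.

    An oblivious tree applies the same (feature, threshold) test to every
    node of a given level, so a tree of depth d has 2**d leaves.
    """
    depth = len(feature_idx)
    assert len(thresholds) == depth and len(leaf_values) == 2 ** depth
    # Soft probability of taking the "right" branch at each level.
    p_right = [sigmoid((x[f] - t) / temperature)
               for f, t in zip(feature_idx, thresholds)]
    output = 0.0
    # Enumerate root-to-leaf paths; the first level is the most significant bit.
    for leaf, path in enumerate(product((0, 1), repeat=depth)):
        prob = 1.0
        for go_right, p in zip(path, p_right):
            prob *= p if go_right else (1.0 - p)
        output += prob * leaf_values[leaf]
    return output
```

Because every gate is a smooth function of the thresholds and leaf values, such a tree can in principle be trained by gradient descent alongside other network layers; as `temperature` shrinks, it approaches an ordinary hard decision tree.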

翻訳日:2023-07-25 17:19:34 公開日:2023-07-23

# LIST: シングルビュー3次元再構成のための空間変換器からの暗黙的学習

LIST: Learning Implicitly from Spatial Transformers for Single-View 3D Reconstruction ( http://arxiv.org/abs/2307.12194v1 )

ライセンス: Link先を確認

Mohammad Samiul Arshad and William J. Beksi

(参考訳) 単一の2D画像から3Dオブジェクトの幾何学的および位相的な詳細を正確に再構築することは、コンピュータビジョンにおける根本的な課題である。この問題に対する既存の明示的・暗黙的な解法は、自己遮蔽された形状の復元や、位相的な形状構造の忠実な再構築に苦労する。このジレンマを解決するために,局所的および大域的な画像特徴を利用して単一の画像から3Dオブジェクトの幾何学的および位相的構造を正確に再構築する新しいニューラルアーキテクチャ LIST を導入する。まず大域的な2D特徴を用いて対象物体の粗い形状を予測し,それを高解像度再構成の基盤として利用する。画像からの局所的な2D特徴と粗い予測からの3D特徴の両方を活用することで、任意の点とターゲット表面の間の符号付き距離を、暗黙的予測器によって高精度に予測できる。さらに,このモデルはカメラ推定や画素アライメントを必要としない。入力ビュー方向の影響を受けない再構成を提供する。定性的かつ定量的な分析により,合成画像と実世界画像の両方からの3Dオブジェクト再構成において,本モデルが最先端手法よりも優れていることを示す。

Accurate reconstruction of both the geometric and topological details of a 3D object from a single 2D image embodies a fundamental challenge in computer vision. Existing explicit/implicit solutions to this problem struggle to recover self-occluded geometry and/or faithfully reconstruct topological shape structures. To resolve this dilemma, we introduce LIST, a novel neural architecture that leverages local and global image features to accurately reconstruct the geometric and topological structure of a 3D object from a single image. We utilize global 2D features to predict a coarse shape of the target object and then use it as a base for higher-resolution reconstruction. By leveraging both local 2D features from the image and 3D features from the coarse prediction, we can predict the signed distance between an arbitrary point and the target surface via an implicit predictor with great accuracy. Furthermore, our model does not require camera estimation or pixel alignment. It provides an uninfluenced reconstruction from the input-view direction. Through qualitative and quantitative analysis, we show the superiority of our model in reconstructing 3D objects from both synthetic and real-world images against the state of the art.

翻訳日:2023-07-25 17:19:15 公開日:2023-07-23

# 機械的に媒介された相互作用と輸送を用いたスピン量子ビットに基づくプログラマブル量子プロセッサ

Programmable Quantum Processors based on Spin Qubits with Mechanically-Mediated Interactions and Transport ( http://arxiv.org/abs/2307.12193v1 )

ライセンス: Link先を確認

F. Fung, E. Rosenfeld, J. D. Schaefer, A. Kabcenell, J. Gieseler, T. X. Zhou, T. Madhavan, N. Aslam, A. Yacoby, M. D. Lukin

(参考訳) 固体スピン量子ビットは量子情報処理の有望な候補であるが、大規模な多量子ビット系における制御された相互作用とエンタングルメントは、現時点では実現が難しい。本稿では、ダイヤモンドナノピラー内の個々の窒素空孔(NV)中心を、走査プローブ構成において磁気的に機能化した窒化ケイ素メカニカル共振器に結合させる、多量子ビットスピン系のプログラマブル制御法について述べる。量子ビットはナノメカニカル共振器との相互作用を介してエンタングルさせることができ、プログラマブルな接続性はナノピラー内の量子ビットの機械的輸送によって実現される。この手法の実現可能性を示すために,機械的特性と,ナノビーム共振器上に配置したマイクロマグネット周辺の磁場勾配の両方を特徴付ける。さらに、核スピンメモリを利用して近接するスピン量子ビットのコヒーレントな操作と機械的輸送を示し、NV中心を用いて振動するマイクロマグネットからの時間変化する磁場を検出し、7.7(9) Hzのスピン・メカニカル結合を抽出する。現実的な改善により高協同性(high-cooperativity)領域に到達でき、スピン量子ビットによるスケーラブルな量子情報処理への新たな道が開かれる。

Solid state spin qubits are promising candidates for quantum information processing, but controlled interactions and entanglement in large, multi-qubit systems are currently difficult to achieve. We describe a method for programmable control of multi-qubit spin systems, in which individual nitrogen-vacancy (NV) centers in diamond nanopillars are coupled to magnetically functionalized silicon nitride mechanical resonators in a scanning probe configuration. Qubits can be entangled via interactions with nanomechanical resonators while programmable connectivity is realized via mechanical transport of qubits in nanopillars. To demonstrate the feasibility of this approach, we characterize both the mechanical properties and the magnetic field gradients around the micromagnet placed on the nanobeam resonator. Furthermore, we show coherent manipulation and mechanical transport of a proximal spin qubit by utilizing nuclear spin memory, and use the NV center to detect the time-varying magnetic field from the oscillating micromagnet, extracting a spin-mechanical coupling of 7.7(9) Hz. With realistic improvements the high-cooperativity regime can be reached, offering a new avenue towards scalable quantum information processing with spin qubits.

翻訳日:2023-07-25 17:18:55 公開日:2023-07-23

# ResWCAE: Residual Wavelet-Conditioned Autoencoderを用いた生体パターン画像デノイジング

ResWCAE: Biometric Pattern Image Denoising Using Residual Wavelet-Conditioned Autoencoder ( http://arxiv.org/abs/2307.12255v1 )

ライセンス: Link先を確認

Youzhi Liang, Wen Liang

(参考訳) パターン画像を用いた生体認証の利用は、コンパクトなIoT(Internet of Things)デバイスでますます普及している。しかし,このようなシステムの信頼性は,特に高レベルのノイズが存在する場合,画質の問題によって損なわれる可能性がある。汎用的な画像デノイジング向けに設計された最先端のディープラーニングアルゴリズムは有望であるが、パラメータ数が多く、固有の生体パターンの復元に最適化されていないため、これらのデバイスやシナリオには適さない。これらの課題に対応するために,本論文では,指紋画像のデノイジングに特化して設計された軽量かつ堅牢なディープラーニングアーキテクチャとして,Kullback-Leibler divergence(KLD)正則化を備えたResidual Wavelet-Conditioned Convolutional Autoencoder(Res-WCAE)を提案する。 Res-WCAEは、画像エンコーダとウェーブレットエンコーダという2つのエンコーダと、1つのデコーダからなる。画像エンコーダとデコーダ間の残差接続を利用して細かな空間的特徴を保持し、ボトルネック層は、ウェーブレット変換領域の近似サブイメージと詳細サブイメージを用いてウェーブレットエンコーダから得られた特徴の圧縮表現で条件付けされる。 Res-WCAEの有効性をいくつかの最先端のデノイジング法と比較して評価し,実験の結果,特に高レベルのノイズによって大きく劣化した指紋画像において,Res-WCAEがこれらの手法よりも優れていることが示された。全体として、Res-WCAEは、コンパクトなIoTデバイスの生体認証システムが直面する課題に対する解決策として有望である。

The utilization of biometric authentication with pattern images is increasingly popular in compact Internet of Things (IoT) devices. However, the reliability of such systems can be compromised by image quality issues, particularly in the presence of high levels of noise. While state-of-the-art deep learning algorithms designed for generic image denoising have shown promise, their large number of parameters and lack of optimization for unique biometric pattern retrieval make them unsuitable for these devices and scenarios. In response to these challenges, this paper proposes a lightweight and robust deep learning architecture, the Residual Wavelet-Conditioned Convolutional Autoencoder (Res-WCAE) with a Kullback-Leibler divergence (KLD) regularization, designed specifically for fingerprint image denoising. Res-WCAE comprises two encoders - an image encoder and a wavelet encoder - and one decoder. Residual connections between the image encoder and decoder are leveraged to preserve fine-grained spatial features, where the bottleneck layer conditioned on the compressed representation of features obtained from the wavelet encoder using approximation and detail subimages in the wavelet-transform domain. The effectiveness of Res-WCAE is evaluated against several state-of-the-art denoising methods, and the experimental results demonstrate that Res-WCAE outperforms these methods, particularly for heavily degraded fingerprint images in the presence of high levels of noise. Overall, Res-WCAE shows promise as a solution to the challenges faced by biometric authentication systems in compact IoT devices.

翻訳日:2023-07-25 17:11:32 公開日:2023-07-23

# 頭部運動パターンによる説明可能な抑うつ検出

Explainable Depression Detection via Head Motion Patterns ( http://arxiv.org/abs/2307.12241v1 )

ライセンス: Link先を確認

Monika Gahalawat, Raul Fernandez Rojas, Tanaya Guha, Ramanathan Subramanian, Roland Goecke

(参考訳) うつ病はマルチモーダルな非言語的行動手がかりを通じて研究されてきたが、頭部運動行動はバイオマーカーとしてはあまり注目されていない。本研究は,kinemes と呼ばれる基本的な頭部運動単位がうつ病検出に有用であることを,2つの異なるアプローチと特徴量を用いて示す。 (a)うつ病患者と健常対照群の両方に対応する頭部運動データから kinemes を発見する。 (b) 健常対照群のみから kineme パターンを学習し, 患者クラスと対照クラスの両方について再構成誤差から導出した統計量を計算する。機械学習手法を用いて,BlackDog と AVEC2013 データセットにおけるうつ病分類性能を評価する。その結果,(1)頭部運動パターンは抑うつ症状の検出に有効なバイオマーカーであり,(2)先行研究の知見と一致する説明的な kineme パターンが2つのクラスで観察できることがわかった。全体として,エピソード的な thin-slices に対する2値分類では BlackDog と AVEC2013 でそれぞれ最大 F1 スコア 0.79 と 0.82 を達成し,AVEC2013 の動画単位では最大 F1 スコア 0.72 を達成した。

While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker. This study demonstrates the utility of fundamental head-motion units, termed kinemes, for depression detection by adopting two distinct approaches, and employing distinctive features: (a) discovering kinemes from head motion data corresponding to both depressed patients and healthy controls, and (b) learning kineme patterns only from healthy controls, and computing statistics derived from reconstruction errors for both the patient and control classes. Employing machine learning methods, we evaluate depression classification performance on the BlackDog and AVEC2013 datasets. Our findings indicate that: (1) head motion patterns are effective biomarkers for detecting depressive symptoms, and (2) explanatory kineme patterns consistent with prior findings can be observed for the two classes. Overall, we achieve peak F1 scores of 0.79 and 0.82, respectively, over BlackDog and AVEC2013 for binary classification over episodic thin-slices, and a peak F1 of 0.72 over videos for AVEC2013.

翻訳日:2023-07-25 17:10:58 公開日:2023-07-23

# DQ-Det: トランスフォーマーに基づくオブジェクト検出とセグメンテーションのための動的クエリの組み合わせ学習

DQ-Det: Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ( http://arxiv.org/abs/2307.12239v1 )

ライセンス: Link先を確認

Yiming Cui, Linjie Yang, Haichao Yu

(参考訳) Transformerベースの検出とセグメンテーションの手法は、学習した検出クエリのリストを使用してトランスフォーマーネットワークから情報を取得し、各クエリから1つの特定のオブジェクトの位置とカテゴリを予測するように学習する。我々は、学習したクエリのランダムな凸結合が、対応するモデルに対して依然として有効であることを実証的に見出した。そこで,画像の高レベルなセマンティクスに基づく動的係数を用いた凸結合を学習することを提案する。生成された動的クエリは変調クエリ(modulated queries)と呼ばれ、異なる画像におけるオブジェクトの位置やカテゴリの事前分布をよりよく捉える。変調クエリを備えることで、幅広いDETRベースのモデルが、オブジェクト検出、インスタンスセグメンテーション、パノプティックセグメンテーション、ビデオインスタンスセグメンテーションを含む複数のタスクにおいて、一貫して優れた性能を達成する。

Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query. We empirically find that random convex combinations of the learned queries are still good for the corresponding models. We then propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image. The generated dynamic queries, named modulated queries, better capture the prior of object locations and categories in the different images. Equipped with our modulated queries, a wide range of DETR-based models achieve consistent and superior performance across multiple tasks including object detection, instance segmentation, panoptic segmentation, and video instance segmentation.
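To make the idea of a convex combination with dynamic coefficients concrete, here is a minimal sketch. It assumes the coefficients come from a softmax over image-dependent logits; the abstract does not say how DQ-Det actually produces them, and `basic_queries` and `logits` are hypothetical names used only for illustration.

```python
import math


def softmax(logits):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_query(basic_queries, logits):
    """Convex combination of learned query vectors.

    basic_queries: list of equally sized vectors (the learned queries).
    logits: one score per basic query, assumed to be predicted from
            high-level image features by some small network.
    """
    coeffs = softmax(logits)
    dim = len(basic_queries[0])
    return [sum(c * q[d] for c, q in zip(coeffs, basic_queries))
            for d in range(dim)]
```

Since the softmax weights are non-negative and sum to one, the result always stays inside the convex hull of the learned queries, which matches the observation that convex combinations of learned queries remain usable.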

翻訳日:2023-07-25 17:10:32 公開日:2023-07-23

# ソフトウェアシステムにおける応答時間に基づくRemaining Useful Life (RUL)予測の実証

Demonstration of a Response Time Based Remaining Useful Life (RUL) Prediction for Software Systems ( http://arxiv.org/abs/2307.12237v1 )

ライセンス: Link先を確認

Ray Islam (Mohammad Rubyet Islam), Peter Sandborn

(参考訳) Prognostic and Health Management (PHM) は、電子および非電子分野のハードウェアシステムに広く適用されてきたが、ソフトウェアについては検討されてこなかった。ソフトウェアは時間経過そのものでは劣化しないが、リリースサイクルを重ねるうちに品質が低下する可能性がある。ソフトウェアの健全性管理は問題を特定する診断的評価に限られているが、予後的評価は問題が将来いつ有害になるかを示しうる。ソフトウェア欠陥予測、ソフトウェア信頼性予測、ソフトウェアの予知保全、ソフトウェア劣化、ソフトウェア性能予測といった関連する研究分野は存在するが、これらはすべて履歴データに基づいて構築された診断モデルであり、いずれもソフトウェアのRULを予測することはできない。本稿では,故障予測とRUL推定のためのソフトウェアシステムへのPHMの概念の適用を扱う。具体的には,バージョンの更新とアップグレード,モジュール変更,システムのリエンジニアリング,若返り(rejuvenation),メンテナンススケジューリング,予算編成,完全な廃棄といったソフトウェアシステムに関する意思決定に,PHMをどのように活用できるかを扱う。本稿では,利用パラメータ(例えばリリースの数とカテゴリ)と性能パラメータ(例えば応答時間)に基づいて,ソフトウェアシステムのRULを予後的かつ継続的に予測する手法を提案する。開発したモデルは、実際のデータと予測モデルが生成した結果とを比較することで検証された。統計的検証(回帰検証および k-fold cross validation)も実施した。 Bugzillaアプリケーションの公開データに基づくケーススタディを示す。このケーススタディは、PHMの概念をソフトウェアシステムに適用でき、RULを計算してシステム管理上の意思決定を行えることを示している。

Prognostic and Health Management (PHM) has been widely applied to hardware systems in the electronics and non-electronics domains but has not been explored for software. While software does not decay over time, it can degrade over release cycles. Software health management is confined to diagnostic assessments that identify problems, whereas prognostic assessment potentially indicates when in the future a problem will become detrimental. Relevant research areas such as software defect prediction, software reliability prediction, predictive maintenance of software, software degradation, and software performance prediction, exist, but all of these represent diagnostic models built upon historical data, none of which can predict an RUL for software. This paper addresses the application of PHM concepts to software systems for fault predictions and RUL estimation. Specifically, this paper addresses how PHM can be used to make decisions for software systems such as version update and upgrade, module changes, system reengineering, rejuvenation, maintenance scheduling, budgeting, and total abandonment. This paper presents a method to prognostically and continuously predict the RUL of a software system based on usage parameters (e.g., the numbers and categories of releases) and performance parameters (e.g., response time). The model developed has been validated by comparing actual data, with the results that were generated by predictive models. Statistical validation (regression validation, and k-fold cross validation) has also been carried out. A case study, based on publicly available data for the Bugzilla application is presented. This case study demonstrates that PHM concepts can be applied to software systems and RUL can be calculated to make system management decisions.
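The abstract does not describe the prediction model itself, so the sketch below only illustrates the general idea of a response-time-based RUL estimate. It assumes, purely for illustration, a linear trend of response time over release index and a hypothetical failure `threshold`; the function names are made up here and are not from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remaining_useful_life(releases, response_times, threshold):
    """Releases left until the fitted response time reaches the threshold.

    Returns None when the trend is flat or improving, because the linear
    model then never crosses the threshold.
    """
    slope, intercept = fit_line(releases, response_times)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - releases[-1])
```

For example, with made-up response times of 1.0, 1.5, 2.0 and 2.5 seconds over releases 1 to 4 and a threshold of 4.0 seconds, the fitted line crosses the threshold at release 7, so the estimate is three more releases. A probabilistic treatment would replace the single fitted line with a distribution over trends.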

翻訳日:2023-07-25 17:10:17 公開日:2023-07-23

# オンラインストリーミングにおけるゲームスキル評価のためのマルチモーダル機械学習:CS:GOを事例として

Multi-Modal Machine Learning for Assessing Gaming Skills in Online Streaming: A Case Study with CS:GO ( http://arxiv.org/abs/2307.12236v1 )

ライセンス: Link先を確認

Longxiang Zhang, Wenping Wang

(参考訳) オンラインストリーミングは、大きな注目を集める新興市場である。動画からゲームスキルを評価することは、ストリーミングサービス提供者が才能あるゲーマーを発掘するための重要なタスクである。サービス提供者は、顧客に合わせたレコメンデーションやサービスプロモーションを提供するためにこの情報を必要とする。同時に、オンラインストリーミングは視覚、音声、テキストのモダリティを組み合わせるため、これは重要なマルチモーダル機械学習タスクでもある。本研究では、まずデータセットの欠陥を特定し、手作業でクリーニングする。次に,複数のモダリティの結合表現を学習するために,最新のエンド・ツー・エンドモデルのいくつかの変種を提案する。広範な実験を通じて,提案手法の有効性を実証する。さらに,提案モデルは意味のある表現を学習する代わりに,ユーザを識別してしまう傾向があることを明らかにする。最後に,この問題に対処するための今後の課題を提案する。

Online streaming is an emerging market that attracts much attention. Assessing gaming skills from videos is an important task for streaming service providers to discover talented gamers. Service providers require the information to offer customized recommendations and service promotions to their customers. Meanwhile, this is also an important multi-modal machine learning task since online streaming combines vision, audio and text modalities. In this study we begin by identifying flaws in the dataset and proceed to clean it manually. Then we propose several variants of the latest end-to-end models to learn joint representations of multiple modalities. Through our extensive experimentation, we demonstrate the efficacy of our proposals. Moreover, we find that our proposed models are prone to identifying users instead of learning meaningful representations. Finally, we propose future work to address this issue.

翻訳日:2023-07-25 17:09:51 公開日:2023-07-23

# MARS: 適応型マルチアクセラレータシステムにおけるDNNワークロードのためのマルチレベル並列性の活用

MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems ( http://arxiv.org/abs/2307.12234v1 )

ライセンス: Link先を確認

Guan Shen, Jieru Zhao, Zeke Wang, Zhe Lin, Wenchao Ding, Chentao Wu, Quan Chen, Minyi Guo

(参考訳) ディープニューラルネットワークの急速な進化とともに、ハードウェアシステムも急速に発展している。高いスケーラビリティと低い製造コストを実現する有望なソリューションとして、マルチアクセラレータシステムはデータセンター、クラウドプラットフォーム、SoCに広く存在する。そのため、マルチアクセラレータシステムでは、利用可能な設計からアクセラレータの適切な組み合わせを選択し、効率的なDNNマッピング戦略を探索するという困難な問題が生じる。この目的のために,計算を考慮したアクセラレータ選択を行い,通信を考慮したシャーディング戦略を適用して並列性を最大化する新しいマッピングフレームワークMARSを提案する。実験の結果、MARSはベースラインと比較して典型的なDNNワークロードで平均32.2%のレイテンシ削減を達成し、異種(heterogeneous)モデルでは対応する最先端手法と比較して59.4%のレイテンシ削減を達成できることが示された。

Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly. As a promising solution achieving high scalability and low manufacturing cost, multi-accelerator systems widely exist in data centers, cloud platforms, and SoCs. Thus, a challenging problem arises in multi-accelerator systems: selecting a proper combination of accelerators from available designs and searching for efficient DNN mapping strategies. To this end, we propose MARS, a novel mapping framework that can perform computation-aware accelerator selection, and apply communication-aware sharding strategies to maximize parallelism. Experimental results show that MARS can achieve 32.2% latency reduction on average for typical DNN workloads compared to the baseline, and 59.4% latency reduction on heterogeneous models compared to the corresponding state-of-the-art method.

翻訳日:2023-07-25 17:09:38 公開日:2023-07-23

# 自己教師付き学習表現による音声分離と認識の統合の探索

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation ( http://arxiv.org/abs/2307.12231v1 )

ライセンス: Link先を確認

Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe

(参考訳) ニューラル音声分離は目覚ましい進歩を遂げており、自動音声認識(ASR)との統合は、マルチスピーカASRの実現に向けた重要な方向である。本研究は,残響および雑音残響シナリオにおける,ASRフロントエンドとしての音声分離について洞察に富む調査を提供する。具体的には,マルチチャネル分離法であるマスクベースのビームフォーミングと複素スペクトルマッピング,およびASRバックエンドモデルで使用する最良の特徴量について検討する。最近の自己教師付き学習表現(SSLR)を特徴量として用い,フィルタバンク特徴量の場合よりも認識性能を向上させる。マルチ話者認識性能をさらに向上させるため,音声分離と認識をSSLRと統合するための,注意深く設計したトレーニング戦略を提案する。 TF-GridNet ベースの複素スペクトルマッピングと WavLM ベースのSSLRを用いた提案の統合は、残響 WHAMR! テストセットで2.5%の単語誤り率を達成し、既存のマスクベースの MVDR ビームフォーミングとフィルタバンクの統合(28.9%)を大幅に上回った。

Neural speech separation has made remarkable progress and its integration with automatic speech recognition (ASR) is an important direction towards realizing multi-speaker ASR. This work provides an insightful investigation of speech separation in reverberant and noisy-reverberant scenarios as an ASR front-end. In detail, we explore multi-channel separation methods, mask-based beamforming and complex spectral mapping, as well as the best features to use in the ASR back-end model. We employ the recent self-supervised learning representation (SSLR) as a feature and improve the recognition performance from the case with filterbank features. To further improve multi-speaker recognition performance, we present a carefully designed training strategy for integrating speech separation and recognition with SSLR. The proposed integration using TF-GridNet-based complex spectral mapping and WavLM-based SSLR achieves a 2.5% word error rate in reverberant WHAMR! test set, significantly outperforming an existing mask-based MVDR beamforming and filterbank integration (28.9%).

翻訳日:2023-07-25 17:09:20 公開日:2023-07-23

# EchoGLAD:心エコー図における左室ランドマーク検出のための階層型グラフニューラルネットワーク

EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms ( http://arxiv.org/abs/2307.12229v1 )

ライセンス: Link先を確認

Masoud Mokhtari, Mobina Mahdavi, Hooman Vaseli, Christina Luong, Purang Abolmaesumi, Teresa S. M. Tsang, Renjie Liao

(参考訳) 心臓の左心室の機能評価には,4つのランドマーク位置を検出し,左心室の内径と周囲の筋肉のおおよその質量を測定する必要がある。このタスクを機械学習で自動化する際の主な課題は、臨床ラベルのスパース性である。すなわち、高次元画像中のごく少数のランドマーク画素にしかアノテーションが付与されておらず、多くの先行研究は等方的なラベル平滑化に大きく依存している。しかし、そのようなラベル平滑化戦略は画像の解剖学的情報を無視し、ある程度のバイアスを生じさせる。この課題に対処するために、左心室ランドマーク検出のための心エコー図ベースの階層型グラフニューラルネットワーク(GNN)である EchoGLAD を導入する。私たちの主な貢献は 1)GNNによるマルチ解像度ランドマーク検出のための階層型グラフ表現学習フレームワーク 2)マルチレベル損失を用いた,異なる粒度での階層的な教師信号の導入である。本モデルを,分布内(ID)および分布外(OOD)設定下で,公開データセットと非公開データセットで評価する。 ID設定では、2つのデータセットで1.46mmと1.86mmという最先端の平均絶対誤差(MAE)を達成する。また,本モデルはテストMAE 4.3mmを達成し,先行研究よりも優れたOOD汎化を示した。

The functional assessment of the left ventricle chamber of the heart requires detecting four landmark locations and measuring the internal dimension of the left ventricle and the approximate mass of the surrounding muscle. The key challenge of automating this task with machine learning is the sparsity of clinical labels, i.e., only a few landmark pixels in a high-dimensional image are annotated, leading many prior works to heavily rely on isotropic label smoothing. However, such a label smoothing strategy ignores the anatomical information of the image and induces some bias. To address this challenge, we introduce an echocardiogram-based, hierarchical graph neural network (GNN) for left ventricle landmark detection (EchoGLAD). Our main contributions are: 1) a hierarchical graph representation learning framework for multi-resolution landmark detection via GNNs; 2) induced hierarchical supervision at different levels of granularity using a multi-level loss. We evaluate our model on a public and a private dataset under the in-distribution (ID) and out-of-distribution (OOD) settings. For the ID setting, we achieve the state-of-the-art mean absolute errors (MAEs) of 1.46 mm and 1.86 mm on the two datasets. Our model also shows better OOD generalization than prior works with a testing MAE of 4.3 mm.

翻訳日:2023-07-25 17:09:00 公開日:2023-07-23

# 事前学習モデルに対する幾何認識適応

Geometry-Aware Adaptation for Pretrained Models ( http://arxiv.org/abs/2307.12226v1 )

ライセンス: Link先を確認

Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala

(参考訳) 著名なゼロショットモデルを含む機械学習モデルは、ラベルがより大きなラベル空間のごく一部に過ぎないデータセットでトレーニングされることが多い。そのような空間には通常、ラベル間の距離によってラベル同士を関連付けるメトリクス(距離)が備わっている。我々は、この情報を利用して、追加のトレーニングなしに、トレーニング済みモデルを新しいクラスを確実に予測できるよう適応させる、あるいはゼロショット予測の場合にはその性能を改善するための単純なアプローチを提案する。我々の手法は標準的な予測規則のドロップイン置換であり、argmaxをFréchet平均に置き換える。このアプローチについて包括的な理論的分析を行い、 (i)ラベル空間の直径、サンプル複雑性、モデル次元の間のトレードオフを示す学習理論的結果、 (ii)任意の未観測クラスを予測できるシナリオの全範囲の特徴付け、および (iii)未観測クラスの全範囲を予測できない場合に最適なトレーニングクラスを得るための、能動学習に類似した最適な次クラス選択手順を検討する。経験的には、容易に利用できる外部メトリクスを使用することで、提案手法であるLokiは、ImageNet上でSimCLRに対して最大29.7%の相対的改善を達成し、数十万クラスにスケールする。そのようなメトリクスが利用できない場合、Lokiはクラス埋め込みから自己導出したメトリクスを使用でき、CLIPのような事前学習済みゼロショットモデルで10.5%の改善を得る。

Machine learning models -- including prominent zero-shot models -- are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes -- or, in the case of zero-shot prediction, to improve its performance -- without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping argmax with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes for when it is not possible to predict the entire range of unobserved classes. Empirically, using easily-available external metrics, our proposed approach, Loki, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, Loki can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.
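The prediction rule described above, replacing argmax with a Fréchet mean over the label metric, can be sketched as follows. This is a minimal illustration assuming a discrete label space, softmax-style class probabilities and the squared-distance form of the Fréchet mean; the function and argument names are hypothetical and not taken from the Loki code.

```python
def frechet_mean_predict(probs, dist, candidates):
    """Pick the label minimizing the probability-weighted squared distance.

    probs: dict mapping each *observed* class to its predicted probability.
    dist: metric on the full label space, called as dist(a, b).
    candidates: every label in the space, including classes never seen
                during training.
    """
    def cost(y):
        return sum(p * dist(y, k) ** 2 for k, p in probs.items())

    return min(candidates, key=cost)


def line_metric(a, b):
    """Toy metric: labels are points 0..4 on a line."""
    return abs(a - b)
```

In this toy setting, a model that only knows classes 0 and 4 and splits its probability evenly between them yields label 2 under the Fréchet mean rule, a class it was never trained on, whereas argmax could only ever return 0 or 4.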

翻訳日:2023-07-25 17:08:39 公開日:2023-07-23

# ASCON: 低線量CTデノイジングのための解剖学的構造を考慮した教師付きコントラスト学習フレームワーク

ASCON: Anatomy-aware Supervised Contrastive Learning Framework for Low-dose CT Denoising ( http://arxiv.org/abs/2307.12225v1 )

ライセンス: Link先を確認

Zhihao Chen, Qi Gao, Yi Zhang, Hongming Shan

(参考訳) 低線量CT(computed tomography)のデノイジングには様々な深層学習法が提案されているが,そのほとんどは通常線量CT画像を正解として用いてデノイジング過程を教師する。これらの方法は通常、単一のCT画像内に内在する相関、特に人体組織の解剖学的セマンティクスを無視しており、デノイジング過程の解釈可能性に欠ける。本稿では,解剖学的解釈可能性を提供しつつ低線量CTデノイジングのための解剖学的セマンティクスを探索できる,新しい Anatomy-aware Supervised CONtrastive learning フレームワーク ASCON を提案する。提案するASCONは、効率的な自己注意ベースのU-Net(ESAU-Net)とマルチスケール解剖学的コントラストネットワーク(MAC-Net)という2つの新しい設計で構成される。第1に,大域と局所の相互作用をよりよく捉え,高解像度の入力に適応するために,チャネルワイズ自己注意機構を用いた効率的なESAU-Netを導入する。第2に、MAC-Netは、内在する解剖学的情報を捉えるパッチワイズ非コントラストモジュールと、内在的な解剖学的一貫性を維持するピクセルワイズコントラストモジュールを組み込んでいる。 2つの公開低線量CTデノイジングデータセットでの広範な実験結果から,ASCONが最先端モデルよりも優れた性能を持つことが示された。特筆すべきは,ASCONが低線量CTデノイジングに初めて解剖学的解釈可能性を提供することである。ソースコードはhttps://github.com/hao1635/ASCONで入手できる。

While various deep learning methods have been proposed for low-dose computed tomography (CT) denoising, most of them leverage the normal-dose CT images as the ground-truth to supervise the denoising process. These methods typically ignore the inherent correlation within a single CT image, especially the anatomical semantics of human tissues, and lack the interpretability on the denoising process. In this paper, we propose a novel Anatomy-aware Supervised CONtrastive learning framework, termed ASCON, which can explore the anatomical semantics for low-dose CT denoising while providing anatomical interpretability. The proposed ASCON consists of two novel designs: an efficient self-attention-based U-Net (ESAU-Net) and a multi-scale anatomical contrastive network (MAC-Net). First, to better capture global-local interactions and adapt to the high-resolution input, an efficient ESAU-Net is introduced by using a channel-wise self-attention mechanism. Second, MAC-Net incorporates a patch-wise non-contrastive module to capture inherent anatomical information and a pixel-wise contrastive module to maintain intrinsic anatomical consistency. Extensive experimental results on two public low-dose CT denoising datasets demonstrate superior performance of ASCON over state-of-the-art models. Remarkably, our ASCON provides anatomical interpretability for low-dose CT denoising for the first time. Source code is available at https://github.com/hao1635/ASCON.

翻訳日:2023-07-25 17:08:15 公開日:2023-07-23

# コンセンサス指向マルチエージェント通信による分散適応形成

Decentralized Adaptive Formation via Consensus-Oriented Multi-Agent Communication ( http://arxiv.org/abs/2307.12287v1 )

ライセンス: Link先を確認

Yuming Xiang, Sizhao Li, Rongpeng Li, Zhifeng Zhao and Honggang Zhang

(参考訳) 適応型マルチエージェントフォーメーション制御は、エージェント数の変動に合わせて分散的かつ柔軟にフォーメーションを調整することを要求するものであり、特に通信が制限された条件下では、マルチエージェントシステムにおいて最も困難な問題の1つである。本稿では,新しい Consensus-based Decentralized Adaptive Formation (Cons-DecAF) フレームワークを提案する。具体的には,Consensus-oriented Multi-Agent Communication(ConsMAC)という新しいマルチエージェント強化学習手法を開発し,近隣からのメッセージを効果的に集約することで,エージェントがグローバルな情報を認識し,局所的な状態からコンセンサスを確立できるようにする。その後,方策蒸留(policy distillation)を利用して適応的なフォーメーション調整を実現する。また,エージェントの特定の位置を事前に割り当てる代わりに,ハウスドルフ距離による変位ベースのフォーメーションを採用し,フォーメーション効率を大幅に向上させる。広範なシミュレーションによる実験結果から,提案手法が速度と安定性の両面で優れた性能を達成したことが確認された。

Adaptive multi-agent formation control, which requires the formation to flexibly adjust along with the quantity variations of agents in a decentralized manner, belongs to one of the most challenging issues in multi-agent systems, especially under communication-limited constraints. In this paper, we propose a novel Consensus-based Decentralized Adaptive Formation (Cons-DecAF) framework. Specifically, we develop a novel multi-agent reinforcement learning method, Consensus-oriented Multi-Agent Communication (ConsMAC), to enable agents to perceive global information and establish the consensus from local states by effectively aggregating neighbor messages. Afterwards, we leverage policy distillation to accomplish the adaptive formation adjustment. Meanwhile, instead of pre-assigning specific positions of agents, we employ a displacement-based formation by Hausdorff distance to significantly improve the formation efficiency. The experimental results through extensive simulations validate that the proposed method has achieved outstanding performance in terms of both speed and stability.
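The role of the Hausdorff distance here is that it compares the set of agent positions with the set of target formation points without fixing which agent goes where. The sketch below shows the standard symmetric Hausdorff distance for small 2D point sets; how Cons-DecAF turns it into a reward or objective is not stated in the abstract, so that part is left out.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

Because the value depends only on the two point sets, reordering the agents leaves it unchanged, and the two sets may contain different numbers of points. This is what makes such a measure a natural fit when the number of agents varies and no agent has a pre-assigned slot.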

翻訳日:2023-07-25 17:02:03 公開日:2023-07-23

# Milimili: クラウドソーシングによる並列データ収集

Milimili. Collecting Parallel Data via Crowdsourcing ( http://arxiv.org/abs/2307.12282v1 )

ライセンス: Link先を確認

Alexander Antonov

(参考訳) 本稿では,クラウドソーシングによる並列コーパスの収集手法を提案する。この手法は,品質を犠牲にするものの,プロの翻訳者を雇うよりも費用対効果が高い。さらに,チェチェン語-ロシア語とフラ語-英語の言語ペアについて収集した実験的な並列データを公開した。

We present a methodology for gathering a parallel corpus through crowdsourcing, which is more cost-effective than hiring professional translators, albeit at the expense of quality. Additionally, we have made available experimental parallel data collected for Chechen-Russian and Fula-English language pairs.

翻訳日:2023-07-25 17:01:47 公開日:2023-07-23

# 下流タスク非依存の敵対的サンプル

Downstream-agnostic Adversarial Examples ( http://arxiv.org/abs/2307.12280v1 )

ライセンス: Link先を確認

Ziqi Zhou, Shengshan Hu, Ruizhi Zhao, Qian Wang, Leo Yu Zhang, Junhui Hou, Hai Jin

(参考訳) 自己教師付き学習は、通常、大量のラベルなしデータを使用してエンコーダを事前学習する。このエンコーダは汎用的な特徴抽出器として使用できるため、下流のユーザは微調整を行うだけで「大規模モデル」の恩恵を享受できる。この有望な見通しにもかかわらず、事前学習済みエンコーダのセキュリティは、特に事前学習済みエンコーダが商用利用のために公開されている場合について、まだ十分に調査されていない。本稿では,事前学習済みエンコーダに基づいて下流タスク非依存の普遍的な敵対的サンプルを生成する最初のフレームワークであるAdvEncoderを提案する。 AdvEncoderは、被害者となる事前学習済みエンコーダを継承するすべての下流タスクを欺くことができる、一連の自然画像に対する普遍的な敵対的摂動またはパッチを構築することを目的としている。従来の敵対的サンプルの研究とは異なり、事前学習済みエンコーダは分類ラベルではなく特徴ベクトルのみを出力する。そこで,まず画像の高周波成分情報を利用して,敵対的サンプルの生成を導く。次に,攻撃用サロゲートデータセットの分布を学習することで敵対的摂動・パッチを構築する生成的攻撃フレームワークを設計し,攻撃成功率と転移性を改善する。その結果、攻撃者は事前学習データセットも下流データセットも知ることなく、下流タスクを攻撃できることがわかった。また,事前学習済みエンコーダ向けに4つの防御手法を調整し,その結果からAdvEncoderの攻撃能力をさらに裏付けた。

Self-supervised learning usually uses a large amount of unlabeled data to pre-train an encoder which can be used as a general-purpose feature extractor, such that downstream users only need to perform fine-tuning operations to enjoy the benefit of "large model". Despite this promising prospect, the security of pre-trained encoder has not been thoroughly investigated yet, especially when the pre-trained encoder is publicly available for commercial use. In this paper, we propose AdvEncoder, the first framework for generating downstream-agnostic universal adversarial examples based on the pre-trained encoder. AdvEncoder aims to construct a universal adversarial perturbation or patch for a set of natural images that can fool all the downstream tasks inheriting the victim pre-trained encoder. Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors rather than classification labels. Therefore, we first exploit the high frequency component information of the image to guide the generation of adversarial examples. Then we design a generative attack framework to construct adversarial perturbations/patches by learning the distribution of the attack surrogate dataset to improve their attack success rates and transferability. Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset. We also tailor four defenses for pre-trained encoders, the results of which further prove the attack ability of AdvEncoder.

翻訳日:2023-07-25 17:01:42 公開日:2023-07-23

# FDCT: 透明物体の高速深度補完

FDCT: Fast Depth Completion for Transparent Objects ( http://arxiv.org/abs/2307.12274v1 )

ライセンス: Link先を確認

Tianan Li, Zhehan Chen, Huan Liu, Chen Wang

(参考訳) 深度補完(depth completion)は、自動運転、3D再構成、マニピュレーションといった多くのロボットタスクにとって不可欠である。著しい進歩にもかかわらず、既存の手法は依然として計算負荷が高く、低消費電力ロボットプラットフォームのリアルタイム要件を満たせないことが多い。加えて、ほとんどの手法は不透明な物体向けに設計されており、反射と屈折という特殊な性質のために透明な物体を苦手とする。これらの課題に対処するため,我々は,透明物体のための高速深度補完フレームワーク(FDCT)を提案する。これは物体姿勢推定などの下流タスクにも有益である。局所情報を活用し,それをグローバル情報と統合する際の過剰適合を回避するために,低レベル特徴を利用する新しい融合ブランチとショートカット,および過剰適合を抑制する損失関数を設計する。これにより,RGB-D画像のみから密な深度推定を復元できる,高精度でユーザフレンドリな深度補正フレームワークが実現される。広範な実験により、FDCTは最先端の手法よりも高い精度で約70 FPSで動作できることが示された。また,FDCTが物体把持タスクにおける姿勢推定を改善できることも実証する。ソースコードはhttps://github.com/Nonmy/FDCTで入手できる。

Depth completion is crucial for many robotic tasks such as autonomous driving, 3-D reconstruction, and manipulation. Despite the significant progress, existing methods remain computationally intensive and often fail to meet the real-time requirements of low-power robotic platforms. Additionally, most methods are designed for opaque objects and struggle with transparent objects due to the special properties of reflection and refraction. To address these challenges, we propose a Fast Depth Completion framework for Transparent objects (FDCT), which also benefits downstream tasks like object pose estimation. To leverage local information and avoid overfitting issues when integrating it with global information, we design a new fusion branch and shortcuts to exploit low-level features and a loss function to suppress overfitting. This results in an accurate and user-friendly depth rectification framework which can recover dense depth estimation from RGB-D images alone. Extensive experiments demonstrate that FDCT can run about 70 FPS with a higher accuracy than the state-of-the-art methods. We also demonstrate that FDCT can improve pose estimation in object grasping tasks. The source code is available at https://github.com/Nonmy/FDCT

翻訳日:2023-07-25 17:01:18 公開日:2023-07-23

# 線欠陥によるトポロジカル保護ヘリカルエッジ状態の後方散乱

Backscattering of topologically protected helical edge states by line defects ( http://arxiv.org/abs/2307.12271v1 )

ライセンス: Link先を確認

Mohadese Karimi, Mohsen Amini, Morteza Soltani, and Mozhgan Sadeghizadeh

(参考訳) 非磁性点欠陥の存在下でのコンダクタンスの量子化は、2次元トポロジカル絶縁体におけるヘリカルエッジ状態のトポロジカル保護とスピン運動量ロッキングの結果である。この保護は、システムの量子ホール相におけるヘリカルエッジモードの後方散乱がないことを保証する。しかし,本研究は,この保護を損なう新たなアプローチを検討することに焦点を当てている。オンサイト不純物の線形配置は,ケイン・ミールモデルにおけるエッジ状態の位相的保護を効果的に解除できることを提案する。この現象を調べるために,その幅にまたがるライン欠陥を含むアームチェアリボンについて検討する。タイト結合モデルと非平衡グリーン関数法を用いて,システムの伝送係数を計算する。その結果, 正のオンサイト電位に対するバルクギャップ下端近傍のエネルギーのコンダクタンスの抑制が明らかになった。この挙動をさらに理解するため,解析計算を行い,不純物チャネルの形成について議論する。このチャネルは、リボンの下端と上端をつなぐギャップ内束縛状態の重なりによって生じ、結果として後方散乱が容易になる。我々の説明は不純物の位置に近い場所における状態の局所密度の分析によって裏付けられている。

The quantization of conductance in the presence of non-magnetic point defects is a consequence of topological protection and the spin-momentum locking of helical edge states in two-dimensional topological insulators. This protection ensures the absence of backscattering of helical edge modes in the quantum Hall phase of the system. However, our study focuses on exploring a novel approach to disrupt this protection. We propose that a linear arrangement of on-site impurities can effectively lift the topological protection of edge states in the Kane-Mele model. To investigate this phenomenon, we consider an armchair ribbon containing a line defect spanning its width. Utilizing the tight-binding model and non-equilibrium Green's function method, we calculate the transmission coefficient of the system. Our results reveal a suppression of conductance at energies near the lower edge of the bulk gap for positive on-site potentials. To further comprehend this behavior, we perform analytical calculations and discuss the formation of an impurity channel. This channel arises due to the overlap of in-gap bound states, linking the bottom edge of the ribbon to its top edge, consequently facilitating backscattering. Our explanation is supported by the analysis of the local density of states at sites near the position of impurities.

翻訳日:2023-07-25 17:01:01 公開日:2023-07-23

# シーンテキスト認識のためのコンテキスト知覚並列デコーダ

Context Perception Parallel Decoder for Scene Text Recognition ( http://arxiv.org/abs/2307.12270v1 )

ライセンス: Link先を確認

Yongkun Du and Zhineng Chen and Caiyan Jia and Xiaoting Yin and Chenxia Li and Yuning Du and Yu-Gang Jiang

(参考訳) Scene Text Recognition (STR) 法は高い精度と高速な推論速度を達成するのに苦労している。自己回帰(AR)ベースのSTRモデルは、事前に認識された文字を使って次の文字を反復的に復号する。精度の点で優位性を示す。しかし、この反復により推論速度も遅くなる。あるいは、並列デコード(PD)ベースのSTRモデルは、すべての文字を1つのデコードパスで推測する。推論速度の面では利点があるが、そのようなパスで堅牢な認識コンテキストを構築するのは難しいため、精度が悪くなる。本稿では,STRにおけるARデコーディングの実証的研究について述べる。また,ARデコーダの精度向上に加えて,ARデコーダの成功は,既存の研究で主張されている言語モデリングよりも,視覚的文脈認識のガイダンスを提供することにも寄与していることがわかった。その結果,1つのPDパスで文字列を復号化するためのコンテキスト知覚並列デコーダ (CPPD) を提案する。 CPPDは文字カウントモジュールと文字順序モジュールを考案する。テキストインスタンスが与えられた場合、前者は各文字の発生回数を推定し、後者は文字読み順序とプレースホルダーを推定する。キャラクタ予測タスクと合わせて、キャラクタシーケンスとキャラクタの出現場所をロバストに指示するコンテキストを構築し、arデコードによって伝達されるコンテキストをよく模倣する。英語と中国語のベンチマークの実験は、CPPDモデルが高い競争精度を達成することを示した。さらに、ARよりも約7倍高速で動作し、最も高速な認識器の1つである。コードはまもなくリリースされる。

Scene text recognition (STR) methods have struggled to attain both high accuracy and fast inference speed. Autoregressive (AR)-based STR models use the previously recognized characters to decode the next character iteratively. They show superiority in terms of accuracy. However, this iteration also makes inference slow. Alternatively, parallel decoding (PD)-based STR models infer all the characters in a single decoding pass. They have an advantage in inference speed but worse accuracy, as it is difficult to build a robust recognition context in such a pass. In this paper, we first present an empirical study of AR decoding in STR. In addition to constructing a new AR model with top accuracy, we find that the success of the AR decoder also lies in providing guidance on visual context perception, rather than in language modeling as claimed in existing studies. Consequently, we propose the Context Perception Parallel Decoder (CPPD) to decode the character sequence in a single PD pass. CPPD devises a character counting module and a character ordering module. Given a text instance, the former infers the occurrence count of each character, while the latter deduces the character reading order and placeholders. Together with the character prediction task, they construct a context that robustly tells what the character sequence is and where the characters appear, closely mimicking the context conveyed by AR decoding. Experiments on both English and Chinese benchmarks demonstrate that CPPD models achieve highly competitive accuracy. Moreover, they run approximately 7x faster than their AR counterparts, and are also among the fastest recognizers. The code will be released soon.
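
As a rough illustration of the two auxiliary signals described in the abstract, the sketch below builds, for a ground-truth string, the per-character occurrence counts and a fixed-length reading-order mask with placeholders. The function names, the `charset` and `max_len` parameters, and the 0/1 mask encoding are assumptions made for illustration; the abstract does not say how CPPD actually represents these targets.

```python
from collections import Counter

def counting_target(text, charset):
    """Occurrence count of each charset character in the text (hypothetical encoding)."""
    counts = Counter(text)
    return [counts.get(ch, 0) for ch in charset]

def ordering_target(text, max_len):
    """1 for positions holding a character, 0 for placeholder positions (hypothetical encoding)."""
    if len(text) > max_len:
        raise ValueError("text is longer than max_len")
    return [1] * len(text) + [0] * (max_len - len(text))
```

Both targets can be computed for a whole string at once, without stepping through the characters one by one, which fits the single-pass decoding the abstract describes.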

翻訳日:2023-07-25 17:00:42 公開日:2023-07-23

# 教育における人間-aiハイブリッドエッセイの自動境界検出に向けて

Towards Automatic Boundary Detection for Human-AI Hybrid Essay in Education ( http://arxiv.org/abs/2307.12267v1 )

ライセンス: Link先を確認

Zijie Zeng, Lele Sha, Yuheng Li, Kaixun Yang, Dragan Gašević, Guanliang Chen

(参考訳) 現代の大規模言語モデル(LLM)、例えばChatGPTの助けを借りて、人間とAIの協調的な記述が大幅に促進された。技術進歩によってもたらされる利便性を認める一方で、教育者は、学生がLLMを利用して部分的に執筆課題を完了し、人間とAIのハイブリッドテキストを原著として引き渡すのではないかと懸念している。そこで本研究では,教育における人間とAIのハイブリッドテキストの自動検出を検討し,ハイブリッドテキスト検出を,人間が書いたコンテンツとAI生成コンテンツ間の遷移点を識別する境界検出問題として形式化した。学生が書いたエッセイから文章を部分的に取り除き,不完全なエッセイを補うようChatGPTに指示することで,ハイブリッドエッセイデータセットを構築した。そこで我々は,(1)埋め込み学習過程において,人間が書いたコンテンツからAI生成コンテンツを分離し,(2)隣り合う2つのプロトタイプ間の距離(プロトタイプは埋め込み空間におけるハイブリッドテキストからの連続文の集合の平均)を計算し,境界は互いに最も離れた2つのプロトタイプの間に存在すると仮定する,2段階の検出手法を提案した。広範な実験を通じて,以下の主な知見を得た。(1)提案手法は,異なる実験環境においてベースライン手法を一貫して上回った。(2)埋め込み学習プロセス(ステップ1)は,提案手法の性能を大幅に向上させる。(3)単一境界ハイブリッドエッセイの境界を検出する場合, 比較的大きなプロトタイプサイズを採用することにより提案手法の性能が向上し, 2番目に優れたベースライン手法に対して,ドメイン内設定では22%, ドメイン外設定では18%の改善が得られた。

Human-AI collaborative writing has been greatly facilitated with the help of modern large language models (LLM), e.g., ChatGPT. While admitting the convenience brought by technology advancement, educators also have concerns that students might leverage LLM to partially complete their writing assignment and pass off the human-AI hybrid text as their original work. Driven by such concerns, in this study, we investigated the automatic detection of Human-AI hybrid text in education, where we formalized the hybrid text detection as a boundary detection problem, i.e., identifying the transition points between human-written content and AI-generated content. We constructed a hybrid essay dataset by partially removing sentences from the original student-written essays and then instructing ChatGPT to fill in for the incomplete essays. Then we proposed a two-step detection approach where we (1) Separated AI-generated content from human-written content during the embedding learning process; and (2) Calculated the distances between every two adjacent prototypes (a prototype is the mean of a set of consecutive sentences from the hybrid text in the embedding space) and assumed that the boundaries exist between the two prototypes that have the furthest distance from each other. Through extensive experiments, we summarized the following main findings: (1) The proposed approach consistently outperformed the baseline methods across different experiment settings; (2) The embedding learning process (i.e., step 1) can significantly boost the performance of the proposed approach; (3) When detecting boundaries for single-boundary hybrid essays, the performance of the proposed approach could be enhanced by adopting a relatively large prototype size, leading to a 22% improvement (against the second-best baseline method) in the in-domain setting and an 18% improvement in the out-of-domain setting.
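
Step (2) of the approach, placing the boundary between the two adjacent prototypes that are furthest apart, can be sketched as follows. This is a minimal sketch under assumptions: sentence embeddings are plain lists of floats, the distance is Euclidean, and prototypes are formed with a sliding window of `prototype_size` sentences on each side of a candidate boundary. None of these specifics are stated in the abstract, and the function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def detect_boundary(sentence_embeddings, prototype_size):
    """Return i such that the predicted boundary lies between sentence i-1 and sentence i."""
    n = len(sentence_embeddings)
    if prototype_size < 1 or n < 2 * prototype_size:
        raise ValueError("need at least two full prototypes")
    best_index, best_distance = None, -1.0
    for i in range(prototype_size, n - prototype_size + 1):
        # Prototype = mean embedding of consecutive sentences on each side of position i.
        left = mean_vector(sentence_embeddings[i - prototype_size:i])
        right = mean_vector(sentence_embeddings[i:i + prototype_size])
        distance = euclidean(left, right)
        if distance > best_distance:
            best_index, best_distance = i, distance
    return best_index
```

A larger `prototype_size` averages over more sentences, which smooths out noise in individual embeddings; the abstract reports that relatively large prototypes helped on single-boundary essays.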

翻訳日:2023-07-25 17:00:16 公開日:2023-07-23

# テキスト意味コミュニケーションのためのトランスフォーマベースジョイントソースチャネル符号化

Transformer-based Joint Source Channel Coding for Textual Semantic Communication ( http://arxiv.org/abs/2307.12266v1 )

ライセンス: Link先を確認

Shicong Liu, Zhen Gao, Gaojie Chen, Yu Su, Lu Peng

(参考訳) Space-Air-Ground-Sea統合ネットワークは、妨害に対するより堅牢でセキュアな送信技術を要求する。本稿では,文のモデル化とエンコードに先進的な自然言語処理技術を利用する,ロバスト伝送のためのテキスト意味伝達フレームワークを提案する。具体的には、テキスト文をワードピースアルゴリズムを用いてトークンに分割し、トランスフォーマベースのエンコーダによる意味抽出のためのトークンベクトルに埋め込む。符号化されたデータは、伝送のための固定長バイナリシーケンスに量子化され、バイナリ消去、対称、削除チャネルが検討される。受信されたバイナリシーケンスは、変換器デコーダによってさらに復号化され、文再構成に用いられるトークンとなる。提案手法は,ニューラルネットワークのパワーと注意機構を利用して,難易度の高い無線環境におけるテキストデータの信頼性と効率的な通信を実現する。

The Space-Air-Ground-Sea integrated network calls for more robust and secure transmission techniques against jamming. In this paper, we propose a textual semantic transmission framework for robust transmission, which utilizes advanced natural language processing techniques to model and encode sentences. Specifically, the textual sentences are first split into tokens using the WordPiece algorithm, and are embedded into token vectors for semantic extraction by a Transformer-based encoder. The encoded data are quantized to a fixed-length binary sequence for transmission, where binary erasure, symmetric, and deletion channels are considered. The received binary sequences are further decoded by the Transformer decoders into tokens used for sentence reconstruction. Our proposed approach leverages the power of neural networks and the attention mechanism to provide reliable and efficient communication of textual data in challenging wireless environments, and simulation results on semantic similarity and bilingual evaluation understudy (BLEU) prove the superiority of the proposed model in semantic transmission.
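
The three channel types named in the abstract can be simulated in a few lines. The sketch below is illustrative only: the function names, the use of `None` to mark an erased bit, and passing an explicit `random.Random` instance are assumptions, not details from the paper.

```python
import random

def binary_symmetric_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def binary_erasure_channel(bits, erase_prob, rng):
    """Replace each bit with None (erased) with probability erase_prob; length is preserved."""
    return [None if rng.random() < erase_prob else b for b in bits]

def deletion_channel(bits, delete_prob, rng):
    """Drop each bit with probability delete_prob; the output may be shorter than the input."""
    return [b for b in bits if rng.random() >= delete_prob]
```

The deletion channel is the only one of the three that changes the sequence length, so a decoder that expects a fixed-length binary sequence has to cope with lost alignment as well as lost bits.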

翻訳日:2023-07-25 16:59:40 公開日:2023-07-23

# マンダリン音声認識における高速アクセント領域拡張のためのメタ学習方式

A meta learning scheme for fast accent domain expansion in Mandarin speech recognition ( http://arxiv.org/abs/2307.12262v1 )

ライセンス: Link先を確認

Ziwei Zhu, Changhao Shan, Bihong Zhang, Jian Yu

(参考訳) 音声言語は、マンダリンとアクセントに大きな変化を示す。マンダリン自動音声認識(ASR)の性能は高いが,アクセントASRは依然として課題である。本稿では,マンダリンasrの性能を損なうことなくアクセントの分野を拡大する,マンダリン音声認識におけるアクセント領域の高速拡張のためのメタラーニング手法を提案する。メタラーニング(meta-learning)やlearn-to-learn(learning-to-learn)は、特定のドメインをオーバーフィットするだけでなく、複数のドメインで一般的な関係を学ぶことができる。そこでドメイン拡張タスクでメタラーニングを選択する。このより本質的な学習はアクセントドメイン拡張タスクのパフォーマンスを改善する。モデルパラメータのメタ学習と凍結の手法を組み合わせることで、異なるケースで認識性能がより安定し、トレーニングが約20%高速になる。本手法はアクセント領域拡張タスクにおいて,他の手法を約3%上回っている。ベースラインモデルと比較して、マンダリン試験セットが変化しない条件下では比較的37%改善する。さらに,この手法はアクセントテストセット上での相対的な性能改善を4%とした大量のデータに対して有効であることを示した。

Spoken language varies significantly between standard Mandarin and its accents. Despite the high performance of Mandarin automatic speech recognition (ASR), accented ASR is still a challenging task. In this paper, we introduce meta-learning techniques for fast accent domain expansion in Mandarin speech recognition, which expand the range of supported accents without degrading the performance of Mandarin ASR. Meta-learning, or learning to learn, can learn general relations across multiple domains rather than overfitting to a specific domain, so we adopt meta-learning for the domain expansion task. This more general form of learning improves performance on accent domain expansion tasks. We combine meta-learning with freezing of model parameters, which makes recognition performance more stable across different cases and speeds up training by about 20%. Our approach outperforms other methods by about 3% relative in the accent domain expansion task. Compared to the baseline model, it achieves a 37% relative improvement under the condition that the Mandarin test set remains unchanged. In addition, the method also proves effective on a large amount of data, with a relative performance improvement of 4% on the accent test set.

翻訳日:2023-07-25 16:59:26 公開日:2023-07-23

# クロスインタラクションによるリモートセンシング画像からのビルディング・ロード協調抽出

Building-road Collaborative Extraction from Remotely Sensed Images via Cross-Interaction ( http://arxiv.org/abs/2307.12256v1 )

ライセンス: Link先を確認

Haonan Guo, Xin Su, Chen Wu, Bo Du, Liangpei Zhang

(参考訳) 建物は社会生産と人間生活の基本的な担体であり、道路はソーシャルネットワークを繋ぐリンクである。建築・道路情報は、地域連携開発、防災、自動運転等のフロンティア分野において重要な応用価値を有する。超高解像度(VHR)リモートセンシング画像からの建物や道路のマッピングがホットな研究トピックとなっている。しかし、既存の手法は道路と建物の間の強い空間的相関を無視し、孤立して抽出することが多い。建物と道路の相補的な利点をフル活用するために,マルチタスクとクロススケール機能インタラクションに基づくビル-ロード協調抽出手法を提案し,両タスクの精度を補完的に向上させる。マルチタスク学習におけるシーソー現象に対処する,タスク間での情報交換と各タスクのユニークな情報保存のためのマルチタスクインタラクションモジュールを提案する。建物と道路の外観や構造の変化を考慮し,異なるタスクに対する最適受信場を自動的に学習するクロススケール相互作用モジュールを設計する。個別にタスクを訓練する既存の多くの方法と比較して,提案手法は,タスク間および大規模機能間相互作用によって,建物と道路の相補的優位性を活用でき,タスクごとに最適な受信フィールドを自動的に選択できる。都市・農村の幅広いシナリオにおける実験により,提案アルゴリズムは優れた性能と効率でビルディングロード抽出を実現できることを示した。

Buildings are the basic carrier of social production and human life; roads are the links that interconnect social networks. Building and road information has important application value in frontier fields such as regional coordinated development, disaster prevention, and autonomous driving. Mapping buildings and roads from very high-resolution (VHR) remote sensing images has become a hot research topic. However, existing methods often ignore the strong spatial correlation between roads and buildings and extract them in isolation. To fully utilize the complementary advantages between buildings and roads, we propose a building-road collaborative extraction method based on multi-task and cross-scale feature interaction to improve the accuracy of both tasks in a complementary way. A multi-task interaction module is proposed to exchange information across tasks while preserving the unique information of each task, which tackles the seesaw phenomenon in multi-task learning. By considering the variation in appearance and structure between buildings and roads, a cross-scale interaction module is designed to automatically learn the optimal receptive field for different tasks. Compared with many existing methods that train each task individually, the proposed collaborative extraction method can utilize the complementary advantages between buildings and roads through the proposed inter-task and inter-scale feature interactions, and automatically select the optimal receptive field for different tasks. Experiments on a wide range of urban and rural scenarios show that the proposed algorithm can achieve building-road extraction with outstanding performance and efficiency.

翻訳日:2023-07-25 16:59:08 公開日:2023-07-23

# 物理インフォームドニューラルネットワークによる次元の呪いへの取り組み

Tackling the Curse of Dimensionality with Physics-Informed Neural Networks ( http://arxiv.org/abs/2307.12306v1 )

ライセンス: Link先を確認

Zheyuan Hu, Khemraj Shukla, George Em Karniadakis, Kenji Kawaguchi

(参考訳) 次元の呪い (CoD) は計算資源に大きな負担をかけ、次元が大きくなるにつれて計算コストが指数関数的に増加する。これは60年以上前にRichard Bellman氏が最初に指摘したように、高次元PDEを解決する上で大きな課題となる。近年、偏微分方程式(PDE)を高次元で数値的に解くことにある程度成功しているが、そのような計算は法外に高価であり、一般的な非線形PDEの高次元への真のスケーリングは達成されていない。本稿では,任意の高次元PDEを解くために,物理インフォームドニューラルネットワーク(PINN)をスケールアップする新しい手法を提案する。新しい手法はStochastic Dimension Gradient Descent (SDGD)と呼ばれ、PDEの勾配を異なる次元に対応するピースに分解し、トレーニングPINNの各イテレーションでこれらの次元のサブセットをランダムにサンプリングする。提案手法の収束保証とその他の望ましい性質を理論的に証明する。提案手法は,Hamilton-Jacobi-Bellman や Schrödinger 方程式など,非常に難しい高次元 PDE を,PINN のメッシュフリーアプローチを用いて,単一のGPU上で数千次元において非常に高速に解けることを実験的に示す。例えば、非自明な非線形PDE(HJB-Lin方程式とBSB方程式)を、PINNを用いたSDGDを用いて1つのGPU上で6時間で100,000次元で解く。 SDGD は PINN の一般的な訓練手法であるため、SDGD は任意の高次元 PDE に対してスケールアップするために、現在および将来の PINN のどの変種にも適用することができる。

The curse-of-dimensionality (CoD) taxes computational resources heavily with exponentially increasing computational cost as the dimension increases. This poses great challenges in solving high-dimensional PDEs, as Richard Bellman first pointed out over 60 years ago. While there has been some recent success in numerically solving partial differential equations (PDEs) in high dimensions, such computations are prohibitively expensive, and true scaling of general nonlinear PDEs to high dimensions has never been achieved. In this paper, we develop a new method of scaling up physics-informed neural networks (PINNs) to solve arbitrary high-dimensional PDEs. The new method, called Stochastic Dimension Gradient Descent (SDGD), decomposes a gradient of PDEs into pieces corresponding to different dimensions and randomly samples a subset of these dimensional pieces in each iteration of training PINNs. We theoretically prove the convergence guarantee and other desired properties of the proposed method. We experimentally demonstrate that the proposed method allows us to solve many notoriously hard high-dimensional PDEs, including the Hamilton-Jacobi-Bellman and the Schrödinger equations in thousands of dimensions very fast on a single GPU using the PINNs mesh-free approach. For example, we solve nontrivial nonlinear PDEs (the HJB-Lin equation and the BSB equation) in 100,000 dimensions in 6 hours on a single GPU using SDGD with PINNs. Since SDGD is a general training methodology of PINNs, SDGD can be applied to any current and future variants of PINNs to scale them up for arbitrary high-dimensional PDEs.
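
The idea of sampling a subset of per-dimension pieces can be illustrated on a toy problem. The sketch below is a simplification, not the SDGD algorithm itself: it applies the sampling to a Laplacian written as a sum of per-dimension second derivatives, rather than to the gradient of a PINN loss, and it rescales by `d / batch_size` so that the estimate is unbiased. The toy function u(x) = sum(sin(x_i)), the function names, and the rescaling are all assumptions made for illustration.

```python
import math
import random

def second_derivative(x, i):
    """d^2 u / dx_i^2 for the toy function u(x) = sum(sin(x_i))."""
    return -math.sin(x[i])

def full_laplacian(x):
    """Sum of all per-dimension pieces; cost grows linearly with the dimension d."""
    return sum(second_derivative(x, i) for i in range(len(x)))

def sampled_laplacian(x, batch_size, rng):
    """Estimate from a random subset of dimensions, rescaled by d / batch_size (assumed scaling)."""
    d = len(x)
    chosen = rng.sample(range(d), batch_size)
    return (d / batch_size) * sum(second_derivative(x, i) for i in chosen)
```

Each call to `sampled_laplacian` touches only `batch_size` dimensions, so the per-iteration cost no longer depends on `d`; the price is added variance in the estimate.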

翻訳日:2023-07-25 16:51:19 公開日:2023-07-23

# アルゴンガス駆動型溶融プールダイナミクスの物理インフォームド機械学習

Physics-Informed Machine Learning of Argon Gas-Driven Melt Pool Dynamics ( http://arxiv.org/abs/2307.12304v1 )

ライセンス: Link先を確認

R. Sharma, W. Grace Guo, M. Raissi, Y.B. Guo

(参考訳) 金属添加物製造(AM)における溶融プールダイナミクスは, 印刷材料の安定性, 微細構造形成, 最終特性の処理に重要である。計算流体力学(CFD)を含む物理シミュレーションは, 溶融プール力学を予測する主要な手法である。しかし、物理ベースのシミュレーションアプローチは、計算コストが非常に高いという本質的な問題に苦しむ。本稿では,ニューラルネットワークと制御物理法則を統合し,温度,速度,圧力などの溶融プールのダイナミクスを速度に関するトレーニングデータを用いることなく予測する物理インフォームド機械学習(PIML)手法を提案する。このアプローチは、非常に非線形なナビエ-ストークス方程式を数値的に解くことを避け、計算コストを大幅に削減する。溶融プールの制御方程式の決定が難しいモデル定数は、データ駆動による発見によっても推測できる。さらに、物理インフォームドニューラルネットワーク(PINN)アーキテクチャは、効率的なモデルトレーニングのために最適化されている。データ効率のよいPINNモデルは、制御偏微分方程式(PDE)、初期条件、PINNモデルの境界条件を組み込むことによって、ソフトペナルティに起因している。

Melt pool dynamics in metal additive manufacturing (AM) is critical to process stability, microstructure formation, and final properties of the printed materials. Physics-based simulation including computational fluid dynamics (CFD) is the dominant approach to predict melt pool dynamics. However, the physics-based simulation approaches suffer from the inherent issue of very high computational cost. This paper provides a physics-informed machine learning (PIML) method by integrating neural networks with the governing physical laws to predict the melt pool dynamics such as temperature, velocity, and pressure without using any training data on velocity. This approach avoids solving the highly non-linear Navier-Stokes equation numerically, which significantly reduces the computational cost. The difficult-to-determine model constants of the governing equations of the melt pool can also be inferred through data-driven discovery. In addition, the physics-informed neural network (PINN) architecture has been optimized for efficient model training. The data-efficient PINN model is attributed to the soft penalty by incorporating governing partial differential equations (PDEs), initial conditions, and boundary conditions in the PINN model.
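
The soft-penalty idea in the last sentence of the abstract can be sketched as a weighted sum of mean-squared residuals. This is a minimal sketch under assumptions: the function names, the weights `w_pde`, `w_ic`, and `w_bc`, and the use of plain mean-squared error are illustrative choices, not details given in the abstract.

```python
def mean_square(values):
    """Mean of the squared entries; an empty list contributes zero."""
    return sum(v * v for v in values) / len(values) if values else 0.0

def soft_penalty_loss(pde_residuals, ic_errors, bc_errors,
                      w_pde=1.0, w_ic=1.0, w_bc=1.0):
    """Weighted sum of PDE, initial-condition, and boundary-condition penalties."""
    return (w_pde * mean_square(pde_residuals)
            + w_ic * mean_square(ic_errors)
            + w_bc * mean_square(bc_errors))
```

Because the physics enters only through these penalty terms, the network can be trained at points where no measurements exist, which is consistent with the abstract's claim of predicting velocity without velocity training data.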

翻訳日:2023-07-25 16:50:48 公開日:2023-07-23

# RANSAC-NN:RANSACを用いた教師なし画像異常検出

RANSAC-NN: Unsupervised Image Outlier Detection using RANSAC ( http://arxiv.org/abs/2307.12301v1 )

ライセンス: Link先を確認

Chen-Han Tsai, Yu-Shao Peng

(参考訳) 画像異常検出(OD)は、コンピュータビジョンタスクで使用される画像データセットの品質と精度を保証するために重要である。しかし、ODアルゴリズムの大部分は画像データを対象としていない。したがって、そのようなアルゴリズムを画像に適用する結果はしばしば最適ではない。本研究では,画像に特化して設計された新しい教師なしODアルゴリズムであるRANSAC-NNを提案する。 RANSACに基づくアプローチで画像を比較することにより、トレーニングやラベル情報なしで各画像の外れ値を自動的に予測する。 RANSAC-NNを15種類のデータセット上の最先端ODアルゴリズムに対して評価する。 RANSAC-NNは、ハイパーパラメータチューニングがなければ、ほぼすべてのデータセットカテゴリの他のアルゴリズムとは対照的に、一貫して好意的に機能する。さらに、RANSAC-NNの各コンポーネントを理解するための詳細な分析を行い、画像誤ラベル検出におけるその可能性を示す。 RANSAC-NNのコードはhttps://github.com/mxtsai/ransac-nnで公開されている。

Image outlier detection (OD) is crucial for ensuring the quality and accuracy of image datasets used in computer vision tasks. The majority of OD algorithms, however, have not been targeted toward image data. Consequently, the results of applying such algorithms to images are often suboptimal. In this work, we propose RANSAC-NN, a novel unsupervised OD algorithm specifically designed for images. By comparing images in a RANSAC-based approach, our algorithm automatically predicts the outlier score of each image without additional training or label information. We evaluate RANSAC-NN against state-of-the-art OD algorithms on 15 diverse datasets. Without any hyperparameter tuning, RANSAC-NN consistently performs favorably compared with other algorithms in almost every dataset category. Furthermore, we provide a detailed analysis to understand each RANSAC-NN component, and we demonstrate its potential applications in mislabeled image detection. Code for RANSAC-NN is provided at https://github.com/mxtsai/ransac-nn

翻訳日:2023-07-25 16:50:31 公開日:2023-07-23

# hybrid-csr : 皮質表面再構成のための明示的および暗黙的形状表現

Hybrid-CSR: Coupling Explicit and Implicit Shape Representation for Cortical Surface Reconstruction ( http://arxiv.org/abs/2307.12299v1 )

ライセンス: Link先を確認

Shanlin Sun, Thanh-Tung Le, Chenyu You, Hao Tang, Kun Han, Haoyu Ma, Deying Kong, Xiangyi Yan, Xiaohui Xie

(参考訳) 我々は,皮質表面再構成のための明示的および暗黙的な形状表現を組み合わせた幾何学的深層学習モデルであるHybrid-CSRを提案する。具体的には、Hybrid-CSRはテンプレートメッシュの明示的な変形から始まり、粗い再構成された皮質表面を得る。これにより,明示的(指向的点雲)と暗黙的(インジケータ関数)皮質表面再構成を統一する。明示的な表現ベース手法と比較すると,このハイブリッド手法は詳細な構造を捉えるのに好適であり,暗黙的な表現ベース手法と比較すると,メッシュベースの変形モジュールを用いたエンドツーエンドのトレーニングによりトポロジを認識できる。トポロジー欠陥に対処するために,最適化に基づく微分曲面登録に依存する新しいトポロジー補正パイプラインを提案する。 3つの脳データセットによる実験結果から,従来の暗黙的および明示的な皮質表面再構成法を精度,規則性,一貫性の点で超えた。

We present Hybrid-CSR, a geometric deep-learning model that combines explicit and implicit shape representations for cortical surface reconstruction. Specifically, Hybrid-CSR begins with explicit deformations of template meshes to obtain coarsely reconstructed cortical surfaces, based on which the oriented point clouds are estimated for the subsequent differentiable Poisson surface reconstruction. By doing so, our method unifies explicit (oriented point clouds) and implicit (indicator function) cortical surface reconstruction. Compared to explicit representation-based methods, our hybrid approach is better suited to capturing detailed structures, and compared with implicit representation-based methods, our method can be topology-aware because of end-to-end training with a mesh-based deformation module. In order to address topology defects, we propose a new topology correction pipeline that relies on optimization-based diffeomorphic surface registration. Experimental results on three brain datasets show that our approach surpasses existing implicit and explicit cortical surface reconstruction methods in numeric metrics in terms of accuracy, regularity, and consistency.

翻訳日:2023-07-25 16:50:19 公開日:2023-07-23

# 超伝導カーキャット量子ビットの安定化と散逸情報伝達

Stabilization and Dissipative Information Transfer of a Superconducting Kerr-Cat Qubit ( http://arxiv.org/abs/2307.12298v1 )

ライセンス: Link先を確認

Ufuk Korkmaz, Deniz Türkpençe

(参考訳) 今日では量子コンピュータの競争が続き、ハードウェアにおける量子ビットの数は急速に増加している。しかし、このプロセスに伴う量子ノイズはアルゴリズムアプリケーションの性能を低下させるため、量子コンピュータアーキテクチャやアルゴリズムの実装における別の方法が議論されている。これらの方法の1つは、回路ベースの量子コンピューティングモデルと散逸ベースのコンピューティングモデルとのハイブリッド化である。ここでの目標は、量子回路モデルに量子アドバンテージを提供するアルゴリズムの一部と、ノイズの影響が少ない散逸モデルに残りの部分を適用することである。このスキームは、非常に反復的なプロセスを含む量子機械学習アルゴリズムにおいて重要であり、ノイズの影響を受けやすい。本研究では,cat-qubit と呼ばれる qubit モデルへの散逸情報転送について検討する。このモデルは、量子機械学習アルゴリズムの基本的な処理単位である二項量子分類の散逸ベースのバージョンで特に重要である。一方、キャット量子ビットアーキテクチャは、その豊富な物理性のため、人工ニューラルネットワークにアクティベーションのような機能を簡単に実装できる可能性があり、量子人工ニューラルネットワークの代替ハードウェアの機会を提供する。数値計算は、繰り返し相互作用に基づく散逸的スキームによる貯水池キュービットからの量子情報の転送に成功したことを示す。

Today, the competition to build a quantum computer continues, and the number of qubits in hardware is increasing rapidly. However, the quantum noise that comes with this process reduces the performance of algorithmic applications, so alternative approaches to quantum computer architecture and the implementation of algorithms are being discussed. One of these alternatives is the hybridization of the circuit-based quantum computing model with the dissipative-based computing model. Here, the goal is to apply the part of the algorithm that provides the quantum advantage with the quantum circuit model, and the remaining part with the dissipative model, which is less affected by noise. This scheme is of importance to quantum machine learning algorithms that involve highly repetitive processes and are thus susceptible to noise. In this study, we examine dissipative information transfer to a qubit model called Cat-Qubit. This model is especially important for the dissipative-based version of the binary quantum classification, which is the basic processing unit of quantum machine learning algorithms. In addition, the Cat-Qubit architecture, which has the potential to easily implement activation-like functions in artificial neural networks due to its rich physics, also offers an alternative hardware opportunity for quantum artificial neural networks. Numerical calculations exhibit successful transfer of quantum information from reservoir qubits by a repeated-interactions-based dissipative scheme.

翻訳日:2023-07-25 16:50:00 公開日:2023-07-23

# 複数フレームからの同時温度推定と非均一性補正

Simultaneous temperature estimation and nonuniformity correction from multiple frames ( http://arxiv.org/abs/2307.12297v1 )

ライセンス: Link先を確認

Navot Oz, Omri Berman, Nir Sochen, David Mendelovich, Iftach Klapp

(参考訳) 赤外線カメラは農業、医療、セキュリティなど様々な用途で温度測定に広く利用されている。しかし、低コストのマイクロボロメーターベースの赤外線カメラは、空間的に変化する非均一性や温度測定のドリフトが起こりやすいため、実用的なシナリオでは使用性に制限がある。これらの制約に対処するために, 低コストのマイクロボロメータベースirカメラで撮影した複数フレームの温度推定と非均一性補正を同時に行う新しい手法を提案する。我々は、カメラの物理的画像取得モデルを利用し、カーネル推定ネットワーク(kpn)と呼ばれるディープラーニングアーキテクチャに組み込む。また,環境温度をモデルに組み込んだ新しいオフセットブロックを提案し,温度推定の重要な要因であるカメラのオフセットを推定する。その結果,フレームの数は温度推定の精度や不均一性補正に有意な影響を及ぼすことがわかった。さらに,本手法はオフセットブロックにより,バニラKPNに比べて性能が大幅に向上した。この手法は、UAVに搭載された低コストの赤外線カメラによって収集された実データに基づいてテストされ、コストの高い科学的グレードのラジオメトリックカメラと比較して、わずか0.27^\circ C-0.54^\circ C$の誤差しか示さなかった。本手法は, 温度推定と非一様性補正を同時に行うための精度と効率のよい解を提供する。

Infrared (IR) cameras are widely used for temperature measurements in various applications, including agriculture, medicine, and security. Low-cost IR cameras have immense potential to replace expensive radiometric cameras in these applications; however, low-cost microbolometer-based IR cameras are prone to spatially variant nonuniformity and to drift in temperature measurements, which limits their usability in practical scenarios. To address these limitations, we propose a novel approach for simultaneous temperature estimation and nonuniformity correction from multiple frames captured by low-cost microbolometer-based IR cameras. We leverage the physical image acquisition model of the camera and incorporate it into a deep learning architecture called kernel estimation networks (KPN), which enables us to combine multiple frames despite imperfect registration between them. We also propose a novel offset block that incorporates the ambient temperature into the model and enables us to estimate the offset of the camera, which is a key factor in temperature estimation. Our findings demonstrate that the number of frames has a significant impact on the accuracy of temperature estimation and nonuniformity correction. Moreover, our approach achieves a significant improvement in performance compared to vanilla KPN, thanks to the offset block. The method was tested on real data collected by a low-cost IR camera mounted on a UAV, showing only a small average error of $0.27^\circ\mathrm{C}$ to $0.54^\circ\mathrm{C}$ relative to costly scientific-grade radiometric cameras. Our method provides an accurate and efficient solution for simultaneous temperature estimation and nonuniformity correction, which has important implications for a wide range of practical applications.

翻訳日:2023-07-25 16:49:40 公開日:2023-07-23

# 分類法と早期糖尿病の比較分析

Comparative analysis using classification methods versus early stage diabetes ( http://arxiv.org/abs/2307.12296v1 )

ライセンス: Link先を確認

Alca-Vilca Gabriel Anthony, Carpio-Vargas Eloy

(参考訳) 本研究では, 早期糖尿病の有無を判断するために, 判別分析やロジスティック回帰などの分類法を用いて比較分析を行った。この目的のために、2020年のUC IRVINEプラットフォーム(英語版)のデータベースを用いており、糖尿病に影響を与える特定の変数がより良い結果を得るために使用された。方法論的にも同様に、3つの分類法それぞれについて対応する解析を行い、比較表に含め、得られた結果を解析した。最後に, 分類法を疾患に適用した研究の大部分は, ロジスティック回帰分類法に一定のアタッチメントがあり, さらなる利用が期待できるが, その結果, 適用された2つの分類法に関して有意な差がみられ, 最終的な結論を導く上で貴重な情報となった。

In this research work, a comparative analysis was carried out using classification methods, namely Discriminant Analysis and Logistic Regression, to predict whether a person may have early-stage diabetes. For this purpose, we used a 2020 database from the UC Irvine platform, selecting specific variables that influence diabetes to obtain a better result. In terms of methodology, the corresponding analysis was performed for each of the 3 classification methods, and the results were then collected in a comparative table and analyzed. Finally, most studies that apply classification methods to diseases show a clear preference for, and greater use of, the logistic regression classification method; on the other hand, our results showed significant differences between the 2 classification methods that were applied, which was valuable information for later drawing final conclusions.

翻訳日:2023-07-25 16:49:10 公開日:2023-07-23

# 量子分類器の散逸学習

Dissipative learning of a quantum classifier ( http://arxiv.org/abs/2307.12293v1 )

ライセンス: Link先を確認

Ufuk Korkmaz, Deniz Türkpençe

(参考訳) 量子計算が機械学習アルゴリズムにパフォーマンスの利点をもたらすかもしれないという期待は、ニューラルネットワークの量子バージョンの開発を動機付けている。本研究では,標準量子回路モデルの代替となるオープン量子システムとして機能する量子分類器モデルの学習力学を解析する。得られた結果から、勾配降下(GD)に基づくアルゴリズムを用いて、モデルをうまく訓練することができる。これらの最適化プロセスが連続ダイナミクスで得られたという事実は、分類器モデルの微分可能活性化関数の開発に有望であることを示している。

The expectation that quantum computation might bring performance advantages in machine learning algorithms motivates the work on the quantum versions of artificial neural networks. In this study, we analyze the learning dynamics of a quantum classifier model that works as an open quantum system which is an alternative to the standard quantum circuit model. According to the obtained results, the model can be successfully trained with a gradient descent (GD) based algorithm. The fact that these optimization processes have been obtained with continuous dynamics, shows promise for the development of a differentiable activation function for the classifier model.

翻訳日:2023-07-25 16:48:55 公開日:2023-07-23

# TransHuman: 汎用型ニューラルヒューマンレンダリングのためのトランスフォーマーに基づく人間表現

TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering ( http://arxiv.org/abs/2307.12291v1 )

ライセンス: Link先を確認

Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang

(参考訳) 本稿では,異なる人物のマルチビュー映像から条件付きニューラルレイディアンス場(NeRF)を訓練する,一般化可能なニューラルヒューマンレンダリングの課題に焦点を当てる。ダイナミックな人間の動きを扱うために、従来の手法は主にSparseConvNet(SPC)ベースの人間の表現を使用して、塗装されたSMPLを処理する。しかし、そのようなSPCベースの表現は、i) 不安定な観測空間の下で最適化されるため、トレーニング段階と推論段階の間でポーズの不整合が生じ、ii) 不完全な塗装されたSMPLの処理に不可欠な人体部位間のグローバルな関係を欠いている。これらの問題に対処するため,トランスヒューマン(TransHuman)という新しいフレームワークを提案する。このフレームワークは,塗装されたSMPLを標準空間下で学習し,トランスフォーマーによって人体部位間のグローバルな関係を捉える。具体的には、TransHumanは主にTransformer-based Human Encoding(TransHE)、Deformable Partial Radiance Fields(DPaRF)、Fine-grained Detail Integration(FDI)で構成されている。 TransHEはまず、塗装されたSMPLをトランスフォーマーを介して標準空間下で処理し、人体部位間のグローバルな関係を捉える。そして、DPaRFは、各出力トークンを、観測空間下でクエリポイントを符号化する変形可能な放射場にバインドする。最後に、FDIを使用して参照画像からのきめ細かい情報をさらに統合する。 ZJU-MoCapとH36Mの大規模な実験により、我々のTransHumanは、高い効率で最先端のパフォーマンスを著しく向上することを示した。プロジェクトページ: https://pansanity666.github.io/TransHuman/

In this paper, we focus on the task of generalizable neural human rendering which trains conditional Neural Radiance Fields (NeRF) from multi-view videos of different characters. To handle the dynamic human motion, previous methods have primarily used a SparseConvNet (SPC)-based human representation to process the painted SMPL. However, such SPC-based representation i) optimizes under the volatile observation space which leads to the pose-misalignment between training and inference stages, and ii) lacks the global relationships among human parts that is critical for handling the incomplete painted SMPL. Tackling these issues, we present a brand-new framework named TransHuman, which learns the painted SMPL under the canonical space and captures the global relationships between human parts with transformers. Specifically, TransHuman is mainly composed of Transformer-based Human Encoding (TransHE), Deformable Partial Radiance Fields (DPaRF), and Fine-grained Detail Integration (FDI). TransHE first processes the painted SMPL under the canonical space via transformers for capturing the global relationships between human parts. Then, DPaRF binds each output token with a deformable radiance field for encoding the query point under the observation space. Finally, the FDI is employed to further integrate fine-grained information from reference images. Extensive experiments on ZJU-MoCap and H36M show that our TransHuman achieves a significantly new state-of-the-art performance with high efficiency. Project page: https://pansanity666.github.io/TransHuman/

翻訳日:2023-07-25 16:48:46 公開日:2023-07-23

# タイムラインベースのゲームのためのコントローラ合成

Controller Synthesis for Timeline-based Games ( http://arxiv.org/abs/2307.12289v1 )

ライセンス: Link先を確認

Renato Acampora and Luca Geatti and Nicola Gigante and Angelo Montanari and Valentino Picotti

(参考訳) 計画に対するタイムラインベースのアプローチでは、状態変数(タイムライン)の集合の時間的経過は、時間的制約の集合によって制御される。従来のタイムラインベースの計画システムは、時間的不確実性を扱うことによって計画と実行の統合に優れている。一般の非決定性を扱うために、タイムラインベースのゲームの概念が最近導入された。このようなゲームに勝利戦略が存在するかどうかが2EXPTIME完全であることが証明されている。しかし、そのような戦略を実装するコントローラを合成する具体的なアプローチは欠落している。本稿では,このギャップを埋めるために,タイムラインベースのゲームに対して,効果的かつ計算的に最適なコントローラ合成手法を提案する。

In the timeline-based approach to planning, the evolution over time of a set of state variables (the timelines) is governed by a set of temporal constraints. Traditional timeline-based planning systems excel at the integration of planning with execution by handling temporal uncertainty. In order to handle general nondeterminism as well, the concept of timeline-based games has been recently introduced. It has been proved that finding whether a winning strategy exists for such games is 2EXPTIME-complete. However, a concrete approach to synthesize controllers implementing such strategies is missing. This paper fills this gap, by providing an effective and computationally optimal approach to controller synthesis for timeline-based games.

翻訳日:2023-07-25 16:48:19 公開日:2023-07-23

# 時間的ネットワーク分析:Rを用いた導入,方法,詳細なチュートリアル

Temporal network analysis: Introduction, methods and detailed tutorial with R ( http://arxiv.org/abs/2307.12339v1 )

ライセンス: Link先を確認

Mohammed Saqr

(参考訳) 学習には、学習者、教師、そして世界全体との間の関係、相互作用、つながりが含まれる。このような相互作用は本質的に時間的であり、時間の中で展開する。しかし、研究者が分析フレームワークにおいてこの2つの側面(時間的側面と関係的側面)を組み合わせることはめったにない。時間的ネットワークは、きめ細かな動的解析を通じて、活動、コミュニティ、社会プロセスの出現と流れといった時間的な学習プロセスのモデル化を可能にする。これは知識の共構築、情報の流れ、関係構築のような現象に関する洞察を与えることができる。本章では、時間的ネットワークの基本概念、その種類、技術を紹介する。また,ネットワークの構築から始まり,可視化,ノードレベルおよびグラフレベルでの数学的解析へと進む,時間的ネットワーク解析の詳細なガイドを示す。分析は実世界のデータセットで実行される。議論の章では、この技術に関する知識を広げたい読者に、追加のリソースを提供している。

Learning involves relations, interactions and connections between learners, teachers and the world at large. Such interactions are essentially temporal and unfold in time. Yet, researchers have rarely combined the two aspects (the temporal and relational aspects) in an analytics framework. Temporal networks allow modeling of temporal learning processes, i.e., the emergence and flow of activities, communities, and social processes, through fine-grained dynamic analysis. This can provide insights into phenomena like knowledge co-construction, information flow, and relationship building. This chapter introduces the basic concepts of temporal networks, their types and techniques. It also presents a detailed guide to temporal network analysis that starts with building the network and proceeds to visualization and mathematical analysis at the node and graph levels. The analysis is performed with a real-world dataset. The discussion chapter offers some extra resources for interested users who want to expand their knowledge of the technique.
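
As a rough illustration of the "building the network" and node-level analysis steps named in the abstract, the sketch below groups timestamped interactions into fixed-width time slices and counts each node's degree per slice. It is written in Python rather than the R used by the chapter, and the edge format, the function name `time_sliced_degree`, and the sample data are all hypothetical; it is not the chapter's tutorial code.

```python
from collections import defaultdict


def time_sliced_degree(edges, window):
    """Count node degree inside fixed-width time slices.

    edges:  iterable of (source, target, time) tuples (assumed format).
    window: width of one time slice, in the same unit as `time`.
    Returns {slice_index: {node: degree}}.
    """
    slices = defaultdict(lambda: defaultdict(int))
    for source, target, time in edges:
        index = int(time // window)  # which slice this interaction falls into
        slices[index][source] += 1
        slices[index][target] += 1
    return {index: dict(degrees) for index, degrees in slices.items()}


# Hypothetical interaction log: (who, with whom, minute of the session).
interactions = [
    ("ann", "bob", 0),
    ("bob", "cat", 3),
    ("ann", "cat", 12),
    ("ann", "bob", 14),
]
degree_by_slice = time_sliced_degree(interactions, window=10)
```

Comparing the per-slice dictionaries shows how a learner's activity changes over time, which a single static network would hide.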

翻訳日:2023-07-25 16:43:08 公開日:2023-07-23

# TabADM:拡散モデルによる教師なし表形式データ異常検出

TabADM: Unsupervised Tabular Anomaly Detection with Diffusion Models ( http://arxiv.org/abs/2307.12336v1 )

ライセンス: Link先を確認

Guy Zamberg and Moshe Salhov and Ofir Lindenbaum and Amir Averbuch

(参考訳) テーブルは、あらゆる科学分野のユースケースを持つ豊富な形式のデータである。実世界のデータセットは、下流の分析に悪影響を及ぼす可能性のある異常なサンプルを含むことが多い。本研究では,汚染データへのアクセスを想定し,非教師あり異常検出に有効な拡散に基づく確率モデルを提案する。本モデルでは, 特異な拒絶スキームを用いて, 正常試料の密度分布を学習し, 異常が密度推定に与える影響を弱めるように訓練した。低密度領域のサンプルとして異常を同定する。実データを用いて,本手法がベースラインよりも検出能力を向上させることを示す。さらに,本手法はデータ次元に対して比較的安定であり,広範囲なハイパーパラメータチューニングを必要としない。

Tables are an abundant form of data with use cases across all scientific fields. Real-world datasets often contain anomalous samples that can negatively affect downstream analysis. In this work, we only assume access to contaminated data and present a diffusion-based probabilistic model effective for unsupervised anomaly detection. Our model is trained to learn the density of normal samples by utilizing a unique rejection scheme to attenuate the influence of anomalies on the density estimation. At inference, we identify anomalies as samples in low-density regions. We use real data to demonstrate that our method improves detection capabilities over baselines. Furthermore, our method is relatively stable to the dimension of the data and does not require extensive hyperparameter tuning.

翻訳日:2023-07-25 16:42:51 公開日:2023-07-23

# セマンティックマップによるナビゲーション視覚表現の学習

Learning Navigational Visual Representations with Semantic Map Supervision ( http://arxiv.org/abs/2307.12335v1 )

ライセンス: Link先を確認

Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan

(参考訳) 家庭用ロボットの視覚的ナビゲーションには,環境の意味や空間構造を知覚できることが不可欠である。しかし、既存のほとんどの研究は、独立した画像による分類、あるいは屋内ナビゲーション領域に適応するための自己教師付き学習手法で事前訓練された視覚的バックボーンのみを用いており、ナビゲーションの学習に不可欠な空間的関係を無視している。本稿では,人間がナビゲーション中に意味的かつ空間的に有意味な認知地図を脳内に自然に構築するという行動に着想を得て,エージェントの自己中心的視点と意味的地図を対比させる,新たなナビゲーション特化型の視覚表現学習法(Ego$^2$-Map)を提案する。バックボーンエンコーダとしてビジュアルトランスフォーマーを適用し,大規模なHabitat-Matterport3D環境から収集したデータを用いてモデルを訓練する。 Ego$^2$-Map学習は、オブジェクト、構造、遷移といった地図のコンパクトでリッチな情報を、ナビゲーションのためのエージェントの自己中心的な表現に転移する。実験の結果,学習した表現を用いたエージェントは,オブジェクトゴールナビゲーションにおいて近年の視覚事前学習法よりも優れていた。さらに,この表現は連続環境における視覚・言語ナビゲーションを高レベルおよび低レベルの両方のアクション空間で著しく改善し,テストサーバ上で47%のSRと41%のSPLという新たな最先端結果を達成した。

Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot. However, most existing works only employ visual backbones pre-trained either with independent images for classification or with self-supervised learning methods to adapt to the indoor navigation domain, neglecting the spatial relationships that are essential to the learning of navigation. Inspired by the behavior that humans naturally build semantically and spatially meaningful cognitive maps in their brains during navigation, in this paper, we propose a novel navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps (Ego$^2$-Map). We apply the visual transformer as the backbone encoder and train the model with data collected from the large-scale Habitat-Matterport3D environments. Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation. Experiments show that agents using our learned representations on object-goal navigation outperform recent visual pre-training methods. Moreover, our representations significantly improve vision-and-language navigation in continuous environments for both high-level and low-level action spaces, achieving new state-of-the-art results of 47% SR and 41% SPL on the test server.

翻訳日:2023-07-25 16:42:35 公開日:2023-07-23

# 深部ニューラルネットワークの公理化PDEモデル

An axiomatized PDE model of deep neural networks ( http://arxiv.org/abs/2307.12333v1 )

ライセンス: Link先を確認

Tangjun Wang, Wenqi Tao, Chenglong Bao, Zuoqiang Shi

(参考訳) ディープニューラルネットワーク (DNN) と偏微分方程式 (PDE) の関係に着想を得て, ディープニューラルネットワークのPDEモデルの一般形について検討する。この目的を達成するために、単純なベースモデルからDNNを進化演算子として定式化する。いくつかの合理的な仮定に基づいて、進化作用素が実際に対流拡散方程式によって決定されることを示す。この対流拡散方程式モデルは、いくつかの有効なネットワークの数学的説明を与える。さらに,対流拡散モデルによりロバスト性が向上し,Rademacherの複雑性が低下することを示す。対流拡散方程式に基づいて,ResNetsの新しいトレーニング手法を設計する。提案手法の性能を検証する実験を行った。

Inspired by the relation between deep neural networks (DNNs) and partial differential equations (PDEs), we study the general form of the PDE models of deep neural networks. To achieve this goal, we formulate a DNN as an evolution operator from a simple base model. Based on several reasonable assumptions, we prove that the evolution operator is actually determined by a convection-diffusion equation. This convection-diffusion equation model gives a mathematical explanation for several effective networks. Moreover, we show that the convection-diffusion model improves the robustness and reduces the Rademacher complexity. Based on the convection-diffusion equation, we design a new training method for ResNets. Experiments validate the performance of the proposed method.

翻訳日:2023-07-25 16:41:36 公開日:2023-07-23

# フェイクニュース検出のためのX-CapsNet

X-CapsNet For Fake News Detection ( http://arxiv.org/abs/2307.12332v1 )

ライセンス: Link先を確認

Mohammad Hadi Goldani, Reza Safabakhsh, and Saeedeh Momtazi

(参考訳) ウェブベースのフォーラムやソーシャルメディアの普及に伴い、ニュースの消費は大幅に増加した。これは、人々に誤った情報を与え、混乱させる土壌となる。ユーザの健康関連の判断やその他の意図に対する誤情報の影響を減らすために、フェイクニュースを自動的に検出し、対処する機械学習モデルが望まれる。本稿では, X-CapsNet と呼ばれる、Capsule Neural Networks (CapsNet) を用いた新しいトランスフォーマーベースのモデルを提案する。このモデルには、動的ルーティングアルゴリズムを備えたCapsNetと、短いフェイクニュース文と長いフェイクニュース文を検出するためのサイズベースの分類器が並列に含まれている。 2つのサイズベースの分類器として、長いフェイクニュース文を検出するディープ畳み込みニューラルネットワーク(DCNN)と、短いニュース文を検出する多層パーセプトロン(MLP)を使用する。短いニュース文の表現の問題を解決するために、ニュース話者プロファイルのベクトルと、ニュース文の極性、感情、単語数のベクトルを連結して作成した間接的なニュース特徴を用いる。提案するアーキテクチャを評価するために、Covid-19データセットとLiarデータセットを使用する。 Covid-19データセットのF1スコアとLiarデータセットの精度の観点で、提案モデルは最先端のベースラインよりも優れた性能を示している。

News consumption has significantly increased with the growing popularity and use of web-based forums and social media. This sets the stage for misinforming and confusing people. To help reduce the impact of misinformation on users' potential health-related decisions and other intents, it is desirable to have machine learning models that detect and combat fake news automatically. This paper proposes a novel transformer-based model using Capsule Neural Networks (CapsNet) called X-CapsNet. This model includes a CapsNet with a dynamic routing algorithm, parallelized with a size-based classifier for detecting short and long fake news statements. We use two size-based classifiers, a Deep Convolutional Neural Network (DCNN) for detecting long fake news statements and a Multi-Layer Perceptron (MLP) for detecting short news statements. To resolve the problem of representing short news statements, we use indirect features of news created by concatenating the vector of news speaker profiles and a vector of polarity, sentiment, and word counts of news statements. For evaluating the proposed architecture, we use the Covid-19 and the Liar datasets. The results in terms of the F1-score for the Covid-19 dataset and accuracy for the Liar dataset show that the model performs better than the state-of-the-art baselines.

翻訳日:2023-07-25 16:41:18 公開日:2023-07-23

# ES2Net:ハイパースペクトル画像変化検出のための効率的なスペクトル空間ネットワーク

ES2Net: An Efficient Spectral-Spatial Network for Hyperspectral Image Change Detection ( http://arxiv.org/abs/2307.12327v1 )

ライセンス: Link先を確認

Qingren Yao, Yuan Zhou, and Wei Xiang

(参考訳) ハイパースペクトル画像変化検出(HSI-CD)は,二時期(bitemporal)のHSI間の差異を特定することを目的としている。スペクトルの冗長性を緩和し,変化特徴の識別性を向上するため,CDに有用なバンドを選択するバンド選択技術を導入した手法もある。しかし、これらの手法は、深層学習に基づく特徴抽出器とのエンドツーエンドの訓練ができないことと、バンド間の複雑な非線形関係を考慮していないことによる制限がある。本稿では,これらの問題に対処するため,エンドツーエンドで効率的なスペクトル空間変化検出ネットワーク(ES2Net)を提案する。具体的には,CDに有用なバンドを自動選択する学習可能なバンド選択モジュールを考案した。このモジュールは特徴抽出ネットワークと共同で最適化でき、バンド間の複雑な非線形関係を捉えることができる。さらに,異なるバンド間で空間的特徴分布が大きく異なることを考慮し,各バンドに空間的注意因子を割り当てるクラスタ単位の空間的注意機構を設計し,各バンドの特徴識別性を個別に改善する。 3つの広く使われているHSI-CDデータセットでの実験は、他の最先端手法と比較して、この手法の有効性と優位性を示している。

Hyperspectral image change detection (HSI-CD) aims to identify the differences in bitemporal HSIs. To mitigate spectral redundancy and improve the discriminativeness of changing features, some methods introduced band selection technology to select bands conducive to CD. However, these methods are limited because they cannot be trained end-to-end with a deep learning-based feature extractor and do not consider the complex nonlinear relationships among bands. In this paper, we propose an end-to-end efficient spectral-spatial change detection network (ES2Net) to address these issues. Specifically, we devise a learnable band selection module to automatically select bands conducive to CD. It can be jointly optimized with a feature extraction network and capture the complex nonlinear relationships among bands. Moreover, considering the large differences in spatial feature distribution among different bands, we design a cluster-wise spatial attention mechanism that assigns a spatial attention factor to each band to individually improve the feature discriminativeness for each band. Experiments on three widely used HSI-CD datasets demonstrate the effectiveness and superiority of this method compared with other state-of-the-art methods.

翻訳日:2023-07-25 16:40:56 公開日:2023-07-23

# 分散VQEのための多層HEA間の単一絡み合い接続アーキテクチャ

Single Entanglement Connection Architecture between Multi-Layer HEA for Distributed VQE ( http://arxiv.org/abs/2307.12323v1 )

ライセンス: Link先を確認

Shikun Zhang, Zheng Qin, Yang Zhou, Rui Li, Chunxiao Du, Zhisong Xiao

(参考訳) 現在のノイズのある中規模量子(NISQ)デバイス上での大規模量子コンピューティングの実現は、短期的な量子優位を達成する鍵となる。本稿では、VQEにおける多層ハードウェア効率アンザッツ(HEA)のための単一絡み合い接続アーキテクチャ(SECA)を提案し、ゲート切断技術と組み合わせて分散VQE(DVQE)を構築する。DVQEは低いオーバーヘッドでNISQデバイスのサイズを効率的に拡張できる。 2次元イジングモデルおよびハイゼンベルクモデルを用いたシミュレーション実験を行った。数値計算の結果,SECAは完全絡み合い接続アーキテクチャ (FECA) と比較して,絡み合い能力をわずかに損なう代わりに,表現性,安定性,計算性能において優れていることが示された。さらに,DVQEが有効性の面でもFECAを上回ることを示す証拠を得た。最後に, シミュレーション実験に現れるいくつかの興味深い現象を用いて, 表現性, 絡み合い能力, 計算性能の関係に関する未解決の問題について考察する。

Realization of large-scale quantum computing on current noisy intermediate-scale quantum (NISQ) devices is the key to achieving near-term quantum advantage. In this work, we propose the single entanglement connection architecture (SECA) for the multi-layer hardware-efficient ansatz (HEA) in VQE and combine it with the gate cutting technology to construct distributed VQE (DVQE), which can efficiently expand the size of NISQ devices under low overheads. Simulation experiments with the two-dimensional Ising model as well as the Heisenberg model are conducted. Our numerical results indicate a superiority of SECA in expressibility, stability and computational performance at the cost of a small loss in entangling capability compared with the full entanglement connection architecture (FECA). Furthermore, we find evidence that the DVQE also outperforms the FECA in terms of effectiveness. Finally, we discuss the open question about the relationship among expressibility, entangling capability and computational performance with some interesting phenomena appearing in simulation experiments.

翻訳日:2023-07-25 16:40:38 公開日:2023-07-23

# 古典, 量子, 閉, 開システムの作用

Action for classical, quantum, closed and open systems ( http://arxiv.org/abs/2307.12320v1 )

ライセンス: Link先を確認

Janos Polonyi

(参考訳) 作用汎関数は、古典力学と量子力学のそれぞれにおいて、変分原理の一般化と経路積分形式を通じて、古典系、量子系、閉じた系、開いた系の力学を定義するのに使うことができる。これらのスキームは、自由度の形式的な二重化という特異な特徴に基づいている。このような二重化を動機づける5つの議論を提示し、そのような形式が自然であることを示す。異なる議論に共通する要素は因果的な時間の矢である。デコヒーレンス、散逸、古典極限に関するいくつかの教訓にも言及する。

The action functional can be used to define classical, quantum, closed, and open dynamics in a generalization of the variational principle and in the path integral formalism in classical and quantum dynamics, respectively. These schemes are based on an unusual feature, a formal redoubling of the degrees of freedom. Five arguments to motivate such a redoubling are put forward to demonstrate that such a formalism is natural. The common element of the different arguments is the causal time arrow. Some lessons concerning decoherence, dissipation and the classical limit are mentioned, too.

翻訳日:2023-07-25 16:40:22 公開日:2023-07-23

# 3つの異なるディープラーニングモデルを組み合わせた心膜脂肪数画像の開発

Development of pericardial fat count images using a combination of three different deep-learning models ( http://arxiv.org/abs/2307.12316v1 )

ライセンス: Link先を確認

Takaaki Matsunaga, Atsushi Kono, Hidetoshi Matsuo, Kaoru Kitagawa, Mizuho Nishio, Hiromi Hashimura, Yu Izawa, Takayoshi Toba, Kazuki Ishikawa, Akie Katsuki, Kazuyuki Ohmura, Takamichi Murakami

(参考訳) Rationale and Objectives: 心臓を囲む胸部内臓脂肪である心膜脂肪(PF)は、冠動脈の炎症を誘発することにより、冠動脈疾患の発症を促進する。PFを評価するために,本研究は専用のディープラーニングモデルを用いて胸部X線写真(CXR)から心膜脂肪数画像(PFCI)を生成することを目的とした。資料と方法: 冠動脈CTを施行した連続269例のデータを検討した。金属インプラント,胸水,胸部手術歴,悪性腫瘍の既往のある患者は除外された。その結果,191例のデータを使用した。 PFCIは3次元CT画像の投影から生成され, 脂肪蓄積は高い画素値で表現された。提案手法では,CXRからPFCIを生成するために,CycleGANを含む3つの異なるディープラーニングモデルを組み合わせた。提案手法との比較のために,CycleGANをベースとした単一モデルを用いてCXRからPFCIを生成した。生成されたPFCIの画像品質を評価するため、(i) 提案手法を用いて生成されたPFCIと (ii) 単一モデルを用いて生成されたPFCIについて、構造類似度指標(SSIM)、平均二乗誤差(MSE)、平均絶対誤差(MAE)を比較した。結果: 平均SSIM, MSE, MAEは、提案モデルでそれぞれ0.856, 0.0128, 0.0357, CycleGANベースの単一モデルでそれぞれ0.762, 0.0198, 0.0504であった。結論: 提案モデルを用いてCXRから生成されたPFCIは, 単一モデルによるものよりも優れた性能を示した。提案手法により,CTを用いないPFCI評価が可能になる可能性がある。

Rationale and Objectives: Pericardial fat (PF), the thoracic visceral fat surrounding the heart, promotes the development of coronary artery disease by inducing inflammation of the coronary arteries. For evaluating PF, this study aimed to generate pericardial fat count images (PFCIs) from chest radiographs (CXRs) using a dedicated deep-learning model. Materials and Methods: The data of 269 consecutive patients who underwent coronary computed tomography (CT) were reviewed. Patients with metal implants, pleural effusion, history of thoracic surgery, or that of malignancy were excluded. Thus, the data of 191 patients were used. PFCIs were generated from the projection of three-dimensional CT images, where fat accumulation was represented by a high pixel value. Three different deep-learning models, including CycleGAN, were combined in the proposed method to generate PFCIs from CXRs. A single CycleGAN-based model was used to generate PFCIs from CXRs for comparison with the proposed method. To evaluate the image quality of the generated PFCIs, structural similarity index measure (SSIM), mean squared error (MSE), and mean absolute error (MAE) of (i) the PFCI generated using the proposed method and (ii) the PFCI generated using the single model were compared. Results: The mean SSIM, MSE, and MAE were as follows: 0.856, 0.0128, and 0.0357, respectively, for the proposed model; and 0.762, 0.0198, and 0.0504, respectively, for the single CycleGAN-based model. Conclusion: PFCIs generated from CXRs with the proposed model showed better performance than those with the single model. PFCI evaluation without CT may be possible with the proposed method.
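
Two of the three image-quality metrics named in the abstract, MSE and MAE, follow directly from their textbook definitions. The sketch below is a minimal illustration under assumed conventions: images are flattened lists of pixel values scaled to [0, 1], and the function name and sample values are hypothetical. The abstract does not state how the authors normalized or computed their metrics, and SSIM is omitted because it needs windowed statistics.

```python
def mse_and_mae(reference, generated):
    """Return (mean squared error, mean absolute error) for two images.

    Both inputs are flattened sequences of pixel values of equal length;
    the [0, 1] scaling used in the example is an assumption.
    """
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    diffs = [g - r for r, g in zip(reference, generated)]
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return mse, mae


# Hypothetical 2x2 images, flattened row by row.
ct_projection = [0.0, 0.5, 1.0, 0.25]   # stands in for a CT-derived PFCI
model_output = [0.1, 0.5, 0.8, 0.25]    # stands in for a generated PFCI
mse, mae = mse_and_mae(ct_projection, model_output)
```

Lower values mean the generated image is closer to the reference, which is the direction of improvement reported for the proposed model.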

翻訳日:2023-07-25 16:40:13 公開日:2023-07-23

# 不確実性認識ネットワークによるリモートセンシング画像からのビルディング抽出

Building Extraction from Remote Sensing Images via an Uncertainty-Aware Network ( http://arxiv.org/abs/2307.12309v1 )

ライセンス: Link先を確認

Wei He, Jiepan Li, Weinan Cao, Liangpei Zhang, Hongyan Zhang

(参考訳) 建物抽出はリモートセンシング画像から建物の画素を分割することを目的としており、都市計画や都市動態モニタリングといった多くの用途において重要な役割を担っている。近年,エンコーダ・デコーダアーキテクチャを用いたディープラーニング手法は,その強力な特徴表現能力により,優れた性能を発揮している。しかし、建物の規模や様式が様々であるため、従来のディープラーニングモデルは常に不確実な予測に悩まされており、地物の複雑な分布から建物の完全なフットプリントを正確に区別できないため、多くの見落とし(omission)と誤検出(commission)が生じる。本稿では,不確実な予測の重要性を認識し,この問題を緩和するための新規かつ簡潔な不確実性認識ネットワーク(UANet)を提案する。提案したUANetの性能を検証するため、WHUビルディングデータセット、マサチューセッツビルディングデータセット、Inria空中画像データセットを含む3つの公開建物データセットに対して広範な実験を行った。その結果、提案したUANetは、他の最先端アルゴリズムを大きく上回ることが示された。

Building extraction aims to segment building pixels from remote sensing images and plays an essential role in many applications, such as city planning and urban dynamic monitoring. Over the past few years, deep learning methods with encoder-decoder architectures have achieved remarkable performance due to their powerful feature representation capability. Nevertheless, due to the varying scales and styles of buildings, conventional deep learning models always suffer from uncertain predictions and cannot accurately distinguish the complete footprints of the building from the complex distribution of ground objects, leading to a large degree of omission and commission. In this paper, we realize the importance of uncertain prediction and propose a novel and straightforward Uncertainty-Aware Network (UANet) to alleviate this problem. To verify the performance of our proposed UANet, we conduct extensive experiments on three public building datasets, including the WHU building dataset, the Massachusetts building dataset, and the Inria aerial image dataset. Results demonstrate that the proposed UANet outperforms other state-of-the-art algorithms by a large margin.

翻訳日:2023-07-25 16:39:38 公開日:2023-07-23

# 大規模言語モデルにおける文脈学習はラベル関係を学習するが、従来の学習ではない

In-Context Learning in Large Language Models Learns Label Relationships but Is Not Conventional Learning ( http://arxiv.org/abs/2307.12375v1 )

ライセンス: Link先を確認

Jannik Kossen, Tom Rainforth, Yarin Gal

(参考訳) 下流タスクにおけるLarge Language Models (LLM) の性能は、入力とラベルの関係を示す例を文脈に含めると、しばしば著しく改善される。しかし、LLMのこのインコンテキスト学習(ICL)能力がどのように機能するかについては、現在のところ合意がない。例えば、Xie et al. (2021)はICLを汎用学習アルゴリズムになぞらえているが、Min et al. (2022b)は、ICLはインコンテキストの例からラベル関係を学習すらしないと主張している。本稿では,(1)インコンテキストの例のラベルが予測にどのように影響するか,(2)事前学習中に学習したラベル関係が,インコンテキストで与えられた入力とラベルの例とどのように相互作用するか,(3)ICLがインコンテキストの例全体にわたってラベル情報をどのように集約するかについて検討する。この結果から, LLM は通常インコンテキストのラベルからの情報を取り込むが, 事前学習によるラベル関係とインコンテキストのラベル関係は異なる扱いを受けており, モデルがすべてのインコンテキスト情報を等しく考慮しているわけではないことが示唆された。私たちの結果は、LLMの動作の理解と調整(アライメント)に関する洞察を与える。

The performance of Large Language Models (LLMs) on downstream tasks often improves significantly when including examples of the input-label relationship in the context. However, there is currently no consensus about how this in-context learning (ICL) ability of LLMs works: for example, while Xie et al. (2021) liken ICL to a general-purpose learning algorithm, Min et al. (2022b) argue ICL does not even learn label relationships from in-context examples. In this paper, we study (1) how labels of in-context examples affect predictions, (2) how label relationships learned during pre-training interact with input-label examples provided in-context, and (3) how ICL aggregates label information across in-context examples. Our findings suggest that LLMs usually incorporate information from in-context labels, but that pre-training and in-context label relationships are treated differently, and that the model does not consider all in-context information equally. Our results give insights into understanding and aligning LLM behavior.

翻訳日:2023-07-25 16:32:39 公開日:2023-07-23

# 対話要約における感情ニュアンスの評価

Evaluating Emotional Nuances in Dialogue Summarization ( http://arxiv.org/abs/2307.12371v1 )

ライセンス: Link先を確認

Yongxin Zhou, Fabien Ringeval, François Portet

(参考訳) 自動対話要約は、人間の会話から最も重要なコンテンツを識別し、短いテキスト要約を作成することを目的とした、十分に確立されたタスクである。この分野の最近の進歩にもかかわらず、研究の大部分は事実情報の要約に重点を置いており、人間のインタラクションの分析、監視、支援に有用な情報を伝達できる情緒的な内容は別として残されている。本稿では,対話要約にどれだけの感情が保存されているかを定量化するために,$PEmo$の一連の尺度を提案し,評価する。その結果, 要約モデルでは, 要約中の感情的内容がよく保存されないことがわかった。また,学習セットを感情対話のみに還元することで,生成した要約文に感情内容が保存され,最も有意義な事実情報を保存できることを示した。

Automatic dialogue summarization is a well-established task that aims to identify the most important content from human conversations to create a short textual summary. Despite recent progress in the field, we show that most of the research has focused on summarizing the factual information, leaving aside the affective content, which can nevertheless convey useful information to analyse, monitor, or support human interactions. In this paper, we propose and evaluate a set of measures, $PEmo$, to quantify how much emotion is preserved in dialogue summaries. Results show that state-of-the-art summarization models do not preserve the emotional content well in the summaries. We also show that by reducing the training set to only emotional dialogues, the emotional content is better preserved in the generated summaries, while conserving the most salient factual information.

翻訳日:2023-07-25 16:32:18 公開日:2023-07-23

# 米軍退役軍人の縦断的電子健康記録からの症状を利用したアルツハイマー病の早期予知

Early Prediction of Alzheimers Disease Leveraging Symptom Occurrences from Longitudinal Electronic Health Records of US Military Veterans ( http://arxiv.org/abs/2307.12369v1 )

ライセンス: Link先を確認

Rumeng Li, Xun Wang, Dan Berlowitz, Brian Silver, Wen Hu, Heather Keating, Raelene Goodwin, Weisong Liu, Honghuang Lin, Hong Yu

(参考訳) アルツハイマー病(AD)の早期予測は、適時の介入と治療に不可欠である。本研究は,機械学習を用いてAD患者の縦断的電子健康記録(EHR)を分析し,ADの発症をより早期に予測できる徴候や症状を特定することを目的とする。 2004年から2021年までの米国退役軍人保健局(VHA)の縦断的EHRを用いて、ケースコントロール設計を行った。ケースは、ICD-10-CMコードに基づいて2016年1月1日以降にADと診断されたVHA患者であり、年齢、性別、臨床利用状況によって対照と1:9で(復元ありで)マッチングした。AD関連キーワードのパネルと、患者の縦断的EHRにおけるその出現の時間的推移を、4つの機械学習モデルによるAD予測の予測因子として使用した。年齢・性別・人種/民族によるサブグループ分析を行い, ホールドアウトした「未見」のVHA施設群でモデルを検証した。ICDに基づく診断の最大10年前までの予測について、モデルの判別能、キャリブレーション、その他の関連する指標を報告した。調査対象はケース16,701例と、マッチングされた対照39,097例であった。ケースでは診断が近づくにつれて、1年あたりのAD関連キーワード(例えば「concentration」や「speaking」)の平均数が約10から40以上へと急速に増加したのに対し、対照では10で横ばいであった。最良のモデルは、ICDに基づく診断の少なくとも10年前のデータを用いた予測において高い判別精度(ROCAUC 0.997)を達成した。このモデルはよくキャリブレーションされており(Hosmer-Lemeshow適合度検定のp値 = 0.99)、65歳未満の患者(ROCAUC 0.746)を除いて、年齢、性別、人種/民族のサブグループ間で一貫していた。 EHRノートから特定されたAD関連キーワードを用いた機械学習モデルは、将来のAD診断を予測することができ、EHRノートを用いたADリスクの特定への利用可能性と、大規模集団に対する安価な早期スクリーニング手段を示唆している。

Early prediction of Alzheimer's disease (AD) is crucial for timely intervention and treatment. This study aims to use machine learning approaches to analyze longitudinal electronic health records (EHRs) of patients with AD and identify signs and symptoms that can predict AD onset earlier. We used a case-control design with longitudinal EHRs from the U.S. Department of Veterans Affairs Veterans Health Administration (VHA) from 2004 to 2021. Cases were VHA patients with AD diagnosed after 1/1/2016 based on ICD-10-CM codes, matched 1:9 with controls (with replacement) by age, sex and clinical utilization. We used a panel of AD-related keywords and their occurrences over time in a patient's longitudinal EHRs as predictors for AD prediction with four machine learning models. We performed subgroup analyses by age, sex, and race/ethnicity, and validated the model in a hold-out and "unseen" VHA stations group. Model discrimination, calibration, and other relevant metrics were reported for predictions up to ten years before ICD-based diagnosis. The study population included 16,701 cases and 39,097 matched controls. The average number of AD-related keywords (e.g., "concentration", "speaking") per year increased rapidly for cases as diagnosis approached, from around 10 to over 40, while remaining flat at 10 for controls. The best model achieved high discriminative accuracy (ROCAUC 0.997) for predictions using data from at least ten years before ICD-based diagnoses. The model was well-calibrated (Hosmer-Lemeshow goodness-of-fit p-value = 0.99) and consistent across subgroups of age, sex and race/ethnicity, except for patients younger than 65 (ROCAUC 0.746). Machine learning models using AD-related keywords identified from EHR notes can predict future AD diagnoses, suggesting their potential use for identifying AD risk from EHR notes and offering an affordable way to screen large populations early.

翻訳日:2023-07-25 16:32:04 公開日:2023-07-23

# ゆらぎ定理と期待効用仮説

Fluctuation theorems and expected utility hypothesis ( http://arxiv.org/abs/2307.12358v1 )

ライセンス: Link先を確認

Gianluca Francica, Luca Dell'Anna

(参考訳) 期待効用仮説は経済学において広く用いられる概念であり、利得が不確実な場合の意思決定に役立つ。本稿では,期待効用理論におけるゆらぎ定理の含意について考察する。特に、エントロピーがギャンブルの指針になりうるかどうかを問う。我々は、生成されるエントロピーに依存する、確実性等価を含む境界の存在を証明する。次に,非平衡初期状態からの仕事の抽出など,特定の状況に着目し,確実性等価のエントロピーへの依存性について検討する。

The expected utility hypothesis is a popular concept in economics that is useful for making decisions when the payoff is uncertain. In this paper, we investigate the implications of a fluctuation theorem in the theory of expected utility. In particular, we wonder whether entropy could serve as a guideline for gambling. We prove the existence of a bound involving the certainty equivalent which depends on the entropy produced. Then, we examine the dependence of the certainty equivalent on the entropy by looking at specific situations, for instance, the work extraction from a non-equilibrium initial state.
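
For readers unfamiliar with the certainty equivalent mentioned above: it is the guaranteed payoff whose utility equals the expected utility of an uncertain one. The sketch below computes it for an exponential (constant absolute risk aversion) utility, u(w) = -exp(-r w), which gives CE = -(1/r) ln E[exp(-r w)]. The choice of utility function, the parameter names, and the example gamble are assumptions made for illustration; the abstract does not say which utility the authors use, and the entropy-dependent bound itself is not reproduced here.

```python
import math


def certainty_equivalent(payoffs, probabilities, risk_aversion):
    """Certainty equivalent under u(w) = -exp(-risk_aversion * w).

    payoffs and probabilities describe a discrete gamble; risk_aversion
    must be positive. The utility function is an illustrative assumption.
    """
    if risk_aversion <= 0:
        raise ValueError("risk_aversion must be positive")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # E[exp(-r w)] is minus the expected utility.
    expected_exp = sum(
        p * math.exp(-risk_aversion * w) for w, p in zip(payoffs, probabilities)
    )
    # Invert the utility function at the expected utility.
    return -math.log(expected_exp) / risk_aversion


# Hypothetical fair coin flip between payoffs 0 and 10.
ce = certainty_equivalent([0.0, 10.0], [0.5, 0.5], risk_aversion=0.1)
mean_payoff = 0.5 * 0.0 + 0.5 * 10.0
```

Because this utility is concave, Jensen's inequality guarantees that `ce` is no larger than `mean_payoff`; the gap between them is the risk premium.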

翻訳日:2023-07-25 16:31:34 公開日:2023-07-23

# ComPtr: 単純かつ汎用的な相補的トランスフォーマーによる多様なバイソース密予測タスクに向けて

ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple yet General Complementary Transformer ( http://arxiv.org/abs/2307.12349v1 )

ライセンス: Link先を確認

Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu

(参考訳) ディープラーニング(DL)は、密予測の分野を前進させるとともに、異なるタスク間の固有の障壁を徐々に解消してきた。しかし、既存の研究の多くは、特定のタスクのためだけのアーキテクチャ設計や視覚的手がかりの構築に焦点を当てており、DLパラダイムによってもたらされる潜在的な統一性を無視している。本稿では,多様なバイソース(bi-source)密予測タスクのための新規な相補的トランスフォーマー(Complementary Transformer)であるComPtrの構築を試みる。具体的には、単一のタスクやタスクのサブセットに過剰に特化する既存の方法とは異なり、ComPtrはより一般的なバイソース密予測の概念から出発する。情報相補性への基本的な依存に基づいて,一貫性強化コンポーネントと差分認識コンポーネントを提案し,これらによってComPtrは多様なタスクのために異なる画像ソースから重要な視覚的意味的手がかりを抽出・収集できる。 ComPtrは異なる入力を等しく扱い、トランスフォーマー上にシーケンス・ツー・シーケンスの形で効率的な密な相互作用モデルを構築する。このタスクに依存しない設計は、様々なバイソース情報を同時に処理できる統一モデルを構築するための円滑な基盤を提供する。リモートセンシングによる変化検出,RGB-T群衆カウント,RGB-D/T顕著物体検出,RGB-Dセマンティックセグメンテーションなど,複数の代表的な視覚タスクに対する広範な実験において,提案手法は一貫して良好な性能を得る。コードは https://github.com/lartpang/ComPtr で公開予定である。

Deep learning (DL) has advanced the field of dense prediction, while gradually dissolving the inherent barriers between different tasks. However, most existing works focus on designing architectures and constructing visual cues only for the specific task, which ignores the potential uniformity introduced by the DL paradigm. In this paper, we attempt to construct a novel Complementary transformer, ComPtr, for diverse bi-source dense prediction tasks. Specifically, unlike existing methods that over-specialize in a single task or a subset of tasks, ComPtr starts from the more general concept of bi-source dense prediction. Based on the basic dependence on information complementarity, we propose consistency enhancement and difference awareness components with which ComPtr can excavate and collect important visual semantic cues from different image sources for diverse tasks, respectively. ComPtr treats different inputs equally and builds an efficient dense interaction model in the form of sequence-to-sequence on top of the transformer. This task-generic design provides a smooth foundation for constructing the unified model that can simultaneously deal with various bi-source information. In extensive experiments across several representative vision tasks, i.e., remote sensing change detection, RGB-T crowd counting, RGB-D/T salient object detection, and RGB-D semantic segmentation, the proposed method consistently obtains favorable performance. The code will be available at https://github.com/lartpang/ComPtr.

翻訳日:2023-07-25 16:31:26 公開日:2023-07-23

# ResShift: 残差シフトによる画像超解像の効率的な拡散モデル

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting ( http://arxiv.org/abs/2307.12348v1 )

ライセンス: Link先を確認

Zongsheng Yue, Jianyi Wang, Chen Change Loy

(参考訳) 拡散に基づく画像超解像(SR)法は、数百から数千ものサンプリングステップを必要とするため、主に推論速度の低さによって制限されている。既存の加速サンプリング技術は必然的に性能をある程度犠牲にし、過度にぼやけたSR結果をもたらす。そこで本稿では,SRのための新しい効率的な拡散モデルを提案する。このモデルは拡散ステップ数を大幅に削減し,推論時の事後的な高速化とそれに伴う性能劣化を不要にする。本手法では,高解像度画像と低解像度画像の間の残差をシフトさせることで両者の間を遷移するマルコフ連鎖を構築し,遷移効率を大幅に向上させる。また、拡散過程におけるシフト速度とノイズ強度を柔軟に制御する精巧なノイズスケジュールを開発する。実験の結果,提案手法は,わずか15ステップのサンプリングでも,合成と実世界の両方のデータセットにおいて,現在の最先端手法よりも優れた,あるいは少なくとも同等の性能が得られることが示された。コードとモデルは https://github.com/zsyOAOA/ResShift で利用可能である。

Diffusion-based image super-resolution (SR) methods are mainly limited by their low inference speed, which stems from requiring hundreds or even thousands of sampling steps. Existing acceleration sampling techniques inevitably sacrifice performance to some extent, leading to over-blurry SR results. To address this issue, we propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps, thereby eliminating the need for post-acceleration during inference and its associated performance deterioration. Our method constructs a Markov chain that transfers between the high-resolution image and the low-resolution image by shifting the residual between them, substantially improving the transition efficiency. Additionally, an elaborate noise schedule is developed to flexibly control the shifting speed and the noise strength during the diffusion process. Extensive experiments demonstrate that the proposed method obtains superior or at least comparable performance to current state-of-the-art methods on both synthetic and real-world datasets, even with only 15 sampling steps. Our code and model are available at https://github.com/zsyOAOA/ResShift.
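
The idea of "shifting the residual" can be pictured with a simple forward step: start from the high-resolution image, move a fraction of the way toward the (upsampled) low-resolution image, and add noise whose strength grows with that fraction. The sketch below is one plausible reading of that description, not the released implementation. The functional form x_t = x_0 + eta * (y - x_0) + kappa * sqrt(eta) * noise, the names `eta` and `kappa`, and the flattened-list image format are all assumptions, since the abstract does not spell out the transition.

```python
import math
import random


def shift_residual(high_res, low_res, eta, kappa, rng):
    """Sample one noisy intermediate state between two images.

    high_res, low_res: flattened pixel lists of equal length (low_res is
        assumed to be already upsampled to the high-resolution size).
    eta:   shift fraction in [0, 1]; 0 keeps high_res, 1 reaches low_res.
    kappa: noise scale (hypothetical hyperparameter name).
    rng:   a random.Random instance, passed in for reproducibility.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    noise_std = kappa * math.sqrt(eta)
    return [
        h + eta * (l - h) + noise_std * rng.gauss(0.0, 1.0)
        for h, l in zip(high_res, low_res)
    ]


# Hypothetical two-pixel images and an increasing shift schedule.
rng = random.Random(0)
high = [0.2, 0.8]
low = [0.4, 0.6]
trajectory = [shift_residual(high, low, eta, 0.1, rng) for eta in (0.0, 0.5, 1.0)]
```

With `kappa` set to zero the states interpolate linearly from `high` to `low`, which makes the role of the noise schedule easy to inspect in isolation.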

翻訳日:2023-07-25 16:30:53 公開日:2023-07-23

# 誤った理由で正しい:解釈可能なML技術は疑似相関を検出できるか?

Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations? ( http://arxiv.org/abs/2307.12344v1 )

ライセンス: Link先を確認

Susu Sun, Lisa M. Koch, Christian F. Baumgartner

(参考訳) ディープニューラルネットワークモデルは比類のない分類性能を提供するが、データ内の疑似相関(spurious correlation)を学習しやすい。テストデータがトレーニングデータと同じ分布から来ている場合、交絡情報へのこのような依存を性能指標を使って検出することは困難である。事後(post-hoc)説明や本質的に解釈可能な分類器のような解釈可能なML手法は、誤ったモデル推論を特定できると期待されている。しかし、これらの技法の多くが実際にそれを行えるかどうかについては、肯定・否定の両方の証拠がある。本稿では,説明手法が疑似相関を正しく識別する能力を評価するための厳密な評価戦略を提案する。この戦略を用いて,胸部X線診断タスクにおいて人工的に追加した3種類の交絡因子を検出する能力について,5つの事後説明手法と1つの本質的に解釈可能な手法を評価した。事後手法であるSHAPと本質的に解釈可能なAttri-Netが最高の性能を示し、誤ったモデルの振る舞いを確実に特定するために使用できることがわかった。

While deep neural network models offer unmatched classification performance, they are prone to learning spurious correlations in the data. Such dependencies on confounding information can be difficult to detect using performance metrics if the test data comes from the same distribution as the training data. Interpretable ML methods such as post-hoc explanations or inherently interpretable classifiers promise to identify faulty model reasoning. However, there is mixed evidence as to whether many of these techniques are actually able to do so. In this paper, we propose a rigorous evaluation strategy to assess an explanation technique's ability to correctly identify spurious correlations. Using this strategy, we evaluate five post-hoc explanation techniques and one inherently interpretable method for their ability to detect three types of artificially added confounders in a chest x-ray diagnosis task. We find that the post-hoc technique SHAP and the inherently interpretable Attri-Net provide the best performance and can be used to reliably identify faulty model behavior.

翻訳日:2023-07-25 16:30:35 公開日:2023-07-23

# 音声に基づく感情認識のための自己教師あり学習

Self-Supervised Learning for Audio-Based Emotion Recognition ( http://arxiv.org/abs/2307.12343v1 )

ライセンス: Link先を確認

Peranut Nimitsurachat and Peter Washington

(参考訳) 音声入力データを用いた感情認識モデルは、メンタルヘルスケア、マーケティング、ゲーム、ソーシャルメディア分析などに応用できる対話型システムの開発を可能にする。オーディオデータを用いた感情コンピューティングの分野は豊かだが、一貫して高性能なモデルを達成するうえでの大きな障壁は、利用可能なトレーニングラベルの不足である。自己教師付き学習(SSL)は、データ自体の特性を予測することによって、教師付きラベルが乏しくても学習できる手法のファミリーである。音声に基づく感情認識における自己教師付き学習の有用性を理解するため、CMU-MOSEIの音響モダリティからの感情分類に自己教師付き事前学習を適用した。生の音響データを実験した先行論文とは異なり、本手法は符号化された音響データに適用されている。我々のモデルはまず、音響データのランダムにマスクされたタイムスタンプを復元するように事前学習される。事前学習されたモデルは、注釈付きデータの小さなサンプルを使って微調整される。最終モデルの性能は、同じバックボーンアーキテクチャを持つベースラインの深層学習モデルに対して、いくつかの評価指標によって評価される。自己教師付き学習は、すべてのメトリクスにわたってモデルの性能を一貫して改善する。本研究は、感情コンピューティングにおける自己教師付き学習の有用性を示し、自己教師付き学習は学習例の数が少ない場合に最も有用であること、またその効果は幸福、悲しみ、怒りなど分類が容易な感情に対して最も顕著であることを示す。さらに本研究は、生の入力空間で事前学習する従来のアプローチではなく、埋め込み特徴表現に適用した場合にも自己教師付き学習が機能することを示す。

Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio data is rich, a major barrier to achieving consistently high-performance models is the paucity of available training labels. Self-supervised learning (SSL) is a family of methods which can learn despite a scarcity of supervised labels by predicting properties of the data itself. To understand the utility of self-supervised learning for audio-based emotion recognition, we have applied self-supervised pre-training to the classification of emotions from the acoustic modality of CMU-MOSEI. Unlike prior papers that have experimented with raw acoustic data, our technique has been applied to encoded acoustic data. Our model is first pre-trained to uncover the randomly masked timestamps of the acoustic data. The pre-trained model is then fine-tuned using a small sample of annotated data. The performance of the final model is then evaluated via several evaluation metrics against a baseline deep learning model with an identical backbone architecture. We find that self-supervised learning consistently improves the performance of the model across all metrics. This work shows the utility of self-supervised learning for affective computing, demonstrating that self-supervised learning is most useful when the number of training examples is small, and that the effect is most pronounced for emotions which are easier to classify, such as happiness, sadness, and anger. This work further demonstrates that self-supervised learning works when applied to embedded feature representations rather than the traditional approach of pre-training on the raw input space.

翻訳日:2023-07-25 16:30:18 公開日:2023-07-23
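
The pre-training objective mentioned in the abstract above (recovering randomly masked timestamps of encoded acoustic features) can be sketched roughly as follows. This is only an illustration under stated assumptions: zero-filling the masked steps, the mean-squared-error loss, and the names `mask_timesteps`, `reconstruction_loss` and `mask_ratio` are hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def mask_timesteps(features, mask_ratio, rng):
    """Zero out a random subset of time steps.

    features: array of shape (time, feature_dim) holding encoded acoustic features.
    Returns the masked copy and a boolean mask marking the hidden time steps.
    """
    num_steps = features.shape[0]
    num_masked = min(num_steps, max(1, int(round(mask_ratio * num_steps))))
    masked_idx = rng.choice(num_steps, size=num_masked, replace=False)
    mask = np.zeros(num_steps, dtype=bool)
    mask[masked_idx] = True
    masked = features.copy()
    masked[mask] = 0.0  # assumed masking value for this sketch
    return masked, mask


def reconstruction_loss(predicted, target, mask):
    """Mean squared error computed only over the masked time steps."""
    return float(np.mean((predicted[mask] - target[mask]) ** 2))
```

A model would take the masked array as input and be trained to minimise `reconstruction_loss` against the original features before being fine-tuned on the small labelled set.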

# ジェネリックと制御可能なオブジェクト検出攻撃に向けて

Towards Generic and Controllable Attacks Against Object Detection ( http://arxiv.org/abs/2307.12342v1 )

ライセンス: Link先を確認

Guopeng Li, Yue Xu, Jian Ding, Gui-Song Xia

(参考訳) 既存の物体検出器(OD)に対する敵対的攻撃には、2つの固有の制限がある。第一に、ODは複雑なメタ構造設計を持つため、ODに対する先進的な攻撃の多くは特定の検出器固有の構造への攻撃に集中しており、他の検出器では機能しにくい。このことが、ODに対する汎用的な攻撃を設計する動機となっている。第二に、ODに対するほとんどの研究は、画像レベルの攻撃を分類から検出へ一般化することで敵対的サンプル(AE)を作成しており、意味的に無意味な領域(背景など)に冗長な計算と摂動をもたらすため、ODに対する制御可能な攻撃が強く求められている。この目的のために、制御可能な摂動によって主流の物体検出器を無力化する汎用的なホワイトボックス攻撃であるLGP(local perturbations with adaptively global attacks)を提案する。検出器に依存しない攻撃のために、LGPは高品質の提案を追跡し、3つの異種の損失を同時に最適化する。このようにして、特定の構造に制限されることなく、出力の一部を用いてODの重要なコンポーネントを欺くことができる。制御可能性に関しては、前景と背景の分離を適応的に活用して摂動を前景に付着させるオブジェクト単位の制約を確立する。実験では、提案するLGPはMS-COCOおよびDOTAデータセット上の16の最先端物体検出器への攻撃に成功し、有望な知覚されにくさと転移性を得た。コードはhttps://github.com/liguopeng0923/LGP.gitで公開されている。

Existing adversarial attacks against Object Detectors (ODs) suffer from two inherent limitations. Firstly, ODs have complicated meta-structure designs, so most advanced attacks on ODs concentrate on specific detector-intrinsic structures, which makes it hard for them to work on other detectors and motivates us to design a generic attack against ODs. Secondly, most works against ODs craft Adversarial Examples (AEs) by generalizing image-level attacks from classification to detection, which brings redundant computations and perturbations in semantically meaningless areas (e.g., backgrounds) and creates an urgent need for controllable attacks on ODs. To this end, we propose a generic white-box attack, LGP (local perturbations with adaptively global attacks), to blind mainstream object detectors with controllable perturbations. For a detector-agnostic attack, LGP tracks high-quality proposals and optimizes three heterogeneous losses simultaneously. In this way, we can fool the crucial components of ODs with a part of their outputs without the limitations of specific structures. Regarding controllability, we establish an object-wise constraint that exploits foreground-background separation adaptively to induce the attachment of perturbations to foregrounds. Experimentally, the proposed LGP successfully attacked sixteen state-of-the-art object detectors on the MS-COCO and DOTA datasets, with promising imperceptibility and transferability. Code is publicly released at https://github.com/liguopeng0923/LGP.git

翻訳日:2023-07-25 16:29:53 公開日:2023-07-23

# NIR分光法による土壌炭酸塩の迅速検出、深層学習法および粉末X線回折による相定量

Rapid detection of soil carbonates by means of NIR spectroscopy, deep learning methods and phase quantification by powder Xray diffraction ( http://arxiv.org/abs/2307.12341v1 )

ライセンス: Link先を確認

Lykourgos Chiniadis, Petros Tamvakis

(参考訳) 土壌のNIRスペクトル吸光度・反射率ライブラリは、農業生産の改善と、農業生態学的バランスおよび環境持続可能性の重要な前提条件である土壌特性の分析に活用されている。特に炭酸塩は、気候変動に伴う環境条件の変化が、極端な場合はもちろん軽微であっても、大きく影響を受ける土壌特性である。本研究では、FT NIR反射分光法と深層学習法を用いて土壌中の炭酸塩含有量を迅速かつ効率的に予測する方法を提案する。我々は、1)MLP回帰器および2)CNNといった複数の機械学習手法を利用し、PLSR、Cubist、SVMといった従来のMLアルゴリズムと性能を比較した。比較には、全国的に収集された土壌サンプルの反射率スペクトルのデータセットであるKSSL(USDA)と、EU全域の土壌サンプルの吸光度スペクトルを含むLUCAS TopSoil(European Soil Library)という2つのNIRスペクトルライブラリを結合したデータセットを用い、これらの手法を未知の土壌サンプルの炭酸塩含有量の予測に用いた。KSSLおよびTopSoilスペクトルライブラリの土壌試料はvisNIRのスペクトル領域で取得されたが、本研究ではNIRスペクトル領域のみを利用した。X線回折による炭酸塩の定量は、体積法およびMLPの予測とよく一致している。本研究は、1)体積法が利用できず、2)NIRスペクトル吸光度データのみが利用可能な場合に、土壌試料中の炭酸塩含有量の迅速な予測に寄与する。我々の知る限り、これほど広範なデータセットで訓練された予測モデルを提示し、未知のデータに対してこれほど有望な結果を示した研究は他になく、深層学習モデルが土壌の炭酸塩含有量に対する優れた予測ツールであるという考えを支持するものである。

Soil NIR spectral absorbance/reflectance libraries are used to improve agricultural production and to analyze soil properties that are key prerequisites for agroecological balance and environmental sustainability. Carbonates in particular represent a soil property that is strongly affected even by mild, let alone extreme, changes in environmental conditions during climate change. In this study we propose a rapid and efficient way to predict carbonate content in soil by means of FT NIR reflectance spectroscopy and deep learning methods. We exploited multiple machine learning methods, namely 1) an MLP regressor and 2) a CNN, and compared their performance with traditional ML algorithms such as PLSR, Cubist and SVM on the combined dataset of two NIR spectral libraries: KSSL (USDA), a dataset of soil sample reflectance spectra collected nationwide, and LUCAS TopSoil (European Soil Library), which contains soil sample absorbance spectra from all over the European Union. We then used them to predict carbonate content on previously unseen soil samples. Soil samples in the KSSL and TopSoil spectral libraries were acquired in the visNIR spectral region; in this study, however, only the NIR spectral region was used. Quantification of carbonates by means of X-ray diffraction is in good agreement with the volumetric method and the MLP prediction. Our work contributes to rapid prediction of carbonate content in soil samples in cases where 1) no volumetric method is available and 2) only NIR absorbance spectra are available. To the best of our knowledge, no other study presents a prediction model trained on such an extensive dataset with such promising results on unseen data, which supports the notion that deep learning models are excellent prediction tools for soil carbonate content.

翻訳日:2023-07-25 16:29:26 公開日:2023-07-23

# 商業用5Gスタンドアローン(SA)アップリンクスループット予測

Practical Commercial 5G Standalone (SA) Uplink Throughput Prediction ( http://arxiv.org/abs/2307.12417v1 )

ライセンス: Link先を確認

Kasidis Arunruangsirilert, Jiro Katto

(参考訳) 5G New Radio(NR)ネットワークはアップリンクスループットの大幅な向上を約束するが、その改善が見られるのは、ユーザ機器(UE)が高周波のミリ波(mmWave)帯域に接続されている場合に限られる。UHD 4K/8Kビデオのリアルタイム伝送や仮想現実(VR)/拡張現実(AR)コンテンツなど、アップリンクを多用するスマートフォンアプリケーションの増加に伴い、アップリンクスループット予測はユーザーの体感品質(QoE)を最大化するうえで大きな役割を果たす。本稿では、過去のアップリンクスループットとRFパラメータに基づいて将来のアップリンクスループットを予測するために、ConvLSTMベースのニューラルネットワークを用いることを提案する。このネットワークは、通勤列車に乗車しながら商用5G SAネットワーク上で行った実世界のドライブテストのデータを用いて訓練されており、このデータにはさまざまな周波数帯、ハンドオーバ、不感地帯が含まれる。モデルを実用的に実装できるようにするため、Android API経由で利用可能な情報のみを使用するようにモデルを制限し、通勤列車と他の交通手段の両方のデータを用いてモデルを評価した。その結果、未知のすべての評価シナリオにわたって、モデルの平均予測精度は98.9%に達し、平均RMSEは1.80Mbpsであった。

While the 5G New Radio (NR) network promises a large increase in uplink throughput, the improvement can only be seen when the User Equipment (UE) is connected to the high-frequency millimeter wave (mmWave) band. With the rise of uplink-intensive smartphone applications such as the real-time transmission of UHD 4K/8K videos and Virtual Reality (VR)/Augmented Reality (AR) content, uplink throughput prediction plays a major role in maximizing the users' quality of experience (QoE). In this paper, we propose using a ConvLSTM-based neural network to predict the future uplink throughput based on past uplink throughput and RF parameters. The network is trained using data from real-world drive tests on commercial 5G SA networks while riding commuter trains, which cover various frequency bands, handovers, and blind spots. To make sure our model can be practically implemented, we limit it to the information available via the Android API, and then evaluate it using data from both commuter trains and other methods of transportation. The results show that our model reaches an average prediction accuracy of 98.9% with an average RMSE of 1.80 Mbps across all unseen evaluation scenarios.

翻訳日:2023-07-25 16:22:53 公開日:2023-07-23
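
To make the evaluation setup in the throughput-prediction abstract above more concrete, the sketch below shows how a throughput trace could be cut into (past window, future value) pairs and how the reported RMSE metric is computed. The window layout and the names `make_windows`, `history` and `horizon` are assumptions for illustration; the abstract does not describe the exact input format of the ConvLSTM model, and the definition of its "prediction accuracy" is not given, so only RMSE is shown.

```python
import numpy as np


def make_windows(series, history, horizon):
    """Split a 1-D throughput trace into (past window, future value) pairs.

    history: number of past samples fed to the predictor.
    horizon: how many steps ahead the target lies (1 = next sample).
    """
    inputs, targets = [], []
    for start in range(len(series) - history - horizon + 1):
        inputs.append(series[start:start + history])
        targets.append(series[start + history + horizon - 1])
    return np.array(inputs), np.array(targets)


def rmse(predicted, actual):
    """Root mean squared error, in the same unit as the inputs (e.g. Mbps)."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In a fuller pipeline each window would also carry the RF parameters alongside past throughput; they are left out here to keep the sketch short.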

# 2段階適応ロバスト最適化のための機械学習アプローチ

A Machine Learning Approach to Two-Stage Adaptive Robust Optimization ( http://arxiv.org/abs/2307.12409v1 )

ライセンス: Link先を確認

Dimitris Bertsimas, Cheol Woo Kim

(参考訳) 本稿では、バイナリのhere-and-now変数と多面体の不確実性集合を持つ2段階線形適応ロバスト最適化(ARO)問題を、機械学習に基づいて解く手法を提案する。最適なhere-and-now決定、それに対応する最悪ケースのシナリオ、および最適なwait-and-see決定を、我々が戦略と呼ぶものにエンコードする。列・制約生成アルゴリズムを用いて複数の類似したAROインスタンスを事前に解き、最適戦略を抽出してトレーニングセットを生成する。そして、here-and-now決定、最適なhere-and-now決定に対応する最悪ケースのシナリオ、およびwait-and-see決定について、高品質な戦略を予測する機械学習モデルを訓練する。また、機械学習アルゴリズムの訓練対象となる異なるターゲットクラスの数を削減するアルゴリズムも導入する。提案手法を施設配置問題、多品目在庫管理問題、ユニットコミットメント問題に適用する。提案手法は、最先端のアルゴリズムよりも大幅に高速に、かつ高い精度でARO問題を解く。

We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization (ARO) problems with binary here-and-now variables and polyhedral uncertainty sets. We encode the optimal here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the optimal wait-and-see decisions into what we denote as the strategy. We solve multiple similar ARO instances in advance using the column and constraint generation algorithm and extract the optimal strategies to generate a training set. We train a machine learning model that predicts high-quality strategies for the here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the wait-and-see decisions. We also introduce an algorithm to reduce the number of different target classes the machine learning algorithm needs to be trained on. We apply the proposed approach to the facility location, the multi-item inventory control and the unit commitment problems. Our approach solves ARO problems drastically faster than the state-of-the-art algorithms with high accuracy.

翻訳日:2023-07-25 16:22:32 公開日:2023-07-23

# マルチクラス流体キューネットワークの最適制御:機械学習によるアプローチ

Optimal Control of Multiclass Fluid Queueing Networks: A Machine Learning Approach ( http://arxiv.org/abs/2307.12405v1 )

ライセンス: Link先を確認

Dimitris Bertsimas, Cheol Woo Kim

(参考訳) 本稿では、明示的で洞察に富んだ制御ポリシーを提供する、マルチクラス流体待ち行列ネットワーク(MFQNET)の最適制御のための機械学習手法を提案する。MFQNET制御問題に対して、しきい値曲線が原点を通る超平面であるしきい値型の最適ポリシーが存在することを証明する。超平面分割を持つ最適分類木(OCT-H)を用いて、MFQNETの最適制御ポリシーを学習する。MFQNET制御問題の数値解をトレーニングセットとして使用し、OCT-Hを適用して明示的な制御ポリシーを学習する。最大33台のサーバと99のクラスでの実験結果を報告し、学習したポリシーがテストセット上で100%の精度を達成することを実証する。OCT-Hのオフライン訓練は大規模なネットワークでは数日かかることがあるが、オンラインでの適用はミリ秒で済む。

We propose a machine learning approach to the optimal control of multiclass fluid queueing networks (MFQNETs) that provides explicit and insightful control policies. We prove that a threshold type optimal policy exists for MFQNET control problems, where the threshold curves are hyperplanes passing through the origin. We use Optimal Classification Trees with hyperplane splits (OCT-H) to learn an optimal control policy for MFQNETs. We use numerical solutions of MFQNET control problems as a training set and apply OCT-H to learn explicit control policies. We report experimental results with up to 33 servers and 99 classes that demonstrate that the learned policies achieve 100% accuracy on the test set. While the offline training of OCT-H can take days in large networks, the online application takes milliseconds.

翻訳日:2023-07-25 16:22:14 公開日:2023-07-23
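
The threshold structure stated in the MFQNET abstract above (thresholds that are hyperplanes through the origin, learned as a tree with hyperplane splits) can be pictured with a small sketch. The tree encoding as nested dictionaries, the weight vectors, and the action labels are hypothetical; the intercept of each split is fixed at zero here only to mirror the through-origin property claimed in the abstract.

```python
import numpy as np


def octh_policy(node, x):
    """Walk a tree whose internal nodes test the sign of w . x.

    node: either an action label (leaf) or a dict with keys
          "w" (split normal), "pos" (subtree if w . x >= 0), "neg" (otherwise).
    x:    vector of fluid levels, one entry per job class.
    """
    while isinstance(node, dict):
        node = node["pos"] if np.dot(node["w"], x) >= 0 else node["neg"]
    return node


# Hypothetical two-class example: serve class 1 when x1 >= 2 * x2.
example_tree = {
    "w": np.array([1.0, -2.0]),
    "pos": "serve_class_1",
    "neg": "serve_class_2",
}
```

Because every split passes through the origin, scaling the state by any positive constant leaves the chosen action unchanged, and evaluating the policy costs only a few dot products, which is consistent with the millisecond online application time reported above.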

# TransNet:カテゴリーレベルポーズ推定による透明オブジェクト操作

TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation ( http://arxiv.org/abs/2307.12400v1 )

ライセンス: Link先を確認

Huijie Zhang, Anthony Opipari, Xiaotong Chen, Jiyue Zhu, Zeren Yu, Odest Chadwicke Jenkins

(参考訳) 透明物体は視覚知覚システムに複数の異なる課題をもたらす。第一に、識別に役立つ視覚的特徴が乏しいため、透明な物体は不透明な物体よりも検出や位置特定が難しい。人間でさえ、ガラスのドアのように鏡面反射や屈折がほとんどない透明な表面は知覚しにくい。第二の課題は、通常不透明物体の知覚に使用される深度センサーが、透明表面の特有の反射特性のために、その正確な深度測定を得られないことである。これらの課題から、コップのような同じカテゴリ内の透明な物体インスタンスは、同じカテゴリの通常の不透明な物体に対してよりも、互いによく似て見えることを観察した。本稿では、この観察に基づき、インスタンスレベルのポーズ推定ではなく、カテゴリレベルの透明物体ポーズ推定の可能性について検討する。局所的な深度補完と表面法線推定を用いてカテゴリレベルの透明物体ポーズを推定する2段階パイプラインであるTransNetを提案する。TransNetは、大規模な透明物体データセット上でポーズ推定精度について評価され、最先端のカテゴリレベルのポーズ推定手法と比較される。この比較の結果、TransNetは透明物体に対するポーズ推定精度の向上を達成することが示された。さらに、TransNetを用いて、ロボットによるピック・アンド・プレイスおよび注ぎ作業のための自律的な透明物体操作システムを構築する。

Transparent objects present multiple distinct challenges to visual perception systems. First, their lack of distinguishing visual features makes transparent objects harder to detect and localize than opaque objects. Even humans find certain transparent surfaces with little specular reflection or refraction, like glass doors, difficult to perceive. A second challenge is that depth sensors typically used for opaque object perception cannot obtain accurate depth measurements on transparent surfaces due to their unique reflective properties. Stemming from these challenges, we observe that transparent object instances within the same category, such as cups, look more similar to each other than to ordinary opaque objects of that same category. Given this observation, the present paper explores the possibility of category-level transparent object pose estimation rather than instance-level pose estimation. We propose TransNet, a two-stage pipeline that estimates category-level transparent object pose using localized depth completion and surface normal estimation. TransNet is evaluated in terms of pose estimation accuracy on a large-scale transparent object dataset and compared to a state-of-the-art category-level pose estimation approach. Results from this comparison demonstrate that TransNet achieves improved pose estimation accuracy on transparent objects. Moreover, we use TransNet to build an autonomous transparent object manipulation system for robotic pick-and-place and pouring tasks.

翻訳日:2023-07-25 16:22:01 公開日:2023-07-23

# 依存的革新を伴う高次元線形過程に対する濃度

Concentration for high-dimensional linear processes with dependent innovations ( http://arxiv.org/abs/2307.12395v1 )

ライセンス: Link先を確認

Eduardo Fonseca Mendes, Fellipe Lopes

(参考訳) sub-Weibull裾を持つミキシンゲール列上のベクトル線形過程の $l_\infty$ ノルムに対する濃度不等式を開発する。これらの不等式はBeveridge-Nelson分解を利用しており、問題をベクトル・ミキシンゲールまたはその重み付き和のsupノルムの濃度に帰着させる。この不等式は、線形過程の lag-$h$ 自己共分散行列の最大成分ノルムに対する濃度上界を得るために用いられる。これらの結果は、$l_1$正則化を用いて推定される高次元ベクトル自己回帰過程の推定誤差の上界、時系列に対する高次元ガウスブートストラップ、長期共分散行列の推定に有用である。

We develop concentration inequalities for the $l_\infty$ norm of vector linear processes on mixingale sequences with sub-Weibull tails. These inequalities make use of the Beveridge-Nelson decomposition, which reduces the problem to concentration for the sup-norm of a vector-mixingale or its weighted sum. This inequality is used to obtain a concentration bound for the maximum entrywise norm of the lag-$h$ autocovariance matrices of linear processes. These results are useful for estimation bounds for high-dimensional vector-autoregressive processes estimated using $l_1$ regularisation, high-dimensional Gaussian bootstrap for time series, and long-run covariance matrix estimation.

翻訳日:2023-07-25 16:21:37 公開日:2023-07-23

# Masked Reference based Centerpoint Supervision を用いた反復的ロバスト視覚接地

Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision ( http://arxiv.org/abs/2307.12392v1 )

ライセンス: Link先を確認

Menghao Li, Chunlei Wang, Wenquan Feng, Shuchang Lyu, Guangliang Cheng, Xiangtai Li, Binghao Liu, Qi Zhao

(参考訳) 視覚グラウンディング(VG)は、与えられた表現に基づいて画像から対象オブジェクトを位置特定することを目的としており、検出技術やビジョントランスフォーマーの発展とともに大きな進歩を遂げている。しかしながら、既存のVG手法は、実用上よく生じる不正確な記述や無関係な記述が与えられたときに、誤警報オブジェクトを生成する傾向がある。さらに、既存の手法は、画像全体とテキスト記述から、きめ細かい特徴、正確な位置特定、および十分な文脈理解を捉えることができない。両方の問題に対処するため、Masked Reference based Centerpoint Supervision(MRCS)を用いたIterative Robust Visual Grounding(IR-VG)フレームワークを提案する。このフレームワークは、より良いアライメントのために反復的マルチレベル視覚言語融合(IMVF)を導入する。MRCSを用いて、点単位の特徴監督により、より正確な位置特定を実現する。さらに、VGのロバスト性を改善するために、不正確な表現が与えられた場合の誤警報オブジェクトの生成を防ぐ多段階の誤警報感知デコーダ(MFSD)を提案する。提案フレームワークは、5つの通常のVGデータセットと、新たに構築した2つのロバストVGデータセットで評価される。広範な実験により、IR-VGは新たな最先端(SOTA)の結果を達成し、新たに提案した2つのロバストVGデータセットでは既存のSOTA手法と比較して25%および10%の改善が得られることが示された。さらに、提案フレームワークは5つの通常のVGデータセットでも有効であることが確認された。コードとモデルはhttps://github.com/cv516Buaa/IR-VGで公開される予定である。

Visual Grounding (VG) aims at localizing target objects from an image based on given expressions and has made significant progress with the development of detection and vision transformers. However, existing VG methods tend to generate false-alarm objects when presented with inaccurate or irrelevant descriptions, which commonly occur in practical applications. Moreover, existing methods fail to capture fine-grained features, accurate localization, and sufficient context comprehension from the whole image and textual descriptions. To address both issues, we propose an Iterative Robust Visual Grounding (IR-VG) framework with Masked Reference based Centerpoint Supervision (MRCS). The framework introduces iterative multi-level vision-language fusion (IMVF) for better alignment. We use MRCS to achieve more accurate localization with point-wise feature supervision. Then, to improve the robustness of VG, we also present a multi-stage false-alarm sensitive decoder (MFSD) to prevent the generation of false-alarm objects when presented with inaccurate expressions. The proposed framework is evaluated on five regular VG datasets and two newly constructed robust VG datasets. Extensive experiments demonstrate that IR-VG achieves new state-of-the-art (SOTA) results, with improvements of 25% and 10% compared to existing SOTA approaches on the two newly proposed robust VG datasets. Moreover, the proposed framework is also verified to be effective on five regular VG datasets. Code and models will be publicly available at https://github.com/cv516Buaa/IR-VG.

翻訳日:2023-07-25 16:21:23 公開日:2023-07-23

# 交通信号制御のためのSim-to-Real転送に向けた不確実な接地行動変換

Uncertainty-aware Grounded Action Transformation towards Sim-to-Real Transfer for Traffic Signal Control ( http://arxiv.org/abs/2307.12388v1 )

ライセンス: Link先を確認

Longchao Da, Hao Mei, Romir Sharma and Hua Wei

(参考訳) 交通信号制御(tsc)は、数百万人の日常生活に影響を与える複雑で重要なタスクである。強化学習(rl)は交通信号制御の最適化に有望な結果を示しているが、現在のrlベースのtsc法は主にシミュレーションで訓練され、シミュレーションと実世界のパフォーマンスギャップに苦しむ。本稿では, シミュレーション中の動作を不確実性で動的に変換することで, シミュレーション環境から実世界環境へ学習した学習方針を伝達し, 遷移力学の領域ギャップを緩和する, UGAT と呼ばれるシミュレーションから実世界への移行手法を提案する。本手法をシミュレーションした交通環境において評価し,実環境におけるトランスファーrlポリシーの性能を著しく向上させることを示す。

Traffic signal control (TSC) is a complex and important task that affects the daily lives of millions of people. Reinforcement Learning (RL) has shown promising results in optimizing traffic signal control, but current RL-based TSC methods are mainly trained in simulation and suffer from the performance gap between simulation and the real world. In this paper, we propose a simulation-to-real-world (sim-to-real) transfer approach called UGAT, which transfers a learned policy trained from a simulated environment to a real-world environment by dynamically transforming actions in the simulation with uncertainty to mitigate the domain gap of transition dynamics. We evaluate our method on a simulated traffic environment and show that it significantly improves the performance of the transferred RL policy in the real world.

翻訳日:2023-07-25 16:20:53 公開日:2023-07-23

# 開量子系における共鳴支配光力学的絡み合い

Resonance-dominant optomechanical entanglement in open quantum systems ( http://arxiv.org/abs/2307.12383v1 )

ライセンス: Link先を確認

Cheng Shang and Hongchao Li

(参考訳) 絡み合い保護に動機づけられ,共振効果を用いてコヒーレント状態表現における光力学的絡み合いを高める。熱-機械的モードと周辺熱浴との間の高周波数結合成分を弱結合限界内でフィルタするフィルタモデルを提案する。連続変数の絡み合い保護は、重要なデチューン成分に関連する自由度を取り除き、デコヒーレンスに抵抗することを含む。本研究では, フィルタモデルの非線形ランゲヴィン方程式を構築し, フィルタモデルが熱雑音や機械減衰に対して, 定常的な最大最適エンタングルメントのロバスト性を2倍にすることを示す。さらに、これらの結果を1つの振動するエンドミラーを持つ光学キャビティアレイに一般化し、長距離最適オプティメカニカルエンタングルメント転送について検討する。本研究は, 量子系をデコヒーレンスから保護し, 大規模量子情報処理と量子ネットワーク構築の可能性を高めるために, 共鳴効果を適用した新たな基盤を打破する。

Motivated by entanglement protection, our work utilizes a resonance effect to enhance optomechanical entanglement in the coherent-state representation. We propose a filtering model to filter out the highly frequency-detuned coupling components between a thermal-mechanical mode and its surrounding heat baths within the weak-coupling limit. We reveal that continuous-variable entanglement protection involves the elimination of degrees of freedom associated with significant detuning components, thereby resisting decoherence. We construct a nonlinear Langevin equation of the filtering model and numerically show that the filtering model doubles the robustness of a stationary maximum optomechanical entanglement with respect to thermal noise and mechanical damping. Furthermore, we generalize these results to an optical cavity array with one oscillating end-mirror to investigate the long-distance optimal optomechanical entanglement transfer. Our study breaks new ground for applying the resonance effect to protect the quantum system from decoherence and advancing the possibilities for large-scale quantum information processing and quantum network construction.

翻訳日:2023-07-25 16:20:38 公開日:2023-07-23

# CommonsenseVIS: 自然言語モデルのコモンセンス推論能力の可視化と理解

CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models ( http://arxiv.org/abs/2307.12382v1 )

ライセンス: Link先を確認

Xingbo Wang, Renfei Huang, Zhihua Jin, Tianqing Fang, and Huamin Qu

(参考訳) 近年、大規模な事前学習済み言語モデルは、コモンセンスのベンチマークで優れた性能を達成している。それにもかかわらず、モデルがどのようなコモンセンス知識を学んでいるのか、あるいは疑似的なパターンだけを利用しているのかは不明である。特徴帰属は、モデル出力にとって重要な入力概念を特定する一般的な説明可能性技術である。しかし、コモンセンス知識は暗黙的である傾向があり、入力に明示的に示されることは稀である。これらの手法は、言及された概念に対するモデルの暗黙的な推論を推測できない。本稿では、外部のコモンセンス知識ベースを用いて、コモンセンス質問応答におけるモデルの振る舞いを文脈化する視覚的説明システムであるCommonsenseVISを提案する。具体的には、モデルの振る舞いを人間の知識と整合させるための参照として、入力に関連するコモンセンス知識を抽出する。本システムは、異なる概念とその基盤となる関係について、マルチレベルの可視化と、対話的なモデルの探索および編集を備える。ユーザスタディを通じて、CommonsenseVISが、NLPの専門家による、さまざまな状況における概念に対するモデルの関係推論の体系的かつスケーラブルな視覚分析に役立つことを示す。

Recently, large pretrained language models have achieved compelling performance on commonsense benchmarks. Nevertheless, it is unclear what commonsense knowledge the models learn and whether they solely exploit spurious patterns. Feature attributions are popular explainability techniques that identify important input concepts for model outputs. However, commonsense knowledge tends to be implicit and rarely explicitly presented in inputs. These methods cannot infer models' implicit reasoning over mentioned concepts. We present CommonsenseVIS, a visual explanatory system that utilizes external commonsense knowledge bases to contextualize model behavior for commonsense question-answering. Specifically, we extract relevant commonsense knowledge in inputs as references to align model behavior with human knowledge. Our system features multi-level visualization and interactive model probing and editing for different concepts and their underlying relations. Through a user study, we show that CommonsenseVIS helps NLP experts conduct a systematic and scalable visual analysis of models' relational reasoning over concepts in different situations.

翻訳日:2023-07-25 16:20:20 公開日:2023-07-23

# H$_2^+$分子イオンにおける高次高調波発生の量子光学的解析

Quantum optical analysis of high-order harmonic generation in H$_2^+$ molecular ions ( http://arxiv.org/abs/2307.12381v1 )

ライセンス: Link先を確認

J. Rivera-Dean, P. Stammer, A. S. Maxwell, Th. Lamprou, E. Pisanty, P. Tzallas, M. Lewenstein and M. F. Ciappina

(参考訳) 量子光学の枠組みにおいて、H$_2^+$分子イオンの高次高調波発生(HHG)に関する包括的な理論的研究を行う。本研究は、さまざまな量子光学的および量子情報的な指標を特徴付けることに焦点を当て、量子技術への応用に向けた2中心分子におけるHHGの汎用性を強調する。レーザーと物質の相互作用の後に、電子状態と光状態の間に絡み合いが出現することを示す。また、特定の電子量子状態で条件付けることにより、対象の周波数モードにおいて光の非古典的状態が得られる可能性を確認し、これが異なる高調波モードの集合間の高度に非古典的な絡み合い状態の生成に不可欠であることがわかった。本研究は、分子系における強レーザー場駆動相互作用の研究への道を開き、量子技術への応用可能性を示唆する。

We present a comprehensive theoretical investigation of high-order harmonic generation in H$_2^+$ molecular ions within a quantum optical framework. Our study focuses on characterizing various quantum optical and quantum information measures, highlighting the versatility of HHG in two-center molecules towards quantum technology applications. We demonstrate the emergence of entanglement between electron and light states after the laser-matter interaction. We also identify the possibility of obtaining non-classical states of light in targeted frequency modes by conditioning on specific electronic quantum states, which turn out to be crucial in the generation of highly non-classical entangled states between distinct sets of harmonic modes. Our findings open up avenues for studying strong-laser field-driven interactions in molecular systems, and suggest their applicability to quantum technology applications.

翻訳日:2023-07-25 16:20:05 公開日:2023-07-23

# ProtoFL: 原型蒸留による教師なしフェデレーション学習

ProtoFL: Unsupervised Federated Learning via Prototypical Distillation ( http://arxiv.org/abs/2307.12450v1 )

ライセンス: Link先を確認

Hansol Kim, Youngjun Kwak, Minyoung Jung, Jinho Shin, Youngsung Kim, Changick Kim

(参考訳) フェデレートラーニング(FL)は、特に認証システムにおいて、データのプライバシ保護を強化するための有望なアプローチである。しかしながら、ラウンドコミュニケーションの制限、表現の不足、スケーラビリティは、デプロイメントに重大な課題をもたらし、その潜在能力を完全に阻害する。本稿では,グローバルモデルの表現力を高め,ラウンドコミュニケーションコストを削減するために,教師なしフェデレーション学習に基づく原型的表現蒸留法である「protofl」を提案する。さらに,正規化フローに基づく局所的な一クラス分類器を導入し,データ制限による性能向上を図る。本研究は,FLを用いた一級分類性能向上のための最初の研究である。我々は,MNIST, CIFAR-10, CIFAR-100, ImageNet-30, Keystroke-Dynamicsの5つの広く利用されているベンチマークにおいて,従来の手法よりも優れた性能を示した。

Federated learning (FL) is a promising approach for enhancing data privacy preservation, particularly for authentication systems. However, limited round communications, scarce representation, and scalability pose significant challenges to its deployment, hindering its full potential. In this paper, we propose 'ProtoFL', Prototypical Representation Distillation based unsupervised Federated Learning to enhance the representation power of a global model and reduce round communication costs. Additionally, we introduce a local one-class classifier based on normalizing flows to improve performance with limited data. Our study represents the first investigation of using FL to improve one-class classification performance. We conduct extensive experiments on five widely used benchmarks, namely MNIST, CIFAR-10, CIFAR-100, ImageNet-30, and Keystroke-Dynamics, to demonstrate the superior performance of our proposed framework over previous methods in the literature.

翻訳日:2023-07-25 16:12:30 公開日:2023-07-23

# WEPRO:ハイブリッド量子古典アルゴリズムの効率的な最適化のための重み予測

WEPRO: Weight Prediction for Efficient Optimization of Hybrid Quantum-Classical Algorithms ( http://arxiv.org/abs/2307.12449v1 )

ライセンス: Link先を確認

Satwik Kundu, Debarshi Kundu and Swaroop Ghosh

(参考訳) 古典計算機上での量子シミュレータの指数的な実行時間、および実量子デバイスの長い待ち行列と高いコストは、量子ニューラルネットワーク(QNN)、変分量子固有値ソルバー(VQE)、量子近似最適化アルゴリズム(QAOA)などの変分量子アルゴリズム(VQA)の効果的な訓練において大きな課題となる。これらの制約に対処するため、パラメータ重みの規則的な傾向を利用してVQAの収束を加速する新しい手法、WEPRO(Weight Prediction)を提案する。最適な予測性能のために、Naive Prediction(NaP)とAdaptive Prediction(AdaP)の2つの手法を導入する。さまざまなデータセット上での複数のQNNモデルの広範な実験と訓練を通じて、WEPROは標準的な訓練手法と比較して約$2.25\times$のスピードアップを提供し、同時に少ないストレージおよび計算オーバーヘッドで、精度の向上(最大$2.3\%$)と損失の低減(最大$6.1\%$)をもたらすことを示す。また、分子基底状態エネルギー推定のためのVQEと、グラフMaxCutのためのQAOAにおけるWEPROの有効性を評価した。その結果、WEPROは従来の最適化手法と比較して、VQEで最大$3.1\times$、QAOAで最大$2.91\times$の速度向上を実現し、訓練イテレーションあたりのショット数(繰り返しの回路実行)を最大$3.3\times$削減した。

The exponential run time of quantum simulators on classical machines and the long queue depths and high costs of real quantum devices present significant challenges in the effective training of Variational Quantum Algorithms (VQAs) like Quantum Neural Networks (QNNs), Variational Quantum Eigensolver (VQE) and Quantum Approximate Optimization Algorithm (QAOA). To address these limitations, we propose a new approach, WEPRO (Weight Prediction), which accelerates the convergence of VQAs by exploiting regular trends in the parameter weights. We introduce two techniques for optimal prediction performance, namely Naive Prediction (NaP) and Adaptive Prediction (AdaP). Through extensive experimentation and training of multiple QNN models on various datasets, we demonstrate that WEPRO offers a speedup of approximately $2.25\times$ compared to standard training methods, while also providing improved accuracy (up to $2.3\%$ higher) and loss (up to $6.1\%$ lower) with low storage and computational overheads. We also evaluate WEPRO's effectiveness in VQE for molecular ground-state energy estimation and in QAOA for graph MaxCut. Our results show that WEPRO leads to speed improvements of up to $3.1\times$ for VQE and $2.91\times$ for QAOA, compared to traditional optimization techniques, while using up to $3.3\times$ fewer shots (i.e., repeated circuit executions) per training iteration.

翻訳日:2023-07-25 16:12:13 公開日:2023-07-23
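
The general idea behind the WEPRO abstract above, predicting future parameter values from the regular trend of recent ones instead of running every optimizer step, can be illustrated with a very small sketch. The abstract does not describe how Naive Prediction or Adaptive Prediction are actually computed, so the two-point linear extrapolation below and the names `extrapolate_weights` and `steps_ahead` are purely hypothetical stand-ins, not the authors' method.

```python
import numpy as np


def extrapolate_weights(history, steps_ahead):
    """Linearly extrapolate each parameter from its last two snapshots.

    history: sequence of parameter vectors, one per completed training step.
    steps_ahead: how many future steps to skip by prediction.
    """
    history = np.asarray(history, dtype=float)
    if history.shape[0] < 2:
        raise ValueError("need at least two snapshots to estimate a trend")
    slope = history[-1] - history[-2]  # per-step change of every parameter
    return history[-1] + steps_ahead * slope
```

In a training loop, such a prediction would replace several circuit evaluations with a cheap arithmetic update, after which ordinary optimization resumes from the predicted weights.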

# SCRAPS:音響空間と音声空間の音声コントラスト表現

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces ( http://arxiv.org/abs/2307.12445v1 )

ライセンス: Link先を確認

Ivan Vall\'es-P\'erez, Grzegorz Beringer, Piotr Bilinski, Gary Cook, Roberto Barra-Chicote

(参考訳) 文献における多くの例が、深層学習モデルがマルチモーダルデータをうまく扱えることを示してきた。最近、CLIPは、画像とテキスト記述の間の共有潜在空間を深層学習システムで学習できるようにし、下流タスクでのゼロショットまたは少数ショットで卓越した結果を示した。本稿では、CLIPが提案したのと同じアイデアを、音声学的空間と音響空間が通常共存する音声領域に適用して検討する。音声学的空間と音響空間の共有表現を学習することを目的として、CLIPベースのモデルを訓練する。その結果、提案モデルは音声学的変化に敏感であり、音素の20%をランダムに置き換えた場合に91%のスコア低下を示す一方で、さまざまな種類のノイズに対してかなりの頑健性を持ち、音声に75%のガウスノイズを混合した場合の性能低下は10%にとどまることが示された。また、得られた埋め込みが、明瞭度評価や、音声生成タスクにおける豊富な事前学習済み音声学的埋め込みの活用など、さまざまな下流アプリケーションに有用であることを示す実証的証拠を提供する。最後に、音声生成および音声認識の分野に興味深い示唆を与える潜在的な応用について議論する。

Numerous examples in the literature have shown that deep learning models can work well with multimodal data. Recently, CLIP has enabled deep learning systems to learn shared latent spaces between images and text descriptions, with outstanding zero- or few-shot results in downstream tasks. In this paper we explore the same idea proposed by CLIP but applied to the speech domain, where the phonetic and acoustic spaces usually coexist. We train a CLIP-based model with the aim of learning shared representations of phonetic and acoustic spaces. The results show that the proposed model is sensitive to phonetic changes, with a 91% score drop when replacing 20% of the phonemes at random, while providing substantial robustness against different kinds of noise, with a 10% performance drop when mixing the audio with 75% of Gaussian noise. We also provide empirical evidence showing that the resulting embeddings are useful for a variety of downstream applications, such as intelligibility evaluation and the ability to leverage rich pre-trained phonetic embeddings in speech generation tasks. Finally, we discuss potential applications with interesting implications for the speech generation and recognition fields.

翻訳日:2023-07-25 16:11:42 公開日:2023-07-23
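
The SCRAPS abstract above describes training a CLIP-based model to place acoustic and phonetic inputs in a shared space. As a rough illustration of what a CLIP-style objective computes, the sketch below implements a symmetric contrastive loss over a small batch of paired embeddings. The encoders are omitted, and the function names, the temperature value, and the toy vectors are assumptions made for this example rather than details from the paper.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum - row[i]
    return total / len(logits)


def clip_style_loss(acoustic, phonetic, temperature=0.07):
    """Symmetric contrastive loss; item i of each list is assumed to be a pair."""
    a = [normalize(v) for v in acoustic]
    p = [normalize(v) for v in phonetic]
    # Cosine similarities scaled by the temperature.
    logits = [[dot(x, y) / temperature for y in p] for x in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy embeddings: correctly paired versus deliberately shuffled.
acoustic = [[1.0, 0.0], [0.0, 1.0]]
phonetic = [[0.9, 0.1], [0.1, 0.9]]
loss_paired = clip_style_loss(acoustic, phonetic)
loss_shuffled = clip_style_loss(acoustic, list(reversed(phonetic)))
```

The loss is small when each acoustic embedding is closest to its own phonetic embedding and large when the pairs are shuffled, which is the behavior a shared latent space relies on.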

# EnTri: 説明可能なシーン認識のための3レベル表現によるアンサンブル学習

EnTri: Ensemble Learning with Tri-level Representations for Explainable Scene Recognition ( http://arxiv.org/abs/2307.12442v1 )

ライセンス: Link先を確認

Amirhossein Aminimehr, Amirali Molaei, Erik Cambria

(参考訳) 深層学習に基づくシーン認識は大きな進歩を遂げているが,クラス間の類似性とクラス内の相違性に起因する課題により,その性能にはまだ限界がある。さらに、先行研究は主に分類精度の向上に重点を置いており、解釈可能で正確なシーン分類の実現にはあまり注意が払われてこなかった。そこで我々は,視覚特徴の階層構造を用いたアンサンブル学習を利用するアンサンブルシーン認識フレームワークであるEnTriを提案する。EnTriは、ピクセルレベル、セマンティックセグメンテーションレベル、オブジェクトクラスおよび頻度レベルという3つの異なる詳細レベルで特徴を表現する。複雑さの異なる特徴符号化方式を取り入れ,アンサンブル戦略を活用することにより,分類精度を向上させつつ,視覚的・テキスト的説明によって透明性と解釈可能性を高めることを目指す。解釈可能性を達成するために,カテゴリの最終予測に寄与する所定のシーンのさまざまな特性を強調する,視覚とテキストの両方の説明を生成する拡張アルゴリズムを考案した。これには、オブジェクト、統計、空間レイアウト、テクスチャの詳細に関する情報が含まれる。ベンチマークシーン分類データセットでの実験を通じて、EnTriは認識精度の面で優位性を示し、MIT67、SUN397、UIUC8のデータセットでそれぞれ87.69%、75.56%、99.17%の精度を達成し、最先端のアプローチと比較して競争力のある性能を示した。

Scene recognition based on deep-learning has made significant progress, but there are still limitations in its performance due to challenges posed by inter-class similarities and intra-class dissimilarities. Furthermore, prior research has primarily focused on improving classification accuracy, yet it has given less attention to achieving interpretable, precise scene classification. Therefore, we are motivated to propose EnTri, an ensemble scene recognition framework that employs ensemble learning using a hierarchy of visual features. EnTri represents features at three distinct levels of detail: pixel-level, semantic segmentation-level, and object class and frequency level. By incorporating distinct feature encoding schemes of differing complexity and leveraging ensemble strategies, our approach aims to improve classification accuracy while enhancing transparency and interpretability via visual and textual explanations. To achieve interpretability, we devised an extension algorithm that generates both visual and textual explanations highlighting various properties of a given scene that contribute to the final prediction of its category. This includes information about objects, statistics, spatial layout, and textural details. Through experiments on benchmark scene classification datasets, EnTri has demonstrated superiority in terms of recognition accuracy, achieving competitive performance compared to state-of-the-art approaches, with an accuracy of 87.69%, 75.56%, and 99.17% on the MIT67, SUN397, and UIUC8 datasets, respectively.

翻訳日:2023-07-25 16:11:21 公開日:2023-07-23
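
The EnTri abstract above combines classifiers that work at three levels of detail but does not say which ensemble strategy is used. The sketch below assumes the simplest option, weighted soft voting over per-level class probabilities. The function name `soft_vote` and the three probability vectors are hypothetical.

```python
def soft_vote(prob_vectors, weights=None):
    """Combine per-model class probabilities by a weighted average.

    Returns the index of the winning class and the combined distribution.
    Soft voting is an assumed strategy; the abstract does not specify one.
    """
    n = len(prob_vectors)
    weights = weights or [1.0 / n] * n
    num_classes = len(prob_vectors[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, prob_vectors))
        for c in range(num_classes)
    ]
    best = max(range(num_classes), key=combined.__getitem__)
    return best, combined


# Hypothetical outputs of the three level-specific classifiers for one image.
pixel_level = [0.6, 0.3, 0.1]
segmentation_level = [0.2, 0.7, 0.1]
object_level = [0.3, 0.6, 0.1]
best, combined = soft_vote([pixel_level, segmentation_level, object_level])
```

Keeping the per-level probabilities around, rather than only the final label, is also what makes it possible to report which level contributed most to a prediction.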

# 対称正定値行列の多様体上の回帰による多忠実度共分散推定

Multifidelity Covariance Estimation via Regression on the Manifold of Symmetric Positive Definite Matrices ( http://arxiv.org/abs/2307.12438v1 )

ライセンス: Link先を確認

Aimee Maurais, Terrence Alsup, Benjamin Peherstorfer, Youssef Marzouk

(参考訳) 対称正定値行列の多様体上の回帰問題の解として定式化された、共分散行列の多忠実度推定量を導入する。この推定量は構成上正定値であり、これを得るために最小化されるマハラノビス距離は実用的な計算を可能にする性質を持つ。我々の多様体回帰多忠実度(MRMF)共分散推定量は、多様体の接空間上のある誤差モデルの下で最尤推定量であることを示す。より広くは、我々のリーマン回帰フレームワークが、制御変量から構築された既存の多忠実度共分散推定量を包含することを示す。数値例を通じて、この推定量が、単一忠実度および他の多忠実度共分散推定量と比べて、二乗推定誤差を最大1桁と大幅に低減できることを示す。さらに、正定値性の保存により、この性質が不可欠であるデータ同化や計量学習のような下流タスクと推定量が互換であることが保証される。

We introduce a multifidelity estimator of covariance matrices formulated as the solution to a regression problem on the manifold of symmetric positive definite matrices. The estimator is positive definite by construction, and the Mahalanobis distance minimized to obtain it possesses properties which enable practical computation. We show that our manifold regression multifidelity (MRMF) covariance estimator is a maximum likelihood estimator under a certain error model on manifold tangent space. More broadly, we show that our Riemannian regression framework encompasses existing multifidelity covariance estimators constructed from control variates. We demonstrate via numerical examples that our estimator can provide significant decreases, up to one order of magnitude, in squared estimation error relative to both single-fidelity and other multifidelity covariance estimators. Furthermore, preservation of positive definiteness ensures that our estimator is compatible with downstream tasks, such as data assimilation and metric learning, in which this property is essential.

翻訳日:2023-07-25 16:10:53 公開日:2023-07-23

# 物理制約ニューラルネットワークを用いた一般化シュワルツ型非重複領域分解法

A Generalized Schwarz-type Non-overlapping Domain Decomposition Method using Physics-constrained Neural Networks ( http://arxiv.org/abs/2307.12435v1 )

ライセンス: Link先を確認

Shamsulhaq Basir, Inanc Senocak

(参考訳) 偏微分方程式(PDE)を含む順問題および逆問題を解くための、ニューラルネットワークに基づくメッシュレスのシュワルツ型非重複領域分解法を提案する。隣接するサブドメイン間の解の整合性を確保するため、各サブドメインに固有のRobinパラメータを割り当てる一般化されたRobin型インタフェース条件を採用する。これらのサブドメイン固有のRobinパラメータは、Robinインタフェース条件のミスマッチを最小化するように学習され、トレーニング中の効率的な情報交換を促進する。この方法はラプラス方程式とヘルムホルツ方程式の両方に適用できる。局所解は独立したニューラルネットワークモデルで表現され、拡張ラグランジアン形式によって境界条件とインタフェース条件を厳密に課しながら、支配PDEの損失を最小化するように訓練される。提案手法の重要な強みは,各サブドメインのRobinパラメータを学習し,隣接するサブドメインとの情報交換を強化できる点にある。学習されたRobinパラメータは、解の局所的挙動、領域分割、および領域全体に対するサブドメインの位置に適応することが観察された。交差点(crosspoint)を含む一方向および二方向の分解を含む、順問題および逆問題に関する広範な実験により,提案手法の汎用性と性能が示された。

We present a meshless Schwarz-type non-overlapping domain decomposition method based on artificial neural networks for solving forward and inverse problems involving partial differential equations (PDEs). To ensure the consistency of solutions across neighboring subdomains, we adopt a generalized Robin-type interface condition, assigning unique Robin parameters to each subdomain. These subdomain-specific Robin parameters are learned to minimize the mismatch on the Robin interface condition, facilitating efficient information exchange during training. Our method is applicable to both the Laplace and Helmholtz equations. Each local solution is represented by an independent neural network model, which is trained to minimize the loss on the governing PDE while strictly enforcing boundary and interface conditions through an augmented Lagrangian formalism. A key strength of our method lies in its ability to learn a Robin parameter for each subdomain, thereby enhancing information exchange with its neighboring subdomains. We observe that the learned Robin parameters adapt to the local behavior of the solution, domain partitioning and subdomain location relative to the overall domain. Extensive experiments on forward and inverse problems, including one-way and two-way decompositions with crosspoints, demonstrate the versatility and performance of our proposed approach.

翻訳日:2023-07-25 16:10:38 公開日:2023-07-23
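
For readers unfamiliar with Robin interface conditions, the classical non-overlapping Schwarz iteration for two subdomains $\Omega_1$ and $\Omega_2$ sharing an interface $\Gamma$ can be written as below. The abstract above states only that each subdomain receives its own learned Robin parameter, so the symbols $\alpha_1$, $\alpha_2$, the iteration index $k$, and the outward normals $n_1 = -n_2$ are notation assumed here, and the exact condition used in the paper may differ.

```latex
\begin{align}
\frac{\partial u_1^{k+1}}{\partial n_1} + \alpha_1 u_1^{k+1}
  &= \frac{\partial u_2^{k}}{\partial n_1} + \alpha_1 u_2^{k}
  && \text{on } \Gamma, \\
\frac{\partial u_2^{k+1}}{\partial n_2} + \alpha_2 u_2^{k+1}
  &= \frac{\partial u_1^{k}}{\partial n_2} + \alpha_2 u_1^{k}
  && \text{on } \Gamma.
\end{align}
```

At a fixed point, subtracting the two conditions gives $(\alpha_1 + \alpha_2)(u_1 - u_2) = 0$ on $\Gamma$, so as long as $\alpha_1 + \alpha_2 \neq 0$ both the solution values and the normal fluxes match across the interface.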

# SwIPE: 暗黙的パッチ埋め込みによる効率的かつロバストな医用画像分割

SwIPE: Efficient and Robust Medical Image Segmentation with Implicit Patch Embeddings ( http://arxiv.org/abs/2307.12429v1 )

ライセンス: Link先を確認

Yejia Zhang, Pengfei Gu, Nishchal Sapkota, Danny Z. Chen

(参考訳) 現代の医用画像分割法は、主にラスタ化マスクの形の離散表現を用いて特徴を学習し、予測を生成する。効果的ではあるものの、このパラダイムは空間的な柔軟性に欠け、高解像度の画像へのスケーリングが悪く、物体の形状を直接理解できない。これらの制限に対処するため、最近のいくつかの研究では暗黙的ニューラル表現(INR)を用いてセグメンテーションの連続表現を学習している。しかし、これらの手法は3次元形状復元のために設計された部品をそのまま採用することが多い。さらに重要なことに、これらの定式化は点ベースまたは大域的な文脈のいずれかに制約されており、それぞれ文脈理解または局所的な細部を欠いているが、どちらも正確なセグメンテーションには不可欠である。この問題を解決するために,INRの利点を活用し,点レベルや画像レベルではなくパッチレベルで形状を予測することで,正確な局所境界の描出と大域的な形状の一貫性の両方を可能にする新しい手法SwIPE(Segmentation with Implicit Patch Embeddings)を提案する。2つの課題(2次元ポリープ分割と3次元腹部臓器分割)での広範な評価により、SwIPEが最近の暗黙的アプローチよりも大幅に改善し、10分の1未満のパラメータ数で最先端の離散的手法を上回ることが示された。また,データ効率に優れ,画像解像度やデータセットをまたぐデータシフトに対するロバスト性も向上している。コードはGitHubで入手できる。

Modern medical image segmentation methods primarily use discrete representations in the form of rasterized masks to learn features and generate predictions. Although effective, this paradigm is spatially inflexible, scales poorly to higher-resolution images, and lacks direct understanding of object shapes. To address these limitations, some recent works utilized implicit neural representations (INRs) to learn continuous representations for segmentation. However, these methods often directly adopted components designed for 3D shape reconstruction. More importantly, these formulations were also constrained to either point-based or global contexts, lacking contextual understanding or local fine-grained details, respectively--both critical for accurate segmentation. To remedy this, we propose a novel approach, SwIPE (Segmentation with Implicit Patch Embeddings), that leverages the advantages of INRs and predicts shapes at the patch level--rather than at the point level or image level--to enable both accurate local boundary delineation and global shape coherence. Extensive evaluations on two tasks (2D polyp segmentation and 3D abdominal organ segmentation) show that SwIPE significantly improves over recent implicit approaches and outperforms state-of-the-art discrete methods with over 10x fewer parameters. Our method also demonstrates superior data efficiency and improved robustness to data shifts across image resolutions and datasets. Code is available on Github.

翻訳日:2023-07-25 16:10:17 公開日:2023-07-23

# Augmented Box Replay: インクリメンタルオブジェクト検出のための前景シフトの克服

Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection ( http://arxiv.org/abs/2307.12427v1 )

ライセンス: Link先を確認

Liu Yuyang, Cong Yang, Goswami Dipam, Liu Xialei, Joost van de Weijer

(参考訳) 漸進学習では、過去のタスクから保存したサンプルを現在のタスクのサンプルと共に再生することが、破滅的忘却に対処する最も効率的なアプローチの1つである。しかし、インクリメンタル分類とは異なり、画像再生はインクリメンタルオブジェクト検出(IOD)にはうまく適用されていない。本稿では、その主な理由として、見過ごされてきた前景シフトの問題を特定する。前景シフトは、以前のタスクの画像を再生する場合にのみ発生し、その背景に現在のタスクの前景オブジェクトが含まれうるという事実を指す。この問題を克服するために,前景オブジェクトのみを保存・再生することで前景シフト問題を回避する,新規かつ効率的なAugmented Box Replay(ABR)法を開発した。さらに,関心領域(RoI)特徴からの空間的注意を用いて,現在のモデルが旧モデルの最も重要な情報に注目するよう制約する,革新的なAttentive RoI Distillation損失を提案する。ABRは、現在のクラスで高い可塑性を維持しながら、以前のクラスの忘却を大幅に低減する。さらに、標準的な画像再生と比較してストレージ要件を大幅に削減する。Pascal-VOCおよびCOCOデータセットでの包括的な実験は、我々のモデルの最先端性能を裏付けている。

In incremental learning, replaying stored samples from previous tasks together with current task samples is one of the most efficient approaches to address catastrophic forgetting. However, unlike incremental classification, image replay has not been successfully applied to incremental object detection (IOD). In this paper, we identify the overlooked problem of foreground shift as the main reason for this. Foreground shift only occurs when replaying images of previous tasks and refers to the fact that their background might contain foreground objects of the current task. To overcome this problem, a novel and efficient Augmented Box Replay (ABR) method is developed that only stores and replays foreground objects and thereby circumvents the foreground shift problem. In addition, we propose an innovative Attentive RoI Distillation loss that uses spatial attention from region-of-interest (RoI) features to constrain the current model to focus on the most important information from the old model. ABR significantly reduces forgetting of previous classes while maintaining high plasticity in current classes. Moreover, it considerably reduces the storage requirements when compared to standard image replay. Comprehensive experiments on Pascal-VOC and COCO datasets support the state-of-the-art performance of our model.

翻訳日:2023-07-25 16:09:53 公開日:2023-07-23

# 対話応答生成におけるオフラインRLの有効性について

On the Effectiveness of Offline RL for Dialogue Response Generation ( http://arxiv.org/abs/2307.12425v1 )

ライセンス: Link先を確認

Paloma Sodhi, Felix Wu, Ethan R. Elenberg, Kilian Q. Weinberger, Ryan McDonald

(参考訳) 言語モデルの一般的な訓練技法は、教師強制(TF)である。TFは、同じ意味を異なる方法で表現できるにもかかわらず、人間の言語に正確に一致させようとする。これは、対話応答生成にシーケンスレベルの目的関数を用いる動機となる。本稿では,そのような目的関数を最大化するためのさまざまなオフライン強化学習(RL)手法の有効性について検討する。複数のデータセット、モデル、メトリクスにわたって包括的な評価を行う。オフラインRLは、トレーニングの不安定性を招いたり、実用的なトレーニング予算を犠牲にしたりすることなく、教師強制よりも明確な性能向上を示す。

A common training technique for language models is teacher forcing (TF). TF attempts to match human language exactly, even though identical meanings can be expressed in different ways. This motivates the use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods to maximize such objectives. We present a comprehensive evaluation across multiple datasets, models, and metrics. Offline RL shows a clear performance improvement over teacher forcing while not inducing training instability or sacrificing practical training budgets.

翻訳日:2023-07-25 16:09:33 公開日:2023-07-23

# ヘイトスピーチをポリシーに照らしてテストする

Testing Hateful Speeches against Policies ( http://arxiv.org/abs/2307.12418v1 )

ライセンス: Link先を確認

Jiangrui Zheng, Xueqing Liu, Girish Budhrani, Wei Yang, Ravishka Rathnasuriya

(参考訳) 近年、多くのソフトウェアシステムがAI技術、特にディープラーニング技術を採用している。そのブラックボックス性のため、AIベースのシステムはトレーサビリティに課題をもたらした。AIシステムの振る舞いはモデルとデータに基づいているのに対し、要件やポリシーは自然言語やプログラミング言語の形式で記述された規則だからである。我々の知る限り、AIおよびディープニューラルネットワークベースのシステムが規則ベースの要件やポリシーに対してどのように振る舞うかについての研究は限られている。本経験論文では、自然言語ポリシーに記述された規則ベースの要件に対するディープニューラルネットワークの振る舞いを調査する。特に、AIベースのコンテンツモデレーションソフトウェアをコンテンツモデレーションポリシーに照らして検査するケーススタディに焦点を当てる。まず、クラウドソーシングを用いて、各モデレーションポリシーに合致する自然言語のテストケースを収集し、このデータセットをHateModerateと名付ける。次に、HateModerateのテストケースを用いて最先端のヘイトスピーチ検出ソフトウェアの失敗率を検証し、これらのモデルが特定のポリシーに対して高い失敗率を示すことを見出した。最後に、手作業によるラベリングはコストが高いため、OpenAIの大規模言語モデルを微調整して新しい事例をポリシーに自動的に対応付けることで、HateModerateを拡張する自動化手法を提案する。この研究のデータセットとコードは、匿名のWebサイトで公開されている: \url{https://sites.google.com/view/content-moderation-project}。

In recent years, many software systems have adopted AI techniques, especially deep learning techniques. Due to their black-box nature, AI-based systems brought challenges to traceability, because AI system behaviors are based on models and data, whereas the requirements or policies are rules in the form of natural or programming language. To the best of our knowledge, there are a limited number of studies on how AI and deep neural network-based systems behave against rule-based requirements/policies. This experience paper examines deep neural network behaviors against rule-based requirements described in natural language policies. In particular, we focus on a case study to check AI-based content moderation software against content moderation policies. First, using crowdsourcing, we collect natural language test cases which match each moderation policy; we name this dataset HateModerate. Second, using the test cases in HateModerate, we test the failure rates of state-of-the-art hate speech detection software, and we find that these models have high failure rates for certain policies. Finally, since manual labeling is costly, we further propose an automated approach to augment HateModerate by finetuning OpenAI's large language models to automatically match new examples to policies. The dataset and code of this work can be found on our anonymous website: \url{https://sites.google.com/view/content-moderation-project}.

翻訳日:2023-07-25 16:09:25 公開日:2023-07-23
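
The HateModerate abstract above reports per-policy failure rates of hate speech detectors. The bookkeeping behind such a number can be sketched as follows, assuming each test case records the policy it exercises and the expected verdict. The record layout, the function name `failure_rate_by_policy`, the placeholder texts, and the keyword detector are all hypothetical stand-ins, not the paper's data or software.

```python
from collections import defaultdict


def failure_rate_by_policy(test_cases, detector):
    """Return {policy: fraction of cases where the detector's verdict is wrong}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for case in test_cases:
        totals[case["policy"]] += 1
        if detector(case["text"]) != case["expected_hateful"]:
            failures[case["policy"]] += 1
    return {policy: failures[policy] / totals[policy] for policy in totals}


# Placeholder test cases; a real suite would hold the crowdsourced examples.
cases = [
    {"policy": "policy_A", "text": "placeholder flagged example", "expected_hateful": True},
    {"policy": "policy_A", "text": "placeholder subtle example", "expected_hateful": True},
    {"policy": "policy_B", "text": "placeholder flagged example", "expected_hateful": True},
    {"policy": "policy_B", "text": "placeholder benign example", "expected_hateful": False},
]


def toy_detector(text):
    """Stand-in for a real model: flags only texts containing one keyword."""
    return "flagged" in text


rates = failure_rate_by_policy(cases, toy_detector)
```

Grouping by policy rather than reporting one overall rate is what lets a weak spot for a single policy show up instead of being averaged away.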

# 不確実性におけるテストデータ感度の情報理論解析

Information-theoretic Analysis of Test Data Sensitivity in Uncertainty ( http://arxiv.org/abs/2307.12456v1 )

ライセンス: Link先を確認

Futoshi Futami, Tomoharu Iwata

(参考訳) ベイズ推論は不確実性定量化タスクにしばしば用いられる。Xu and Raginsky 2022 による最近の分析では、ベイズ推論の予測不確実性が、偶然的(aleatoric)不確実性と認識的(epistemic)不確実性と呼ばれる2つの不確実性に厳密に分解された。これらはそれぞれ、データ生成過程に固有のランダム性と、データ不足に起因する変動性を表す。彼らは、モデルが正しく特定されていると仮定し、モデルのパラメータを潜在変数として扱うことで、これらの不確実性を情報理論的に分析した。しかし、既存の不確実性の情報理論的分析では、テストデータとトレーニングデータの間の感度として知られる、広く信じられている不確実性の性質を説明できない。これは、テストデータが何らかの意味でトレーニングデータに類似している場合、認識的不確実性は小さくなるはずだということを意味する。本研究では, 予測不確実性に対する新しい分解法を用いて, このような不確実性の感度を調べる。我々の分析は、情報理論的な量を用いてそのような感度を定義することに成功している。さらに,ベイズ的メタラーニングの既存の分析を拡張し,タスク間の新たな感度を初めて示す。

Bayesian inference is often utilized for uncertainty quantification tasks. A recent analysis by Xu and Raginsky 2022 rigorously decomposed the predictive uncertainty in Bayesian inference into two uncertainties, called aleatoric and epistemic uncertainties, which represent the inherent randomness in the data-generating process and the variability due to insufficient data, respectively. They analyzed those uncertainties in an information-theoretic way, assuming that the model is well-specified and treating the model's parameters as latent variables. However, the existing information-theoretic analysis of uncertainty cannot explain the widely believed property of uncertainty, known as the sensitivity between the test and training data. It implies that when test data are similar to training data in some sense, the epistemic uncertainty should become small. In this work, we study such uncertainty sensitivity using our novel decomposition method for the predictive uncertainty. Our analysis successfully defines such sensitivity using information-theoretic quantities. Furthermore, we extend the existing analysis of Bayesian meta-learning and show the novel sensitivities among tasks for the first time.

翻訳日:2023-07-25 16:01:28 公開日:2023-07-23
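
The abstract above concerns the split of predictive uncertainty into aleatoric and epistemic parts. A familiar entropy-based version of that split, computed from several posterior samples or ensemble members, is sketched below. It is meant only to make the two terms concrete and is not claimed to be the decomposition analyzed in the paper. The function names and the sample probabilities are assumptions.

```python
import math


def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def decompose(member_probs):
    """Split predictive entropy into aleatoric and epistemic parts.

    `member_probs` holds one predictive distribution per posterior sample.
    total     = entropy of the averaged prediction
    aleatoric = average entropy of the individual predictions
    epistemic = total - aleatoric (a mutual information, hence non-negative)
    """
    n = len(member_probs)
    mean = [sum(col) / n for col in zip(*member_probs)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / n
    return total, aleatoric, total - aleatoric


# Members that agree (test point resembling the training data) versus
# members that disagree (test point far from the training data).
agree = decompose([[0.7, 0.3], [0.7, 0.3]])
disagree = decompose([[0.9, 0.1], [0.1, 0.9]])
```

When the members agree, the epistemic term vanishes and only the irreducible noise remains. When they disagree, the epistemic term grows, which matches the intuition about test points that are unlike the training data.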

# 高速ベイズトモグラフィーによる非マルコフ量子過程のキャラクタリゼーション

Characterizing non-Markovian Quantum Processes by Fast Bayesian Tomography ( http://arxiv.org/abs/2307.12452v1 )

ライセンス: Link先を確認

R. Y. Su, J. Y. Huang, N. Dumoulin. Stuyck, M. K. Feng, W. Gilbert, T. J. Evans, W. H. Lim, F. E. Hudson, K. W. Chan, W. Huang, Kohei M. Itoh, R. Harper, S. D. Bartlett, C. H. Yang, A. Laucht, A. Saraiva, T. Tanttu and A. S. Dzurak

(参考訳) ゲート性能を量子誤り訂正のしきい値を超えるレベルに押し上げるには、量子ゲートで発生するエラー源を特徴付けることが重要である。しかし、非マルコフ誤差の特徴付けは、現在の量子プロセストモグラフィー技術にとって課題となっている。Fast Bayesian Tomography (FBT) は自己整合的なゲートセットトモグラフィープロトコルであり、以前の特徴付けで得た知識からブートストラップでき、任意のゲートシーケンスでリアルタイムに更新できる。ここでは、FBTによって主要な非マルコフ誤差過程の特徴付けが可能になることを示す。シリコン量子ドット上の2量子ビット系の非マルコフ的挙動を診断するための、FBTの実験プロトコルを2つ導入する。実験・解析ループの効率性とスケーラビリティを高めるため,オンラインFBTソフトウェアスタックを開発した。実験コストと解析時間を削減するため,ネイティブ読み出し法とウォームブート戦略も導入する。以上の結果から,FBTは、量子コンピューティングにおけるフォールトトレラント動作の最終的な実現を妨げうる非マルコフ誤差を調べるための有用なツールであることが示された。

To push gate performance to levels beyond the thresholds for quantum error correction, it is important to characterize the error sources occurring on quantum gates. However, the characterization of non-Markovian error poses a challenge to current quantum process tomography techniques. Fast Bayesian Tomography (FBT) is a self-consistent gate set tomography protocol that can be bootstrapped from earlier characterization knowledge and be updated in real-time with arbitrary gate sequences. Here we demonstrate how FBT allows for the characterization of key non-Markovian error processes. We introduce two experimental protocols for FBT to diagnose the non-Markovian behavior of two-qubit systems on silicon quantum dots. To increase the efficiency and scalability of the experiment-analysis loop, we develop an online FBT software stack. To reduce experiment cost and analysis time, we also introduce a native readout method and warm boot strategy. Our results demonstrate that FBT is a useful tool for probing non-Markovian errors that can be detrimental to the ultimate realization of fault-tolerant operation on quantum computing.

翻訳日:2023-07-25 16:01:08 公開日:2023-07-23

# DiAMoNDBack: C{\alpha}タンパク質トレースの非決定論的バックマッピングのための拡散デノイジング自己回帰モデル

DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of C{\alpha} Protein Traces ( http://arxiv.org/abs/2307.12451v1 )

ライセンス: Link先を確認

Michael S. Jones and Kirill Shmilovich and Andrew L. Ferguson

(参考訳) タンパク質の粗視化分子モデルは、全原子モデルでは到達できない長さスケールと時間スケールへのアクセスと、凝集やフォールディングなど長い時間スケールで起こるプロセスのシミュレーションを可能にする。解像度の低下は計算の高速化をもたらすが、機構の詳細を完全に理解するには原子論的な表現が不可欠となりうる。バックマッピングは、粗視化分子モデルに全原子解像度を復元するプロセスである。本研究では,C{\alpha}座標のみを保持する粗視化タンパク質表現に全原子の詳細を復元する自己回帰デノイジング拡散確率モデルとして,DiAMoNDBack(Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping)を報告する。自己回帰生成過程は、C{\alpha}トレースと、局所近傍内で既にバックマッピングされた主鎖および側鎖原子に条件付けられ、タンパク質のN末端からC末端へ残基ごとに進行する。モデルの局所的かつ自己回帰的な性質により、タンパク質間での転移が可能になる。デノイジング拡散過程の確率的性質は、モデルが粗視化C{\alpha}トレースと整合する主鎖と側鎖の全原子配置の現実的なアンサンブルを生成することを意味する。Protein Data Bank (PDB) の65k以上の構造でDiAMoNDBackを訓練し, ホールドアウトしたPDBテストセット, Protein Ensemble Database (PED) の天然変性タンパク質構造, DE Shaw Researchによる高速フォールディングミニタンパク質の分子動力学シミュレーション, および粗視化シミュレーションデータへの適用で検証した。正しい結合形成, 側鎖衝突の回避, 生成された側鎖の配置状態の多様性の観点から, 最先端の再構築性能を達成した。DiAMoNDBackモデルを無料かつオープンソースのPythonパッケージとして公開している。

Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales such as aggregation and folding. The reduced resolution realizes computational accelerations but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only C{\alpha} coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the C{\alpha} trace and previously backmapped backbone and side chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side chain all-atom configurations consistent with the coarse-grained C{\alpha} trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically-disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side chain clashes, and diversity of the generated side chain configurational states. We make the DiAMoNDBack model publicly available as a free and open source Python package.

翻訳日:2023-07-25 16:00:49 公開日:2023-07-23

# 無効論理と等価なゲイン:言語モデルのプロンプトにおける推論の奇妙な性質

Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting ( http://arxiv.org/abs/2307.10573v2 )

ライセンス: Link先を確認

Rylan Schaeffer, Kateryna Pistunova, Samar Khanna, Sarthak Consul, Sanmi Koyejo

(参考訳) 言語モデルは、性能を大幅に向上させる形で問題を推論しながら解くよう促すことができる。しかし、そのようなプロンプトが\textit{なぜ}性能を向上させるのかは明らかではない。最近の研究では、論理的に \textit{invalid} なChain-of-Thought (CoT) プロンプトを用いても、論理的に \textit{valid} なCoTプロンプトとほぼ同程度に性能が向上すること、またCoTプロンプトを編集して問題固有の情報を抽象的な情報や分布外の情報に置き換えても通常は性能が損なわれないことが示された。批判者は、これらの知見は少数すぎ、かつ容易に解けすぎるタスクに基づいており、意味のある結論を導くことはできないと反論している。この論争を解決するために、BIG-Bench Hard(BBH)と呼ばれるBIG-Benchベンチマークの最も難しいタスクにおいて、論理的に無効なCoTプロンプトが論理的に有効なプロンプトと同じレベルの性能向上をもたらすかどうかを検証する。その結果、論理的に \textit{invalid} な推論プロンプトは、BBHタスクにおいて論理的に有効な推論プロンプトと同様の性能向上を実際に達成することがわかった。また、先行研究で用いられたCoTプロンプトの一部に論理的な誤りが含まれていることも発見した。これは、論理的に有効な推論以外の共変量が性能改善の要因であることを示唆している。

Language models can be prompted to reason through problems in a manner that significantly improves performance. However, \textit{why} such prompting improves performance is unclear. Recent work showed that using logically \textit{invalid} Chain-of-Thought (CoT) prompting improves performance almost as much as logically \textit{valid} CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easily solved tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically \textit{invalid} reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.

翻訳日:2023-07-25 11:24:55 公開日:2023-07-23

# 検索強化による大規模言語モデルの事実知識境界の検討

Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation ( http://arxiv.org/abs/2307.11019v2 )

ライセンス: Link先を確認

Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang

(参考訳) 知識集約的なタスク(例えば、オープンドメイン質問応答(QA))は、相当量の事実知識を必要とし、しばしば外部情報の助けに依存する。最近、大規模言語モデル(LLM)(例えばChatGPT)は、知識集約的なタスクを含め、世界知識を用いて幅広いタスクを解決する優れた能力を示している。しかし、LLMが自身の事実知識の境界をどの程度認識できるのか、特に検索強化を取り入れた場合にどのように振る舞うのかは、まだ明らかではない。本研究では,オープンドメインQAにおけるLLMの事実知識の境界と、検索強化がLLMに与える影響について,初期的な分析を行う。具体的には,3つの主要な研究課題に焦点を当て,LLMのQA性能,事前判断,事後判断を調べることで分析する。LLMが、質問に回答する自身の能力と回答の正確性に対して揺るぎない自信を持っているという証拠を示す。さらに,検索強化は,LLMの知識境界に対する認識を高める有効なアプローチであり,それによって判断能力が向上することがわかった。加えて, LLMは回答を作成する際に提供された検索結果に依存する傾向があり, これらの結果の質がその依存度に大きく影響することもわかった。この研究を再現するコードはhttps://github.com/RUCAIBox/LLM-Knowledge-Boundaryで公開されている。

Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT) have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specifically, we focus on three primary research questions and analyze them by examining QA performance, priori judgement and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries, thereby improving their judgemental abilities. Additionally, we find that LLMs have a propensity to rely on the provided retrieval results when formulating answers, while the quality of these results significantly impacts their reliance. The code to reproduce this work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.

翻訳日:2023-07-25 11:12:53 公開日:2023-07-23

# シャープネス最小化アルゴリズムは、シャープネスの最小化だけでより良い一般化を達成しているわけではない

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization ( http://arxiv.org/abs/2307.11007v2 )

ライセンス: Link先を確認

Kaiyue Wen, Zhiyuan Li, Tengyu Ma

(参考訳) 広範な研究にもかかわらず、過剰パラメータ化されたニューラルネットワークが一般化できる根本的な理由は、いまだに解明されていない。既存の理論は、一般的な確率的最適化手法が訓練損失のより平坦な最小化点を好むことを示しており、したがって平坦性が一般化を含意するというのが自然な説明の候補となる。本研究はこの説明を批判的に検証する。理論的および実証的な検討を通じて、2層ReLUネットワークについて次の3つのシナリオを特定する。(1) 平坦性が一般化を証明可能な形で含意する。(2) 一般化しない最も平坦なモデルが存在し、シャープネス最小化アルゴリズムは一般化に失敗する。(3) 最も驚くべきことに、一般化しない最も平坦なモデルが存在するにもかかわらず、シャープネス最小化アルゴリズムは依然として一般化する。以上の結果から,シャープネスと一般化の関係はデータ分布とモデルアーキテクチャに微妙に依存しており,シャープネス最小化アルゴリズムは、シャープネスを最小化することだけでより良い一般化を達成しているわけではないことが示唆される。このことは、過剰パラメータ化されたニューラルネットワークの一般化について、他の説明を探索する必要があることを示している。

Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize, and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures and sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for the search for other explanations for the generalization of over-parameterized neural networks.

翻訳日:2023-07-25 11:12:29 公開日:2023-07-23
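
The abstract above refers to sharpness minimization algorithms without naming one. As a concrete point of reference, the sketch below shows one update step in the style of sharpness-aware minimization (SAM), a widely used member of this family, on a toy quadratic loss. The loss function, the step size `lr`, and the perturbation radius `rho` are assumptions chosen for this example, and nothing here reproduces the paper's two-layer ReLU setting.

```python
import math


def loss(w):
    """Toy loss with one sharp direction (w[0]) and one flat direction (w[1])."""
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def grad(w):
    """Gradient of the toy loss above."""
    return [10.0 * w[0], w[1]]


def sam_step(w, lr=0.05, rho=0.1):
    """One SAM-style update.

    1. Move to the approximate worst-case point within radius rho by
       stepping along the normalized gradient.
    2. Evaluate the gradient at that perturbed point.
    3. Apply that gradient to the original weights.
    """
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid dividing by zero
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w0 = [1.0, 1.0]
w1 = sam_step(w0)
```

Because the gradient is taken at the perturbed point, the update is pulled harder along directions where the loss rises steeply nearby, which is the sense in which such methods penalize sharpness.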

PDF登録状況（公開日: 20230723）