Fugu-MT: arXiv Paper Translations

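This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.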

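The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).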

Papers with a publication date of 2023-04-29.

Title	Authors	Abstract	Publication date / translation date
# An Extensible Multimodal Multi-task Object Dataset with Materials ( http://arxiv.org/abs/2305.14352v1 ) License: see linked page	Trevor Standley, Ruohan Gao, Dawn Chen, Jiajun Wu, Silvio Savarese	We present EMMa, an Extensible, Multimodal dataset of Amazon product listings that contains rich Material annotations. It contains more than 2.8 million objects, each with image(s), listing text, mass, price, product ratings, and position in Amazon's product-category taxonomy. We also design a comprehensive taxonomy of 182 physical materials (e.g., Plastic $\rightarrow$ Thermoplastic $\rightarrow$ Acrylic). Objects are annotated with one or more materials from this taxonomy. With the numerous attributes available for each object, we develop a Smart Labeling framework to quickly add new binary labels to all objects with very little manual labeling effort, making the dataset extensible. Each object attribute in our dataset can be included in either the model inputs or outputs, leading to combinatorial possibilities in task configurations. For example, we can train a model to predict the object category from the listing text, or the mass and price from the product listing image. EMMa offers a new benchmark for multi-task learning in computer vision and NLP, and allows practitioners to efficiently add new tasks and object attributes at scale.	Translated: 2023-05-28 05:01:03 Published: 2023-04-29
# Advanced Medical Image Representation for Efficient Processing and Transfer in Multisite Clouds ( http://arxiv.org/abs/2305.15411v1 ) License: see linked page	Elena-Simona Apostol and Ciprian-Octavian Truică	An important topic in medical research is the process of improving the images obtained from medical devices. As a consequence, there is also a need to improve medical image resolution and analysis. Another issue in this field is the large amount of stored medical data [16]. Human brain databases at medical institutes, for example, can accumulate tens of Terabytes of data per year. In this paper, we propose a novel medical image format representation based on multiple data structures that improve the information maintained in the medical images. The new representation keeps additional metadata information, such as the image class or tags for the objects found in the image. We defined our own ontology to help us classify the objects found in medical images using a multilayer neural network. As we generally deal with large data sets, we used the MapReduce paradigm in the Cloud environment to speed up the image processing. To optimize the transfer between Cloud nodes and to reduce the preprocessing time, we also propose a data compression method based on deduplication. We test our solution for image representation and efficient data transfer in a multisite cloud environment. Our proposed solution optimizes the data transfer with a time improvement of 27% on average.	Translated: 2023-05-28 04:40:15 Published: 2023-04-29
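The abstract above mentions a compression method based on deduplication but gives no details. As a rough illustration of the general idea only, the following sketch splits a byte string into fixed-size chunks and stores each distinct chunk once. The function names, the fixed chunk size, and the use of SHA-256 are assumptions made here for illustration and are not taken from the paper.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each distinct chunk once.

    Hypothetical helper: the chunking scheme and hash choice are assumptions.
    """
    store = {}   # digest -> chunk bytes (every unique chunk stored once)
    recipe = []  # ordered digests needed to rebuild the original data
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original bytes from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)
```

Only the chunk store and the recipe would need to be transferred, so data with many repeated chunks shrinks, while data without repeats gains nothing from this scheme.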
# Systematic Review on Reinforcement Learning in the Field of Fintech ( http://arxiv.org/abs/2305.07466v1 ) License: see linked page	Nadeem Malibari, Iyad Katib and Rashid Mehmood	Applications of Reinforcement Learning in the Finance Technology (Fintech) have acquired a lot of admiration lately. Undoubtedly Reinforcement Learning, through its vast competence and proficiency, has aided remarkable results in the field of Fintech. The objective of this systematic survey is to perform an exploratory study on a correlation between reinforcement learning and Fintech to highlight the prediction accuracy, complexity, scalability, risks, profitability and performance. Major uses of reinforcement learning in finance or Fintech include portfolio optimization, credit risk reduction, investment capital management, profit maximization, effective recommendation systems, and better price setting strategies. Several studies have addressed the actual contribution of reinforcement learning to the performance of financial institutions. The latest studies included in this survey are publications from 2018 onward. The survey is conducted using the PRISMA technique, which focuses on the reporting of reviews and is based on a checklist and four-phase flow diagram. The conducted survey indicates that the performance of RL-based strategies in Fintech fields proves to perform considerably better than other state-of-the-art algorithms. The present work discusses the use of reinforcement learning algorithms in diverse decision-making challenges in Fintech and concludes that the organizations dealing with finance can benefit greatly from Robo-advising, smart order channelling, market making, hedging and options pricing, portfolio optimization, and optimal execution.	Translated: 2023-05-21 11:13:33 Published: 2023-04-29
# POET: A Self-learning Framework for PROFINET Industrial Operations Behaviour ( http://arxiv.org/abs/2305.03175v1 ) License: see linked page	Ankush Meshram, Markus Karch, Christian Haas, Jürgen Beyerer	Since 2010, multiple cyber incidents on industrial infrastructure, such as Stuxnet and CrashOverride, have exposed the vulnerability of Industrial Control Systems (ICS) to cyber threats. The industrial systems are commissioned for longer duration amounting to decades, often resulting in non-compliance to technological advancements in industrial cybersecurity mechanisms. The unavailability of network infrastructure information makes designing the security policies or configuring the cybersecurity countermeasures such as Network Intrusion Detection Systems (NIDS) challenging. An empirical solution is to self-learn the network infrastructure information of an industrial system from its monitored network traffic to make the network transparent for downstream analyses tasks such as anomaly detection. In this work, a Python-based industrial communication paradigm-aware framework, named PROFINET Operations Enumeration and Tracking (POET), that enumerates different industrial operations executed in a deterministic order of a PROFINET-based industrial system is reported. The operation-driving industrial network protocol frames are dissected for enumeration of the operations. For the requirements of capturing the transitions between industrial operations triggered by the communication events, the Finite State Machines (FSM) are modelled to enumerate the PROFINET operations of the device, connection and system. POET extracts the network information from network traffic to instantiate appropriate FSM models (Device, Connection or System) and track the industrial operations. It successfully detects and reports the anomalies triggered by a network attack in a miniaturized PROFINET-based industrial system, executed through valid network protocol exchanges and resulting in invalid PROFINET operation transition for the device.	Translated: 2023-05-14 21:15:36 Published: 2023-04-29
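The abstract above describes tracking operations with finite state machines and reporting invalid transitions as anomalies. The sketch below shows that general pattern in miniature. The state names, event names, and the `OperationTracker` class are hypothetical placeholders chosen for illustration; they are not taken from the PROFINET specification or from the POET code.

```python
# Hypothetical transition table: (current state, observed event) -> next state.
# These states and events are illustrative only, not real PROFINET operations.
TRANSITIONS = {
    ("idle", "connect_request"): "connecting",
    ("connecting", "connect_response"): "connected",
    ("connected", "cyclic_data"): "operating",
    ("operating", "cyclic_data"): "operating",
    ("operating", "release"): "idle",
}


class OperationTracker:
    """Follows one device through the table and records invalid transitions."""

    def __init__(self, start="idle"):
        self.state = start
        self.anomalies = []  # list of (state, event) pairs that had no transition

    def observe(self, event):
        """Apply one event; return True if the transition is valid."""
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            # The event is not allowed in the current state: report, keep state.
            self.anomalies.append((self.state, event))
            return False
        self.state = next_state
        return True
```

In this toy model an event can be well-formed on its own and still be flagged, because validity depends on the state in which it arrives. That matches the abstract's point about attacks carried out through valid protocol exchanges.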
# A CSI Dataset for Wireless Human Sensing on 80 MHz Wi-Fi Channels ( http://arxiv.org/abs/2305.03170v1 ) License: see linked page	Francesca Meneghello, Nicolò Dal Fabbro, Domenico Garlisi, Ilenia Tinnirello, Michele Rossi	In the last years, several machine learning-based techniques have been proposed to monitor human movements from Wi-Fi channel readings. However, the development of domain-adaptive algorithms that robustly work across different environments is still an open problem, whose solution requires large datasets characterized by strong domain diversity, in terms of environments, persons and Wi-Fi hardware. To date, the few public datasets available are mostly obsolete - as obtained via Wi-Fi devices operating on 20 or 40 MHz bands - and contain little or no domain diversity, thus dramatically limiting the advancements in the design of sensing algorithms. The present contribution aims to fill this gap by providing a dataset of IEEE 802.11ac channel measurements over an 80 MHz bandwidth channel featuring notable domain diversity, through measurement campaigns that involved thirteen subjects across different environments, days, and with different hardware. Novel experimental data is provided by blocking the direct path between the transmitter and the monitor, and collecting measurements in a semi-anechoic chamber (no multi-path fading). Overall, the dataset - available on IEEE DataPort [1] - contains more than thirteen hours of channel state information readings (23.6 GB), allowing researchers to test activity/identity recognition and people counting algorithms.	Translated: 2023-05-14 21:15:09 Published: 2023-04-29
# QICHWABASE: A Quechua Language and Knowledge Base for Quechua Communities ( http://arxiv.org/abs/2305.06173v1 ) License: see linked page	Elwin Huaman, David Lindemann, Valeria Caruso, Jorge Luis Huaman	Over the last decade, the Web has increasingly become a space of language and knowledge representation. However, it is only true for well-spread languages and well-established communities, while minority communities and their resources received less attention. In this paper, we propose QICHWABASE to support the harmonization process of the Quechua language and knowledge, and its community. For doing it, we adopt methods and tools that could become a game changer in favour of Quechua communities around the world. We conclude that the methodology and tools adopted on building QICHWABASE, which is a Wikibase instance, could enhance the presence of minorities on the Web.	Translated: 2023-05-14 20:46:18 Published: 2023-04-29
# ChatGPT in education: A discourse analysis of worries and concerns on social media ( http://arxiv.org/abs/2305.02201v1 ) License: see linked page	Lingyao Li, Zihui Ma, Lizhou Fan, Sanggyu Lee, Huizi Yu, Libby Hemphill	The rapid advancements in generative AI models present new opportunities in the education sector. However, it is imperative to acknowledge and address the potential risks and concerns that may arise with their use. We analyzed Twitter data to identify key concerns related to the use of ChatGPT in education. We employed BERT-based topic modeling to conduct a discourse analysis and social network analysis to identify influential users in the conversation. While Twitter users generally expressed a positive attitude towards the use of ChatGPT, their concerns converged to five specific categories: academic integrity, impact on learning outcomes and skill development, limitation of capabilities, policy and social concerns, and workforce challenges. We also found that users from the tech, education, and media fields were often implicated in the conversation, while education and tech individual users led the discussion of concerns. Based on these findings, the study provides several implications for policymakers, tech companies and individuals, educators, and media agencies. In summary, our study underscores the importance of responsible and ethical use of AI in education and highlights the need for collaboration among stakeholders to regulate AI policy.	Translated: 2023-05-04 14:17:05 Published: 2023-04-29
# Can ChatGPT Pass An Introductory Level Functional Language Programming Course? ( http://arxiv.org/abs/2305.02230v1 ) License: see linked page	Chuqin Geng, Zhang Yihan, Brigitte Pientka, Xujie Si	The recent introduction of ChatGPT has drawn significant attention from both industry and academia due to its impressive capabilities in solving a diverse range of tasks, including language translation, text summarization, and computer programming. Its capability for writing, modifying, and even correcting code together with its ease of use and access is already dramatically impacting computer science education. This paper aims to explore how well ChatGPT can perform in an introductory-level functional language programming course. In our systematic evaluation, we treated ChatGPT as one of our students and demonstrated that it can achieve a grade B- and its rank in the class is 155 out of 314 students overall. Our comprehensive evaluation provides valuable insights into ChatGPT's impact from both student and instructor perspectives. Additionally, we identify several potential benefits that ChatGPT can offer to both groups. Overall, we believe that this study significantly clarifies and advances our understanding of ChatGPT's capabilities and potential impact on computer science education.	Translated: 2023-05-04 14:07:10 Published: 2023-04-29
# Facial Expression Recognition using Squeeze and Excitation-powered Swin Transformers ( http://arxiv.org/abs/2301.10906v7 ) License: see linked page	Arpita Vats, Aman Chadha	The ability to recognize and interpret facial emotions is a critical component of human communication, as it allows individuals to understand and respond to emotions conveyed through facial expressions and vocal tones. The recognition of facial emotions is a complex cognitive process that involves the integration of visual and auditory information, as well as prior knowledge and social cues. It plays a crucial role in social interaction, affective processing, and empathy, and is an important aspect of many real-world applications, including human-computer interaction, virtual assistants, and mental health diagnosis and treatment. The development of accurate and efficient models for facial emotion recognition is therefore of great importance and has the potential to have a significant impact on various fields of study. The field of Facial Emotion Recognition (FER) is of great significance in the areas of computer vision and artificial intelligence, with vast commercial and academic potential in fields such as security, advertising, and entertainment. We propose a FER framework that employs Swin Vision Transformers (SwinT) and squeeze and excitation block (SE) to address vision tasks. The approach uses a transformer model with an attention mechanism, SE, and SAM to improve the efficiency of the model, as transformers often require a large amount of data. Our focus was to create an efficient FER model based on SwinT architecture that can recognize facial emotions using minimal data. We trained our model on a hybrid dataset and evaluated its performance on the AffectNet dataset, achieving an F1-score of 0.5420, which surpassed the winner of the Affective Behavior Analysis in the Wild (ABAW) Competition held at the European Conference on Computer Vision (ECCV) 2022 [Kollias].	Translated: 2023-05-03 17:26:41 Published: 2023-04-29
# Spectral clustering in the Gaussian mixture block model ( http://arxiv.org/abs/2305.00979v1 ) License: see linked page	Shuangping Li, Tselil Schramm	Gaussian mixture block models are distributions over graphs that strive to model modern networks: to generate a graph from such a model, we associate each vertex $i$ with a latent feature vector $u_i \in \mathbb{R}^d$ sampled from a mixture of Gaussians, and we add edge $(i,j)$ if and only if the feature vectors are sufficiently similar, in that $\langle u_i,u_j \rangle \ge \tau$ for a pre-specified threshold $\tau$. The different components of the Gaussian mixture represent the fact that there may be different types of nodes with different distributions over features -- for example, in a social network each component represents the different attributes of a distinct community. Natural algorithmic tasks associated with these networks are embedding (recovering the latent feature vectors) and clustering (grouping nodes by their mixture component). In this paper we initiate the study of clustering and embedding graphs sampled from high-dimensional Gaussian mixture block models, where the dimension of the latent feature vectors $d\to \infty$ as the size of the network $n \to \infty$. This high-dimensional setting is most appropriate in the context of modern networks, in which we think of the latent feature space as being high-dimensional. We analyze the performance of canonical spectral clustering and embedding algorithms for such graphs in the case of 2-component spherical Gaussian mixtures, and begin to sketch out the information-computation landscape for clustering and embedding in these models.	Translated: 2023-05-03 16:40:24 Published: 2023-04-29
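The generative procedure stated in the abstract above (sample a latent vector $u_i$ per vertex from a Gaussian mixture, then connect $i$ and $j$ when $\langle u_i,u_j \rangle \ge \tau$) can be sketched in a few lines. The specific mixture used here, two equally likely components with identity covariance and means $\pm\mu$ along the first coordinate, is an assumption made for illustration, as are the function and parameter names; the paper's exact parameterization may differ.

```python
import random


def sample_gmbm(n, d, mu, tau, seed=0):
    """Sample a graph from a toy 2-component Gaussian mixture block model.

    Assumed setup (not taken from the paper): components are equally likely,
    have identity covariance, and have means +mu and -mu on coordinate 0.
    """
    rng = random.Random(seed)
    labels, feats = [], []
    for _ in range(n):
        sign = rng.choice((-1, 1))            # mixture component of this vertex
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        u[0] += sign * mu                     # shift by the component mean
        labels.append(sign)
        feats.append(u)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(feats[i], feats[j]))
            if inner >= tau:                  # edge iff <u_i, u_j> >= tau
                edges.add((i, j))
    return labels, feats, edges
```

The returned `labels` are the hidden components a clustering algorithm would try to recover from `edges` alone, and `feats` are the hidden vectors an embedding algorithm would try to recover.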
# Conditional Graph Information Bottleneck for Molecular Relational Learning ( http://arxiv.org/abs/2305.01520v1 ) License: see linked page	Namkyeong Lee, Dongmin Hyun, Gyoung S. Na, Sungwon Kim, Junseok Lee, Chanyoung Park	Molecular relational learning, whose goal is to learn the interaction behavior between molecular pairs, got a surge of interest in molecular sciences due to its wide range of applications. Recently, graph neural networks have shown great success in molecular relational learning by modeling a molecule as a graph structure, and considering atom-level interactions between two molecules. Despite their success, existing molecular relational learning methods tend to overlook the nature of chemistry, i.e., a chemical compound is composed of multiple substructures such as functional groups that cause distinctive chemical reactions. In this work, we propose a novel relational learning framework, called CGIB, that predicts the interaction behavior between a pair of graphs by detecting core subgraphs therein. The main idea is, given a pair of graphs, to find a subgraph from a graph that contains the minimal sufficient information regarding the task at hand conditioned on the paired graph based on the principle of conditional graph information bottleneck. We argue that our proposed method mimics the nature of chemical reactions, i.e., the core substructure of a molecule varies depending on which other molecule it interacts with. Extensive experiments on various tasks with real-world datasets demonstrate the superiority of CGIB over state-of-the-art baselines. Our code is available at https://github.com/Namkyeong/CGIB.	Translated: 2023-05-03 13:57:25 Published: 2023-04-29
# A supervised active learning method for identifying critical nodes in Wireless Sensor Network ( http://arxiv.org/abs/2004.08885v4 ) License: see linked page	Behnam Ojaghi and Mohammad Mahdi Dehshibi	Energy Efficiency of a wireless sensor network (WSN) relies on its main characteristics, including hop-number, user's location, allocated power, and relay. Identifying nodes, which have more impact on these characteristics, is, however, subject to a substantial computational overhead and energy consumption. In this paper, we proposed an active learning approach to address the computational overhead of identifying critical nodes in a WSN. The proposed approach can overcome biasing in identifying non-critical nodes and needs much less effort in fine-tuning to adapt to the dynamic nature of WSN. This method benefits from the cooperation of clustering and classification modules to iteratively decrease the required number of data in a typical supervised learning scenario and to increase the accuracy in the presence of uninformative examples, i.e., non-critical nodes. Experiments show that the proposed method has more flexibility, compared to the state-of-the-art, to be employed in large scale WSN environments, the fifth-generation mobile networks (5G), and massively distributed IoT (i.e., sensor networks), where it can prolong the network lifetime.	Translated: 2023-05-02 22:37:29 Published: 2023-04-29
# Detection of quantum phase transition in spin-1 chain through multipartite high-order correlations ( http://arxiv.org/abs/2105.12391v2 ) License: see linked page	Dongkeun Lee, Adel Sohbi and Wonmin Son	We design a Bell inequality that is violated by correlations obtained from the ground states of XXZ spin-1 chain with on site anisotropies at the region of phase transition. In order to detect such correlations in spin-1 systems we exploit the formalism of generalized Bell inequality via the use of multipartite and high order correlations. We observe sharp violation in the vicinity of quantum phase transition between the so called large D and AFM phase. Interestingly, the violation of our Bell inequality is manifested by the change of the XXZ spin-1 chain ground state to a Greenberger-Horne-Zeilinger (GHZ)-like state at the critical region. Our results provide the first characterization of quantum phase transition via the violation of Bell-type constraint by correlations in the XXZ spin-1 chain with multi-body correlations and high-order measurements.	Translated: 2023-05-02 22:10:41 Published: 2023-04-29
# Diffusion Mechanism in Residual Neural Network: Theory and Applications ( http://arxiv.org/abs/2105.03155v5 ) License: see linked page	Tangjun Wang, Zehao Dou, Chenglong Bao, Zuoqiang Shi	Diffusion, a fundamental internal mechanism emerging in many physical processes, describes the interaction among different objects. In many learning tasks with limited training samples, the diffusion connects the labeled and unlabeled data points and is a critical component for achieving high classification accuracy. Many existing deep learning approaches directly impose the fusion loss when training neural networks. In this work, inspired by the convection-diffusion ordinary differential equations (ODEs), we propose a novel diffusion residual network (Diff-ResNet), which internally introduces diffusion into the architectures of neural networks. Under the structured data assumption, it is proved that the proposed diffusion block can increase the distance-diameter ratio that improves the separability of inter-class points and reduces the distance among local intra-class points. Moreover, this property can be easily adopted by the residual networks for constructing the separable hyperplanes. Extensive experiments of synthetic binary classification, semi-supervised graph node classification and few-shot image classification in various datasets validate the effectiveness of the proposed method.	Translated: 2023-05-02 22:10:32 Published: 2023-04-29
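The abstract above says a diffusion block pulls local intra-class points together, which raises the distance-diameter ratio. A minimal sketch of one explicit diffusion step follows, assuming the common graph-diffusion update $x_i \leftarrow x_i - \gamma \sum_j w_{ij}(x_i - x_j)$. That update form, the weight matrix, the step size `gamma`, and the function name are assumptions made here for illustration and are not taken from the paper or its code.

```python
def diffusion_step(points, weights, gamma):
    """Apply x_i <- x_i - gamma * sum_j w_ij * (x_i - x_j) to every point.

    points:  list of feature vectors (lists of floats)
    weights: weights[i][j] >= 0 is the assumed affinity between points i and j
    gamma:   step size of the explicit update
    """
    new_points = []
    for i, xi in enumerate(points):
        drift = [0.0] * len(xi)
        for j, xj in enumerate(points):
            w = weights[i][j]
            for k in range(len(xi)):
                drift[k] += w * (xi[k] - xj[k])
        new_points.append([x - gamma * dk for x, dk in zip(xi, drift)])
    return new_points
```

With weights that connect only nearby points, each step shrinks the spread inside a group while leaving the group centers in place, so the gap between groups grows relative to their diameters.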
# Graph Representation Learning via Diversity-preserving Graph Refinement ( http://arxiv.org/abs/2103.07295v3 ) License: see linked page	Shuai Zheng	For real-world graph data, the complex relationship between nodes is often represented as a hard binary link. Obviously, it is a discrete and simplified form of continuous relationship between nodes, which seriously limits the expressibility of the learned node representation. On the other hand, the node representation obtained in the embedding space can in turn be used to reveal the intrinsic relationship between nodes. To better characterize the node relationships and further facilitate the learning of node representation, an intuitive way is to refine the originally given graph structure with the embedded node representations. However, such global refinement of the relationships among all nodes without distinction will inevitably lead to some noisy edges, which may further confuse the training of the node representation learning model. In addition, it also has scalability problems on large graphs. To address these issues, we propose a local structure aware graph refinement to progressively reveal the latent relationships of nodes, thus achieving efficient and robust graph refinement.	Translated: 2023-05-02 22:10:14 Published: 2023-04-29
# A Sparse Expansion For Deep Gaussian Processes ( http://arxiv.org/abs/2112.05888v3 ) License: see linked page	Liang Ding and Rui Tuo and Shahin Shahrampour	In this work, we use Deep Gaussian Processes (DGPs) as statistical surrogates for stochastic processes with complex distributions. Conventional inferential methods for DGP models can suffer from high computational complexity as they require large-scale operations with kernel matrices for training and inference. In this work, we propose an efficient scheme for accurate inference and efficient training based on a range of Gaussian Processes, called the Tensor Markov Gaussian Processes (TMGP). We construct an induced approximation of TMGP referred to as the hierarchical expansion. Next, we develop a deep TMGP (DTMGP) model as the composition of multiple hierarchical expansion of TMGPs. The proposed DTMGP model has the following properties: (1) the outputs of each activation function are deterministic while the weights are chosen independently from standard Gaussian distribution; (2) in training or prediction, only polylog(M) (out of M) activation functions have non-zero outputs, which significantly boosts the computational efficiency. Our numerical experiments on synthetic models and real datasets show the superior computational efficiency of DTMGP over existing DGP models.	Translated: 2023-05-02 22:01:47 Published: 2023-04-29
# 協調型悪質勾配フィルタリングによるビザンチン・ロバスト連合学習 Byzantine-robust Federated Learning through Collaborative Malicious Gradient Filtering ( http://arxiv.org/abs/2109.05872v2 ) ライセンス: Link先を確認	Jian Xu, Shao-Lun Huang, Linqi Song, Tian Lan	(参考訳) フェデレーション学習における勾配ベースのトレーニングは、しばしばビザンチンクライアントとしてモデル化される、欠陥/悪意のあるクライアントに対して脆弱であることが知られている。この目的のために、以前の研究ではパラメータサーバで補助データを使用して受信した勾配を検証する(例えば、検証エラー率の計算など)か、統計ベースの手法(中央値やKrumなど)を利用して、ビザンチンクライアントから悪意のある勾配を特定し削除する。本稿では,補助データの利用が必ずしも可能とは限らないことを指摘し,統計ベースのアプローチに焦点をあてる。しかし、近年のモデル中毒攻撃の研究は、巧妙に作られた攻撃が中央値や距離に基づく統計的防御手法のほとんどを回避でき、悪意のある勾配を正当な勾配と区別できなくすることを示した。この課題に取り組むために,勾配ベクトルの要素ごとの符号がモデル中毒攻撃の検出に有用な洞察を与えることを示す。我々は, Little is Enough 攻撃の理論的解析に基づいて,協調的な悪意のある勾配のフィルタリングによりビザンチン・ロバスト連合学習を実現する新しい手法 SignGuard を提案する。より正確には、受信された勾配はまず処理されて関連する大きさ、符号、類似度の統計量が生成され、これらが複数のフィルタによって協調的に利用されて、最終的な集約の前に悪意のある勾配が取り除かれる。最後に,最近提案された攻撃および防御戦略の下で,画像およびテキスト分類タスクの広範な実験を行った。数値結果は,提案手法の有効性と優位性を示している。コードは https://github.com/JianXu95/SignGuard で利用可能である。 Gradient-based training in federated learning is known to be vulnerable to faulty/malicious clients, which are often modeled as Byzantine clients. To this end, previous work either makes use of auxiliary data at parameter server to verify the received gradients (e.g., by computing validation error rate) or leverages statistic-based methods (e.g. median and Krum) to identify and remove malicious gradients from Byzantine clients. In this paper, we remark that auxiliary data may not always be available in practice and focus on the statistic-based approach. However, recent work on model poisoning attacks has shown that well-crafted attacks can circumvent most of median- and distance-based statistical defense methods, making malicious gradients indistinguishable from honest ones. To tackle this challenge, we show that the element-wise sign of gradient vector can provide valuable insight in detecting model poisoning attacks. 
Based on our theoretical analysis of the Little is Enough attack, we propose a novel approach called SignGuard to enable Byzantine-robust federated learning through collaborative malicious gradient filtering. More precisely, the received gradients are first processed to generate relevant magnitude, sign, and similarity statistics, which are then collaboratively utilized by multiple filters to eliminate malicious gradients before final aggregation. Finally, extensive experiments on image and text classification tasks are conducted under recently proposed attacks and defense strategies. The numerical results demonstrate the effectiveness and superiority of our proposed approach. The code is available at https://github.com/JianXu95/SignGuard	翻訳日:2023-05-02 22:00:37 公開日:2023-04-29
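The filtering pipeline described in the SignGuard abstract above (per-gradient magnitude and sign statistics, several filters, then aggregation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the thresholds `norm_lo`, `norm_hi` and `sign_tol`, and the median-based comparison are all assumptions made for this example, and the similarity statistic mentioned in the abstract is left out.

```python
import math
from statistics import median


def sign_stats(grad):
    """Fractions of positive, zero and negative entries of one gradient."""
    n = len(grad)
    pos = sum(1 for x in grad if x > 0) / n
    neg = sum(1 for x in grad if x < 0) / n
    return pos, 1.0 - pos - neg, neg


def filter_and_aggregate(grads, norm_lo=0.1, norm_hi=3.0, sign_tol=0.25):
    """Keep gradients that pass BOTH a norm filter and a sign filter, then average.

    Hypothetical stand-in filters: a gradient is kept if its L2 norm is within
    [norm_lo, norm_hi] times the median norm, and its positive/negative sign
    fractions are within sign_tol of the median fractions.
    """
    norms = [math.sqrt(sum(x * x for x in g)) for g in grads]
    stats = [sign_stats(g) for g in grads]
    med_norm = median(norms)
    med_pos = median([s[0] for s in stats])
    med_neg = median([s[2] for s in stats])

    kept = []
    for i, (norm, s) in enumerate(zip(norms, stats)):
        norm_ok = norm_lo * med_norm <= norm <= norm_hi * med_norm
        sign_ok = abs(s[0] - med_pos) <= sign_tol and abs(s[2] - med_neg) <= sign_tol
        if norm_ok and sign_ok:
            kept.append(i)
    if not kept:
        raise ValueError("all gradients were filtered out")

    dim = len(grads[0])
    aggregate = [sum(grads[i][d] for i in kept) / len(kept) for d in range(dim)]
    return kept, aggregate


# Toy round: three honest clients, one all-negative gradient, one huge gradient.
client_grads = [
    [1.0, -1.0, 0.5, -0.5],
    [0.9, -1.1, 0.6, -0.4],
    [1.1, -0.9, 0.4, -0.6],
    [-1.0, -1.0, -1.0, -1.0],      # unusual sign pattern, ordinary norm
    [100.0, -100.0, 50.0, -50.0],  # ordinary sign pattern, extreme norm
]
kept, aggregate = filter_and_aggregate(client_grads)
```

In this toy round the all-negative gradient has an unremarkable norm and is only caught by the sign filter, which is the kind of case the abstract's argument about element-wise signs is aimed at.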
# 冗長表現は幅の広いニューラルネットワークの一般化に役立つ Redundant representations help generalization in wide neural networks ( http://arxiv.org/abs/2106.03485v4 ) ライセンス: Link先を確認	Diego Doimo, Aldo Glielmo, Sebastian Goldt, Alessandro Laio	(参考訳) ディープニューラルネットワーク(DNN)は、古典的なバイアス分散トレードオフに反する: トレーニングデータを補間するDNNにパラメータを追加すると、通常は一般化性能が向上する。ディープネットワークにおけるこの「良性過適合(benign overfitting)」のメカニズムを説明することは、いまだに未解決の課題である。本稿では,さまざまな最先端の畳み込みニューラルネットワークにおける最後の隠れ層表現について検討し,最後の隠れ表現が十分に広い場合,そのニューロンは同一の情報を持つグループに分かれる傾向にあり,統計的に独立したノイズによってのみ互いに異なることを見出した。このような群の数は層幅とともに線形に増加するが、それは幅が臨界値を超える場合に限られる。冗長ニューロンは、トレーニングプロセスが補間に達し、トレーニングエラーがゼロとなる場合にのみ現れることを示す。 Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that interpolates its training data will typically improve its generalization performance. Explaining the mechanism behind this "benign overfitting" in deep networks remains an outstanding challenge. Here, we study the last hidden layer representations of various state-of-the-art convolutional neural networks and find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise. The number of such groups increases linearly with the width of the layer, but only if the width is above a critical value. We show that redundant neurons appear only when the training process reaches interpolation and the training error is zero.	翻訳日:2023-05-02 21:59:17 公開日:2023-04-29
# 歪み補正と高精度特徴検出を用いた学習型カメラ校正フレームワーク Learning-Based Framework for Camera Calibration with Distortion Correction and High Precision Feature Detection ( http://arxiv.org/abs/2202.00158v3 ) ライセンス: Link先を確認	Yesheng Zhang, Xu Zhao and Dahong Qian	(参考訳) カメラキャリブレーションは多くのロボットシステムの性能に大きな影響を及ぼす重要な技術である。堅牢性と高精度は、さまざまな校正手法が常に追求してきた目標である。しかし、古典的なZhangの手法に基づく最先端のキャリブレーション技術は、依然として環境ノイズ、ラジアルレンズ歪み、準最適なパラメータ推定に悩まされている。そこで本稿では,学習に基づくアプローチと従来の手法を組み合わせて,これらのボトルネックに対処するハイブリッドカメラキャリブレーションフレームワークを提案する。特にこのフレームワークは、効率的な歪み補正とロバストなチェスボードコーナー座標符号化を行うために学習に基づくアプローチを利用する。コーナー検出のサブピクセル精度のために,外れ値除去機構を組み込んだ特別設計の座標復号アルゴリズムを提案する。準最適な推定結果を避けるため, RANSACアルゴリズムにより従来のパラメータ推定を改良し, 安定した結果を得る。広く使われている2つのカメラキャリブレーションツールボックスと比較して、実データと合成データの両方の実験結果は、提案フレームワークのより高い堅牢性と精度を示す。大規模な合成データセットは本フレームワークの良好な性能の基礎であり、コードとともに https://github.com/Easonyesheng/CCS で公開される予定である。 Camera calibration is a crucial technique which significantly influences the performance of many robotic systems. Robustness and high precision have always been the pursuit of diverse calibration methods. State-of-the-art calibration techniques based on classical Zhang's method, however, still suffer from environmental noise, radial lens distortion and sub-optimal parameter estimation. Therefore, in this paper, we propose a hybrid camera calibration framework which combines learning-based approaches with traditional methods to handle these bottlenecks. In particular, this framework leverages learning-based approaches to perform efficient distortion correction and robust chessboard corner coordinate encoding. For sub-pixel accuracy of corner detection, a specially-designed coordinate decoding algorithm with embedded outlier rejection mechanism is proposed. To avoid sub-optimal estimation results, we improve the traditional parameter estimation by RANSAC algorithm and achieve stable results. 
Compared with two widely-used camera calibration toolboxes, experiment results on both real and synthetic datasets manifest the better robustness and higher precision of the proposed framework. The massive synthetic dataset is the basis of our framework's decent performance and will be publicly available along with the code at https://github.com/Easonyesheng/CCS.	翻訳日:2023-05-02 20:15:46 公開日:2023-04-29
# ロバスト性の低いサンプルにより多くの正則化を施すことによる敵対的ロバスト性の向上 Improving adversarial robustness by putting more regularizations on less robust samples ( http://arxiv.org/abs/2206.03353v2 ) ライセンス: Link先を確認	Dongyoon Yang, Insung Kong, Yongdai Kim	(参考訳) 敵対的攻撃に対するロバスト性を高めるための敵対的トレーニングは、与えられた深層ニューラルネットワークを欺くような、人間には知覚できないデータの摂動を生成することが容易であるため、多くの注目を集めている。本稿では,理論的に十分に動機づけられ,既存のアルゴリズムよりも経験的に優れた新しい敵対的トレーニングアルゴリズムを提案する。提案アルゴリズムの新たな特徴は、既存の正則化アルゴリズムよりも、敵対的攻撃に弱いデータに対してより多くの正則化を適用することである。理論的には,本アルゴリズムは、新たに導出したロバストリスクの上界から動機づけられる正則化経験的リスクを最小化するアルゴリズムとして理解できることを示す。数値実験により,提案アルゴリズムは汎化性能(通常の例に対する精度)とロバスト性(敵対的攻撃に対する精度)を同時に改善し,最先端の性能を実現することを示す。 Adversarial training, which is to enhance robustness against adversarial attacks, has received much attention because it is easy to generate human-imperceptible perturbations of data to deceive a given deep neural network. In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to other existing algorithms. A novel feature of the proposed algorithm is to apply more regularization to data vulnerable to adversarial attacks than other existing regularization algorithms do. Theoretically, we show that our algorithm can be understood as an algorithm of minimizing the regularized empirical risk motivated from a newly derived upper bound of the robust risk. Numerical experiments illustrate that our proposed algorithm improves the generalization (accuracy on examples) and robustness (accuracy on adversarial attacks) simultaneously to achieve the state-of-the-art performance.	翻訳日:2023-05-02 20:08:32 公開日:2023-04-29
# オープン量子系における代数のスクランブル Scrambling of Algebras in Open Quantum Systems ( http://arxiv.org/abs/2206.02033v5 ) ライセンス: Link先を確認	Faidon Andreadakis, Namit Anand, Paolo Zanardi	(参考訳) 量子システムにおける情報のダイナミックスクランブルに対する多くの定量的アプローチは、オフオブタイムコリエータ(otocs)の研究を含んでいる。本稿では、一般化された量子サブシステムの量子チャネル下での情報スクランブルを研究するための代数OTOC(\mathcal{A}$-OTOC)を提案する。閉量子系において、この代数的フレームワークは近年、作用素の絡み合い、コヒーレンス生成力、ロシミットエコーの量子情報理論の統一に用いられている。この研究の主な焦点は、これらの技術の自然に一般化して量子システムを開くことである。まず、ユニタリダイナミクスにおいて、$\mathcal{a}$-otoc は情報スクランブルの一般化された概念、すなわち可観測圏とその可換圏の間を定量化する。一方,オープン量子システムでは,グローバル環境デコヒーレンスと局所的な情報のスクランブルが競合している。この相互作用は代数や量子チャネルの様々な例を解析的に研究することによって説明できる。解析結果を補完するため,PXPモデルとハイゼンベルクXXXモデルという2つのパラダイムシステムの数値シミュレーションを行った。数値計算の結果,多体傷と脱コヒーレンスのない部分空間の安定性が明らかとなった。 Many quantitative approaches to the dynamical scrambling of information in quantum systems involve the study of out-of-time-ordered correlators (OTOCs). In this paper, we introduce an algebraic OTOC ($\mathcal{A}$-OTOC) that allows us to study information scrambling of generalized quantum subsystems under quantum channels. For closed quantum systems, this algebraic framework was recently employed to unify quantum information-theoretic notions of operator entanglement, coherence-generating power, and Loschmidt echo. The main focus of this work is to provide a natural generalization of these techniques to open quantum systems. We first show that, for unitary dynamics, the $\mathcal{A}$-OTOC quantifies a generalized notion of information scrambling, namely between a subalgebra of observables and its commutant. For open quantum systems, on the other hand, we find a competition between the global environmental decoherence and the local scrambling of information. We illustrate this interplay by analytically studying various illustrative examples of algebras and quantum channels. To complement our analytical results, we perform numerical simulations of two paradigmatic systems: the PXP model and the Heisenberg XXX model, under dephasing. 
Our numerical results reveal connections with many-body scars and the stability of decoherence-free subspaces.	翻訳日:2023-05-02 20:07:58 公開日:2023-04-29
# Occupancy-MAE: Masked Occupancy Autoencoders を用いた大規模LiDAR点群の自己教師あり事前学習 Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders ( http://arxiv.org/abs/2206.09900v6 ) ライセンス: Link先を確認	Chen Min and Xinli Xu and Dawei Zhao and Liang Xiao and Yiming Nie and Bin Dai	(参考訳) 自動運転における現在の認識モデルは、アノテーションにコストと時間がかかる大規模ラベル付きLiDARデータに大きく依存している。本研究では,自動運転において利用可能な大量のラベルなしLiDARデータを用いて,自己教師ありマスク学習の研究を促進することを目的とする。しかしながら、既存のマスク付き点群オートエンコーディング手法は、小規模の屋内点群のみに焦点をあてており、通常多数のLiDAR点が不均一に分布する屋外シーンへの適応に苦労する。これらの課題に対処するために,大規模屋外LiDAR点群に特化して設計されたOccupancy-MAEという自己教師ありマスク学習手法を提案する。本研究では,大規模屋外LiDAR点群の徐々に疎になる占有構造を活用し,レンジアウェアなランダムマスキング戦略と占有予測のプリテキストタスクを導入する。 Occupancy-MAEは、LiDARまでの距離に基づいてLiDAR点群のボクセルをランダムにマスクし、3Dシーン全体のマスクされた占有構造を予測する。この単純な占有予測の目的により、Occupancy-MAEは、少量の可視ボクセルだけからマスクされたボクセルを復元するために、高レベルの意味情報を抽出するよう促される。広範な実験により、複数の下流タスクにおけるOccupancy-MAEの有効性を示す。 3Dオブジェクト検出タスクでは、KITTIでの車両検出に必要なラベル付きデータを半分に削減し、Waymoでの小物体検出を約2% mAP向上させる。 3Dセマンティックセグメンテーションタスクでは、Occupancy-MAEはnuScenesにおいてスクラッチからのトレーニングを約2% mIOU上回る。教師なしドメイン適応タスクでは、Occupancy-MAEは性能を約0.5%から1% mAP改善する。以上の結果から,ラベルなしの大規模LiDAR点群をマスク付きオートエンコーディングで事前学習することで,自動運転の3次元知覚能力を向上できることが示された。 Current perception models in autonomous driving rely heavily on large-scale labeled LiDAR data, which is costly and time-consuming to annotate. In this work, we aim to facilitate research on self-supervised masked learning using the vast amount of unlabeled LiDAR data available in autonomous driving. However, existing masked point autoencoding methods only focus on small-scale indoor point clouds and struggle to adapt to outdoor scenes, which usually have a large number of non-evenly distributed LiDAR points. To address these challenges, we propose a new self-supervised masked learning method named Occupancy-MAE, specifically designed for large-scale outdoor LiDAR points. 
We leverage the gradually sparse occupancy structure of large-scale outdoor LiDAR point clouds and introduce a range-aware random masking strategy and a pretext task of occupancy prediction. Occupancy-MAE randomly masks voxels of LiDAR point clouds based on their distance to LiDAR and predicts the masked occupancy structure of the whole 3D scene. This simple occupancy prediction objective encourages Occupancy-MAE to extract high-level semantic information to recover the masked voxel from only a small amount of visible voxels. Extensive experiments demonstrate the effectiveness of Occupancy-MAE across several downstream tasks. For the 3D object detection task, Occupancy-MAE reduces the labeled data required for car detection on KITTI by half and boosts small object detection by around 2% mAP on Waymo. For the 3D semantic segmentation task, Occupancy-MAE outperforms training from scratch by around 2% mIOU on nuScenes. For the unsupervised domain adaptation task, Occupancy-MAE improves the performance by about 0.5% to 1% mAP. Our results show that it is feasible to pre-train unlabeled large-scale LiDAR point clouds with masked autoencoding to enhance the 3D perception ability of autonomous driving.	翻訳日:2023-05-02 19:56:19 公開日:2023-04-29
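The range-aware random masking mentioned in the Occupancy-MAE abstract above could look roughly like the sketch below. It is only an illustration under stated assumptions: the distance bands and mask ratios in `DEFAULT_BANDS` are hypothetical values chosen here, the sensor is assumed to sit at the origin, and the idea that nearer (denser) voxels are masked more aggressively is an assumption that the abstract does not state explicitly.

```python
import math
import random

# Hypothetical (max_range_in_metres, mask_ratio) bands, nearest band first.
DEFAULT_BANDS = ((30.0, 0.9), (50.0, 0.7), (math.inf, 0.5))


def range_aware_mask(voxel_centers, bands=DEFAULT_BANDS, seed=0):
    """Split voxel indices into (visible, masked) using a distance-dependent ratio.

    voxel_centers: iterable of (x, y, z) voxel centres, sensor at the origin.
    """
    rng = random.Random(seed)
    visible, masked = [], []
    for idx, (x, y, _z) in enumerate(voxel_centers):
        dist = math.hypot(x, y)  # horizontal distance to the sensor
        # First band whose upper range covers this voxel decides the mask ratio.
        ratio = next(r for max_range, r in bands if dist <= max_range)
        if rng.random() < ratio:
            masked.append(idx)
        else:
            visible.append(idx)
    return visible, masked


# Toy scene: a line of voxels from 1 m to 80 m in front of the sensor.
scene = [(float(d), 0.0, 0.0) for d in range(1, 81)]
visible_idx, masked_idx = range_aware_mask(scene)
```

A pre-training step would then feed only the visible voxels to the encoder and ask the decoder to predict occupancy for the whole grid, as the abstract describes.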
# 参照限定合成ゼロショット学習 Reference-Limited Compositional Zero-Shot Learning ( http://arxiv.org/abs/2208.10046v2 ) ライセンス: Link先を確認	Siteng Huang, Qiyao Wei, Donglin Wang	(参考訳) Compositional zero-shot learning (CZSL)とは、既知の視覚プリミティブの未知の合成を認識することを指し、人工知能システムが世界を学習し理解するための必須の能力である。既存のベンチマークではかなりの進歩があったが、一般的なCZSL手法が、実世界の未知の環境での学習においてよく起こる、少数ショットかつ少数参照の合成という課題に対処できるかどうかは疑わしい。そこで本研究では,数個のサンプルのみを含む限られた既知の合成を参照として,観察されたプリミティブの未知の合成を識別する,難易度の高い参照限定合成ゼロショット学習(RL-CZSL)問題について検討する。本稿では,不十分な参照情報から効率的に合成性を学習し,未知の合成に一般化できるメタ合成グラフ学習器(MetaCGL)を提案する。さらに、多様な合成ラベルを持つ自然画像からなる2つの新しい大規模データセットでベンチマークを構築し、RL-CZSLのより現実的な環境を提供する。ベンチマークでの広範な実験の結果,提案手法は,合成学習のための参照が限られている場合の未知の合成の認識において最先端の性能を達成することが示された。 Compositional zero-shot learning (CZSL) refers to recognizing unseen compositions of known visual primitives, which is an essential ability for artificial intelligence systems to learn and understand the world. While considerable progress has been made on existing benchmarks, we question whether popular CZSL methods can address the challenges of few-shot and few referential compositions, which is common when learning in real-world unseen environments. To this end, we study the challenging reference-limited compositional zero-shot learning (RL-CZSL) problem in this paper, i.e., given limited seen compositions that contain only a few samples as reference, unseen compositions of observed primitives should be identified. We propose a novel Meta Compositional Graph Learner (MetaCGL) that can efficiently learn the compositionality from insufficient referential information and generalize to unseen compositions. Besides, we build a benchmark with two new large-scale datasets that consist of natural images with diverse compositional labels, providing more realistic environments for RL-CZSL. Extensive experiments in the benchmarks show that our method achieves state-of-the-art performance in recognizing unseen compositions when reference is limited for compositional learning.	翻訳日:2023-05-02 19:49:53 公開日:2023-04-29
# コンフォーマルリスク制御 Conformal Risk Control ( http://arxiv.org/abs/2208.02814v3 ) ライセンス: Link先を確認	Anastasios N. Angelopoulos and Stephen Bates and Adam Fisch and Lihua Lei and Tal Schuster	(参考訳) 我々はコンフォーマル予測を拡張して,任意の単調損失関数の期待値を制御する。このアルゴリズムは、分割コンフォーマル予測をそのカバレッジ保証とともに一般化する。コンフォーマル予測と同様に、コンフォーマルリスク制御の手順は$\mathcal{O}(1/n)$の因子までタイトである。また, 分布シフト, 分位点リスク制御, 複数および敵対的リスク制御, U統計量の期待値へのアイデアの拡張についても紹介する。コンピュータビジョンと自然言語処理の実例により、偽陰性率、グラフ距離、トークンレベルのF1スコアを抑えるためのアルゴリズムの使い方を示す。 We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversarial risk control, and expectations of U-statistics. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.	翻訳日:2023-05-02 19:48:37 公開日:2023-04-29
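A minimal sketch of how a threshold might be calibrated for the Conformal Risk Control entry above. It assumes the rule usually stated for this procedure: for a loss that is non-increasing in a parameter lambda and bounded by B, pick the smallest lambda whose adjusted empirical risk (n/(n+1)) * R_hat(lambda) + B/(n+1) is at most the target level alpha. The abstract itself does not spell out this rule, so the formula, the function names, the grid search and the toy miss-indicator loss are all assumptions of this example.

```python
from typing import Callable, Sequence


def calibrate_lambda(
    loss_at: Callable[[float], Sequence[float]],
    lambda_grid: Sequence[float],
    alpha: float,
    loss_bound: float = 1.0,
) -> float:
    """Smallest grid value of lambda whose adjusted empirical risk is <= alpha.

    loss_at(lam) returns the per-example calibration losses at lam; each loss
    is assumed to be non-increasing in lam and bounded above by loss_bound.
    """
    for lam in sorted(lambda_grid):
        losses = loss_at(lam)
        n = len(losses)
        empirical_risk = sum(losses) / n
        if (n / (n + 1)) * empirical_risk + loss_bound / (n + 1) <= alpha:
            return lam
    raise ValueError("no lambda on the grid meets the target risk level")


# Toy calibration set (hypothetical): model score of the true label per example.
true_label_scores = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]


def miss_losses(lam: float) -> list:
    # The prediction set keeps labels scoring >= 1 - lam; the loss is 1 on a miss.
    return [1.0 if score < 1.0 - lam else 0.0 for score in true_label_scores]


lambda_hat = calibrate_lambda(miss_losses, [i / 10 for i in range(11)], alpha=0.25)
```

With a 0/1 miss loss, as in this toy case, the rule reduces to choosing a score quantile, which is one way to read the abstract's remark that the method generalizes split conformal prediction.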
# 軽量画像超解像のためのクロスレセプティブフォーカス型推論ネットワーク Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution ( http://arxiv.org/abs/2207.02796v2 ) ライセンス: Link先を確認	Wenjie Li, Juncheng Li, Guangwei Gao, Jiantao Zhou, Jian Yang, and Guo-Jun Qi	(参考訳) 近年,トランスフォーマーを用いた手法は,グローバルな特徴抽出能力により,単一画像超解像(SISR)タスクにおいて顕著な性能を示した。しかし、動的に特徴を抽出するために文脈情報を組み込む必要のあるトランスフォーマーの能力は無視される。そこで本研究では,CNNとTransformerを混合したCTブロックのカスケードで構成される,軽量なクロスレセプティブ・フォーカスド推論ネットワーク(CFIN)を提案する。具体的には、CTブロックにおいて、まずCNNベースのクロススケール情報集約モジュール(CIAM)を提案する。そこで我々は,現在の意味情報を理解し,異なる自己意図内での情報相互作用を利用する変調畳み込みカーネルを用いて,再構成に必要なコンテキスト情報の選択を可能にする,新しいクロスレセプティブフィールドガイドトランス (CFGT) を設計した。大規模実験により,提案したCFINは文脈情報を用いて画像の再構成を効果的に行うことができ,計算コストとモデル性能のバランスが良くなることを示した。ソースコードはhttps://github.com/IVIPLab/CFINで入手できる。 Recently, Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks due to the ability of global feature extraction. However, the capabilities of Transformers that need to incorporate contextual information to extract features dynamically are neglected. To address this issue, we propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer. Specifically, in the CT block, we first propose a CNN-based Cross-Scale Information Aggregation Module (CIAM) to enable the model to better focus on potentially helpful information to improve the efficiency of the Transformer phase. Then, we design a novel Cross-receptive Field Guided Transformer (CFGT) to enable the selection of contextual information required for reconstruction by using a modulated convolutional kernel that understands the current semantic information and exploits the information interaction within different self-attention. Extensive experiments have shown that our proposed CFIN can effectively reconstruct images using contextual information, and it can strike a good balance between computational cost and model performance as an efficient model. 
Source codes will be available at https://github.com/IVIPLab/CFIN.	翻訳日:2023-05-02 19:47:28 公開日:2023-04-29
# 頂点色制約下での量子インスパイアされた完全マッチング Quantum-Inspired Perfect Matching under Vertex-Color Constraints ( http://arxiv.org/abs/2209.13063v3 ) ライセンス: Link先を確認	Moshe Y. Vardi and Zhiwei Zhang	(参考訳) 両色エッジを持つグラフに頂点色制約の下で完全マッチングが存在するというグラフ理論問題EXISTS-PMVCを提案する。 EXISTS-PMVCは、量子状態の同定と量子実験設計によるモチベーションと、その豊かな表現性、すなわち、EXISTS-PMVCは、完全マッチングのような重要な制約付きマッチング問題を自然に仮定するため、特に関心がある。我々は,(1)決定ダイアグラム制約(EXISTS-PMVC-DD)と(2)対称性制約(EXISTS-PMVC-Sym)の2種類の頂点色制約の下で,EXISTS-PMVCの複雑性とアルゴリズム的結果を与える。 EXISTS-PMVC-DDでは,グラフガジェット法によりNP硬度を明らかにする。有界な色数(EXISTS-PMVC-Sym-Bunded)を持つEXISTS-PMVC-SymがExact Perfect Matching(XPM)と多項式的に等価であることを証明する。しかし、EXISTS-PMVC-Sym-Boundedを解くためにXPMのアルゴリズムを直接適用することは現実的ではない。我々は, EXISTS-PMVC-Sym-Bounded を複雑に処理するアルゴリズムを提案する。 EXISTS-PMVCの新たな結果は、制約付きマッチングとスケーラブルな量子実験設計の両方に関する洞察を提供する。 We propose and study the graph-theoretical problem EXISTS-PMVC: the existence of perfect matching under vertex-color constraints on graphs with bi-colored edges. EXISTS-PMVC is of special interest because of its motivation from quantum-state identification and quantum-experiment design, as well as its rich expressiveness, i.e., EXISTS-PMVC naturally subsumes important constrained matching problems, such as exact perfect matching. We give complexity and algorithmic results for EXISTS-PMVC under two types of vertex color constraints: (1) decision-diagram constraints (EXISTS-PMVC-DD) and (2) symmetric constraints (EXISTS-PMVC-Sym). For EXISTS-PMVC-DD, we reveal its NP-hardness by a graph-gadget technique. We prove that EXISTS-PMVC-Sym with a bounded number of colors (EXISTS-PMVC-Sym-Bounded) is polynomially equivalent with Exact Perfect Matching (XPM), which implies that EXISTS-PMVC-Sym-Bounded is in RNC on general graphs and PTIME on planar graphs. Directly applying algorithms for XPM to solve EXISTS-PMVC-Sym-Bounded is, however, impractical. We propose algorithms that natively handle EXISTS-PMVC-Sym-Bounded with considerably better complexity. 
Our novel results for EXISTS-PMVC provide insights into both constrained matching and scalable quantum experiment design.	翻訳日:2023-05-02 19:40:52 公開日:2023-04-29
# dytanvo: 動的環境における視覚オドメトリと運動セグメンテーションの合同改良 DytanVO: Joint Refinement of Visual Odometry and Motion Segmentation in Dynamic Environments ( http://arxiv.org/abs/2209.08430v4 ) ライセンス: Link先を確認	Shihao Shen and Yilin Cai and Wenshan Wang and Sebastian Scherer	(参考訳) 学習ベースビジュアル・オドメトリー(VO)アルゴリズムは、高容量モデルと大量の注釈付きデータの恩恵を受けながら、動的で人口密度の高い環境では失敗する傾向がある。セマンティクスセグメンテーションは、カメラの動きを推定する前にダイナミックな関連を破棄するために主に使用されるが、静的な特徴を破棄するコストがかかるため、未認識のカテゴリにスケールアップするのは難しい。本稿では,カメラエゴモーションとモーションセグメンテーションの相互依存性を活用し,単一学習ベースで協調的に両者を洗練できることを示す。特に,動的環境を扱う最初の教師付き学習ベースVO法であるDytanVOを提案する。 2つの連続した単眼フレームをリアルタイムで取得し、反復的にカメラのエゴモーションを予測する。本手法は,現実の動的環境における最先端VOソリューションよりも平均27.7%向上し,バックエンド上での軌跡を最適化する動的視覚SLAMシステムと競合する性能を実現している。また,本手法の一般化可能性を示す実験も行った。 Learning-based visual odometry (VO) algorithms achieve remarkable performance on common static scenes, benefiting from high-capacity models and massive annotated data, but tend to fail in dynamic, populated environments. Semantic segmentation is largely used to discard dynamic associations before estimating camera motions but at the cost of discarding static features and is hard to scale up to unseen categories. In this paper, we leverage the mutual dependence between camera ego-motion and motion segmentation and show that both can be jointly refined in a single learning-based framework. In particular, we present DytanVO, the first supervised learning-based VO method that deals with dynamic environments. It takes two consecutive monocular frames in real-time and predicts camera ego-motion in an iterative fashion. Our method achieves an average improvement of 27.7% in ATE over state-of-the-art VO solutions in real-world dynamic environments, and even performs competitively among dynamic visual SLAM systems which optimize the trajectory on the backend. Experiments on plentiful unseen environments also demonstrate our method's generalizability.	翻訳日:2023-05-02 19:40:12 公開日:2023-04-29
# ノイズのある状況における高次元単体の学習のためのサンプル複雑性の境界 Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes ( http://arxiv.org/abs/2209.05953v2 ) ライセンス: Link先を確認	Amir Hossein Saberi, Amir Najafi, Seyed Abolfazl Motahari and Babak H. Khalaj	(参考訳) 本稿では,ノイズのあるサンプルから単体(simplex)を学習するためのサンプル複雑性の境界を求める。大きさ$n$のデータセットが、$\mathbb{R}^K$内の未知の単体上の一様分布から引き出されたi.i.d.サンプルからなり、サンプルは任意の大きさの多変量加法的ガウス雑音によって破損していると仮定する。我々は、高い確率で、真の単体からの$\ell_2$距離が高々$\varepsilon$である単体を(任意の$\varepsilon>0$に対して)出力するアルゴリズムの存在を証明する。また、この境界を達成するには、$n\ge\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$個のサンプルがあれば十分であることを理論的に示す。ここで$\mathrm{SNR}$は信号対雑音比を表す。この結果は重要な未解決問題を解決し、$\mathrm{SNR}\ge\Omega\left(K^{1/2}\right)$である限り、ノイズのある場合のサンプル複雑性がノイズのない場合と同じオーダーになることを示す。我々の証明は、 \citep{ashtiani2018nearly} におけるいわゆるサンプル圧縮技術、高次元幾何学の数学的ツール、フーリエ解析の組み合わせである。特に,加法的ガウス雑音からより一般的な分布族を復元するための一般的なフーリエに基づく手法を提案しており、これは他の様々な関連問題にさらに適用できる。 In this paper, we find a sample complexity bound for learning a simplex from noisy samples. Assume a dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown simplex in $\mathbb{R}^K$, where samples are assumed to be corrupted by a multi-variate additive Gaussian noise of an arbitrary magnitude. We prove the existence of an algorithm that with high probability outputs a simplex having a $\ell_2$ distance of at most $\varepsilon$ from the true simplex (for any $\varepsilon>0$). Also, we theoretically show that in order to achieve this bound, it is sufficient to have $n\ge\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio. This result solves an important open problem and shows as long as $\mathrm{SNR}\ge\Omega\left(K^{1/2}\right)$, the sample complexity of the noisy regime has the same order to that of the noiseless case. Our proofs are a combination of the so-called sample compression technique in \citep{ashtiani2018nearly}, mathematical tools from high-dimensional geometry, and Fourier analysis. 
In particular, we have proposed a general Fourier-based technique for recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems.	翻訳日:2023-05-02 19:39:04 公開日:2023-04-29
# 運用経済改善に向けた二段階的MIPベース予測最適化フレームワーク Towards Improving Operation Economics: A Bilevel MIP-Based Closed-Loop Predict-and-Optimize Framework for Prescribing Unit Commitment ( http://arxiv.org/abs/2208.13065v2 ) ライセンス: Link先を確認	Xianbang Chen, Yikui Liu, Lei Wu	(参考訳) システムオペレータは、一般に、再生可能エネルギー源(RES)の可用性とシステム予備要件を最初に予測し、予測により、ユニットコミットメント(UC)などの最適化モデルを解き、それに応じて経済的な運用計画を決定する。しかし、そのようなオープンループプロセスは、その予測器が究極の演算コストではなく即時統計的予測誤差を改善するために、本質的に運用経済学を損なう可能性がある。そこで,本稿では,演算経済学を改善するための規範的な uc を提供するクローズドループ予測最適化フレームワークを提案する。まず, 2レベル混合整数プログラミングモデルを用いて, 最適システム動作に適したコスト指向予測器を訓練する: 上位レベルは, 誘導運転コストに基づいて RES および予備予測器を訓練する; 下位レベルは, 与えられた予測によりシステム動作プロセスを模倣し, 誘導運転コストを上位レベルに戻す。さらに、トレーニングされた予測器の組込み可能性により、規範的なUCモデルが与えられ、RES保存予測とUC決定を同時に行なえる。最後に、実世界のデータを用いた数値ケーススタディでは、決定論的、堅牢で確率的なUCモデルよりも、規範的UCの経済的および実践的な利点が示される。 Generally, system operators conduct the economic operation of power systems in an open-loop predict-then-optimize process: the renewable energy source (RES) availability and system reserve requirements are first predicted; given the predictions, system operators solve optimization models such as unit commitment (UC) to determine the economical operation plans accordingly. However, such an open-loop process could essentially compromise the operation economics because its predictors myopically seek to improve the immediate statistical prediction errors instead of the ultimate operation cost. To this end, this paper presents a closed-loop predict-and-optimize framework, offering a prescriptive UC to improve the operation economics. First, a bilevel mixed-integer programming model is leveraged to train cost-oriented predictors tailored for optimal system operations: the upper level trains the RES and reserve predictors based on their induced operation cost; the lower level, with given predictions, mimics the system operation process and feeds the induced operation cost back to the upper level. 
Furthermore, the embeddability of the trained predictors grants a prescriptive UC model, which simultaneously provides RES-reserve predictions and UC decisions with enhanced operation economics. Finally, numerical case studies using real-world data illustrate the potential economic and practical advantages of prescriptive UC over deterministic, robust, and stochastic UC models.	翻訳日:2023-05-02 19:37:34 公開日:2023-04-29
# 統一バングラ多クラス感情コーパスのトランスフォーマーによるテキスト分類 Transformer-based Text Classification on Unified Bangla Multi-class Emotion Corpus ( http://arxiv.org/abs/2210.06405v2 ) ライセンス: Link先を確認	Md Sakib Ullah Sourav, Huidong Wang	(参考訳) 様々なWeb 2.0サービスにおける人々の思考を研究することの重要性から、感情分類(EC)は重要な業務である。一方、既存の研究は主に英語に重点を置いており、低リソース言語にはほとんど取り組んでいない。感情分析、特に英語のecは近年多くの注目を集めているが、世界で最も広く話されている言語の1つであるバングラの文脈ではほとんど研究されていない。本研究では,バングラ語テキストから感情を識別し抽出する手法の完全セットを提案する。バングラ語からの6つのクラス(怒り,嫌悪感,恐怖,喜び,悲しみ,驚き)に対して,近年,特に高資源言語において顕著な結果を示すトランスフォーマーベースモデルを用いて感情分類を行う。本モデルの性能評価には,Unified Bangla Multi-class Emotion Corpus (UBMEC) が用いられている。 UBMECは、6-emotionクラスでBanglaコメントをラベル付けした2つのデータセットと、私たちが開発した新しい手動タグ付きBanglaコメントを組み合わせたものだ。この作業で使用したコーパスデータセットとコードは、公開されています。 Because of its importance in studying people's thoughts on various Web 2.0 services, emotion classification (EC) is an important undertaking. Existing research, on the other hand, is mostly focused on the English language, with little work on low-resource languages. Though sentiment analysis, particularly the EC in English, has received a lot of attention in recent years, little study has been done in the context of Bangla, one of the world's most widely spoken languages. We propose a complete set of approaches for identifying and extracting emotions from Bangla texts in this research. We provide a Bangla emotion classifier for six classes (anger, disgust, fear, joy, sadness, and surprise) from Bangla words, using transformer-based models which exhibit phenomenal results in recent days, especially for high resource languages. The "Unified Bangla Multi-class Emotion Corpus (UBMEC)" is used to assess the performance of our models. UBMEC was created by combining two previously released manually labeled datasets of Bangla comments on 6-emotion classes with fresh manually tagged Bangla comments created by us. The corpus dataset and code we used in this work is publicly available.	翻訳日:2023-05-02 19:30:12 公開日:2023-04-29
# 対称性平均化を伴う臨界イジングモデルの変分量子シミュレーション Variational quantum simulation of critical Ising model with symmetry averaging ( http://arxiv.org/abs/2210.15053v2 ) ライセンス: Link先を確認	Troy J. Sewell, Ning Bao, Stephen P. Jordan	(参考訳) 本稿では, ギャップレスシステムの基底状態に対する可変アンサッツとして, DMERA(Deep Multi-scale entanglement Renormalization)回路を用いることを検討した。正解可能な一次元臨界横場イジングモデルをテストベッドとして用いる。この場合、ansatzの数値的正確なシミュレーションは、効率的な古典アルゴリズムを利用してマッチゲート回路をシミュレートすることにより、数百キュービットに実行することができる。このシステムでは、DMERAは標準的なQAOAスタイルのアンサッツを強く上回り、DMERAを用いて近似した相関関数の体系的誤差の主な原因は、逆場イジングモデルの変換対称性とクラマース・ワニエ対称性の破れである。この誤差を対称性平均化によって最大4桁削減できるが、量子ビットや回路の深さに余計なコストがかかることはない。本手法は,他の対称性を持つ物理系のnisqシミュレーションに適用できることを示す。 Here, we investigate the use of deep multi-scale entanglement renormalization (DMERA) circuits as a variational ansatz for ground states of gapless systems. We use the exactly-solvable one-dimensional critical transverse-field Ising model as a testbed. Numerically exact simulation of the ansatz can in this case be carried out to hundreds of qubits by exploiting efficient classical algorithms for simulating matchgate circuits. We find that, for this system, DMERA strongly outperforms a standard QAOA-style ansatz, and that a major source of systematic error in correlation functions approximated using DMERA is the breaking of the translational and Kramers-Wannier symmetries of the transverse-field Ising model. We are able to reduce this error by up to four orders of magnitude by symmetry averaging, without incurring additional cost in qubits or circuit depth. We propose that this technique for mitigating systematic error could be applied to NISQ simulations of physical systems with other symmetries.	翻訳日:2023-05-02 19:21:02 公開日:2023-04-29
# 言語を使って見えないドメインに拡張する Using Language to Extend to Unseen Domains ( http://arxiv.org/abs/2210.09520v6 ) ライセンス: Link先を確認	Lisa Dunlap, Clara Mohri, Devin Guillory, Han Zhang, Trevor Darrell, Joseph E. Gonzalez, Aditi Raghunathan, Anja Rohrbach	(参考訳) ビジョンモデルがデプロイ時に遭遇する可能性のあるすべてのドメインのトレーニングデータを集めることは、費用がかかる。代わりに、訓練領域(例えば「鳥の写真」)と拡張したいがデータを持たない領域(例えば「鳥の絵」)がいかに堅牢性を向上させるかを考える。共同画像と言語埋め込み空間を備えたマルチモーダルモデルを用いて、LADSは、タスク関連情報を保存しながら、トレーニング領域から各未確認テスト領域への画像埋め込みの変換を学習する。未確認テストドメインからのイメージを一切使用せずに、トレーニングドメインと未確認テストドメインの両方を含む拡張ドメイン上で、LADSは、ドメイン適応とデータセットバイアスをターゲットとする4つのベンチマークのスイートに対して、標準的な微調整とアンサンブルアプローチより優れていることを示す。 It is expensive to collect training data for every possible domain that a vision model may encounter when deployed. We instead consider how simply verbalizing the training domain (e.g. "photos of birds") as well as domains we want to extend to but do not have data for (e.g. "paintings of birds") can improve robustness. Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain, while preserving task relevant information. Without using any images from the unseen test domain, we show that over the extended domain containing both training and unseen test domains, LADS outperforms standard fine-tuning and ensemble approaches over a suite of four benchmarks targeting domain adaptation and dataset bias.	翻訳日:2023-05-02 19:19:33 公開日:2023-04-29
# コンビネータ型機械学習のためのゲーム理論的混合エキスパート Game Theoretic Mixed Experts for Combinational Adversarial Machine Learning ( http://arxiv.org/abs/2211.14669v2 ) ライセンス: Link先を確認	Ethan Rathbun, Kaleel Mahmood, Sohaib Ahmad, Caiwen Ding, Marten van Dijk	(参考訳) 敵の機械学習の最近の進歩は、堅牢であると考えられる防御は、その弱点を狙うように特別にカスタマイズされた敵の攻撃の影響を受けやすいことを示している。これらの防衛には、BaRT(Barrage of Random Transforms)、FAT(Friendly Adversarial Training)、Trash is Treasure(TiT)、ViT(Vision Transformers)、Big Transfer(Big Transfer)モデル、SNN(Spike Neural Networks)で構成されるアンサンブルモデルが含まれる。まず,一方の防衛をカスタマイズした攻撃によって生じる敵の事例を,他方の防衛で誤分類されることが少なくないことを示す。この発見は2つの重要な疑問をもたらす。まず、ゲーム理論の枠組みにおいて、防御間の低転送性をどのように活用してロバスト性を向上させるのか。第2に、このフレームワーク内の敵は、どのようにして効果的なマルチモデル攻撃を開発できるのか? 本稿では,敵の攻撃と防御をアンサンブルするためのゲーム理論フレームワークを提案する。我々のフレームワークはGame Theoretic Mixed Experts (GaME)と呼ばれる。これは、検知器ベースと標準ディフェンダーの両方の混合ナッシュ戦略を見つけるために設計されており、構成的敵攻撃を用いる攻撃者に直面している。さらに,ランダム化変換による防御,マルチモデル投票方式,敵検出アーキテクチャを対象とした3つの攻撃アルゴリズムを提案する。これらの攻撃は、GaMEフレームワークによって生成された防御を強化し、予期せぬ攻撃に対する堅牢性を検証するのに役立つ。全体として、我々のフレームワークと分析は、構成的攻撃と防御の定式化に新たな洞察を与えることで、敵対的機械学習の分野を前進させます。 Recent advances in adversarial machine learning have shown that defenses considered to be robust are actually susceptible to adversarial attacks which are specifically customized to target their weaknesses. These defenses include Barrage of Random Transforms (BaRT), Friendly Adversarial Training (FAT), Trash is Treasure (TiT) and ensemble models made up of Vision Transformers (ViTs), Big Transfer models and Spiking Neural Networks (SNNs). We first conduct a transferability analysis, to demonstrate the adversarial examples generated by customized attacks on one defense, are not often misclassified by another defense. This finding leads to two important questions. First, how can the low transferability between defenses be utilized in a game theoretic framework to improve the robustness? Second, how can an adversary within this framework develop effective multi-model attacks? 
In this paper, we provide a game-theoretic framework for ensemble adversarial attacks and defenses. Our framework is called Game theoretic Mixed Experts (GaME). It is designed to find the Mixed-Nash strategy for both a detector-based defender and a standard defender when facing an attacker employing compositional adversarial attacks. We further propose three new attack algorithms, specifically designed to target defenses with randomized transformations, multi-model voting schemes, and adversarial detector architectures. These attacks serve both to strengthen defenses generated by the GaME framework and to verify their robustness against unforeseen attacks. Overall, our framework and analyses advance the field of adversarial machine learning by yielding new insights into compositional attack and defense formulations.	翻訳日:2023-05-02 19:13:05 公開日:2023-04-29
# 局所情報の流れをもつ量子理論 Quantum theories with local information flow ( http://arxiv.org/abs/2211.13325v2 ) ライセンス: Link先を確認	Eduarda Fonseca da Nova Cruz, David Möckli	(参考訳) ベル非局所性(bell non-locality)は、量子力学の特定の修正に適用される用語である。しかし、ベルの定理は、修正されていない量子力学自体が非局所的であり、局所実在論的な解釈は維持できないと宣伝するために常用的に用いられる。ベルの元々の不等式に基づいて、局所量子力学、超決定論、非局所崩壊量子力学、非局所隠れ変数理論の4つの可能なカテゴリを同定する。多くの局所的・決定論的記述は見過ごされている。これら3つのカテゴリについて、量子情報の局所的な流れが可能である解釈の例を示す。我々は,現在の実験的提案と改良された科学哲学が,解釈を対比し,両者を区別できるかどうかを評価する。 Bell non-locality is a term that applies to specific modifications of quantum mechanics. Yet, Bell's theorem is habitually used to advertise that unmodified quantum mechanics itself is non-local and that local realist interpretations are untenable. Based on Bell's original inequality, we identify four viable categories of quantum theories: local quantum mechanics, superdeterminism, non-local collapse quantum mechanics, and non-local hidden variable theories. Many local and deterministic descriptions appear to be overlooked. For three of those categories, we present an example of an interpretation where a local flow of quantum information is possible. We assess whether current experimental proposals and an improved philosophy of science can contrast interpretations and distinguish between them.	翻訳日:2023-05-02 19:12:37 公開日:2023-04-29
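For readers unfamiliar with the inequality the abstract above builds on, the following sketch evaluates Bell's original (1964) inequality, |E(a,b) - E(a,c)| <= 1 + E(b,c), for the textbook singlet-state correlation E(a,b) = -cos(a - b) and for a simple linear local hidden-variable correlation. The function names and the choice of measurement angles are illustrative assumptions and are not taken from the paper.

```python
import math


def singlet_correlation(angle_a, angle_b):
    """Quantum prediction E(a, b) = -cos(a - b) for a spin singlet pair."""
    return -math.cos(angle_a - angle_b)


def local_model_correlation(angle_a, angle_b):
    """Linear ("sawtooth") correlation of a simple local hidden-variable model."""
    delta = abs(angle_a - angle_b) % (2.0 * math.pi)
    if delta > math.pi:
        delta = 2.0 * math.pi - delta
    return -1.0 + 2.0 * delta / math.pi


def bell_1964_gap(a, b, c, corr):
    """Return |E(a,b) - E(a,c)| - (1 + E(b,c)).

    Bell's original inequality requires this gap to be <= 0 for local
    hidden-variable models with perfect anticorrelation; a positive
    value signals a violation.
    """
    return abs(corr(a, b) - corr(a, c)) - (1.0 + corr(b, c))


# Coplanar measurement directions at 0, 60 and 120 degrees.
a, b, c = 0.0, math.pi / 3.0, 2.0 * math.pi / 3.0
quantum_gap = bell_1964_gap(a, b, c, singlet_correlation)
local_gap = bell_1964_gap(a, b, c, local_model_correlation)
```

With these angles the quantum correlation gives a positive gap (a violation), while the local model only saturates the bound. The sketch says nothing about which of the four categories of theories listed in the abstract is correct.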
# グラフニューラルネットワークと構造化状態空間モデルを用いた多変量生体信号のモデリング Modeling Multivariate Biosignals With Graph Neural Networks and Structured State Space Models ( http://arxiv.org/abs/2211.11176v3 ) ライセンス: Link先を確認	Siyi Tang, Jared A. Dunnmon, Liangqiong Qu, Khaled K. Saab, Tina Baykaner, Christopher Lee-Messer, Daniel L. Rubin	(参考訳) 多変量バイオシグナールは、脳波、ポリソムノグラフィ、心電図など多くの医療領域で広く使われている。多変量生体信号の時空間依存性のモデル化は,(1)長距離時間依存性と(2)電極間の複雑な空間相関により困難である。これらの課題に対処するために,多変量バイオシグナーを時間依存グラフとして表現し,バイオシグナーの時空間依存性をモデル化して生体シグナー分類タスクの性能を向上させる汎用グラフニューラルネットワーク(GNN)アーキテクチャであるGraphS4merを提案する。具体的には,(1)生体信号の長期的時間依存性を捉えるために,最先端のディープシーケンスモデルである構造化状態空間アーキテクチャを利用し,(2)グラフ構造学習層をgraphs4merで提案し,データ内の動的に進化するグラフ構造を学習する。 Multivariate biosignals are prevalent in many medical domains, such as electroencephalography, polysomnography, and electrocardiography. Modeling spatiotemporal dependencies in multivariate biosignals is challenging due to (1) long-range temporal dependencies and (2) complex spatial correlations between the electrodes. To address these challenges, we propose representing multivariate biosignals as time-dependent graphs and introduce GraphS4mer, a general graph neural network (GNN) architecture that improves performance on biosignal classification tasks by modeling spatiotemporal dependencies in biosignals. 
Specifically, (1) we leverage the Structured State Space architecture, a state-of-the-art deep sequence model, to capture long-range temporal dependencies in biosignals and (2) we propose a graph structure learning layer in GraphS4mer to learn dynamically evolving graph structures in the data. We evaluate our proposed model on three distinct biosignal classification tasks and show that GraphS4mer consistently improves over existing models, including (1) seizure detection from electroencephalographic signals, outperforming a previous GNN with self-supervised pre-training by 3.1 points in AUROC; (2) sleep staging from polysomnographic signals, improving on existing sleep staging models by 4.1 points in macro-F1 score; and (3) 12-lead electrocardiogram classification, outperforming previous state-of-the-art models by 2.7 points in macro-F1 score.	翻訳日:2023-05-02 19:12:26 公開日:2023-04-29
# 文法的誤り訂正 : 美術の現状調査 Grammatical Error Correction: A Survey of the State of the Art ( http://arxiv.org/abs/2211.05166v4 ) ライセンス: Link先を確認	Christopher Bryant, Zheng Yuan, Muhammad Reza Qorib, Hannan Cao, Hwee Tou Ng, Ted Briscoe	(参考訳) 文法的誤り訂正(英: grammatical error correction、gec)は、テキスト中の誤りを自動的に検出し修正する作業である。このタスクには、前置詞の欠如や主語-動詞の一致の誤りなどの文法的誤りの修正だけでなく、スペルミスや単語選択エラーなどの正書法と意味的誤りも含んでいる。この分野は過去10年間に顕著な進歩を遂げており、一部にはルールベースの手法、統計分類器、統計機械翻訳、そして芸術の現在の支配的な状態を表すニューラルネットワーク翻訳システムの開発を推進した5つの共有タスクが動機となっている。本稿では,この分野を一つの記事にまとめ,まず,課題の言語的課題について概説し,研究者が利用可能な最も一般的なデータセット(英語と他言語)を紹介し,特に人工的エラー生成に焦点を当てた様々な手法とテクニックを要約する。次に,評価に対する様々なアプローチについて述べるとともに,特に主観的人間の判断に関して,メートル法信頼性に関する懸念について述べるとともに,最近の進歩と今後の課題への提言の概要をまとめる。この調査が、この分野に新しい研究者や、最近の進歩を評価され続けたい研究者にとって、包括的なリソースになることを期待しています。 Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. 
We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.	翻訳日:2023-05-02 19:12:02 公開日:2023-04-29
# 相関型不確かさによるドメインの一般化 Domain Generalization with Correlated Style Uncertainty ( http://arxiv.org/abs/2212.09950v2 ) ライセンス: Link先を確認	Zheyuan Zhang, Bin Wang, Debesh Jha, Ugur Demir, Ulas Bagci	(参考訳) ドメイン一般化(dg)アプローチは、より堅牢なディープラーニングモデルにつながるドメイン不変機能を抽出することを目的としている。この点において、スタイル拡張は、合成新規ドメインに対する情報的スタイル特性を含むインスタンス固有の特徴統計を利用する強力なDG手法である。しかしながら、スタイル拡張に関する先行研究は、異なる特徴チャネル間の相互依存を無視したり、スタイル拡張を線形補間のみに制限している。本研究では,スタイル統計空間における線形補間の限界を乗り越え,バイタル相関情報を同時に保持する,相関型不確実性(csu)という最先端拡張手法を提案する。本手法の有効性は,pacs,office-home,camlyon17データセット,duke-market1501インスタンス検索タスクなど,多種多様なクロスドメインコンピュータビジョンおよび医用画像分類タスクの広範な実験によって確立される。その結果,既存の最先端技術に比べて著しく改善率が向上した。ソースコードは一般公開されている。 Domain generalization (DG) approaches intend to extract domain-invariant features that can lead to a more robust deep learning model. In this regard, style augmentation is a strong DG method that takes advantage of instance-specific feature statistics, which contain informative style characteristics, to synthesize novel domains. However, prior works on style augmentation have either disregarded the interdependence amongst distinct feature channels or constrained style augmentation solely to linear interpolation. In this work, we introduce a cutting-edge augmentation approach named Correlated Style Uncertainty (CSU), which surpasses the limitations of linear interpolation in style statistic space and simultaneously preserves vital correlation information. Our method's efficacy is established through extensive experimentation on diverse cross-domain computer vision and medical imaging classification tasks, namely the PACS, Office-Home, and Camelyon17 datasets, as well as the Duke-Market1501 instance retrieval task. The results showcase a remarkable improvement margin over existing state-of-the-art techniques. The source code is available for public use.	翻訳日:2023-05-02 19:02:32 公開日:2023-04-29
# 3量子状態における絡み合い、コヒーレンス、ステアリング、ベル非局所不等式違反の相補的関係 Complementary relations of entanglement, coherence, steering and Bell nonlocality inequality violation in three-qubit states ( http://arxiv.org/abs/2212.09326v2 ) ライセンス: Link先を確認	Dong-Dong Dong, Xue-Ke Song, Xiao-Gang Fan, Liu Ye, and Dong Wang	(参考訳) 我々は,任意の3ビット状態に対する絡み合い,コヒーレンス,ステアリング不等式違反,ベル非局所性の相補関係を提唱した。一つのパラメータを持つ真に絡み合った3量子状態の2つの族が存在し、それぞれ一定量の負性に対して最大コヒーレンスおよびステアリング不等式違反を示す。ネガティリティは常に3量子ビット混合状態のネガティリティよりも小さいかまたは等しいが、ネガティリティは3量子ビット純状態の2成分共役の幾何学的平均とちょうど等しいことが判明した。さらに, 3部交絡状態に対する負性度と一階コヒーレンスとの相補関係を確立する。さらに, 負性度と最大操舵不等式違反の関係について検討した。さらに、任意の3ビット状態に対する負性度とベル不等式最大違反の相補関係を得る。この結果は、絡み合い、コヒーレンス、操舵不等式違反、ベル非局所性との間の基本的な関係の信頼できる証拠を提供する。 We put forward complementary relations of entanglement, coherence, steering inequality violation, and Bell nonlocality for arbitrary three-qubit states. We show that two families of genuinely entangled three-qubit pure states with a single parameter exist, and that they exhibit maximum coherence and maximum steering inequality violation, respectively, for a fixed amount of negativity. It is found that the negativity is exactly equal to the geometric mean of the bipartite concurrences for three-qubit pure states, whereas for three-qubit mixed states the negativity is always less than or equal to that geometric mean. Moreover, the complementary relation between negativity and first-order coherence for tripartite entangled states is established. Furthermore, we investigate the close relation between the negativity and the maximum steering inequality violation. In addition, the complementary relation between negativity and the maximum Bell-inequality violation for arbitrary three-qubit states is obtained. The results provide reliable evidence of fundamental connections among entanglement, coherence, steering inequality violation, and Bell nonlocality.	翻訳日:2023-05-02 19:01:56 公開日:2023-04-29
# 氷河氷モデルのベイズ推定への応用による多段階スタイン変分勾配降下のさらなる解析 Further analysis of multilevel Stein variational gradient descent with an application to the Bayesian inference of glacier ice models ( http://arxiv.org/abs/2212.03366v2 ) ライセンス: Link先を確認	Terrence Alsup and Tucker Hartland and Benjamin Peherstorfer and Noemi Petra	(参考訳) 多レベルスタイン変分勾配勾配は、様々なコストと忠実さで代理対象分布の階層性を活用し、推論を計算的に高速化する粒子ベースの変分勾配勾配法である。この作品の貢献は2つある。まず, 単レベルスタイン変分勾配勾配の指数収束速度が反復変動パラメータに依存する場合においても, 従来のコスト複雑性解析の拡張を示す。第2に,アロラ氷河の離散基底すべり係数場を推定する大規模ベイズ逆問題に対して,多値スタイン変分勾配勾配を適用した。数値実験により、マルチレベルバージョンはシングルレベルバージョンに比べて桁違いのスピードアップを達成することが示された。 Multilevel Stein variational gradient descent is a method for particle-based variational inference that leverages hierarchies of surrogate target distributions with varying costs and fidelity to computationally speed up inference. The contribution of this work is twofold. First, an extension of a previous cost complexity analysis is presented that applies even when the exponential convergence rate of single-level Stein variational gradient descent depends on iteration-varying parameters. Second, multilevel Stein variational gradient descent is applied to a large-scale Bayesian inverse problem of inferring discretized basal sliding coefficient fields of the Arolla glacier ice. The numerical experiments demonstrate that the multilevel version achieves orders of magnitude speedups compared to its single-level version.	翻訳日:2023-05-02 19:00:50 公開日:2023-04-29
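The multilevel idea described in the abstract above can be sketched in one dimension: particles are first moved with Stein variational gradient descent (SVGD) towards cheap surrogate targets and are then handed on to the most accurate target. The Gaussian surrogates, the fixed RBF bandwidth, the step size, and the iteration counts below are all assumptions chosen for illustration; the glacier application in the paper involves a far larger inverse problem and is not represented here.

```python
import math


def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One SVGD update for 1-D particles with an RBF kernel."""
    n = len(particles)
    updated = []
    for x_i in particles:
        phi = 0.0
        for x_j in particles:
            k = math.exp(-((x_j - x_i) ** 2) / (2.0 * bandwidth ** 2))
            # Attraction: kernel-weighted gradient of the log target density.
            phi += k * grad_log_p(x_j)
            # Repulsion: derivative of k(x_j, x_i) with respect to x_j.
            phi += -(x_j - x_i) / bandwidth ** 2 * k
        updated.append(x_i + step_size * phi / n)
    return updated


def multilevel_svgd(particles, levels):
    """Run SVGD through a list of (grad_log_p, n_steps) levels, coarse to fine."""
    for grad_log_p, n_steps in levels:
        for _ in range(n_steps):
            particles = svgd_step(particles, grad_log_p)
    return particles


def gaussian_score(mean):
    """Gradient of the log density of a unit-variance Gaussian."""
    return lambda x: -(x - mean)


# Hypothetical hierarchy: surrogate means 0.5 and 0.9 approximate the true mean 1.0.
levels = [
    (gaussian_score(0.5), 400),  # cheapest surrogate, most iterations
    (gaussian_score(0.9), 200),
    (gaussian_score(1.0), 100),  # highest fidelity, fewest iterations
]
initial_particles = [-3.0 + 0.3 * i for i in range(20)]
final_particles = multilevel_svgd(initial_particles, levels)
final_mean = sum(final_particles) / len(final_particles)
```

Most of the transport happens on the cheap levels, so only a few iterations are needed on the expensive one. That is the source of the speedup the abstract reports, although the actual savings depend on the cost ratio between levels.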
# 多量子ビット状態の特殊コアテンソルと3線の並行性 Special core tensors of multi-qubit states and the concurrency of three lines ( http://arxiv.org/abs/2301.05953v2 ) ライセンス: Link先を確認	Pak Shen Choong, Hishamuddin Zainuddin, Kar Tim Chan, Sharifah Kartini Said Husain	(参考訳) マルチパーティイト状態の分類は、局所ユニタリ(LU)または確率的局所演算および古典的通信(SLOCC)の作用の下で、運用上有用で有限な絡み合いクラスを得ることを目的としている。本研究では,高階特異値分解 (hosvd) と 3 行の並行性を用いて,これらのクラスを計算的に簡易に求める手法を提案する。 HOSVDは同時に多粒子状態の1体還元密度行列(RDM)を対角化するため、多粒子状態のコアテンソルはそのような対角化1体RDMの純粋状態表現である。 3 と 4 のキュービットの特別なコアテンソルを同定し、これはデフォルトでも真に絡み合っている。特別な核テンソルは、最初の$n$モード特異値である$\sigma_1^{(i)2}$に基づいて、状態の族に分類される。現在の提案はマルチキュービットシステムに限定されているが、大規模なマルチキュービットシステムとよく合致し、有限個の状態族を生成する。 Classification of multipartite states aims to obtain a set of operationally useful and finite entanglement classes under the action of either local unitary (LU) or stochastic local operation and classical communication (SLOCC). In this work, we propose a computationally simple approach to find these classes by using higher order singular value decomposition (HOSVD) and the concurrency of three lines. Since HOSVD simultaneously diagonalizes the one-body reduced density matrices (RDM) of multipartite states, the core tensor of multipartite states is the pure-state representation of such simultaneously diagonalized one-body RDM. We identified the special core tensors of three and four qubits, which are also genuinely entangled by default. The special core tensors are further categorized into families of states based on their first $n$-mode singular values, $\sigma_1^{(i)2}$. The current proposal is limited to multi-qubit system, but it scales well with large multi-qubit systems and produces a finite number of families of states.	翻訳日:2023-05-02 18:54:12 公開日:2023-04-29
# Hungry Hungry Hippos: 状態空間モデルによる言語モデリングを目指して Hungry Hungry Hippos: Towards Language Modeling with State Space Models ( http://arxiv.org/abs/2212.14052v3 ) ライセンス: Link先を確認	Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher R\'e	(参考訳) 状態空間モデル (SSM) は、いくつかのモダリティにおいて最先端のシーケンスモデリング性能を示しているが、言語モデリングではあまり注目されていない。さらに、二乗ではなく列長でほぼ線形にスケーリングしても、ハードウェア使用率の低さから、ssmはトランスフォーマーよりも遅い。本稿では,言語モデリングにおけるssmと注意の間の表現性ギャップの理解と,ssmと注意の間のハードウェア障壁の低減について述べる。まず,SSMと注意のギャップを理解するために,合成言語モデリングタスクを用いる。既存のssmには2つの機能があります。シーケンス内の以前のトークンのリコールと、シーケンス全体のトークンの比較です。言語モデリングへの影響を理解するため,これらの機能に特化して設計された新しいSSM層H3を提案する。 H3は合成言語に注意を向け、OpenWebText上のTransformersの0.4 PPL以内である。さらに、2つの注意層を保持する125mパラメータh3アテンションハイブリッドモデルは、openwebtextのトランスフォーマーを1.0 pplで驚くほど上回っている。次に,最新のハードウェア上でのssmトレーニングの効率を向上させるため,flashconvを提案する。 FlashConvは8Kまでのシーケンスの効率を改善するために融合ブロックFFTアルゴリズムを使用し、SSMの繰り返し特性を利用して長いシーケンスにスケールする新しいステートパスアルゴリズムを導入した。 FlashConvは、長距離アリーナベンチマークで2$\times$スピードアップし、トランスフォーマーよりも2.4$\times$のテキストを生成することができる。 flashconvを使用すると、最大2.7bのパラメータを持つハイブリッドh3-attention言語モデルにスケールし、最初の結果が期待できる。 State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. 
H3 matches attention on the synthetic languages and comes within 0.4 PPL of Transformers on OpenWebText. Furthermore, a hybrid 125M-parameter H3-attention model that retains two attention layers surprisingly outperforms Transformers on OpenWebText by 1.0 PPL. Next, to improve the efficiency of training SSMs on modern hardware, we propose FlashConv. FlashConv uses a fused block FFT algorithm to improve efficiency on sequences up to 8K, and introduces a novel state passing algorithm that exploits the recurrent properties of SSMs to scale to longer sequences. FlashConv yields 2$\times$ speedup on the long-range arena benchmark and allows hybrid language models to generate text 2.4$\times$ faster than Transformers. Using FlashConv, we scale hybrid H3-attention language models up to 2.7B parameters on the Pile and find promising initial results, achieving lower perplexity than Transformers and outperforming Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark.	翻訳日:2023-05-02 18:52:11 公開日:2023-04-29
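As background for the abstract above, the following sketch shows why a linear state space model can be evaluated either as a recurrence or as a convolution: both views produce the same outputs. The diagonal parameterization and all parameter values are assumptions for illustration only. The convolution here is a direct quadratic-time sum rather than the FFT-based algorithm the abstract attributes to FlashConv, and the H3 layer itself is not reproduced.

```python
def ssm_recurrent(a, b, c, inputs):
    """Diagonal linear SSM run step by step.

    State update: x_k = a * x_{k-1} + b * u_k (elementwise), output y_k = c . x_k.
    """
    state = [0.0] * len(a)
    outputs = []
    for u in inputs:
        state = [a_n * x_n + b_n * u for a_n, b_n, x_n in zip(a, b, state)]
        outputs.append(sum(c_n * x_n for c_n, x_n in zip(c, state)))
    return outputs


def ssm_kernel(a, b, c, length):
    """Convolution kernel K_j = sum_n c_n * a_n**j * b_n of the same SSM."""
    return [
        sum(c_n * (a_n ** j) * b_n for a_n, b_n, c_n in zip(a, b, c))
        for j in range(length)
    ]


def ssm_convolutional(a, b, c, inputs):
    """Same outputs as a causal convolution y_k = sum_j K_j * u_{k-j}."""
    kernel = ssm_kernel(a, b, c, len(inputs))
    return [
        sum(kernel[j] * inputs[k - j] for j in range(k + 1))
        for k in range(len(inputs))
    ]


# Hypothetical parameters and input sequence.
a = [0.9, 0.5, -0.3]
b = [1.0, 0.5, 2.0]
c = [0.2, -1.0, 0.7]
inputs = [1.0, 0.0, -2.0, 3.0, 0.5]
y_recurrent = ssm_recurrent(a, b, c, inputs)
y_convolutional = ssm_convolutional(a, b, c, inputs)
```

The recurrent form needs only the current state at each step, which suits generation, while the convolutional form exposes the whole sequence at once, which suits parallel training.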
# オープン語彙オブジェクト検出のための検出とセグメントの学習 Learning to Detect and Segment for Open Vocabulary Object Detection ( http://arxiv.org/abs/2212.12130v5 ) ライセンス: Link先を確認	Tao Wang and Nan Li	(参考訳) オープンボキャブラリのオブジェクト検出は,最近開発された視覚言語事前学習モデルによって,意味カテゴリーのみを持つ新規なオブジェクトの認識を支援することで,大きく進歩している。先行研究は、主にオブジェクト提案分類への知識伝達に焦点をあて、クラスに依存しないボックスとマスク予測を採用する。本研究では,オープン語彙設定のためのボックス回帰とマスクセグメンテーションをより一般化する,原理的動的ネットワーク設計であるCondHeadを提案する。中心となる考え方は、セマンティック埋め込みに基づいてネットワークヘッドを条件付きパラメータ化することで、新しいカテゴリをよりよく検出するために、クラス固有の知識でモデルが導かれることである。特に、condheadは、動的に集約されたヘッドと動的に生成されたヘッドの2つのネットワークヘッドからなる。前者は条件付き集約された静的なヘッドでインスタンス化され、これらのヘッドはエキスパートとして最適化され、洗練された予測を学ぶことが期待されている。後者は動的に生成されたパラメータでインスタンス化し、一般的なクラス固有の情報をエンコードする。このような条件付き設計により、検出モデルは意味埋め込みによって橋渡しされ、強い一般化可能なクラスワイズボックスとマスク予測を提供する。提案手法は,最先端のオープンボキャブラリオブジェクト検出手法に非常に小さなオーバーヘッドで大幅な改善をもたらす。例えば,新しいカテゴリのAPを3.0で検出し,計算量はわずか1.1%に留まる。 Open vocabulary object detection has been greatly advanced by the recent development of vision-language pretrained model, which helps recognize novel objects with only semantic categories. The prior works mainly focus on knowledge transferring to the object proposal classification and employ class-agnostic box and mask prediction. In this work, we propose CondHead, a principled dynamic network design to better generalize the box regression and mask segmentation for open vocabulary setting. The core idea is to conditionally parameterize the network heads on semantic embedding and thus the model is guided with class-specific knowledge to better detect novel categories. Specifically, CondHead is composed of two streams of network heads, the dynamically aggregated head and the dynamically generated head. The former is instantiated with a set of static heads that are conditionally aggregated, these heads are optimized as experts and are expected to learn sophisticated prediction. The latter is instantiated with dynamically generated parameters and encodes general class-specific information. 
With such a conditional design, the detection model is bridged by the semantic embedding to offer strongly generalizable class-wise box and mask prediction. Our method brings significant improvement to the state-of-the-art open vocabulary object detection methods with very minor overhead, e.g., it surpasses a RegionClip model by 3.0 detection AP on novel categories, with only 1.1% more computation.	翻訳日:2023-05-02 18:51:19 公開日:2023-04-29
# 深層学習型アクセシブルパーキング管理システムShine SHINE: Deep Learning-Based Accessible Parking Management System ( http://arxiv.org/abs/2302.00837v2 ) ライセンス: Link先を確認	Dhiraj Neupane, Aashish Bhattarai, Sunil Aryal, Mohamed Reda Bouadjenek, Uk-Min Seok, and Jongwon Seok	(参考訳) 科学技術の進歩により、現在進行中の都市部の拡大は、韓国を含む世界中の民間所有車両の数が大幅に増加した。しかし、この段階的な車両数の増加は必然的に、障害者専用駐車スペース(以下「アクセス可能な駐車スペース」と呼ぶ)の乱用など、駐車関連の問題を引き起こしている。従来のlprシステムは、監視カメラのフレームレートが高いこと、自然と人工のノイズの存在、これらのシステムによる検出と認識を妨げる照明や気象条件の変化などにより、このような問題をリアルタイムに対処できないことが証明されている。パーキング4.0の概念の高まりにより、多くのセンサー、IoTおよびディープラーニングベースのアプローチが自動LPRとパーキング管理システムに適用された。それにもかかわらず、この研究は韓国でアクセス可能な駐車スペースを管理するための堅牢で効率的なモデルの必要性を示している。これに対処するため,我々は,深層学習に基づく物体検出アルゴリズムを用いて車両,ナンバープレート,障害バッジ(以下,カード,バッジ,アクセスバッジとして参照)を検出し,中央サーバと協調してアクセス可能な駐車スペースの使用権を検証する,shineという新しいシステムを提案する。本モデルは,平均92.16%の精度を実現し,アクセス可能な駐車スペース乱用の問題に対処し,都市環境における効率的な駐車管理に大いに寄与する。 The ongoing expansion of urban areas facilitated by advancements in science and technology has resulted in a considerable increase in the number of privately owned vehicles worldwide, including in South Korea. However, this gradual increment in the number of vehicles has inevitably led to parking-related issues, including the abuse of disabled parking spaces (hereafter referred to as accessible parking spaces) designated for individuals with disabilities. Traditional license plate recognition (LPR) systems have proven inefficient in addressing such a problem in real-time due to the high frame rate of surveillance cameras, the presence of natural and artificial noise, and variations in lighting and weather conditions that impede detection and recognition by these systems. With the growing concept of parking 4.0, many sensors, IoT and deep learning-based approaches have been applied to automatic LPR and parking management systems. Nonetheless, the studies show a need for a robust and efficient model for managing accessible parking spaces in South Korea. 
To address this, we propose a novel system called SHINE, which uses a deep learning-based object detection algorithm to detect vehicles, license plates, and disability badges (referred to as cards, badges, or access badges hereafter) and verifies the driver's right to use accessible parking spaces by coordinating with the central server. Our model, which achieves a mean average precision of 92.16%, is expected to address the issue of accessible parking space abuse and to contribute significantly to efficient and effective parking management in urban environments.	翻訳日:2023-05-02 18:44:21 公開日:2023-04-29
# リプシッツ境界深部ネットワークの直接パラメータ化 Direct Parameterization of Lipschitz-Bounded Deep Networks ( http://arxiv.org/abs/2301.11526v2 ) ライセンス: Link先を確認	Ruigang Wang, Ian R. Manchester	(参考訳) 本稿では、リプシッツ境界が保証される深層ニューラルネットワーク(完全接続と畳み込みの両方)の新しいパラメータ化、すなわち摂動に対する感度の制限を導入する。リプシッツ保証は半定値プログラム(SDP)による認証に基づく最も厳密な既知の境界と等価であり、大きなモデルにスケールしない。 sdp のアプローチとは対照的に ``direct'' パラメータ化、すなわち$\mathbb r^n$ からリプシッツ境界ネットワークの重み集合への滑らかな写像を提供する。これにより、計算集約的なプロジェクションや障壁項を使わずに、標準的な勾配法によるトレーニングが可能になる。新しいパラメータ化は、新しい層タイプ( \textit{sandwich layer} )や、近隣層間のパラメータ共有を伴う標準フィードフォワードネットワークの新しいパラメータ化のいずれかと考えることができる。最後に、画像分類に関する総合的な実験により、サンドイッチ層は経験的および証明された堅牢な精度において、以前のアプローチよりも優れていることが示された。 This paper introduces a new parameterization of deep neural networks (both fully-connected and convolutional) with guaranteed Lipschitz bounds, i.e. limited sensitivity to perturbations. The Lipschitz guarantees are equivalent to the tightest-known bounds based on certification via a semidefinite program (SDP), which does not scale to large models. In contrast to the SDP approach, we provide a ``direct'' parameterization, i.e. a smooth mapping from $\mathbb R^N$ onto the set of weights of Lipschitz-bounded networks. This enables training via standard gradient methods, without any computationally intensive projections or barrier terms. The new parameterization can equivalently be thought of as either a new layer type (the \textit{sandwich layer}), or a novel parameterization of standard feedforward networks with parameter sharing between neighbouring layers. Finally, the comprehensive set of experiments on image classification shows that sandwich layers outperform previous approaches on both empirical and certified robust accuracy.	翻訳日:2023-05-02 18:43:42 公開日:2023-04-29
# ユーザビリティギャップの橋渡し--隠れマルコフモデルのスペクトル学習のための理論的および方法論的進歩 Bridging the Usability Gap: Theoretical and Methodological Advances for Spectral Learning of Hidden Markov Models ( http://arxiv.org/abs/2302.07437v2 ) ライセンス: Link先を確認	Xiaoyuan Ma, Jordan Rodu	(参考訳) Baum-Welch (B-W) アルゴリズムは隠れマルコフモデル (HMM) を推論する最も広く受け入れられている手法である。しかし、ローカルの最適化では立ち往生する傾向があり、多くのリアルタイムアプリケーションでは遅すぎる可能性がある。モーメント法(MOM)に基づくHMM(SHMM)のスペクトル学習は,これらの障害を克服するために文献で提案されている。 SHMMに対する漸近的理論は期待されているが, SHMMの長期性能は未確認誤差の伝播により劣化する可能性がある。本稿では, SHMMが推定した推定値の近似誤差の漸近分布について, 2) 誤り伝播の問題を緩和するプロジェクテッドSHMM (PSHMM) と呼ばれる新しいアルゴリズムを提案し, (3) 潜在的な非定常性に対応するSHMMとPSHMMの両方のオンライン学習用変種を開発する。 SHMMの性能をPSHMMと比較し、実世界のアプリケーションからのデータとシミュレーションデータの両方でB-Wアルゴリズムを用いて推定し、PSHMMがSHMMの計算上の優位性を保持するだけでなく、より堅牢な推定と予測を提供することを示した。 The Baum-Welch (B-W) algorithm is the most widely accepted method for inferring hidden Markov models (HMM). However, it is prone to getting stuck in local optima, and can be too slow for many real-time applications. Spectral learning of HMMs (SHMM), based on the method of moments (MOM) has been proposed in the literature to overcome these obstacles. Despite its promises, asymptotic theory for SHMM has been elusive, and the long-run performance of SHMM can degrade due to unchecked propagation of error. In this paper, we (1) provide an asymptotic distribution for the approximate error of the likelihood estimated by SHMM, (2) propose a novel algorithm called projected SHMM (PSHMM) that mitigates the problem of error propagation, and (3) develop online learning variants of both SHMM and PSHMM that accommodate potential nonstationarity. We compare the performance of SHMM with PSHMM and estimation through the B-W algorithm on both simulated data and data from real world applications, and find that PSHMM not only retains the computational advantages of SHMM, but also provides more robust estimation and forecasting.	翻訳日:2023-05-02 18:34:34 公開日:2023-04-29
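To make the likelihood computation discussed in the abstract above concrete, the following sketch evaluates the probability of an observation sequence with observable operators A_x = T diag(O[x, :]), the representation on which spectral HMM learning is commonly based. Here the operators are built from known transition and emission matrices, which is an assumption for illustration: a spectral method would instead estimate equivalent operators from low-order moments of the data. All matrices and names below are hypothetical.

```python
from itertools import product


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


def observable_operator(transition, emission, symbol):
    """Build A_x = T diag(O[x, :]).

    transition[i][j] = P(next state = i | current state = j)
    emission[x][j]   = P(observation = x | state = j)
    """
    return [
        [t_ij * emission[symbol][j] for j, t_ij in enumerate(row)]
        for row in transition
    ]


def sequence_probability(transition, emission, initial, observations):
    """P(x_1, ..., x_t) = 1^T A_{x_t} ... A_{x_1} pi."""
    vector = list(initial)
    for symbol in observations:
        vector = matvec(observable_operator(transition, emission, symbol), vector)
    return sum(vector)


# Hypothetical two-state, two-symbol HMM (columns of both matrices sum to 1).
transition = [[0.7, 0.4], [0.3, 0.6]]
emission = [[0.9, 0.2], [0.1, 0.8]]
initial = [0.6, 0.4]

p_first_symbol = sequence_probability(transition, emission, initial, [0])
total_probability = sum(
    sequence_probability(transition, emission, initial, list(seq))
    for seq in product(range(2), repeat=3)
)
```

Because each step multiplies by an estimated operator, small estimation errors can compound over long sequences. This is the kind of error propagation that, according to the abstract, the projected variant (PSHMM) is meant to mitigate.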
# 混合量子古典力学におけるラグランジュ軌道と閉包モデル Lagrangian trajectories and closure models in mixed quantum-classical dynamics ( http://arxiv.org/abs/2303.01975v2 ) ライセンス: Link先を確認	Cesare Tronci, François Gay-Balmaz	(参考訳) 完全量子アプローチの計算課題を克服するために、混合量子古典モデルがいくつかの文脈で提案されている。しかし、平均場近似を超えた現在のモデルは、通常長期にわたる一貫性の問題に悩まされ、場合によってはハイゼンベルクの不確実性原理を無効にする。ここでは量子古典力学の完全ハミルトン理論を提示し、量子密度と古典密度の正則性を超えた一連の一貫性特性を最初に保証したように見える。ラグランジアン位相空間パスに基づいて、モデルはカシミール汎函数の無限類と同様に量子古典的なポアンカーイ積分不変量を持つ。また,エーレンフェスト模型を化学物理学から拡張する軌道閉包スキームを提案する。 Mixed quantum-classical models have been proposed in several contexts to overcome the computational challenges of fully quantum approaches. However, current models beyond mean-field approximations typically suffer from long-standing consistency issues, and, in some cases, invalidate Heisenberg's uncertainty principle. Here, we present a fully Hamiltonian theory of quantum-classical dynamics that appears to be the first to ensure a series of consistency properties, beyond positivity of quantum and classical densities. Based on Lagrangian phase-space paths, the model possesses a quantum-classical Poincaré integral invariant as well as infinite classes of Casimir functionals. We also present a trajectory closure scheme that extends the Ehrenfest model from chemical physics.	翻訳日:2023-05-02 18:26:18 公開日:2023-04-29
# テスト時間適応のための特徴調整と均一性 Feature Alignment and Uniformity for Test Time Adaptation ( http://arxiv.org/abs/2303.10902v2 ) ライセンス: Link先を確認	Shuai Wang, Daoan Zhang, Zipei Yan, Jianguo Zhang, Rui Li	(参考訳) テスト時間適応(TTA)は、分散テストドメインサンプルの受信時にディープニューラルネットワークを適用することを目的としている。この設定では、モデルはオンラインのラベルのないテストサンプルとトレーニングドメインで事前トレーニングされたモデルのみにアクセスできる。まず、ソースドメインとターゲットドメイン間のドメインギャップにより、TTAを機能リビジョン問題として扱う。その後、2つの測定アライメントと均一性に従い,テスト時間特徴の修正について検討した。テスト時間特徴の均一性について,本研究では,現在のバッチと前回のバッチの表現間の均一性の一貫性を保証するための,テスト時間自己蒸留戦略を提案する。テスト時間の特徴的アライメントを実現するため, 周辺サンプル間の表現の整合化を図った空間的局所クラスタリング手法を提案する。一般的なノイズラベル問題に対処するため,エントロピーと一貫性フィルタを提案し,ノイズラベルの選択と削除を行う。本手法のスケーラビリティと有効性を証明するため,種々のバックボーンを用いた4つの領域一般化ベンチマークと4つの医療画像分割タスクの実験を行った。実験の結果,本手法はベースラインを安定的に改善するだけでなく,既存のテスト時間適応法よりも優れていることがわかった。 Test time adaptation (TTA) aims to adapt deep neural networks when receiving out of distribution test domain samples. In this setting, the model can only access online unlabeled test samples and pre-trained models on the training domains. We first address TTA as a feature revision problem due to the domain gap between source domains and target domains. After that, we follow the two measurements alignment and uniformity to discuss the test time feature revision. For test time feature uniformity, we propose a test time self-distillation strategy to guarantee the consistency of uniformity between representations of the current batch and all the previous batches. For test time feature alignment, we propose a memorized spatial local clustering strategy to align the representations among the neighborhood samples for the upcoming batch. To deal with the common noisy label problem, we propound the entropy and consistency filters to select and drop the possible noisy labels. To prove the scalability and efficacy of our method, we conduct experiments on four domain generalization benchmarks and four medical image segmentation tasks with various backbones. 
Experimental results show that our method not only consistently improves over the baseline but also outperforms existing state-of-the-art test time adaptation methods.	翻訳日:2023-05-02 18:16:59 公開日:2023-04-29
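The entropy filter mentioned in the abstract above can be sketched as follows: predictions on unlabeled test samples are kept as pseudo-labels only when their predictive entropy is low. The threshold rule (a fixed fraction of the maximum entropy), the function names, and the example logits are assumptions for illustration; the abstract does not specify the paper's exact criterion, and the consistency filter is not sketched here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_filter(batch_logits, max_fraction=0.4):
    """Return indices of samples whose entropy is below max_fraction * log(num_classes)."""
    kept = []
    for index, logits in enumerate(batch_logits):
        threshold = max_fraction * math.log(len(logits))
        if entropy(softmax(logits)) < threshold:
            kept.append(index)
    return kept


# Hypothetical batch: two confident predictions and one nearly uniform one.
batch_logits = [
    [4.0, 0.0, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 5.0, 1.0],
]
confident_indices = entropy_filter(batch_logits)
```

Only the samples that survive the filter would contribute pseudo-labels during adaptation. A looser threshold keeps more samples at the cost of admitting noisier labels.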
# オーディオ信号処理のためのコンテンツ適応フロントエンド Content Adaptive Front End For Audio Signal Processing ( http://arxiv.org/abs/2303.10446v2 ) ライセンス: Link先を確認	Prateek Verma and Chris Chafe	(参考訳) 音声信号処理のための学習可能なコンテンツ適応フロントエンドを提案する。ディープラーニングが出現する前は、spectrogramやmel-spectrogramのような固定表現非学習フロントエンドを使用していた。 ASRや音響シーン理解などの様々な応用をサポートする畳み込みアーキテクチャでは、学習可能なフロントエンドへのシフトが発生し、基礎関数の種類と重みの両方がスクラッチから学習され、特定の作業に最適化される。畳み込みブロックのないトランスフォーマーベースのアーキテクチャへの移行により、線形層は小さな波形パッチを小さな潜在次元に投影し、トランスフォーマーアーキテクチャに供給する。本研究では,コンテンツ適応学習可能な時間周波数表現の計算法を提案する。我々は各音声信号を畳み込みフィルタのバンクに通し、それぞれが固定次元ベクトルを与える。有限インパルス応答フィルタバンクのバンクを学習し、入力信号の内容に応じて最適なフィルタバンクを介して入力信号を渡すのと同じである。コンテンツ適応学習可能な時間周波数表現は、本論文の実験以上に広く適用することができる。 We propose a learnable content adaptive front end for audio signal processing. Before the modern advent of deep learning, we used fixed-representation, non-learnable front ends such as the spectrogram or mel-spectrogram, with or without neural architectures. With convolutional architectures supporting various applications such as ASR and acoustic scene understanding, a shift to learnable front ends occurred, in which both the type of basis functions and their weights were learned from scratch and optimized for the particular task of interest. With the shift to transformer-based architectures with no convolutional blocks present, a linear layer projects small waveform patches onto a small latent dimension before feeding them to a transformer architecture. In this work, we propose a way of computing a content-adaptive learnable time-frequency representation. We pass each audio signal through a bank of convolutional filters, each giving a fixed-dimensional vector. This is akin to learning a bank of finite impulse response filterbanks and passing the input signal through the optimum filterbank depending on the content of the input signal. A content-adaptive learnable time-frequency representation may be more broadly applicable, beyond the experiments in this paper.	翻訳日:2023-05-02 18:16:39 公開日:2023-04-29
# 大きな言語モデルは意識できるのか? Could a Large Language Model be Conscious? ( http://arxiv.org/abs/2303.07103v2 ) ライセンス: Link先を確認	David J. Chalmers	(参考訳) 最近、大きな言語モデルが知覚的か意識的であるかという議論が広まっている。このアイデアを真剣に考えるべきか? 私は最強の理由と反対の理由を断ち切る。意識科学における主要な仮定を考えると、現在のモデルでは意識に重大な障害がある:例えば、リカレント処理の欠如、グローバルワークスペース、統合されたエージェンシーなどである。同時に、これらの障害が今後10年ほどで克服される可能性は極めて高い。結論としては、現在の大きな言語モデルが意識されている可能性は少しありそうにないが、大きな言語モデルの後継者が近い将来意識される可能性について真剣に考えるべきであると結論づけます。 There has recently been widespread discussion of whether large language models might be sentient or conscious. Should we take this idea seriously? I will break down the strongest reasons for and against. Given mainstream assumptions in the science of consciousness, there are significant obstacles to consciousness in current models: for example, their lack of recurrent processing, a global workspace, and unified agency. At the same time, it is quite possible that these obstacles will be overcome in the next decade or so. I conclude that while it is somewhat unlikely that current large language models are conscious, we should take seriously the possibility that successors to large language models may be conscious in the not-too-distant future.	翻訳日:2023-05-02 18:16:20 公開日:2023-04-29
# 多値拡散:画像生成のための無限次元スコアベース拡散モデル Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation ( http://arxiv.org/abs/2303.04772v2 ) ライセンス: Link先を確認	Paul Hagemann, Sophie Mildenberger, Lars Ruthotto, Gabriele Steidl, Nicole Tianjiao Yang	(参考訳) スコアベース拡散モデル(SBDM)は画像生成のための最先端のアプローチとして最近登場した。既存のSBDMは通常有限次元の設定で定式化され、画像は有限サイズのテンソルと見なされる。本稿では, 無限次元のSBDM, すなわち, 矩形領域でサポートされている関数としてトレーニングデータをモデル化する。より高解像度で画像を生成することの探求に加えて、我々の主な動機は、よく考えられた無限次元の学習問題を作成し、複数の解像度レベルで一貫した識別を可能にすることである。これにより,異なる解像度レベルにまたがる拡散モデルが得られ,訓練プロセスの効率が向上することを期待している。無限次元設定におけるsbdmアプローチの2つの欠点を克服する方法を示す。まず, 潜在分布が無限次元設定においてトレースクラス作用素の概念を用いて well-defined であることを保証するために, フォワードプロセスを修正した。第2に,演算子ネットワークを用いたスコア関数の近似化は,fno(fourier neural operator)が多レベルトレーニングに有用であることを示す。有限近似に対する無限次元の設定および逆過程のフォワード過程を導出した後、それらの正当性を示し、適切な離散化を導出し、潜在分布の役割を研究する。 2つのデータセット、MNISTと材料構造について、まず有望な数値結果を提供する。特に、このフレームワークでマルチレベルトレーニングが実現可能であることを示す。 Score-based diffusion models (SBDM) have recently emerged as state-of-the-art approaches for image generation. Existing SBDMs are typically formulated in a finite-dimensional setting, where images are considered as tensors of a finite size. This paper develops SBDMs in the infinite-dimensional setting, that is, we model the training data as functions supported on a rectangular domain. Besides the quest for generating images at ever higher resolution, our primary motivation is to create a well-posed infinite-dimensional learning problem so that we can discretize it consistently on multiple resolution levels. We thereby hope to obtain diffusion models that generalize across different resolution levels and improve the efficiency of the training process. We demonstrate how to overcome two shortcomings of current SBDM approaches in the infinite-dimensional setting. First, we modify the forward process to ensure that the latent distribution is well-defined in the infinite-dimensional setting using the notion of trace class operators. 
Second, we illustrate that approximating the score function with an operator network, in our case Fourier neural operators (FNOs), is beneficial for multilevel training. After deriving the forward process in the infinite-dimensional setting and reverse processes for finite approximations, we show their well-posedness, derive adequate discretizations, and investigate the role of the latent distributions. We provide first promising numerical results on two datasets, MNIST and material structures. In particular, we show that multilevel training is feasible within this framework.	翻訳日:2023-05-02 18:14:41 公開日:2023-04-29
# 腎移植のための限られた表データによる臨床プロンプトを用いた3次元医用画像 MEDIMP: 3D Medical Images with clinical Prompts from limited tabular data for renal transplantation ( http://arxiv.org/abs/2303.12445v2 ) ライセンス: Link先を確認	Leo Milecki, Vicky Kalogeiton, Sylvain Bodard, Dany Anglicheau, Jean-Michel Correas, Marc-Olivier Timsit, Maria Vakalopoulou	(参考訳) 腎移植は末期腎疾患の最も有効な解決策として出現する。複雑な原因から発生し、慢性的な機能不全のかなりのリスクが持続し、移植片が失われる可能性がある。医療画像は、臨床における腎移植モニタリングにおいて重要な役割を果たす。しかし, 移植管理は, 腎学, 尿学, 放射線学の分野において多分野にまたがっており, このような高次元・複雑な診断データから堅牢なバイオマーカーを同定することは困難である。本研究では,近年の大規模言語モデル(llms)の成功から着想を得て,腎移植におけるダイナミックコントラスト強調mri(dce mri)の有意義なマルチモーダル表現を学習するためのモデルとして,臨床画像からテキストプロンプトへの変換後の構造的臨床生物学データを取り込むことにより,医用画像(クリニカルプロンプトを用いた医用画像)を提案する。 MEDIMPは、この困難なタスクを実行するために、ジョイントテキストイメージのペア埋め込みから対照的な学習に基づいている。さらに,LSMから自動テキストデータ拡張を用いて医療用プロンプトを生成するフレームワークを提案する。本研究の目的は,移植後2年,3年,4年後の患者状態の予後に興味深い腎移植dce mriの有意義な多様体を探索することであり,限られたマルチモーダルデータを最も効率的に活用することである。広範にわたる実験と、限られたデータによる他の腎移植表現学習法との比較は、関連する臨床環境におけるmedimpの有効性を証明し、医学的プロンプトへの新しい方向性を与える。私たちのコードはhttps://github.com/leomlck/MEDIMPで利用可能です。 Renal transplantation emerges as the most effective solution for end-stage renal disease. Occurring from complex causes, a substantial risk of transplant chronic dysfunction persists and may lead to graft loss. Medical imaging plays a substantial role in renal transplant monitoring in clinical practice. However, graft supervision is multi-disciplinary, notably joining nephrology, urology, and radiology, while identifying robust biomarkers from such high-dimensional and complex data for prognosis is challenging. In this work, taking inspiration from the recent success of Large Language Models (LLMs), we propose MEDIMP -- Medical Images with clinical Prompts -- a model to learn meaningful multi-modal representations of renal transplant Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE MRI) by incorporating structural clinicobiological data after translating them into text prompts. 
MEDIMP is based on contrastive learning from joint text-image paired embeddings to perform this challenging task. Moreover, we propose a framework that generates medical prompts using automatic textual data augmentations from LLMs. Our goal is to learn meaningful manifolds of renal transplant DCE MRI, interesting for the prognosis of the transplant or patient status (2, 3, and 4 years after the transplant), fully exploiting the limited available multi-modal data most efficiently. Extensive experiments and comparisons with other renal transplant representation learning methods with limited data prove the effectiveness of MEDIMP in a relevant clinical setting, giving new directions toward medical prompts. Our code is available at https://github.com/leomlck/MEDIMP.	翻訳日:2023-05-02 18:04:10 公開日:2023-04-29
# 機械学習モデルに専門家の判断を組み込む Incorporating Experts' Judgment into Machine Learning Models ( http://arxiv.org/abs/2304.11870v2 ) ライセンス: Link先を確認	Hogun Park and Aly Megahed and Peifeng Yin and Yuya Ong and Pravar Mahajan and Pei Guo	(参考訳) 機械学習(ML)モデルは、多くのアプリケーションで結果を予測することに成功している。しかし、場合によっては、ドメインの専門家はMLモデルの予測と矛盾する可能性のある期待された結果について判断するかもしれない。この主な理由は、トレーニングデータが完全に人口を表すものではないかもしれないためである。本稿では,専門家の判断を活かして紛争を緩和することを目的とした新しい枠組みを提案する。私たちのフレームワークの背後にある基本的な考え方は、トレーニングデータ内のラベルのないデータポイントの表現度を、生成的な敵ネットワークを用いて最初に決定することです。そして,そのような度合いに基づいて,上記の表現度が高いほど,補正された出力に付加する専門家の直感に重みが小さいほど,その逆であるとする専門家の判断を組み込むことで,機械学習モデルの予測を補正する。我々は,合成データと実世界のケーススタディ(ITサービス産業と金融産業のケーススタディ)について,複数の数値実験を行った。その結果,複数の基準法と比較して,予測精度の犠牲を最小限に抑えながら,専門家の判断に非常に近い精度が得られることがわかった。また,予測精度と専門家の判断の近接性を組み合わせた新しい評価指標を開発した。我々のフレームワークは、そのメトリックで評価すると統計的に有意な結果をもたらす。 Machine learning (ML) models have been quite successful in predicting outcomes in many applications. However, in some cases, domain experts might have a judgment about the expected outcome that conflicts with the prediction of ML models. One main reason for this is that the training data might not be fully representative of the population. In this paper, we present a novel framework that aims at leveraging experts' judgment to mitigate the conflict. The underlying idea behind our framework is that we first determine, using a generative adversarial network, the degree of representation of an unlabeled data point in the training data. Then, based on that degree, we correct the machine learning model's prediction by incorporating the experts' judgment into it: the higher the degree of representation, the less weight we put on the expert intuition that we add to our corrected output, and vice versa. We perform multiple numerical experiments on synthetic data as well as two real-world case studies (one from the IT services industry and the other from the financial industry). 
All results show the effectiveness of our framework; it yields much higher closeness to the experts' judgment with minimal sacrifice in the prediction accuracy, when compared to multiple baseline methods. We also develop a new evaluation metric that combines prediction accuracy with the closeness to experts' judgment. Our framework yields statistically significant results when evaluated on that metric.	翻訳日:2023-05-02 17:57:56 公開日:2023-04-29
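The weighting idea described in the abstract above can be illustrated with a minimal Python sketch. It assumes a representation degree in [0, 1] is already available (the paper obtains it with a generative adversarial network, which is not reproduced here) and blends the model prediction with the expert's judgment linearly. The function name `corrected_prediction` and the linear blending rule are illustrative assumptions, not the authors' actual formulation.

```python
def corrected_prediction(model_pred: float, expert_judgment: float,
                         representation: float) -> float:
    """Blend a model prediction with an expert's judgment.

    representation: hypothetical score in [0, 1]; 1 means the data point is
    well represented in the training data, 0 means it is not represented.
    The linear blend is an assumption made for illustration only.
    """
    if not 0.0 <= representation <= 1.0:
        raise ValueError("representation must lie in [0, 1]")
    # The less the point is represented, the more weight the expert gets.
    expert_weight = 1.0 - representation
    return representation * model_pred + expert_weight * expert_judgment
```

With a representation degree of 1 the model prediction is returned unchanged, and with 0 the expert's judgment replaces it entirely; values in between interpolate.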
# 倫理的・哲学的原則による信頼できる医療人工知能の確保 Ensuring Trustworthy Medical Artificial Intelligence through Ethical and Philosophical Principles ( http://arxiv.org/abs/2304.11530v3 ) ライセンス: Link先を確認	Debesh Jha, Ashish Rauniyar, Abhiskek Srivastava, Desta Haileselassie Hagos, Nikhil Kumar Tomar, Vanshali Sharma, Elif Keles, Zheyuan Zhang, Ugur Demir, Ahmet Topcu, Anis Yazidi, Jan Erik H{\aa}akeg{\aa}rd, and Ulas Bagci	(参考訳) 人工知能(AI)手法は、医療専門家や患者の経験を高めることで、多くの医療に革命をもたらす可能性がある。 aiベースのコンピュータ支援診断ツールは、臨床専門家のレベルに匹敵する能力や性能を発揮できれば、非常に有益である。その結果、先進的な医療サービスは発展途上国では手頃な価格で提供でき、専門医の欠如の問題にも対処できる。 AIベースのツールは、患者の治療の時間、リソース、全体的なコストを節約できる。さらに、人間とは対照的に、AIは大量の入力からデータの複雑な関係を明らかにし、医学における新たなエビデンスベースの知識へと導くことができる。しかし、医療におけるAIの統合は、バイアス、透明性、自律性、責任、説明責任など、いくつかの倫理的および哲学的な懸念を提起する。本稿では、AIを用いた医療画像分析の最近の進歩、既存の標準、および臨床現場におけるAIの応用のための倫理的問題やベストプラクティスを理解することの重要性を強調する。我々は、AIの技術的および倫理的課題と、病院や公共機関にAIを配置することの意味について取り上げる。また、倫理的課題、データ不足、人種的バイアス、透明性の欠如、アルゴリズム的バイアスに対処するための重要な手段と手法についても論じる。最後に、私たちは、医療アプリケーションにおけるAIに関連する倫理的課題に対処するための推奨事項と今後の方向性を提供し、このワークフローをより効率的に、正確で、アクセス可能で、透明で、世界中の患者に信頼できるものにするために、AIを臨床環境にデプロイすることを目的としています。 Artificial intelligence (AI) methods have great potential to revolutionize numerous medical care by enhancing the experience of medical experts and patients. AI based computer-assisted diagnosis tools can have a tremendous benefit if they can outperform or perform similarly to the level of a clinical expert. As a result, advanced healthcare services can be affordable in developing nations, and the problem of a lack of expert medical practitioners can be addressed. AI based tools can save time, resources, and overall cost for patient treatment. Furthermore, in contrast to humans, AI can uncover complex relations in the data from a large set of inputs and even lead to new evidence-based knowledge in medicine. 
However, integrating AI in healthcare raises several ethical and philosophical concerns, such as bias, transparency, autonomy, responsibility and accountability, which must be addressed before integrating such tools into clinical settings. In this article, we emphasize recent advances in AI-assisted medical image analysis, existing standards, and the significance of comprehending ethical issues and best practices for the applications of AI in clinical settings. We cover the technical and ethical challenges of AI and the implications of deploying AI in hospitals and public organizations. We also discuss promising key measures and techniques to address the ethical challenges, data scarcity, racial bias, lack of transparency, and algorithmic bias. Finally, we provide our recommendation and future directions for addressing the ethical challenges associated with AI in healthcare applications, with the goal of deploying AI into the clinical settings to make the workflow more efficient, accurate, accessible, transparent, and reliable for the patient worldwide.	翻訳日:2023-05-02 17:57:34 公開日:2023-04-29
# 再調査なしの研究: 最大更新パラメトリゼーションはスケールにわたって正確な損失予測をもたらす Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ( http://arxiv.org/abs/2304.06875v2 ) ライセンス: Link先を確認	Yiqun Yao and Yequan Wang	(参考訳) 言語モデルが拡大するにつれて、小さなモデルの結論が容易に大きなモデルに移行しないため、研究アイデアの検証がますます高価になる。考えられる解決策は、小さなモデルの結果とハイパーパラメータのみに基づいて、大規模モデルのメトリクスを直接予測する汎用システムを確立することである。スケーリングの法則に基づく既存の手法では,最大モデルのハイパーパラメータ探索が必要となる。我々は,最大更新パラメトリゼーション(muP)により,共通損失盆地近傍のハイパーパラメータのスケーリング法則を,探索なしで正確に適合させることができることを示す発見を提示することによって,この問題に対処する。これにより、トレーニング開始前であっても、複数のモデルを直接比較して損失予測を行うことができる。重計算を伴わないモデルスケールの信頼性の高い学術研究への第一歩として,新しいパラダイムを提案する。コードは近々公開される予定だ。 As language models scale up, it becomes increasingly expensive to verify research ideas because conclusions on small models do not trivially transfer to large ones. A possible solution is to establish a generic system that directly predicts some metrics for large models solely based on the results and hyperparameters from small models. Existing methods based on scaling laws require hyperparameter search on the largest models, which is impractical with limited resources. We address this issue by presenting our discoveries indicating that Maximal Update parametrization (muP) enables accurate fitting of scaling laws for hyperparameters close to common loss basins, without any search. Thus, different models can be directly compared on large scales with loss prediction even before the training starts. We propose a new paradigm as a first step towards reliable academic research for any model scale without heavy computation. Code will be publicly available shortly.	翻訳日:2023-05-02 17:55:28 公開日:2023-04-29
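The loss-prediction workflow in the abstract above (fit a scaling law on small models, then extrapolate to a larger one) can be sketched as follows. The pure power-law form `loss = a * size ** (-b)`, the function names, and the synthetic numbers are assumptions for illustration; the abstract does not state the functional form the authors fit, and muP itself is not implemented here.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(size).

    Returns (a, b) for the assumed form loss = a * size ** (-b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, b, size):
    """Evaluate the fitted power law at a (possibly larger) model size."""
    return a * size ** (-b)


# Synthetic "small model" results generated from a known power law.
small_sizes = [1e6, 1e7, 1e8]
small_losses = [5.0 * n ** -0.1 for n in small_sizes]

a, b = fit_power_law(small_sizes, small_losses)
predicted = predict_loss(a, b, 1e9)  # extrapolate to a larger model
```

Fitting in log-log space keeps the example dependency-free; a real study would likely also model an irreducible loss term and report fit uncertainty.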
# ChartSumm: 長文と短文の自動チャート要約のための総合ベンチマーク ChartSumm: A Comprehensive Benchmark for Automatic Chart Summarization of Long and Short Summaries ( http://arxiv.org/abs/2304.13620v2 ) ライセンス: Link先を確認	Raian Rahman, Rizvi Hasan, Abdullah Al Farhad, Md Tahmid Rahman Laskar, Md. Hamjajul Ashmafee, Abu Raihan Mostofa Kamal	(参考訳) テキスト要約への自動チャートは、視覚障害者に有効なツールであり、自然言語による表データの正確な洞察をユーザに提供します。大規模で構造化されたデータセットは、データ駆動モデルにとって常に重要な部分です。本稿では,トータル84,363のチャートからなる大規模ベンチマークデータセットであるchartsummを提案する。強力なベースラインモデルによる大規模な実験は、これらのモデルが様々な自動評価指標で十分なスコアを達成して流動的で情報的な要約を生成するにもかかわらず、しばしば幻覚に苦しむこと、重要なデータポイントを欠いていること、チャートの複雑な傾向の誤った説明といった問題に直面していることを示している。また、自動翻訳ツールを用いてChartSummを他の言語に拡張する可能性についても検討した。これらのデータセットは、将来の研究のための挑戦的なベンチマークになります。 Automatic chart to text summarization is an effective tool for the visually impaired people along with providing precise insights of tabular data in natural language to the user. A large and well-structured dataset is always a key part for data driven models. In this paper, we propose ChartSumm: a large-scale benchmark dataset consisting of a total of 84,363 charts along with their metadata and descriptions covering a wide range of topics and chart types to generate short and long summaries. Extensive experiments with strong baseline models show that even though these models generate fluent and informative summaries by achieving decent scores in various automatic evaluation metrics, they often face issues like suffering from hallucination, missing out important data points, in addition to incorrect explanation of complex trends in the charts. We also investigated the potential of expanding ChartSumm to other languages using automated translation tools. These make our dataset a challenging benchmark for future research.	翻訳日:2023-05-02 17:46:48 公開日:2023-04-29
# ブロックチェーンの大規模言語モデル Blockchain Large Language Models ( http://arxiv.org/abs/2304.12749v2 ) ライセンス: Link先を確認	Yu Gai, Liyi Zhou, Kaihua Qin, Dawn Song, Arthur Gervais	(参考訳) 本稿では,異常なブロックチェーントランザクションを検出するための動的,リアルタイムなアプローチを提案する。提案ツールであるblockgptは、ブロックチェーンアクティビティのトレース表現を生成し、大規模な言語モデルをスクラッチからトレーニングすることで、リアルタイム侵入検出システムとして機能する。従来の方法とは異なり、blockgptは制限のない検索空間を提供し、事前定義されたルールやパターンに依存しないように設計されている。本稿では,Ethereumトランザクションの異常検出ツールとしてBlockGPTの有効性を示す。実験では,68万トランザクションのデータセット間の異常なトランザクションを効果的に識別し,バッチ処理のスループットは平均で2284トランザクションである。以上の結果から,BlockGPTは,被害者の契約に係わる最も異常な取引のうち,124件中49件をランク付けし,異常な取引を識別した。この研究は、トランスフォーマーアーキテクチャと互換性のあるカスタムデータエンコーディング、ドメイン固有のトークン化技術、Ethereum仮想マシン(EVM)トレース表現用に特別に開発されたツリーエンコーディングメソッドを導入することで、ブロックチェーントランザクション分析の分野に貢献する。 This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains from scratch a large language model to act as a real-time Intrusion Detection System. Unlike traditional methods, BlockGPT is designed to offer an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of BlockGPT through its use as an anomaly detection tool for Ethereum transactions. In our experiments, it effectively identifies abnormal transactions among a dataset of 68M transactions and has a batched throughput of 2284 transactions per second on average. Our results show that, BlockGPT identifies abnormal transactions by ranking 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work makes contributions to the field of blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.	
翻訳日:2023-05-02 17:46:13 公開日:2023-04-29
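The top-3 evaluation mentioned in the abstract above can be illustrated with a small Python sketch. It assumes each transaction that interacted with a contract already has an anomaly score (for example, one derived from a language model's likelihood); the function name `in_top_k`, the transaction ids, and the scores are hypothetical and are not taken from BlockGPT.

```python
def in_top_k(scores, target_tx, k=3):
    """Return True if target_tx is among the k highest-scoring transactions.

    scores: dict mapping a transaction id to an anomaly score (higher means
    more abnormal), covering the transactions that interacted with one contract.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return target_tx in ranked[:k]


# Hypothetical scores for five transactions touching the same contract.
example_scores = {"tx_a": 0.2, "tx_b": 7.9, "tx_c": 1.4, "tx_d": 3.3, "tx_e": 0.5}
```

Counting how many known attacks satisfy `in_top_k` over their victim contracts would give a figure of the same kind as the one reported in the abstract.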
# カーネルリッジ回帰のためのロバスト・ランダム化プレコンディショニング Robust, randomized preconditioning for kernel ridge regression ( http://arxiv.org/abs/2304.12465v2 ) ライセンス: Link先を確認	Mateo Díaz, Ethan N. Epperly, Zachary Frangella, Joel A. Tropp, and Robert J. Webber	(参考訳) 本稿では,カーネルリッジ回帰(KRR)問題を中～多量のデータポイント($10^4 \leq N \leq 10^7$)で頑健に解くための2つのランダム化プレコンディショニング手法を提案する。最初の方法であるRPCholeskyプレコンディショニングは、カーネル行列固有値の十分速い多項式減衰を仮定して、$O(N^2)$算術演算で全データKRR問題を正確に解くことができる。 2つ目の方法、KRILLプリコンディショニングは、$k \ll N$選択されたデータセンターを$O((N + k^2) k \log k)$の演算で制限されたバージョンのKRR問題に対する正確な解決策を提供する。提案手法は,様々なKRR問題を解くとともに,従来のKRRプリコンディショナーの故障モードを克服し,実用化に最適である。 This paper introduces two randomized preconditioning techniques for robustly solving kernel ridge regression (KRR) problems with a medium to large number of data points ($10^4 \leq N \leq 10^7$). The first method, RPCholesky preconditioning, is capable of accurately solving the full-data KRR problem in $O(N^2)$ arithmetic operations, assuming sufficiently rapid polynomial decay of the kernel matrix eigenvalues. The second method, KRILL preconditioning, offers an accurate solution to a restricted version of the KRR problem involving $k \ll N$ selected data centers at a cost of $O((N + k^2) k \log k)$ operations. The proposed methods solve a broad range of KRR problems and overcome the failure modes of previous KRR preconditioners, making them ideal for practical applications.	翻訳日:2023-05-02 17:45:52 公開日:2023-04-29
# Sparse Private LASSO Logistic Regression Sparse Private LASSO Logistic Regression ( http://arxiv.org/abs/2304.12429v2 ) ライセンス: Link先を確認	Amol Khanna, Fred Lu, Edward Raff, Brian Testa	(参考訳) LASSOの正規化ロジスティック回帰は、特に組み込みの機能選択に有用であり、配置から係数を除去し、疎解を生成することができる。 LASSOロジスティック回帰の異なるプライベートバージョンが開発されているが、一般に密度の高い解が生成され、LASSOペナルティの本質的な有用性が低下する。本稿では,硬零点を維持できる分散ロジスティック回帰のための微分プライベート法を提案する。我々の重要な洞察は、まず非プライベートラッソロジスティック回帰モデルを訓練し、最終モデル選択に使用する非零係数の民営化数を決定することである。提案手法の性能を示すため,合成および実世界のデータセットを用いた実験を行った。 LASSO regularized logistic regression is particularly useful for its built-in feature selection, allowing coefficients to be removed from deployment and producing sparse solutions. Differentially private versions of LASSO logistic regression have been developed, but generally produce dense solutions, reducing the intrinsic utility of the LASSO penalty. In this paper, we present a differentially private method for sparse logistic regression that maintains hard zeros. Our key insight is to first train a non-private LASSO logistic regression model to determine an appropriate privatized number of non-zero coefficients to use in final model selection. To demonstrate our method's performance, we run experiments on synthetic and real-world datasets.	翻訳日:2023-05-02 17:45:32 公開日:2023-04-29
# 音声におけるロバストプライバシー保護のための逆表現学習 Adversarial Representation Learning for Robust Privacy Preservation in Audio ( http://arxiv.org/abs/2305.00011v1 ) ライセンス: Link先を確認	Shayan Gharib, Minh Tran, Diep Luong, Konstantinos Drossos, Tuomas Virtanen	(参考訳) 音響イベント検出システムは、監視や環境監視といった様々なアプリケーションで広く使用されており、データは自動的に収集され、処理され、クラウドに送信される。しかし、このプロセスは必然的にユーザーや周囲に関する機密情報を開示し、プライバシー上の懸念を引き起こす可能性がある。本研究では,音声録音の潜在的特徴から音声活動の検出を効果的に防止する,音声録音の表現を学習するための新しい学習手法を提案する。提案手法は,非音声録音と音声分類器では区別できない音声録音の不変な潜在表現を生成するようにモデルを訓練する。私たちの研究の目新しさは最適化アルゴリズムにあり、音声分類器の重みは教師付きで訓練された分類器の重みに定期的に置き換えられる。これにより、対向訓練中に常に音声分類器の識別能力を高め、対向訓練ループの外で訓練された新しい音声分類器を用いても、発話が識別できない潜在表現を生成する動機付けとなる。提案手法は,プライバシ対策が不要なベースラインアプローチと,プライバシ違反がベースラインアプローチに比べて有意に低減する先行的敵訓練手法に対して評価を行う。また,本手法は,本手法では効果的ではないことを示す。 Sound event detection systems are widely used in various applications such as surveillance and environmental monitoring where data is automatically collected, processed, and sent to a cloud for sound recognition. However, this process may inadvertently reveal sensitive information about users or their surroundings, hence raising privacy concerns. In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings. The proposed method trains a model to generate invariant latent representations of speech-containing audio recordings that cannot be distinguished from non-speech recordings by a speech classifier. The novelty of our work is in the optimization algorithm, where the speech classifier's weights are regularly replaced with the weights of classifiers trained in a supervised manner. This increases the discrimination power of the speech classifier constantly during the adversarial training, motivating the model to generate latent representations in which speech is not distinguishable, even using new speech classifiers trained outside the adversarial training loop. 
The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method, demonstrating a significant reduction in privacy violations compared to the baseline approach. Additionally, we show that the prior adversarial method is practically ineffective for this purpose.	翻訳日:2023-05-02 17:36:42 公開日:2023-04-29
# 欠陥量子ビットアレイ上の適応型表面コードの経験的オーバーヘッド Empirical overhead of the adapted surface code on defective qubit arrays ( http://arxiv.org/abs/2305.00138v1 ) ライセンス: Link先を確認	Sophia Fuhui Lin, Joshua Viszlai, Kaitlin N. Smith, Gokul Subramanian Ravi, Charles Yuan, Frederic T. Chong, Benjamin J. Brown	(参考訳) 固体ハードウェアを用いたフォールトトレラント量子コンピュータの実現には、常に発生する製造のばらつきや欠陥を考慮した量子エラー訂正手順を適用する必要があります。非アドレスの場合、これらのエラーは、量子情報が十分に小さな障害率で処理できるように、システムをスケールすることを妨げます。我々は、任意に分散した欠陥を持つキュービットアレイに適応した表面コードをシミュレートし、欠陥が忠実性に与える影響を特徴づける指標を見つける。次に、フォールトトレラントな量子コンピュータを実現する際のリソースオーバーヘッドに対する欠陥の影響をチップレットベースのモジュラーアーキテクチャで決定する。回路ベースノイズモデルにおいて,非フーティ物理量子ビットの誤差レートが$\sim 0.1\%$であるような論理的故障の指数関数的抑制を示す。これは、欠陥のないsurfaceコードを実行するような典型的な仕組みです。我々は,欠陥チップレットからデバイスを構築するための選択後基準を確立するために,数値結果を用いた。この基準を用いて,論理キュービット当たりの物理キュービットの平均個数の観点から,資源のオーバーヘッドを評価する。欠陥率と目標忠実度に基づいて最適なチップレットサイズを選択することは、欠陥による追加のエラー修正オーバーヘッドを制限するのに不可欠である。最適なチップレットサイズを選択すると、リソースオーバーヘッドが1\%の欠陥率で、使用する2つの欠陥モデルに対してそれぞれ3Xと6X以下に削減され、幅広い目標性能を実現することができる。また、qubitを無効にするか、エラー訂正コードの一部として保持すべきかを特定するのに役立つカットオフ忠実度値を判定する。 The realization of fault-tolerant quantum computers using solid-state hardware will require us to adapt our quantum error correction procedure to account for fabrication variation and defects that will invariably arise. If unaddressed, these errors inhibit us from scaling our system such that quantum information can be processed with sufficiently small failure rates. We simulate the surface code adapted to qubit arrays with arbitrarily distributed defects to find metrics that characterize how defects affect fidelity. We then determine the impact of defects on the resource overhead of realizing a fault-tolerant quantum computer, on a chiplet-based modular architecture. Our strategy for dealing with fabrication defects demonstrates an exponential suppression of logical failure where error rates of non-faulty physical qubits are $\sim 0.1\%$ in a circuit-based noise model. This is a typical regime where we imagine running the defect-free surface code. 
We use our numerical results to establish post-selection criteria for building a device from defective chiplets. Using our criteria, we then evaluate the resource overhead in terms of the average number of fabricated physical qubits per logical qubit. We find that an optimal choice of chiplet size, based on the defect rate and target fidelity, is essential to limiting any additional error correction overhead due to defects. When the optimal chiplet size is chosen, at a defect rate of $1\%$ the resource overhead can be reduced to below 3X and 6X respectively for the two defect models we use, for a wide range of target performance. We also determine cutoff fidelity values that help identify whether a qubit should be disabled or kept as part of the error correction code.	翻訳日:2023-05-02 17:02:11 公開日:2023-04-29
# 量子空間結合符号 Quantum Spatially-Coupled Codes ( http://arxiv.org/abs/2305.00137v1 ) ライセンス: Link先を確認	Siyi Yang, Robert Calderbank	(参考訳) 空間結合符号 (SC) は畳み込みLDPC符号のクラスであり、高い性能と低遅延デコーダとの互換性により古典的符号化理論においてよく研究されている。古典的2次元空間結合符号(2D-SC)の量子対としてトーリック符号を記述し、一般化として量子空間結合符号(QSC)を導入する。畳み込み構造を用いて、2D-SC符号のパリティチェック行列を2つの不定値の多項式として表現し、2D-SC符号が安定化符号となるために必要な代数的条件を導出する。この代数的フレームワークは、新しいコードファミリの構築を促進する。本稿では,小記憶が量子ビットの物理的接続を容易にし,局所符号化と低遅延ウィンドウの復号化を可能にした点に注目する。本稿では,2D-SC HGP符号のタンナーグラフにおいて,各成分符号の短周期から生じる短周期を最適化するために,代数的フレームワークを用いる。従来の作業では1/10未満のQLDPC符号に重点を置いていたが、2D-SC HGP符号は少ないメモリ、高いレート(約1/3)、優れた閾値で構築した。 Spatially-coupled (SC) codes are a class of convolutional LDPC codes that have been well investigated in classical coding theory thanks to their high performance and compatibility with low-latency decoders. We describe toric codes as quantum counterparts of classical two-dimensional spatially-coupled (2D-SC) codes, and introduce quantum spatially-coupled (QSC) codes as a generalization. We use the convolutional structure to represent the parity check matrix of a 2D-SC code as a polynomial in two indeterminates, and derive an algebraic condition that is both necessary and sufficient for a 2D-SC code to be a stabilizer code. This algebraic framework facilitates the construction of new code families. While not the focus of this paper, we note that small memory facilitates physical connectivity of qubits, and it enables local encoding and low-latency windowed decoding. In this paper, we use the algebraic framework to optimize short cycles in the Tanner graph of 2D-SC HGP codes that arise from short cycles in either component code. While prior work focuses on QLDPC codes with rate less than 1/10, we construct 2D-SC HGP codes with small memory, higher rates (about 1/3), and superior thresholds.	翻訳日:2023-05-02 17:01:41 公開日:2023-04-29
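The stabilizer requirement mentioned in the abstract above can be illustrated with the generic binary check for CSS-type codes: every X-type check must commute with every Z-type check, which means H_X H_Z^T = 0 over GF(2). The Python sketch below implements only this standard matrix test, not the paper's polynomial condition for 2D-SC codes; the function name `commutes_css` and the small invalid example are assumptions introduced for illustration.

```python
def commutes_css(h_x, h_z):
    """Return True if every X-type check commutes with every Z-type check.

    For a CSS-type stabilizer code this requires H_X * H_Z^T = 0 over GF(2),
    i.e. each pair of rows overlaps on an even number of qubits.
    """
    return all(
        sum(a & b for a, b in zip(row_x, row_z)) % 2 == 0
        for row_x in h_x
        for row_z in h_z
    )


# Parity-check matrix of the classical [7,4] Hamming code; using it for both
# the X and Z checks gives the well-known 7-qubit Steane code.
HAMMING = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

# A deliberately invalid pair: the two checks overlap on exactly one qubit.
BAD_X = [[1, 1, 0]]
BAD_Z = [[0, 1, 1]]
```

Here `commutes_css(HAMMING, HAMMING)` holds because every pair of Hamming rows overlaps on an even number of positions, while the invalid pair fails the check.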
# ベストサポート環境の提供によるAI開発プロセスの最適化 Optimizing the AI Development Process by Providing the Best Support Environment ( http://arxiv.org/abs/2305.00136v1 ) ライセンス: Link先を確認	Taha Khamis, Hamam Mokayed	(参考訳) 本研究の目的は,AI(Artificial Intelligence)と機械学習(ML)アプリケーションの開発プロセスを調査し,最高のサポート環境を提供することである。 MLの主なステージは、問題理解、データ管理、モデル構築、モデル展開、メンテナンスである。本研究は,機械学習開発の最重要段階であるML開発におけるデータ管理段階とその障害を,エンドモデルの精度がモデルに入力されるデータの種類に依存しているため調査することに焦点を当てる。この段階で見つかった最大の障害は、特にデータが機密である分野において、モデル学習に十分なデータがないことである。このプロジェクトの目的は、データ管理の段階で十分なデータ不足を解決するための、研究者と開発者のためのフレームワークの構築と開発である。このフレームワークは、オリジナルのデータセットから新しいデータを生成するために使用可能な、いくつかのデータ拡張技術を利用して、利用可能なデータ量と品質を増大させることで、MLアプリケーションの全体的なパフォーマンスを向上させることができる。このフレームワークはpython言語を使用して構築され、ディープラーニングの進歩を使ってデータ拡張を行う。 The purpose of this study is to investigate the development process for Artificial Intelligence (AI) and machine learning (ML) applications in order to provide the best support environment. The main stages of ML are problem understanding, data management, model building, model deployment, and maintenance. This project focuses on investigating the data management stage of ML development and its obstacles, as it is the most important stage of machine learning development: the accuracy of the end model depends on the kind of data fed into the model. The biggest obstacle found at this stage was the lack of sufficient data for model learning, especially in fields where data is confidential. This project aimed to build and develop a framework for researchers and developers that can help solve the lack of sufficient data during the data management stage. The framework utilizes several data augmentation techniques that can be used to generate new data from the original dataset, which can improve the overall performance of ML applications by increasing the quantity and quality of the data available to feed the model. The framework was built in Python and performs data augmentation using deep learning advancements.	
翻訳日:2023-05-02 17:01:20 公開日:2023-04-29
# 関節センシング、コミュニケーション、ai : 回復力のあるthzユーザエクスペリエンスのためのtrifecta Joint Sensing, Communication, and AI: A Trifecta for Resilient THz User Experiences ( http://arxiv.org/abs/2305.00135v1 ) ライセンス: Link先を確認	Christina Chaccour, Walid Saad, Merouane Debbah, and H. Vincent Poor	(参考訳) 本稿では,テラヘルツ(THz)無線システムに対する拡張現実(XR)体験を最適化するために,新しい共同センシング,通信,人工知能(AI)フレームワークを提案する。提案フレームワークは3つの主要コンポーネントで構成されている。まず、THzチャネルの間隔を利用して、XRユーザとその環境に対するユニークな検知パラメータを抽出するテンソル分解フレームワークを提案する。本質的には、thzバンドの準光学性を活用し、アップリンク通信信号からセンシングパラメータを抽出することにより、通信機能とセンシング機能の両方に同じ波形、スペクトル、ハードウェアを使用できる。そして、クラメール・ラオ下限が導出され、推定されたセンシングパラメータの精度が評価される。第2に,非自己回帰型多解像度生成人工知能(AI)フレームワークと対向変換器を統合したフレームワークを提案する。提案フレームワークは, ユーザ行動と環境条件の両方において, ゆらぎに一般化可能な, 堅牢かつ包括的な歴史的センシング情報と将来の環境変化予測を提供する。第3に、再構成可能なインテリジェントサーフェス(RIS)サブアレイのハンドオーバポリシを制御し、センサ情報の情報的特性を活用してハンドオーバコストを最小化し、個人的体験(QoPE)の質を最大化し、THzリンクの堅牢性とレジリエンスを向上させるために、マルチエージェント深部学習型Qニューラルネットワークを開発した。シミュレーションの結果,提案する非教師付き生成型aiフレームワークのユーザ行動と速度の変動に対する高い一般化性を示し,既知のチャネル状態情報を持つスキームと比較して,瞬時信頼性が61%向上した。 In this paper, a novel joint sensing, communication, and artificial intelligence (AI) framework is proposed to optimize extended reality (XR) experiences over terahertz (THz) wireless systems. The proposed framework consists of three main components. First, a tensor decomposition framework is proposed to extract unique sensing parameters for XR users and their environment by exploiting the THz channel sparsity. Essentially, the THz band's quasi-opticality is exploited and the sensing parameters are extracted from the uplink communication signal, thereby allowing for the use of the same waveform, spectrum, and hardware for both communication and sensing functionalities. Then, the Cramer-Rao lower bound is derived to assess the accuracy of the estimated sensing parameters. Second, a non-autoregressive multi-resolution generative artificial intelligence (AI) framework integrated with an adversarial transformer is proposed to predict missing and future sensing information. 
The proposed framework offers robust and comprehensive historical sensing information and anticipatory forecasts of future environmental changes, which are generalizable to fluctuations in both known and unforeseen user behaviors and environmental conditions. Third, a multi-agent deep recurrent hysteretic Q-neural network is developed to control the handover policy of reconfigurable intelligent surface (RIS) subarrays, leveraging the informative nature of sensing information to minimize handover cost, maximize the individual quality of personal experiences (QoPEs), and improve the robustness and resilience of THz links. Simulation results show a high generalizability of the proposed unsupervised generative AI framework to fluctuations in user behavior and velocity, leading to a 61% improvement in instantaneous reliability compared to schemes with known channel state information.	翻訳日:2023-05-02 17:01:01 公開日:2023-04-29
# LD-GAN:可変規則化を用いたスペクトル画像生成のための低次元生成逆ネットワーク LD-GAN: Low-Dimensional Generative Adversarial Network for Spectral Image Generation with Variance Regularization ( http://arxiv.org/abs/2305.00132v1 ) ライセンス: Link先を確認	Emmanuel Martinez, Roman Jacome, Alejandra Hernandez-Rojas and Henry Arguello	(参考訳) ディープラーニング法はスペクトル画像(SI)計算タスクの最先端技術である。しかし、これらの手法は高いコストと長い取得時間のために利用可能なデータセットが制限されているため、性能に制約がある。通常、データの欠如を軽減するためにデータ拡張技術が使用される。幾何学的変換のような古典的拡張法を超越したganは、データ分布から学習およびサンプリングすることで多様な拡張を可能にする。しかしながら、この種のデータの高次元性はGANトレーニングの収束を妨げるため、GANベースのSI生成は困難である。この制限を克服するため、我々は、事前訓練されたオートエンコーダネットワークの潜伏空間と低次元のデータベース表現を用いた低次元GAN(LD-GAN)を提案する。これにより,事前学習したデコーダネットワークを用いてsi次元にマッピングした新しい低次元サンプルを生成する。さらに,自動エンコーダ訓練のための低次元表現分散を制御し,GANで生成されたサンプルの多様性を達成するための統計正規化を提案する。圧縮スペクトル画像, SI超解像, RGBにおけるデータ拡張戦略としてLD-GAN法を検証し, それぞれ0.5から1[dB]に改善した。我々は,非データ強化トレーニングである従来のDAとの比較を行い,全サイズのSIを生成するための調整および訓練を行った。本論文のコードはhttps://github.com/romanjacome99/LD_GAN.gitにある。 Deep learning methods are state-of-the-art for spectral image (SI) computational tasks. However, these methods are constrained in their performance since available datasets are limited due to the high cost and long duration of acquisition. Usually, data augmentation techniques are employed to mitigate the lack of data. Surpassing classical augmentation methods, such as geometric transformations, GANs enable diverse augmentation by learning and sampling from the data distribution. Nevertheless, GAN-based SI generation is challenging since the high-dimensional nature of this kind of data hinders the convergence of GAN training, yielding suboptimal generation. To surmount this limitation, we propose low-dimensional GAN (LD-GAN), where we train the GAN employing a low-dimensional representation of the dataset with the latent space of a pretrained autoencoder network. Thus, we generate new low-dimensional samples which are then mapped to the SI dimension with the pretrained decoder network. 
Besides, we propose a statistical regularization to control the low-dimensional representation variance for the autoencoder training and to achieve high diversity of samples generated with the GAN. We validate our method LD-GAN as a data augmentation strategy for compressive spectral imaging, SI super-resolution, and RGB-to-spectral tasks, with improvements ranging from 0.5 to 1 dB in each task. We compare against training without data augmentation, traditional DA, and the same GAN adjusted and trained to generate the full-sized SIs. The code of this paper can be found at https://github.com/romanjacome99/LD_GAN.git	翻訳日:2023-05-02 17:00:27 公開日:2023-04-29
# 構造制約による教師なしドメイン適応のための正規化自己学習 Regularizing Self-training for Unsupervised Domain Adaptation via Structural Constraints ( http://arxiv.org/abs/2305.00131v1 ) ライセンス: Link先を確認	Rajshekhar Das, Jonathan Francis, Sanket Vaibhav Mehta, Jean Oh, Emma Strubell, Jose Moura	(参考訳) 擬似ラベルに基づく自己学習は、意味的セグメンテーション問題に対する教師なしドメイン適応(UDA)における条件分布シフトに対処する主要なアプローチとして現れてきた。しかし、注目すべき欠点は、このアプローチのファミリーが、ソースドメインのバイアスの確認から生じ、ターゲットドメインの迷惑要因として現れる誤った擬似ラベルに影響を受けやすいことである。このミスマッチの原因は、RGB画像入力によって提供される測光キューのみに依存するため、最終的には準最適適応につながる可能性がある。擬似ラベルのミスマッチ効果を軽減するため,従来の自己学習目標を正規化するために,奥行きなどの補助的モーダルから構造的手がかりを取り入れることを提案する。具体的には、異なるオブジェクトカテゴリを分割しながら、オブジェクトインスタンスの領域内のピクセル表現を近くまで引っ張る、対照的なピクセルレベルのオブジェクト性制約を導入する。真の基礎となる対象と整合する対象領域を得るため,マルチモーダルクラスタリングという形で深度マップとRGB画像の両方から情報を抽出する。重要なことに、対象性制約は基幹構造的ラベルに依存しないため、教師なしドメイン適応に適している。本研究では, セマンティックセグメンテーションのためのUDAベンチマークにおいて, セマンティックセグメンテーションにおいて, 最上位の自己学習法(最大2ドルポイント)を著しく改善することを示す。補足にすべてのコードを含めます。 Self-training based on pseudo-labels has emerged as a dominant approach for addressing conditional distribution shifts in unsupervised domain adaptation (UDA) for semantic segmentation problems. A notable drawback, however, is that this family of approaches is susceptible to erroneous pseudo labels that arise from confirmation biases in the source domain and that manifest as nuisance factors in the target domain. A possible source for this mismatch is the reliance on only photometric cues provided by RGB image inputs, which may ultimately lead to sub-optimal adaptation. To mitigate the effect of mismatched pseudo-labels, we propose to incorporate structural cues from auxiliary modalities, such as depth, to regularise conventional self-training objectives. Specifically, we introduce a contrastive pixel-level objectness constraint that pulls the pixel representations within a region of an object instance closer, while pushing those from different object categories apart. 
To obtain object regions consistent with the true underlying object, we extract information from both depth maps and RGB-images in the form of multimodal clustering. Crucially, the objectness constraint is agnostic to the ground-truth semantic labels and, hence, appropriate for unsupervised domain adaptation. In this work, we show that our regularizer significantly improves top performing self-training methods (by up to $2$ points) in various UDA benchmarks for semantic segmentation. We include all code in the supplementary.	翻訳日:2023-05-02 17:00:02 公開日:2023-04-29
# ViewFormer:多視点3次元形状理解のためのビューセット注意 ViewFormer: View Set Attention for Multi-view 3D Shape Understanding ( http://arxiv.org/abs/2305.00161v1 ) ライセンス: Link先を確認	Hongyu Sun, Yongcai Wang, Peng Wang, Xudong Cai, Deying Li	(参考訳) 本稿では,多次元形状認識と検索のための簡易かつ効果的なモデルであるViewFormerを提案する。マルチビュー情報を集約する既存の手法を体系的に検討し,ビューに関する関係仮定を最小化し,表現の自由度を解放する,新しい「ビューセット」視点を提案する。我々は、ビューセット内の要素のペアワイズおよび高次相関を捉えるための適応的注意モデルを作成する。学習されたマルチビュー相関は、認識および検索のための表現型ビューセット記述子に集約される。実験では、異なるタスクやデータセットにまたがる驚くべき機能を解き放つ方法を示した。例えば、2つのアテンションブロックと4.8mの学習可能なパラメータを持つviewformerは、modelnet40で初めて98.8%の認識精度に達し、以前のベストメソッドを1.1%上回った。難易度の高いRGBDデータセットでは、98.4%の認識精度が達成され、最強のベースラインに対して4.1%の絶対改善が達成された。 ViewFormerはまた、SHREC'17ベンチマークで定義された3次元形状検索のいくつかの評価次元で新しいレコードを設定する。 This paper presents ViewFormer, a simple yet effective model for multi-view 3D shape recognition and retrieval. We systematically investigate the existing methods for aggregating multi-view information and propose a novel "view set" perspective, which minimizes the relation assumption about the views and releases the representation flexibility. We devise an adaptive attention model to capture pairwise and higher-order correlations of the elements in the view set. The learned multi-view correlations are aggregated into an expressive view set descriptor for recognition and retrieval. Experiments show the proposed method unleashes surprising capabilities across different tasks and datasets. For instance, with only 2 attention blocks and 4.8M learnable parameters, ViewFormer reaches 98.8% recognition accuracy on ModelNet40 for the first time, exceeding the previous best method by 1.1%. On the challenging RGBD dataset, our method achieves 98.4% recognition accuracy, which is a 4.1% absolute improvement over the strongest baseline. ViewFormer also sets new records in several evaluation dimensions of 3D shape retrieval defined on the SHREC'17 benchmark.	翻訳日:2023-05-02 16:50:46 公開日:2023-04-29
# ランダムな特徴を持つグラフカーネルの改ざん Taming graph kernels with random features ( http://arxiv.org/abs/2305.00156v1 ) ライセンス: Link先を確認	Krzysztof Choromanski	(参考訳) 本稿では,グラフランダム特徴(GRF)のメカニズムを紹介する。 GRFはグラフのノード上で定義されたいくつかの重要なカーネル、特に正規化されたラプラシアカーネルの非バイアスランダム化推定器を構築するのに使うことができる。非グラフカーネルの通常のRFとして、グラフ上で定義されたカーネルメソッドをより大きなネットワークにスケールアップする手段を提供する。重要なのは、下流のアプリケーションに適用しながら、より小さなグラフに対してもかなりの計算量が得られることだ。その結果、GRFはグラフカーネルアルゴリズムの3乗(グラフのノード数)時間複雑性の非常に難しい問題に対処する。速度テストからフロベニウス相対誤差解析から、グラフカーネルを用いたkmeansグラフクラスタリングまで、幅広い経験的評価を行った。 GRFの計算は、考慮中のグラフを複数のマシンに分割する必要がある場合に適用可能な、恥ずかしいほど単純な分散アルゴリズムを許容していることを示す。我々はまた、GRFの分散を最適化するために用いられる、いわゆる強化ランダムウォークに依存する(まだバイアスのない)準モンテカルロ変種 q-GRF も導入する。副産物として、正および対称行列を持つ線型方程式のある種のクラスを解く新しいアプローチを得る。 We introduce in this paper the mechanism of graph random features (GRFs). GRFs can be used to construct unbiased randomized estimators of several important kernels defined on graphs' nodes, in particular the regularized Laplacian kernel. As regular RFs for non-graph kernels, they provide means to scale up kernel methods defined on graphs to larger networks. Importantly, they give substantial computational gains also for smaller graphs, while applied in downstream applications. Consequently, GRFs address the notoriously difficult problem of cubic (in the number of the nodes of the graph) time complexity of graph kernels algorithms. We provide a detailed theoretical analysis of GRFs and an extensive empirical evaluation: from speed tests, through Frobenius relative error analysis to kmeans graph-clustering with graph kernels. We show that the computation of GRFs admits an embarrassingly simple distributed algorithm that can be applied if the graph under consideration needs to be split across several machines. We also introduce a (still unbiased) quasi Monte Carlo variant of GRFs, q-GRFs, relying on the so-called reinforced random walks, that might be used to optimize the variance of GRFs. 
As a byproduct, we obtain a novel approach to solve certain classes of linear equations with positive and symmetric matrices.	翻訳日:2023-05-02 16:50:28 公開日:2023-04-29
# 学習から学びへ:非確率的外乱に反するマルチエージェントのオンラインソース Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances ( http://arxiv.org/abs/2305.00154v1 ) ライセンス: Link先を確認	Bin Du and Kun Qian and Christian Claudel and Dengfeng Sun	(参考訳) 本稿では,新しい学習手法を活用し,未知環境下でのマルチエージェントオンライン検索アルゴリズムを提案する。問題設定における特に重要な点は一基礎となる環境は、未知だけでなく、動的に変化し、二種類の非確率的障害に悩まされていること。二エージェントの集団が配置され、できるだけ多くの情報源を協力的に探究することが期待されていること。そこで,非確率的障害に対処するために,割引カルマンフィルタの新たな手法を開発し,ポリトープの性質に結びついた信頼感の概念を用いて,マルチプルエージェント間の計算効率のよい協調を支援する。未知の環境と乱れに関する標準的な仮定により、我々のアルゴリズムは2種類の非確率的乱れのタイプの下で線形的後悔を達成し、どちらも最先端のものと同等である。本手法の有効性を示すために,実環境汚染モニタリングアプリケーションの数値例を示した。 This paper proposes to leverage emerging learning techniques and devise a multi-agent online source seeking algorithm in an unknown environment. Of particular significance in our problem setups are: i) the underlying environment is not only unknown, but dynamically changing and also perturbed by two types of non-stochastic disturbances; and ii) a group of agents is deployed and expected to cooperatively seek as many sources as possible. Correspondingly, a new technique of discounted Kalman filter is developed to tackle the non-stochastic disturbances, and a notion of confidence bound in polytope nature is utilized to aid the computation-efficient cooperation among multiple agents. With standard assumptions on the unknown environment as well as the disturbances, our algorithm is shown to achieve sub-linear regrets under the two types of non-stochastic disturbances; both results are comparable to the state-of-the-art. Numerical examples on a real-world pollution monitoring application are provided to demonstrate the effectiveness of our algorithm.	翻訳日:2023-05-02 16:50:07 公開日:2023-04-29
# 転校学習におけるモデル選択の限界 Limits of Model Selection under Transfer Learning ( http://arxiv.org/abs/2305.00152v1 ) ライセンス: Link先を確認	Steve Hanneke, Samory Kpotufe, Yasaman Mahdaviyeh	(参考訳) 転送学習やドメイン適応に関する理論的研究はこれまで、既知の仮説クラスやモデルでの状況に焦点を当ててきたが、実際には、いくつかのモデル選択は、通常、ハイパーパラメータチューニング(hyperparameter-tuning)という包括的用語の下に現れる。現在、モデル選択に関わる近似と推定誤差の通常のトレードオフに加えて、この問題は新たな複雑性項、すなわち、ソースとターゲットの分布間の移動距離が仮説クラスの選択によって異なることが知られている。特に、分析によって注目すべき現象が明らかになる: 適応率、すなわち、分布情報を持たないもの、すなわち、距離に関する知識が与えられたとき、oracleの速度よりも任意に遅い可能性がある。 Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.	翻訳日:2023-05-02 16:49:52 公開日:2023-04-29
# X線認識:コントラスト目的を用いたX線からの患者識別 X-ray Recognition: Patient identification from X-rays using a contrastive objective ( http://arxiv.org/abs/2305.00149v1 ) ライセンス: Link先を確認	Hao Liang, Kevin Ni, Guha Balakrishnan	(参考訳) 近年の研究では、深層学習モデルは患者の胸部X線(CXR)から生体情報(人種、性別、年齢など)を正確に抽出できることが示されている。本稿ではさらに,同一患者に属するcxrと異なる患者に属するcxrとの識別において,ディープラーニングモデルが驚くほど正確であることを示す。これらの結果は、医療画像コミュニティが大規模なCXRデータベースの普及に関して考慮すべき潜在的なプライバシー上の配慮を示唆している。 Recent research demonstrates that deep learning models are capable of precisely extracting bio-information (e.g. race, gender and age) from patients' Chest X-Rays (CXRs). In this paper, we further show that deep learning models are also surprisingly accurate at recognition, i.e., distinguishing CXRs belonging to the same patient from those belonging to different patients. These findings suggest potential privacy considerations that the medical imaging community should consider with the proliferation of large public CXR databases.	翻訳日:2023-05-02 16:49:13 公開日:2023-04-29
# GANを用いた胸部X線データセットバイアスの可視化 Visualizing chest X-ray dataset biases using GANs ( http://arxiv.org/abs/2305.00147v1 ) ライセンス: Link先を確認	Hao Liang, Kevin Ni, Guha Balakrishnan	(参考訳) 最近の研究では、様々な胸部X線データセットの画像には、人種や性別といった保護された人口特性と強く相関する視覚的特徴が含まれていることが示されている。これらの要因のいくつかは臨床予測のために下流アルゴリズムによって使用される可能性があるため、この発見は公平性の問題を提起する。本研究では,2つの層群に属するX線に最も異なる特徴を可視化するために,GAN(Generative Adversarial Network)を用いたフレームワークを提案する。 Recent work demonstrates that images from various chest X-ray datasets contain visual features that are strongly correlated with protected demographic attributes like race and gender. This finding raises issues of fairness, since some of these factors may be used by downstream algorithms for clinical predictions. In this work, we propose a framework, using generative adversarial networks (GANs), to visualize what features are most different between X-rays belonging to two demographic subgroups.	翻訳日:2023-05-02 16:49:04 公開日:2023-04-29
# 量子型理論とアプリケーション、モデル、アルゴリズム、コンパイル、エラー訂正のキャズムを統合する Integrating Across Application, Model, Algorithm, Compilation, and Error Correction Chasms With Quantum Type Theory ( http://arxiv.org/abs/2305.00144v1 ) ライセンス: Link先を確認	Eugene Dumitrescu	(参考訳) 本稿では,量子型理論の現状と今後の計算的意味について概説する。 We briefly discuss the current state, and future computational implications, of quantum type theory.	翻訳日:2023-05-02 16:48:57 公開日:2023-04-29
# 連続予測2サンプルと独立試験 Sequential Predictive Two-Sample and Independence Testing ( http://arxiv.org/abs/2305.00143v1 ) ライセンス: Link先を確認	Aleksandr Podkopaev, Aaditya Ramdas	(参考訳) 逐次非パラメトリック2サンプルと独立テストの問題点について検討する。シーケンシャルテストはデータをオンラインで処理し、観測データを使用してヌル仮説を停止または拒否するか、タイプiのエラーコントロールを維持しながらより多くのデータを収集するかを決定する。我々は賭けによる(非パラメトリックな)テストの原理に基づいており、ギャンブラーは将来の観測に賭け、その富はヌル仮説に対する証拠を測定する。最近開発されたカーネルベースのベッティング戦略は、単純な分布でよく機能するが、テキストや画像のような高次元または構造化データに適したカーネルを選択することは、しばしば簡単ではない。この欠点に対処するために、我々は次の事実に依存する予測ベースの賭け戦略を設計する。 (a) インスタンスが引き出されるもの,又は b) インスタンスがジョイント分布またはマージン分布の積から引き出されるか(後者は外部ランダム化によって生成される)、それぞれ2つのサンプルまたは独立ヌルに対する証拠を提供する。構造化された設定下でのカーネルベースのアプローチよりもテストが優れていることを実証的に示す。我々のテストは、独立で同一の分散データ以外に適用でき、データ分布が時間とともにドリフトしても有効で強力なままです。 We study the problems of sequential nonparametric two-sample and independence testing. Sequential tests process data online and allow using observed data to decide whether to stop and reject the null hypothesis or to collect more data while maintaining type I error control. We build upon the principle of (nonparametric) testing by betting, where a gambler places bets on future observations and their wealth measures evidence against the null hypothesis. While recently developed kernel-based betting strategies often work well on simple distributions, selecting a suitable kernel for high-dimensional or structured data, such as text and images, is often nontrivial. To address this drawback, we design prediction-based betting strategies that rely on the following fact: if a sequentially updated predictor starts to consistently determine (a) which distribution an instance is drawn from, or (b) whether an instance is drawn from the joint distribution or the product of the marginal distributions (the latter produced by external randomization), it provides evidence against the two-sample or independence nulls respectively. We empirically demonstrate the superiority of our tests over kernel-based approaches under structured settings. 
Our tests can be applied beyond the case of independent and identically distributed data, remaining valid and powerful even when the data distribution drifts over time.	翻訳日:2023-05-02 16:48:54 公開日:2023-04-29
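The testing-by-betting principle described in the abstract above can be illustrated with a minimal sketch. It assumes a fixed betting fraction `lam` and payoffs in [-1, 1] whose mean is at most zero under the null (for example, +1 when a predictor guesses correctly which sample an instance came from and -1 otherwise). The function name `betting_test`, the fixed fraction, and the toy payoff streams are illustrative assumptions, not the authors' betting strategy. The null is rejected once the wealth reaches 1/alpha, which keeps the type I error at level alpha by Ville's inequality.

```python
def betting_test(payoffs, alpha=0.05, lam=0.5):
    """Sequential test by betting with a fixed betting fraction (illustrative).

    payoffs: iterable of values in [-1, 1] with mean <= 0 under the null.
    Returns (rejected, stopping_time, final_wealth).
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1) so wealth stays positive")
    wealth = 1.0
    t = 0
    for t, g in enumerate(payoffs, start=1):
        wealth *= 1.0 + lam * g          # bet a fraction lam of current wealth
        if wealth >= 1.0 / alpha:        # threshold from Ville's inequality
            return True, t, wealth
    return False, t, wealth


# A predictor that is always right: wealth grows by a factor 1.5 per round.
rejected, stop, wealth = betting_test([1.0] * 20)
# A stream on which the predictor is right half the time: no rejection.
rejected_null, _, _ = betting_test([1.0, -1.0] * 10)
```

In practice the betting fraction would be adapted online rather than fixed; the fixed value here only keeps the sketch short.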
# 3-wiseケメニー問題に対する空間削減技術 Space reduction techniques for the $3$-wise Kemeny problem ( http://arxiv.org/abs/2305.00140v1 ) ライセンス: Link先を確認	Xuan Kien Phung and Sylvie Hamel	(参考訳) ケメニーの法則は、計算社会選択と生物学に様々な重要な応用がある最も研究されよく知られた投票方式の1つである。近年、ケメニーの法則はギルバートらによる集合的アプローチによって一般化された。このパラダイムに従い、我々は [Phung-Hamel-2023] において、3-wiseケンドール-タウ距離によって引き起こされる3-wiseケメニー投票スキームが古典的なケメニー規則と比較して興味深い利点を示していることを示した。投票プロファイルの3-wiseコンセンサスランキングを計算することからなる3-wiseケメニー問題はNP-hardであるが、本論文では、従来のケメニールールに対する [Milosz-Hamel-2020] で得られた主要な次数定理のいくつかの一般化を、多項式時間で相対次数を効率的に決定することにより、実質的な検索空間削減を達成するための3-wiseケメニー投票スキームのために確立する。本質的には、我々の定理は、選挙において別の選択肢よりも別の選択肢の選好が十分強く、また1つまたは2つの選択肢を考慮しても十分強い場合、これら2つの選択肢の相対順序が3-wiseコンセンサスランキングで期待通りである、という非自明な性質を正確に定量化する。さらに、古典的なケメニー規則に対するベツラーらの有名な3/4マジョリティルールは、3-wiseケメニー・スキームに関して選択肢が5個以下の選挙に対してのみ有効であることを示す。 3-wiseケメニールールは、古典的なルールよりも操作に抵抗があることを示す例もある。 Kemeny's rule is one of the most studied and well-known voting schemes with various important applications in computational social choice and biology. Recently, Kemeny's rule was generalized via a set-wise approach by Gilbert et al. Following this paradigm, we have shown in [Phung-Hamel-2023] that the $3$-wise Kemeny voting scheme induced by the $3$-wise Kendall-tau distance presents interesting advantages in comparison with the classical Kemeny rule. While the $3$-wise Kemeny problem, which consists of computing the set of $3$-wise consensus rankings of a voting profile, is NP-hard, we establish in this paper several generalizations of the Major Order Theorems, as obtained in [Milosz-Hamel-2020] for the classical Kemeny rule, for the $3$-wise Kemeny voting scheme to achieve a substantial search space reduction by efficiently determining in polynomial time the relative orders of pairs of alternatives. 
Essentially, our theorems quantify precisely the non-trivial property that if the preference for an alternative over another one in an election is strong enough, not only in the head-to-head competition but even when taking into consideration one or two more alternatives, then the relative order of these two alternatives in every $3$-wise consensus ranking must be as expected. Moreover, we show that the well-known $3/4$-majority rule of Betzler et al. for the classical Kemeny rule is only valid for elections with no more than $5$ alternatives with respect to the $3$-wise Kemeny scheme. Examples are also provided to show that the $3$-wise Kemeny rule is more resistant to manipulation than the classical one.	翻訳日:2023-05-02 16:48:32 公開日:2023-04-29
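For reference, the classical Kemeny rule that the abstract above generalizes can be sketched by brute force: a consensus ranking is one that minimizes the total Kendall-tau distance to the votes. The sketch below covers only the classical (pairwise) rule, not the $3$-wise variant, and the function names and toy profile are illustrative assumptions. Enumerating all permutations is feasible only for a handful of alternatives, which is why results that fix the relative order of pairs in advance are useful.

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Number of pairs of alternatives ordered differently by r1 and r2."""
    pos1 = {a: i for i, a in enumerate(r1)}
    pos2 = {a: i for i, a in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def kemeny_consensus(profile):
    """Return (best_score, sorted list of all rankings achieving it)."""
    alternatives = sorted(profile[0])
    scored = [
        (sum(kendall_tau(cand, vote) for vote in profile), cand)
        for cand in permutations(alternatives)
    ]
    best = min(score for score, _ in scored)
    return best, sorted(cand for score, cand in scored if score == best)


# Toy profile: three voters ranking alternatives a, b, c (best first).
profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
score, consensus = kemeny_consensus(profile)
```

On this toy profile the ranking a, b, c is the unique minimizer, with a total distance of 2.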
# グラフニューラルネットワークにおけるノード分類のためのラベル非均一性の活用 Leveraging Label Non-Uniformity for Node Classification in Graph Neural Networks ( http://arxiv.org/abs/2305.00139v1 ) ライセンス: Link先を確認	Feng Ji and See Hian Lee and Hanyang Meng and Kai Zhao and Jielong Yang and Wee Peng Tay	(参考訳) グラフニューラルネットワーク(GNN)を用いたノード分類では、典型的なモデルは各ノードで異なるクラスラベルのログを生成する。ソフトマックス層はしばしば最大のロジットに基づいてラベル予測を出力する。これらのロジットを用いてデータセットから隠れたグラフ構造情報を推測できることを実証する。本稿では,ロジットのソフトマックス分布と均一分布との間のワッサーシュタイン距離から導かれるラベル非均一性の鍵となる概念を紹介する。ラベルの不均一性の低いノードを正しく分類することは困難である。我々は,ラベルの非一様性がグラフ全体でどのように変化するのかを理論的に分析し,モデル性能の向上に関する洞察を与える: トレーニングサンプルを高一様性で増加させるか,あるいは、エッジを落として、小さな非一様性のノードセットの最大カットサイズを小さくする。これらのメカニズムはベースGNNモデルに簡単に追加できる。実験により,多くのベンチマークベースモデルの性能向上が示された。 In node classification using graph neural networks (GNNs), a typical model generates logits for different class labels at each node. A softmax layer often outputs a label prediction based on the largest logit. We demonstrate that it is possible to infer hidden graph structural information from the dataset using these logits. We introduce the key notion of label non-uniformity, which is derived from the Wasserstein distance between the softmax distribution of the logits and the uniform distribution. We demonstrate that nodes with small label non-uniformity are harder to classify correctly. We theoretically analyze how the label non-uniformity varies across the graph, which provides insights into boosting the model performance: increasing training samples with high non-uniformity or dropping edges to reduce the maximal cut size of the node set of small non-uniformity. These mechanisms can be easily added to a base GNN model. Experimental results demonstrate that our approach improves the performance of many benchmark base models.	翻訳日:2023-05-02 16:47:55 公開日:2023-04-29
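Label non-uniformity, as described in the abstract above, can be sketched for a single node as follows. This is a minimal illustration under a stated assumption: the ground metric on class labels is taken to be the discrete 0/1 metric, under which the Wasserstein-1 distance reduces to the total variation distance. The abstract does not specify the ground metric, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_non_uniformity(logits):
    """W1 distance between softmax(logits) and the uniform distribution,
    assuming a 0/1 ground metric on labels (equals total variation)."""
    probs = softmax(logits)
    u = 1.0 / len(probs)
    return 0.5 * sum(abs(p - u) for p in probs)


uncertain = label_non_uniformity([0.0, 0.0, 0.0])    # uniform prediction
confident = label_non_uniformity([10.0, 0.0, 0.0])   # near one-hot prediction
```

Under this assumption the value ranges from 0 (uniform prediction) to 1 - 1/C for C classes (one-hot prediction); nodes with values near 0 would be the ones the abstract describes as harder to classify.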
# 線形回帰のためのデータ駆動サブグループ同定 Data-Driven Subgroup Identification for Linear Regression ( http://arxiv.org/abs/2305.00195v1 ) ライセンス: Link先を確認	Zachary Izzo, Ruishan Liu, James Zou	(参考訳) 医学研究はしばしば、それぞれの共変量と統計的信頼度尺度による結果の関係を抽出する必要がある。これを実現するために、単純なパラメトリックモデルは頻繁に使用される(例えば線形回帰係数)が、通常はデータセット全体に適合する。しかし、共変量体が全集団に対して一様効果を持たず、従って統一された単純なモデルが異種信号を見逃すことはよくある。例えば、線形モデルはデータのサブセットを説明することができるが、データの非線形性と不均一性のために残りの部分で失敗することがある。本稿では,データ中の部分群を特徴とラベル間の一様線形関係で効果的に識別するデータ駆動手法であるddgroup(data-driven group discovery)を提案する。 DDGroupは線形モデルが保持されるであろう解釈可能な領域を出力する。簡単に実装でき、計算処理も可能である。理論的には, 十分なサンプルを与えられたddgroupは, 低分散の1つの線形モデルが十分に特定された領域を回復し, 実世界の医療データセット実験により, 局所線形モデルの性能が向上した領域を発見できることを確認した。実験の結果,DDGroupはデータセット全体にパラメトリックなアプローチを適用するだけで,質的に異なる関係を持つサブグループを発見できることがわかった。 Medical studies frequently require to extract the relationship between each covariate and the outcome with statistical confidence measures. To do this, simple parametric models are frequently used (e.g. coefficients of linear regression) but usually fitted on the whole dataset. However, it is common that the covariates may not have a uniform effect over the whole population and thus a unified simple model can miss the heterogeneous signal. For example, a linear model may be able to explain a subset of the data but fail on the rest due to the nonlinearity and heterogeneity in the data. In this paper, we propose DDGroup (data-driven group discovery), a data-driven method to effectively identify subgroups in the data with a uniform linear relationship between the features and the label. DDGroup outputs an interpretable region in which the linear model is expected to hold. It is simple to implement and computationally tractable for use. We show theoretically that, given a large enough sample, DDGroup recovers a region where a single linear model with low variance is well-specified (if one exists), and experiments on real-world medical datasets confirm that it can discover regions where a local linear model has improved performance. 
Our experiments also show that DDGroup can uncover subgroups with qualitatively different relationships which are missed by simply applying parametric approaches to the whole dataset.	翻訳日:2023-05-02 16:42:14 公開日:2023-04-29
# 領域からポイントへの探索:セマンティック・ジオメトリ複合機能マッチングのための階層的フレームワーク Searching from Area to Point: A Hierarchical Framework for Semantic-Geometric Combined Feature Matching ( http://arxiv.org/abs/2305.00194v1 ) ライセンス: Link先を確認	Yesheng Zhang, Xu Zhao, Dahong Qian	(参考訳) 特徴マッチングはコンピュータビジョンにおいて重要な技術である。本質的には、画像間の対応を確立するための探索問題と見なすことができる。このタスクにおける重要な課題は、明確に定義された検索空間の欠如であり、現在のメソッドの不正確なポイントマッチングにつながる。本稿では,適切なマッチング検索空間を求めて,まず画像間の意味的領域マッチング(a2pm)を探索し,次に領域マッチングを行う階層的特徴マッチングフレームワークを提案する。 A2PMフレームワークの適切な検索空間は、最先端のTransformerベースのマッチング手法の精度の制限を緩和する。この枠組みを実現するために、画像間の正確な領域マッチングを確立するために、意味的前後整合性と幾何学的一貫性を利用した意味的・幾何学的領域マッチング(sgam)手法を提案する。 SGAMとオフザシェルトランスフォーマーベースのマーカを組み合わせることで,A2PMフレームワークを取り入れた特徴マッチング手法により,大規模点マッチングの精度向上と,現在の美術品のポーズ推定実験を実現する。 Feature matching is a crucial technique in computer vision. Essentially, it can be considered as a searching problem to establish correspondences between images. The key challenge in this task lies in the lack of a well-defined search space, leading to inaccurate point matching of current methods. In pursuit of a reasonable matching search space, this paper introduces a hierarchical feature matching framework: Area to Point Matching (A2PM), to first find semantic area matches between images, and then perform point matching on area matches, thus setting the search space as the area matches with salient features to achieve high matching precision. This proper search space of A2PM framework also alleviates the accuracy limitation in state-of-the-art Transformer-based matching methods. To realize this framework, we further propose Semantic and Geometry Area Matching (SGAM) method, which utilizes semantic prior and geometry consistency to establish accurate area matches between images. By integrating SGAM with off-the-shelf Transformer-based matchers, our feature matching methods, adopting the A2PM framework, achieve encouraging precision improvements in massive point matching and pose estimation experiments for present arts.	翻訳日:2023-05-02 16:41:51 公開日:2023-04-29
# 血管構造の異常観察のためのリアルタイム表面静脈イメージングシステム Real-Time Superficial Vein Imaging System for Observing Abnormalities on Vascular Structures ( http://arxiv.org/abs/2305.00189v1 ) ライセンス: Link先を確認	Ayse Altay, Abdurrahman Gumus	(参考訳) 循環系異常は疾患や組織障害の指標である。血管異常の早期発見は治療中に重要な役割を担い、また患者の意識を高める可能性がある。血管画像の現在の検出方法は、高価で侵襲的で、主に放射線によるものである。本研究では,近赤外(NIR)表面血管イメージング装置として,低コストでポータブルなマイクロコンピュータベースのツールを開発した。デバイスは850nmのnir発光ダイオード(led)光と他の電子部品と光学部品を使用する。非接触で安全な赤外線イメージング(IR)をリアルタイムで行う。画像および映像解析は、主にコンピュータビジョンで使用されるプログラミング関数のライブラリであるopencv(open-source computer vision)を用いて行われる。撮像システムを最適化し、適切な外部環境を構築するために様々な試験が行われた。血液中のグルコース濃度の上昇による変形の可能性から血管構造に異常があると思われる3人の糖尿病ボランティアの画像を,非糖尿病ボランティアの2人の画像と比較した。その結果, 表面血管構造においてtortuosityが良好に観察され, 基礎的理由を理解するためには, 現場の医療専門家による解釈が必要である。本研究は, 工学的な研究であり, 疾患を診断する意図はないが, 血管構造の早期診断, 治療フォローアップにおいて医療従事者を支援し, さらなる機会を期待できる。 Circulatory system abnormalities might be an indicator of diseases or tissue damage. Early detection of vascular abnormalities might have an important role during treatment and also raise the patient's awareness. Current detection methods for vascular imaging are high-cost, invasive, and mostly radiation-based. In this study, a low-cost and portable microcomputer-based tool has been developed as a near-infrared (NIR) superficial vascular imaging device. The device uses NIR light-emitting diode (LED) light at 850 nm along with other electronic and optical components. It operates as a non-contact and safe infrared (IR) imaging method in real-time. Image and video analysis are carried out using OpenCV (Open-Source Computer Vision), a library of programming functions mainly used in computer vision. Various tests were carried out to optimize the imaging system and set up a suitable external environment. 
To test the performance of the device, the images taken from three diabetic volunteers, who are expected to have abnormalities in the vascular structure due to the possibility of deformation caused by high glucose levels in the blood, were compared with the images taken from two non-diabetic volunteers. As a result, tortuosity was observed successfully in the superficial vascular structures, where the results need to be interpreted by the medical experts in the field to understand the underlying reasons. Although this study is an engineering study and does not have an intention to diagnose any diseases, the developed system here might assist healthcare personnel in early diagnosis and treatment follow-up for vascular structures and may enable further opportunities.	翻訳日:2023-05-02 16:41:33 公開日:2023-04-29
# 整数線形計画法の局所探索 Local Search for Integer Linear Programming ( http://arxiv.org/abs/2305.00188v1 ) ライセンス: Link先を確認	Peng Lin, Shaowei Cai, Mengchuan Zou, Jinkun Lin	(参考訳) 整数線形プログラミングは、様々な実用的な組合せ最適化問題をモデル化し、産業や管理分野に大きな影響を与えている。本研究では,大規模不均一問題データセット上で検証可能な一般整数線形計画のための,最初の単独局所探索ソルバを開発した。本研究では,検索モード,改善モード,復元モードの3つのモードに切り替えるローカル検索フレームワークを提案する。探索・復元モードについては,制約を厳格にしようとする変数の値を適応的に修正する,tight moveという演算子を提案する。改良モードでは, 有効性を維持しつつ, 目的関数の品質向上を図るために, 効率的な昇降動作が提案されている。これらを組み合わせることで、ローカルILPと呼ばれる整数線形プログラミングのための局所探索解法を開発する。 MIPLIBデータセットで行った実験は,大規模ハード整数線形計画問題の解法の有効性を合理的に短時間で示すものである。ローカルILPは最先端の商用ソルバであるGurobiと競合し相補的であり、最先端の非商用ソルバSCIPを著しく上回っている。さらに,6つのMIPLIBオープンインスタンスの新たなレコードを確立する。 Integer linear programming models a wide range of practical combinatorial optimization problems and has significant impacts in industry and management sectors. This work develops the first standalone local search solver for general integer linear programming validated on a large heterogeneous problem dataset. We propose a local search framework that switches in three modes, namely Search, Improve, and Restore modes, and design tailored operators adapted to different modes, thus improve the quality of the current solution according to different situations. For the Search and Restore modes, we propose an operator named tight move, which adaptively modifies variables' values trying to make some constraint tight. For the Improve mode, an efficient operator lift move is proposed to improve the quality of the objective function while maintaining feasibility. Putting these together, we develop a local search solver for integer linear programming called Local-ILP. Experiments conducted on the MIPLIB dataset show the effectiveness of our solver in solving large-scale hard integer linear programming problems within a reasonably short time. Local-ILP is competitive and complementary to the state-of-the-art commercial solver Gurobi and significantly outperforms the state-of-the-art non-commercial solver SCIP. 
Moreover, our solver establishes new records for 6 MIPLIB open instances.	翻訳日:2023-05-02 16:41:10 公開日:2023-04-29
# 欧州の報道機関「Covid-19no-Vax運動」の調査:NLPフレームワーク Examining European Press Coverage of the Covid-19 No-Vax Movement: An NLP Framework ( http://arxiv.org/abs/2305.00182v1 ) ライセンス: Link先を確認	David Alonso del Barrio and Daniel Gatica-Perez	(参考訳) 本稿は、欧州の報道機関がコビッドウイルスワクチンに対するノバックス反応と、この動きに関連する偽情報と偽情報にどう対処したかを検討する。 2020-2021年の22ヶ月にわたる反ワクチン運動に関する19のヨーロッパの新聞の1786の記事のキュレーションデータセットを用いて、トピックモデリング、感情分析、単語埋め込みとの意味関係、政治的分析、名前付きエンティティ認識、意味ネットワークといった自然言語処理技術を用いて、欧州伝統的メディアの偽情報エコシステムにおける特定の役割を理解した。この多角的分析の結果、ヨーロッパの報道機関は、主にソーシャルメディアに広がる様々なホックスに積極的に反対し、新聞の政治的指向に関係なく、反バックスの傾向に批判的であった。これは、偽情報生態系における高品質プレスの役割を研究することの意義を裏付けるものである。 This paper examines how the European press dealt with the no-vax reactions against the Covid-19 vaccine and the dis- and misinformation associated with this movement. Using a curated dataset of 1786 articles from 19 European newspapers on the anti-vaccine movement over a period of 22 months in 2020-2021, we used Natural Language Processing techniques including topic modeling, sentiment analysis, semantic relationship with word embeddings, political analysis, named entity recognition, and semantic networks, to understand the specific role of the European traditional press in the disinformation ecosystem. The results of this multi-angle analysis demonstrate that the European well-established press actively opposed a variety of hoaxes mainly spread on social media, and was critical of the anti-vax trend, regardless of the political orientation of the newspaper. This confirms the relevance of studying the role of high-quality press in the disinformation ecosystem.	翻訳日:2023-05-02 16:40:48 公開日:2023-04-29
# TAPE:時間的注意に基づく確率的人間のポーズと形状推定 TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation ( http://arxiv.org/abs/2305.00181v1 ) ライセンス: Link先を確認	Nikolaos Vasilikopoulos, Nikos Kolotouros, Aggeliki Tsoli, Antonis Argyros	(参考訳) モノクロビデオから3Dのポーズと形状を再構築することは、よく研究されているが難しい問題だ。一般的な課題として、オクルージョン、2Dから3Dマッピングにおける固有の曖昧さ、ビデオ処理の計算複雑性などがある。既存の手法では復元のあいまいさを無視し、3Dポーズの1つの決定論的推定を提供する。これらの問題に対処するため、RGBビデオで動作する時間的注意に基づく確率的人間のポーズと形状推定法(TAPE)を提案する。具体的には,注意に基づくニューラルネットワークを用いて映像フレームを時間的特徴にエンコードするニューラルネットワークを提案する。これらの特徴を考慮し、正規化フローを用いた人間のポーズに対するフレーム単位の時間的インフォームド確率分布を出力する。テープは標準ベンチマークで最先端の手法よりも優れており、最適化に基づく人間のポーズや形状推定に有効なビデオベースプリエントとして機能する。 https://github.com/nikosvasilik/TAPE Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose an attention-based neural network that encodes video frames into temporal features. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https://github.com/nikosvasilik/TAPE	翻訳日:2023-05-02 16:40:29 公開日:2023-04-29
# 広範学習システムに基づく実時間マルチモード障害診断手法の提案 An Evidential Real-Time Multi-Mode Fault Diagnosis Approach Based on Broad Learning System ( http://arxiv.org/abs/2305.00169v1 ) ライセンス: Link先を確認	Chen Li and Zeyi Liu and Limin Wang and Minyue Li and Xiao He	(参考訳) 故障診断は、非ゲージ、マルチモード、センタードリフト特性を示す多様な動作条件により、業界で重要な研究領域である。現在、データ駆動アプローチはこの分野で主に注目されているが、連続的な障害分類や障害分類器のパラメータ更新、特に複数の運用モードやリアルタイム設定といった課題を提起している。したがって, 産業システムにおけるリアルタイムマルチモード故障診断の実現が課題である。本稿では,エビデンス推論(er)アルゴリズムを用いて,異なるベース分類器からの情報を融合し,出力をマージする新しい手法を提案する。これらのベース分類器を広範学習システム(bls)を用いて開発し、故障診断性能を向上させる。さらに,本手法では,モデルパラメータをリアルタイムで更新するために擬似ラベル学習法を用いる。提案手法の有効性を実証するため,マルチモードのテネシー・イーストマンプロセスデータセットを用いて実験を行った。 Fault diagnosis is a crucial area of research in the industry due to diverse operating conditions that exhibit non-Gaussian, multi-mode, and center-drift characteristics. Currently, data-driven approaches are the main focus in the field, but they pose challenges for continuous fault classification and parameter updates of fault classifiers, particularly in multiple operating modes and real-time settings. Therefore, a pressing issue is to achieve real-time multi-mode fault diagnosis for industrial systems. To address this problem, this paper proposes a novel approach that utilizes an evidence reasoning (ER) algorithm to fuse information and merge outputs from different base classifiers. These base classifiers are developed using a broad learning system (BLS) to improve good fault diagnosis performance. Moreover, in this approach, the pseudo-label learning method is employed to update model parameters in real-time. To demonstrate the effectiveness of the proposed approach, we perform experiments using the multi-mode Tennessee Eastman process dataset.	翻訳日:2023-05-02 16:40:13 公開日:2023-04-29
# RRAMと人工知能のための酸化物層としての金属酸化物の複合 The Combination of Metal Oxides as Oxide Layers for RRAM and Artificial Intelligence ( http://arxiv.org/abs/2305.00166v1 ) ライセンス: Link先を確認	Sun Hanyu	(参考訳) 抵抗性ランダムアクセスメモリ(RRAM)は、高速、低消費電力、スケーラビリティに優れた次世代メモリデバイスにとって有望な候補である。金属酸化物は、高い誘電率と安定性のため、RRAM装置の酸化物層として一般的に用いられる。しかし、RRAMデバイスの性能をさらに向上させるため、最近の研究は人工知能(AI)の統合に焦点を当てている。 AIはRRAMデバイスのパフォーマンスの最適化に使用することができ、RRAMはハードウェアアクセラレータやニューロモルフィックコンピューティングでAIを駆動することもできる。本稿では,金属酸化物をベースとしたRRAMとAIの組み合わせについて概説する。我々は、RRAMデバイスの性能向上のためのAIの使用と、AIを駆動するRRAMの使用について論じる。さらに、この分野の重要な課題に取り組み、今後の研究方向性に関する洞察を提供する。 Resistive random-access memory (RRAM) is a promising candidate for next-generation memory devices due to its high speed, low power consumption, and excellent scalability. Metal oxides are commonly used as the oxide layer in RRAM devices due to their high dielectric constant and stability. However, to further improve the performance of RRAM devices, recent research has focused on integrating artificial intelligence (AI). AI can be used to optimize the performance of RRAM devices, while RRAM can also power AI as a hardware accelerator and in neuromorphic computing. This review paper provides an overview of the combination of metal oxides-based RRAM and AI, highlighting recent advances in these two directions. We discuss the use of AI to improve the performance of RRAM devices and the use of RRAM to power AI. Additionally, we address key challenges in the field and provide insights into future research directions.	翻訳日:2023-05-02 16:39:56 公開日:2023-04-29
# ビデオスーパーリゾリューションのための暗黙のアライメント An Implicit Alignment for Video Super-Resolution ( http://arxiv.org/abs/2305.00163v1 ) ライセンス: Link先を確認	Kai Xu, Ziwei Yu, Xin Wang, Michael Bi Mi, Angela Yao	(参考訳) ビデオのスーパーレゾリューションは通常、時間とともに情報の伝播をサポートするためにフレームアライメントを使用する。アライメントの役割は、ビデオの低レベルエンハンスメントのためによく研究されているが、既存の作品が重要なステップである再サンプリングを見落としている。フレーム間の動作を補償する方法に関わらず、フローベースのワーピングや変形可能な畳み込み/アテンションなど、ほとんどの作業では、再サンプリングにバイリニア補間(bilinear interpolation)のデフォルト選択を使用する。しかし、双線形補間はローパスフィルタとして効果的に機能し、超解像のために高周波コンテンツを回復する目的を阻害する。本稿では,ビデオ高分解能アライメントにおける再サンプリングの影響について検討する。大規模な実験により、アライメントを効果的にするためには、再サンプリングは特徴の本来の鋭さを保ち、歪みを防ぐ必要があることが判明した。そこで,本研究では,正弦波位置符号化により符号化されたサンプリング位置をウィンドウベースのクロスアテンションで再サンプリングする暗黙的アライメント手法を提案する。再サンプリングは学習したネットワーク重みによって暗黙的に計算される。実験によると、提案された暗黙のアライメントは、合成データセットと実世界のデータセットの両方に最小限の影響で、最先端フレームワークのパフォーマンスを向上させる。 Video super-resolution commonly uses a frame-wise alignment to support the propagation of information over time. The role of alignment is well-studied for low-level enhancement in video, but existing works have overlooked one critical step -- re-sampling. Most works, regardless of how they compensate for motion between frames, be it flow-based warping or deformable convolution/attention, use the default choice of bilinear interpolation for re-sampling. However, bilinear interpolation acts effectively as a low-pass filter and thus hinders the aim of recovering high-frequency content for super-resolution. This paper studies the impact of re-sampling on alignment for video super-resolution. Extensive experiments reveal that for alignment to be effective, the re-sampling should preserve the original sharpness of the features and prevent distortions. From these observations, we propose an implicit alignment method that re-samples through a window-based cross-attention with sampling positions encoded by sinusoidal positional encoding. The re-sampling is implicitly computed by learned network weights. 
Experiments show that the proposed implicit alignment enhances the performance of state-of-the-art frameworks with minimal impact on both synthetic and real-world datasets.	翻訳日:2023-05-02 16:39:42 公開日:2023-04-29
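The abstract above states that bilinear interpolation acts as a low-pass filter. A minimal 1-D sketch of that effect follows; it is an illustration only, not the authors' implementation. The helper `linear_resample`, the test signals, and the half-sample shift are all assumptions chosen for the demonstration. Linear interpolation at a half-sample offset averages neighbouring samples, so a signal at the highest representable frequency is wiped out while a slow ramp survives.

```python
import math


def linear_resample(signal, shift):
    """Resample a 1-D signal at positions i + shift with linear interpolation
    (the 1-D analogue of bilinear interpolation). Positions are clamped to
    the valid index range. Hypothetical helper for illustration."""
    n = len(signal)
    out = []
    for i in range(n):
        pos = min(max(i + shift, 0.0), n - 1.0)
        left = int(math.floor(pos))
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append((1.0 - frac) * signal[left] + frac * signal[right])
    return out


# Highest representable frequency: samples alternate between +1 and -1.
high_freq = [(-1.0) ** i for i in range(16)]
# A slow ramp stands in for low-frequency content.
low_freq = [i / 15.0 for i in range(16)]

# Drop the last sample, which is clamped at the border rather than interpolated.
high_interior = linear_resample(high_freq, 0.5)[:-1]
low_interior = linear_resample(low_freq, 0.5)[:-1]

print("high-frequency amplitude after resampling:",
      max(abs(v) for v in high_interior))
print("ramp value at position 3.5:", low_interior[3])
```

With a zero shift the function returns the input unchanged, so the attenuation comes purely from sampling between grid points, which is the situation alignment by warping creates.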
# Beyond Prediction:不均一グラフに基づくリストワイズランキングを用いた路上駐車推薦 Beyond Prediction: On-street Parking Recommendation using Heterogeneous Graph-based List-wise Ranking ( http://arxiv.org/abs/2305.00162v1 ) ライセンス: Link先を確認	Hanyu Sun, Xiao Huang, Wei Ma	(参考訳) リアルタイムの駐車情報を提供するため、既存の研究は、ドライバーの走行時間を節約するための間接的なアプローチであるパーキング可用性の予測に重点を置いている。本稿では,運転者に直接駐車スペースを推薦するために,路上駐車推奨(OPR)タスクを初めて提案する。この目的のために、OPR-LTRと呼ばれるLearning-to-rank(LTR)ベースのOPRモデルを構築している。具体的には、駐車勧告は、各駐車空間の「ターンオーバーイベント」(占有と空きの状態切り替え)と密接に関連しているため、ESGraphと呼ばれる高効率な異種グラフを設計し、歴史的かつリアルタイムなメーターのターンオーバーイベントと地理的関係を表現し、その後、畳み込みに基づくイベント列グラフネットワークを用いて異種グラフの表現を集約・更新する。ランキングモデルはさらに、特定の路上駐車クエリに対してランク付けされた駐車スポットのリストを推奨するスコア関数を学習するために利用される。この方法は、香港とサンフランシスコの路上駐車メーターデータを用いて検証される。予測のみと予測後推薦の2種類の手法と比較して,提案する直接推薦手法は様々な指標において良好な性能を実現する。大規模な実験により、提案したESGraphとレコメンデーションモデルは、計算効率の面でより効率的であり、ドライバーの路上駐車時間を節約できることを示した。 To provide real-time parking information, existing studies focus on predicting parking availability, which seems an indirect approach to saving drivers' cruising time. In this paper, we propose, for the first time, an on-street parking recommendation (OPR) task to directly recommend a parking space for a driver. To this end, a learn-to-rank (LTR) based OPR model called OPR-LTR is built. Specifically, parking recommendation is closely related to the "turnover events" (state switching between occupied and vacant) of each parking space, and hence we design a highly efficient heterogeneous graph called ESGraph to represent historical and real-time meters' turnover events as well as geographical relations; afterward, a convolution-based event-then-graph network is used to aggregate and update representations of the heterogeneous graph. A ranking model is further utilized to learn a score function that helps recommend a list of ranked parking spots for a specific on-street parking query. The method is verified using the on-street parking meter data in Hong Kong and San Francisco. 
Compared with two other types of methods, prediction-only and prediction-then-recommendation, the proposed direct-recommendation method achieves satisfactory performance across different metrics. Extensive experiments also demonstrate that the proposed ESGraph and the recommendation model are computationally more efficient and save drivers' on-street parking time.	翻訳日:2023-05-02 16:39:20 公開日:2023-04-29
# LiDAR投影画像によるセンサの等価性 Sensor Equivariance by LiDAR Projection Images ( http://arxiv.org/abs/2305.00221v1 ) ライセンス: Link先を確認	Hannes Reichert, Manuel Hetzel, Steven Schreck, Konrad Doll, and Bernhard Sick	(参考訳) 本研究では,関連した投影特性を符号化した追加チャネルによる従来の画像データの拡張を提案する。このことは、LiDARのような射影型センサーにおけるセンサ依存のオブジェクト表現の問題に対処し、センサの解像度や視野の変化による物理的および幾何学的性質の歪みを引き起こす可能性がある。そこで我々は,このデータをインスタンスセグメンテーションフレームワークで処理するためのアーキテクチャを提案する。我々は、機械ビジョンタスクと高度自動運転(HAD)のためのキーセンサーモダリティとして、特にLiDARに焦点を当てる。制御された合成環境における実験的な設定により,センサ解像度と視野のバイアスを同定し,提案手法がlidarインスタンスのセグメンテーションにおけるバイアスを低減できることを実証する。さらに,カメラなどの他の投影型センサにも適用可能な手法を定義した。透明性を促進するため、コードとデータセットを公開しています。本手法は,プロジェクションベースセンサを用いた各種マシンビジョンタスクの性能向上とロバスト性向上の可能性を示す。 In this work, we propose an extension of conventional image data by an additional channel in which the associated projection properties are encoded. This addresses the issue of sensor-dependent object representation in projection-based sensors, such as LiDAR, which can lead to distorted physical and geometric properties due to variations in sensor resolution and field of view. To that end, we propose an architecture for processing this data in an instance segmentation framework. We focus specifically on LiDAR as a key sensor modality for machine vision tasks and highly automated driving (HAD). Through an experimental setup in a controlled synthetic environment, we identify a bias on sensor resolution and field of view and demonstrate that our proposed method can reduce said bias for the task of LiDAR instance segmentation. Furthermore, we define our method such that it can be applied to other projection-based sensors, such as cameras. To promote transparency, we make our code and dataset publicly available. This method shows the potential to improve performance and robustness in various machine vision tasks that utilize projection-based sensors.	翻訳日:2023-05-02 16:32:17 公開日:2023-04-29
# リラクシド強制選択は視覚品質評価法の性能を向上させる Relaxed forced choice improves performance of visual quality assessment methods ( http://arxiv.org/abs/2305.00220v1 ) ライセンス: Link先を確認	Mohsen Jenadeleh, Johannes Zagermann, Harald Reiterer, Ulf-Dietrich Reips, Raouf Hamzaoui, Dietmar Saupe	(参考訳) 画像品質評価において、多数の被験者の個人評価から画像又は映像の集合的視覚品質スコアを得る。これらの実験でよく使われる形式は、2つの代替的な強制選択法である。同じ内容だが視覚品質の異なる2つの刺激を順次または並べて提示する。被験者は、より良い品質の1つを選択するように求められ、不確かな場合は、推測する必要がある。緩和された代替選択フォーマットは、第3の応答オプション、すなわち``not sure''を提供することによって、推測による認知的負荷と応答のノイズを低減することを目的としている。この研究は、これらの2つのレスポンスフォーマットを比較するために、大規模で包括的なクラウドソーシング実験を提示している。品質評価のための曖昧さのない基礎的真理を提供するため、被験者は点数が異なる画像のペアを示し、より多くの点を持つものを選ぶように毎回要求した。クラウドソーシング研究には254人の参加者が参加し,イントラサブジェクトデザインを用いて実施した。各被験者は,``not sure''応答オプションがある場合とない場合の両方で40対比較への回答を求められ,各テスト条件に対する認知負荷を評価するためのアンケートを完了した。実験結果から,強制選択法に``not sure''応答オプションを組み込むことで,心理的負荷が減少し,データ適合性が向上し,真理に対応するモデルが得られた。また、モデルの等価性をテストした結果、それらが異なることがわかった。データセットはhttp://database.mmsp-kn.de/cogvqa-database.htmlで利用可能である。 In image quality assessment, a collective visual quality score for an image or video is obtained from the individual ratings of many subjects. One commonly used format for these experiments is the two-alternative forced choice method. Two stimuli with the same content but differing visual quality are presented sequentially or side-by-side. Subjects are asked to select the one of better quality, and when uncertain, they are required to guess. The relaxed alternative forced choice format aims to reduce the cognitive load and the noise in the responses due to the guessing by providing a third response option, namely, ``not sure''. This work presents a large and comprehensive crowdsourcing experiment to compare these two response formats: the one with the ``not sure'' option and the one without it. To provide unambiguous ground truth for quality evaluation, subjects were shown pairs of images with differing numbers of dots and asked each time to choose the one with more dots. 
Our crowdsourcing study involved 254 participants and was conducted using a within-subject design. Each participant was asked to respond to 40 pair comparisons with and without the ``not sure'' response option and completed a questionnaire to evaluate their cognitive load for each testing condition. The experimental results show that the inclusion of the ``not sure'' response option in the forced choice method reduced mental load and led to models with better data fit and correspondence to ground truth. We also tested for the equivalence of the models and found that they were different. The dataset is available at http://database.mmsp-kn.de/cogvqa-database.html.	翻訳日:2023-05-02 16:32:00 公開日:2023-04-29
# 非ネイティブ話者の割合が言語複雑性に与える影響の証拠はまだない -- Kauhanen, Einhaus & Walkden (2023)に対する回答 Still no evidence for an effect of the proportion of non-native speakers on language complexity -- A response to Kauhanen, Einhaus & Walkden (2023) ( http://arxiv.org/abs/2305.00217v1 ) ライセンス: Link先を確認	Alexander Koplenig	(参考訳) Journal of Language Evolutionに掲載された最近の論文で、Kauhanen, Einhaus & Walkden (https://doi.org/10.1093/jole/lzad005, KEW)は、私の論文の1つ(Koplenig, Royal Society Open Science 6, 181274 (2019), https://doi.org/10.1098/rsos.181274)で示された結果に異議を唱えました。この目的のために、Ethnologueが言語ステータスを評価する方法に注目します。L1(第一言語)話者が使用することに加えて、かなりの数のL2ユーザを持つ必要がある場合、言語はvehicularとして特徴づけられます。 KEWは、言語がかなりの数のL2ユーザを持つかどうかを示す(バイナリ)指標として、そしてその比率の直接推定が不可能なときに、L2話者の0パーセントを非車種言語に出力するという考え方の両方を批判している。出版後論評の重要性は認識していますが,本論では両論点が明記され,私の論文で分析されていることを示します。さらに、KEWが提起した他の点についてもコメントし、KEWが提供する代替分析も、より精査に至らないことを実証します。 In a recent paper published in the Journal of Language Evolution, Kauhanen, Einhaus & Walkden (https://doi.org/10.1093/jole/lzad005, KEW) challenge the results presented in one of my papers (Koplenig, Royal Society Open Science, 6, 181274 (2019), https://doi.org/10.1098/rsos.181274), in which I tried to show through a series of statistical analyses that large numbers of L2 (second language) speakers do not seem to affect the (grammatical or statistical) complexity of a language. To this end, I focus on the way in which the Ethnologue assesses language status: a language is characterised as vehicular if, in addition to being used by L1 (first language) speakers, it should also have a significant number of L2 users. KEW criticise both the use of vehicularity as a (binary) indicator of whether a language has a significant number of L2 users and the idea of imputing a zero proportion of L2 speakers to non-vehicular languages whenever a direct estimate of that proportion is unavailable. 
While I recognise the importance of post-publication commentary on published research, I show in this rejoinder that both points of criticism are explicitly mentioned and analysed in my paper. In addition, I also comment on other points raised by KEW and demonstrate that both alternative analyses offered by KEW do not stand up to closer scrutiny.	翻訳日:2023-05-02 16:31:35 公開日:2023-04-29
# 実時間AC/DCパワーフロー解析のための物理誘導グラフニューラルネットワーク Physics-Guided Graph Neural Networks for Real-time AC/DC Power Flow Analysis ( http://arxiv.org/abs/2305.00216v1 ) ライセンス: Link先を確認	Mei Yang, Gao Qiu, Yong Wu, Junyong Liu, Nina Dai, Yue Shui, Kai Liu, Lijie Ding	(参考訳) 交流電流と直流(AC/DC)ハイブリッドシステムの増大は、これまで以上に高速な電力フロー解析ツールを必要とする。本稿では,物理誘導型グラフニューラルネットワーク(PG-GNN)を提案する。 PG-GNNのトポロジ適応性を高めるために,まずACグリッドとDCグリッドの調整グラフモデリングを行う。データから信頼性の低いエミュレーションを推定するために、AC/DC物理は二重性を用いてPG-GNNに埋め込まれる。拡張されたラグランジアン法に基づく学習スキームが提示され、PG-GNNが非凸パターンを教師なしラベルフリーで学習するのに役立つ。マルチPG-GNNは、最終的に様々なDC制御モードをマスターするために実行される。ケーススタディでは、他の7つのデータ駆動型ライバルと比較して、提案手法はモデルベースベンチマークの性能と一致し、計算効率も10倍以上に向上している。 The increasing scale of alternating current and direct current (AC/DC) hybrid systems necessitates a faster power flow analysis tool than ever. This letter thus proposes a specific physics-guided graph neural network (PG-GNN). A tailored graph modelling of AC and DC grids is first advanced to enhance the topology adaptability of the PG-GNN. To eschew unreliable experience emulation from data, AC/DC physics are embedded in the PG-GNN using duality. An augmented Lagrangian method-based learning scheme is then presented to help the PG-GNN better learn nonconvex patterns in an unsupervised, label-free manner. Finally, a multi-PG-GNN is employed to master varied DC control modes. A case study shows that, relative to the other 7 data-driven rivals, only the proposed method matches the performance of the model-based benchmark, while exceeding it in computational efficiency by more than 10 times.	翻訳日:2023-05-02 16:30:55 公開日:2023-04-29
# EBLIME: ベイズ局所解釈型モデル非依存的説明 EBLIME: Enhanced Bayesian Local Interpretable Model-agnostic Explanations ( http://arxiv.org/abs/2305.00213v1 ) ライセンス: Link先を確認	Yuhao Zhong, Anirban Bhattacharya, Satish Bukkapatnam	(参考訳) ブラックボックス機械学習モデルの説明とベイジアンリッジ回帰モデルを用いた特徴量の分布を求めるため,EBLIMEを提案する。ベイズフレームワークの数学的表現とリッジパラメータの意義を含む理論的結果を提供する。ケーススタディは、ベンチマークデータセットと、製造製品の内部欠陥を見つけるための実世界の工業的応用に基づいて行われた。最新の手法と比較して、EBLIMEはより直感的で正確な結果を得ることができ、後方分布、信頼できる間隔、特徴重要性のランキングといった点でより不確実性が定量化される。 We propose EBLIME to explain black-box machine learning models and obtain the distribution of feature importance using Bayesian ridge regression models. We provide mathematical expressions of the Bayesian framework and theoretical outcomes including the significance of the ridge parameter. Case studies were conducted on benchmark datasets and a real-world industrial application of locating internal defects in manufactured products. Compared to the state-of-the-art methods, EBLIME yields more intuitive and accurate results, with better uncertainty quantification in terms of deriving the posterior distribution, credible intervals, and rankings of the feature importance.	翻訳日:2023-05-02 16:30:40 公開日:2023-04-29
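For readers unfamiliar with Bayesian ridge regression, the textbook conjugate-Gaussian form is sketched below. This is the standard result under an assumed isotropic Gaussian prior with fixed precisions; the symbols $\alpha$, $\beta$, $X$, $y$ are introduced here for illustration, and the exact priors and parameterisation used by EBLIME may differ.

```latex
% Assumed model: prior w ~ N(0, \alpha^{-1} I), likelihood y | X, w ~ N(Xw, \beta^{-1} I)
\begin{align}
  p(w \mid X, y) &= \mathcal{N}(w \mid m, S), \\
  S &= \left(\alpha I + \beta X^{\top} X\right)^{-1}, \\
  m &= \beta\, S X^{\top} y
     = \left(X^{\top} X + \tfrac{\alpha}{\beta} I\right)^{-1} X^{\top} y .
\end{align}
% A 95% credible interval for the j-th coefficient is m_j \pm 1.96 \sqrt{S_{jj}}.
```

The posterior mean coincides with the ridge estimate for ridge parameter $\lambda = \alpha/\beta$, which is one way to see why the ridge parameter controls both the shrinkage of the feature-importance estimates and the width of their credible intervals.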
# ShipHullGAN:Deep Convolutional Generative Modelを用いた船体設計のための汎用パラメトリックモデル ShipHullGAN: A generic parametric modeller for ship hull design using deep convolutional generative model ( http://arxiv.org/abs/2305.00210v1 ) ライセンス: Link先を確認	Shahroz Khan, Kosa Goucher-Lambert, Konstantinos Kostas, Panagiotis Kaklis	(参考訳) 本稿では,船殻の汎用表現と生成のために,深部畳み込み生成逆数ネットワーク(GAN)を用いて構築された汎用パラメトリック・モデラーであるShipHullGANを紹介する。高いレベルでは、新しいモデルはパラメトリックな船の設計パラダイムにおける現在の保守性に対処することを目的としており、パラメトリックなモデラーは特定の船種しか扱えない。 ShipHullGANを、コンテナ船、タンカー、ばら積み貨物船、タグボート、乗組員補給船など、既存の幅広い船種から集めた52,591件の物理的に検証された設計からなる大規模データセットで訓練した。我々は、全てのトレーニングデザインを同じ解像度の共通幾何学的表現に変換するための新しい形状抽出と表現戦略を開発した。スペース充填層はジェネレータコンポーネントの直後に置かれ、トレーニングされたジェネレータがすべての設計クラスをカバーできることを保証する。トレーニング中の設計は、幾何学的モーメントを用いてコンパクトな幾何学的表現を利用する形状変化テンソル(SST)の形で提供される。我々は,ShipHullGANが拡張された特徴を持つデザインを生成できるという広範な研究と最適化事例を通じて,幾何学的に有効かつ実用的な形状の伝統的かつ斬新なデザインを創出する多目的デザイン空間を提示した。 In this work, we introduce ShipHullGAN, a generic parametric modeller built using deep convolutional generative adversarial networks (GANs) for the versatile representation and generation of ship hulls. At a high level, the new model intends to address the current conservatism in the parametric ship design paradigm, where parametric modellers can only handle a particular ship type. We trained ShipHullGAN on a large dataset of 52,591 physically validated designs from a wide range of existing ship types, including container ships, tankers, bulk carriers, tugboats, and crew supply vessels. We developed a new shape extraction and representation strategy to convert all training designs into a common geometric representation of the same resolution, as typically GANs can only accept vectors of fixed dimension as input. A space-filling layer is placed right after the generator component to ensure that the trained generator can cover all design classes. 
During training, designs are provided in the form of a shape-signature tensor (SST) which harnesses the compact geometric representation using geometric moments that further enable the inexpensive incorporation of physics-informed elements in ship design. We have shown through extensive comparative studies and optimisation cases that ShipHullGAN can generate designs with augmented features resulting in versatile design spaces that produce traditional and novel designs with geometrically valid and practically feasible shapes.	翻訳日:2023-05-02 16:30:29 公開日:2023-04-29
# bi-rnnネットワークを用いた高移動度通信における深層学習に基づくチャネル推定 Deep Learning Based Channel Estimation in High Mobility Communications Using Bi-RNN Networks ( http://arxiv.org/abs/2305.00208v1 ) ライセンス: Link先を確認	Abdul Karim Gizzini, Marwa Chafii	(参考訳) 二重選択チャネル推定は、無線システムにおける通信信頼性を保証する重要な要素である。動的環境におけるマルチパス伝搬とドップラー干渉の影響により,2重選択チャネル推定が困難となる。従来のチャネル推定手法は、限られた訓練パイロットの使用により、高移動度シナリオにおける性能劣化に遭遇する。近年,畳み込みニューラルネットワーク(CNN)ネットワークを用いたフレーム・バイ・フレーム(FBF)チャネル推定において,深層学習(DL)を二重選択チャネル推定に利用している。しかし、cnnベースの推定器は高い複雑さを必要とし、実際のシナリオでは実用的でない。この目的のために,2重選択チャネルを正確に推定する最適化された双方向リカレントニューラルネットワーク (Bi-RNN) を用いたチャネル推定器を提案することにより,この問題を克服する。提案手法は,ゲートリカレントユニット(GRU)ユニットを用いてエンドツーエンドの補間を行う。広範な数値実験により、開発されたbi-gru推定器は、最近提案されたcnnベースの推定器を異なる移動シナリオで大幅に上回っていることが示され、計算の複雑さは大幅に減少する。 Doubly-selective channel estimation represents a key element in ensuring communication reliability in wireless systems. Due to the impact of multi-path propagation and Doppler interference in dynamic environments, doubly-selective channel estimation becomes challenging. Conventional channel estimation schemes encounter performance degradation in high mobility scenarios due to the usage of limited training pilots. Recently, deep learning (DL) has been utilized for doubly-selective channel estimation, where convolutional neural network (CNN) networks are employed in the frame-by-frame (FBF) channel estimation. However, CNN-based estimators require high complexity, making them impractical in real-case scenarios. For this purpose, we overcome this issue by proposing an optimized and robust bi-directional recurrent neural network (Bi-RNN) based channel estimator to accurately estimate the doubly-selective channel, especially in high mobility scenarios. The proposed estimator is based on performing end-to-end interpolation using gated recurrent unit (GRU) unit. 
Extensive numerical experiments demonstrate that the developed Bi-GRU estimator significantly outperforms the recently proposed CNN-based estimators in different mobility scenarios, while substantially reducing the overall computational complexity.	翻訳日:2023-05-02 16:30:01 公開日:2023-04-29
# CARLA-BSP:歩行者によるシミュレーションデータセット CARLA-BSP: a simulated dataset with pedestrians ( http://arxiv.org/abs/2305.00204v1 ) ライセンス: Link先を確認	Maciej Wielgosz and Antonio M. López and Muhammad Naveed Riaz	(参考訳) 本稿では,CARLA (0.9.13) で新たにデータセットを生成するARCANEフレームワークを用いて,歩行者を特徴付けるサンプルデータセットを提案する。歩行者検出,自動符号化,ポーズ推定,ポーズリフトのユースケースを提供する。ベースラインの結果も紹介します。詳細はhttps://project-arcane.eu/を参照。 We present a sample dataset featuring pedestrians generated using the ARCANE framework, a new framework for generating datasets in CARLA (0.9.13). We provide use cases for pedestrian detection, autoencoding, pose estimation, and pose lifting. We also showcase baseline results. For more information, visit https://project-arcane.eu/.	翻訳日:2023-05-02 16:29:41 公開日:2023-04-29
# インストラクション-ViT:ViTにおけるインストラクション学習のためのマルチモーダルプロンプト Instruction-ViT: Multi-Modal Prompts for Instruction Learning in ViT ( http://arxiv.org/abs/2305.00201v1 ) ライセンス: Link先を確認	Zhenxiang Xiao, Yuzhong Chen, Lu Zhang, Junjie Yao, Zihao Wu, Xiaowei Yu, Yi Pan, Lin Zhao, Chong Ma, Xinyu Liu, Wei Liu, Xiang Li, Yixuan Yuan, Dinggang Shen, Dajiang Zhu, Tianming Liu, Xi Jiang	(参考訳) プロンプトは大規模言語モデルにおいて重要な役割を果たすことが証明されており、近年では視覚モデルでも複数の下流タスクのスケーラビリティ向上のためにプロンプトが使用されている。本稿では、インストラクション-ViTと呼ばれる画像分類のための視覚変換器モデルに、命令チューニングに基づくプロンプト設計を適用することに焦点を当てる。キーとなるアイデアは、カテゴリ情報に関連するマルチモーダルプロンプト(テキストまたは画像プロンプト)を実装し、モデルの微調整を導くことである。いくつかの画像キャプションタスクの実験に基づいて、性能とドメイン適応性を改善した。我々の研究は、視覚分類モデルの性能と適応性を向上したマルチモーダルプロンプトを融合する革新的な戦略を提供した。 Prompts have been proven to play a crucial role in large language models, and in recent years, vision models have also been using prompts to improve scalability for multiple downstream tasks. In this paper, we focus on adapting prompt design based on instruction tuning into a visual transformer model for image classification, which we call Instruction-ViT. The key idea is to implement multi-modal prompts (text or image prompt) related to category information to guide the fine-tuning of the model. Based on experiments on several image captioning tasks, the performance and domain adaptability were improved. Our work provides an innovative strategy to fuse multi-modal prompts with better performance and faster adaptability for visual classification models.	翻訳日:2023-05-02 16:29:36 公開日:2023-04-29
# 新型コロナウイルスパンデミックによる中国の労働市場動態の大規模評価 Large-Scale Assessment of Labour Market Dynamics in China during the COVID-19 Pandemic ( http://arxiv.org/abs/2305.00199v1 ) ライセンス: Link先を確認	Ying Sun, Hengshu Zhu, Hui Xiong	(参考訳) 新型コロナウイルス(COVID-19)のパンデミックが中国の労働市場に前例のない影響を与え、さまざまな地域での労働供給と需要の構造を大きく変えた。政策立案者は、ポストパンデミック労働市場の新たなダイナミクスを理解し、地域経済の持続可能な発展を支援する適切な政策を提供することが重要となる。そこで本稿では,大規模オンライン求人情報検索と求人情報投稿による地域労働市場の変動動態の評価と理解を目的とした,データ駆動型アプローチを提案する。特に、地域労働市場の魅力を反映した、労働の流れと労働需要の空間的・時間的パターンをモデル化する。分析の結果,地域労働市場は劇的な変化に悩まされ,パンデミック時の回復の兆候がみられた。具体的には、大都市から小都市へ、南北地方へ移住する傾向から、労働フローの意図が急速に回復した。一方、パンデミックにより、ブルーカラー労働者の需要はホワイトカラー労働者に比べて大幅に減少した。また、青カラー雇用の需要構造も製造業からサービス産業へと変化した。以上の結果から,パンデミックは労働需要の異なる地域や規制政策に様々な影響を及ぼす可能性が示唆された。この分析は、パンデミックのような極端なイベント中の雇用市場の変化に直面する個人と組織の両方にタイムリーな情報を提供する。また、地方経済の持続的な発展を促進する上で、雇用市場に対する適切な政策の提供を政府も支援できる。 The outbreak of the COVID-19 pandemic has had an unprecedented impact on China's labour market, and has largely changed the structure of labour supply and demand in different regions. It becomes critical for policy makers to understand the emerging dynamics of the post-pandemic labour market and provide the right policies for supporting the sustainable development of regional economies. To this end, in this paper, we provide a data-driven approach to assess and understand the evolving dynamics in regions' labour markets with large-scale online job search queries and job postings. In particular, we model the spatial-temporal patterns of labour flow and labour demand which reflect the attractiveness of regional labour markets. Our analysis shows that regional labour markets suffered from dramatic changes and demonstrated unusual signs of recovery during the pandemic. Specifically, the intention of labour flow quickly recovered with a trend of migrating from large to small cities and from northern to southern regions, respectively. Meanwhile, due to the pandemic, the demand of blue-collar workers has been substantially reduced compared to that of white-collar workers. 
In addition, the demand structure of blue-collar jobs also changed from manufacturing to service industries. Our findings reveal that the pandemic can cause varied impacts on regions with different structures of labour demand and control policies. This analysis provides timely information for both individuals and organizations in confronting the dynamic change in job markets during the extreme events, such as pandemics. Also, the governments can be better assisted for providing the right policies on job markets in facilitating the sustainable development of regions' economies.	翻訳日:2023-05-02 16:29:23 公開日:2023-04-29
# 逆中散乱問題に対する直接サンプリングに基づく深層学習手法 A Direct Sampling-Based Deep Learning Approach for Inverse Medium Scattering Problems ( http://arxiv.org/abs/2305.00250v1 ) ライセンス: Link先を確認	Jianfeng Ning, Fuqun Han and Jun Zou	(参考訳) 本研究では,計測された散乱データに基づいて未知の散乱器を回収することを目的とした逆媒質散乱問題(IMSP)に着目する。 [23]で導入された効率的な直接サンプリング法(DSM)に動機づけられ,不均質な散乱器を再構成する新しい直接サンプリング型深層学習法(DSM-DL)を提案する。特に、U-Netニューラルネットワークを用いて、インデックス関数と真のコントラストの関係を学習する。提案するDSM-DLは, 計算効率が高く, 雑音に頑健であり, 実装が容易であり, 高品質な再構築を実現するために複数の計測データを自然に組み込むことができる。提案手法の性能を評価するため, 各種入射波数, 騒音レベルの異なる代表実験を行った。その結果,深層学習技術とDSM for IMSPの併用による有望なメリットが示された。 In this work, we focus on the inverse medium scattering problem (IMSP), which aims to recover unknown scatterers based on measured scattered data. Motivated by the efficient direct sampling method (DSM) introduced in [23], we propose a novel direct sampling-based deep learning approach (DSM-DL) for reconstructing inhomogeneous scatterers. In particular, we use the U-Net neural network to learn the relation between the index functions and the true contrasts. Our proposed DSM-DL is computationally efficient, robust to noise, easy to implement, and able to naturally incorporate multiple measured data to achieve high-quality reconstructions. Some representative tests are carried out with varying numbers of incident waves and different noise levels to evaluate the performance of the proposed method. The results demonstrate the promising benefits of combining deep learning techniques with the DSM for IMSP.	翻訳日:2023-05-02 16:23:29 公開日:2023-04-29
# 自由生活環境におけるパーキンソン震検出改善のための複数事例学習問題における非競合データの活用 Leveraging Unlabelled Data in Multiple-Instance Learning Problems for Improved Detection of Parkinsonian Tremor in Free-Living Conditions ( http://arxiv.org/abs/2305.00249v1 ) ライセンス: Link先を確認	Alexandros Papadopoulos, Anastasios Delopoulos	(参考訳) パーキンソン病とその運動症状を遠隔で検出するためのデータ駆動アプローチは、早期診断の潜在的な臨床効果のために近年普及している。このようなアプローチの聖杯は、データが日々の生活の中で継続的に無害に収集される自由生活のシナリオである。しかし, 微粒な接地構造と, 残りは邪魔にならないことが矛盾しているため, マルチスタンス学習によって問題に対処することが普通である。しかし、大規模研究では、完全な神経学的評価が必要であるため、必要な粗い地面でも得ることは自明ではない。対照的に、根拠のない大規模なデータ収集はずっと簡単です。しかし、このトピックは研究の注目をほとんど受けていないため、複数インスタンス設定での非競合データの利用は簡単ではない。本稿では,このギャップを補うために,半教師付き学習と複数インスタンス学習を組み合わせた新しい手法を提案する。本手法は,通常の半教師あり学習における最先端のアプローチである仮想適応学習原理に基づいており,複数インスタンス設定に適応し,適切な修正を行う。まず,2つのよく知られたベンチマークデータセットから生成した合成問題に対する概念実証実験により,提案手法の有効性を検証した。次に, 完全にラベルが付かないデータが存在する場合, 手動加速度信号からpd振れを検知する実際のタスクに進む。その結果,454名の被験者の非ラベルデータを利用することで,震動が知られている45名のコーホートに対して,サブジェクト毎の震動検出において,高い性能向上(最大9%のf1-score増加)が達成できることがわかった。 Data-driven approaches for remote detection of Parkinson's Disease and its motor symptoms have proliferated in recent years, owing to the potential clinical benefits of early diagnosis. The holy grail of such approaches is the free-living scenario, in which data are collected continuously and unobtrusively during every day life. However, obtaining fine-grained ground-truth and remaining unobtrusive is a contradiction and therefore, the problem is usually addressed via multiple-instance learning. Yet for large scale studies, obtaining even the necessary coarse ground-truth is not trivial, as a complete neurological evaluation is required. In contrast, large scale collection of data without any ground-truth is much easier. Nevertheless, utilizing unlabelled data in a multiple-instance setting is not straightforward, as the topic has received very little research attention. Here we try to fill this gap by introducing a new method for combining semi-supervised with multiple-instance learning. 
Our approach builds on the Virtual Adversarial Training principle, a state-of-the-art approach for regular semi-supervised learning, which we adapt and modify appropriately for the multiple-instance setting. We first establish the validity of the proposed approach through proof-of-concept experiments on synthetic problems generated from two well-known benchmark datasets. We then move on to the actual task of detecting PD tremor from hand acceleration signals collected in-the-wild, but in the presence of additional completely unlabelled data. We show that by leveraging the unlabelled data of 454 subjects we can achieve large performance gains (up to 9% increase in F1-score) in per-subject tremor detection for a cohort of 45 subjects with known tremor ground-truth.	翻訳日:2023-05-02 16:23:13 公開日:2023-04-29
# 新しい金融時系列事例表現を用いた産業分類 Industry Classification Using a Novel Financial Time-Series Case Representation ( http://arxiv.org/abs/2305.00245v1 ) ライセンス: Link先を確認	Rian Dolphin, Barry Smyth, Ruihai Dong	(参考訳) 金融分野は、予測、クラスタリング、分類など、さまざまなタスクにまたがる、機械学習の課題の肥大した源泉であることが証明されている。研究者は大量の時系列データにアクセスでき、微妙なパフォーマンス改善さえも大きな付加価値に変換できる。本研究では,この領域における重要な課題に対するケースベース推論の活用を,業界分類における過去の株価リターン時系列データを用いて検討する。本稿では,従来のケースベース推論手法において,時系列データが重要な表象的課題を呈する理由を考察し,それに対応するために,ストックリターン埋め込みに基づく新しい表現を提案し,生のストックリターンデータから容易に計算できることを示す。この表現は、事例に基づく推論に適しており、業界セクターの分類タスクに大規模な公開データセットを使用することで、従来の表現を用いた複数のベースラインのパフォーマンス向上を実証する。 The financial domain has proven to be a fertile source of challenging machine learning problems across a variety of tasks including prediction, clustering, and classification. Researchers can access an abundance of time-series data and even modest performance improvements can be translated into significant additional value. In this work, we consider the use of case-based reasoning for an important task in this domain, by using historical stock returns time-series data for industry sector classification. We discuss why time-series data can present some significant representational challenges for conventional case-based reasoning approaches, and in response, we propose a novel representation based on stock returns embeddings, which can be readily calculated from raw stock returns data. We argue that this representation is well suited to case-based reasoning and evaluate our approach using a large-scale public dataset for the industry sector classification task, demonstrating substantial performance improvements over several baselines using more conventional representations.	翻訳日:2023-05-02 16:22:37 公開日:2023-04-29
# 分割部分スキャンにおける深層学習に基づく3次元歯科メッシュ分割法の限界に関する批判的解析 A Critical Analysis of the Limitation of Deep Learning based 3D Dental Mesh Segmentation Methods in Segmenting Partial Scans ( http://arxiv.org/abs/2305.00244v1 ) ライセンス: Link先を確認	Ananya Jana, Aniruddha Maiti, Dimitris N. Metaxas	(参考訳) 口腔内スキャンによる歯のセグメンテーションは歯科医療において重要な要素である。多くのDeep Learningベースの歯のセグメンテーションアルゴリズムが開発されている。ほとんどの場合、高い精度が達成されているが、利用可能な歯のセグメンテーション技術のほとんどは、全顎モデルの暗黙的な制限的な仮定をしており、全顎モデルに基づいて精度を報告している。しかし、医学的には歯の完全なスキャンは必要とせず、あるいは使用できない場合もある。この実践的な問題を考えると、現在広く使われているDeep Learningベースの歯のセグメンテーション技術の堅牢性を理解することが重要である。そこで本研究では, 部分的口腔内スキャンに利用可能なセグメント化手法を適用し, 利用可能な深層学習技術が大幅に低下していることを発見した。この研究で示された分析と比較は、問題の深刻さを理解するのに役立ち、完全な顎モデルを強く仮定することなく、頑健な歯のセグメンテーション技術の開発を可能にする。 Tooth segmentation from intraoral scans is a crucial part of digital dentistry. Many Deep Learning based tooth segmentation algorithms have been developed for this task. In most of the cases, high accuracy has been achieved, although, most of the available tooth segmentation techniques make an implicit restrictive assumption of full jaw model and they report accuracy based on full jaw models. Medically, however, in certain cases, full jaw tooth scan is not required or may not be available. Given this practical issue, it is important to understand the robustness of currently available widely used Deep Learning based tooth segmentation techniques. For this purpose, we applied available segmentation techniques on partial intraoral scans and we discovered that the available deep Learning techniques under-perform drastically. The analysis and comparison presented in this work would help us in understanding the severity of the problem and allow us to develop robust tooth segmentation technique without strong assumption of full jaw model.	翻訳日:2023-05-02 16:22:23 公開日:2023-04-29
# ディープラーニングが多面体理論と出会うとき:調査 When Deep Learning Meets Polyhedral Theory: A Survey ( http://arxiv.org/abs/2305.00241v1 ) ライセンス: Link先を確認	Joey Huchette, Gonzalo Muñoz, Thiago Serra, Calvin Tsay	(参考訳) 過去10年間、コンピュータビジョンや自然言語処理といったタスクにおけるディープニューラルネットワークの驚くべき精度のおかげで、ディープラーニングは予測モデリングの一般的な方法論となった。一方、ニューラルネットワークの構造はより単純な表現に回帰し、Rectified Linear Unit (ReLU) のような区分的定数関数や区分的線形関数に基づく表現が、ニューラルネットワークで最もよく使われるタイプのアクティベーション関数となった。これにより、ある種のネットワーク構造(典型的な全結合フィードフォワードニューラルネットワークなど)は、多面体理論による解析や、線形計画法(LP)や混合整数線形計画法(MILP)といった手法の様々な目的への応用に適したものとなった。本稿では、この急速に発展する分野から生まれた主要なトピックを調査する。これらは、ニューラルネットワークのより詳細な理解と、ネットワークの訓練、検証、サイズ削減のための線形最適化手法の適用に新たな視点をもたらす。 In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU), which became the most commonly used type of activation function in neural networks. That made certain types of network structure (such as the typical fully-connected feedforward neural network) amenable to analysis through polyhedral theory and to the application of methodologies such as Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this paper, we survey the main topics emerging from this fast-paced area of work, which bring a fresh perspective to understanding neural networks in more detail as well as to applying linear optimization techniques to train, verify, and reduce the size of such networks.	翻訳日:2023-05-02 16:22:06 公開日:2023-04-29
# 遺伝的アルゴリズムのFAIRy Tale The FAIRy Tale of Genetic Algorithms ( http://arxiv.org/abs/2305.00238v1 ) ライセンス: Link先を確認	Fahad Maqbool, Muhammad Saad Razzaq, Hajira Jabeen	(参考訳) 遺伝的アルゴリズム(GA)は確率演算子を用いて最適な解を求めるメタヒューリスティック進化アルゴリズムであり、多くの複雑な最適化問題の解法(分類、最適化、スケジューリングなど)においてその効果が証明されている。しかし、その性能、人気、単純さにもかかわらず、GAの再現性と再利用性にはあまり注意が払われていない。本稿では、Findable, Accessible, Interoperable, and Reusable (FAIR)データ原則を拡張し、アルゴリズムの再現性と再利用性を実現する。提案原則の適用性を実証するためのユースケースとして、GAを選択した。また、GAの方法論的発展と変種について概説する。こうした多様性が、再現や適切なソースの発見を困難にしている。さらに、FAIRアルゴリズムを可能にするために、軽量RDFフォーマットを用いた語彙(すなわち$evo$)を提案し、再現性を向上させる。 GAの確率的性質を考えると、この作業は多くの最適化や機械学習アルゴリズム/メソッドにまで拡張できる。 Genetic Algorithm (GA) is a popular meta-heuristic evolutionary algorithm that uses stochastic operators to find optimal solutions and has proved its effectiveness in solving many complex optimization problems (such as classification, optimization, and scheduling). However, despite its performance, popularity and simplicity, not much attention has been paid towards reproducibility and reusability of GA. In this paper, we have extended Findable, Accessible, Interoperable and Reusable (FAIR) data principles to enable the reproducibility and reusability of algorithms. We have chosen GA as a use case to demonstrate the applicability of the proposed principles. We have also presented an overview of methodological developments and variants of GA that make it challenging to reproduce or even find the right source. Additionally, to enable FAIR algorithms, we propose a vocabulary (i.e. $evo$) using lightweight RDF format, facilitating the reproducibility. Given the stochastic nature of GAs, this work can be extended to numerous optimization and machine learning algorithms/methods.	翻訳日:2023-05-02 16:21:47 公開日:2023-04-29
# 教育, マーケティング, ソフトウェア工学, 医療におけるChatGPT応用の概観:利益, 欠点, 研究の方向性 A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions ( http://arxiv.org/abs/2305.00237v1 ) ライセンス: Link先を確認	Mohammad Fraiwan and Natheer Khasawneh	(参考訳) ChatGPTは、ディープラーニングアルゴリズムを使用して、テキストベースのプロンプトに対する人間的な応答を生成する人工知能言語モデルの一種である。 2022年11月に最新のChatGPTバージョンが導入されたことで、産業コミュニティと学術コミュニティは、その強力な能力、多くの応用可能性、そして悪用の可能性の大きさから衝撃を受けた。本稿の執筆時点で、他のいくつかの言語モデル(Google BardやMeta LLaMAなど)が、巨大になりうる市場に足場を築こうとして登場した。これらのモデルには、コンピュータとの対話方法に革命を起こす能力があり、教育、ソフトウェア工学、医療、マーケティングなど、多くの分野に潜在的な応用がある。本稿では,これらの各分野における高度な言語チャットボット(例えばChatGPT)を用いた応用の可能性,欠点,研究の方向性について述べる。まず、人工知能に基づく言語モデルの簡単な導入と開発スケジュールから始め、その後、そのようなモデルの応用の可能性について検討し、その後、現在の技術状況の限界と欠点について議論し、最後に、今後の研究の方向性を指摘する。 ChatGPT is a type of artificial intelligence language model that uses deep learning algorithms to generate human-like responses to text-based prompts. The introduction of the latest ChatGPT version in November of 2022 has caused shockwaves in the industrial and academic communities for its powerful capabilities, plethora of possible applications, and the great possibility for abuse. At the time of writing this work, several other language models (e.g., Google Bard and Meta LLaMA) have just come out in an attempt to get a foothold in the vast possible market. These models have the ability to revolutionize the way we interact with computers and have potential applications in many fields, including education, software engineering, healthcare, and marketing. In this paper, we will discuss the possible applications, drawbacks, and research directions using advanced language Chatbots (e.g., ChatGPT) in each of these fields. We first start with a brief introduction and the development timeline of artificial intelligence based language models, then we go through possible applications of such models, after that we discuss the limitations and drawbacks of the current technological state of the art, and finally we point out future possible research directions.	翻訳日:2023-05-02 16:21:32 公開日:2023-04-29
# ベストプラクティスによる機械学習を目指して Towards machine learning guided by best practices ( http://arxiv.org/abs/2305.00233v1 ) ライセンス: Link先を確認	Anamaria Mojica-Hanke	(参考訳) 現在、機械学習(ML)は、医学からソフトウェア工学(SE)まで、複数のアプリケーション分野を持つソフトウェアシステムで使われている。一方、業界におけるMLの人気は、その成長と普及を示す統計に見ることができる。一方、その人気は研究、特にSEでも見られ、SEの会議やジャーナルで複数の研究が公開されているだけでなく、ソフトウェア工学の会議において複数のワークショップや共催の会議でも取り上げられている。同時に、研究者や実践者は、機械学習には特定の課題や落とし穴があることを示した。特に、ML対応システムは従来のSEとは異なる開発プロセスを持つことが研究で示されている。特定された課題や落とし穴を軽減するために、白とグレーの文献は自身の経験に基づいて、ドメイン(例えばバイオメカニクス)に焦点を当てた一連の勧告を提案しているが、私たちの知る限りでは、SEコミュニティに焦点を当てたガイドラインはない。本論文は、質問応答コミュニティや先行研究といったプラクティスの情報源を分析し、SEコミュニティの実践者や研究者が使用し議論しているプラクティスの理解に役立つ研究課題に答えることで、SEの視点に基づく一連のプラクティスを提示し、このギャップを縮小することを目的とする。 Nowadays, machine learning (ML) is being used in software systems with multiple application fields, from medicine to software engineering (SE). On the one hand, the popularity of ML in the industry can be seen in the statistics showing its growth and adoption. On the other hand, its popularity can also be seen in research, particularly in SE, where not only have multiple studies been published in SE conferences and journals but also in the multiple workshops and co-located conferences in software engineering conferences. At the same time, researchers and practitioners have shown that machine learning has some particular challenges and pitfalls. In particular, research has shown that ML-enabled systems have a different development process than traditional SE, which also describes some of the challenges of ML applications. In order to mitigate some of the identified challenges and pitfalls, white and gray literature has proposed a set of recommendations based on their own experiences and focused on their domain (e.g., biomechanics), but to the best of our knowledge, there is no guideline focused on the SE community. This thesis aims to reduce this gap by answering research questions that help to understand the practices used and discussed by practitioners and researchers in the SE community by analyzing possible sources of practices such as question and answer communities and also previous research studies to present a set of practices with an SE perspective.	翻訳日:2023-05-02 16:21:12 公開日:2023-04-29
# 不完全な機構的知識しかない製造プロセスのための高速かつ安価な機械学習 Accelerated and Inexpensive Machine Learning for Manufacturing Processes with Incomplete Mechanistic Knowledge ( http://arxiv.org/abs/2305.00229v1 ) ライセンス: Link先を確認	Jeremy Cleeman, Kian Agrawala, Rajiv Malhotra	(参考訳) 機械学習(ML)は、製造プロセスにおけるパラメトリック効果のモデリングへの関心が高まっている。しかし、このアプローチは、物理に基づく深い理解が長年にわたって培われてきた確立されたプロセスに限られている。最先端のアプローチは、トレーニングデータを生成する実験的および/または計算的コストの削減に重点を置く一方で、新しいプロセスのための定性的に正確な物理ベースのモデルを開発するという本質的で大きなコストを無視しているためである。本稿では,この問題に対処するトランスファーラーニングに基づくアプローチを提案する。そこでは,MLモデルを物理ベースプロセスモデル(ソース)から得られる計算コストの低い大量のデータで訓練し,より少量の高価な実験データ(ターゲット)で微調整を行う。新規性は、ソースモデルに要求される定性的精度の限界を押し広げる点にある。この精度は文献では高いと仮定されており、それが高いモデル開発コストの根源となっている。溶融フィラメント製造(Fused Filament Fabrication)における印刷線幅のモデル化で本手法を評価した。ソースに極端な機能的・定量的不正確さがあるにもかかわらず、我々のアプローチはモデル開発コストを年単位で、実験コストを56-76%、計算コストを桁違いに、予測誤差を16-24%削減する。 Machine Learning (ML) is of increasing interest for modeling parametric effects in manufacturing processes. But this approach is limited to established processes for which a deep physics-based understanding has been developed over time, since state-of-the-art approaches focus on reducing the experimental and/or computational costs of generating the training data but ignore the inherent and significant cost of developing qualitatively accurate physics-based models for new processes. This paper proposes a transfer learning based approach to address this issue, in which an ML model is trained on a large amount of computationally inexpensive data from a physics-based process model (source) and then fine-tuned on a smaller amount of costly experimental data (target). The novelty lies in pushing the boundaries of the qualitative accuracy demanded of the source model, which is assumed to be high in the literature, and is the root of the high model development cost. Our approach is evaluated for modeling the printed line width in Fused Filament Fabrication. Despite extreme functional and quantitative inaccuracies in the source, our approach reduces the model development cost by years, experimental cost by 56-76%, computational cost by orders of magnitude, and prediction error by 16-24%.	翻訳日:2023-05-02 16:20:49 公開日:2023-04-29
# 教師なし復元学習のためのsparsity-aware optimal transport Sparsity-Aware Optimal Transport for Unsupervised Restoration Learning ( http://arxiv.org/abs/2305.00273v1 ) ライセンス: Link先を確認	Fei Wen, Wei Wang and Wenxian Yu	(参考訳) 近年の研究では,事前モデルなしで,教師なし復元学習問題を最適輸送(OT)問題として最適に定式化できることが示されており,デノイジングタスクでは教師付き手法の性能に迫る有望な性能が示されている。しかし、超解像、デレイニング、デヘイジングといった複雑な復元タスクでは、最先端の教師付き手法に依然として大きく後れを取っている。本稿では,OTフレームワークにおいて劣化のスパース性を活用し,これらのタスクにおける性能を大幅に向上させる。まず,これらのタスクにおける劣化が周波数領域において非常にスパースであるという観察を示し,教師なし復元学習のためのsparsity-aware optimal transport (SOT) 基準を提案する。さらに,スパース性の活用が復元のための逆写像の探索における曖昧さの低減に役立つことを示す解析的な例を示す。実世界の超解像、デレイニング、デヘイジングの実験では、SOTがOTのPSNRをそれぞれ約2.6dB、2.7dB、1.3dB改善し、比較した教師付き・教師なし手法の中で最良の知覚スコアを達成することが示されている。特に3つのタスクにおいて、SOTは既存の教師なし手法を大きく上回り、最先端の教師付き手法の性能に迫る。 Recent studies show that, without any prior model, the unsupervised restoration learning problem can be optimally formulated as an optimal transport (OT) problem, which has shown promising performance on denoising tasks, approaching that of supervised methods. However, it still significantly lags behind state-of-the-art supervised methods on complex restoration tasks such as super-resolution, deraining, and dehazing. In this paper, we exploit the sparsity of degradation in the OT framework to significantly boost its performance on these tasks. First, we disclose an observation that the degradation in these tasks is quite sparse in the frequency domain, and then propose a sparsity-aware optimal transport (SOT) criterion for unsupervised restoration learning. Further, we provide an analytic example to illustrate that exploiting the sparsity helps to reduce the ambiguity in finding an inverse map for restoration. Experiments on real-world super-resolution, deraining, and dehazing demonstrate that SOT can improve the PSNR of OT by about 2.6 dB, 2.7 dB and 1.3 dB, respectively, while achieving the best perception scores among the compared supervised and unsupervised methods. Particularly, on the three tasks, SOT significantly outperforms existing unsupervised methods and approaches the performance of state-of-the-art supervised methods.	翻訳日:2023-05-02 16:13:24 公開日:2023-04-29
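The PSNR gains reported in the entry above are differences on a logarithmic (decibel) scale. As a minimal sketch of how PSNR is commonly computed, assuming 8-bit images stored as nested lists with a peak value of 255 (the function name `psnr` and the toy pixel values are illustrative and not taken from the paper):

```python
import math

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if len(ref) != len(out):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical 2x2 images: restored_b is closer to the reference than restored_a.
reference = [[100, 120], [140, 160]]
restored_a = [[102, 118], [143, 157]]
restored_b = [[101, 119], [141, 159]]
gain_db = psnr(reference, restored_b) - psnr(reference, restored_a)
```

Because PSNR is 10·log10(peak²/MSE), a gain of x dB corresponds to the mean squared error shrinking by a factor of 10^(x/10); a gain of about 2.6 dB therefore means roughly a 1.8-fold lower MSE.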
# 時間分数階開放系の量子速度限界 Quantum Speed Limit for Time-Fractional Open Systems ( http://arxiv.org/abs/2305.00270v1 ) ライセンス: Link先を確認	Dongmei Wei, Hailing Liu, Yongmei Li, Fei Gao, Sujuan Qin, Qiaoyan Wen	(参考訳) 時間分数階シュレーディンガー方程式(Time-Fractional Schrödinger Equation, TFSE)は、散逸環境と相互作用する量子系の研究に適している。量子速度限界(Quantum Speed Limit, QSL)時間は、量子系が2つの状態の間を発展するのに必要な最短時間を表し、量子過程の最大速度を評価する上で重要である。本研究では、TFSEを基本的な開放量子系モデルである共鳴散逸Jaynes-Cummings (JC) モデルに適用することで、一般的な時間分数階の単一量子ビット開放系を厳密に解き、その系のQSL時間を調べる。環境の非マルコフ的な記憶効果が時間分数階の量子発展を加速し、その結果QSL時間が短くなることを示す。さらに、与えられた駆動時間において時間分数階開放量子系の発展が加速される条件、すなわち分数階数、結合強度、光子数の間のトレードオフを明らかにする。特に、長い駆動時間に対して分数階数を調整することにより、時間分数階開放量子系の非マルコフ散逸ダイナミクスを操作する方法を示す。 The Time-Fractional Schrödinger Equation (TFSE) is well suited to studying a quantum system interacting with its dissipative environment. The Quantum Speed Limit (QSL) time captures the shortest time required for a quantum system to evolve between two states, which is significant for evaluating the maximum speed in quantum processes. In this work, we solve exactly for a generic time-fractional single qubit open system by applying the TFSE to a basic open quantum system model, namely the resonant dissipative Jaynes-Cummings (JC) model, and investigate the QSL time for the system. It is shown that the non-Markovian memory effects of the environment can accelerate the time-fractional quantum evolution, thus resulting in a smaller QSL time. Additionally, the condition for the accelerated evolution of the time-fractional open quantum system at a given driving time, i.e., a tradeoff among the fractional order, coupling strength, and photon number, is brought to light. In particular, a method to manipulate the non-Markovian dissipative dynamics of a time-fractional open quantum system by adjusting the fractional order for a long driving time is presented.	翻訳日:2023-05-02 16:12:59 公開日:2023-04-29
# 超微細構造決定のための$^{85}$Rb 4$D_{3/2}$状態の分光 Spectroscopy of the $^{85}$Rb 4$D_{3/2}$ state for hyperfine-structure determination ( http://arxiv.org/abs/2305.00265v1 ) ライセンス: Link先を確認	Alisher Duspayev and Georg Raithel	(参考訳) 我々は、2光子5$S_{1/2}\rightarrow$4$D_{3/2}$遷移を用いて、$^{85}$Rb 4$D_{3/2}$状態の超微細構造定数の測定を報告する。超微細遷移は、795nmレーザー周波数の関数として低温原子試料を介して低出力の795nm下段レーザー光の透過を測定し、上段1476nmレーザーの周波数を固定する。 4つの超微粒子成分は、記録された透過スペクトルにおいてよく分解される。 acシフトは慎重に考慮される。測定されたライン位置をゼロレーザーパワーに外挿することにより、フィールドフリーの超微細ライン位置を求める。磁気双極子と電気四極子定数である$A$と$B$はそれぞれ7.419(35)~MHzと4.19(19)~MHzと決定される。結果は,先行研究の文脈で評価される。 Rb 4$D_J$状態のRydberg-atom-physics,precision-metrology,quantum-technology への応用について論じる。 We report a measurement of the hyperfine-structure constants of the $^{85}$Rb 4$D_{3/2}$ state using a two-photon 5$S_{1/2}\rightarrow$4$D_{3/2}$ transition. The hyperfine transitions are probed by measuring the transmission of the low-power 795-nm lower-stage laser beam through a cold-atom sample as a function of 795-nm laser frequency, with the frequency of the upper-stage 1476-nm laser fixed. All 4 hyperfine components are well-resolved in the recorded transmission spectra. AC shifts are carefully considered. The field-free hyperfine line positions are obtained by extrapolating measured line positions to zero laser power. The magnetic-dipole and electric-quadrupole constants, $A$ and $B$, are determined from the hyperfine intervals to be 7.419(35)~MHz and 4.19(19)~MHz, respectively. The results are evaluated in context with previous works. Possible uses of the Rb 4$D_J$ states in Rydberg-atom-physics, precision-metrology and quantum-technology applications are discussed.	翻訳日:2023-05-02 16:12:37 公開日:2023-04-29
# 画像線分節の検出と記述に関する総合的レビュー:分類学,比較,課題 A Comprehensive Review of Image Line Segment Detection and Description: Taxonomies, Comparisons, and Challenges ( http://arxiv.org/abs/2305.00264v1 ) ライセンス: Link先を確認	Xinyu Lin, Yingjie Zhou, Yipeng Liu, and Ce Zhu	(参考訳) ラインセグメントの検出と記述は多くの視覚タスクの基礎となった。多くの研究は線分の検出と記述を目的としているが、包括的なレビューは欠如しており、その進捗を妨げている。本研究は,二次元画像線セグメントの検出と記述に関する関連研究を包括的にレビューし,研究者に全体像と深い理解を与えることにより,このギャップを埋めている。それらの機構に基づき,線分検出と記述のための2つの分類法を提案し,これらの研究の紹介,解析,要約を行い,研究者が迅速かつ広範囲に学べるようにした。主要な問題、中核的な考え、既存手法の利点とデメリット、そして各カテゴリの潜在的な応用について分析・要約し、これまで未知の発見を含む。既存の方法の課題とそれを解決するための関連する洞察は、研究者を刺激するためにも提供される。さらに、いくつかの最先端の線分検出および記述アルゴリズムをバイアスなく評価し、評価コードを公開する。理論的解析は、実験結果と相まって、研究者が意図した視覚応用に最適な方法を選択するためのガイドとなる。最後に、この研究は、この分野の研究者からより多くの注目を集めるために、潜在的に興味深い将来の研究方向についての洞察を提供する。 Detection and description of line segments lay the basis for numerous vision tasks. Although many studies have aimed to detect and describe line segments, a comprehensive review is lacking, obstructing their progress. This study fills the gap by comprehensively reviewing related studies on detecting and describing two-dimensional image line segments to provide researchers with an overall picture and deep understanding. Based on their mechanisms, two taxonomies for line segment detection and description are presented to introduce, analyze, and summarize these studies, facilitating researchers to learn about them quickly and extensively. The key issues, core ideas, advantages and disadvantages of existing methods, and their potential applications for each category are analyzed and summarized, including previously unknown findings. The challenges in existing methods and corresponding insights for potentially solving them are also provided to inspire researchers. In addition, some state-of-the-art line segment detection and description algorithms are evaluated without bias, and the evaluation code will be publicly available. The theoretical analysis, coupled with the experimental results, can guide researchers in selecting the best method for their intended vision applications. Finally, this study provides insights for potentially interesting future research directions to attract more attention from researchers to this field.	翻訳日:2023-05-02 16:12:14 公開日:2023-04-29
# Voigt系光ポンピング磁気センサの分光マイクロ波分光 Stroboscopic microwave spectroscopy of Voigt based optically pumped magnetometers ( http://arxiv.org/abs/2305.00263v1 ) ライセンス: Link先を確認	Hans Marin Florez, Tadas Pyragius and Thomas Fernholz	(参考訳) 高周波式光ポンピング磁気センサの分光マイクロ波分光結果について報告する。高周波装束原子と同期パルスマイクロ波場との相互作用と、Voigt効果に基づく光プローブにより、部分状態トモグラフィを行い、状態形成プロセスの効率を評価することができる。このシステムを理論的に記述するために,フロッケ展開を用いた密度行列の動的方程式を解く。我々の理論的結果は、幅広いパラメータとポンプ条件に関する実験結果とよく一致している。最後に、この研究で示された理論的および実験的分析は、複雑な状態準備技術を含む他のシステムに一般化することができる。 We present results of stroboscopic microwave spectroscopy of radio-frequency dressed optically pumped magnetometer. Interaction between radio-frequency dressed atoms and a synchronously pulsed microwave field followed by Voigt effect-based optical probing allows us to perform partial state tomography and assess the efficiency of the state preparation process. To theoretically describe the system, we solve the dynamical equation of the density matrix employing Floquet expansion. Our theoretical results are in good agreement with experimental measurements over a wide range of parameters and pumping conditions. Finally, the theoretical and experimental analysis presented in this work can be generalised to other systems involving complex state preparation techniques.	翻訳日:2023-05-02 16:11:54 公開日:2023-04-29
# 特殊トークンとターンレベルの注意による階層的対話理解 Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention ( http://arxiv.org/abs/2305.00262v1 ) ライセンス: Link先を確認	Xiao Liu, Jian Zhang, Heng Zhang, Fuzhao Xue, Yang You	(参考訳) 標準的なテキストと比較すると、各ターンで動的かつ予期せぬ意味変化が起こるため、機械にとって対話を理解することはより困難である。このような一貫性のない意味論をモデル化するために,単純だが効果的な階層的対話理解モデルHiDialogを提案する。具体的には,まず対話に複数の特殊トークンを挿入し,ターンレベルの注意を提案してターン埋め込みを階層的に学習する。そして、学習された埋め込みを磨くために異種グラフモジュールを利用する。我々は,対話関係抽出,対話感情認識,対話行為分類など,様々な対話理解タスクでモデルの評価を行った。その結果,この単純な手法が上述の3つのタスクすべてにおいて最先端の性能を達成することが示された。ソースコードはすべて https://github.com/ShawX825/HiDialog で公開されている。 Compared with standard text, understanding dialogue is more challenging for machines because of the dynamic and unexpected semantic changes in each turn. To model such inconsistent semantics, we propose a simple but effective Hierarchical Dialogue Understanding model, HiDialog. Specifically, we first insert multiple special tokens into a dialogue and propose the turn-level attention to learn turn embeddings hierarchically. Then, a heterogeneous graph module is leveraged to polish the learned embeddings. We evaluate our model on various dialogue understanding tasks including dialogue relation extraction, dialogue emotion recognition, and dialogue act classification. Results show that our simple approach achieves state-of-the-art performance on all three tasks above. All our source code is publicly available at https://github.com/ShawX825/HiDialog.	翻訳日:2023-05-02 16:11:45 公開日:2023-04-29
# CME適応時間予測のためのアンサンブル学習 Ensemble Learning for CME Arrival Time Prediction ( http://arxiv.org/abs/2305.00258v1 ) ライセンス: Link先を確認	Khalid A. Alobaid, Jason T. L. Wang	(参考訳) 太陽は常に放射とプラズマをヘリウム圏に放出する。散発的に太陽はフレアやコロナ質量放出(cmes)のような太陽の噴火を起こす。 CMEは大量の質量と磁束を輸送する。地球指向のCMEは、人間のシステムに深刻な影響をもたらす可能性がある。電力網、パイプライン、衛星、通信を破壊できる。したがって、人体システムへのダメージを最小限に抑えるためには、正確な監視と予測が重要である。本研究では,太陽から地球へのCMEの到着時刻を予測するため,CMETNetというアンサンブル学習手法を提案する。我々は,1996年から2021年までの2つの太陽周期,#23と#24の噴火事象を,合計363個の地球効率CMEを用いて収集・統合した。予測に使用されるデータには、SOHO/LASCO C2コロナグラフから得られたCMEの特徴、太陽風パラメータ、CME画像が含まれる。本学習フレームワークは,数値データ解析のための回帰アルゴリズムと,画像処理のための畳み込みニューラルネットワークから構成される。実験の結果,CMETNetはPearsonの製品モーメント相関係数0.83,絶対誤差9.75時間で,既存の機械学習手法よりも優れた性能を示した。 The Sun constantly releases radiation and plasma into the heliosphere. Sporadically, the Sun launches solar eruptions such as flares and coronal mass ejections (CMEs). CMEs carry away a huge amount of mass and magnetic flux with them. An Earth-directed CME can cause serious consequences to the human system. It can destroy power grids/pipelines, satellites, and communications. Therefore, accurately monitoring and predicting CMEs is important to minimize damages to the human system. In this study we propose an ensemble learning approach, named CMETNet, for predicting the arrival time of CMEs from the Sun to the Earth. We collect and integrate eruptive events from two solar cycles, #23 and #24, from 1996 to 2021 with a total of 363 geoeffective CMEs. The data used for making predictions include CME features, solar wind parameters and CME images obtained from the SOHO/LASCO C2 coronagraph. Our ensemble learning framework comprises regression algorithms for numerical data analysis and a convolutional neural network for image processing. Experimental results show that CMETNet performs better than existing machine learning methods reported in the literature, with a Pearson product-moment correlation coefficient of 0.83 and a mean absolute error of 9.75 hours.	翻訳日:2023-05-02 16:11:33 公開日:2023-04-29
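The two scores quoted in the entry above (a Pearson product-moment correlation coefficient and a mean absolute error in hours) are standard regression metrics. A minimal sketch of both, assuming predicted and observed transit times are given as plain lists of hours; the function names and the sample values are hypothetical and do not come from the CMETNet data:

```python
import math

def pearson_r(predicted, observed):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

def mean_absolute_error(predicted, observed):
    """Average absolute difference, in the same unit as the inputs (hours here)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical Sun-to-Earth transit times in hours (illustrative only).
observed = [40.0, 55.0, 62.0, 80.0]
predicted = [45.0, 50.0, 70.0, 78.0]

r = pearson_r(predicted, observed)
mae_hours = mean_absolute_error(predicted, observed)
```

The two metrics answer different questions: the correlation measures whether predictions rise and fall with the observations, while the MAE measures how many hours off a typical prediction is, so a model can score well on one and poorly on the other.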
# 深層学習法を用いたMRI画像からの脳腫瘍セグメンテーション Brain Tumor Segmentation from MRI Images using Deep Learning Techniques ( http://arxiv.org/abs/2305.00257v1 ) ライセンス: Link先を確認	Ayan Gupta, Mayank Dixit, Vipul Kumar Mishra, Attulya Singh, Atul Dayal	(参考訳) 良性であれ悪性であれ、脳腫瘍は生命を脅かす可能性があり、治療はもちろん、その種類、起源、位置を特定するだけでも多大な労力を要する。医療専門家による手動セグメンテーションは時間のかかる作業であり、高い精度でプロセスを高速化する技術の導入が求められている。医用画像セグメンテーションを目的として、脳腫瘍セグメンテーションに用いたデータセットで一貫した結果を示す有望な深層学習モデルを検討し、特定した。本研究では、3種類の脳腫瘍(髄膜腫、神経膠腫、下垂体腫瘍)の患者233例から得られたT1強調画像3064枚を含む公開MRI画像データセットを用いた。データセットのファイルは変換および前処理を行ったうえで、様々なバックボーンを持つU-NetとAttention U-Net、Deep Residual U-Net、ResUnet++、Recurrent Residual U-Netといったよく知られた画像セグメンテーション用深層学習モデルを実装・訓練した。パラメータは、人間の脳腫瘍の分類とセグメンテーションに関する文献のレビューに基づいて設定した。実験の結果、適用した手法の中で、Adamオプティマイザを用いたRecurrent Residual U-Netが平均IoU (Mean Intersection over Union) 0.8665に達し、比較した他の最先端深層学習モデルを上回った。可視化の結果も、MRIスキャンからの脳腫瘍セグメンテーションにおける顕著な成果を示しており、医師がMRIスキャンから脳腫瘍を自動的に抽出するうえで、このアルゴリズムがいかに有用かを示している。 A brain tumor, whether benign or malignant, can potentially be life threatening and requires painstaking efforts in order to identify the type, origin and location, let alone cure one. Manual segmentation by medical specialists can be time-consuming, which calls for the involvement of technology to hasten the process with high accuracy. For the purpose of medical image segmentation, we inspected and identified the capable deep learning model, which shows consistent results in the dataset used for brain tumor segmentation. In this study, we use a public MRI imaging dataset containing 3064 T1-weighted images from 233 patients with three variants of brain tumor, viz. meningioma, glioma, and pituitary tumor. The dataset files were converted and preprocessed before applying the methodology, which implements and trains some well-known image segmentation deep learning models, such as U-Net and Attention U-Net with various backbones, Deep Residual U-Net, ResUnet++ and Recurrent Residual U-Net, with varying parameters acquired from our review of the literature related to human brain tumor classification and segmentation. The experimental findings showed that among all the applied approaches, the recurrent residual U-Net which uses Adam optimizer reaches a Mean Intersection Over Union of 0.8665 and outperforms other compared state-of-the-art deep learning models. The visual findings also show the remarkable results of the brain tumor segmentation from MRI scans and demonstrate how useful the algorithm will be for physicians to extract the brain cancers automatically from MRI scans and serve humanity.	翻訳日:2023-05-02 16:11:14 公開日:2023-04-29
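The Mean Intersection over Union figure quoted in the entry above is an overlap metric between predicted and ground-truth masks. As a rough sketch, assuming binary tumor masks stored as nested lists of 0/1 and assuming the mean is taken over images (the abstract does not say whether the authors average over images or over classes; `iou` and `mean_iou` are illustrative names):

```python
def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union

def mean_iou(pred_masks, true_masks):
    """Average IoU over a list of (prediction, ground truth) mask pairs."""
    scores = [iou(p, t) for p, t in zip(pred_masks, true_masks)]
    return sum(scores) / len(scores)

# Hypothetical 2x2 masks: the first prediction over-segments by one pixel,
# the second matches its ground truth exactly.
pred_masks = [[[1, 1], [0, 0]], [[0, 1], [1, 1]]]
true_masks = [[[1, 0], [0, 0]], [[0, 1], [1, 1]]]
score = mean_iou(pred_masks, true_masks)
```

IoU penalizes both missed tumor pixels and false positives, which is why it is usually stricter than pixel accuracy on images where the tumor occupies a small fraction of the scan.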
# 駆動量子系のクリロフ構成と複雑性 Krylov construction and complexity for driven quantum systems ( http://arxiv.org/abs/2305.00256v1 ) ライセンス: Link先を確認	Amin A. Nizami and Ankit W. Shrestha	(参考訳) クリロフ複雑性は作用素の成長と量子カオスの研究と関連する重要な力学量であり、最近では様々な時間に依存しない系で多くの研究がなされている。時間依存型(駆動型)量子システムにおけるK-複素性の研究を開始する。周期時間依存(フローク)系では、クリロフ構成を行う自然な方法を与え、そのような系に対して(状態と演算子)k-複素性を定義する。特にトーラスとハーパー写像上の量子キックロータに着目し,ランチョス様係数の時間依存性と,弱結合状態と強結合状態とのカップリング定数とのk-複素性について詳細な数値的研究を行った。 Krylov complexity is an important dynamical quantity with relevance to the study of operator growth and quantum chaos and has recently been much studied for various time-independent systems. We initiate the study of K-complexity in time-dependent (driven) quantum systems. For periodic time-dependent (Floquet) systems, we give a natural method for doing the Krylov construction and then define (state and operator) K-complexity for such systems. Focusing on kicked systems, in particular the quantum kicked rotor on a torus and the Harper map, we undertake a detailed numerical study of the time dependence of Lanczos-like coefficients as well as of the K-complexity with the coupling constant interpolating between the weak and strong coupling regime.	翻訳日:2023-05-02 16:10:41 公開日:2023-04-29
# 半無限拘束マルコフ決定過程と効率的な強化学習 Semi-Infinitely Constrained Markov Decision Processes and Efficient Reinforcement Learning ( http://arxiv.org/abs/2305.00254v1 ) ライセンス: Link先を確認	Liangyu Zhang, Yang Peng, Wenhao Yang and Zhihua Zhang	(参考訳) 本稿では,制約付きマルコフ決定過程 (CMDP) の新たな一般化を提案し,これをsemi-infinitely constrained Markov decision process (SICMDP) と呼ぶ。特に、通常のCMDPの場合のような有限個の制約ではなく、制約の連続体を考える。また,SICMDPのためにSI-CRL と SI-CPO の2つの強化学習アルゴリズムを考案した。 SI-CRLはモデルに基づく強化学習アルゴリズムである。遷移モデルの推定値が与えられると、まず強化学習問題を線形半無限計画(LSIP)問題に変換し、次にLSIPの文献にある双対交換法を用いて解く。 SI-CPOはポリシー最適化アルゴリズムである。協調確率近似アプローチからアイデアを借用し,報酬の最大化またはコストの最小化のために政策パラメータを交互に更新する。我々の知る限り、制約付き強化学習問題を解くために半無限計画(SIP)のツールを適用したのは我々が初めてである。 SI-CRL と SI-CPO の理論的解析を行い,それらの反復複雑性とサンプル複雑性を同定した。また,SICMDPモデルを説明するために広範な数値実験を行い,最新の深層強化学習手法を活用することで,提案アルゴリズムが複雑な逐次的意思決定課題を解決できることを実証した。 We propose a novel generalization of constrained Markov decision processes (CMDPs) that we call the semi-infinitely constrained Markov decision process (SICMDP). Particularly, we consider a continuum of constraints instead of a finite number of constraints as in the case of ordinary CMDPs. We also devise two reinforcement learning algorithms for SICMDPs that we call SI-CRL and SI-CPO. SI-CRL is a model-based reinforcement learning algorithm. Given an estimate of the transition model, we first transform the reinforcement learning problem into a linear semi-infinite programming (LSIP) problem and then use the dual exchange method in the LSIP literature to solve it. SI-CPO is a policy optimization algorithm. Borrowing the ideas from the cooperative stochastic approximation approach, we make alternative updates to the policy parameters to maximize the reward or minimize the cost. To the best of our knowledge, we are the first to apply tools from semi-infinite programming (SIP) to solve constrained reinforcement learning problems. We present theoretical analysis for SI-CRL and SI-CPO, identifying their iteration complexity and sample complexity. We also conduct extensive numerical examples to illustrate the SICMDP model and demonstrate that our proposed algorithms are able to solve complex sequential decision-making tasks leveraging modern deep reinforcement learning techniques.	翻訳日:2023-05-02 16:10:27 公開日:2023-04-29
# 模擬学習のための結合フローアプローチ A Coupled Flow Approach to Imitation Learning ( http://arxiv.org/abs/2305.00303v1 ) ライセンス: Link先を確認	Gideon Freund, Elad Sarafian, Sarit Kraus	(参考訳) 強化学習と模倣学習において、中心的重要性の対象は政策によって引き起こされる状態分布である。この定理は政策勾配定理において重要な役割を担っており、関連する状態-作用分布とともにそれを参照している。その重要性にもかかわらず、状態分布は明示的にモデル化されるのではなく、主に間接的に理論的に議論される。適切な密度推定ツールがないのは理由です。本研究では,上記の分布に対する正規化フローベースモデルの応用について検討する。特に、分布マッチングに基づく模倣学習において、KL(Kulback-Leibler)発散のDonsker-Varadhan表現の最適点を介して結合された一対の流れを用いる。我々のアルゴリズムであるCFIL(Coupled Flow Imitation Learning)は,1つの専門的軌道を持つベンチマークタスクにおける最先端のパフォーマンスを達成し,サブサンプルとステートのみのルールを含むさまざまな設定に自然に拡張する。 In reinforcement learning and imitation learning, an object of central importance is the state distribution induced by the policy. It plays a crucial role in the policy gradient theorem, and references to it--along with the related state-action distribution--can be found all across the literature. Despite its importance, the state distribution is mostly discussed indirectly and theoretically, rather than being modeled explicitly. The reason being an absence of appropriate density estimation tools. In this work, we investigate applications of a normalizing flow-based model for the aforementioned distributions. In particular, we use a pair of flows coupled through the optimality point of the Donsker-Varadhan representation of the Kullback-Leibler (KL) divergence, for distribution matching based imitation learning. Our algorithm, Coupled Flow Imitation Learning (CFIL), achieves state-of-the-art performance on benchmark tasks with a single expert trajectory and extends naturally to a variety of other settings, including the subsampled and state-only regimes.	翻訳日:2023-05-02 16:04:25 公開日:2023-04-29
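For reference, the Donsker-Varadhan representation mentioned in the entry above is the standard variational form of the KL divergence. It is stated here in generic notation; the symbols $P$, $Q$ and $f$ are placeholders chosen for this note and are not necessarily the paper's own:

$$ D_{\mathrm{KL}}(P \,\|\, Q) = \sup_{f} \left\{ \mathbb{E}_{x \sim P}\left[ f(x) \right] - \log \mathbb{E}_{x \sim Q}\left[ e^{f(x)} \right] \right\} $$

The supremum is attained at $f^{*}(x) = \log \frac{dP}{dQ}(x) + c$ for any constant $c$, that is, at the log density ratio of the two distributions. This is presumably why a pair of density models (here, two normalizing flows) is a natural way to parameterize the optimal point, although the abstract itself does not spell out the construction.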
# イデオグラフィーのパズルの認知的記述 A Cognitive Account of the Puzzle of Ideography ( http://arxiv.org/abs/2305.00296v1 ) ライセンス: Link先を確認	Xerxes D. Arsiwalla	(参考訳) モリンの「イデノグラフィーのパズル」の解説記事において、モリンの標準化を補完するイデオログラフィーのパズルの認知的記述を新たに発表した。音声言語の効率的な標準化は、認知表現のチャンキングと組み合わさったモダリティ効果に現象論的に起因し、さらに多感的な統合と注意のシリアライズされた性質によって支援される。これらの認知メカニズムは、汎用コミュニケーションにおいて言語がグラフィックコードを支配している理由を説明する上で重要である。 In this commentary article to 'The Puzzle of Ideography' by Morin, we put forth a new cognitive account of the puzzle of ideography, that complements the standardization account of Morin. Efficient standardization of spoken language is phenomenologically attributed to a modality effect coupled with chunking of cognitive representations, further aided by multi-sensory integration and the serialized nature of attention. These cognitive mechanisms are crucial for explaining why languages dominate graphic codes for general-purpose human communication.	翻訳日:2023-05-02 16:04:06 公開日:2023-04-29
# フローダイナミクス最適化深層学習法を用いた網膜眼底画像の分類の改善 Improving Classification of Retinal Fundus Image Using Flow Dynamics Optimized Deep Learning Methods ( http://arxiv.org/abs/2305.00294v1 ) ライセンス: Link先を確認	V. Banupriya, S. Anusuya	(参考訳) 糖尿病網膜症(英: Diabetic Retinopathy、DR)は、網膜に存在する血管ネットワークを損傷する糖尿病の障害である。これは糖尿病を患っている場合、被験者の視覚を危険にさらす可能性がある。経験豊富な臨床医は、疾患の特定に使用する画像中の腫瘍を識別する必要があるため、カラー眼底写真を用いてDR診断を行うのに時間がかかる。 DRの自動検出は非常に難しい作業である。畳み込みニューラルネットワーク(CNN)は、現在の状況において、特に手作りや機能的手法と比較して、画像の分類に非常に有効である。高い結果を保証するため、研究者たちは眼底画像の特徴を決定するための最先端のCNNモデルも提案した。 CNN出力の特徴は,提案システムにおける機械学習の各種分類器に応用された。このモデルは後に異なる形態の深層学習法と視覚幾何学群(VGG)ネットワークを用いて評価された。これは、一般的なKAGGLEデータセットのイメージを使用することで実現された。ここでは, 網膜眼底像検出のためのFUNDNETとともに, 河川形成ダイナミクス (RFD) アルゴリズムが提案されている。調査の結果、アプローチは代替アプローチよりも優れていることが示された。 Diabetic Retinopathy (DR) refers to a barrier that takes place in diabetes mellitus damaging the blood vessel network present in the retina. This may endanger the subjects' vision if they have diabetes. It can take some time to perform a DR diagnosis using color fundus pictures because experienced clinicians are required to identify the tumors in the imagery used to identify the illness. Automated detection of the DR can be an extremely challenging task. Convolutional Neural Networks (CNN) are also highly effective at classifying images when applied in the present situation, particularly compared to the handmade and functionality methods employed. In order to guarantee high results, the researchers also suggested a cutting-edge CNN model that might determine the characteristics of the fundus images. The features of the CNN output were employed in various classifiers of machine learning for the proposed system. This model was later evaluated using different forms of deep learning methods and Visual Geometry Group (VGG) networks. It was done by employing the images from a generic KAGGLE dataset. Here, the River Formation Dynamics (RFD) algorithm proposed along with the FUNDNET to detect retinal fundus images has been employed. The investigation's findings demonstrated that the approach performed better than alternative approaches.	翻訳日:2023-05-02 16:03:54 公開日:2023-04-29
# Polyp-SAM:ポリプセグメンテーションのためのトランスファーSAM Polyp-SAM: Transfer SAM for Polyp Segmentation ( http://arxiv.org/abs/2305.00293v1 ) ライセンス: Link先を確認	Yuheng Li, Mingzhe Hu, and Xiaofeng Yang	(参考訳) 大腸ポリープは大腸癌の重要な前駆体と考えられている。大腸ポリープの自動分画は大腸癌の誤診を著しく低減し、医師の診断効率を向上させる。ポリープセグメンテーションには多くの方法が提案されているが,大腸内視鏡データを限定した大規模セグメンテーションネットワークの訓練は課題である。近年,Segment Anything Model (SAM) は,自然画像と医用画像のセグメンテーションにおいて注目されている。 SAMはいくつかの画像ベンチマークにおいて優れた性能を示しており、医用画像のセグメンテーションに大きな可能性を示している。本研究では,ポリプセグメンテーションのための微調整SAMモデルであるPolyp-SAMを提案し,その性能を最先端ポリプセグメンテーションモデルと比較する。 SAMの2つの転送学習戦略を,エンコーダを微調整する場合としない場合で比較した。 5つのパブリックデータセットで評価され、2つのデータセットで最先端のパフォーマンスを達成し、3つのデータセットで印象的なパフォーマンスを実現しました。本研究は,SAMを医用画像分割タスクに適用する大きな可能性を示す。この記事では、コードとモデルの重み付けを次のようにリリースする予定です。 Colon polyps are considered important precursors for colorectal cancer. Automatic segmentation of colon polyps can significantly reduce the misdiagnosis of colon cancer and improve physician annotation efficiency. While many methods have been proposed for polyp segmentation, training large-scale segmentation networks with limited colonoscopy data remains a challenge. Recently, the Segment Anything Model (SAM) has gained much attention in both natural and medical image segmentation. SAM demonstrates superior performance in several image benchmarks and therefore shows great potential for medical image segmentation. In this study, we propose Polyp-SAM, a finetuned SAM model for polyp segmentation, and compare its performance to several state-of-the-art polyp segmentation models. We also compare two transfer learning strategies of SAM with and without finetuning its encoders. Evaluated on five public datasets, our Polyp-SAM achieves state-of-the-art performance on two datasets and impressive performance on three datasets, with dice scores all above 88%. This study demonstrates the great potential of adapting SAM to medical image segmentation tasks. We plan to release the code and model weights for this paper at: https://github.com/ricklisz/Polyp-SAM.	翻訳日:2023-05-02 16:03:36 公開日:2023-04-29
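The Dice scores quoted in the Polyp-SAM abstract measure the overlap between a predicted mask and a ground-truth mask. A minimal sketch of that metric for binary masks follows; the function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are illustrative assumptions, not taken from the paper or its repository.

```python
def dice_score(pred, target):
    """Dice coefficient 2*|A and B| / (|A| + |B|) for two binary masks.

    Masks are nested lists of 0/1 values with the same number of pixels.
    Two empty masks are scored as 1.0 (an assumed convention).
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: each has 3 foreground pixels, 2 of which overlap.
predicted = [[0, 1, 1],
             [0, 1, 0]]
ground_truth = [[0, 1, 0],
                [0, 1, 1]]
score = dice_score(predicted, ground_truth)
```

For the toy masks the overlap is 2 pixels out of 3 + 3 foreground pixels, so the score is 4/6.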
# 生成aiに関する学生の声 : 高等教育における認識・利益・課題 Students' Voices on Generative AI: Perceptions, Benefits, and Challenges in Higher Education ( http://arxiv.org/abs/2305.00290v1 ) ライセンス: Link先を確認	Cecilia Ka Yuk Chan and Wenjie Hu	(参考訳) 本研究は、高等教育におけるChatGPTのような生成AI(GenAI)技術に対する大学生の認識について、親しみ、取り組みへの意欲、潜在的な利益と課題、効果的な統合に焦点を当てたものである。香港の様々な分野の大学生・大学院生399名を対象に調査を行ったところ、教育・学習におけるGenAIに対する概して肯定的な態度を示した。学生は、パーソナライズされた学習支援、執筆とブレインストーミング支援、研究と分析機能の可能性を認識した。しかし, 正確性, プライバシ, 倫理的問題, 個人の発達, キャリアの見通し, 社会的価値への影響についても懸念が表明された。 John Biggs氏の3Pモデルによると、学生の知覚は学習のアプローチや成果に大きな影響を与えている。学生の認識を理解することで、教育者や政策立案者はGenAI技術をニーズや関心に対処し、効果的な学習成果を促進することができる。本研究から得られた知見は、GenAI技術の高等教育への統合に関する政策開発に影響を及ぼす。学生の認識を理解し、その懸念に対処することで、政策立案者は、GenAIツールの責任と効果的な実装のための、しっかりとインフォームドされたガイドラインと戦略を作成し、最終的に高等教育における教育と学習の経験を向上することができる。 This study explores university students' perceptions of generative AI (GenAI) technologies, such as ChatGPT, in higher education, focusing on familiarity, their willingness to engage, potential benefits and challenges, and effective integration. A survey of 399 undergraduate and postgraduate students from various disciplines in Hong Kong revealed a generally positive attitude towards GenAI in teaching and learning. Students recognized the potential for personalized learning support, writing and brainstorming assistance, and research and analysis capabilities. However, concerns about accuracy, privacy, ethical issues, and the impact on personal development, career prospects, and societal values were also expressed. According to John Biggs' 3P model, student perceptions significantly influence learning approaches and outcomes. By understanding students' perceptions, educators and policymakers can tailor GenAI technologies to address needs and concerns while promoting effective learning outcomes. Insights from this study can inform policy development around the integration of GenAI technologies into higher education. 
By understanding students' perceptions and addressing their concerns, policymakers can create well-informed guidelines and strategies for the responsible and effective implementation of GenAI tools, ultimately enhancing teaching and learning experiences in higher education.	翻訳日:2023-05-02 16:03:16 公開日:2023-04-29
# LiDAR点雲上のバンドル調整のための効率的な平面抽出手法 An Efficient Plane Extraction Approach for Bundle Adjustment on LiDAR Point clouds ( http://arxiv.org/abs/2305.00287v1 ) ライセンス: Link先を確認	Zheng Liu and Fu Zhang	(参考訳) LiDARポイントクラウド上のバンドル調整(BA)は、複数のポーズを同時に最適化する能力により、ポイントクラウドの高精度でグローバルな一貫性をもたらすため、近年広く研究されている。しかし、LiDARバンドル調整の精度と速度は、LiDAR BAの点関連性を提供する平面抽出の品質に依存する。本研究では,LiDARバンドル調整のためのポイントアソシエーションを提供するために特別に設計された,voxelに基づく平面抽出手法を提案する。まず、空間を一定サイズの複数のボクセルに分割し、その点が同じ平面上にあるかどうかに基づいて、octree構造を用いてこれらのルートボクセルを分割する。また,主成分分析(PCA)に基づく新しい平面決定法を考案し,各点を4つの偶数クォーターに分割し,それらの最小固有値と初期点クラウドの値を比較する。最後に,1つのボクセル内に存在する小さな平面が多すぎることを防止し,BAに必要な最適化時間を短縮する平面マージ手法を提案する。 HILTIを用いた実験結果から,提案手法が他の平面抽出法と比較して最適かつ最小の時間コストを実現することを示す。 Bundle adjustment (BA) on LiDAR point clouds has been extensively investigated in recent years due to its ability to optimize multiple poses together, resulting in high accuracy and global consistency for the point cloud. However, the accuracy and speed of LiDAR bundle adjustment depend on the quality of plane extraction, which provides point association for LiDAR BA. In this study, we propose a novel and efficient voxel-based approach for plane extraction that is specially designed to provide point association for LiDAR bundle adjustment. To begin, we partition the space into multiple voxels of a fixed size and then split these root voxels based on whether the points are on the same plane, using an octree structure. We also design a novel plane determination method based on principal component analysis (PCA), which segments the points into four even quarters and compares their minimum eigenvalues with that of the initial point cloud. Finally, we adopt a plane merging method to prevent too many small planes from being in a single voxel, which can increase the optimization time required for BA. Our experimental results on HILTI demonstrate that our approach achieves the best precision and least time cost compared to other plane extraction methods.	翻訳日:2023-05-02 16:02:53 公開日:2023-04-29
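The PCA-based plane test mentioned in the abstract above relies on the smallest eigenvalue of a point set's covariance matrix, which is close to zero when the points lie on a plane. The sketch below shows only that basic check. It does not reproduce the paper's four-quarter comparison, and the helper names and the `threshold` value are assumptions chosen for illustration.

```python
import math


def covariance_3x3(points):
    """Covariance matrix of a list of (x, y, z) points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov


def min_eigenvalue_sym3(a):
    """Smallest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        # Diagonal matrix: the eigenvalues are the diagonal entries.
        return min(a[0][0], a[1][1], a[2][2])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    # b = (a - q*I) / p has eigenvalues 2*cos(angle) for three angles 2*pi/3 apart.
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)


def is_planar(points, threshold=1e-4):
    """True if the point set is (nearly) flat; the threshold is an assumed value."""
    return min_eigenvalue_sym3(covariance_3x3(points)) < threshold


# Points sampled from the plane z = 0.5x + 0.2y + 1, and the corners of a cube.
plane_points = [(x, y, 0.5 * x + 0.2 * y + 1.0) for x in range(5) for y in range(5)]
cube_points = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
```

In an octree-based pipeline such as the one described, a voxel that fails this kind of check would be subdivided and its children tested again.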
# 自己監督型タスク表現学習に基づくメタ強化学習 Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning ( http://arxiv.org/abs/2305.00286v1 ) ライセンス: Link先を確認	Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Hang Su, Chenguang Yang, Kai Huang and Alois Knoll	(参考訳) メタ強化学習により、人工知能は関連するトレーニングタスクから学び、最小限のインタラクションデータで新しいタスクに効率的に適応することができる。しかし、既存の研究の多くは、まだパラメトリックで定常的な狭いタスク分布に限られており、評価中に配布外タスクを考慮せず、適用を制限している。本稿では,この課題に対処するために,自己監督型タスク表現学習に基づくコンテキストベースメタ強化学習アルゴリズムMOSSを提案する。メタRLは、これまで探索されたことのない幅広い非パラメトリックタスク分布に拡張し、非定常および非分布タスクにおける最先端結果を達成する。具体的には、MOSSはタスク推論モジュールとポリシーモジュールで構成される。タスク表現にはガウス混合モデルを用いてパラメトリックおよび非パラメトリックタスクのバリエーションを模倣する。さらに、我々のオンライン適応戦略により、エージェントはタスク変更の第一の視点で反応し、非定常的なタスクに適用できる。 MoSSはまた、信頼性と堅牢なタスク表現の恩恵を受けるアウト・オブ・ディストリビューションタスクにおいて、強力な一般化ロバスト性を示す。ポリシーはオフ・ポリシーrlアルゴリズム上に構築されており、ネットワーク全体が完全にオフ・ポリシーに訓練され、高いサンプル効率が保証される。 MuJoCo と Meta-World のベンチマークでは、MoSS は漸近的性能、サンプル効率(3-50倍高速)、適応効率、広範囲で多様なタスク分布に対する一般化ロバスト性といった点において先行研究より優れていた。 Meta-reinforcement learning enables artificial agents to learn from related training tasks and adapt to new tasks efficiently with minimal interaction data. However, most existing research is still limited to narrow task distributions that are parametric and stationary, and does not consider out-of-distribution tasks during the evaluation, thus, restricting its application. In this paper, we propose MoSS, a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning to address this challenge. We extend meta-RL to broad non-parametric task distributions which have never been explored before, and also achieve state-of-the-art results in non-stationary and out-of-distribution tasks. Specifically, MoSS consists of a task inference module and a policy module. We utilize the Gaussian mixture model for task representation to imitate the parametric and non-parametric task variations. 
Additionally, our online adaptation strategy enables the agent to react at the first sight of a task change, thus being applicable in non-stationary tasks. MoSS also exhibits strong generalization robustness in out-of-distribution tasks, which benefits from the reliable and robust task representation. The policy is built on top of an off-policy RL algorithm and the entire network is trained completely off-policy to ensure high sample efficiency. On MuJoCo and Meta-World benchmarks, MoSS outperforms prior works in terms of asymptotic performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization robustness on broad and diverse task distributions.	翻訳日:2023-05-02 16:02:32 公開日:2023-04-29
# NSLF-OL:リアルタイムインクリメンタル3次元再構成を伴うニューラルネットワークのオンライン学習 NSLF-OL: Online Learning of Neural Surface Light Fields alongside Real-time Incremental 3D Reconstruction ( http://arxiv.org/abs/2305.00282v1 ) ライセンス: Link先を確認	Yijun Yuan and Andreas Nuchter	(参考訳) 没入型新規ビュー生成はグラフィックス分野における重要な技術であり,近年,操作者による人間ロボットのインタラクションにも注目されている。しかし、関連するトレーニングは時間がかかるため、現在のテスト範囲は、主にオブジェクトのキャプチャにかかっている。これは、ロボットコミュニティにおける3次元再構築のための関連するモデルの使用を制限する。(1) ロボットは、通常、目に見えない、新しい方向の任意の予測を引き起こす表面への非常に小さな視野方向のみをキャプチャし、(2) リアルタイムアルゴリズムを必要とし、(3) ロボット探索のような成長するシーンで作業するためである。そこで本研究では,視線方向の小さな方向に対応できるニューラルサーフェス光場モデルを提案する。最近のエンコーディング技術を活用することで、モデルのトレーニングは非常に効率的です。さらに,大規模に成長するシーンに対して,各小領域を並列に学習する汎用フレームワークであるMANA(Multiple Asynchronous Neural Agents)を設計した。我々のモデルは、リアルタイムな3次元再構成の他に、シーケンシャルなデータストリームを共有入力として、ニューラルネットワーク光場(NSLF)をオンラインで学習する。オンライントレーニングに加えて,可視化のためのデータストリームの完了後にリアルタイムレンダリングも提供する。我々は,有名なrgbd屋内データセットを用いて実験を行い,実時間3次元再構成にモデルを埋め込むための高い柔軟性を示し,これらのシーンに対する高忠実度な映像合成を示す。コードはgithubで入手できる。 Immersive novel view generation is an important technology in the field of graphics and has recently also received attention for operator-based human-robot interaction. However, the involved training is time-consuming, and thus the current test scope is majorly on object capturing. This limits the usage of related models in the robotics community for 3D reconstruction since robots (1) usually only capture a very small range of view directions to surfaces that cause arbitrary predictions on unseen, novel direction, (2) requires real-time algorithms, and (3) work with growing scenes, e.g., in robotic exploration. The paper proposes a novel Neural Surface Light Fields model that copes with the small range of view directions while producing a good result in unseen directions. Exploiting recent encoding techniques, the training of our model is highly efficient. In addition, we design Multiple Asynchronous Neural Agents (MANA), a universal framework to learn each small region in parallel for large-scale growing scenes. 
Our model learns the Neural Surface Light Fields (NSLF) online alongside real-time 3D reconstruction, with a sequential data stream as the shared input. In addition to online training, our model also provides real-time rendering after completing the data stream for visualization. We conduct experiments using well-known RGBD indoor datasets, showing the high flexibility to embed our model into real-time 3D reconstruction and demonstrating high-fidelity view synthesis for these scenes. The code is available on GitHub.	翻訳日:2023-05-02 16:02:06 公開日:2023-04-29
# 大学教育・学習のための総合的AI政策教育フレームワーク A Comprehensive AI Policy Education Framework for University Teaching and Learning ( http://arxiv.org/abs/2305.00280v1 ) ライセンス: Link先を確認	Cecilia Ka Yuk Chan	(参考訳) 本研究は,テキスト生成型AI技術の認識と意義を検証し,高等教育のためのAI教育政策を開発することを目的とする。香港大学で457人の学生と180人の教員とスタッフから,定量的・質的調査手法を用いて収集した。本研究は,大学教育と学習におけるAI統合の多面的影響に対処する,AIエコロジー教育政策枠組みを提案する。このフレームワークは、Pedagogical、Government、Operationalの3つの次元に分けられます。教育のディメンションはAIを使用して教育と学習の成果を改善することに集中し、ガバナンスディメンションはプライバシ、セキュリティ、説明責任に関する問題に取り組む。運用次元は、インフラストラクチャとトレーニングに関する問題に対処する。このフレームワークは、学術的な設定におけるai統合の意味を微妙に理解し、ステークホルダーが責任を認識し、適切な行動を取ることを保証する。 This study aims to develop an AI education policy for higher education by examining the perceptions and implications of text generative AI technologies. Data was collected from 457 students and 180 teachers and staff across various disciplines in Hong Kong universities, using both quantitative and qualitative research methods. Based on the findings, the study proposes an AI Ecological Education Policy Framework to address the multifaceted implications of AI integration in university teaching and learning. This framework is organized into three dimensions: Pedagogical, Governance, and Operational. The Pedagogical dimension concentrates on using AI to improve teaching and learning outcomes, while the Governance dimension tackles issues related to privacy, security, and accountability. The Operational dimension addresses matters concerning infrastructure and training. The framework fosters a nuanced understanding of the implications of AI integration in academic settings, ensuring that stakeholders are aware of their responsibilities and can take appropriate actions accordingly.	翻訳日:2023-05-02 16:01:43 公開日:2023-04-29
# segment anything model (sam)がガラスを満たす - 鏡や透明な物体は容易に検出できない Segment Anything Model (SAM) Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected ( http://arxiv.org/abs/2305.00278v1 ) ライセンス: Link先を確認	Dongsheng Han, Chaoning Zhang, Yu Qiao, Maryam Qamar, Yuna Jung, SeungKyu Lee, Sung-Ho Bae, Choong Seon Hong	(参考訳) meta ai researchが先日リリースしたsam(segment anything model)は、10億以上のマスクからなる大規模なセグメンテーションデータセットでトレーニングされている。コンピュータビジョンの分野での基礎モデルとして、sam(segment anything model)は汎用オブジェクトセグメンテーションにおける印象的なパフォーマンスで注目を集めている。幅広いゼロショット転送タスクの強い能力にもかかわらず、SAMが透明なオブジェクトのような挑戦的なセットアップで何かを検出できるかどうかは不明だ。本研究では,鏡と透明物体の2つのガラス関連課題を実証的に評価する。 SAMは両方のシナリオでガラスの検出に失敗することが多く、様々な形態のガラスを持つ安全クリティカルな状況においてSAMをデプロイすることを懸念する。 Meta AI Research has recently released SAM (Segment Anything Model) which is trained on a large segmentation dataset of over 1 billion masks. As a foundation model in the field of computer vision, SAM (Segment Anything Model) has gained attention for its impressive performance in generic object segmentation. Despite its strong capability in a wide range of zero-shot transfer tasks, it remains unknown whether SAM can detect things in challenging setups like transparent objects. In this work, we perform an empirical evaluation of two glass-related challenging scenarios: mirror and transparent objects. We found that SAM often fails to detect the glass in both scenarios, which raises concern for deploying the SAM in safety-critical situations that have various forms of glass.	翻訳日:2023-05-02 16:01:26 公開日:2023-04-29
# fedgrad: 局所的究極的勾配検査によるフェデレーション学習におけるバックドア攻撃の軽減 FedGrad: Mitigating Backdoor Attacks in Federated Learning Through Local Ultimate Gradients Inspection ( http://arxiv.org/abs/2305.00328v1 ) ライセンス: Link先を確認	Thuy Dung Nguyen, Anh Duy Nguyen, Kok-Seng Wong, Huy Hieu Pham, Thanh Hung Nguyen, Phi Le Nguyen, Truong Thao Nguyen	(参考訳) フェデレートラーニング(FL)により、複数のクライアントが機密データを妥協することなくモデルをトレーニングできる。 FLの分散した性質は、特に訓練中のバックドア挿入において敵の攻撃を受けやすい。近年,データ分布の尾部を利用したエッジケースバックドア攻撃が強力な攻撃として提案され,現状の防御の堅牢性保証の不足に関する疑問が提起されている。特に、既存の防御の多くは、エッジケースバックドア攻撃を排除できないか、バックドア防御の有効性とプライマリタスクにおける全体的なパフォーマンスのトレードオフに苦しむ。この課題に取り組むため,我々は,エッジケース攻撃を含む最先端バックドア攻撃に耐性を持ち,異種クライアントデータと多数の漏洩したクライアントにおいて効果的に実行する,新しいflバックドア防御手法であるfeedgradを提案する。 fedgradは、究極のレイヤの勾配を徹底的に分析し、疑わしいローカルアップデートを特定し、集約プロセスから削除する2層フィルタリングメカニズムとして設計されている。我々は、異なる攻撃シナリオ下でFedGradを評価し、最先端の防御機構を著しく上回ることを示す。特にfeedgradは、悪意のある参加者をほぼ100%正しく検出することができ、主要なタスクの精度を低下させることなく、バックドア効果(例えばバックドア精度が8%未満)を大幅に削減することができる。 Federated learning (FL) enables multiple clients to train a model without compromising sensitive data. The decentralized nature of FL makes it susceptible to adversarial attacks, especially backdoor insertion during training. Recently, the edge-case backdoor attack employing the tail of the data distribution has been proposed as a powerful one, raising questions about the shortfall in current defenses' robustness guarantees. Specifically, most existing defenses cannot eliminate edge-case backdoor attacks or suffer from a trade-off between backdoor-defending effectiveness and overall performance on the primary task. To tackle this challenge, we propose FedGrad, a novel backdoor-resistant defense for FL that is resistant to cutting-edge backdoor attacks, including the edge-case attack, and performs effectively under heterogeneous client data and a large number of compromised clients. 
FedGrad is designed as a two-layer filtering mechanism that thoroughly analyzes the ultimate layer's gradient to identify suspicious local updates and remove them from the aggregation process. We evaluate FedGrad under different attack scenarios and show that it significantly outperforms state-of-the-art defense mechanisms. Notably, FedGrad detects the malicious participants almost 100% correctly, thus providing a significant reduction in the backdoor effect (e.g., backdoor accuracy is less than 8%) while not reducing the main accuracy on the primary task.	翻訳日:2023-05-02 15:55:14 公開日:2023-04-29
# スパース行列による加法ガウス過程の表現 Representing Additive Gaussian Processes by Sparse Matrices ( http://arxiv.org/abs/2305.00324v1 ) ライセンス: Link先を確認	Lu Zou, Haoyuan Chen, Liang Ding	(参考訳) 一般化された加法モデルの中で、加法的Matérn Gaussian Processes (GPs) はスケーラブルな高次元問題において最もよく用いられる。彼らの加法構造と確率微分方程式表現のおかげで、バックフィッティングに基づくアルゴリズムは、後進平均の計算の時間的複雑さを$O(n^3)$から$O(n\log n)$ timeに減らすことができる。しかし、これらのアルゴリズムを一般化して後方分散と最大対数類似度を効率的に計算することは未解決の問題である。本研究では,加法的Matérn GP に対して,後続平均だけでなく,後続分散,対数類似度,勾配もスパース行列とスパースベクトルのみを含む式で表すことができることを示した。これらのスパース式を用いてバックフィッティングに基づくアルゴリズムを一般化し,これら3つの関数の後方平均,後方分散,対数類似度,勾配を,すべて$O(n \log n)$ timeで効率的に計算する方法を示す。我々はベイジアン最適化にアルゴリズムを適用し、ベイジアン最適化における後方更新、ハイパーパラメータ学習、および取得関数の計算とその勾配の効率的なアルゴリズムを提案する。後者を考えると、アルゴリズムは、取得関数とその勾配を一般の学習率で$O(n^2)$から$O(\log n)$に、小さな学習率で$O(1)$まで計算する時間の複雑さを大幅に削減します。 Among generalized additive models, additive Matérn Gaussian Processes (GPs) are one of the most popular for scalable high-dimensional problems. Thanks to their additive structure and stochastic differential equation representation, back-fitting-based algorithms can reduce the time complexity of computing the posterior mean from $O(n^3)$ to $O(n\log n)$ time where $n$ is the data size. However, generalizing these algorithms to efficiently compute the posterior variance and maximum log-likelihood remains an open problem. In this study, we demonstrate that for Additive Matérn GPs, not only the posterior mean, but also the posterior variance, log-likelihood, and gradient of these three functions can be represented by formulas involving only sparse matrices and sparse vectors. We show how to use these sparse formulas to generalize back-fitting-based algorithms to efficiently compute the posterior mean, posterior variance, log-likelihood, and gradient of these three functions for additive GPs, all in $O(n \log n)$ time. 
We apply our algorithms to Bayesian optimization and propose efficient algorithms for posterior updates, hyperparameter learning, and computations of the acquisition function and its gradient in Bayesian optimization. Given the posterior, our algorithms significantly reduce the time complexity of computing the acquisition function and its gradient from $O(n^2)$ to $O(\log n)$ for general learning rate, and even to $O(1)$ for small learning rate.	翻訳日:2023-05-02 15:54:48 公開日:2023-04-29
# データマイニングアルゴリズムを活用してソースコードの変更を推奨 Leveraging Data Mining Algorithms to Recommend Source Code Changes ( http://arxiv.org/abs/2305.00323v1 ) ライセンス: Link先を確認	AmirHossein Naghshzan, Saeed Khalilazar, Pierre Poilane, Olga Baysal, Latifa Guerrouj, Foutse Khomh	(参考訳) コンテキスト: 最近の研究では、開発者がソースコードの変更をガイドできる技術を開発するために、データマイニングが使われています。私たちの知る限りでは、データマイニング技術を調査したり、他のアルゴリズムやベースラインと比較したりする研究はほとんどありません。目的: 4つのデータマイニングアルゴリズムを用いてソースコード変更を推奨する自動手法を提案する。これらのアルゴリズムはソースコードの変更を推奨するだけでなく、実証的な評価も行います。方法: 調査には7つのオープンソースプロジェクトが含まれており、ファイルレベルでのソース変更履歴を抽出した。 広く使われている4つのデータマイニングアルゴリズム、すなわちApriori、FP-Growth、Eclat、Relimを用いて、アルゴリズムの性能(精度、リコール、f-測定)と実行時間の比較を行った。結果:Aprioriのような頻繁なパターンマイニングアルゴリズムが,他のアルゴリズムよりも優れている場合もあるが,研究対象のプロジェクトの性質や特性,特に変更履歴が原因で,すべてのソフトウェアプロジェクトにおいて一貫性が保たれているという実証的証拠が得られた。結論: Aprioriは大規模プロジェクトに適しているが、Eclatは小規模プロジェクトに適しているようだ。さらに、FP-Growthは実行時間の面で効率的なアプローチである。 Context: Recent research has used data mining to develop techniques that can guide developers through source code changes. To the best of our knowledge, very few studies have investigated data mining techniques and/or compared their results with other algorithms or a baseline. Objectives: This paper proposes an automatic method for recommending source code changes using four data mining algorithms. We not only use these algorithms to recommend source code changes, but we also conduct an empirical evaluation. Methods: Our investigation includes seven open-source projects from which we extracted source change history at the file level. We used four widely used data mining algorithms, i.e., Apriori, FP-Growth, Eclat, and Relim, and compared them in terms of performance (Precision, Recall and F-measure) and execution time. 
Results: Our findings provide empirical evidence that while some Frequent Pattern Mining algorithms, such as Apriori may outperform other algorithms in some cases, the results are not consistent throughout all the software projects, which is more likely due to the nature and characteristics of the studied projects, in particular their change history. Conclusion: Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects. Moreover, FP-Growth seems an efficient approach in terms of execution time.	翻訳日:2023-05-02 15:54:18 公開日:2023-04-29
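The abstract above describes mining file-level change history to recommend further changes. As a rough illustration of the underlying idea, the sketch below counts which files are frequently committed together and turns those counts into simple "if you change A, consider B" rules. It is a simplified pair-level counting pass in the spirit of frequent-itemset mining, not an implementation of Apriori, FP-Growth, Eclat, or Relim as evaluated in the paper. The function names, the thresholds, and the toy commit history are all assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_cochange_rules(commits, min_support=2, min_confidence=0.6):
    """Derive co-change rules from a list of commits (each a list of file names).

    A rule src -> dst is kept when the pair was committed together at least
    `min_support` times and count(pair) / count(src) >= `min_confidence`.
    """
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = sorted(set(files))
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        if count < min_support:
            continue
        for src, dst in ((a, b), (b, a)):
            confidence = count / file_counts[src]
            if confidence >= min_confidence:
                rules.setdefault(src, []).append((dst, confidence))
    # Highest-confidence suggestions first; ties broken by file name.
    for src in rules:
        rules[src].sort(key=lambda rule: (-rule[1], rule[0]))
    return rules


def recommend(rules, changed_file):
    """Files suggested for review when `changed_file` is modified."""
    return [dst for dst, _ in rules.get(changed_file, [])]


# Hypothetical change history used only for illustration.
history = [
    ["parser.py", "lexer.py"],
    ["parser.py", "lexer.py", "README.md"],
    ["parser.py", "tests.py"],
    ["lexer.py", "parser.py"],
    ["README.md"],
]
rules = mine_cochange_rules(history)
suggestions = recommend(rules, "lexer.py")
```

In this toy history `lexer.py` and `parser.py` change together three times, so each is suggested when the other is edited, while `README.md` produces no suggestion because none of its pairs reaches the support threshold.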
# 非線形関数の$L_\infty$回復に向けて: ガウス確率場に対する多項式サンプル複雑性境界 Toward $L_\infty$-recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields ( http://arxiv.org/abs/2305.00322v1 ) ライセンス: Link先を確認	Kefan Dong, Tengyu Ma	(参考訳) 多くの機械学習アプリケーションは入力領域全体、すなわち$L_\infty$-errorという小さな最悪のエラーを持つ関数を学習する必要があるが、既存の理論では$L_2$-errorのような平均エラーの回復しか保証していない。多項式サンプルからの$L_\infty$-recoveryは、定数ノルム無限幅2層ニューラルネットのような一見単純な関数クラスでは不可能である。本稿では, 真の関数のランダム性を活用することにより, 不可能性の結果を超える初期ステップを提案する。ガウス確率場から引き出されたランダムな真の関数に対する多項式サンプル複雑性の上界を証明した。我々の重要な技術的ノベルティは、ガウス確率場からの函数の次数-$k$球面調和成分が、その$L_\infty$/$L_2$比が高い確率で$O(d \sqrt{\ln k})$で上界であることを証明することである。対照的に、次数-$k$球面調和に対する最悪の場合の$L_\infty$/$L_2$比は、$\Omega(\min\{d^{k/2},k^{d/2}\})$である。 Many machine learning applications require learning a function with a small worst-case error over the entire input domain, that is, the $L_\infty$-error, whereas most existing theoretical works only guarantee recovery in average errors such as the $L_2$-error. $L_\infty$-recovery from polynomial samples is even impossible for seemingly simple function classes such as constant-norm infinite-width two-layer neural nets. This paper makes some initial steps beyond the impossibility results by leveraging the randomness in the ground-truth functions. We prove a polynomial sample complexity bound for random ground-truth functions drawn from Gaussian random fields. Our key technical novelty is to prove that the degree-$k$ spherical harmonics components of a function from a Gaussian random field cannot be spiky in that their $L_\infty$/$L_2$ ratios are upper bounded by $O(d \sqrt{\ln k})$ with high probability. In contrast, the worst-case $L_\infty$/$L_2$ ratio for degree-$k$ spherical harmonics is on the order of $\Omega(\min\{d^{k/2},k^{d/2}\})$.	翻訳日:2023-05-02 15:53:57 公開日:2023-04-29
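The distinction between $L_\infty$-error and $L_2$-error in the abstract above can be made concrete with a "spiky" error profile: an approximation that is exact almost everywhere but badly wrong at one point has a small average error and a large worst-case error. The sketch below computes both on a finite grid of sample errors; the function names and the toy error vector are assumptions used only for illustration.

```python
import math


def l2_error(errors):
    """Root-mean-square error over the sampled points (a discrete L_2 error)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def linf_error(errors):
    """Largest absolute error over the sampled points (a discrete L_infinity error)."""
    return max(abs(e) for e in errors)


# A spiky error profile: exact on 99 sample points, off by 1.0 on a single point.
spiky_errors = [0.0] * 99 + [1.0]
ratio = linf_error(spiky_errors) / l2_error(spiky_errors)
```

Here the discrete $L_2$ error is $\sqrt{1/100} = 0.1$ while the $L_\infty$ error is 1, so the ratio is about 10; the narrower the spike, the larger this ratio becomes, which is why bounding the $L_\infty$/$L_2$ ratio matters for worst-case recovery.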
# 腐敗したマルチモーダルデータを用いた実世界サーベイランスにおける近赤外人物認証の融合 Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data ( http://arxiv.org/abs/2305.00320v1 ) ライセンス: Link先を確認	Arthur Josi, Mahdi Alehdaghi, Rafael M. O. Cruz, Eric Granger	(参考訳) V-I ReID (Visible-infrared person re-identification) は、RGBとIRカメラの分散ネットワーク上で撮影された個人の画像と一致する。 vモードとiモードの大きな違い、特に実世界の状況下では、画像がぼやけ、ノイズ、天気によって腐敗しているため、この課題は困難である。実際、最先端のV-I ReIDモデルは、破損したモダリティ情報を利用して高い精度を維持することはできない。本稿では,マルチモーダル画像に対するロバスト性を改善するために,モダリティ固有の知識を保持するマルチモーダル中間流融合(mmsf)と呼ばれるマルチモーダルv-iリードの効率的なモデルを提案する。さらに、3つの最先端の注意に基づくマルチモーダル融合モデルを用いて、v-i reidの破損したマルチモーダルデータに対処する。近年,現実シナリオにおけるReIDモデルの堅牢性を評価するための評価プロトコルが提案されている。しかしながら、これらのプロトコルはunimodal V設定に限られている。マルチモーダル(およびクロスモーダル)のV-I人物ReIDモデルの現実的な評価のために,VとIカメラが共位置(CL)であり、共位置(NCL)ではないシナリオを対象とした,新しい挑戦的破損データセットを提案する。最後に、マルチモーダル汚職に対するReIDモデルの堅牢性を改善するため、我々のMasking and Local Multimodal Data Augmentation(ML-MDA)戦略の利点を検討する。 SYSU-MM01, RegDB, および ThermalWORLD データセットのクリーンで破損したバージョンについて実験した結果, 実世界の運用条件下では良好に動作しそうなマルチモーダル V-I ReID モデルが得られた。特に,我々のML-MDAは,劣化したマルチモーダル画像を処理する際の高精度かつ堅牢性を維持するために,V-I人物ReIDシステムにとって重要な戦略である。また,マルチモーダル ReID モデル MMSF は,CL と NCL のカメラシナリオ下での全手法より優れている。 Visible-infrared person re-identification (V-I ReID) seeks to match images of individuals captured over a distributed network of RGB and IR cameras. The task is challenging due to the significant differences between V and I modalities, especially under real-world conditions, where images are corrupted by, e.g, blur, noise, and weather. Indeed, state-of-art V-I ReID models cannot leverage corrupted modality information to sustain a high level of accuracy. In this paper, we propose an efficient model for multimodal V-I ReID -- named Multimodal Middle Stream Fusion (MMSF) -- that preserves modality-specific knowledge for improved robustness to corrupted multimodal images. 
In addition, three state-of-the-art attention-based multimodal fusion models are adapted to address corrupted multimodal data in V-I ReID, allowing the importance of each modality to be balanced dynamically. Recently, evaluation protocols have been proposed to assess the robustness of ReID models under challenging real-world scenarios. However, these protocols are limited to unimodal V settings. For realistic evaluation of multimodal (and cross-modal) V-I person ReID models, we propose new challenging corrupted datasets for scenarios where V and I cameras are co-located (CL) and not co-located (NCL). Finally, the benefits of our Masking and Local Multimodal Data Augmentation (ML-MDA) strategy are explored to improve the robustness of ReID models to multimodal corruption. Our experiments on clean and corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD datasets indicate the multimodal V-I ReID models that are more likely to perform well in real-world operational conditions. In particular, our ML-MDA is an important strategy for a V-I person ReID system to sustain high accuracy and robustness when processing corrupted multimodal images. Also, our multimodal ReID model MMSF outperforms every method under CL and NCL camera scenarios.	翻訳日:2023-05-02 15:53:33 公開日:2023-04-29
# 制約付きメタ最適輸送による再ランク学習 Learning to Re-rank with Constrained Meta-Optimal Transport ( http://arxiv.org/abs/2305.00319v1 ) ライセンス: Link先を確認	Andrés Hoyos-Idrobo	(参考訳) 検索システムにおける多くの再ランク戦略は確率的ランク付けポリシーに依存しており、Doubly-Stochastic (DS) 行列として符号化されており、期待されるランク付けの制約を満たす。これらの戦略は一般的に2段階のパイプラインである: (i) オフラインの再ランクポリシー構築ステップと、(ii) オンラインのランキングサンプリングステップである。再ランクポリシを構築するには、各発行されたクエリに対して、制約付き最適化問題を繰り返し解決する必要がある。したがって、新しい/未知のクエリの最適化手順を再計算する必要がある。サンプリングに関して、Birkhoff-von-Neumann分解(BvND)は、DSベースのポリシーからランキングを引き出すための好ましいアプローチである。しかし、BvNDはオンラインで計算するには高すぎる。したがって、サンプリングソリューションとしてのBvNDは、$N$クエリと$n$ドキュメントに対して$\mathcal{O}(N\, n^2)$として成長できるため、メモリ消費である。本稿では,公正な確率的再配置政策を予測するための新しい,高速で軽量な方法,制約付きメタ最適輸送 (CoMOT) を提案する。この方法は、学習からランクまでのシステムのようなクエリ間で共有されるニューラルネットワークに適合する。また、DSベースのポリシーによるオンラインサンプリングアプローチであるGumbel-Matching Sampling (GumMS)についても紹介する。提案するパイプラインである CoMOT + GumMS は,単一のモデルのパラメータを格納するだけでよい。 FOE制約の下で、TREC 2019と2020のデータセットでパイプラインを実証的に評価しました。実験の結果,CoMOTは,クエリ毎の平均文書数に比例して,保持データに対する公正な再ランクポリシを急速に予測することがわかった。また、オリジナルの最適化ベースのポリシーと同様の公平さとランキングパフォーマンスを表示する。さらに,GumMS の有効性を実証的に検証し,DS ベースのポリシーを予測する。 Many re-ranking strategies in search systems rely on stochastic ranking policies, encoded as Doubly-Stochastic (DS) matrices, that satisfy desired ranking constraints in expectation, e.g., Fairness of Exposure (FOE). These strategies are generally two-stage pipelines: (i) an offline re-ranking policy construction step and (ii) an online sampling of rankings step. Building a re-ranking policy requires repeatedly solving a constrained optimization problem, one for each issued query. Thus, it is necessary to recompute the optimization procedure for any new/unseen query. Regarding sampling, the Birkhoff-von-Neumann decomposition (BvND) is the favored approach to draw rankings from any DS-based policy. However, the BvND is too costly to compute online. Hence, the BvND as a sampling solution is memory-consuming as it can grow as $\mathcal{O}(N\, n^2)$ for $N$ queries and $n$ documents. 
This paper offers a novel, fast, lightweight way to predict fair stochastic re-ranking policies: Constrained Meta-Optimal Transport (CoMOT). This method fits a neural network shared across queries like a learning-to-rank system. We also introduce Gumbel-Matching Sampling (GumMS), an online sampling approach from DS-based policies. Our proposed pipeline, CoMOT + GumMS, only needs to store the parameters of a single model, and it generalizes to unseen queries. We empirically evaluated our pipeline on the TREC 2019 and 2020 datasets under FOE constraints. Our experiments show that CoMOT rapidly predicts fair re-ranking policies on held-out data, with a speed-up proportional to the average number of documents per query. It also displays fairness and ranking performance similar to the original optimization-based policy. Furthermore, we empirically validate the effectiveness of GumMS to approximate DS-based policies in expectation.	翻訳日:2023-05-02 15:52:55 公開日:2023-04-29
# 理想的な連続学習者:決して忘れないエージェント The Ideal Continual Learner: An Agent That Never Forgets ( http://arxiv.org/abs/2305.00316v1 ) ライセンス: Link先を確認	Liangzu Peng, Paris V. Giampouras, René Vidal	(参考訳) 連続学習の目的は、学習者に順次提示される複数の学習課題を解決するモデルを見つけることである。この設定における重要な課題は、新しいタスクを学ぶとき、学習者が前のタスクの解き方を忘れてしまう可能性があることであり、これは破滅的忘却として知られる現象である。この課題に対処するために,メモリベース,正規化ベース,拡張ベースなど,多くの実用的な手法が提案されている。しかし、これらの手法の厳密な理論的理解はいまだ解明されていない。本稿では,この理論と実践のギャップを埋めるために,構成上,破滅的忘却を回避することが保証されたIdeal Continual Learner(ICL)と呼ばれる新しい連続学習フレームワークを提案する。 ICLは複数の確立された連続学習手法を統合し、これらの手法の強みと弱みに関する新たな理論的知見を提供する。また、リハーサルが一般化にどのように影響するかを理論的に定量化できるICLの一般化境界も導出する。最後に、ICLをいくつかの古典的主題と近代的関心の研究トピックに結びつけることで、歴史的発言をし、今後の方向性を刺激することができる。 The goal of continual learning is to find a model that solves multiple learning tasks which are presented sequentially to the learner. A key challenge in this setting is that the learner may forget how to solve a previous task when learning a new task, a phenomenon known as catastrophic forgetting. To address this challenge, many practical methods have been proposed, including memory-based, regularization-based, and expansion-based methods. However, a rigorous theoretical understanding of these methods remains elusive. This paper aims to bridge this gap between theory and practice by proposing a new continual learning framework called Ideal Continual Learner (ICL), which is guaranteed to avoid catastrophic forgetting by construction. We show that ICL unifies multiple well-established continual learning methods and gives new theoretical insights into the strengths and weaknesses of these methods. We also derive generalization bounds for ICL which allow us to theoretically quantify how rehearsal affects generalization. Finally, we connect ICL to several classic subjects and research topics of modern interest, which allows us to make historical remarks and inspire future directions.	翻訳日:2023-05-02 15:52:22 公開日:2023-04-29
# InfraDet3D:ロードサイドインフラストラクチャカメラとLiDARセンサを用いたマルチモード3Dオブジェクト検出 InfraDet3D: Multi-Modal 3D Object Detection based on Roadside Infrastructure Camera and LiDAR Sensors ( http://arxiv.org/abs/2305.00314v1 ) ライセンス: Link先を確認	Walter Zimmer, Joseph Birkner, Marcel Brucker, Huu Tung Nguyen, Stefan Petrovski, Bohan Wang, Alois C. Knoll	(参考訳) 現在のマルチモーダル物体検出手法は車両領域に焦点をあてており、知覚範囲と処理能力に制限がある。道路脇センサユニット(rsus)は、知覚システムのための新しいドメインを導入し、高度を利用して交通を観測する。ガントリーブリッジに搭載されたカメラとLiDARは認識範囲を増やし、トラフィックの完全なデジタル双対を生成する。本研究では,道路インフラストラクチャセンサのためのマルチモーダル3Dオブジェクト検出器であるInfraDet3Dを紹介する。初期核融合により2つのLiDARを融合させ、さらに単眼カメラからの検知を取り入れてロバスト性を高め、小さな物体を検出する。我々の単分子3D検出モジュールはHDマップを使って仮説を立て、最終的な知覚結果を改善する。知覚フレームワークは、ドイツのミュンヘンにあるa9テストストレッチの一部である現実世界の交差点にデプロイされる。いくつかのアブレーション研究と実験を行い、2台のLiDARを2台のカメラで融合させることで、カメラのみのソリューションに比べて+1.90 mAPが改善されることを示した。 a9 インフラストラクチャデータセットでの結果を評価し,テストセット上で68.48 マップを達成した。データセットとコードはhttps://a9-dataset.comで公開され、研究コミュニティは認識結果をさらに改善し、自動運転をより安全にすることができる。 Current multi-modal object detection approaches focus on the vehicle domain and are limited in the perception range and the processing capabilities. Roadside sensor units (RSUs) introduce a new domain for perception systems and leverage altitude to observe traffic. Cameras and LiDARs mounted on gantry bridges increase the perception range and produce a full digital twin of the traffic. In this work, we introduce InfraDet3D, a multi-modal 3D object detector for roadside infrastructure sensors. We fuse two LiDARs using early fusion and further incorporate detections from monocular cameras to increase the robustness and to detect small objects. Our monocular 3D detection module uses HD maps to ground object yaw hypotheses, improving the final perception results. The perception framework is deployed on a real-world intersection that is part of the A9 Test Stretch in Munich, Germany. We perform several ablation studies and experiments and show that fusing two LiDARs with two cameras leads to an improvement of +1.90 mAP compared to a camera-only solution. 
We evaluate our results on the A9 infrastructure dataset and achieve 68.48 mAP on the test set. The dataset and code will be available at https://a9-dataset.com to allow the research community to further improve the perception results and make autonomous driving safer.	翻訳日:2023-05-02 15:52:03 公開日:2023-04-29
# 制約付き多目的フェデレーション学習におけるプライバシ、ユーティリティ、効率の最適化 Optimizing Privacy, Utility and Efficiency in Constrained Multi-Objective Federated Learning ( http://arxiv.org/abs/2305.00312v1 ) ライセンス: Link先を確認	Yan Kang, Hanlin Gu, Xingxing Tang, Yuanqin He, Yuzhu Zhang, Jinnan He, Yuxing Han, Lixin Fan, Qiang Yang	(参考訳) 従来、連合学習は単一の目的、通常はユーティリティを最適化することを目的としていた。しかし、連合学習システムが信頼できるためには、モデル性能の最大化、プライバシのリークとトレーニングコストの最小化、悪意のある攻撃に対する堅牢性など、複数の目標を同時に満たす必要がある。複数の競合する目的を同時に最適化することを目的とした多目的最適化(MOO)は、信頼できるフェデレートラーニング(TFL)の最適化問題を解決するのに非常に適している。本稿では,制約付き多目的フェデレーション学習(CMOFL)の問題を定式化し,MOOとTFLを統一する。この定式化の下では、既存のMOOアルゴリズムをTFLに簡単に適用することができる。汎用性,効率性,公平性,堅牢性を重視した既存のcmoflとは違って,tflシステムの3つの主な目的であるユーティリティ損失とトレーニングコストとともに,プライバシリークの最適化を検討する。 NSGA-II と PSL に基づく 2 つの改良された CMOFL アルゴリズムを開発し,Pareto 最適解を効果的かつ効率的に検出し,その収束に関する理論的解析を行った。我々は、ランダム化、BatchCrypt(同型暗号化の効率的なバージョン)、スパシフィケーションの3つのプライバシ保護メカニズムに対して、プライバシー漏洩、ユーティリティ損失、トレーニングコストの具体的な測定を設計する。 3つの保護機構のそれぞれで実験を行い,提案手法の有効性を実証した。 Conventionally, federated learning aims to optimize a single objective, typically the utility. However, for a federated learning system to be trustworthy, it needs to simultaneously satisfy multiple/many objectives, such as maximizing model performance, minimizing privacy leakage and training cost, and being robust to malicious attacks. Multi-Objective Optimization (MOO) aiming to optimize multiple conflicting objectives at the same time is quite suitable for solving the optimization problem of Trustworthy Federated Learning (TFL). In this paper, we unify MOO and TFL by formulating the problem of constrained multi-objective federated learning (CMOFL). Under this formulation, existing MOO algorithms can be adapted to TFL straightforwardly. Different from existing CMOFL works focusing on utility, efficiency, fairness, and robustness, we consider optimizing privacy leakage along with utility loss and training cost, the three primary objectives of a TFL system. 
We develop two improved CMOFL algorithms based on NSGA-II and PSL, respectively, for effectively and efficiently finding Pareto optimal solutions, and we provide theoretical analysis on their convergence. We design specific measurements of privacy leakage, utility loss, and training cost for three privacy protection mechanisms: Randomization, BatchCrypt (An efficient version of homomorphic encryption), and Sparsification. Empirical experiments conducted under each of the three protection mechanisms demonstrate the effectiveness of our proposed algorithms.	翻訳日:2023-05-02 15:51:39 公開日:2023-04-29
# 典型性をもつ条件論理における多層パーセプトロンの選好的解釈 A preferential interpretation of MultiLayer Perceptrons in a conditional logic with typicality ( http://arxiv.org/abs/2305.00304v1 ) ライセンス: Link先を確認	Mario Alviano, Francesco Bartoli, Marco Botta, Roberto Esposito, Laura Giordano, Daniele Theseider Dupré	(参考訳) 本稿では,知識表現における無効化可能(defeasible)推論のための多重選好セマンティクスと多層ニューラルネットワークモデルとの関係について検討する。典型性を備えた単純な記述論理に対する重み付き知識ベースを、(多値の)「概念ごと(concept-wise)」の多重選好セマンティクスの下で考察する。このセマンティクスは、多層パーセプトロン(MLP)の選好的解釈を与えるために用いられる。 MLPの条件付き性質の検証には,モデル検査に基づくアプローチと含意(entailment)に基づくアプローチを利用する。 In this paper we investigate the relationships between a multipreferential semantics for defeasible reasoning in knowledge representation and a multilayer neural network model. Weighted knowledge bases for a simple description logic with typicality are considered under a (many-valued) "concept-wise" multipreference semantics. The semantics is used to provide a preferential interpretation of MultiLayer Perceptrons (MLPs). A model checking and an entailment based approach are exploited in the verification of conditional properties of MLPs.	翻訳日:2023-05-02 15:51:13 公開日:2023-04-29
# 計算量子秘密共有 Computational Quantum Secret Sharing ( http://arxiv.org/abs/2305.00356v1 ) ライセンス: Link先を確認	Alper Çakan, Vipul Goyal, Chen-Da Liu-Zhang, João Ribeiro	(参考訳) 量子秘密共有(quantum secret sharing, QSS)は、ディーラーが秘密の量子状態を複数のパーティに分散させ、特定の部分集合は秘密を再構築できる一方で、権限のない部分集合は何の情報も得られないようにする。 QSSは20年以上前に導入されたが、これまでの研究は完全安全なスキームの存在のみに焦点を当てており、既知のスキームのシェアサイズは、多項式サイズのモノトーン回路で計算されるアクセス構造に対してさえ指数関数的である。これは古典的な場合とは対照的である。古典的な場合には、$\mathsf{monotone~P}$ のすべてのアクセス構造に対する効率的な計算量的安全スキームが古くから知られており、完全安全性の下では不可能な、秘密よりもはるかに短いシェアを得ることさえできる。本研究では、計算量的に安全なQSSの研究を開始し、計算量的仮定がQSSスキームの構築に大いに役立つことを示す。単純なコンパイラを提示し、それを用いて多様な結果を得る。まず、豊富なクラスのアクセス構造に対して、標準的な仮定の下で多項式時間QSSスキームを構築する。これには、従来のQSSの結果では指数的なシェアサイズを必要とした多くのアクセス構造が含まれる。また、シェアのサイズが秘密のサイズよりも大幅に小さいQSSスキームも構築する。古典的な場合と同様、これは完全安全性の下では不可能である。さらに、このコンパイラを用いて計算量的QSSを超える結果も得る。情報理論的設定では、大きなクラスのアクセス構造に対する完全QSSスキームのシェアサイズを $1.5^{n+o(n)}$ に改善する。これは既知の最良のスキームを上回り、古典的な場合における一般のアクセス構造に対する既知の最良の結果と一致する。最後に、量子秘密共有スキームに秘密のコピーが複数与えられる場合に、$\mathsf{P}$ および $\mathsf{NP}$ のすべてのアクセス構造に対する効率的なスキームを構築する。 Quantum secret sharing (QSS) allows a dealer to distribute a secret quantum state among a set of parties so that certain subsets can reconstruct the secret, while unauthorized subsets obtain no information. While QSS was introduced over twenty years ago, previous works focused only on existence of perfectly secure schemes, and the share size of the known schemes is exponential even for access structures computed by polynomial size monotone circuits. This stands in contrast to the classical case, where efficient computationally-secure schemes have been long known for all access structures in $\mathsf{monotone~P}$, and one can even obtain shares which are much shorter than the secret which is impossible with perfect security. In this work, we initiate the study of computationally-secure QSS and show that computational assumptions help significantly in building QSS schemes. 
We present a simple compiler and use it to obtain a large variety of results: We construct polynomial-time QSS schemes under standard assumptions for a rich class of access structures. This includes many access structures for which previous results in QSS required exponential share size. We also construct QSS schemes for which the size of the shares is significantly smaller than the size of the secret. As in the classical case, this is impossible with perfect security. We also use our compiler to obtain results beyond computational QSS. In the information-theoretic setting, we improve the share size of perfect QSS schemes for a large class of access structures to $1.5^{n+o(n)}$, improving upon the best known schemes and matching the best known result for general access structures in the classical case. Finally, we construct efficient schemes for all access structures in $\mathsf{P}$ and $\mathsf{NP}$ when the quantum secret sharing scheme is given multiple copies of the secret.	翻訳日:2023-05-02 15:44:44 公開日:2023-04-29
# MH-DETR:クロスモーダルトランスを用いたビデオモーメントと光検出 MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer ( http://arxiv.org/abs/2305.00355v1 ) ライセンス: Link先を確認	Yifang Xu, Yunzhuo Sun, Yang Li, Yilei Shi, Xiaoxiang Zhu, Sidan Du	(参考訳) ビデオ理解の需要が高まり、ビデオモーメントとハイライト検出(MHD)が重要な研究トピックとして浮上している。 MHDはすべての瞬間をローカライズし、クリップワイドのサリエンシスコアを同時に予測することを目的としている。既存のDETRに基づく手法の進歩にもかかわらず、これらの手法は時間的モード内コンテキストを弱め、結果としてモーダル間相互作用が不十分となる様々なモードから粗い特徴を融合する。本稿では,MHDに適したMH-DETR(Moment and Highlight Detection Transformer)を提案する。具体的には,ユニモーダルエンコーダ内に,グローバル・イントラモーダル・コンテキストをキャプチャする簡易かつ効率的なプーリング演算子を導入する。さらに、時間的に調整されたクロスモーダル特徴を得るために、エンコーダとデコーダ間のプラグ・アンド・プレイクロスモーダル相互作用モジュールを設計し、視覚的な特徴とテキスト的な特徴をシームレスに統合する。 QVHighlights、Charades-STA、Activity-Net、TVSumデータセットに関する総合的な実験は、MH-DETRが既存の最先端手法よりも優れており、その効果と優位性を示していることを示している。私たちのコードはhttps://github.com/YoucanBaby/MH-DETRで利用可能です。 With the increasing demand for video understanding, video moment and highlight detection (MHD) has emerged as a critical research topic. MHD aims to localize all moments and predict clip-wise saliency scores simultaneously. Despite progress made by existing DETR-based methods, we observe that these methods coarsely fuse features from different modalities, which weakens the temporal intra-modal context and results in insufficient cross-modal interaction. To address this issue, we propose MH-DETR (Moment and Highlight Detection Transformer) tailored for MHD. Specifically, we introduce a simple yet efficient pooling operator within the uni-modal encoder to capture global intra-modal context. Moreover, to obtain temporally aligned cross-modal features, we design a plug-and-play cross-modal interaction module between the encoder and decoder, seamlessly integrating visual and textual features. Comprehensive experiments on QVHighlights, Charades-STA, Activity-Net, and TVSum datasets show that MH-DETR outperforms existing state-of-the-art methods, demonstrating its effectiveness and superiority. 
Our code is available at https://github.com/YoucanBaby/MH-DETR.	翻訳日:2023-05-02 15:43:49 公開日:2023-04-29
# 法医学的顔比較のための埋め込みアグリゲーション Embedding Aggregation for Forensic Facial Comparison ( http://arxiv.org/abs/2305.00352v1 ) ライセンス: Link先を確認	Rafael Oliveira Ribeiro, João C. R. Neves, Arnout C. C. Ruifrok, Flavio de Barros Vidal	(参考訳) 法医学的な顔比較では、出所が問題とされる画像は、通常、制御されていない環境で、不均一な照明の下、非協力的な被写体から撮影される。このような資料の質の低さは、通常、法的事項における証拠としての価値を損なう。一方、法医学的なケースワークでは、対象人物の複数の画像が利用可能であることが多い。本稿では,顔照合の性能向上のために,同一人物のさまざまな画像から得たディープニューラルネットワークの埋め込みを集約することを提案する。特に非常に低画質の画像で,性能が著しく向上することを確認した。より多くの画像の埋め込みを集約し、品質で重み付けした集約を適用することで、さらなる改善が得られる。スコアベース尤度比システムの開発と検証を通じて、法医学的評価の設定における本手法の利点を示し、CCTV画像では最大95%(0.249から0.012へ)、ソーシャルメディア画像では最大96%(0.083から0.003へ)のCllrの改善を報告する。 In forensic facial comparison, questioned-source images are usually captured in uncontrolled environments, with non-uniform lighting, and from non-cooperative subjects. The poor quality of such material usually compromises their value as evidence in legal matters. On the other hand, in forensic casework, multiple images of the person of interest are usually available. In this paper, we propose to aggregate deep neural network embeddings from various images of the same person to improve performance in facial verification. We observe significant performance improvements, especially for very low-quality images. Further improvements are obtained by aggregating embeddings of more images and by applying quality-weighted aggregation. We demonstrate the benefits of this approach in forensic evaluation settings with the development and validation of score-based likelihood ratio systems and report improvements in Cllr of up to 95% (from 0.249 to 0.012) for CCTV images and of up to 96% (from 0.083 to 0.003) for social media images.	翻訳日:2023-05-02 15:43:14 公開日:2023-04-29
# POUF: 大規模事前訓練モデルのためのプロンプト指向の教師なし微調整 POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models ( http://arxiv.org/abs/2305.00350v1 ) ライセンス: Link先を確認	Korawat Tanwisuth, Shujian Zhang, Huangjie Zheng, Pengcheng He, Mingyuan Zhou	(参考訳) プロンプトを通じて、大規模な事前訓練型モデルはより表現力が高く、力強くなり、近年は注目されている。これらの大きなモデルはゼロショット機能を持っているが、一般にラベル付きデータはダウンストリームタスクに適応するために必要である。この限界を克服するために、モデルを直接微調整したり、ラベルのないターゲットデータにプロンプトを付与する教師なしの微調整フレームワークを提案する。本稿では,プロンプトとターゲットデータから抽出した離散分布を整列させて,言語拡張視覚とマスキング言語モデルの両方に適用する方法を示す。提案手法の適用性を検証するため,画像分類,感情分析,自然言語推論タスクについて広範な実験を行った。 13のイメージ関連タスクと15の言語関連タスクに対して,提案手法はベースラインよりも一貫した改善を実現する。 Through prompting, large-scale pre-trained models have become more expressive and powerful, gaining significant attention in recent years. Though these big models have zero-shot capabilities, in general, labeled data are still required to adapt them to downstream tasks. To overcome this critical limitation, we propose an unsupervised fine-tuning framework to directly fine-tune the model or prompt on the unlabeled target data. We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data. To verify our approach's applicability, we conduct extensive experiments on image classification, sentiment analysis, and natural language inference tasks. Across 13 image-related tasks and 15 language-related ones, the proposed approach achieves consistent improvements over the baselines.	翻訳日:2023-05-02 15:42:56 公開日:2023-04-29
# 身体視におけるモダリティ不変の視覚計測 Modality-invariant Visual Odometry for Embodied Vision ( http://arxiv.org/abs/2305.00348v1 ) ライセンス: Link先を確認	Marius Memmel, Roman Bachmann, Amir Zamir	(参考訳) エージェントを現実的でノイズの多い環境で効果的にローカライズすることは、多くの具体的視覚タスクに不可欠である。ビジュアルオドメトリー(VO)は、特に屋内環境では、信頼性の低いGPSやコンパスセンサーの代替となる。 SLAMベースの手法は、大きなデータ要求なしに安定した性能を示すが、学習ベースのアプローチに比べて、ノイズやセンサースイートの変更に対して柔軟性が低く、堅牢である。しかし、最近のディープVOモデルは、数百万のサンプルをトレーニングしながら、RGBや深さなどの入力モードの固定セットに制限されている。センサーが故障した場合、センサースイートが変更され、あるいは電力消費などの利用可能なリソースのために、モダリティが意図的にループアウトされる。さらに、これらのモデルをスクラッチからトレーニングすることは、シミュレーターアクセスや、微調整可能な既存のモデルなしでさらにコストがかかる。このようなシナリオはシミュレーションでほとんど無視されるが、実世界のアプリケーションでモデルの再利用性を妨げる。本稿では,様々なナビゲーションエージェントのセンサスイートに対応可能なトランスフォーマティブ型モダリティ不変voアプローチを提案する。我々のモデルは、データの一部をトレーニングしながら、以前の方法よりも優れています。この手法が、フレキシブルで学習されたVOモデルの恩恵を受けることができる幅広い現実世界アプリケーションへの扉を開くことを願っている。 Effectively localizing an agent in a realistic, noisy setting is crucial for many embodied vision tasks. Visual Odometry (VO) is a practical substitute for unreliable GPS and compass sensors, especially in indoor environments. While SLAM-based methods show a solid performance without large data requirements, they are less flexible and robust w.r.t. to noise and changes in the sensor suite compared to learning-based approaches. Recent deep VO models, however, limit themselves to a fixed set of input modalities, e.g., RGB and depth, while training on millions of samples. When sensors fail, sensor suites change, or modalities are intentionally looped out due to available resources, e.g., power consumption, the models fail catastrophically. Furthermore, training these models from scratch is even more expensive without simulator access or suitable existing models that can be fine-tuned. While such scenarios get mostly ignored in simulation, they commonly hinder a model's reusability in real-world applications. We propose a Transformer-based modality-invariant VO approach that can deal with diverse or changing sensor suites of navigation agents. 
Our model outperforms previous methods while training on only a fraction of the data. We hope this method opens the door to a broader range of real-world applications that can benefit from flexible and learned VO models.	翻訳日:2023-05-02 15:42:40 公開日:2023-04-29
# 協調型AIの可能性を解き放つ -- 連合機械学習の社会技術的課題 Unlocking the Potential of Collaborative AI -- On the Socio-technical Challenges of Federated Machine Learning ( http://arxiv.org/abs/2304.13688v3 ) ライセンス: Link先を確認	Tobias Müller, Milena Zahn and Florian Matthes	(参考訳) AIシステムの破壊的なポテンシャルは、ビッグデータの出現に根ざしている。しかし、データのかなりの部分は分散してデータサイロに閉じ込められており、その潜在力は活用されていない。 Federated Machine Learningは、分散化され、サイロ化されている可能性のあるデータからAIモデルを作成できる、新しいAIパラダイムである。したがって、フェデレーション機械学習は技術的にデータサイロを開放し、経済的な可能性を引き出しうる。しかし、これにはデータサイロを所有する複数の当事者間のコラボレーションが必要となる。協調型ビジネスモデルの構築は複雑であり、しばしば失敗の原因となる。現在の文献には、協調AIプロジェクトを成功させるために考慮すべき側面についてのガイドラインが欠けている。本研究では,一般的な協調型ビジネスモデルの課題と,連合機械学習に固有の側面について検討する。体系的な文献レビュー、フォーカスグループ、専門家インタビューを通じて、社会技術的課題を体系的に整理し、協調AIプロジェクトの初期の実行可能性評価のための拡張ビジネスモデルキャンバスを提供する。 The disruptive potential of AI systems is rooted in the emergence of big data. Yet, a significant portion is scattered and locked in data silos, leaving its potential untapped. Federated Machine Learning is a novel AI paradigm enabling the creation of AI models from decentralized, potentially siloed data. Hence, Federated Machine Learning could technically open data silos and therefore unlock economic potential. However, this requires collaboration between multiple parties owning data silos. Setting up collaborative business models is complex and often a reason for failure. Current literature lacks guidelines on which aspects must be considered to successfully realize collaborative AI projects. This research investigates the challenges of prevailing collaborative business models and distinct aspects of Federated Machine Learning. Through a systematic literature review, focus group, and expert interviews, we provide a systemized collection of socio-technical challenges and an extended Business Model Canvas for the initial viability assessment of collaborative AI projects.	翻訳日:2023-05-02 10:43:23 公開日:2023-04-29
# 三対角トープリッツ行列と二部量子相関 Tridiagonal Toeplitz Matrices and Bipartite Quantum Correlations ( http://arxiv.org/abs/2302.10192v3 ) ライセンス: Link先を確認	Varsha S. Sambhaje, Suprabhat Sinha, Kapil K. Sharma	(参考訳) 本稿では,量子情報でよく用いられる有効なハミルトニアンの要件を満たす三対角トープリッツエルミート行列に着目する。このような行列の振る舞いを調べ、二部ヴェルナー状態と最大絡み合った混合状態(MEMS)に対する量子相関(絡み合いと量子不協和)のダイナミクスを追う。興味深い結果として、トープリッツ行列の主対角項は、どちらの量子状態においても量子相関に影響を与えないことがわかった。一方、上対角項および下対角項はダイナミクスにおいて重要な役割を果たす。絡み合いの突然死の現象を調べ,絡み合いがない場合にも量子不協和が存在することを観察した。最も重要な点として、MEMSはヴェルナー状態よりも敏感であることがわかった。 In this article, we focus on tridiagonal Toeplitz Hermitian matrices, which fulfill the requirement of a valid Hamiltonian often used in Quantum Information. We investigate the behavior of such matrices to pursue the dynamics of quantum correlations (entanglement and quantum discord) for bipartite Werner state and maximally entangled mixed states. We find the interesting result that the main diagonal terms in the Toeplitz matrices never affect the quantum correlations in both quantum states. However, super-diagonal and sub-diagonal terms play an important role in the dynamics. We investigate the phenomenon of entanglement sudden death and also observe the presence of quantum discord in the absence of entanglement. Most importantly it is found that MEMS is more sensitive in comparison to the Werner state.	翻訳日:2023-05-02 10:43:09 公開日:2023-04-29
# ChatVideo:トラックレット中心のマルチモーダル・多目的ビデオ理解システム ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System ( http://arxiv.org/abs/2304.14407v2 ) ライセンス: Link先を確認	Junke Wang and Dongdong Chen and Chong Luo and Xiyang Dai and Lu Yuan and Zuxuan Wu and Yu-Gang Jiang	(参考訳) 既存のディープビデオモデルは、特定のタスク、固定された入出力空間、乏しい汎化能力によって制限されており、現実のシナリオへのデプロイが困難である。本稿では,マルチモーダルで多目的なビデオ理解に向けたビジョンを示し,プロトタイプシステムであるChatVideoを提案する。本システムは,トラックレットを基本的なビデオ単位として扱い,様々なビデオ基盤モデル(ViFM)を用いて外見や動きなどの特性をアノテートする,トラックレット中心のパラダイムに基づいて構築されている。検出されたトラックレットはすべてデータベースに格納され、データベースマネージャを介してユーザと対話する。実環境で撮影された様々な種類の動画についてケーススタディを行い,様々なビデオ関連の問題に答える上での本手法の有効性を実証した。私たちのプロジェクトはhttps://www.wangjunke.info/ChatVideo/で利用可能です。 Existing deep video models are limited by specific tasks, fixed input-output spaces, and poor generalization capabilities, making it difficult to deploy them in real-world scenarios. In this paper, we present our vision for multimodal and versatile video understanding and propose a prototype system, ChatVideo. Our system is built upon a tracklet-centric paradigm, which treats tracklets as the basic video unit and employs various Video Foundation Models (ViFMs) to annotate their properties, e.g., appearance, motion, etc. All the detected tracklets are stored in a database and interact with the user through a database manager. We have conducted extensive case studies on different types of in-the-wild videos, which demonstrate the effectiveness of our method in answering various video-related problems. Our project is available at https://www.wangjunke.info/ChatVideo/	翻訳日:2023-05-02 10:32:58 公開日:2023-04-29
# 都市空間時間予測の効率化に向けて:統一図書館と性能ベンチマーク Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction: A Unified Library and Performance Benchmark ( http://arxiv.org/abs/2304.14343v2 ) ライセンス: Link先を確認	Jingyuan Wang, Jiawei Jiang, Wenjun Jiang, Chengkai Han, Wayne Xin Zhao	(参考訳) 深層学習技術が進歩し、都市空間時空間データが蓄積するにつれて、都市空間時空間予測問題を解決するための深層学習モデルが増えている。しかし、既存の分野には、さまざまなフォーマットで、使いづらいオープンソースのデータ、コードとデータをオープンに利用可能にする論文、さまざまなフレームワークやプラットフォームを使用するオープンソースモデルなど、制限があり、比較が難しい。これらのメソッドを実装し評価するには、標準化されたフレームワークが緊急に必要です。これらの課題に対処するため、都市空間時空間予測の総合的なレビューを行い、原子ファイルと呼ばれる空間時空間データの統一記憶形式を提案する。また、libcityは、研究者に信頼できる実験ツールと便利な開発フレームワークを提供するオープンソースライブラリである。本図書館では,65の空間-時間予測モデルを再現し,55の空間-時間データセットを収集した。 LibCityを用いて、異なるモデルやコンポーネントの有効性を検証する一連の実験を行い、将来有望な技術開発と研究の方向性を時空間予測のために要約した。公平なモデル比較を可能にし、統一されたデータストレージフォーマットを設計し、新しいモデルの開発プロセスを簡単にすることで、libcityは空間-時間予測分野に大きな貢献をする準備が整っている。 As deep learning technology advances and more urban spatial-temporal data accumulates, an increasing number of deep learning models are being proposed to solve urban spatial-temporal prediction problems. However, there are limitations in the existing field, including open-source data being in various formats and difficult to use, few papers making their code and data openly available, and open-source models often using different frameworks and platforms, making comparisons challenging. A standardized framework is urgently needed to implement and evaluate these methods. To address these issues, we provide a comprehensive review of urban spatial-temporal prediction and propose a unified storage format for spatial-temporal data called atomic files. We also propose LibCity, an open-source library that offers researchers a credible experimental tool and a convenient development framework. In this library, we have reproduced 65 spatial-temporal prediction models and collected 55 spatial-temporal datasets, allowing researchers to conduct comprehensive experiments conveniently. 
Using LibCity, we conducted a series of experiments to validate the effectiveness of different models and components, and we summarized promising future technology developments and research directions for spatial-temporal prediction. By enabling fair model comparisons, designing a unified data storage format, and simplifying the process of developing new models, LibCity is poised to make significant contributions to the spatial-temporal prediction field.	翻訳日:2023-05-02 10:32:43 公開日:2023-04-29

# 材料を用いた拡張可能なマルチモーダルマルチタスクオブジェクトデータセット

An Extensible Multimodal Multi-task Object Dataset with Materials ( http://arxiv.org/abs/2305.14352v1 )

ライセンス: Link先を確認

Trevor Standley, Ruohan Gao, Dawn Chen, Jiajun Wu, Silvio Savarese

(参考訳) リッチマテリアルアノテーションを含むAmazon製品リストの,拡張可能なマルチモーダルデータセットEMMaを提案する。これは280万以上のオブジェクトを含み、それぞれが画像、テキスト、質量、価格、製品評価、およびAmazonの製品分類における位置をリストアップしている。 182の物理材料(プラスチック$\rightarrow$熱可塑性$\rightarrow$ acrylic)の包括的な分類も設計しています。対象は、この分類から1つまたは複数の材料で注釈される。各オブジェクトに利用可能な多数の属性で、我々はSmart Labelingフレームワークを開発し、手作業によるラベル付けをほとんど行わずに、すべてのオブジェクトに新しいバイナリラベルを素早く追加し、データセットを拡張可能にします。データセットの各オブジェクト属性は、モデル入力または出力のいずれかに含めることができるため、タスク設定の組合せ可能性につながります。例えば、リストテキストからオブジェクトカテゴリを予測するためにモデルをトレーニングしたり、製品一覧画像から商品の質量と価格を予測することができる。 emmaはコンピュータビジョンとnlpでマルチタスク学習のための新しいベンチマークを提供し、実践者が大規模に新しいタスクやオブジェクト属性を効率的に追加できるようにする。

We present EMMa, an Extensible, Multimodal dataset of Amazon product listings that contains rich Material annotations. It contains more than 2.8 million objects, each with image(s), listing text, mass, price, product ratings, and position in Amazon's product-category taxonomy. We also design a comprehensive taxonomy of 182 physical materials (e.g., Plastic $\rightarrow$ Thermoplastic $\rightarrow$ Acrylic). Objects are annotated with one or more materials from this taxonomy. With the numerous attributes available for each object, we develop a Smart Labeling framework to quickly add new binary labels to all objects with very little manual labeling effort, making the dataset extensible. Each object attribute in our dataset can be included in either the model inputs or outputs, leading to combinatorial possibilities in task configurations. For example, we can train a model to predict the object category from the listing text, or the mass and price from the product listing image. EMMa offers a new benchmark for multi-task learning in computer vision and NLP, and allows practitioners to efficiently add new tasks and object attributes at scale.

翻訳日:2023-05-28 05:01:03 公開日:2023-04-29
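
The "combinatorial possibilities in task configurations" mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code. The attribute names below are assumptions paraphrased from the abstract, and the rule that a task needs at least one input and at least one output is also an assumption made for this example.

```python
from itertools import product

# Hypothetical attribute names, paraphrased from the abstract (not the dataset's real schema).
ATTRIBUTES = ["image", "text", "mass", "price", "rating", "category", "material"]


def task_configurations(attributes):
    """Enumerate every way to assign each attribute to the input side,
    the output side, or leave it unused, keeping only configurations
    that have at least one input and at least one output."""
    configs = []
    for roles in product(("input", "output", "unused"), repeat=len(attributes)):
        inputs = [a for a, r in zip(attributes, roles) if r == "input"]
        outputs = [a for a, r in zip(attributes, roles) if r == "output"]
        if inputs and outputs:
            configs.append((inputs, outputs))
    return configs


configs = task_configurations(ATTRIBUTES)
print(len(configs))
```

Under these assumptions the count follows the closed form 3^n - 2^(n+1) + 1, which gives 1,932 configurations for the seven attributes used here. The two examples from the abstract (text to category, image to mass and price) both appear in the enumeration.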

# 多地点雲の効率的な処理・転送のための高度医用画像表現

Advanced Medical Image Representation for Efficient Processing and Transfer in Multisite Clouds ( http://arxiv.org/abs/2305.15411v1 )

ライセンス: Link先を確認

Elena-Simona Apostol and Ciprian-Octavian Truică

(参考訳) 医学研究における重要なトピックは、医療機器から得られる画像を改善するプロセスである。結果として、医療画像の解像度と分析を改善する必要もある。この分野でのもう一つの問題は、大量の保存された医療データ[16]である。例えば医療機関の人間の脳データベースは、年間数十テラバイトのデータを蓄積することができる。本稿では,医療画像に保持される情報を改善するために,複数のデータ構造に基づく新しい医用画像形式表現を提案する。新しい表現は、画像のクラスや画像内で見つかったオブジェクトのタグなど、追加のメタデータ情報を保持する。我々は,多層ニューラルネットワークを用いて医用画像中の物体を分類するために,独自のオントロジーを定義した。一般的に大規模なデータセットを扱うため、クラウド環境でMapReduceパラダイムを使用して画像処理を高速化した。クラウドノード間の転送を最適化し,前処理時間を短縮するために,重複排除(deduplication)に基づくデータ圧縮手法も提案する。マルチサイトクラウド環境で、画像表現と効率的なデータ転送のためのソリューションをテストする。提案手法は,平均27%の時間改善でデータ転送を最適化する。

An important topic in medical research is the process of improving the images obtained from medical devices. As a consequence, there is also a need to improve medical image resolution and analysis. Another issue in this field is the large amount of stored medical data [16]. Human brain databases at medical institutes, for example, can accumulate tens of Terabytes of data per year. In this paper, we propose a novel medical image format representation based on multiple data structures that improve the information maintained in the medical images. The new representation keeps additional metadata information, such as the image class or tags for the objects found in the image. We defined our own ontology to help us classify the objects found in medical images using a multilayer neural network. As we generally deal with large data sets, we used the MapReduce paradigm in the Cloud environment to speed up the image processing. To optimize the transfer between Cloud nodes and to reduce the preprocessing time, we also propose a data compression method based on deduplication. We test our solution for image representation and efficient data transfer in a multisite cloud environment. Our proposed solution optimizes the data transfer with a time improvement of 27% on average.

翻訳日:2023-05-28 04:40:15 公開日:2023-04-29
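
The abstract says only that compression is "based on deduplication" and gives no details. As a rough illustration of the general idea, and not the authors' method, the sketch below splits a byte string into fixed-size chunks and stores each distinct chunk once. The 4096-byte chunk size, the SHA-256 digest, and the function names are all assumptions chosen for the example.

```python
import hashlib


def deduplicate_chunks(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each distinct chunk once.

    Returns (store, recipe): store maps a digest to the chunk bytes, and
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is stored
        recipe.append(digest)
    return store, recipe


def reconstruct(store, recipe) -> bytes:
    """Rebuild the original bytes from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Toy payload with repeated regions, standing in for redundant image data.
payload = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, recipe = deduplicate_chunks(payload)
stored_bytes = sum(len(chunk) for chunk in store.values())
print(len(payload), stored_bytes, reconstruct(store, recipe) == payload)
```

Only the chunk store and the recipe would need to travel between nodes, so the transfer saving grows with the amount of repeated content.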

# フィンテック分野における強化学習の体系的レビュー

Systematic Review on Reinforcement Learning in the Field of Fintech ( http://arxiv.org/abs/2305.07466v1 )

ライセンス: Link先を確認

Nadeem Malibari, Iyad Katib and Rashid Mehmood

(参考訳) 近年、金融技術(Fintech)における強化学習の応用が大きな注目を集めている。強化学習はその高い能力によって、フィンテックの分野で顕著な成果に貢献してきた。本調査の目的は,強化学習とフィンテックの関係を探索的に検討し,予測精度,複雑性,スケーラビリティ,リスク,収益性,パフォーマンスを明らかにすることである。金融やフィンテックにおける強化学習の主な用途は、ポートフォリオ最適化、信用リスク低減、投資資本管理、利益の最大化、効果的なレコメンデーションシステム、より良い価格設定戦略である。いくつかの研究は、金融機関の業績に対する強化学習の実際の貢献を扱っている。本調査に含まれる最新の研究は、2018年以降の出版物である。調査はPRISMA手法を用いて行われた。PRISMAはレビューの報告に焦点を当てた手法であり、チェックリストと4段階のフロー図に基づいている。調査の結果、フィンテック分野におけるRLベースの戦略は、他の最先端アルゴリズムよりもかなり優れた性能を示すことがわかった。本稿では、フィンテックにおける多様な意思決定課題での強化学習アルゴリズムの利用について論じ、金融を扱う組織は、ロボアドバイザー、スマートオーダーチャネリング、マーケットメイキング、ヘッジとオプションの価格設定、ポートフォリオ最適化、最適執行から大きな利益を得られると結論づける。

Applications of Reinforcement Learning in the Finance Technology (Fintech) have acquired a lot of admiration lately. Undoubtedly Reinforcement Learning, through its vast competence and proficiency, has aided remarkable results in the field of Fintech. The objective of this systematic survey is to perform an exploratory study on a correlation between reinforcement learning and Fintech to highlight the prediction accuracy, complexity, scalability, risks, profitability and performance. Major uses of reinforcement learning in finance or Fintech include portfolio optimization, credit risk reduction, investment capital management, profit maximization, effective recommendation systems, and better price setting strategies. Several studies have addressed the actual contribution of reinforcement learning to the performance of financial institutions. The latest studies included in this survey are publications from 2018 onward. The survey is conducted using PRISMA technique which focuses on the reporting of reviews and is based on a checklist and four-phase flow diagram. The conducted survey indicates that the performance of RL-based strategies in Fintech fields proves to perform considerably better than other state-of-the-art algorithms. The present work discusses the use of reinforcement learning algorithms in diverse decision-making challenges in Fintech and concludes that the organizations dealing with finance can benefit greatly from Robo-advising, smart order channelling, market making, hedging and options pricing, portfolio optimization, and optimal execution.

翻訳日:2023-05-21 11:13:33 公開日:2023-04-29

# POET: PROFINET産業オペレーションの振る舞いのための自己学習フレームワーク

POET: A Self-learning Framework for PROFINET Industrial Operations Behaviour ( http://arxiv.org/abs/2305.03175v1 )

ライセンス: Link先を確認

Ankush Meshram, Markus Karch, Christian Haas, Jürgen Beyerer

(参考訳) 2010年以降、StuxnetやCrashOverrideといった産業インフラに対する複数のサイバーインシデントにより、産業制御システム(ICS)がサイバー脅威に対して脆弱であることが明らかになった。産業システムは数十年に及ぶ長期間にわたって運用されるため、産業用サイバーセキュリティ機構の技術的進歩に追従できていないことが多い。ネットワークインフラストラクチャ情報が利用できないと,セキュリティポリシの設計や,ネットワーク侵入検知システム(NIDS)などのサイバーセキュリティ対策の設定が困難になる。経験的な解決策は、監視されたネットワークトラフィックから産業システムのネットワークインフラストラクチャ情報を自己学習し、異常検出などの下流の解析タスクに対してネットワークを透明化することである。本稿では,産業通信のパラダイムを考慮したPythonベースのフレームワークであるPROFINET Operations Enumeration and Tracking(POET)について報告する。POETは、PROFINETベースの産業システムにおいて決定論的な順序で実行されるさまざまな産業オペレーションを列挙する。オペレーションを駆動する産業ネットワークプロトコルフレームを解析して、オペレーションを列挙する。通信イベントによって引き起こされる産業オペレーション間の遷移を捉えるために、有限状態機械(FSM)をモデル化し、デバイス、接続、システムのPROFINETオペレーションを列挙する。 POETはネットワークトラフィックからネットワーク情報を抽出し、適切なFSMモデル(デバイス、接続、システム)をインスタンス化し、産業オペレーションを追跡する。小型のPROFINETベース産業システムにおいて、有効なネットワークプロトコル交換を通じて実行され、デバイスに不正なPROFINETオペレーション遷移をもたらすネットワーク攻撃が引き起こす異常を、POETは検知して報告することに成功した。

Since 2010, multiple cyber incidents on industrial infrastructure, such as Stuxnet and CrashOverride, have exposed the vulnerability of Industrial Control Systems (ICS) to cyber threats. The industrial systems are commissioned for longer duration amounting to decades, often resulting in non-compliance to technological advancements in industrial cybersecurity mechanisms. The unavailability of network infrastructure information makes designing the security policies or configuring the cybersecurity countermeasures such as Network Intrusion Detection Systems (NIDS) challenging. An empirical solution is to self-learn the network infrastructure information of an industrial system from its monitored network traffic to make the network transparent for downstream analyses tasks such as anomaly detection. In this work, a Python-based industrial communication paradigm-aware framework, named PROFINET Operations Enumeration and Tracking (POET), that enumerates different industrial operations executed in a deterministic order of a PROFINET-based industrial system is reported. The operation-driving industrial network protocol frames are dissected for enumeration of the operations. For the requirements of capturing the transitions between industrial operations triggered by the communication events, the Finite State Machines (FSM) are modelled to enumerate the PROFINET operations of the device, connection and system. POET extracts the network information from network traffic to instantiate appropriate FSM models (Device, Connection or System) and track the industrial operations. It successfully detects and reports the anomalies triggered by a network attack in a miniaturized PROFINET-based industrial system, executed through valid network protocol exchanges and resulting in invalid PROFINET operation transition for the device.

翻訳日:2023-05-14 21:15:36 公開日:2023-04-29
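
The abstract describes tracking operations with finite state machines and flagging invalid operation transitions as anomalies. The sketch below illustrates that idea in minimal form. It is not POET's implementation, and the state and event names are hypothetical placeholders rather than real PROFINET operation names.

```python
class OperationTracker:
    """Minimal finite state machine that follows a fixed operation order
    and records any event that is not a valid transition from the current state."""

    # (current state, observed event) -> next state; all names are hypothetical.
    TRANSITIONS = {
        ("idle", "connect_request"): "connecting",
        ("connecting", "connect_response"): "parameterizing",
        ("parameterizing", "parameter_end"): "ready",
        ("ready", "application_ready"): "data_exchange",
        ("data_exchange", "release"): "idle",
    }

    def __init__(self, initial: str = "idle"):
        self.state = initial
        self.anomalies = []  # list of (state, event) pairs that were rejected

    def observe(self, event: str) -> bool:
        """Advance on a valid event; otherwise record an anomaly and stay put."""
        key = (self.state, event)
        if key in self.TRANSITIONS:
            self.state = self.TRANSITIONS[key]
            return True
        self.anomalies.append(key)
        return False


tracker = OperationTracker()
tracker.observe("connect_request")  # valid: idle -> connecting
tracker.observe("release")          # invalid while connecting: recorded as anomaly
print(tracker.state, tracker.anomalies)
```

One tracker per device, connection, or system would mirror the three FSM models named in the abstract. An event can be a well-formed protocol message and still be rejected here because it arrives in the wrong state, which corresponds to the abstract's attack executed through valid protocol exchanges.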

# 80MHzWi-Fiチャネルを用いた無線人体センシングのためのCSIデータセット

A CSI Dataset for Wireless Human Sensing on 80 MHz Wi-Fi Channels ( http://arxiv.org/abs/2305.03170v1 )

ライセンス: Link先を確認

Francesca Meneghello, Nicolò Dal Fabbro, Domenico Garlisi, Ilenia Tinnirello, Michele Rossi

(参考訳) 近年,Wi-Fiチャネルの測定値から人の動きを監視する機械学習ベースの手法がいくつか提案されている。しかし、異なる環境で頑健に動作するドメイン適応型アルゴリズムの開発は未解決の問題であり、その解決には、環境、人、Wi-Fiハードウェアの点でドメインの多様性に富んだ大規模なデータセットが必要である。現在利用可能な数少ない公開データセットは、20MHzまたは40MHz帯で動作するWi-Fiデバイスで取得されたもので大半が時代遅れであり、ドメインの多様性もほとんどあるいは全くないため、センシングアルゴリズムの設計の進歩が大きく制限されている。本研究は,異なる環境,日,ハードウェアにまたがり13名の被験者が参加した測定キャンペーンを通じて,顕著なドメイン多様性を備えた80MHz帯域幅チャネル上のIEEE 802.11acチャネル測定のデータセットを提供することで,このギャップを埋めることを目的としている。送信機とモニタの間の直接経路を遮断した測定や、半無響室(マルチパスフェージングなし)での測定により、新規の実験データも提供する。全体として、データセットはIEEE DataPort [1]で利用可能であり、13時間以上のチャネル状態情報の測定値(23.6GB)を含んでおり、研究者は行動・個人識別や人数カウントのアルゴリズムをテストできる。

In the last years, several machine learning-based techniques have been proposed to monitor human movements from Wi-Fi channel readings. However, the development of domain-adaptive algorithms that robustly work across different environments is still an open problem, whose solution requires large datasets characterized by strong domain diversity, in terms of environments, persons and Wi-Fi hardware. To date, the few public datasets available are mostly obsolete - as obtained via Wi-Fi devices operating on 20 or 40 MHz bands - and contain little or no domain diversity, thus dramatically limiting the advancements in the design of sensing algorithms. The present contribution aims to fill this gap by providing a dataset of IEEE 802.11ac channel measurements over an 80 MHz bandwidth channel featuring notable domain diversity, through measurement campaigns that involved thirteen subjects across different environments, days, and with different hardware. Novel experimental data is provided by blocking the direct path between the transmitter and the monitor, and collecting measurements in a semi-anechoic chamber (no multi-path fading). Overall, the dataset - available on IEEE DataPort [1] - contains more than thirteen hours of channel state information readings (23.6 GB), allowing researchers to test activity/identity recognition and people counting algorithms.

Translated: 2023-05-14 21:15:09 Published: 2023-04-29

# QICHWABASE: A Quechua Language and Knowledge Base for Quechua Communities ( http://arxiv.org/abs/2305.06173v1 )

License: see linked page

Elwin Huaman, David Lindemann, Valeria Caruso, Jorge Luis Huaman

Over the last decade, the Web has increasingly become a space of language and knowledge representation. However, this is only true for widely spoken languages and well-established communities, while minority communities and their resources have received less attention. In this paper, we propose QICHWABASE to support the harmonization process of the Quechua language and knowledge, and its community. To do so, we adopt methods and tools that could become a game changer in favour of Quechua communities around the world. We conclude that the methodology and tools adopted in building QICHWABASE, which is a Wikibase instance, could enhance the presence of minorities on the Web.

Translated: 2023-05-14 20:46:18 Published: 2023-04-29

# ChatGPT in education: A discourse analysis of worries and concerns on social media ( http://arxiv.org/abs/2305.02201v1 )

License: see linked page

Lingyao Li, Zihui Ma, Lizhou Fan, Sanggyu Lee, Huizi Yu, Libby Hemphill

The rapid advancements in generative AI models present new opportunities in the education sector. However, it is imperative to acknowledge and address the potential risks and concerns that may arise with their use. We analyzed Twitter data to identify key concerns related to the use of ChatGPT in education. We employed BERT-based topic modeling to conduct a discourse analysis, and social network analysis to identify influential users in the conversation. While Twitter users generally expressed a positive attitude towards the use of ChatGPT, their concerns converged on five specific categories: academic integrity, impact on learning outcomes and skill development, limitation of capabilities, policy and social concerns, and workforce challenges. We also found that users from the tech, education, and media fields were often implicated in the conversation, while individual users from education and tech led the discussion of concerns. Based on these findings, the study provides several implications for policymakers, tech companies and individuals, educators, and media agencies. In summary, our study underscores the importance of responsible and ethical use of AI in education and highlights the need for collaboration among stakeholders to regulate AI policy.

Translated: 2023-05-04 14:17:05 Published: 2023-04-29

# Can ChatGPT Pass An Introductory Level Functional Language Programming Course? ( http://arxiv.org/abs/2305.02230v1 )

License: see linked page

Chuqin Geng, Zhang Yihan, Brigitte Pientka, Xujie Si

The recent introduction of ChatGPT has drawn significant attention from both industry and academia due to its impressive capabilities in solving a diverse range of tasks, including language translation, text summarization, and computer programming. Its capability for writing, modifying, and even correcting code, together with its ease of use and access, is already dramatically impacting computer science education. This paper aims to explore how well ChatGPT can perform in an introductory-level functional language programming course. In our systematic evaluation, we treated ChatGPT as one of our students and demonstrated that it can achieve a grade of B-, ranking 155 out of 314 students overall. Our comprehensive evaluation provides valuable insights into ChatGPT's impact from both student and instructor perspectives. Additionally, we identify several potential benefits that ChatGPT can offer to both groups. Overall, we believe that this study significantly clarifies and advances our understanding of ChatGPT's capabilities and potential impact on computer science education.

Translated: 2023-05-04 14:07:10 Published: 2023-04-29

# Facial Expression Recognition using Squeeze and Excitation-powered Swin Transformers ( http://arxiv.org/abs/2301.10906v7 )

License: see linked page

Arpita Vats, Aman Chadha

The ability to recognize and interpret facial emotions is a critical component of human communication, as it allows individuals to understand and respond to emotions conveyed through facial expressions and vocal tones. The recognition of facial emotions is a complex cognitive process that involves the integration of visual and auditory information, as well as prior knowledge and social cues. It plays a crucial role in social interaction, affective processing, and empathy, and is an important aspect of many real-world applications, including human-computer interaction, virtual assistants, and mental health diagnosis and treatment. The development of accurate and efficient models for facial emotion recognition is therefore of great importance and has the potential to have a significant impact on various fields of study. The field of Facial Emotion Recognition (FER) is of great significance in the areas of computer vision and artificial intelligence, with vast commercial and academic potential in fields such as security, advertising, and entertainment. We propose a FER framework that employs Swin Vision Transformers (SwinT) and squeeze and excitation (SE) blocks to address vision tasks. The approach uses a transformer model with an attention mechanism, SE, and SAM to improve the efficiency of the model, as transformers often require a large amount of data. Our focus was to create an efficient FER model based on the SwinT architecture that can recognize facial emotions using minimal data. We trained our model on a hybrid dataset and evaluated its performance on the AffectNet dataset, achieving an F1-score of 0.5420, which surpassed the winner of the Affective Behavior Analysis in the Wild (ABAW) Competition held at the European Conference on Computer Vision (ECCV) 2022 \cite{Kollias}.

Translated: 2023-05-03 17:26:41 Published: 2023-04-29

# Spectral clustering in the Gaussian mixture block model ( http://arxiv.org/abs/2305.00979v1 )

License: see linked page

Shuangping Li, Tselil Schramm

Gaussian mixture block models are distributions over graphs that strive to model modern networks: to generate a graph from such a model, we associate each vertex $i$ with a latent feature vector $u_i \in \mathbb{R}^d$ sampled from a mixture of Gaussians, and we add an edge $(i,j)$ if and only if the feature vectors are sufficiently similar, in that $\langle u_i,u_j \rangle \ge \tau$ for a pre-specified threshold $\tau$. The different components of the Gaussian mixture represent the fact that there may be different types of nodes with different distributions over features; for example, in a social network each component represents the different attributes of a distinct community. Natural algorithmic tasks associated with these networks are embedding (recovering the latent feature vectors) and clustering (grouping nodes by their mixture component). In this paper we initiate the study of clustering and embedding graphs sampled from high-dimensional Gaussian mixture block models, where the dimension of the latent feature vectors $d\to \infty$ as the size of the network $n \to \infty$. This high-dimensional setting is most appropriate in the context of modern networks, in which we think of the latent feature space as being high-dimensional. We analyze the performance of canonical spectral clustering and embedding algorithms for such graphs in the case of 2-component spherical Gaussian mixtures, and begin to sketch out the information-computation landscape for clustering and embedding in these models.
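The generative process described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `sample_gmbm`, the choice of component means at $\pm\mu e_1$, the identity covariance, and all parameter values are hypothetical.

```python
import random


def sample_gmbm(n, d, mu, tau, seed=0):
    """Sample a graph from a 2-component spherical Gaussian mixture block model.

    Hypothetical helper for illustration: vertex i gets a latent vector
    u_i ~ N(s_i * mu * e_1, I_d) with a random sign s_i, and the edge (i, j)
    is added if and only if <u_i, u_j> >= tau.
    """
    rng = random.Random(seed)
    # Mixture component of each vertex, encoded as a sign.
    labels = [rng.choice((-1, 1)) for _ in range(n)]
    feats = []
    for s in labels:
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        u[0] += s * mu  # shift the first coordinate by the component mean
        feats.append(u)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(feats[i], feats[j]))
            if inner >= tau:  # threshold on feature similarity
                edges.add((i, j))
    return labels, feats, edges
```

With a large separation `mu`, inner products between vertices of the same component tend to be positive and those across components negative, so most edges fall within a component. That is the structure a clustering algorithm would try to recover from `edges` alone.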

Translated: 2023-05-03 16:40:24 Published: 2023-04-29

# Conditional Graph Information Bottleneck for Molecular Relational Learning ( http://arxiv.org/abs/2305.01520v1 )

License: see linked page

Namkyeong Lee, Dongmin Hyun, Gyoung S. Na, Sungwon Kim, Junseok Lee, Chanyoung Park

Molecular relational learning, whose goal is to learn the interaction behavior between molecular pairs, has attracted a surge of interest in the molecular sciences due to its wide range of applications. Recently, graph neural networks have shown great success in molecular relational learning by modeling a molecule as a graph structure and considering atom-level interactions between two molecules. Despite their success, existing molecular relational learning methods tend to overlook the nature of chemistry, i.e., a chemical compound is composed of multiple substructures, such as functional groups, that cause distinctive chemical reactions. In this work, we propose a novel relational learning framework, called CGIB, that predicts the interaction behavior between a pair of graphs by detecting core subgraphs therein. The main idea is, given a pair of graphs, to find a subgraph of one graph that contains the minimal sufficient information regarding the task at hand, conditioned on the paired graph, based on the principle of the conditional graph information bottleneck. We argue that our proposed method mimics the nature of chemical reactions, i.e., the core substructure of a molecule varies depending on which other molecule it interacts with. Extensive experiments on various tasks with real-world datasets demonstrate the superiority of CGIB over state-of-the-art baselines. Our code is available at https://github.com/Namkyeong/CGIB.

Translated: 2023-05-03 13:57:25 Published: 2023-04-29

# A supervised active learning method for identifying critical nodes in Wireless Sensor Network ( http://arxiv.org/abs/2004.08885v4 )

License: see linked page

Behnam Ojaghi and Mohammad Mahdi Dehshibi

The energy efficiency of a wireless sensor network (WSN) relies on its main characteristics, including hop number, user location, allocated power, and relay. Identifying the nodes that have the most impact on these characteristics is, however, subject to substantial computational overhead and energy consumption. In this paper, we propose an active learning approach to address the computational overhead of identifying critical nodes in a WSN. The proposed approach can overcome bias in identifying non-critical nodes and needs much less fine-tuning effort to adapt to the dynamic nature of a WSN. This method benefits from the cooperation of clustering and classification modules to iteratively decrease the amount of data required in a typical supervised learning scenario and to increase the accuracy in the presence of uninformative examples, i.e., non-critical nodes. Experiments show that the proposed method is more flexible than the state of the art when employed in large-scale WSN environments, fifth-generation mobile networks (5G), and massively distributed IoT (i.e., sensor networks), where it can prolong the network lifetime.

Translated: 2023-05-02 22:37:29 Published: 2023-04-29

# Detection of quantum phase transition in spin-1 chain through multipartite high-order correlations ( http://arxiv.org/abs/2105.12391v2 )

License: see linked page

Dongkeun Lee, Adel Sohbi and Wonmin Son

We design a Bell inequality that is violated by correlations obtained from the ground states of the XXZ spin-1 chain with on-site anisotropies in the region of the phase transition. In order to detect such correlations in spin-1 systems, we exploit the formalism of generalized Bell inequalities via the use of multipartite and high-order correlations. We observe a sharp violation in the vicinity of the quantum phase transition between the so-called large-D and AFM phases. Interestingly, the violation of our Bell inequality is manifested by the change of the XXZ spin-1 chain ground state to a Greenberger-Horne-Zeilinger (GHZ)-like state in the critical region. Our results provide the first characterization of a quantum phase transition via the violation of a Bell-type constraint by correlations in the XXZ spin-1 chain with multi-body correlations and high-order measurements.

Translated: 2023-05-02 22:10:41 Published: 2023-04-29

# Diffusion Mechanism in Residual Neural Network: Theory and Applications ( http://arxiv.org/abs/2105.03155v5 )

License: see linked page

Tangjun Wang, Zehao Dou, Chenglong Bao, Zuoqiang Shi

Diffusion, a fundamental internal mechanism emerging in many physical processes, describes the interaction among different objects. In many learning tasks with limited training samples, diffusion connects the labeled and unlabeled data points and is a critical component for achieving high classification accuracy. Many existing deep learning approaches directly impose the fusion loss when training neural networks. In this work, inspired by the convection-diffusion ordinary differential equations (ODEs), we propose a novel diffusion residual network (Diff-ResNet), which internally introduces diffusion into the architectures of neural networks. Under the structured data assumption, it is proved that the proposed diffusion block can increase the distance-diameter ratio, which improves the separability of inter-class points and reduces the distance among local intra-class points. Moreover, this property can be easily adopted by residual networks for constructing separable hyperplanes. Extensive experiments on synthetic binary classification, semi-supervised graph node classification, and few-shot image classification on various datasets validate the effectiveness of the proposed method.
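To make the idea of diffusion among data points concrete, here is a small Python sketch of one explicit diffusion step over a set of points. It is an assumption-laden illustration, not the Diff-ResNet implementation: the Gaussian weight function, the update rule $x_i \leftarrow x_i - \gamma \sum_j w_{ij}(x_i - x_j)$, and the names `gaussian_weights` and `diffusion_step` are all hypothetical choices.

```python
import math


def gaussian_weights(points, sigma=1.0):
    """Pairwise similarity weights w_ij = exp(-|x_i - x_j|^2 / sigma^2), w_ii = 0.

    The Gaussian kernel is an assumption made for this sketch.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / sigma ** 2)
    return w


def diffusion_step(points, weights, step=0.1):
    """One explicit Euler step: x_i <- x_i - step * sum_j w_ij * (x_i - x_j)."""
    n = len(points)
    new_points = []
    for i, x in enumerate(points):
        # Weighted pull of point i towards every other point j.
        drift = [
            sum(weights[i][j] * (x[k] - points[j][k]) for j in range(n))
            for k in range(len(x))
        ]
        new_points.append([x[k] - step * drift[k] for k in range(len(x))])
    return new_points
```

Because nearby points get large weights and distant points get weights close to zero, repeated steps pull points of the same local cluster together while leaving well-separated clusters almost unaffected. This is the qualitative effect the abstract attributes to the diffusion block.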

Translated: 2023-05-02 22:10:32 Published: 2023-04-29

# Graph Representation Learning via Diversity-preserving Graph Refinement ( http://arxiv.org/abs/2103.07295v3 )

License: see linked page

Shuai Zheng

For real-world graph data, the complex relationship between nodes is often represented as a hard binary link. This is a discrete and simplified form of the continuous relationship between nodes, which seriously limits the expressiveness of the learned node representation. On the other hand, the node representation obtained in the embedding space can in turn be used to reveal the intrinsic relationship between nodes. To better characterize the node relationships and further facilitate the learning of node representations, an intuitive way is to refine the originally given graph structure with the embedded node representations. However, such global refinement of the relationships among all nodes without distinction will inevitably lead to some noisy edges, which may further confuse the training of the node representation learning model. In addition, it has scalability problems on large graphs. To address these issues, we propose a local-structure-aware graph refinement that progressively reveals the latent relationships of nodes, thus achieving efficient and robust graph refinement.

Translated: 2023-05-02 22:10:14 Published: 2023-04-29

# A Sparse Expansion For Deep Gaussian Processes ( http://arxiv.org/abs/2112.05888v3 )

License: see linked page

Liang Ding and Rui Tuo and Shahin Shahrampour

In this work, we use Deep Gaussian Processes (DGPs) as statistical surrogates for stochastic processes with complex distributions. Conventional inferential methods for DGP models can suffer from high computational complexity, as they require large-scale operations with kernel matrices for training and inference. We propose an efficient scheme for accurate inference and efficient training based on a range of Gaussian Processes, called the Tensor Markov Gaussian Processes (TMGP). We construct an induced approximation of TMGP referred to as the hierarchical expansion. Next, we develop a deep TMGP (DTMGP) model as the composition of multiple hierarchical expansions of TMGPs. The proposed DTMGP model has the following properties: (1) the outputs of each activation function are deterministic, while the weights are chosen independently from the standard Gaussian distribution; (2) in training or prediction, only polylog(M) (out of M) activation functions have non-zero outputs, which significantly boosts the computational efficiency. Our numerical experiments on synthetic models and real datasets show the superior computational efficiency of DTMGP over existing DGP models.

Translated: 2023-05-02 22:01:47 Published: 2023-04-29

# Byzantine-robust Federated Learning through Collaborative Malicious Gradient Filtering ( http://arxiv.org/abs/2109.05872v2 )

License: see linked page

Jian Xu, Shao-Lun Huang, Linqi Song, Tian Lan

Gradient-based training in federated learning is known to be vulnerable to faulty or malicious clients, which are often modeled as Byzantine clients. To this end, previous work either makes use of auxiliary data at the parameter server to verify the received gradients (e.g., by computing the validation error rate) or leverages statistic-based methods (e.g., median and Krum) to identify and remove malicious gradients from Byzantine clients. In this paper, we remark that auxiliary data may not always be available in practice and focus on the statistic-based approach. However, recent work on model poisoning attacks has shown that well-crafted attacks can circumvent most median- and distance-based statistical defense methods, making malicious gradients indistinguishable from honest ones. To tackle this challenge, we show that the element-wise signs of the gradient vector can provide valuable insight for detecting model poisoning attacks. Based on our theoretical analysis of the Little is Enough attack, we propose a novel approach called SignGuard to enable Byzantine-robust federated learning through collaborative malicious gradient filtering. More precisely, the received gradients are first processed to generate relevant magnitude, sign, and similarity statistics, which are then collaboratively utilized by multiple filters to eliminate malicious gradients before final aggregation. Finally, extensive experiments on image and text classification tasks are conducted under recently proposed attacks and defense strategies. The numerical results demonstrate the effectiveness and superiority of our proposed approach. The code is available at https://github.com/JianXu95/SignGuard
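The idea of filtering gradients by magnitude and sign statistics before aggregation can be sketched as follows. This is a simplified illustration and not the SignGuard implementation: the two median-relative filter rules, the thresholds `norm_lo`, `norm_hi` and `sign_tol`, and the function names are assumptions made for this example, and the similarity statistic mentioned in the abstract is omitted.

```python
import math
from statistics import median


def sign_stats(grad):
    """Fractions of positive, zero and negative entries of one gradient vector."""
    n = len(grad)
    pos = sum(1 for g in grad if g > 0) / n
    zero = sum(1 for g in grad if g == 0) / n
    neg = sum(1 for g in grad if g < 0) / n
    return pos, zero, neg


def l2_norm(grad):
    """Euclidean norm of a gradient vector."""
    return math.sqrt(sum(g * g for g in grad))


def filter_and_average(grads, norm_lo=0.1, norm_hi=3.0, sign_tol=0.3):
    """Keep gradients whose norm and positive-sign fraction are close to the median.

    Hypothetical filter rules for illustration only; returns the indices of the
    kept gradients and their coordinate-wise mean.
    """
    norms = [l2_norm(g) for g in grads]
    med_norm = median(norms)
    pos_fracs = [sign_stats(g)[0] for g in grads]
    med_pos = median(pos_fracs)
    kept = [
        i for i in range(len(grads))
        if norm_lo * med_norm <= norms[i] <= norm_hi * med_norm  # magnitude filter
        and abs(pos_fracs[i] - med_pos) <= sign_tol               # sign filter
    ]
    if not kept:
        raise ValueError("all gradients were filtered out")
    dim = len(grads[0])
    avg = [sum(grads[i][k] for i in kept) / len(kept) for k in range(dim)]
    return kept, avg
```

A gradient passes only if both filters accept it, which mirrors the abstract's description of several statistics being used jointly rather than any single one deciding on its own.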

Translated: 2023-05-02 22:00:37 Published: 2023-04-29

# Redundant representations help generalization in wide neural networks ( http://arxiv.org/abs/2106.03485v4 )

License: see linked page

Diego Doimo, Aldo Glielmo, Sebastian Goldt, Alessandro Laio

Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that interpolates its training data will typically improve its generalization performance. Explaining the mechanism behind this "benign overfitting" in deep networks remains an outstanding challenge. Here, we study the last hidden layer representations of various state-of-the-art convolutional neural networks and find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information and differ from each other only by statistically independent noise. The number of such groups increases linearly with the width of the layer, but only if the width is above a critical value. We show that redundant neurons appear only when the training process reaches interpolation and the training error is zero.

Translated: 2023-05-02 21:59:17 Published: 2023-04-29

# Learning-Based Framework for Camera Calibration with Distortion Correction and High Precision Feature Detection ( http://arxiv.org/abs/2202.00158v3 )

License: see linked page

Yesheng Zhang, Xu Zhao and Dahong Qian

Camera calibration is a crucial technique that significantly influences the performance of many robotic systems. Robustness and high precision have always been the goals of diverse calibration methods. State-of-the-art calibration techniques based on the classical Zhang's method, however, still suffer from environmental noise, radial lens distortion, and sub-optimal parameter estimation. Therefore, in this paper, we propose a hybrid camera calibration framework that combines learning-based approaches with traditional methods to handle these bottlenecks. In particular, this framework leverages learning-based approaches to perform efficient distortion correction and robust chessboard corner coordinate encoding. For sub-pixel accuracy of corner detection, a specially designed coordinate decoding algorithm with an embedded outlier rejection mechanism is proposed. To avoid sub-optimal estimation results, we improve the traditional parameter estimation with the RANSAC algorithm and achieve stable results. Compared with two widely used camera calibration toolboxes, experimental results on both real and synthetic datasets demonstrate the better robustness and higher precision of the proposed framework. The massive synthetic dataset is the basis of our framework's decent performance and will be publicly available along with the code at https://github.com/Easonyesheng/CCS.

Translated: 2023-05-02 20:15:46 Published: 2023-04-29

# Improving adversarial robustness by putting more regularizations on less robust samples ( http://arxiv.org/abs/2206.03353v2 )

License: see linked page

Dongyoon Yang, Insung Kong, Yongdai Kim

Adversarial training, which aims to enhance robustness against adversarial attacks, has received much attention because it is easy to generate human-imperceptible perturbations of data to deceive a given deep neural network. In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to other existing algorithms. A novel feature of the proposed algorithm is that it applies more regularization to data vulnerable to adversarial attacks than other existing regularization algorithms do. Theoretically, we show that our algorithm can be understood as an algorithm for minimizing the regularized empirical risk motivated by a newly derived upper bound of the robust risk. Numerical experiments illustrate that our proposed algorithm improves generalization (accuracy on examples) and robustness (accuracy on adversarial attacks) simultaneously, achieving state-of-the-art performance.

Translated: 2023-05-02 20:08:32 Published: 2023-04-29

# Scrambling of Algebras in Open Quantum Systems ( http://arxiv.org/abs/2206.02033v5 )

License: see linked page

Faidon Andreadakis, Namit Anand, Paolo Zanardi

Many quantitative approaches to the dynamical scrambling of information in quantum systems involve the study of out-of-time-ordered correlators (OTOCs). In this paper, we introduce an algebraic OTOC ($\mathcal{A}$-OTOC) that allows us to study information scrambling of generalized quantum subsystems under quantum channels. For closed quantum systems, this algebraic framework was recently employed to unify quantum information-theoretic notions of operator entanglement, coherence-generating power, and Loschmidt echo. The main focus of this work is to provide a natural generalization of these techniques to open quantum systems. We first show that, for unitary dynamics, the $\mathcal{A}$-OTOC quantifies a generalized notion of information scrambling, namely between a subalgebra of observables and its commutant. For open quantum systems, on the other hand, we find a competition between the global environmental decoherence and the local scrambling of information. We illustrate this interplay by analytically studying various examples of algebras and quantum channels. To complement our analytical results, we perform numerical simulations of two paradigmatic systems, the PXP model and the Heisenberg XXX model, under dephasing. Our numerical results reveal connections with many-body scars and the stability of decoherence-free subspaces.

Translated: 2023-05-02 20:07:58 Published: 2023-04-29

# Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders ( http://arxiv.org/abs/2206.09900v6 )

License: see linked page

Chen Min and Xinli Xu and Dawei Zhao and Liang Xiao and Yiming Nie and Bin Dai

(参考訳) 自動運転における現在の認識モデルは、大規模ラベル付きLiDARデータに大きく依存している。本研究では,自動運転において利用可能な大量のラベルなしLiDARデータを用いて,自己指導型マスク学習の研究を促進することを目的とする。しかしながら、既存のマスク付きポイント自動符号化法は、小規模の屋内点雲にのみ焦点をあて、通常、多くの分散されていないLiDAR点を持つ屋外のシーンに適応するのに苦労する。これらの課題に対処するために,大規模屋外LiDARポイントに特化して設計されたOccupancy-MAEという自己教師型マスク学習手法を提案する。本研究では,大規模ライダ点雲の空間占有構造を緩やかに活用し,レンジアウェアなランダムマスキング戦略と占有予測のプリテキストタスクを導入する。 Occupancy-MAEは、LiDARへの距離に基づいて、LiDAR点雲のボクセルをランダムにマスクし、3Dシーン全体のマスクされた占有構造を予測する。この単純な占有予測目的により、Occupancy-MAEは、少量の目に見えるボクセルからマスクされたボクセルを回収するために、高いレベルの意味情報を抽出する。大規模な実験は、複数の下流タスクにおけるOccupancy-MAEの有効性を示す。 3dオブジェクト検出タスクでは、kittiの車検出に必要なラベル付きデータを半分に削減し、waymo上の小さなオブジェクト検出を約2%増加させる。 3Dセマンティックセグメンテーションタスクでは、Occupancy-MAEはnuScenesでトレーニングをスクラッチから約2%のmIOUで上回ります。教師なしのドメイン適応タスクでは、Occupancy-MAEは約0.5\% ~ 1% mAPの性能を改善する。以上の結果から,未ラベルの大規模lidar点雲をマスク付きオートエンコーディングで事前訓練することで,自律運転の3次元知覚能力を向上させることが可能であった。

Current perception models in autonomous driving rely heavily on large-scale labeled LiDAR data, which is costly and time-consuming to annotate. In this work, we aim to facilitate research on self-supervised masked learning using the vast amount of unlabeled LiDAR data available in autonomous driving. However, existing masked point autoencoding methods only focus on small-scale indoor point clouds and struggle to adapt to outdoor scenes, which usually have a large number of non-evenly distributed LiDAR points. To address these challenges, we propose a new self-supervised masked learning method named Occupancy-MAE, specifically designed for large-scale outdoor LiDAR points. We leverage the gradually sparse occupancy structure of large-scale outdoor LiDAR point clouds and introduce a range-aware random masking strategy and a pretext task of occupancy prediction. Occupancy-MAE randomly masks voxels of LiDAR point clouds based on their distance to the LiDAR and predicts the masked occupancy structure of the whole 3D scene. This simple occupancy prediction objective encourages Occupancy-MAE to extract high-level semantic information to recover the masked voxels from only a small number of visible voxels. Extensive experiments demonstrate the effectiveness of Occupancy-MAE across several downstream tasks. For the 3D object detection task, Occupancy-MAE reduces the labeled data required for car detection on KITTI by half and boosts small object detection by around 2% mAP on Waymo. For the 3D semantic segmentation task, Occupancy-MAE outperforms training from scratch by around 2% mIOU on nuScenes. For the unsupervised domain adaptation task, Occupancy-MAE improves the performance by about 0.5% to 1% mAP. Our results show that it is feasible to pre-train on unlabeled large-scale LiDAR point clouds with masked autoencoding to enhance the 3D perception ability of autonomous driving.

翻訳日:2023-05-02 19:56:19 公開日:2023-04-29
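
The range-aware random masking described in the Occupancy-MAE abstract above can be illustrated with a minimal sketch. The abstract only says that voxels are masked at random based on their distance to the LiDAR. The distance bands, the masking ratios, the choice to mask nearby (denser) voxels more heavily, and the names `RANGE_BANDS`, `mask_ratio_for`, and `range_aware_mask` are all assumptions made for illustration, not the authors' implementation.

```python
import random

# Hypothetical distance bands in metres with a masking ratio for each band.
# These values are illustrative assumptions, not taken from the abstract.
RANGE_BANDS = [(0.0, 30.0, 0.9), (30.0, 50.0, 0.7), (50.0, float("inf"), 0.5)]


def mask_ratio_for(distance):
    """Return the masking ratio of the band that contains `distance`."""
    for low, high, ratio in RANGE_BANDS:
        if low <= distance < high:
            return ratio
    raise ValueError("distance must be non-negative")


def range_aware_mask(voxel_centers, seed=0):
    """Split occupied voxels into (visible, masked) lists.

    Each voxel is masked independently with a probability that depends on
    its horizontal distance to the sensor, assumed to sit at the origin.
    """
    rng = random.Random(seed)
    visible, masked = [], []
    for x, y, z in voxel_centers:
        distance = (x * x + y * y) ** 0.5
        if rng.random() < mask_ratio_for(distance):
            masked.append((x, y, z))
        else:
            visible.append((x, y, z))
    return visible, masked
```

In a pre-training loop of this kind, an encoder would see only the `visible` voxels, and the occupancy prediction target would cover the whole scene, including the `masked` voxels.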

# 参照限定合成ゼロショット学習

Reference-Limited Compositional Zero-Shot Learning ( http://arxiv.org/abs/2208.10046v2 )

ライセンス: Link先を確認

Siteng Huang, Qiyao Wei, Donglin Wang

(参考訳) compositional zero-shot learning (czsl)とは、人工知能システムが世界を学習し理解するための必須の能力である、既知の視覚プリミティブの未熟な構成を認識することを指す。既存のベンチマークではかなりの進歩があったが、一般的なCZSL手法は、実世界の見えない環境での学習において一般的である、少数ショットと少数参照合成の課題に対処できるかどうかを疑っている。そこで本研究では,数個のサンプルのみを含む限定的構成を基準として,観察されたプリミティブの見当たらない構成を同定する,難解な参照限定合成ゼロショット学習(rl-czsl)問題について検討する。本稿では,不十分な参照情報から効率的に構成性を学習し,未知の合成に一般化できるメタ合成グラフ学習器(metacgl)を提案する。さらに、多様な合成ラベルを持つ自然画像からなる2つの新しい大規模データセットでベンチマークを構築し、rl-czslのより現実的な環境を提供します。評価実験の結果,提案手法は,参照が作曲学習に限られている場合の未知の合成を認識できる。

Compositional zero-shot learning (CZSL) refers to recognizing unseen compositions of known visual primitives, which is an essential ability for artificial intelligence systems to learn and understand the world. While considerable progress has been made on existing benchmarks, we question whether popular CZSL methods can address the challenges of few-shot and few referential compositions, which are common when learning in real-world unseen environments. To this end, we study the challenging reference-limited compositional zero-shot learning (RL-CZSL) problem in this paper, i.e., given limited seen compositions that contain only a few samples as reference, unseen compositions of observed primitives should be identified. We propose a novel Meta Compositional Graph Learner (MetaCGL) that can efficiently learn the compositionality from insufficient referential information and generalize to unseen compositions. Besides, we build a benchmark with two new large-scale datasets that consist of natural images with diverse compositional labels, providing more realistic environments for RL-CZSL. Extensive experiments on the benchmarks show that our method achieves state-of-the-art performance in recognizing unseen compositions when reference is limited for compositional learning.

翻訳日:2023-05-02 19:49:53 公開日:2023-04-29

# コンフォーマルリスク制御

Conformal Risk Control ( http://arxiv.org/abs/2208.02814v3 )

ライセンス: Link先を確認

Anastasios N. Angelopoulos and Stephen Bates and Adam Fisch and Lihua Lei and Tal Schuster

(参考訳) 我々はコンフォメーション予測を拡張して,任意の単調損失関数の期待値を制御する。このアルゴリズムは、カバレッジ保証とともに分割共形予測を一般化する。共形予測と同様に、共形リスク制御手順は$\mathcal{O}(1/n)$ factorまで厳密である。また, 分散シフト, 量子リスク制御, 複数対逆リスク制御, およびU統計学の期待に対する考え方の拡張についても紹介する。コンピュータビジョンと自然言語処理によるサンプルは、偽陰性率、グラフ距離、トークンレベルのf1-scoreをバインドするアルゴリズムの使用例を示している。

We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversarial risk control, and expectations of U-statistics. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.

翻訳日:2023-05-02 19:48:37 公開日:2023-04-29
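
The calibration step behind the conformal risk control abstract above can be sketched as follows. The abstract does not spell out the rule. The sketch assumes the commonly stated form: choose the smallest threshold $\lambda$ for which $\frac{n}{n+1}\hat{R}_n(\lambda) + \frac{B}{n+1} \le \alpha$, where $\hat{R}_n$ is the empirical risk on $n$ calibration examples and $B$ bounds the loss. The function names, the grid of candidate thresholds, and the toy false-negative loss are hypothetical.

```python
def false_negative_rate(scores, true_labels, lam):
    """Fraction of true labels missing from the set {k : scores[k] >= 1 - lam}.

    Larger lam gives larger prediction sets, so this loss is non-increasing in lam.
    """
    missed = [k for k in true_labels if scores[k] < 1.0 - lam]
    return len(missed) / len(true_labels)


def calibrate_lambda(loss_table, lambdas, alpha, loss_bound=1.0):
    """Pick the smallest lambda whose inflated empirical risk is at most alpha.

    loss_table[i][j] is the loss of calibration example i at lambdas[j].
    lambdas must be sorted in increasing order and each row of loss_table
    must be non-increasing along j (the monotone-loss assumption).
    """
    n = len(loss_table)
    for j, lam in enumerate(lambdas):
        empirical_risk = sum(row[j] for row in loss_table) / n
        if (n / (n + 1)) * empirical_risk + loss_bound / (n + 1) <= alpha:
            return lam
    # No candidate satisfies the bound: fall back to the most conservative value.
    return lambdas[-1]
```

A typical use would fill `loss_table` by evaluating `false_negative_rate` for every calibration example at every candidate in `lambdas`, then apply the returned threshold to new inputs.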

# 軽量画像超解像のためのクロスレセプティブフォーカス型推論ネットワーク

Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution ( http://arxiv.org/abs/2207.02796v2 )

ライセンス: Link先を確認

Wenjie Li, Juncheng Li, Guangwei Gao, Jiantao Zhou, Jian Yang, and Guo-Jun Qi

(参考訳) 近年,トランスフォーマーを用いた手法は,グローバルな特徴抽出能力により,単一画像超解像(SISR)タスクにおいて顕著な性能を示した。しかし、動的に特徴を抽出するために文脈情報を組み込む必要のあるトランスフォーマーの能力は無視される。そこで本研究では,CNNとTransformerを混合したCTブロックのカスケードで構成される,軽量なクロスレセプティブ・フォーカスド推論ネットワーク(CFIN)を提案する。具体的には、CTブロックにおいて、まずCNNベースのクロススケール情報集約モジュール(CIAM)を提案する。そこで我々は,現在の意味情報を理解し,異なる自己意図内での情報相互作用を利用する変調畳み込みカーネルを用いて,再構成に必要なコンテキスト情報の選択を可能にする,新しいクロスレセプティブフィールドガイドトランス (CFGT) を設計した。大規模実験により,提案したCFINは文脈情報を用いて画像の再構成を効果的に行うことができ,計算コストとモデル性能のバランスが良くなることを示した。ソースコードはhttps://github.com/IVIPLab/CFINで入手できる。

Recently, Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks owing to their ability to extract global features. However, the need for Transformers to incorporate contextual information in order to extract features dynamically has been neglected. To address this issue, we propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer. Specifically, in the CT block, we first propose a CNN-based Cross-Scale Information Aggregation Module (CIAM) to enable the model to better focus on potentially helpful information to improve the efficiency of the Transformer phase. Then, we design a novel Cross-receptive Field Guided Transformer (CFGT) to enable the selection of contextual information required for reconstruction by using a modulated convolutional kernel that understands the current semantic information and exploits the information interaction within different self-attention. Extensive experiments have shown that our proposed CFIN can effectively reconstruct images using contextual information, and it can strike a good balance between computational cost and model performance as an efficient model. Source codes will be available at https://github.com/IVIPLab/CFIN.

翻訳日:2023-05-02 19:47:28 公開日:2023-04-29

# 頂点色制約下での量子インスパイアされた完全マッチング

Quantum-Inspired Perfect Matching under Vertex-Color Constraints ( http://arxiv.org/abs/2209.13063v3 )

ライセンス: Link先を確認

Moshe Y. Vardi and Zhiwei Zhang

(参考訳) 両色エッジを持つグラフに頂点色制約の下で完全マッチングが存在するというグラフ理論問題EXISTS-PMVCを提案する。 EXISTS-PMVCは、量子状態の同定と量子実験設計によるモチベーションと、その豊かな表現性、すなわち、EXISTS-PMVCは、完全マッチングのような重要な制約付きマッチング問題を自然に仮定するため、特に関心がある。我々は,(1)決定ダイアグラム制約(EXISTS-PMVC-DD)と(2)対称性制約(EXISTS-PMVC-Sym)の2種類の頂点色制約の下で,EXISTS-PMVCの複雑性とアルゴリズム的結果を与える。 EXISTS-PMVC-DDでは,グラフガジェット法によりNP硬度を明らかにする。有界な色数(EXISTS-PMVC-Sym-Bunded)を持つEXISTS-PMVC-SymがExact Perfect Matching(XPM)と多項式的に等価であることを証明する。しかし、EXISTS-PMVC-Sym-Boundedを解くためにXPMのアルゴリズムを直接適用することは現実的ではない。我々は, EXISTS-PMVC-Sym-Bounded を複雑に処理するアルゴリズムを提案する。 EXISTS-PMVCの新たな結果は、制約付きマッチングとスケーラブルな量子実験設計の両方に関する洞察を提供する。

We propose and study the graph-theoretical problem EXISTS-PMVC: the existence of perfect matching under vertex-color constraints on graphs with bi-colored edges. EXISTS-PMVC is of special interest because of its motivation from quantum-state identification and quantum-experiment design, as well as its rich expressiveness, i.e., EXISTS-PMVC naturally subsumes important constrained matching problems, such as exact perfect matching. We give complexity and algorithmic results for EXISTS-PMVC under two types of vertex color constraints: (1) decision-diagram constraints (EXISTS-PMVC-DD) and (2) symmetric constraints (EXISTS-PMVC-Sym). For EXISTS-PMVC-DD, we reveal its NP-hardness by a graph-gadget technique. We prove that EXISTS-PMVC-Sym with a bounded number of colors (EXISTS-PMVC-Sym-Bounded) is polynomially equivalent with Exact Perfect Matching (XPM), which implies that EXISTS-PMVC-Sym-Bounded is in RNC on general graphs and PTIME on planar graphs. Directly applying algorithms for XPM to solve EXISTS-PMVC-Sym-Bounded is, however, impractical. We propose algorithms that natively handle EXISTS-PMVC-Sym-Bounded with considerably better complexity. Our novel results for EXISTS-PMVC provide insights into both constrained matching and scalable quantum experiment design.

翻訳日:2023-05-02 19:40:52 公開日:2023-04-29

# dytanvo: 動的環境における視覚オドメトリと運動セグメンテーションの合同改良

DytanVO: Joint Refinement of Visual Odometry and Motion Segmentation in Dynamic Environments ( http://arxiv.org/abs/2209.08430v4 )

ライセンス: Link先を確認

Shihao Shen and Yilin Cai and Wenshan Wang and Sebastian Scherer

(参考訳) 学習ベースビジュアル・オドメトリー(VO)アルゴリズムは、高容量モデルと大量の注釈付きデータの恩恵を受けながら、動的で人口密度の高い環境では失敗する傾向がある。セマンティクスセグメンテーションは、カメラの動きを推定する前にダイナミックな関連を破棄するために主に使用されるが、静的な特徴を破棄するコストがかかるため、未認識のカテゴリにスケールアップするのは難しい。本稿では,カメラエゴモーションとモーションセグメンテーションの相互依存性を活用し,単一学習ベースで協調的に両者を洗練できることを示す。特に,動的環境を扱う最初の教師付き学習ベースVO法であるDytanVOを提案する。 2つの連続した単眼フレームをリアルタイムで取得し、反復的にカメラのエゴモーションを予測する。本手法は,現実の動的環境における最先端VOソリューションよりも平均27.7%向上し,バックエンド上での軌跡を最適化する動的視覚SLAMシステムと競合する性能を実現している。また,本手法の一般化可能性を示す実験も行った。

Learning-based visual odometry (VO) algorithms achieve remarkable performance on common static scenes, benefiting from high-capacity models and massive annotated data, but tend to fail in dynamic, populated environments. Semantic segmentation is largely used to discard dynamic associations before estimating camera motions, but this comes at the cost of discarding static features and is hard to scale up to unseen categories. In this paper, we leverage the mutual dependence between camera ego-motion and motion segmentation and show that both can be jointly refined in a single learning-based framework. In particular, we present DytanVO, the first supervised learning-based VO method that deals with dynamic environments. It takes two consecutive monocular frames in real time and predicts camera ego-motion in an iterative fashion. Our method achieves an average improvement of 27.7% in ATE over state-of-the-art VO solutions in real-world dynamic environments, and even performs competitively among dynamic visual SLAM systems which optimize the trajectory on the backend. Experiments on plentiful unseen environments also demonstrate our method's generalizability.

翻訳日:2023-05-02 19:40:12 公開日:2023-04-29

# ノイズレジームにおける高次元簡易学習のためのサンプル複雑境界

Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes ( http://arxiv.org/abs/2209.05953v2 )

ライセンス: Link先を確認

Amir Hossein Saberi, Amir Najafi, Seyed Abolfazl Motahari and Babak H. Khalaj

(参考訳) 本稿では,ノイズのあるサンプルから単体(simplex)を学習するためのサンプル複雑性の上界を求める。大きさ$n$のデータセットが与えられ、$\mathbb{R}^K$内の未知の単体上の一様分布から独立同分布で引き出されたサンプルを含むと仮定し、サンプルは任意の大きさの多変量加法的ガウス雑音によって破損していると仮定する。我々は、高い確率で、真の単体から高々$\varepsilon$の$\ell_2$距離を持つ単体(任意の$\varepsilon>0$)を出力するアルゴリズムの存在を証明した。また、このバウンドを達成するためには、$n\ge\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$個のサンプルがあれば十分であることを理論的に示す。ここで$\mathrm{SNR}$は信号対雑音比を表す。この結果は重要な未解決問題を解決し、$\mathrm{SNR}\ge\Omega\left(K^{1/2}\right)$ である限り、ノイズのある場合のサンプル複雑性がノイズのない場合と同じオーダーになることを示す。我々の証明は、 \citep{ashtiani2018nearly} におけるいわゆるサンプル圧縮技術、高次元幾何学の数学的ツール、フーリエ解析の組み合わせである。特に,加法的ガウス雑音からより一般的な分布族を復元するための一般的なフーリエに基づく手法を提案しており,これは他の様々な関連問題にも適用することができる。

In this paper, we find a sample complexity bound for learning a simplex from noisy samples. Assume a dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown simplex in $\mathbb{R}^K$, where samples are assumed to be corrupted by a multi-variate additive Gaussian noise of an arbitrary magnitude. We prove the existence of an algorithm that with high probability outputs a simplex having an $\ell_2$ distance of at most $\varepsilon$ from the true simplex (for any $\varepsilon>0$). Also, we theoretically show that in order to achieve this bound, it is sufficient to have $n\ge\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio. This result solves an important open problem and shows that, as long as $\mathrm{SNR}\ge\Omega\left(K^{1/2}\right)$, the sample complexity of the noisy regime has the same order as that of the noiseless case. Our proofs are a combination of the so-called sample compression technique in \citep{ashtiani2018nearly}, mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we have proposed a general Fourier-based technique for recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems.

翻訳日:2023-05-02 19:39:04 公開日:2023-04-29

# 運用経済改善に向けた二段階的MIPベース予測最適化フレームワーク

Towards Improving Operation Economics: A Bilevel MIP-Based Closed-Loop Predict-and-Optimize Framework for Prescribing Unit Commitment ( http://arxiv.org/abs/2208.13065v2 )

ライセンス: Link先を確認

Xianbang Chen, Yikui Liu, Lei Wu

(参考訳) システムオペレータは、一般に、再生可能エネルギー源(RES)の可用性とシステム予備要件を最初に予測し、予測により、ユニットコミットメント(UC)などの最適化モデルを解き、それに応じて経済的な運用計画を決定する。しかし、そのようなオープンループプロセスは、その予測器が究極の演算コストではなく即時統計的予測誤差を改善するために、本質的に運用経済学を損なう可能性がある。そこで,本稿では,演算経済学を改善するための規範的な uc を提供するクローズドループ予測最適化フレームワークを提案する。まず, 2レベル混合整数プログラミングモデルを用いて, 最適システム動作に適したコスト指向予測器を訓練する: 上位レベルは, 誘導運転コストに基づいて RES および予備予測器を訓練する; 下位レベルは, 与えられた予測によりシステム動作プロセスを模倣し, 誘導運転コストを上位レベルに戻す。さらに、トレーニングされた予測器の組込み可能性により、規範的なUCモデルが与えられ、RES保存予測とUC決定を同時に行なえる。最後に、実世界のデータを用いた数値ケーススタディでは、決定論的、堅牢で確率的なUCモデルよりも、規範的UCの経済的および実践的な利点が示される。

Generally, system operators conduct the economic operation of power systems in an open-loop predict-then-optimize process: the renewable energy source (RES) availability and system reserve requirements are first predicted; given the predictions, system operators solve optimization models such as unit commitment (UC) to determine the economical operation plans accordingly. However, such an open-loop process could essentially compromise the operation economics because its predictors myopically seek to improve the immediate statistical prediction errors instead of the ultimate operation cost. To this end, this paper presents a closed-loop predict-and-optimize framework, offering a prescriptive UC to improve the operation economics. First, a bilevel mixed-integer programming model is leveraged to train cost-oriented predictors tailored for optimal system operations: the upper level trains the RES and reserve predictors based on their induced operation cost; the lower level, with given predictions, mimics the system operation process and feeds the induced operation cost back to the upper level. Furthermore, the embeddability of the trained predictors grants a prescriptive UC model, which simultaneously provides RES-reserve predictions and UC decisions with enhanced operation economics. Finally, numerical case studies using real-world data illustrate the potential economic and practical advantages of prescriptive UC over deterministic, robust, and stochastic UC models.

翻訳日:2023-05-02 19:37:34 公開日:2023-04-29

# 統一バングラ多クラス感情コーパスのトランスフォーマーによるテキスト分類

Transformer-based Text Classification on Unified Bangla Multi-class Emotion Corpus ( http://arxiv.org/abs/2210.06405v2 )

ライセンス: Link先を確認

Md Sakib Ullah Sourav, Huidong Wang

(参考訳) 様々なWeb 2.0サービスにおける人々の思考を研究することの重要性から、感情分類(EC)は重要な業務である。一方、既存の研究は主に英語に重点を置いており、低リソース言語にはほとんど取り組んでいない。感情分析、特に英語のecは近年多くの注目を集めているが、世界で最も広く話されている言語の1つであるバングラの文脈ではほとんど研究されていない。本研究では,バングラ語テキストから感情を識別し抽出する手法の完全セットを提案する。バングラ語からの6つのクラス(怒り,嫌悪感,恐怖,喜び,悲しみ,驚き)に対して,近年,特に高資源言語において顕著な結果を示すトランスフォーマーベースモデルを用いて感情分類を行う。本モデルの性能評価には,Unified Bangla Multi-class Emotion Corpus (UBMEC) が用いられている。 UBMECは、6-emotionクラスでBanglaコメントをラベル付けした2つのデータセットと、私たちが開発した新しい手動タグ付きBanglaコメントを組み合わせたものだ。この作業で使用したコーパスデータセットとコードは、公開されています。

Because of its importance in studying people's thoughts on various Web 2.0 services, emotion classification (EC) is an important undertaking. Existing research, on the other hand, is mostly focused on the English language, with little work on low-resource languages. Though sentiment analysis, particularly the EC in English, has received a lot of attention in recent years, little research has been done in the context of Bangla, one of the world's most widely spoken languages. We propose a complete set of approaches for identifying and extracting emotions from Bangla texts in this research. We provide a Bangla emotion classifier for six classes (anger, disgust, fear, joy, sadness, and surprise) from Bangla words, using transformer-based models, which have recently shown phenomenal results, especially for high-resource languages. The "Unified Bangla Multi-class Emotion Corpus (UBMEC)" is used to assess the performance of our models. UBMEC was created by combining two previously released manually labeled datasets of Bangla comments on 6-emotion classes with fresh manually tagged Bangla comments created by us. The corpus dataset and code we used in this work are publicly available.

翻訳日:2023-05-02 19:30:12 公開日:2023-04-29

# 対称性平均化を伴う臨界イジングモデルの変分量子シミュレーション

Variational quantum simulation of critical Ising model with symmetry averaging ( http://arxiv.org/abs/2210.15053v2 )

ライセンス: Link先を確認

Troy J. Sewell, Ning Bao, Stephen P. Jordan

(参考訳) 本稿では, ギャップレスシステムの基底状態に対する可変アンサッツとして, DMERA(Deep Multi-scale entanglement Renormalization)回路を用いることを検討した。正解可能な一次元臨界横場イジングモデルをテストベッドとして用いる。この場合、ansatzの数値的正確なシミュレーションは、効率的な古典アルゴリズムを利用してマッチゲート回路をシミュレートすることにより、数百キュービットに実行することができる。このシステムでは、DMERAは標準的なQAOAスタイルのアンサッツを強く上回り、DMERAを用いて近似した相関関数の体系的誤差の主な原因は、逆場イジングモデルの変換対称性とクラマース・ワニエ対称性の破れである。この誤差を対称性平均化によって最大4桁削減できるが、量子ビットや回路の深さに余計なコストがかかることはない。本手法は,他の対称性を持つ物理系のnisqシミュレーションに適用できることを示す。

Here, we investigate the use of deep multi-scale entanglement renormalization (DMERA) circuits as a variational ansatz for ground states of gapless systems. We use the exactly-solvable one-dimensional critical transverse-field Ising model as a testbed. Numerically exact simulation of the ansatz can in this case be carried out to hundreds of qubits by exploiting efficient classical algorithms for simulating matchgate circuits. We find that, for this system, DMERA strongly outperforms a standard QAOA-style ansatz, and that a major source of systematic error in correlation functions approximated using DMERA is the breaking of the translational and Kramers-Wannier symmetries of the transverse-field Ising model. We are able to reduce this error by up to four orders of magnitude by symmetry averaging, without incurring additional cost in qubits or circuit depth. We propose that this technique for mitigating systematic error could be applied to NISQ simulations of physical systems with other symmetries.

翻訳日:2023-05-02 19:21:02 公開日:2023-04-29
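
The symmetry averaging mentioned in the abstract above can be illustrated for the translational symmetry alone. This is a minimal sketch under stated assumptions: the function `translation_average` and the callback `correlator` are hypothetical names, the chain is taken to be periodic, and the Kramers-Wannier symmetry that the abstract also mentions is not covered. It is not the authors' procedure.

```python
def translation_average(correlator, num_sites):
    """Average a two-point function over all lattice translations.

    correlator(i, j) returns an estimate for sites i and j on a ring of
    num_sites sites. The returned list is indexed by the separation r, so
    any dependence on the absolute position i is averaged out.
    """
    averaged = []
    for r in range(num_sites):
        total = sum(correlator(i, (i + r) % num_sites) for i in range(num_sites))
        averaged.append(total / num_sites)
    return averaged
```

If an ansatz breaks translation invariance, its estimate of a correlator depends on where the pair of sites sits on the chain. Averaging over all shifts restores a function of the separation only, at the price of evaluating the correlator at every position.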

# 言語を使って見えないドメインに拡張する

Using Language to Extend to Unseen Domains ( http://arxiv.org/abs/2210.09520v6 )

ライセンス: Link先を確認

Lisa Dunlap, Clara Mohri, Devin Guillory, Han Zhang, Trevor Darrell, Joseph E. Gonzalez, Aditi Raghunathan, Anja Rohrbach

(参考訳) ビジョンモデルがデプロイ時に遭遇する可能性のあるすべてのドメインのトレーニングデータを集めることは、費用がかかる。代わりに、訓練領域(例えば「鳥の写真」)と拡張したいがデータを持たない領域(例えば「鳥の絵」)がいかに堅牢性を向上させるかを考える。共同画像と言語埋め込み空間を備えたマルチモーダルモデルを用いて、LADSは、タスク関連情報を保存しながら、トレーニング領域から各未確認テスト領域への画像埋め込みの変換を学習する。未確認テストドメインからのイメージを一切使用せずに、トレーニングドメインと未確認テストドメインの両方を含む拡張ドメイン上で、LADSは、ドメイン適応とデータセットバイアスをターゲットとする4つのベンチマークのスイートに対して、標準的な微調整とアンサンブルアプローチより優れていることを示す。

It is expensive to collect training data for every possible domain that a vision model may encounter when deployed. We instead consider how simply verbalizing the training domain (e.g. "photos of birds") as well as domains we want to extend to but do not have data for (e.g. "paintings of birds") can improve robustness. Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain, while preserving task relevant information. Without using any images from the unseen test domain, we show that over the extended domain containing both training and unseen test domains, LADS outperforms standard fine-tuning and ensemble approaches over a suite of four benchmarks targeting domain adaptation and dataset bias.

翻訳日:2023-05-02 19:19:33 公開日:2023-04-29

# コンビネータ型機械学習のためのゲーム理論的混合エキスパート

Game Theoretic Mixed Experts for Combinational Adversarial Machine Learning ( http://arxiv.org/abs/2211.14669v2 )

ライセンス: Link先を確認

Ethan Rathbun, Kaleel Mahmood, Sohaib Ahmad, Caiwen Ding, Marten van Dijk

(参考訳) 敵の機械学習の最近の進歩は、堅牢であると考えられる防御は、その弱点を狙うように特別にカスタマイズされた敵の攻撃の影響を受けやすいことを示している。これらの防衛には、BaRT(Barrage of Random Transforms)、FAT(Friendly Adversarial Training)、Trash is Treasure(TiT)、ViT(Vision Transformers)、Big Transfer(Big Transfer)モデル、SNN(Spike Neural Networks)で構成されるアンサンブルモデルが含まれる。まず,一方の防衛をカスタマイズした攻撃によって生じる敵の事例を,他方の防衛で誤分類されることが少なくないことを示す。この発見は2つの重要な疑問をもたらす。まず、ゲーム理論の枠組みにおいて、防御間の低転送性をどのように活用してロバスト性を向上させるのか。第2に、このフレームワーク内の敵は、どのようにして効果的なマルチモデル攻撃を開発できるのか? 本稿では,敵の攻撃と防御をアンサンブルするためのゲーム理論フレームワークを提案する。我々のフレームワークはGame Theoretic Mixed Experts (GaME)と呼ばれる。これは、検知器ベースと標準ディフェンダーの両方の混合ナッシュ戦略を見つけるために設計されており、構成的敵攻撃を用いる攻撃者に直面している。さらに,ランダム化変換による防御,マルチモデル投票方式,敵検出アーキテクチャを対象とした3つの攻撃アルゴリズムを提案する。これらの攻撃は、GaMEフレームワークによって生成された防御を強化し、予期せぬ攻撃に対する堅牢性を検証するのに役立つ。全体として、我々のフレームワークと分析は、構成的攻撃と防御の定式化に新たな洞察を与えることで、敵対的機械学習の分野を前進させます。

Recent advances in adversarial machine learning have shown that defenses considered to be robust are actually susceptible to adversarial attacks which are specifically customized to target their weaknesses. These defenses include Barrage of Random Transforms (BaRT), Friendly Adversarial Training (FAT), Trash is Treasure (TiT) and ensemble models made up of Vision Transformers (ViTs), Big Transfer models and Spiking Neural Networks (SNNs). We first conduct a transferability analysis to demonstrate that the adversarial examples generated by customized attacks on one defense are not often misclassified by another defense. This finding leads to two important questions. First, how can the low transferability between defenses be utilized in a game theoretic framework to improve the robustness? Second, how can an adversary within this framework develop effective multi-model attacks? In this paper, we provide a game-theoretic framework for ensemble adversarial attacks and defenses. Our framework is called Game theoretic Mixed Experts (GaME). It is designed to find the Mixed-Nash strategy for both a detector based and standard defender, when facing an attacker employing compositional adversarial attacks. We further propose three new attack algorithms, specifically designed to target defenses with randomized transformations, multi-model voting schemes, and adversarial detector architectures. These attacks serve to both strengthen defenses generated by the GaME framework and verify their robustness against unforeseen attacks. Overall, our framework and analyses advance the field of adversarial machine learning by yielding new insights into compositional attack and defense formulations.

翻訳日:2023-05-02 19:13:05 公開日:2023-04-29
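
To make the notion of a mixed Nash strategy in the GaME abstract above concrete, here is a minimal sketch for the smallest possible case: a zero-sum game with two defenses and two attacks. The closed-form solution for 2x2 games is standard textbook material. The abstract does not describe the solver used for the full framework, so this is not the authors' algorithm, and the function name `mixed_nash_2x2` and the payoff interpretation are assumptions.

```python
def mixed_nash_2x2(payoff):
    """Mixed equilibrium of a 2x2 zero-sum game without a saddle point.

    payoff[i][j] is the row player's gain (for example, the defender's
    robust accuracy) when the row player picks i and the column player
    picks j. Returns (p, q, value), where p is the probability of row 0
    and q is the probability of column 0.
    """
    (a, b), (c, d) = payoff
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: closed form does not apply")
    # Each player mixes so that the opponent is indifferent between options.
    p = (d - c) / denom
    q = (d - b) / denom
    if not (0 <= p <= 1 and 0 <= q <= 1):
        raise ValueError("game has a pure-strategy saddle point")
    value = (a * d - b * c) / denom
    return p, q, value
```

For the matching-pennies payoff `[[1, -1], [-1, 1]]` the function returns `(0.5, 0.5, 0.0)`: both players randomize uniformly and the game value is zero. Larger games with many defenses and attacks are usually solved by linear programming instead.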

# 局所情報の流れをもつ量子理論

Quantum theories with local information flow ( http://arxiv.org/abs/2211.13325v2 )

ライセンス: Link先を確認

Eduarda Fonseca da Nova Cruz, David M\"ockli

(参考訳) ベル非局所性(bell non-locality)は、量子力学の特定の修正に適用される用語である。しかし、ベルの定理は、修正されていない量子力学自体が非局所的であり、局所実在論的な解釈は維持できないと宣伝するために常用的に用いられる。ベルの元々の不等式に基づいて、局所量子力学、超決定論、非局所崩壊量子力学、非局所隠れ変数理論の4つの可能なカテゴリを同定する。多くの局所的・決定論的記述は見過ごされている。これら3つのカテゴリについて、量子情報の局所的な流れが可能である解釈の例を示す。我々は,現在の実験的提案と改良された科学哲学が,解釈を対比し,両者を区別できるかどうかを評価する。

Bell non-locality is a term that applies to specific modifications of quantum mechanics. Yet, Bell's theorem is habitually used to advertise that unmodified quantum mechanics itself is non-local and that local realist interpretations are untenable. Based on Bell's original inequality, we identify four viable categories of quantum theories: local quantum mechanics, superdeterminism, non-local collapse quantum mechanics, and non-local hidden variable theories. Many local and deterministic descriptions appear to be overlooked. For three of those categories, we present an example of an interpretation where a local flow of quantum information is possible. We assess whether current experimental proposals and an improved philosophy of science can contrast interpretations and distinguish between them.

翻訳日:2023-05-02 19:12:37 公開日:2023-04-29

# グラフニューラルネットワークと構造化状態空間モデルを用いた多変量生体信号のモデリング

Modeling Multivariate Biosignals With Graph Neural Networks and Structured State Space Models ( http://arxiv.org/abs/2211.11176v3 )

ライセンス: Link先を確認

Siyi Tang, Jared A. Dunnmon, Liangqiong Qu, Khaled K. Saab, Tina Baykaner, Christopher Lee-Messer, Daniel L. Rubin

(参考訳) 多変量バイオシグナールは、脳波、ポリソムノグラフィ、心電図など多くの医療領域で広く使われている。多変量生体信号の時空間依存性のモデル化は,(1)長距離時間依存性と(2)電極間の複雑な空間相関により困難である。これらの課題に対処するために,多変量バイオシグナーを時間依存グラフとして表現し,バイオシグナーの時空間依存性をモデル化して生体シグナー分類タスクの性能を向上させる汎用グラフニューラルネットワーク(GNN)アーキテクチャであるGraphS4merを提案する。具体的には,(1)生体信号の長期的時間依存性を捉えるために,最先端のディープシーケンスモデルである構造化状態空間アーキテクチャを利用し,(2)グラフ構造学習層をgraphs4merで提案し,データ内の動的に進化するグラフ構造を学習する。 We evaluate our proposed model on three distinct biosignal classification tasks and show that GraphS4mer consistently improves over existing models, including (1) seizure detection from electroencephalographic signals, outperforming a previous GNN with self-supervised pre-training by 3.1 points in AUROC; (2) sleep staging from polysomnographic signals, a 4.1 points improvement in macro-F1 score compared to existing sleep staging models; and (3) 12-lead electrocardiogram classification, outperforming previous state-of-the-art models by 2.7 points in macro-F1 score.

Multivariate biosignals are prevalent in many medical domains, such as electroencephalography, polysomnography, and electrocardiography. Modeling spatiotemporal dependencies in multivariate biosignals is challenging due to (1) long-range temporal dependencies and (2) complex spatial correlations between the electrodes. To address these challenges, we propose representing multivariate biosignals as time-dependent graphs and introduce GraphS4mer, a general graph neural network (GNN) architecture that improves performance on biosignal classification tasks by modeling spatiotemporal dependencies in biosignals. Specifically, (1) we leverage the Structured State Space architecture, a state-of-the-art deep sequence model, to capture long-range temporal dependencies in biosignals and (2) we propose a graph structure learning layer in GraphS4mer to learn dynamically evolving graph structures in the data. We evaluate our proposed model on three distinct biosignal classification tasks and show that GraphS4mer consistently improves over existing models, including (1) seizure detection from electroencephalographic signals, outperforming a previous GNN with self-supervised pre-training by 3.1 points in AUROC; (2) sleep staging from polysomnographic signals, a 4.1 points improvement in macro-F1 score compared to existing sleep staging models; and (3) 12-lead electrocardiogram classification, outperforming previous state-of-the-art models by 2.7 points in macro-F1 score.

翻訳日:2023-05-02 19:12:26 公開日:2023-04-29

# 文法的誤り訂正 : 美術の現状調査

Grammatical Error Correction: A Survey of the State of the Art ( http://arxiv.org/abs/2211.05166v4 )

ライセンス: Link先を確認

Christopher Bryant, Zheng Yuan, Muhammad Reza Qorib, Hannan Cao, Hwee Tou Ng, Ted Briscoe

(参考訳) 文法的誤り訂正(英: grammatical error correction、gec)は、テキスト中の誤りを自動的に検出し修正する作業である。このタスクには、前置詞の欠如や主語-動詞の一致の誤りなどの文法的誤りの修正だけでなく、スペルミスや単語選択エラーなどの正書法と意味的誤りも含んでいる。この分野は過去10年間に顕著な進歩を遂げており、一部にはルールベースの手法、統計分類器、統計機械翻訳、そして芸術の現在の支配的な状態を表すニューラルネットワーク翻訳システムの開発を推進した5つの共有タスクが動機となっている。本稿では,この分野を一つの記事にまとめ,まず,課題の言語的課題について概説し,研究者が利用可能な最も一般的なデータセット(英語と他言語)を紹介し,特に人工的エラー生成に焦点を当てた様々な手法とテクニックを要約する。次に,評価に対する様々なアプローチについて述べるとともに,特に主観的人間の判断に関して,メートル法信頼性に関する懸念について述べるとともに,最近の進歩と今後の課題への提言の概要をまとめる。この調査が、この分野に新しい研究者や、最近の進歩を評価され続けたい研究者にとって、包括的なリソースになることを期待しています。

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.

翻訳日:2023-05-02 19:12:02 公開日:2023-04-29

# 相関型不確かさによるドメインの一般化

Domain Generalization with Correlated Style Uncertainty ( http://arxiv.org/abs/2212.09950v2 )

ライセンス: Link先を確認

Zheyuan Zhang, Bin Wang, Debesh Jha, Ugur Demir, Ulas Bagci

(参考訳) ドメイン一般化(dg)アプローチは、より堅牢なディープラーニングモデルにつながるドメイン不変機能を抽出することを目的としている。この点において、スタイル拡張は、合成新規ドメインに対する情報的スタイル特性を含むインスタンス固有の特徴統計を利用する強力なDG手法である。しかしながら、スタイル拡張に関する先行研究は、異なる特徴チャネル間の相互依存を無視したり、スタイル拡張を線形補間のみに制限している。本研究では,スタイル統計空間における線形補間の限界を乗り越え,バイタル相関情報を同時に保持する,相関型不確実性(csu)という最先端拡張手法を提案する。本手法の有効性は,pacs,office-home,camlyon17データセット,duke-market1501インスタンス検索タスクなど,多種多様なクロスドメインコンピュータビジョンおよび医用画像分類タスクの広範な実験によって確立される。その結果,既存の最先端技術に比べて著しく改善率が向上した。ソースコードは一般公開されている。

Domain generalization (DG) approaches intend to extract domain invariant features that can lead to a more robust deep learning model. In this regard, style augmentation is a strong DG method taking advantage of instance-specific feature statistics containing informative style characteristics to synthesize novel domains. However, prior works on style augmentation have disregarded the interdependence amongst distinct feature channels or have solely constrained style augmentation to linear interpolation. In this work, we introduce a cutting-edge augmentation approach named Correlated Style Uncertainty (CSU), which surpasses the limitations of linear interpolation in style statistic space and simultaneously preserves vital correlation information. Our method's efficacy is established through extensive experimentation on diverse cross-domain computer vision and medical imaging classification tasks, namely PACS, Office-Home, and Camelyon17 datasets, as well as the Duke-Market1501 instance retrieval task. The results showcase a remarkable improvement margin over existing state-of-the-art techniques. The source code is available for public use.

翻訳日:2023-05-02 19:02:32 公開日:2023-04-29

# 3量子状態における絡み合い、コヒーレンス、ステアリング、ベル非局所不等式違反の相補的関係

Complementary relations of entanglement, coherence, steering and Bell nonlocality inequality violation in three-qubit states ( http://arxiv.org/abs/2212.09326v2 )

ライセンス: Link先を確認

Dong-Dong Dong, Xue-Ke Song, Xiao-Gang Fan, Liu Ye, and Dong Wang

(参考訳) 我々は,任意の3ビット状態に対する絡み合い,コヒーレンス,ステアリング不等式違反,ベル非局所性の相補関係を提唱した。一つのパラメータを持つ真に絡み合った3量子状態の2つの族が存在し、それぞれ一定量の負性に対して最大コヒーレンスおよびステアリング不等式違反を示す。ネガティリティは常に3量子ビット混合状態のネガティリティよりも小さいかまたは等しいが、ネガティリティは3量子ビット純状態の2成分共役の幾何学的平均とちょうど等しいことが判明した。さらに, 3部交絡状態に対する負性度と一階コヒーレンスとの相補関係を確立する。さらに, 負性度と最大操舵不等式違反の関係について検討した。さらに、任意の3ビット状態に対する負性度とベル不等式最大違反の相補関係を得る。この結果は、絡み合い、コヒーレンス、操舵不等式違反、ベル非局所性との間の基本的な関係の信頼できる証拠を提供する。

We put forward complementary relations of entanglement, coherence, steering inequality violation, and Bell nonlocality for arbitrary three-qubit states. We show that two families of genuinely entangled three-qubit pure states with a single parameter exist, and they exhibit maximum coherence and steering inequality violation for a fixed amount of negativity, respectively. It is found that the negativity is exactly equal to the geometric mean of bipartite concurrences for the three-qubit pure states, although the negativity is always less than or equal to the latter for three-qubit mixed states. Moreover, the complementary relation between negativity and first-order coherence for tripartite entanglement states is established. Furthermore, we investigate the close relation between the negativity and the maximum steering inequality violation. In addition, the complementary relation between negativity and the maximum Bell-inequality violation for arbitrary three-qubit states is obtained. The results provide reliable evidence of fundamental connections among entanglement, coherence, steering inequality violation, and Bell nonlocality.

翻訳日:2023-05-02 19:01:56 公開日:2023-04-29

# 氷河氷モデルのベイズ推定への応用による多段階スタイン変分勾配降下のさらなる解析

Further analysis of multilevel Stein variational gradient descent with an application to the Bayesian inference of glacier ice models ( http://arxiv.org/abs/2212.03366v2 )

ライセンス: Link先を確認

Terrence Alsup and Tucker Hartland and Benjamin Peherstorfer and Noemi Petra

(参考訳) 多レベルスタイン変分勾配勾配は、様々なコストと忠実さで代理対象分布の階層性を活用し、推論を計算的に高速化する粒子ベースの変分勾配勾配法である。この作品の貢献は2つある。まず, 単レベルスタイン変分勾配勾配の指数収束速度が反復変動パラメータに依存する場合においても, 従来のコスト複雑性解析の拡張を示す。第2に,アロラ氷河の離散基底すべり係数場を推定する大規模ベイズ逆問題に対して,多値スタイン変分勾配勾配を適用した。数値実験により、マルチレベルバージョンはシングルレベルバージョンに比べて桁違いのスピードアップを達成することが示された。

Multilevel Stein variational gradient descent is a method for particle-based variational inference that leverages hierarchies of surrogate target distributions with varying costs and fidelity to computationally speed up inference. The contribution of this work is twofold. First, an extension of a previous cost complexity analysis is presented that applies even when the exponential convergence rate of single-level Stein variational gradient descent depends on iteration-varying parameters. Second, multilevel Stein variational gradient descent is applied to a large-scale Bayesian inverse problem of inferring discretized basal sliding coefficient fields of the Arolla glacier ice. The numerical experiments demonstrate that the multilevel version achieves orders of magnitude speedups compared to its single-level version.

翻訳日:2023-05-02 19:00:50 公開日:2023-04-29

# 多量子ビット状態の特殊コアテンソルと3線の並行性

Special core tensors of multi-qubit states and the concurrency of three lines ( http://arxiv.org/abs/2301.05953v2 )

ライセンス: Link先を確認

Pak Shen Choong, Hishamuddin Zainuddin, Kar Tim Chan, Sharifah Kartini Said Husain

(参考訳) マルチパーティイト状態の分類は、局所ユニタリ(LU)または確率的局所演算および古典的通信(SLOCC)の作用の下で、運用上有用で有限な絡み合いクラスを得ることを目的としている。本研究では,高階特異値分解 (hosvd) と 3 行の並行性を用いて,これらのクラスを計算的に簡易に求める手法を提案する。 HOSVDは同時に多粒子状態の1体還元密度行列(RDM)を対角化するため、多粒子状態のコアテンソルはそのような対角化1体RDMの純粋状態表現である。 3 と 4 のキュービットの特別なコアテンソルを同定し、これはデフォルトでも真に絡み合っている。特別な核テンソルは、最初の$n$モード特異値である$\sigma_1^{(i)2}$に基づいて、状態の族に分類される。現在の提案はマルチキュービットシステムに限定されているが、大規模なマルチキュービットシステムとよく合致し、有限個の状態族を生成する。

Classification of multipartite states aims to obtain a set of operationally useful and finite entanglement classes under the action of either local unitary (LU) or stochastic local operation and classical communication (SLOCC). In this work, we propose a computationally simple approach to find these classes by using higher-order singular value decomposition (HOSVD) and the concurrency of three lines. Since HOSVD simultaneously diagonalizes the one-body reduced density matrices (RDM) of multipartite states, the core tensor of multipartite states is the pure-state representation of such simultaneously diagonalized one-body RDM. We identified the special core tensors of three and four qubits, which are also genuinely entangled by default. The special core tensors are further categorized into families of states based on their first $n$-mode singular values, $\sigma_1^{(i)2}$. The current proposal is limited to multi-qubit systems, but it scales well with large multi-qubit systems and produces a finite number of families of states.
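
Illustrative note (not taken from the paper): the abstract does not spell out how the concurrency of three lines enters the classification, so the sketch below only shows the textbook test for that geometric condition. Three lines $a_i x + b_i y + c_i = 0$ can share a common point only if the determinant of their coefficient matrix vanishes; when the lines are not all parallel, a vanishing determinant is also sufficient. The function names `det3` and `may_be_concurrent` are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three (a, b, c) rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def may_be_concurrent(lines):
    """Each line is (a, b, c), meaning a*x + b*y + c = 0.

    A zero determinant is necessary for a common intersection point;
    it is also sufficient unless all three lines are parallel.
    """
    return det3(lines) == 0


# x - y = 0, x + y - 2 = 0 and x - 1 = 0 all pass through (1, 1).
through_one_point = [(1, -1, 0), (1, 1, -2), (1, 0, -1)]
# Moving the third line to x - 2 = 0 breaks the common point.
no_common_point = [(1, -1, 0), (1, 1, -2), (1, 0, -2)]
```

Integer coefficients keep the comparison with zero exact; with floating-point coefficients a tolerance would be needed instead.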

翻訳日:2023-05-02 18:54:12 公開日:2023-04-29

# Hungry Hungry Hippos: 状態空間モデルによる言語モデリングを目指して

Hungry Hungry Hippos: Towards Language Modeling with State Space Models ( http://arxiv.org/abs/2212.14052v3 )

ライセンス: Link先を確認

Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré

(参考訳) 状態空間モデル (SSM) は、いくつかのモダリティにおいて最先端のシーケンスモデリング性能を示しているが、言語モデリングではあまり注目されていない。さらに、二乗ではなく列長でほぼ線形にスケーリングしても、ハードウェア使用率の低さから、ssmはトランスフォーマーよりも遅い。本稿では,言語モデリングにおけるssmと注意の間の表現性ギャップの理解と,ssmと注意の間のハードウェア障壁の低減について述べる。まず,SSMと注意のギャップを理解するために,合成言語モデリングタスクを用いる。既存のssmには2つの機能があります。シーケンス内の以前のトークンのリコールと、シーケンス全体のトークンの比較です。言語モデリングへの影響を理解するため,これらの機能に特化して設計された新しいSSM層H3を提案する。 H3は合成言語に注意を向け、OpenWebText上のTransformersの0.4 PPL以内である。さらに、2つの注意層を保持する125mパラメータh3アテンションハイブリッドモデルは、openwebtextのトランスフォーマーを1.0 pplで驚くほど上回っている。次に,最新のハードウェア上でのssmトレーニングの効率を向上させるため,flashconvを提案する。 FlashConvは8Kまでのシーケンスの効率を改善するために融合ブロックFFTアルゴリズムを使用し、SSMの繰り返し特性を利用して長いシーケンスにスケールする新しいステートパスアルゴリズムを導入した。 FlashConvは、長距離アリーナベンチマークで2$\times$スピードアップし、トランスフォーマーよりも2.4$\times$のテキストを生成することができる。 flashconvを使用すると、最大2.7bのパラメータを持つハイブリッドh3-attention言語モデルにスケールし、最初の結果が期待できる。

State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. H3 matches attention on the synthetic languages and comes within 0.4 PPL of Transformers on OpenWebText. Furthermore, a hybrid 125M-parameter H3-attention model that retains two attention layers surprisingly outperforms Transformers on OpenWebText by 1.0 PPL. Next, to improve the efficiency of training SSMs on modern hardware, we propose FlashConv. FlashConv uses a fused block FFT algorithm to improve efficiency on sequences up to 8K, and introduces a novel state passing algorithm that exploits the recurrent properties of SSMs to scale to longer sequences. FlashConv yields 2$\times$ speedup on the long-range arena benchmark and allows hybrid language models to generate text 2.4$\times$ faster than Transformers. Using FlashConv, we scale hybrid H3-attention language models up to 2.7B parameters on the Pile and find promising initial results, achieving lower perplexity than Transformers and outperforming Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark.
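
Illustrative note (not taken from the paper): the abstract relies on two views of a linear SSM, a recurrence and a long convolution, and on carrying state between chunks of a sequence. The sketch below shows both on a scalar toy model. It is neither H3 nor FlashConv; the parameters `a`, `b`, `c` and the function names are hypothetical, and the convolution is computed directly rather than with an FFT.

```python
def ssm_recurrent(u, a, b, c, state=0):
    """Run x_k = a * x_{k-1} + b * u_k and y_k = c * x_k.

    Returns the outputs and the final state, so a later chunk can resume
    from where this one stopped.
    """
    ys = []
    for u_k in u:
        state = a * state + b * u_k
        ys.append(c * state)
    return ys, state


def ssm_convolution(u, a, b, c):
    """Same outputs from a causal convolution with kernel K_j = c * a**j * b."""
    kernel = [c * a**j * b for j in range(len(u))]
    return [
        sum(kernel[j] * u[k - j] for j in range(k + 1))
        for k in range(len(u))
    ]


u = [1, 0, 2, 1]
full, _ = ssm_recurrent(u, a=2, b=1, c=3)

# Process the same input in two chunks, passing the state across the boundary.
first, carried = ssm_recurrent(u[:2], a=2, b=1, c=3)
second, _ = ssm_recurrent(u[2:], a=2, b=1, c=3, state=carried)
```

Here `full`, `ssm_convolution(u, 2, 1, 3)` and `first + second` are the same list, which is the property that lets a convolutional implementation handle fixed-size chunks while the recurrence links them.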

翻訳日:2023-05-02 18:52:11 公開日:2023-04-29

# オープン語彙オブジェクト検出のための検出とセグメントの学習

Learning to Detect and Segment for Open Vocabulary Object Detection ( http://arxiv.org/abs/2212.12130v5 )

ライセンス: Link先を確認

Tao Wang and Nan Li

(参考訳) オープンボキャブラリのオブジェクト検出は,最近開発された視覚言語事前学習モデルによって,意味カテゴリーのみを持つ新規なオブジェクトの認識を支援することで,大きく進歩している。先行研究は、主にオブジェクト提案分類への知識伝達に焦点をあて、クラスに依存しないボックスとマスク予測を採用する。本研究では,オープン語彙設定のためのボックス回帰とマスクセグメンテーションをより一般化する,原理的動的ネットワーク設計であるCondHeadを提案する。中心となる考え方は、セマンティック埋め込みに基づいてネットワークヘッドを条件付きパラメータ化することで、新しいカテゴリをよりよく検出するために、クラス固有の知識でモデルが導かれることである。特に、condheadは、動的に集約されたヘッドと動的に生成されたヘッドの2つのネットワークヘッドからなる。前者は条件付き集約された静的なヘッドでインスタンス化され、これらのヘッドはエキスパートとして最適化され、洗練された予測を学ぶことが期待されている。後者は動的に生成されたパラメータでインスタンス化し、一般的なクラス固有の情報をエンコードする。このような条件付き設計により、検出モデルは意味埋め込みによって橋渡しされ、強い一般化可能なクラスワイズボックスとマスク予測を提供する。提案手法は,最先端のオープンボキャブラリオブジェクト検出手法に非常に小さなオーバーヘッドで大幅な改善をもたらす。例えば,新しいカテゴリのAPを3.0で検出し,計算量はわずか1.1%に留まる。

Open vocabulary object detection has been greatly advanced by the recent development of vision-language pretrained models, which help recognize novel objects with only semantic categories. Prior works mainly focus on transferring knowledge to the object proposal classification and employ class-agnostic box and mask prediction. In this work, we propose CondHead, a principled dynamic network design to better generalize the box regression and mask segmentation for the open vocabulary setting. The core idea is to conditionally parameterize the network heads on semantic embedding, so that the model is guided with class-specific knowledge to better detect novel categories. Specifically, CondHead is composed of two streams of network heads, the dynamically aggregated head and the dynamically generated head. The former is instantiated with a set of static heads that are conditionally aggregated; these heads are optimized as experts and are expected to learn sophisticated prediction. The latter is instantiated with dynamically generated parameters and encodes general class-specific information. With such a conditional design, the detection model is bridged by the semantic embedding to offer strongly generalizable class-wise box and mask prediction. Our method brings significant improvement to the state-of-the-art open vocabulary object detection methods with very minor overhead, e.g., it surpasses a RegionClip model by 3.0 detection AP on novel categories, with only 1.1% more computation.

翻訳日:2023-05-02 18:51:19 公開日:2023-04-29

# 深層学習型アクセシブルパーキング管理システムShine

SHINE: Deep Learning-Based Accessible Parking Management System ( http://arxiv.org/abs/2302.00837v2 )

ライセンス: Link先を確認

Dhiraj Neupane, Aashish Bhattarai, Sunil Aryal, Mohamed Reda Bouadjenek, Uk-Min Seok, and Jongwon Seok

(参考訳) 科学技術の進歩により、現在進行中の都市部の拡大は、韓国を含む世界中の民間所有車両の数が大幅に増加した。しかし、この段階的な車両数の増加は必然的に、障害者専用駐車スペース(以下「アクセス可能な駐車スペース」と呼ぶ)の乱用など、駐車関連の問題を引き起こしている。従来のlprシステムは、監視カメラのフレームレートが高いこと、自然と人工のノイズの存在、これらのシステムによる検出と認識を妨げる照明や気象条件の変化などにより、このような問題をリアルタイムに対処できないことが証明されている。パーキング4.0の概念の高まりにより、多くのセンサー、IoTおよびディープラーニングベースのアプローチが自動LPRとパーキング管理システムに適用された。それにもかかわらず、この研究は韓国でアクセス可能な駐車スペースを管理するための堅牢で効率的なモデルの必要性を示している。これに対処するため,我々は,深層学習に基づく物体検出アルゴリズムを用いて車両,ナンバープレート,障害バッジ(以下,カード,バッジ,アクセスバッジとして参照)を検出し,中央サーバと協調してアクセス可能な駐車スペースの使用権を検証する,shineという新しいシステムを提案する。本モデルは,平均92.16%の精度を実現し,アクセス可能な駐車スペース乱用の問題に対処し,都市環境における効率的な駐車管理に大いに寄与する。

The ongoing expansion of urban areas facilitated by advancements in science and technology has resulted in a considerable increase in the number of privately owned vehicles worldwide, including in South Korea. However, this gradual increment in the number of vehicles has inevitably led to parking-related issues, including the abuse of disabled parking spaces (hereafter referred to as accessible parking spaces) designated for individuals with disabilities. Traditional license plate recognition (LPR) systems have proven inefficient in addressing such a problem in real time due to the high frame rate of surveillance cameras, the presence of natural and artificial noise, and variations in lighting and weather conditions that impede detection and recognition by these systems. With the growing concept of parking 4.0, many sensor-based, IoT-based, and deep learning-based approaches have been applied to automatic LPR and parking management systems. Nonetheless, the studies show a need for a robust and efficient model for managing accessible parking spaces in South Korea. To address this, we have proposed a novel system called SHINE, which uses a deep learning-based object detection algorithm for detecting the vehicle, license plate, and disability badges (referred to as cards, badges, or access badges hereafter) and verifies the rights of the driver to use accessible parking spaces by coordinating with the central server. Our model, which achieves a mean average precision of 92.16%, is expected to address the issue of accessible parking space abuse and contribute significantly towards efficient and effective parking management in urban environments.

翻訳日:2023-05-02 18:44:21 公開日:2023-04-29

# リプシッツ境界深部ネットワークの直接パラメータ化

Direct Parameterization of Lipschitz-Bounded Deep Networks ( http://arxiv.org/abs/2301.11526v2 )

ライセンス: Link先を確認

Ruigang Wang, Ian R. Manchester

(参考訳) 本稿では、リプシッツ境界が保証される深層ニューラルネットワーク(完全接続と畳み込みの両方)の新しいパラメータ化、すなわち摂動に対する感度の制限を導入する。リプシッツ保証は半定値プログラム(SDP)による認証に基づく最も厳密な既知の境界と等価であり、大きなモデルにスケールしない。 sdp のアプローチとは対照的に ``direct'' パラメータ化、すなわち$\mathbb r^n$ からリプシッツ境界ネットワークの重み集合への滑らかな写像を提供する。これにより、計算集約的なプロジェクションや障壁項を使わずに、標準的な勾配法によるトレーニングが可能になる。新しいパラメータ化は、新しい層タイプ( \textit{sandwich layer} )や、近隣層間のパラメータ共有を伴う標準フィードフォワードネットワークの新しいパラメータ化のいずれかと考えることができる。最後に、画像分類に関する総合的な実験により、サンドイッチ層は経験的および証明された堅牢な精度において、以前のアプローチよりも優れていることが示された。

This paper introduces a new parameterization of deep neural networks (both fully-connected and convolutional) with guaranteed Lipschitz bounds, i.e. limited sensitivity to perturbations. The Lipschitz guarantees are equivalent to the tightest-known bounds based on certification via a semidefinite program (SDP), which does not scale to large models. In contrast to the SDP approach, we provide a "direct" parameterization, i.e. a smooth mapping from $\mathbb R^N$ onto the set of weights of Lipschitz-bounded networks. This enables training via standard gradient methods, without any computationally intensive projections or barrier terms. The new parameterization can equivalently be thought of as either a new layer type (the sandwich layer), or a novel parameterization of standard feedforward networks with parameter sharing between neighbouring layers. Finally, the comprehensive set of experiments on image classification shows that sandwich layers outperform previous approaches on both empirical and certified robust accuracy.
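
Illustrative note (not taken from the paper): the property described in the abstract can be written compactly as below. The symbols $f_\theta$ (the network), $\gamma$ (the Lipschitz bound), $\phi$ (the smooth mapping) and $w$ (the free parameter) are notation chosen here, and the Euclidean norm is an assumption.

```latex
\| f_\theta(x) - f_\theta(y) \|_2 \;\le\; \gamma \, \| x - y \|_2
\quad \text{for all inputs } x, y,
\qquad \text{where } \theta = \phi(w), \quad w \in \mathbb{R}^N .
```

Because every unconstrained $w$ is mapped to weights that satisfy the bound, training can update $w$ with ordinary gradient steps and never has to project back onto the feasible set.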

翻訳日:2023-05-02 18:43:42 公開日:2023-04-29

# ユーザビリティギャップの橋渡し--隠れマルコフモデルのスペクトル学習のための理論的および方法論的進歩

Bridging the Usability Gap: Theoretical and Methodological Advances for Spectral Learning of Hidden Markov Models ( http://arxiv.org/abs/2302.07437v2 )

ライセンス: Link先を確認

Xiaoyuan Ma, Jordan Rodu

(参考訳) Baum-Welch (B-W) アルゴリズムは隠れマルコフモデル (HMM) を推論する最も広く受け入れられている手法である。しかし、ローカルの最適化では立ち往生する傾向があり、多くのリアルタイムアプリケーションでは遅すぎる可能性がある。モーメント法(MOM)に基づくHMM(SHMM)のスペクトル学習は,これらの障害を克服するために文献で提案されている。 SHMMに対する漸近的理論は期待されているが, SHMMの長期性能は未確認誤差の伝播により劣化する可能性がある。本稿では, SHMMが推定した推定値の近似誤差の漸近分布について, 2) 誤り伝播の問題を緩和するプロジェクテッドSHMM (PSHMM) と呼ばれる新しいアルゴリズムを提案し, (3) 潜在的な非定常性に対応するSHMMとPSHMMの両方のオンライン学習用変種を開発する。 SHMMの性能をPSHMMと比較し、実世界のアプリケーションからのデータとシミュレーションデータの両方でB-Wアルゴリズムを用いて推定し、PSHMMがSHMMの計算上の優位性を保持するだけでなく、より堅牢な推定と予測を提供することを示した。

The Baum-Welch (B-W) algorithm is the most widely accepted method for inferring hidden Markov models (HMM). However, it is prone to getting stuck in local optima, and can be too slow for many real-time applications. Spectral learning of HMMs (SHMM), based on the method of moments (MOM) has been proposed in the literature to overcome these obstacles. Despite its promises, asymptotic theory for SHMM has been elusive, and the long-run performance of SHMM can degrade due to unchecked propagation of error. In this paper, we (1) provide an asymptotic distribution for the approximate error of the likelihood estimated by SHMM, (2) propose a novel algorithm called projected SHMM (PSHMM) that mitigates the problem of error propagation, and (3) develop online learning variants of both SHMM and PSHMM that accommodate potential nonstationarity. We compare the performance of SHMM with PSHMM and estimation through the B-W algorithm on both simulated data and data from real world applications, and find that PSHMM not only retains the computational advantages of SHMM, but also provides more robust estimation and forecasting.

翻訳日:2023-05-02 18:34:34 公開日:2023-04-29

# 混合量子古典力学におけるラグランジュ軌道と閉包モデル

Lagrangian trajectories and closure models in mixed quantum-classical dynamics ( http://arxiv.org/abs/2303.01975v2 )

ライセンス: Link先を確認

Cesare Tronci, François Gay-Balmaz

(参考訳) 完全量子アプローチの計算課題を克服するために、混合量子古典モデルがいくつかの文脈で提案されている。しかし、平均場近似を超えた現在のモデルは、通常長期にわたる一貫性の問題に悩まされ、場合によってはハイゼンベルクの不確実性原理を無効にする。ここでは量子古典力学の完全ハミルトン理論を提示し、量子密度と古典密度の正則性を超えた一連の一貫性特性を最初に保証したように見える。ラグランジアン位相空間パスに基づいて、モデルはカシミール汎函数の無限類と同様に量子古典的なポアンカーイ積分不変量を持つ。また,エーレンフェスト模型を化学物理学から拡張する軌道閉包スキームを提案する。

Mixed quantum-classical models have been proposed in several contexts to overcome the computational challenges of fully quantum approaches. However, current models beyond mean-field approximations typically suffer from long-standing consistency issues, and, in some cases, invalidate Heisenberg's uncertainty principle. Here, we present a fully Hamiltonian theory of quantum-classical dynamics that appears to be the first to ensure a series of consistency properties, beyond positivity of quantum and classical densities. Based on Lagrangian phase-space paths, the model possesses a quantum-classical Poincaré integral invariant as well as infinite classes of Casimir functionals. We also present a trajectory closure scheme that extends the Ehrenfest model from chemical physics.

翻訳日:2023-05-02 18:26:18 公開日:2023-04-29

# テスト時間適応のための特徴調整と均一性

Feature Alignment and Uniformity for Test Time Adaptation ( http://arxiv.org/abs/2303.10902v2 )

ライセンス: Link先を確認

Shuai Wang, Daoan Zhang, Zipei Yan, Jianguo Zhang, Rui Li

(参考訳) テスト時間適応(TTA)は、分散テストドメインサンプルの受信時にディープニューラルネットワークを適用することを目的としている。この設定では、モデルはオンラインのラベルのないテストサンプルとトレーニングドメインで事前トレーニングされたモデルのみにアクセスできる。まず、ソースドメインとターゲットドメイン間のドメインギャップにより、TTAを機能リビジョン問題として扱う。その後、2つの測定アライメントと均一性に従い,テスト時間特徴の修正について検討した。テスト時間特徴の均一性について,本研究では,現在のバッチと前回のバッチの表現間の均一性の一貫性を保証するための,テスト時間自己蒸留戦略を提案する。テスト時間の特徴的アライメントを実現するため, 周辺サンプル間の表現の整合化を図った空間的局所クラスタリング手法を提案する。一般的なノイズラベル問題に対処するため,エントロピーと一貫性フィルタを提案し,ノイズラベルの選択と削除を行う。本手法のスケーラビリティと有効性を証明するため,種々のバックボーンを用いた4つの領域一般化ベンチマークと4つの医療画像分割タスクの実験を行った。実験の結果,本手法はベースラインを安定的に改善するだけでなく,既存のテスト時間適応法よりも優れていることがわかった。

Test time adaptation (TTA) aims to adapt deep neural networks when receiving out-of-distribution test domain samples. In this setting, the model can only access online unlabeled test samples and pre-trained models on the training domains. We first address TTA as a feature revision problem due to the domain gap between source domains and target domains. After that, we follow the two measurements, alignment and uniformity, to discuss the test time feature revision. For test time feature uniformity, we propose a test time self-distillation strategy to guarantee the consistency of uniformity between representations of the current batch and all the previous batches. For test time feature alignment, we propose a memorized spatial local clustering strategy to align the representations among the neighborhood samples for the upcoming batch. To deal with the common noisy label problem, we propose entropy and consistency filters to select and drop the possible noisy labels. To prove the scalability and efficacy of our method, we conduct experiments on four domain generalization benchmarks and four medical image segmentation tasks with various backbones. Experiment results show that our method not only stably improves the baseline but also outperforms existing state-of-the-art test time adaptation methods.
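
Illustrative note (not taken from the paper): the abstract mentions an entropy filter for dropping possibly noisy pseudo-labels but gives no rule or threshold. The sketch below assumes the simplest reading, keeping only samples whose predictive entropy is at or below a cutoff. The function names and the cutoff value are hypothetical, and the consistency filter is not covered.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def keep_confident(batch_probs, max_entropy):
    """Indices of samples whose prediction entropy is at most max_entropy."""
    return [
        i for i, probs in enumerate(batch_probs)
        if entropy(probs) <= max_entropy
    ]


batch = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.30, 0.30],  # near-uniform prediction, high entropy
    [0.90, 0.05, 0.05],  # fairly confident prediction
]
kept = keep_confident(batch, max_entropy=0.5)
```

With this cutoff the near-uniform second sample is dropped and the other two are kept for adaptation.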

翻訳日:2023-05-02 18:16:59 公開日:2023-04-29

# オーディオ信号処理のためのコンテンツ適応フロントエンド

Content Adaptive Front End For Audio Signal Processing ( http://arxiv.org/abs/2303.10446v2 )

ライセンス: Link先を確認

Prateek Verma and Chris Chafe

(参考訳) 音声信号処理のための学習可能なコンテンツ適応フロントエンドを提案する。ディープラーニングが出現する前は、spectrogramやmel-spectrogramのような固定表現非学習フロントエンドを使用していた。 ASRや音響シーン理解などの様々な応用をサポートする畳み込みアーキテクチャでは、学習可能なフロントエンドへのシフトが発生し、基礎関数の種類と重みの両方がスクラッチから学習され、特定の作業に最適化される。畳み込みブロックのないトランスフォーマーベースのアーキテクチャへの移行により、線形層は小さな波形パッチを小さな潜在次元に投影し、トランスフォーマーアーキテクチャに供給する。本研究では,コンテンツ適応学習可能な時間周波数表現の計算法を提案する。我々は各音声信号を畳み込みフィルタのバンクに通し、それぞれが固定次元ベクトルを与える。有限インパルス応答フィルタバンクのバンクを学習し、入力信号の内容に応じて最適なフィルタバンクを介して入力信号を渡すのと同じである。コンテンツ適応学習可能な時間周波数表現は、本論文の実験以上に広く適用することができる。

We propose a learnable content adaptive front end for audio signal processing. Before the modern advent of deep learning, we used fixed representation non-learnable front ends like spectrogram or mel-spectrogram with/without neural architectures. With convolutional architectures supporting various applications such as ASR and acoustic scene understanding, a shift to learnable front ends occurred, in which both the type of basis functions and the weights were learned from scratch and optimized for the particular task of interest. With the shift to transformer-based architectures with no convolutional blocks present, a linear layer projects small waveform patches onto a small latent dimension before feeding them to a transformer architecture. In this work, we propose a way of computing a content-adaptive learnable time-frequency representation. We pass each audio signal through a bank of convolutional filters, each giving a fixed-dimensional vector. It is akin to learning a bank of finite impulse-response filterbanks and passing the input signal through the optimum filter bank depending on the content of the input signal. A content-adaptive learnable time-frequency representation may be more broadly applicable, beyond the experiments in this paper.

翻訳日:2023-05-02 18:16:39 公開日:2023-04-29

# 大きな言語モデルは意識できるのか?

Could a Large Language Model be Conscious? ( http://arxiv.org/abs/2303.07103v2 )

ライセンス: Link先を確認

David J. Chalmers

(参考訳) 最近、大きな言語モデルが知覚的か意識的であるかという議論が広まっている。このアイデアを真剣に考えるべきか? 私は最強の理由と反対の理由を断ち切る。意識科学における主要な仮定を考えると、現在のモデルでは意識に重大な障害がある:例えば、リカレント処理の欠如、グローバルワークスペース、統合されたエージェンシーなどである。同時に、これらの障害が今後10年ほどで克服される可能性は極めて高い。結論としては、現在の大きな言語モデルが意識されている可能性は少しありそうにないが、大きな言語モデルの後継者が近い将来意識される可能性について真剣に考えるべきであると結論づけます。

There has recently been widespread discussion of whether large language models might be sentient or conscious. Should we take this idea seriously? I will break down the strongest reasons for and against. Given mainstream assumptions in the science of consciousness, there are significant obstacles to consciousness in current models: for example, their lack of recurrent processing, a global workspace, and unified agency. At the same time, it is quite possible that these obstacles will be overcome in the next decade or so. I conclude that while it is somewhat unlikely that current large language models are conscious, we should take seriously the possibility that successors to large language models may be conscious in the not-too-distant future.

翻訳日:2023-05-02 18:16:20 公開日:2023-04-29

# 多値拡散:画像生成のための無限次元スコアベース拡散モデル

Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation ( http://arxiv.org/abs/2303.04772v2 )

ライセンス: Link先を確認

Paul Hagemann, Sophie Mildenberger, Lars Ruthotto, Gabriele Steidl, Nicole Tianjiao Yang

(参考訳) スコアベース拡散モデル(SBDM)は画像生成のための最先端のアプローチとして最近登場した。既存のSBDMは通常有限次元の設定で定式化され、画像は有限サイズのテンソルと見なされる。本稿では, 無限次元のSBDM, すなわち, 矩形領域でサポートされている関数としてトレーニングデータをモデル化する。より高解像度で画像を生成することの探求に加えて、我々の主な動機は、よく考えられた無限次元の学習問題を作成し、複数の解像度レベルで一貫した識別を可能にすることである。これにより,異なる解像度レベルにまたがる拡散モデルが得られ,訓練プロセスの効率が向上することを期待している。無限次元設定におけるsbdmアプローチの2つの欠点を克服する方法を示す。まず, 潜在分布が無限次元設定においてトレースクラス作用素の概念を用いて well-defined であることを保証するために, フォワードプロセスを修正した。第2に,演算子ネットワークを用いたスコア関数の近似化は,fno(fourier neural operator)が多レベルトレーニングに有用であることを示す。有限近似に対する無限次元の設定および逆過程のフォワード過程を導出した後、それらの正当性を示し、適切な離散化を導出し、潜在分布の役割を研究する。 2つのデータセット、MNISTと材料構造について、まず有望な数値結果を提供する。特に、このフレームワークでマルチレベルトレーニングが実現可能であることを示す。

Score-based diffusion models (SBDM) have recently emerged as state-of-the-art approaches for image generation. Existing SBDMs are typically formulated in a finite-dimensional setting, where images are considered as tensors of a finite size. This paper develops SBDMs in the infinite-dimensional setting, that is, we model the training data as functions supported on a rectangular domain. Besides the quest for generating images at ever higher resolution, our primary motivation is to create a well-posed infinite-dimensional learning problem so that we can discretize it consistently on multiple resolution levels. We thereby hope to obtain diffusion models that generalize across different resolution levels and improve the efficiency of the training process. We demonstrate how to overcome two shortcomings of current SBDM approaches in the infinite-dimensional setting. First, we modify the forward process to ensure that the latent distribution is well-defined in the infinite-dimensional setting using the notion of trace class operators. Second, we illustrate that approximating the score function with an operator network, in our case Fourier neural operators (FNOs), is beneficial for multilevel training. After deriving the forward process in the infinite-dimensional setting and reverse processes for finite approximations, we show their well-posedness, derive adequate discretizations, and investigate the role of the latent distributions. We provide first promising numerical results on two datasets, MNIST and material structures. In particular, we show that multilevel training is feasible within this framework.

翻訳日:2023-05-02 18:14:41 公開日:2023-04-29

# 腎移植のための限られた表データによる臨床プロンプトを用いた3次元医用画像

MEDIMP: 3D Medical Images with clinical Prompts from limited tabular data for renal transplantation ( http://arxiv.org/abs/2303.12445v2 )

ライセンス: Link先を確認

Leo Milecki, Vicky Kalogeiton, Sylvain Bodard, Dany Anglicheau, Jean-Michel Correas, Marc-Olivier Timsit, Maria Vakalopoulou

(参考訳) 腎移植は末期腎疾患の最も有効な解決策として出現する。複雑な原因から発生し、慢性的な機能不全のかなりのリスクが持続し、移植片が失われる可能性がある。医療画像は、臨床における腎移植モニタリングにおいて重要な役割を果たす。しかし, 移植管理は, 腎学, 尿学, 放射線学の分野において多分野にまたがっており, このような高次元・複雑な診断データから堅牢なバイオマーカーを同定することは困難である。本研究では,近年の大規模言語モデル(llms)の成功から着想を得て,腎移植におけるダイナミックコントラスト強調mri(dce mri)の有意義なマルチモーダル表現を学習するためのモデルとして,臨床画像からテキストプロンプトへの変換後の構造的臨床生物学データを取り込むことにより,医用画像(クリニカルプロンプトを用いた医用画像)を提案する。 MEDIMPは、この困難なタスクを実行するために、ジョイントテキストイメージのペア埋め込みから対照的な学習に基づいている。さらに,LSMから自動テキストデータ拡張を用いて医療用プロンプトを生成するフレームワークを提案する。本研究の目的は,移植後2年,3年,4年後の患者状態の予後に興味深い腎移植dce mriの有意義な多様体を探索することであり,限られたマルチモーダルデータを最も効率的に活用することである。広範にわたる実験と、限られたデータによる他の腎移植表現学習法との比較は、関連する臨床環境におけるmedimpの有効性を証明し、医学的プロンプトへの新しい方向性を与える。私たちのコードはhttps://github.com/leomlck/MEDIMPで利用可能です。

Renal transplantation emerges as the most effective solution for end-stage renal disease. Occurring from complex causes, a substantial risk of transplant chronic dysfunction persists and may lead to graft loss. Medical imaging plays a substantial role in renal transplant monitoring in clinical practice. However, graft supervision is multi-disciplinary, notably joining nephrology, urology, and radiology, while identifying robust biomarkers from such high-dimensional and complex data for prognosis is challenging. In this work, taking inspiration from the recent success of Large Language Models (LLMs), we propose MEDIMP -- Medical Images with clinical Prompts -- a model to learn meaningful multi-modal representations of renal transplant Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE MRI) by incorporating structural clinicobiological data after translating them into text prompts. MEDIMP is based on contrastive learning from joint text-image paired embeddings to perform this challenging task. Moreover, we propose a framework that generates medical prompts using automatic textual data augmentations from LLMs. Our goal is to learn meaningful manifolds of renal transplant DCE MRI, interesting for the prognosis of the transplant or patient status (2, 3, and 4 years after the transplant), fully exploiting the limited available multi-modal data most efficiently. Extensive experiments and comparisons with other renal transplant representation learning methods with limited data prove the effectiveness of MEDIMP in a relevant clinical setting, giving new directions toward medical prompts. Our code is available at https://github.com/leomlck/MEDIMP.

翻訳日:2023-05-02 18:04:10 公開日:2023-04-29

# 機械学習モデルに専門家の判断を組み込む

Incorporating Experts' Judgment into Machine Learning Models ( http://arxiv.org/abs/2304.11870v2 )

ライセンス: Link先を確認

Hogun Park and Aly Megahed and Peifeng Yin and Yuya Ong and Pravar Mahajan and Pei Guo

(参考訳) 機械学習(ML)モデルは、多くのアプリケーションで結果を予測することに成功している。しかし、場合によっては、ドメインの専門家はMLモデルの予測と矛盾する可能性のある期待された結果について判断するかもしれない。この主な理由は、トレーニングデータが完全に人口を表すものではないかもしれないためである。本稿では,専門家の判断を活かして紛争を緩和することを目的とした新しい枠組みを提案する。私たちのフレームワークの背後にある基本的な考え方は、トレーニングデータ内のラベルのないデータポイントの表現度を、生成的な敵ネットワークを用いて最初に決定することです。そして,そのような度合いに基づいて,上記の表現度が高いほど,補正された出力に付加する専門家の直感に重みが小さいほど,その逆であるとする専門家の判断を組み込むことで,「textcolor{black}{machine learning}」モデルの予測を補正する。我々は,合成データと実世界のケーススタディ(ITサービス産業と金融産業のケーススタディ)について,複数の数値実験を行った。その結果,複数の基準法と比較して,予測精度の犠牲を最小限に抑えながら,専門家の判断に非常に近い精度が得られることがわかった。また,予測精度と専門家の判断の近接性を組み合わせた新しい評価指標を開発した。我々のフレームワークは、そのメトリックで評価すると統計的に有意な結果をもたらす。

Machine learning (ML) models have been quite successful in predicting outcomes in many applications. However, in some cases, domain experts might have a judgment about the expected outcome that might conflict with the prediction of ML models. One main reason for this is that the training data might not be totally representative of the population. In this paper, we present a novel framework that aims at leveraging experts' judgment to mitigate the conflict. The underlying idea behind our framework is that we first determine, using a generative adversarial network, the degree of representation of an unlabeled data point in the training data. Then, based on such degree, we correct the machine learning model's prediction by incorporating the experts' judgment into it, where the higher that aforementioned degree of representation, the less the weight we put on the expert intuition that we add to our corrected output, and vice-versa. We perform multiple numerical experiments on synthetic data as well as two real-world case studies (one from the IT services industry and the other from the financial industry). All results show the effectiveness of our framework; it yields much higher closeness to the experts' judgment with minimal sacrifice in the prediction accuracy, when compared to multiple baseline methods. We also develop a new evaluation metric that combines prediction accuracy with the closeness to experts' judgment. Our framework yields statistically significant results when evaluated on that metric.
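
Illustrative note (not taken from the paper): the abstract states only the direction of the correction, namely that a higher degree of representation means less weight on the expert's judgment. The sketch below assumes the simplest rule with that behaviour, a convex combination whose expert weight is one minus the representation degree. The function name, the linear form and the [0, 1] scaling are all assumptions; in the paper the degree comes from a generative adversarial network, which is not modelled here.

```python
def corrected_prediction(model_pred, expert_pred, representation):
    """Blend a model prediction with an expert judgment.

    representation: assumed to lie in [0, 1]; 1 means the point is well
    covered by the training data, 0 means it is not covered at all.
    """
    if not 0.0 <= representation <= 1.0:
        raise ValueError("representation must lie in [0, 1]")
    expert_weight = 1.0 - representation  # assumed linear weighting
    return (1.0 - expert_weight) * model_pred + expert_weight * expert_pred


well_covered = corrected_prediction(10.0, 20.0, representation=1.0)
half_covered = corrected_prediction(10.0, 20.0, representation=0.5)
not_covered = corrected_prediction(10.0, 20.0, representation=0.0)
```

A well-represented point keeps the model's output unchanged, while a point the training data does not cover falls back entirely on the expert.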

翻訳日:2023-05-02 17:57:56 公開日:2023-04-29

# 倫理的・哲学的原則による信頼できる医療人工知能の確保

Ensuring Trustworthy Medical Artificial Intelligence through Ethical and Philosophical Principles ( http://arxiv.org/abs/2304.11530v3 )

ライセンス: Link先を確認

Debesh Jha, Ashish Rauniyar, Abhiskek Srivastava, Desta Haileselassie Hagos, Nikhil Kumar Tomar, Vanshali Sharma, Elif Keles, Zheyuan Zhang, Ugur Demir, Ahmet Topcu, Anis Yazidi, Jan Erik Håakegård, and Ulas Bagci

(参考訳) 人工知能(AI)手法は、医療専門家や患者の経験を高めることで、多くの医療に革命をもたらす可能性がある。 aiベースのコンピュータ支援診断ツールは、臨床専門家のレベルに匹敵する能力や性能を発揮できれば、非常に有益である。その結果、先進的な医療サービスは発展途上国では手頃な価格で提供でき、専門医の欠如の問題にも対処できる。 AIベースのツールは、患者の治療の時間、リソース、全体的なコストを節約できる。さらに、人間とは対照的に、AIは大量の入力からデータの複雑な関係を明らかにし、医学における新たなエビデンスベースの知識へと導くことができる。しかし、医療におけるAIの統合は、バイアス、透明性、自律性、責任、説明責任など、いくつかの倫理的および哲学的な懸念を提起する。本稿では、AIを用いた医療画像分析の最近の進歩、既存の標準、および臨床現場におけるAIの応用のための倫理的問題やベストプラクティスを理解することの重要性を強調する。我々は、AIの技術的および倫理的課題と、病院や公共機関にAIを配置することの意味について取り上げる。また、倫理的課題、データ不足、人種的バイアス、透明性の欠如、アルゴリズム的バイアスに対処するための重要な手段と手法についても論じる。最後に、私たちは、医療アプリケーションにおけるAIに関連する倫理的課題に対処するための推奨事項と今後の方向性を提供し、このワークフローをより効率的に、正確で、アクセス可能で、透明で、世界中の患者に信頼できるものにするために、AIを臨床環境にデプロイすることを目的としています。

Artificial intelligence (AI) methods have great potential to revolutionize many areas of medical care by enhancing the experience of medical experts and patients. AI-based computer-assisted diagnosis tools can have a tremendous benefit if they can outperform or perform at a level similar to a clinical expert. As a result, advanced healthcare services can be affordable in developing nations, and the problem of a lack of expert medical practitioners can be addressed. AI-based tools can save time, resources, and overall cost for patient treatment. Furthermore, in contrast to humans, AI can uncover complex relations in the data from a large set of inputs and even lead to new evidence-based knowledge in medicine. However, integrating AI in healthcare raises several ethical and philosophical concerns, such as bias, transparency, autonomy, responsibility and accountability, which must be addressed before integrating such tools into clinical settings. In this article, we emphasize recent advances in AI-assisted medical image analysis, existing standards, and the significance of comprehending ethical issues and best practices for the applications of AI in clinical settings. We cover the technical and ethical challenges of AI and the implications of deploying AI in hospitals and public organizations. We also discuss promising key measures and techniques to address the ethical challenges, data scarcity, racial bias, lack of transparency, and algorithmic bias. Finally, we provide our recommendation and future directions for addressing the ethical challenges associated with AI in healthcare applications, with the goal of deploying AI into clinical settings to make the workflow more efficient, accurate, accessible, transparent, and reliable for patients worldwide.

翻訳日:2023-05-02 17:57:34 公開日:2023-04-29

# 再調査なしの研究: 最大更新パラメトリゼーションはスケールにわたって正確な損失予測をもたらす

Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ( http://arxiv.org/abs/2304.06875v2 )

ライセンス: Link先を確認

Yiqun Yao and Yequan Wang

(参考訳) 言語モデルが拡大するにつれて、小さなモデルの結論が容易に大きなモデルに移行しないため、研究アイデアの検証がますます高価になる。考えられる解決策は、小さなモデルの結果とハイパーパラメータのみに基づいて、大規模モデルのメトリクスを直接予測する汎用システムを確立することである。スケーリングの法則に基づく既存の手法では,最大モデルのハイパーパラメータ探索が必要となる。我々は,最大更新パラメトリゼーション(muP)により,共通損失盆地近傍のハイパーパラメータのスケーリング法則を,探索なしで正確に適合させることができることを示す発見を提示することによって,この問題に対処する。これにより、トレーニング開始前であっても、複数のモデルを直接比較して損失予測を行うことができる。重計算を伴わないモデルスケールの信頼性の高い学術研究への第一歩として,新しいパラダイムを提案する。コードは近々公開される予定だ。

As language models scale up, it becomes increasingly expensive to verify research ideas because conclusions on small models do not trivially transfer to large ones. A possible solution is to establish a generic system that directly predicts some metrics for large models solely based on the results and hyperparameters from small models. Existing methods based on scaling laws require hyperparameter search on the largest models, which is impractical with limited resources. We address this issue by presenting our discoveries indicating that Maximal Update parametrization (muP) enables accurate fitting of scaling laws for hyperparameters close to common loss basins, without any search. Thus, different models can be directly compared on large scales with loss prediction even before the training starts. We propose a new paradigm as a first step towards reliable academic research for any model scale without heavy computation. Code will be publicly available shortly.

翻訳日:2023-05-02 17:55:28 公開日:2023-04-29
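
The abstract above describes fitting scaling laws on small models and extrapolating the loss of larger ones. The following is a minimal sketch of that idea only, assuming a pure power law L(N) = a * N^(-b) with no irreducible-loss term. The functional form, the helper names `fit_power_law` and `predict_loss`, and all numbers are illustrative assumptions, not the paper's method or data (the losses are synthetic, generated from known constants).

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log L = log a - b * log N; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, b, size):
    """Evaluate the fitted power law at a (possibly much larger) model size."""
    return a * size ** (-b)


# Synthetic "small model" results generated from assumed constants.
TRUE_A, TRUE_B = 12.0, 0.08
small_sizes = [1e6, 3e6, 1e7, 3e7, 1e8]
small_losses = [TRUE_A * n ** (-TRUE_B) for n in small_sizes]

a_hat, b_hat = fit_power_law(small_sizes, small_losses)
predicted = predict_loss(a_hat, b_hat, 1e9)  # extrapolate to a 1B-parameter model
```

With real training runs the points would be noisy, and the fit is only meaningful if the hyperparameters of the small models sit near the same loss basin as those of the large model, which is the property the abstract attributes to muP.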

# ChartSumm: 長文と短文の自動チャート要約のための総合ベンチマーク

ChartSumm: A Comprehensive Benchmark for Automatic Chart Summarization of Long and Short Summaries ( http://arxiv.org/abs/2304.13620v2 )

ライセンス: Link先を確認

Raian Rahman, Rizvi Hasan, Abdullah Al Farhad, Md Tahmid Rahman Laskar, Md. Hamjajul Ashmafee, Abu Raihan Mostofa Kamal

(参考訳) テキスト要約への自動チャートは、視覚障害者に有効なツールであり、自然言語による表データの正確な洞察をユーザに提供します。大規模で構造化されたデータセットは、データ駆動モデルにとって常に重要な部分です。本稿では,トータル84,363のチャートからなる大規模ベンチマークデータセットであるchartsummを提案する。強力なベースラインモデルによる大規模な実験は、これらのモデルが様々な自動評価指標で十分なスコアを達成して流動的で情報的な要約を生成するにもかかわらず、しばしば幻覚に苦しむこと、重要なデータポイントを欠いていること、チャートの複雑な傾向の誤った説明といった問題に直面していることを示している。また、自動翻訳ツールを用いてChartSummを他の言語に拡張する可能性についても検討した。これらのデータセットは、将来の研究のための挑戦的なベンチマークになります。

Automatic chart-to-text summarization is an effective tool for visually impaired people and also provides users with precise natural-language insights into tabular data. A large and well-structured dataset is always a key part of data-driven models. In this paper, we propose ChartSumm: a large-scale benchmark dataset consisting of a total of 84,363 charts along with their metadata and descriptions covering a wide range of topics and chart types to generate short and long summaries. Extensive experiments with strong baseline models show that even though these models generate fluent and informative summaries and achieve decent scores on various automatic evaluation metrics, they often suffer from hallucination, miss important data points, and explain complex trends in the charts incorrectly. We also investigated the potential of expanding ChartSumm to other languages using automated translation tools. These make our dataset a challenging benchmark for future research.

翻訳日:2023-05-02 17:46:48 公開日:2023-04-29

# ブロックチェーンの大規模言語モデル

Blockchain Large Language Models ( http://arxiv.org/abs/2304.12749v2 )

ライセンス: Link先を確認

Yu Gai, Liyi Zhou, Kaihua Qin, Dawn Song, Arthur Gervais

(参考訳) 本稿では,異常なブロックチェーントランザクションを検出するための動的,リアルタイムなアプローチを提案する。提案ツールであるblockgptは、ブロックチェーンアクティビティのトレース表現を生成し、大規模な言語モデルをスクラッチからトレーニングすることで、リアルタイム侵入検出システムとして機能する。従来の方法とは異なり、blockgptは制限のない検索空間を提供し、事前定義されたルールやパターンに依存しないように設計されている。本稿では,Ethereumトランザクションの異常検出ツールとしてBlockGPTの有効性を示す。実験では,68万トランザクションのデータセット間の異常なトランザクションを効果的に識別し,バッチ処理のスループットは平均で2284トランザクションである。以上の結果から,BlockGPTは,被害者の契約に係わる最も異常な取引のうち,124件中49件をランク付けし,異常な取引を識別した。この研究は、トランスフォーマーアーキテクチャと互換性のあるカスタムデータエンコーディング、ドメイン固有のトークン化技術、Ethereum仮想マシン(EVM)トレース表現用に特別に開発されたツリーエンコーディングメソッドを導入することで、ブロックチェーントランザクション分析の分野に貢献する。

This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains a large language model from scratch to act as a real-time Intrusion Detection System. Unlike traditional methods, BlockGPT is designed to offer an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of BlockGPT through its use as an anomaly detection tool for Ethereum transactions. In our experiments, it effectively identifies abnormal transactions among a dataset of 68M transactions and has a batched throughput of 2284 transactions per second on average. Our results show that BlockGPT identifies abnormal transactions by ranking 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work makes contributions to the field of blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.

翻訳日:2023-05-02 17:46:13 公開日:2023-04-29
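
The abstract above reports that 49 of 124 attacks rank among the top-3 most abnormal transactions for their victim contract. The sketch below shows one way such a top-k hit rate could be computed, assuming each transaction carries an anomaly score where higher means more abnormal. The transaction names, scores, and the helpers `attack_rank` and `top_k_hit_rate` are hypothetical and are not BlockGPT outputs or APIs.

```python
def attack_rank(scores, attack_tx):
    """1-based rank of attack_tx when transactions are sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(attack_tx) + 1


def top_k_hit_rate(cases, k=3):
    """Fraction of (scores, attack_tx) cases whose attack ranks within the top k."""
    hits = sum(1 for scores, attack_tx in cases if attack_rank(scores, attack_tx) <= k)
    return hits / len(cases)


# Two toy victim contracts with made-up anomaly scores per transaction.
cases = [
    ({"tx_a": 0.91, "tx_b": 0.15, "tx_c": 0.40, "tx_d": 0.05}, "tx_a"),
    ({"tx_e": 0.20, "tx_f": 0.85, "tx_g": 0.60, "tx_h": 0.70, "tx_i": 0.10}, "tx_i"),
]
toy_rate = top_k_hit_rate(cases, k=3)

# The figure reported in the abstract, expressed as a rate.
reported_rate = 49 / 124
```

Expressed this way, the reported result corresponds to a top-3 hit rate of roughly 39.5%.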

# カーネルリッジ回帰のためのロバスト・ランダム化プレコンディショニング

Robust, randomized preconditioning for kernel ridge regression ( http://arxiv.org/abs/2304.12465v2 )

ライセンス: Link先を確認

Mateo Díaz, Ethan N. Epperly, Zachary Frangella, Joel A. Tropp, and Robert J. Webber

(参考訳) 本稿では,カーネルリッジ回帰(KRR)問題を中～多量のデータポイント(10^4 \leq N \leq 10^7$)で頑健に解くための2つのランダム化プレコンディショニング手法を提案する。最初の方法であるRPCholeskyプレコンディショニングは、カーネル行列固有値の十分速い多項式減衰を仮定して、$O(N^2)$算術演算で全データKRR問題を正確に解くことができる。 2つ目の方法、KRILLプリコンディショニングは、$k \ll N$選択されたデータセンターを$O((N + k^2) k \log k)の演算で制限されたバージョンのKRR問題に対する正確な解決策を提供する。提案手法は,様々なKRR問題を解くとともに,従来のKRRプリコンディショナーの故障モードを克服し,実用化に最適である。

This paper introduces two randomized preconditioning techniques for robustly solving kernel ridge regression (KRR) problems with a medium to large number of data points ($10^4 \leq N \leq 10^7$). The first method, RPCholesky preconditioning, is capable of accurately solving the full-data KRR problem in $O(N^2)$ arithmetic operations, assuming sufficiently rapid polynomial decay of the kernel matrix eigenvalues. The second method, KRILL preconditioning, offers an accurate solution to a restricted version of the KRR problem involving $k \ll N$ selected data centers at a cost of $O((N + k^2) k \log k)$ operations. The proposed methods solve a broad range of KRR problems and overcome the failure modes of previous KRR preconditioners, making them ideal for practical applications.

翻訳日:2023-05-02 17:45:52 公開日:2023-04-29
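
The abstract above names RPCholesky preconditioning but does not spell out the algorithm. The sketch below illustrates only the low-rank step, randomly pivoted Cholesky as it is commonly described: pivots are sampled in proportion to the diagonal of the residual kernel matrix, producing a factor F with A ≈ F Fᵀ. The kernel, the data, the rank k, and the helper names are assumptions for illustration. The preconditioned solver itself (for example, conjugate gradient with F Fᵀ + λI as the preconditioner) is not implemented here.

```python
import math
import random


def rbf_kernel_matrix(points, bandwidth=1.0):
    """Dense Gaussian (RBF) kernel matrix for 1-D points."""
    return [[math.exp(-((p - q) ** 2) / (2.0 * bandwidth ** 2)) for q in points]
            for p in points]


def rp_cholesky(A, k, rng):
    """Rank-k partial Cholesky of PSD matrix A with randomly sampled pivots."""
    n = len(A)
    F = [[0.0] * k for _ in range(n)]
    diag = [A[i][i] for i in range(n)]  # diagonal of the residual A - F F^T
    pivots = []
    for j in range(k):
        if sum(diag) <= 1e-12:
            break  # residual is numerically zero; nothing left to approximate
        # Sample a pivot with probability proportional to the residual diagonal.
        s = rng.choices(range(n), weights=diag)[0]
        # Column s of the residual matrix.
        col = [A[i][s] - sum(F[i][t] * F[s][t] for t in range(j)) for i in range(n)]
        if col[s] <= 1e-14:
            diag[s] = 0.0  # numerically exhausted pivot; skip it
            continue
        scale = math.sqrt(col[s])
        for i in range(n):
            F[i][j] = col[i] / scale
            diag[i] = max(diag[i] - F[i][j] ** 2, 0.0)
        pivots.append(s)
    return F, pivots


def residual_trace(A, F):
    """Trace of A - F F^T, a simple measure of approximation error."""
    return sum(A[i][i] - sum(f * f for f in F[i]) for i in range(len(A)))


rng = random.Random(0)
points = [0.1 * i for i in range(30)]
A = rbf_kernel_matrix(points)
F, pivots = rp_cholesky(A, k=5, rng=rng)
error = residual_trace(A, F)
```

Only the sampled columns of A are read, so a practical implementation can evaluate kernel entries on demand rather than forming the full matrix as this toy version does.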

# Sparse Private LASSO Logistic Regression

Sparse Private LASSO Logistic Regression ( http://arxiv.org/abs/2304.12429v2 )

ライセンス: Link先を確認

Amol Khanna, Fred Lu, Edward Raff, Brian Testa

(参考訳) LASSOの正規化ロジスティック回帰は、特に組み込みの機能選択に有用であり、配置から係数を除去し、疎解を生成することができる。 LASSOロジスティック回帰の異なるプライベートバージョンが開発されているが、一般に密度の高い解が生成され、LASSOペナルティの本質的な有用性が低下する。本稿では,硬零点を維持できる分散ロジスティック回帰のための微分プライベート法を提案する。我々の重要な洞察は、まず非プライベートラッソロジスティック回帰モデルを訓練し、最終モデル選択に使用する非零係数の民営化数を決定することである。提案手法の性能を示すため,合成および実世界のデータセットを用いた実験を行った。

LASSO regularized logistic regression is particularly useful for its built-in feature selection, allowing coefficients to be removed from deployment and producing sparse solutions. Differentially private versions of LASSO logistic regression have been developed, but generally produce dense solutions, reducing the intrinsic utility of the LASSO penalty. In this paper, we present a differentially private method for sparse logistic regression that maintains hard zeros. Our key insight is to first train a non-private LASSO logistic regression model to determine an appropriate privatized number of non-zero coefficients to use in final model selection. To demonstrate our method's performance, we run experiments on synthetic and real-world datasets.

翻訳日:2023-05-02 17:45:32 公開日:2023-04-29
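
The "hard zeros" mentioned in the abstract above come from the L1 penalty. In proximal-gradient solvers, this penalty is applied through the soft-thresholding operator, which sets small coefficients exactly to zero. The sketch below shows only that standard operator on made-up weights. It is not the paper's private method, and no differential-privacy noise is added.

```python
def soft_threshold(w, threshold):
    """Proximal operator of threshold * |w|: shrink toward zero, clip small values to 0."""
    if w > threshold:
        return w - threshold
    if w < -threshold:
        return w + threshold
    return 0.0


# Hypothetical coefficient vector after a gradient step.
weights = [0.8, -0.05, 0.02, -1.3, 0.0]
shrunk = [soft_threshold(w, 0.1) for w in weights]

# Indices of the features that survive, i.e. the sparse support.
support = [i for i, w in enumerate(shrunk) if w != 0.0]
```

If independent noise were added to every coefficient afterwards, the exact zeros would generally be lost, which is consistent with the abstract's remark that existing private variants tend to produce dense solutions.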

# 音声におけるロバストプライバシー保護のための逆表現学習

Adversarial Representation Learning for Robust Privacy Preservation in Audio ( http://arxiv.org/abs/2305.00011v1 )

ライセンス: Link先を確認

Shayan Gharib, Minh Tran, Diep Luong, Konstantinos Drossos, Tuomas Virtanen

(参考訳) 音響イベント検出システムは、監視や環境監視といった様々なアプリケーションで広く使用されており、データは自動的に収集され、処理され、クラウドに送信される。しかし、このプロセスは必然的にユーザーや周囲に関する機密情報を開示し、プライバシー上の懸念を引き起こす可能性がある。本研究では,音声録音の潜在的特徴から音声活動の検出を効果的に防止する,音声録音の表現を学習するための新しい学習手法を提案する。提案手法は,非音声録音と音声分類器では区別できない音声録音の不変な潜在表現を生成するようにモデルを訓練する。私たちの研究の目新しさは最適化アルゴリズムにあり、音声分類器の重みは教師付きで訓練された分類器の重みに定期的に置き換えられる。これにより、対向訓練中に常に音声分類器の識別能力を高め、対向訓練ループの外で訓練された新しい音声分類器を用いても、発話が識別できない潜在表現を生成する動機付けとなる。提案手法は,プライバシ対策が不要なベースラインアプローチと,プライバシ違反がベースラインアプローチに比べて有意に低減する先行的敵訓練手法に対して評価を行う。また,本手法は,本手法では効果的ではないことを示す。

Sound event detection systems are widely used in various applications such as surveillance and environmental monitoring where data is automatically collected, processed, and sent to a cloud for sound recognition. However, this process may inadvertently reveal sensitive information about users or their surroundings, hence raising privacy concerns. In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings. The proposed method trains a model to generate invariant latent representations of speech-containing audio recordings that cannot be distinguished from non-speech recordings by a speech classifier. The novelty of our work is in the optimization algorithm, where the speech classifier's weights are regularly replaced with the weights of classifiers trained in a supervised manner. This increases the discrimination power of the speech classifier constantly during the adversarial training, motivating the model to generate latent representations in which speech is not distinguishable, even using new speech classifiers trained outside the adversarial training loop. The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method, demonstrating a significant reduction in privacy violations compared to the baseline approach. Additionally, we show that the prior adversarial method is practically ineffective for this purpose.

翻訳日:2023-05-02 17:36:42 公開日:2023-04-29

# 欠陥量子ビットアレイ上の適応型表面コードの経験的オーバーヘッド

Empirical overhead of the adapted surface code on defective qubit arrays ( http://arxiv.org/abs/2305.00138v1 )

ライセンス: Link先を確認

Sophia Fuhui Lin, Joshua Viszlai, Kaitlin N. Smith, Gokul Subramanian Ravi, Charles Yuan, Frederic T. Chong, Benjamin J. Brown

(参考訳) 固体ハードウェアを用いたフォールトトレラント量子コンピュータの実現には、常に発生する製造のばらつきや欠陥を考慮した量子エラー訂正手順を適用する必要があります。非アドレスの場合、これらのエラーは、量子情報が十分に小さな障害率で処理できるように、システムをスケールすることを妨げます。我々は、任意に分散した欠陥を持つキュービットアレイに適応した表面コードをシミュレートし、欠陥が忠実性に与える影響を特徴づける指標を見つける。次に、フォールトトレラントな量子コンピュータを実現する際のリソースオーバーヘッドに対する欠陥の影響をチップレットベースのモジュラーアーキテクチャで決定する。回路ベースノイズモデルにおいて,非フーティ物理量子ビットの誤差レートが$\sim 0.1\%$であるような論理的故障の指数関数的抑制を示す。これは、欠陥のないsurfaceコードを実行するような典型的な仕組みです。我々は,欠陥チップレットからデバイスを構築するための選択後基準を確立するために,数値結果を用いた。この基準を用いて,論理キュービット当たりの物理キュービットの平均個数の観点から,資源のオーバーヘッドを評価する。欠陥率と目標忠実度に基づいて最適なチップレットサイズを選択することは、欠陥による追加のエラー修正オーバーヘッドを制限するのに不可欠である。最適なチップレットサイズを選択すると、リソースオーバーヘッドが1\%の欠陥率で、使用する2つの欠陥モデルに対してそれぞれ3Xと6X以下に削減され、幅広い目標性能を実現することができる。また、qubitを無効にするか、エラー訂正コードの一部として保持すべきかを特定するのに役立つカットオフ忠実度値を判定する。

The realization of fault-tolerant quantum computers using solid-state hardware will require us to adapt our quantum error correction procedure to account for fabrication variation and defects that will invariably arise. If unaddressed, these errors inhibit us from scaling our system such that quantum information can be processed with sufficiently small failure rates. We simulate the surface code adapted to qubit arrays with arbitrarily distributed defects to find metrics that characterize how defects affect fidelity. We then determine the impact of defects on the resource overhead of realizing a fault-tolerant quantum computer, on a chiplet-based modular architecture. Our strategy for dealing with fabrication defects demonstrates an exponential suppression of logical failure where error rates of non-faulty physical qubits are $\sim 0.1\%$ in a circuit-based noise model. This is a typical regime where we imagine running the defect-free surface code. We use our numerical results to establish post-selection criteria for building a device from defective chiplets. Using our criteria, we then evaluate the resource overhead in terms of the average number of fabricated physical qubits per logical qubit. We find that an optimal choice of chiplet size, based on the defect rate and target fidelity, is essential to limiting any additional error correction overhead due to defects. When the optimal chiplet size is chosen, at a defect rate of $1\%$ the resource overhead can be reduced to below 3X and 6X respectively for the two defect models we use, for a wide range of target performance. We also determine cutoff fidelity values that help identify whether a qubit should be disabled or kept as part of the error correction code.

翻訳日:2023-05-02 17:02:11 公開日:2023-04-29

# 量子空間結合符号

Quantum Spatially-Coupled Codes ( http://arxiv.org/abs/2305.00137v1 )

ライセンス: Link先を確認

Siyi Yang, Robert Calderbank

(参考訳) 空間結合符号 (SC) は畳み込みLDPC符号のクラスであり、高い性能と低遅延デコーダとの互換性により古典的符号化理論においてよく研究されている。古典的2次元空間結合符号(2D-SC)の量子対としてトーリック符号を記述し、一般化として量子空間結合符号(QSC)を導入する。畳み込み構造を用いて、2D-SC符号のパリティチェック行列を2つの不定値の多項式として表現し、2D-SC符号が安定化符号となるために必要な代数的条件を導出する。この代数的フレームワークは、新しいコードファミリの構築を促進する。本稿では,小記憶が量子ビットの物理的接続を容易にし,局所符号化と低遅延ウィンドウの復号化を可能にした点に注目する。本稿では,2D-SC HGP符号のタンナーグラフにおいて,各成分符号の短周期から生じる短周期を最適化するために,代数的フレームワークを用いる。従来の作業では1/10未満のQLDPC符号に重点を置いていたが、2D-SC HGP符号は少ないメモリ、高いレート(約1/3)、優れた閾値で構築した。

Spatially-coupled (SC) codes are a class of convolutional LDPC codes that have been well investigated in classical coding theory thanks to their high performance and compatibility with low-latency decoders. We describe toric codes as quantum counterparts of classical two-dimensional spatially-coupled (2D-SC) codes, and introduce quantum spatially-coupled (QSC) codes as a generalization. We use the convolutional structure to represent the parity check matrix of a 2D-SC code as a polynomial in two indeterminates, and derive an algebraic condition that is both necessary and sufficient for a 2D-SC code to be a stabilizer code. This algebraic framework facilitates the construction of new code families. While not the focus of this paper, we note that small memory facilitates physical connectivity of qubits, and it enables local encoding and low-latency windowed decoding. In this paper, we use the algebraic framework to optimize short cycles in the Tanner graph of 2D-SC HGP codes that arise from short cycles in either component code. While prior work focuses on QLDPC codes with rate less than 1/10, we construct 2D-SC HGP codes with small memory, higher rates (about 1/3), and superior thresholds.

翻訳日:2023-05-02 17:01:41 公開日:2023-04-29

# ベストサポート環境の提供によるAI開発プロセスの最適化

Optimizing the AI Development Process by Providing the Best Support Environment ( http://arxiv.org/abs/2305.00136v1 )

ライセンス: Link先を確認

Taha Khamis, Hamam Mokayed

(参考訳) 本研究の目的は,AI(Artificial Inelegance)と機械学習(ML)アプリケーションの開発プロセスを調査し,最高のサポート環境を提供することである。 MLの主なステージは、問題理解、データ管理、モデル構築、モデル展開、メンテナンスである。本研究は,機械学習開発の最重要段階であるML開発におけるデータ管理段階とその障害を,エンドモデルの精度がモデルに入力されるデータの種類に依存しているため調査することに焦点を当てる。この段階で見つかった最大の障害は、特にデータが機密である分野において、モデル学習に十分なデータがないことである。このプロジェクトの目的は、データ管理の段階で十分なデータ不足を解決するための、研究者と開発者のためのフレームワークの構築と開発である。このフレームワークは、オリジナルのデータセットから新しいデータを生成するために使用可能な、いくつかのデータ拡張技術を利用して、利用可能なデータ量と品質を増大させることで、MLアプリケーションの全体的なパフォーマンスを向上させることができる。このフレームワークはpython言語を使用して構築され、ディープラーニングの進歩を使ってデータ拡張を行う。

The purpose of this study is to investigate the development process for artificial intelligence (AI) and machine learning (ML) applications in order to provide the best support environment. The main stages of ML are problem understanding, data management, model building, model deployment and maintenance. This project focuses on investigating the data management stage of ML development and its obstacles, as it is the most important stage of machine learning development: the accuracy of the end model relies on the kind of data fed into the model. The biggest obstacle found at this stage was the lack of sufficient data for model learning, especially in fields where data is confidential. This project aimed to build and develop a framework for researchers and developers that can help solve the lack of sufficient data during the data management stage. The framework utilizes several data augmentation techniques that can be used to generate new data from the original dataset, which can improve the overall performance of ML applications by increasing the quantity and quality of available data to feed the model with the best possible data. The framework was built in Python and performs data augmentation using deep learning advancements.

翻訳日:2023-05-02 17:01:20 公開日:2023-04-29

# 関節センシング、コミュニケーション、ai : 回復力のあるthzユーザエクスペリエンスのためのtrifecta

Joint Sensing, Communication, and AI: A Trifecta for Resilient THz User Experiences ( http://arxiv.org/abs/2305.00135v1 )

ライセンス: Link先を確認

Christina Chaccour, Walid Saad, Merouane Debbah, and H. Vincent Poor

(参考訳) 本稿では,テラヘルツ(THz)無線システムに対する拡張現実(XR)体験を最適化するために,新しい共同センシング,通信,人工知能(AI)フレームワークを提案する。提案フレームワークは3つの主要コンポーネントで構成されている。まず、THzチャネルの間隔を利用して、XRユーザとその環境に対するユニークな検知パラメータを抽出するテンソル分解フレームワークを提案する。本質的には、thzバンドの準光学性を活用し、アップリンク通信信号からセンシングパラメータを抽出することにより、通信機能とセンシング機能の両方に同じ波形、スペクトル、ハードウェアを使用できる。そして、クラーラオ下限が導出され、推定されたセンシングパラメータの精度が評価される。第2に,非自己回帰型多解像度生成人工知能(AI)フレームワークと対向変換器を統合したフレームワークを提案する。提案フレームワークは, ユーザ行動と環境条件の両方において, ゆらぎに一般化可能な, 堅牢かつ包括的な歴史的センシング情報と将来の環境変化予測を提供する。第3に、再構成可能なインテリジェントサーフェス(RIS)サブアレイのハンドオーバポリシを制御し、センサ情報の情報的特性を活用してハンドオーバコストを最小化し、個人的体験(QoPE)の質を最大化し、THzリンクの堅牢性とレジリエンスを向上させるために、マルチエージェント深部学習型Qニューラルネットワークを開発した。シミュレーションの結果,提案する非教師付き生成型aiフレームワークのユーザ行動と速度の変動に対する高い一般化性を示し,既知のチャネル状態情報を持つスキームと比較して,瞬時信頼性が61%向上した。

In this paper, a novel joint sensing, communication, and artificial intelligence (AI) framework is proposed to optimize extended reality (XR) experiences over terahertz (THz) wireless systems. The proposed framework consists of three main components. First, a tensor decomposition framework is proposed to extract unique sensing parameters for XR users and their environment by exploiting the THz channel sparsity. Essentially, the THz band's quasi-opticality is exploited and the sensing parameters are extracted from the uplink communication signal, thereby allowing for the use of the same waveform, spectrum, and hardware for both communication and sensing functionalities. Then, the Cramer-Rao lower bound is derived to assess the accuracy of the estimated sensing parameters. Second, a non-autoregressive multi-resolution generative artificial intelligence (AI) framework integrated with an adversarial transformer is proposed to predict missing and future sensing information. The proposed framework offers robust and comprehensive historical sensing information and anticipatory forecasts of future environmental changes, which are generalizable to fluctuations in both known and unforeseen user behaviors and environmental conditions. Third, a multi-agent deep recurrent hysteretic Q-neural network is developed to control the handover policy of reconfigurable intelligent surface (RIS) subarrays, leveraging the informative nature of sensing information to minimize handover cost, maximize the individual quality of personal experiences (QoPEs), and improve the robustness and resilience of THz links. Simulation results show a high generalizability of the proposed unsupervised generative AI framework to fluctuations in user behavior and velocity, leading to a 61% improvement in instantaneous reliability compared to schemes with known channel state information.

翻訳日:2023-05-02 17:01:01 公開日:2023-04-29

# LD-GAN:可変規則化を用いたスペクトル画像生成のための低次元生成逆ネットワーク

LD-GAN: Low-Dimensional Generative Adversarial Network for Spectral Image Generation with Variance Regularization ( http://arxiv.org/abs/2305.00132v1 )

ライセンス: Link先を確認

Emmanuel Martinez, Roman Jacome, Alejandra Hernandez-Rojas and Henry Arguello

(参考訳) ディープラーニング法はスペクトル画像(SI)計算タスクの最先端技術である。しかし、これらの手法は高いコストと長い取得時間のために利用可能なデータセットが制限されているため、性能に制約がある。通常、データの欠如を軽減するためにデータ拡張技術が使用される。幾何学的変換のような古典的拡張法を超越したganは、データ分布から学習およびサンプリングすることで多様な拡張を可能にする。しかしながら、この種のデータの高次元性はGANトレーニングの収束を妨げるため、GANベースのSI生成は困難である。この制限を克服するため、我々は、事前訓練されたオートエンコーダネットワークの潜伏空間と低次元のデータベース表現を用いた低次元GAN(LD-GAN)を提案する。これにより,事前学習したデコーダネットワークを用いてsi次元にマッピングした新しい低次元サンプルを生成する。さらに,自動エンコーダ訓練のための低次元表現分散を制御し,GANで生成されたサンプルの多様性を達成するための統計正規化を提案する。圧縮スペクトル画像, SI超解像, RBGにおけるデータ拡張戦略としてLD-GAN法を検証し, それぞれ0.5から1[dB]に改善した。我々は,非データ強化トレーニングである従来のDAとの比較を行い,全サイズのSIを生成するための調整および訓練を行った。本論文のコードはhttps://github.com/romanjacome99/LD_GAN.gitにある。

Deep learning methods are state-of-the-art for spectral image (SI) computational tasks. However, these methods are constrained in their performance since available datasets are limited due to the high cost and long acquisition time of spectral images. Usually, data augmentation techniques are employed to mitigate the lack of data. Surpassing classical augmentation methods, such as geometric transformations, GANs enable diverse augmentation by learning and sampling from the data distribution. Nevertheless, GAN-based SI generation is challenging since the high-dimensional nature of this kind of data hinders the convergence of GAN training, yielding suboptimal generation. To surmount this limitation, we propose low-dimensional GAN (LD-GAN), where we train the GAN employing a low-dimensional representation of the dataset with the latent space of a pretrained autoencoder network. Thus, we generate new low-dimensional samples which are then mapped to the SI dimension with the pretrained decoder network. Besides, we propose a statistical regularization to control the low-dimensional representation variance for the autoencoder training and to achieve high diversity of samples generated with the GAN. We validate our method LD-GAN as a data augmentation strategy for compressive spectral imaging, SI super-resolution, and RGB-to-spectral tasks, with improvements varying from 0.5 to 1 dB in each task. We perform comparisons against training without data augmentation, traditional data augmentation (DA), and the same GAN adjusted and trained to generate the full-sized SIs. The code of this paper can be found in https://github.com/romanjacome99/LD_GAN.git

翻訳日:2023-05-02 17:00:27 公開日:2023-04-29

# 構造制約による教師なしドメイン適応のための正規化自己学習

Regularizing Self-training for Unsupervised Domain Adaptation via Structural Constraints ( http://arxiv.org/abs/2305.00131v1 )

ライセンス: Link先を確認

Rajshekhar Das, Jonathan Francis, Sanket Vaibhav Mehta, Jean Oh, Emma Strubell, Jose Moura

(参考訳) 擬似ラベルに基づく自己学習は、意味的セグメンテーション問題に対する教師なしドメイン適応(UDA)における条件分布シフトに対処する主要なアプローチとして現れてきた。しかし、注目すべき欠点は、このアプローチのファミリーが、ソースドメインのバイアスの確認から生じ、ターゲットドメインの迷惑要因として現れる誤った擬似ラベルに影響を受けやすいことである。このミスマッチの原因は、RGB画像入力によって提供される測光キューのみに依存するため、最終的には準最適適応につながる可能性がある。擬似ラベルのミスマッチ効果を軽減するため,従来の自己学習目標を正規化するために,奥行きなどの補助的モーダルから構造的手がかりを取り入れることを提案する。具体的には、異なるオブジェクトカテゴリを分割しながら、オブジェクトインスタンスの領域内のピクセル表現を近くまで引っ張る、対照的なピクセルレベルのオブジェクト性制約を導入する。真の基礎となる対象と整合する対象領域を得るため,マルチモーダルクラスタリングという形で深度マップとRGB画像の両方から情報を抽出する。重要なことに、対象性制約は基幹構造的ラベルに依存しないため、教師なしドメイン適応に適している。本研究では, セマンティックセグメンテーションのためのUDAベンチマークにおいて, セマンティックセグメンテーションにおいて, 最上位の自己学習法(最大2ドルポイント)を著しく改善することを示す。補足にすべてのコードを含めます。

Self-training based on pseudo-labels has emerged as a dominant approach for addressing conditional distribution shifts in unsupervised domain adaptation (UDA) for semantic segmentation problems. A notable drawback, however, is that this family of approaches is susceptible to erroneous pseudo-labels that arise from confirmation biases in the source domain and that manifest as nuisance factors in the target domain. A possible source of this mismatch is the reliance on only photometric cues provided by RGB image inputs, which may ultimately lead to sub-optimal adaptation. To mitigate the effect of mismatched pseudo-labels, we propose to incorporate structural cues from auxiliary modalities, such as depth, to regularize conventional self-training objectives. Specifically, we introduce a contrastive pixel-level objectness constraint that pulls the pixel representations within a region of an object instance closer, while pushing those from different object categories apart. To obtain object regions consistent with the true underlying object, we extract information from both depth maps and RGB images in the form of multimodal clustering. Crucially, the objectness constraint is agnostic to the ground-truth semantic labels and, hence, appropriate for unsupervised domain adaptation. In this work, we show that our regularizer significantly improves top-performing self-training methods (by up to $2$ points) in various UDA benchmarks for semantic segmentation. We include all code in the supplementary material.

翻訳日:2023-05-02 17:00:02 公開日:2023-04-29

# ViewFormer:多視点3次元形状理解のためのビューセット注意

ViewFormer: View Set Attention for Multi-view 3D Shape Understanding ( http://arxiv.org/abs/2305.00161v1 )

ライセンス: Link先を確認

Hongyu Sun, Yongcai Wang, Peng Wang, Xudong Cai, Deying Li

(参考訳) 本稿では,多次元形状認識と検索のための簡易かつ効果的なモデルであるViewFormerを提案する。マルチビュー情報を集約する既存の手法を体系的に検討し,ビューに関する関係仮定を最小化し,表現の自由度を解放する,新しい「ビューセット」視点を提案する。我々は、ビューセット内の要素のペアワイズおよび高次相関を捉えるための適応的注意モデルを作成する。学習されたマルチビュー相関は、認識および検索のための表現型ビューセット記述子に集約される。実験では、異なるタスクやデータセットにまたがる驚くべき機能を解き放つ方法を示した。例えば、2つのアテンションブロックと4.8mの学習可能なパラメータを持つviewformerは、modelnet40で初めて98.8%の認識精度に達し、以前のベストメソッドを1.1%上回った。難易度の高いRGBDデータセットでは、98.4%の認識精度が達成され、最強のベースラインに対して4.1%の絶対改善が達成された。 ViewFormerはまた、SHREC'17ベンチマークで定義された3次元形状検索のいくつかの評価次元で新しいレコードを設定する。

This paper presents ViewFormer, a simple yet effective model for multi-view 3D shape recognition and retrieval. We systematically investigate the existing methods for aggregating multi-view information and propose a novel "view set" perspective, which minimizes the relation assumption about the views and releases the representation flexibility. We devise an adaptive attention model to capture pairwise and higher-order correlations of the elements in the view set. The learned multi-view correlations are aggregated into an expressive view set descriptor for recognition and retrieval. Experiments show the proposed method unleashes surprising capabilities across different tasks and datasets. For instance, with only 2 attention blocks and 4.8M learnable parameters, ViewFormer reaches 98.8% recognition accuracy on ModelNet40 for the first time, exceeding the previous best method by 1.1%. On the challenging RGBD dataset, our method achieves 98.4% recognition accuracy, which is a 4.1% absolute improvement over the strongest baseline. ViewFormer also sets new records in several evaluation dimensions of 3D shape retrieval defined on the SHREC'17 benchmark.

翻訳日:2023-05-02 16:50:46 公開日:2023-04-29

# ランダムな特徴を持つグラフカーネルの改ざん

Taming graph kernels with random features ( http://arxiv.org/abs/2305.00156v1 )

ライセンス: Link先を確認

Krzysztof Choromanski

(参考訳) 本稿では,グラフランダム特徴(GRF)のメカニズムを紹介する。 GRFはグラフのノード上で定義されたいくつかの重要なカーネル、特に正規化されたラプラシアカーネルの非バイアスランダム化推定器を構築するのに使うことができる。非グラフカーネルの通常のRFとして、グラフ上で定義されたカーネルメソッドをより大きなネットワークにスケールアップする手段を提供する。重要なのは、下流のアプリケーションに適用しながら、より小さなグラフに対してもかなりの計算量が得られることだ。その結果、GRFはグラフカーネルアルゴリズムの3乗(グラフのノード数)時間複雑性の非常に難しい問題に対処する。速度テストからフロベニウス相対誤差解析から、グラフカーネルを用いたkmeansグラフクラスタリングまで、幅広い経験的評価を行った。 GRFの計算は、考慮中のグラフを複数のマシンに分割する必要がある場合に適用可能な、恥ずかしいほど単純な分散アルゴリズムを許容していることを示す。我々はまた、GRFの分散を最適化するために用いられる、いわゆる強化ランダムウォークに依存する(まだバイアスのない)準モンテカルロ変種 q-GRF も導入する。副産物として、正および対称行列を持つ線型方程式のある種のクラスを解く新しいアプローチを得る。

We introduce in this paper the mechanism of graph random features (GRFs). GRFs can be used to construct unbiased randomized estimators of several important kernels defined on graphs' nodes, in particular the regularized Laplacian kernel. Like regular RFs for non-graph kernels, they provide a means to scale up kernel methods defined on graphs to larger networks. Importantly, they also give substantial computational gains for smaller graphs when applied in downstream applications. Consequently, GRFs address the notoriously difficult problem of the cubic (in the number of nodes of the graph) time complexity of graph kernel algorithms. We provide a detailed theoretical analysis of GRFs and an extensive empirical evaluation: from speed tests, through Frobenius relative error analysis, to k-means graph clustering with graph kernels. We show that the computation of GRFs admits an embarrassingly simple distributed algorithm that can be applied if the graph under consideration needs to be split across several machines. We also introduce a (still unbiased) quasi-Monte Carlo variant of GRFs, q-GRFs, relying on the so-called reinforced random walks, that might be used to optimize the variance of GRFs. As a byproduct, we obtain a novel approach to solving certain classes of linear equations with positive and symmetric matrices.

翻訳日:2023-05-02 16:50:28 公開日:2023-04-29

# 学習から学びへ:非確率的外乱に反するマルチエージェントのオンラインソース

Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances ( http://arxiv.org/abs/2305.00154v1 )

ライセンス: Link先を確認

Bin Du and Kun Qian and Christian Claudel and Dengfeng Sun

(参考訳) 本稿では,新しい学習手法を活用し,未知環境下でのマルチエージェントオンライン検索アルゴリズムを提案する。問題設定における特に重要な点は一基礎となる環境は、未知だけでなく、動的に変化し、二種類の非確率的障害に悩まされていること。二エージェントの集団が配置され、できるだけ多くの情報源を協力的に探究することが期待されていること。そこで,非確率的障害に対処するために,割引カルマンフィルタの新たな手法を開発し,ポリトープの性質に結びついた信頼感の概念を用いて,マルチプルエージェント間の計算効率のよい協調を支援する。未知の環境と乱れに関する標準的な仮定により、我々のアルゴリズムは2種類の非確率的乱れのタイプの下で線形的後悔を達成し、どちらも最先端のものと同等である。本手法の有効性を示すために,実環境汚染モニタリングアプリケーションの数値例を示した。

This paper proposes to leverage emerging learning techniques and devise a multi-agent online source seeking algorithm in an unknown environment. Of particular significance in our problem setups are: i) the underlying environment is not only unknown, but dynamically changing and also perturbed by two types of non-stochastic disturbances; and ii) a group of agents is deployed and expected to cooperatively seek as many sources as possible. Correspondingly, a new technique of discounted Kalman filter is developed to tackle the non-stochastic disturbances, and a notion of confidence bound in polytope nature is utilized to aid the computation-efficient cooperation among multiple agents. With standard assumptions on the unknown environment as well as the disturbances, our algorithm is shown to achieve sub-linear regrets under the two types of non-stochastic disturbances; both results are comparable to the state-of-the-art. Numerical examples on a real-world pollution monitoring application are provided to demonstrate the effectiveness of our algorithm.

翻訳日:2023-05-02 16:50:07 公開日:2023-04-29

# 転校学習におけるモデル選択の限界

Limits of Model Selection under Transfer Learning ( http://arxiv.org/abs/2305.00152v1 )

ライセンス: Link先を確認

Steve Hanneke, Samory Kpotufe, Yasaman Mahdaviyeh

(参考訳) 転送学習やドメイン適応に関する理論的研究はこれまで、既知の仮説クラスやモデルでの状況に焦点を当ててきたが、実際には、いくつかのモデル選択は、通常、ハイパーパラメータチューニング(hyperparameter-tuning)という包括的用語の下に現れる。現在、モデル選択に関わる近似と推定誤差の通常のトレードオフに加えて、この問題は新たな複雑性項、すなわち、ソースとターゲットの分布間の移動距離が仮説クラスの選択によって異なることが知られている。特に、分析によって注目すべき現象が明らかになる: 適応率、すなわち、分布情報を持たないもの、すなわち、距離に関する知識が与えられたとき、oracleの速度よりも任意に遅い可能性がある。

Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.

翻訳日:2023-05-02 16:49:52 公開日:2023-04-29

# X線認識:コントラスト目的を用いたX線からの患者識別

X-ray Recognition: Patient identification from X-rays using a contrastive objective ( http://arxiv.org/abs/2305.00149v1 )

ライセンス: Link先を確認

Hao Liang, Kevin Ni, Guha Balakrishnan

(参考訳) 近年の研究では、深層学習モデルは患者の胸部X線(CXR)から生体情報(人種、性別、年齢など)を正確に抽出できることが示されている。本稿ではさらに,同一患者に属するcxrと異なる患者に属するcxrとの識別において,ディープラーニングモデルが驚くほど正確であることを示す。これらの結果は、医療画像コミュニティが大規模なCXRデータベースの普及に関して考慮すべき潜在的なプライバシー上の配慮を示唆している。

Recent research demonstrates that deep learning models are capable of precisely extracting bio-information (e.g. race, gender and age) from patients' Chest X-Rays (CXRs). In this paper, we further show that deep learning models are also surprisingly accurate at recognition, i.e., distinguishing CXRs belonging to the same patient from those belonging to different patients. These findings suggest potential privacy considerations that the medical imaging community should consider with the proliferation of large public CXR databases.

翻訳日:2023-05-02 16:49:13 公開日:2023-04-29

# GANを用いた胸部X線データセットバイアスの可視化

Visualizing chest X-ray dataset biases using GANs ( http://arxiv.org/abs/2305.00147v1 )

ライセンス: Link先を確認

Hao Liang, Kevin Ni, Guha Balakrishnan

(参考訳) 最近の研究では、様々な胸部X線データセットの画像には、人種や性別といった保護された人口特性と強く相関する視覚的特徴が含まれていることが示されている。これらの要因のいくつかは臨床予測のために下流アルゴリズムによって使用される可能性があるため、この発見は公平性の問題を提起する。本研究では,2つの層群に属するX線に最も異なる特徴を可視化するために,GAN(Generative Adversarial Network)を用いたフレームワークを提案する。

Recent work demonstrates that images from various chest X-ray datasets contain visual features that are strongly correlated with protected demographic attributes like race and gender. This finding raises issues of fairness, since some of these factors may be used by downstream algorithms for clinical predictions. In this work, we propose a framework, using generative adversarial networks (GANs), to visualize what features are most different between X-rays belonging to two demographic subgroups.

翻訳日:2023-05-02 16:49:04 公開日:2023-04-29

# 量子型理論とアプリケーション、モデル、アルゴリズム、コンパイル、エラー訂正のキャズムを統合する

Integrating Across Application, Model, Algorithm, Compilation, and Error Correction Chasms With Quantum Type Theory ( http://arxiv.org/abs/2305.00144v1 )

ライセンス: Link先を確認

Eugene Dumitrescu

(参考訳) 本稿では,量子型理論の現状と今後の計算的意味について概説する。

We briefly discuss the current state, and future computational implications, of quantum type theory.

翻訳日:2023-05-02 16:48:57 公開日:2023-04-29

# 連続予測2サンプルと独立試験

Sequential Predictive Two-Sample and Independence Testing ( http://arxiv.org/abs/2305.00143v1 )

ライセンス: Link先を確認

Aleksandr Podkopaev, Aaditya Ramdas

(参考訳) 逐次非パラメトリック2サンプルと独立テストの問題点について検討する。シーケンシャルテストはデータをオンラインで処理し、観測データを使用してヌル仮説を停止または拒否するか、タイプiのエラーコントロールを維持しながらより多くのデータを収集するかを決定する。我々は賭けによる(非パラメトリックな)テストの原理に基づいており、ギャンブラーは将来の観測に賭け、その富はヌル仮説に対する証拠を測定する。最近開発されたカーネルベースのベッティング戦略は、単純な分布でよく機能するが、テキストや画像のような高次元または構造化データに適したカーネルを選択することは、しばしば簡単ではない。この欠点に対処するために、我々は次の事実に依存する予測ベースの賭け戦略を設計する。 (a) インスタンスが引き出されるもの,又は b) インスタンスがジョイント分布またはマージン分布の積から引き出されるか(後者は外部ランダム化によって生成される)、それぞれ2つのサンプルまたは独立ヌルに対する証拠を提供する。構造化された設定下でのカーネルベースのアプローチよりもテストが優れていることを実証的に示す。我々のテストは、独立で同一の分散データ以外に適用でき、データ分布が時間とともにドリフトしても有効で強力なままです。

We study the problems of sequential nonparametric two-sample and independence testing. Sequential tests process data online and allow using observed data to decide whether to stop and reject the null hypothesis or to collect more data while maintaining type I error control. We build upon the principle of (nonparametric) testing by betting, where a gambler places bets on future observations and their wealth measures evidence against the null hypothesis. While recently developed kernel-based betting strategies often work well on simple distributions, selecting a suitable kernel for high-dimensional or structured data, such as text and images, is often nontrivial. To address this drawback, we design prediction-based betting strategies that rely on the following fact: if a sequentially updated predictor starts to consistently determine (a) which distribution an instance is drawn from, or (b) whether an instance is drawn from the joint distribution or the product of the marginal distributions (the latter produced by external randomization), it provides evidence against the two-sample or independence nulls respectively. We empirically demonstrate the superiority of our tests over kernel-based approaches under structured settings. Our tests can be applied beyond the case of independent and identically distributed data, remaining valid and powerful even when the data distribution drifts over time.

翻訳日:2023-05-02 16:48:54 公開日:2023-04-29
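The testing-by-betting principle described in the abstract above can be illustrated with a small sketch. This is not the authors' method: the fixed bet size, the `NearestMeanPredictor` class, and the function name `sequential_test` are all assumptions made for illustration. Each observation arrives as a pair `(x, label)`, where `label` in {+1, -1} says which sample `x` came from and is assumed to be chosen by a fair coin. Under the two-sample null, `x` carries no information about `label`, so the wealth is a nonnegative martingale starting at 1 and, by Ville's inequality, exceeds `1/alpha` with probability at most `alpha`.

```python
class NearestMeanPredictor:
    """Hypothetical online predictor: guesses the sample whose running mean is closer."""

    def __init__(self):
        self.total = {1: 0.0, -1: 0.0}
        self.count = {1: 0, -1: 0}

    def predict(self, x):
        # Abstain (bet nothing) until both samples have been observed.
        if self.count[1] == 0 or self.count[-1] == 0:
            return 0.0
        mean_pos = self.total[1] / self.count[1]
        mean_neg = self.total[-1] / self.count[-1]
        return 1.0 if abs(x - mean_pos) < abs(x - mean_neg) else -1.0

    def update(self, x, label):
        self.total[label] += x
        self.count[label] += 1


def sequential_test(stream, predict, update, alpha=0.05, bet=0.5):
    """Return (stopping_time, wealth); stopping_time is None if the null is never rejected."""
    wealth = 1.0
    for t, (x, label) in enumerate(stream, start=1):
        guess = predict(x)                      # in [-1, 1], made before the label is revealed
        wealth *= 1.0 + bet * guess * label     # gain on a correct guess, loss on a wrong one
        if wealth >= 1.0 / alpha:
            return t, wealth                    # enough evidence: stop and reject the null
        update(x, label)                        # only now learn from the revealed label
    return None, wealth
```

Calling `sequential_test(stream, p.predict, p.update)` with `p = NearestMeanPredictor()` processes the data online. A fixed bet fraction is the simplest choice; adaptive bet sizes are a natural refinement that this sketch leaves out.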

# $3$-wise ケメニー問題に対する探索空間削減技術

Space reduction techniques for the $3$-wise Kemeny problem ( http://arxiv.org/abs/2305.00140v1 )

ライセンス: Link先を確認

Xuan Kien Phung and Sylvie Hamel

(参考訳) ケメニーの法則は、計算社会選択と生物学に様々な重要な応用を持つ、最もよく研究され広く知られた投票方式の1つである。近年、ケメニーの法則はGilbertらによる集合的(set-wise)アプローチによって一般化された。このパラダイムに従い、我々は \cite{Phung-Hamel-2023} において、$3$-wise Kendall-tau距離によって導かれる$3$-wise ケメニー投票方式が、古典的なケメニー規則と比較して興味深い利点を持つことを示した。投票プロファイルの$3$-wise コンセンサスランキングの集合を計算する$3$-wise ケメニー問題はNP困難であるが、本論文では、古典的なケメニー規則に対して \cite{Milosz-Hamel-2020} で得られたMajor Order Theoremsのいくつかの一般化を$3$-wise ケメニー投票方式に対して確立し、選択肢の対の相対順序を多項式時間で効率的に決定することで、探索空間の大幅な削減を達成する。本質的には、我々の定理は、選挙においてある選択肢が別の選択肢よりも十分強く選好されており、それが一対一の比較だけでなく、さらに1つまたは2つの選択肢を考慮に入れた場合にも成り立つならば、これら2つの選択肢の相対順序はすべての$3$-wise コンセンサスランキングにおいて期待通りでなければならない、という非自明な性質を正確に定量化する。さらに、古典的なケメニー規則に対するBetzlerらのよく知られた$3/4$多数決ルールは、$3$-wise ケメニー方式に関しては選択肢が$5$個以下の選挙に対してのみ有効であることを示す。また、$3$-wise ケメニー規則が古典的な規則よりも操作に対して耐性が高いことを示す例も示す。

Kemeny's rule is one of the most studied and well-known voting schemes with various important applications in computational social choice and biology. Recently, Kemeny's rule was generalized via a set-wise approach by Gilbert et al. Following this paradigm, we have shown in \cite{Phung-Hamel-2023} that the $3$-wise Kemeny voting scheme induced by the $3$-wise Kendall-tau distance presents interesting advantages in comparison with the classical Kemeny rule. While the $3$-wise Kemeny problem, which consists of computing the set of $3$-wise consensus rankings of a voting profile, is NP-hard, we establish in this paper several generalizations of the Major Order Theorems, as obtained in \cite{Milosz-Hamel-2020} for the classical Kemeny rule, for the $3$-wise Kemeny voting scheme to achieve a substantial search space reduction by efficiently determining in polynomial time the relative orders of pairs of alternatives. Essentially, our theorems quantify precisely the non-trivial property that if the preference for an alternative over another one in an election is strong enough, not only in the head-to-head competition but even when taking into consideration one or two more alternatives, then the relative order of these two alternatives in every $3$-wise consensus ranking must be as expected. Moreover, we show that the well-known $3/4$-majority rule of Betzler et al. for the classical Kemeny rule is only valid for elections with no more than $5$ alternatives with respect to the $3$-wise Kemeny scheme. Examples are also provided to show that the $3$-wise Kemeny rule is more resistant to manipulation than the classical one.

翻訳日:2023-05-02 16:48:32 公開日:2023-04-29
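To see why search space reduction matters here, it may help to look at the classical Kemeny rule solved by brute force. The sketch below covers only the classical (pairwise) rule with the ordinary Kendall-tau distance, not the $3$-wise variant studied in the paper; the function names are illustrative. It enumerates all $n!$ rankings, which is exactly the cost that results fixing the relative order of pairs of alternatives aim to cut down.

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Number of pairs of alternatives that the two rankings order differently."""
    pos1 = {a: i for i, a in enumerate(r1)}
    pos2 = {a: i for i, a in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def kemeny_consensus(profile):
    """Return (list of optimal rankings, minimal total distance) by exhaustive search."""
    alternatives = sorted(profile[0])
    best, best_cost = [], None
    for candidate in permutations(alternatives):
        cost = sum(kendall_tau(candidate, vote) for vote in profile)
        if best_cost is None or cost < best_cost:
            best, best_cost = [candidate], cost
        elif cost == best_cost:
            best.append(candidate)  # keep ties: the consensus need not be unique
    return best, best_cost
```

A profile is passed as a list of rankings, for example `kemeny_consensus([("a", "b", "c"), ("a", "b", "c"), ("b", "a", "c")])`. Once the relative order of some pairs is known in advance, candidates violating it can be skipped, which is the pruning idea in its simplest form.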

# グラフニューラルネットワークにおけるノード分類のためのラベル非均一性の活用

Leveraging Label Non-Uniformity for Node Classification in Graph Neural Networks ( http://arxiv.org/abs/2305.00139v1 )

ライセンス: Link先を確認

Feng Ji and See Hian Lee and Hanyang Meng and Kai Zhao and Jielong Yang and Wee Peng Tay

(参考訳) グラフニューラルネットワーク(GNN)を用いたノード分類では、典型的なモデルは各ノードで異なるクラスラベルのログを生成する。ソフトマックス層はしばしば最大のロジットに基づいてラベル予測を出力する。これらのロジットを用いてデータセットから隠れたグラフ構造情報を推測できることを実証する。本稿では,ロジットのソフトマックス分布と均一分布との間のワッサーシュタイン距離から導かれるラベル非均一性の鍵となる概念を紹介する。ラベルの不均一性の低いノードを正しく分類することは困難である。我々は,ラベルの非一様性がグラフ全体でどのように変化するのかを理論的に分析し,モデル性能の向上に関する洞察を与える: トレーニングサンプルを高一様性で増加させるか,あるいは、エッジを落として、小さな非一様性のノードセットの最大カットサイズを小さくする。これらのメカニズムはベースGNNモデルに簡単に追加できる。実験により,多くのベンチマークベースモデルの性能向上が示された。

In node classification using graph neural networks (GNNs), a typical model generates logits for different class labels at each node. A softmax layer often outputs a label prediction based on the largest logit. We demonstrate that it is possible to infer hidden graph structural information from the dataset using these logits. We introduce the key notion of label non-uniformity, which is derived from the Wasserstein distance between the softmax distribution of the logits and the uniform distribution. We demonstrate that nodes with small label non-uniformity are harder to classify correctly. We theoretically analyze how the label non-uniformity varies across the graph, which provides insights into boosting the model performance: increasing training samples with high non-uniformity or dropping edges to reduce the maximal cut size of the node set of small non-uniformity. These mechanisms can be easily added to a base GNN model. Experimental results demonstrate that our approach improves the performance of many benchmark base models.

翻訳日:2023-05-02 16:47:55 公開日:2023-04-29
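The notion of label non-uniformity from the abstract above can be sketched as a distance between a node's softmax output and the uniform distribution. The abstract does not state the ground metric used for the Wasserstein distance, so this sketch assumes the discrete 0/1 metric on class labels, under which the Wasserstein-1 distance coincides with the total variation distance. The function names are illustrative and this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_non_uniformity(logits):
    """Distance between softmax(logits) and the uniform distribution.

    Assumption: 0/1 ground metric between labels, so the Wasserstein-1
    distance equals total variation, 0.5 * sum_i |p_i - 1/K|.
    """
    p = softmax(logits)
    k = len(p)
    return 0.5 * sum(abs(p_i - 1.0 / k) for p_i in p)
```

Under this assumption the value ranges from 0 (all classes equally likely) to 1 - 1/K (all mass on one class), so nodes with values near 0 are the ones the abstract describes as harder to classify.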

# 線形回帰のためのデータ駆動サブグループ同定

Data-Driven Subgroup Identification for Linear Regression ( http://arxiv.org/abs/2305.00195v1 )

ライセンス: Link先を確認

Zachary Izzo, Ruishan Liu, James Zou

(参考訳) 医学研究はしばしば、それぞれの共変量と統計的信頼度尺度による結果の関係を抽出する必要がある。これを実現するために、単純なパラメトリックモデルは頻繁に使用される(例えば線形回帰係数)が、通常はデータセット全体に適合する。しかし、共変量体が全集団に対して一様効果を持たず、従って統一された単純なモデルが異種信号を見逃すことはよくある。例えば、線形モデルはデータのサブセットを説明することができるが、データの非線形性と不均一性のために残りの部分で失敗することがある。本稿では,データ中の部分群を特徴とラベル間の一様線形関係で効果的に識別するデータ駆動手法であるddgroup(data-driven group discovery)を提案する。 DDGroupは線形モデルが保持されるであろう解釈可能な領域を出力する。簡単に実装でき、計算処理も可能である。理論的には, 十分なサンプルを与えられたddgroupは, 低分散の1つの線形モデルが十分に特定された領域を回復し, 実世界の医療データセット実験により, 局所線形モデルの性能が向上した領域を発見できることを確認した。実験の結果,DDGroupはデータセット全体にパラメトリックなアプローチを適用するだけで,質的に異なる関係を持つサブグループを発見できることがわかった。

Medical studies frequently require extracting the relationship between each covariate and the outcome, together with statistical confidence measures. To do this, simple parametric models (e.g. linear regression coefficients) are frequently used, but they are usually fitted on the whole dataset. However, covariates often do not have a uniform effect over the whole population, and thus a unified simple model can miss the heterogeneous signal. For example, a linear model may be able to explain a subset of the data but fail on the rest due to the nonlinearity and heterogeneity in the data. In this paper, we propose DDGroup (data-driven group discovery), a data-driven method to effectively identify subgroups in the data with a uniform linear relationship between the features and the label. DDGroup outputs an interpretable region in which the linear model is expected to hold. It is simple to implement and computationally tractable. We show theoretically that, given a large enough sample, DDGroup recovers a region where a single linear model with low variance is well-specified (if one exists), and experiments on real-world medical datasets confirm that it can discover regions where a local linear model has improved performance. Our experiments also show that DDGroup can uncover subgroups with qualitatively different relationships which are missed by simply applying parametric approaches to the whole dataset.

翻訳日:2023-05-02 16:42:14 公開日:2023-04-29

# 領域からポイントへの探索:セマンティック・ジオメトリ複合機能マッチングのための階層的フレームワーク

Searching from Area to Point: A Hierarchical Framework for Semantic-Geometric Combined Feature Matching ( http://arxiv.org/abs/2305.00194v1 )

ライセンス: Link先を確認

Yesheng Zhang, Xu Zhao, Dahong Qian

(参考訳) 特徴マッチングはコンピュータビジョンにおいて重要な技術である。本質的には、画像間の対応を確立するための探索問題と見なすことができる。このタスクにおける重要な課題は、明確に定義された検索空間の欠如であり、現在のメソッドの不正確なポイントマッチングにつながる。本稿では,適切なマッチング検索空間を求めて,まず画像間の意味的領域マッチング(a2pm)を探索し,次に領域マッチングを行う階層的特徴マッチングフレームワークを提案する。 A2PMフレームワークの適切な検索空間は、最先端のTransformerベースのマッチング手法の精度の制限を緩和する。この枠組みを実現するために、画像間の正確な領域マッチングを確立するために、意味的前後整合性と幾何学的一貫性を利用した意味的・幾何学的領域マッチング(sgam)手法を提案する。 SGAMとオフザシェルトランスフォーマーベースのマーカを組み合わせることで,A2PMフレームワークを取り入れた特徴マッチング手法により,大規模点マッチングの精度向上と,現在の美術品のポーズ推定実験を実現する。

Feature matching is a crucial technique in computer vision. Essentially, it can be considered as a search problem to establish correspondences between images. The key challenge in this task lies in the lack of a well-defined search space, leading to inaccurate point matching in current methods. In pursuit of a reasonable matching search space, this paper introduces a hierarchical feature matching framework, Area to Point Matching (A2PM), which first finds semantic area matches between images and then performs point matching on the area matches, thus setting the search space as the area matches with salient features to achieve high matching precision. This proper search space of the A2PM framework also alleviates the accuracy limitation in state-of-the-art Transformer-based matching methods. To realize this framework, we further propose the Semantic and Geometry Area Matching (SGAM) method, which utilizes semantic priors and geometric consistency to establish accurate area matches between images. By integrating SGAM with off-the-shelf Transformer-based matchers, our feature matching methods, adopting the A2PM framework, achieve encouraging precision improvements over current methods in extensive point matching and pose estimation experiments.

翻訳日:2023-05-02 16:41:51 公開日:2023-04-29

# 血管構造の異常観察のためのリアルタイム表面静脈イメージングシステム

Real-Time Superficial Vein Imaging System for Observing Abnormalities on Vascular Structures ( http://arxiv.org/abs/2305.00189v1 )

ライセンス: Link先を確認

Ayse Altay, Abdurrahman Gumus

(参考訳) 循環系異常は疾患や組織障害の指標である。血管異常の早期発見は治療中に重要な役割を担い、また患者の意識を高める可能性がある。血管画像の現在の検出方法は、高価で侵襲的で、主に放射線によるものである。本研究では,近赤外(NIR)表面血管イメージング装置として,低コストでポータブルなマイクロコンピュータベースのツールを開発した。デバイスは850nmのnir発光ダイオード(led)光と他の電子部品と光学部品を使用する。非接触で安全な赤外線イメージング(IR)をリアルタイムで行う。画像および映像解析は、主にコンピュータビジョンで使用されるプログラミング関数のライブラリであるopencv(open-source computer vision)を用いて行われる。撮像システムを最適化し、適切な外部環境を構築するために様々な試験が行われた。血液中のグルコース濃度の上昇による変形の可能性から血管構造に異常があると思われる3人の糖尿病ボランティアの画像を,非糖尿病ボランティアの2人の画像と比較した。その結果, 表面血管構造においてtortuosityが良好に観察され, 基礎的理由を理解するためには, 現場の医療専門家による解釈が必要である。本研究は, 工学的な研究であり, 疾患を診断する意図はないが, 血管構造の早期診断, 治療フォローアップにおいて医療従事者を支援し, さらなる機会を期待できる。

Circulatory system abnormalities might be an indicator of diseases or tissue damage. Early detection of vascular abnormalities might have an important role during treatment and also raise the patient's awareness. Current detection methods for vascular imaging are high-cost, invasive, and mostly radiation-based. In this study, a low-cost and portable microcomputer-based tool has been developed as a near-infrared (NIR) superficial vascular imaging device. The device uses NIR light-emitting diode (LED) light at 850 nm along with other electronic and optical components. It operates as a non-contact and safe infrared (IR) imaging method in real time. Image and video analysis are carried out using OpenCV (Open-Source Computer Vision), a library of programming functions mainly used in computer vision. Various tests were carried out to optimize the imaging system and set up a suitable external environment. To test the performance of the device, the images taken from three diabetic volunteers, who are expected to have abnormalities in the vascular structure due to the possibility of deformation caused by high glucose levels in the blood, were compared with the images taken from two non-diabetic volunteers. As a result, tortuosity was observed successfully in the superficial vascular structures, where the results need to be interpreted by the medical experts in the field to understand the underlying reasons. Although this study is an engineering study and does not have an intention to diagnose any diseases, the developed system here might assist healthcare personnel in early diagnosis and treatment follow-up for vascular structures and may enable further opportunities.

翻訳日:2023-05-02 16:41:33 公開日:2023-04-29

# 整数線形計画法の局所探索

Local Search for Integer Linear Programming ( http://arxiv.org/abs/2305.00188v1 )

ライセンス: Link先を確認

Peng Lin, Shaowei Cai, Mengchuan Zou, Jinkun Lin

(参考訳) 整数線形プログラミングは、様々な実用的な組合せ最適化問題をモデル化し、産業や管理分野に大きな影響を与えている。本研究では,大規模不均一問題データセット上で検証可能な一般整数線形計画のための,最初の単独局所探索ソルバを開発した。本研究では,検索モード,改善モード,復元モードの3つのモードに切り替えるローカル検索フレームワークを提案する。探索・復元モードについては,制約を厳格にしようとする変数の値を適応的に修正する,tight moveという演算子を提案する。改良モードでは, 有効性を維持しつつ, 目的関数の品質向上を図るために, 効率的な昇降動作が提案されている。これらを組み合わせることで、ローカルILPと呼ばれる整数線形プログラミングのための局所探索解法を開発する。 MIPLIBデータセットで行った実験は,大規模ハード整数線形計画問題の解法の有効性を合理的に短時間で示すものである。ローカルILPは最先端の商用ソルバであるGurobiと競合し相補的であり、最先端の非商用ソルバSCIPを著しく上回っている。さらに,6つのMIPLIBオープンインスタンスの新たなレコードを確立する。

Integer linear programming models a wide range of practical combinatorial optimization problems and has significant impacts in industry and management sectors. This work develops the first standalone local search solver for general integer linear programming validated on a large heterogeneous problem dataset. We propose a local search framework that switches among three modes, namely Search, Improve, and Restore, and design tailored operators for the different modes, thus improving the quality of the current solution according to different situations. For the Search and Restore modes, we propose an operator named tight move, which adaptively modifies variables' values, trying to make some constraint tight. For the Improve mode, an efficient operator named lift move is proposed to improve the quality of the objective function while maintaining feasibility. Putting these together, we develop a local search solver for integer linear programming called Local-ILP. Experiments conducted on the MIPLIB dataset show the effectiveness of our solver in solving large-scale hard integer linear programming problems within a reasonably short time. Local-ILP is competitive and complementary to the state-of-the-art commercial solver Gurobi and significantly outperforms the state-of-the-art non-commercial solver SCIP. Moreover, our solver establishes new records for 6 MIPLIB open instances.

翻訳日:2023-05-02 16:41:10 公開日:2023-04-29

# 欧州の報道機関「Covid-19no-Vax運動」の調査:NLPフレームワーク

Examining European Press Coverage of the Covid-19 No-Vax Movement: An NLP Framework ( http://arxiv.org/abs/2305.00182v1 )

ライセンス: Link先を確認

David Alonso del Barrio and Daniel Gatica-Perez

(参考訳) 本稿は、欧州の報道機関がコビッドウイルスワクチンに対するノバックス反応と、この動きに関連する偽情報と偽情報にどう対処したかを検討する。 2020-2021年の22ヶ月にわたる反ワクチン運動に関する19のヨーロッパの新聞の1786の記事のキュレーションデータセットを用いて、トピックモデリング、感情分析、単語埋め込みとの意味関係、政治的分析、名前付きエンティティ認識、意味ネットワークといった自然言語処理技術を用いて、欧州伝統的メディアの偽情報エコシステムにおける特定の役割を理解した。この多角的分析の結果、ヨーロッパの報道機関は、主にソーシャルメディアに広がる様々なホックスに積極的に反対し、新聞の政治的指向に関係なく、反バックスの傾向に批判的であった。これは、偽情報生態系における高品質プレスの役割を研究することの意義を裏付けるものである。

This paper examines how the European press dealt with the no-vax reactions against the Covid-19 vaccine and the dis- and misinformation associated with this movement. Using a curated dataset of 1786 articles from 19 European newspapers on the anti-vaccine movement over a period of 22 months in 2020-2021, we used Natural Language Processing techniques including topic modeling, sentiment analysis, semantic relationship with word embeddings, political analysis, named entity recognition, and semantic networks, to understand the specific role of the European traditional press in the disinformation ecosystem. The results of this multi-angle analysis demonstrate that the European well-established press actively opposed a variety of hoaxes mainly spread on social media, and was critical of the anti-vax trend, regardless of the political orientation of the newspaper. This confirms the relevance of studying the role of high-quality press in the disinformation ecosystem.

翻訳日:2023-05-02 16:40:48 公開日:2023-04-29

# TAPE:時間的注意に基づく確率的人間のポーズと形状推定

TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation ( http://arxiv.org/abs/2305.00181v1 )

ライセンス: Link先を確認

Nikolaos Vasilikopoulos, Nikos Kolotouros, Aggeliki Tsoli, Antonis Argyros

(参考訳) 単眼ビデオから3Dの人間のポーズと形状を再構築することは、よく研究されているが難しい問題だ。一般的な課題として、オクルージョン、2Dから3Dマッピングにおける固有の曖昧さ、ビデオ処理の計算複雑性などがある。既存の手法では復元のあいまいさを無視し、3Dポーズの1つの決定論的推定を提供する。これらの問題に対処するため、RGBビデオで動作する時間的注意に基づく確率的人間のポーズと形状推定法(TAPE)を提案する。具体的には,注意に基づくニューラルネットワークを用いて映像フレームを時間的特徴にエンコードすることを提案する。これらの特徴を考慮し、正規化フローを用いた人間のポーズに対するフレーム単位の時間的インフォームド確率分布を出力する。TAPEは標準ベンチマークで最先端の手法よりも優れており、最適化に基づく人間のポーズや形状推定に有効なビデオベースプリエントとして機能する。コードは https://github.com/nikosvasilik/TAPE で公開されている。

Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https://github.com/nikosvasilik/TAPE

翻訳日:2023-05-02 16:40:29 公開日:2023-04-29

# 広範学習システムに基づく実時間マルチモード障害診断手法の提案

An Evidential Real-Time Multi-Mode Fault Diagnosis Approach Based on Broad Learning System ( http://arxiv.org/abs/2305.00169v1 )

ライセンス: Link先を確認

Chen Li and Zeyi Liu and Limin Wang and Minyue Li and Xiao He

(参考訳) 故障診断は、非ゲージ、マルチモード、センタードリフト特性を示す多様な動作条件により、業界で重要な研究領域である。現在、データ駆動アプローチはこの分野で主に注目されているが、連続的な障害分類や障害分類器のパラメータ更新、特に複数の運用モードやリアルタイム設定といった課題を提起している。したがって, 産業システムにおけるリアルタイムマルチモード故障診断の実現が課題である。本稿では,エビデンス推論(er)アルゴリズムを用いて,異なるベース分類器からの情報を融合し,出力をマージする新しい手法を提案する。これらのベース分類器を広範学習システム(bls)を用いて開発し、故障診断性能を向上させる。さらに,本手法では,モデルパラメータをリアルタイムで更新するために擬似ラベル学習法を用いる。提案手法の有効性を実証するため,マルチモードのテネシー・イーストマンプロセスデータセットを用いて実験を行った。

Fault diagnosis is a crucial area of research in the industry due to diverse operating conditions that exhibit non-Gaussian, multi-mode, and center-drift characteristics. Currently, data-driven approaches are the main focus in the field, but they pose challenges for continuous fault classification and parameter updates of fault classifiers, particularly in multiple operating modes and real-time settings. Therefore, a pressing issue is to achieve real-time multi-mode fault diagnosis for industrial systems. To address this problem, this paper proposes a novel approach that utilizes an evidence reasoning (ER) algorithm to fuse information and merge outputs from different base classifiers. These base classifiers are developed using a broad learning system (BLS) to improve fault diagnosis performance. Moreover, in this approach, the pseudo-label learning method is employed to update model parameters in real time. To demonstrate the effectiveness of the proposed approach, we perform experiments using the multi-mode Tennessee Eastman process dataset.

翻訳日:2023-05-02 16:40:13 公開日:2023-04-29

# RRAMと人工知能のための酸化物層としての金属酸化物の複合

The Combination of Metal Oxides as Oxide Layers for RRAM and Artificial Intelligence ( http://arxiv.org/abs/2305.00166v1 )

ライセンス: Link先を確認

Sun Hanyu

(参考訳) 抵抗性ランダムアクセスメモリ(RRAM)は、高速、低消費電力、スケーラビリティに優れた次世代メモリデバイスにとって有望な候補である。金属酸化物は、高い誘電率と安定性のため、RRAM装置の酸化物層として一般的に用いられる。しかし、RRAMデバイスの性能をさらに向上させるため、最近の研究は人工知能(AI)の統合に焦点を当てている。 AIはRRAMデバイスのパフォーマンスの最適化に使用することができ、RRAMはハードウェアアクセラレータやニューロモルフィックコンピューティングでAIを駆動することもできる。本稿では,金属酸化物をベースとしたRRAMとAIの組み合わせについて概説する。我々は、RRAMデバイスの性能向上のためのAIの使用と、AIを駆動するRRAMの使用について論じる。さらに、この分野の重要な課題に取り組み、今後の研究方向性に関する洞察を提供する。

Resistive random-access memory (RRAM) is a promising candidate for next-generation memory devices due to its high speed, low power consumption, and excellent scalability. Metal oxides are commonly used as the oxide layer in RRAM devices due to their high dielectric constant and stability. However, to further improve the performance of RRAM devices, recent research has focused on integrating artificial intelligence (AI). AI can be used to optimize the performance of RRAM devices, while RRAM can also power AI as a hardware accelerator and in neuromorphic computing. This review paper provides an overview of the combination of metal oxides-based RRAM and AI, highlighting recent advances in these two directions. We discuss the use of AI to improve the performance of RRAM devices and the use of RRAM to power AI. Additionally, we address key challenges in the field and provide insights into future research directions.

翻訳日:2023-05-02 16:39:56 公開日:2023-04-29

# ビデオスーパーリゾリューションのための暗黙のアライメント

An Implicit Alignment for Video Super-Resolution ( http://arxiv.org/abs/2305.00163v1 )

ライセンス: Link先を確認

Kai Xu, Ziwei Yu, Xin Wang, Michael Bi Mi, Angela Yao

(参考訳) ビデオのスーパーレゾリューションは通常、時間とともに情報の伝播をサポートするためにフレームアライメントを使用する。アライメントの役割は、ビデオの低レベルエンハンスメントのためによく研究されているが、既存の作品が重要なステップである再サンプリングを見落としている。フレーム間の動作を補償する方法に関わらず、フローベースのワーピングや変形可能な畳み込み/アテンションなど、ほとんどの作業では、再サンプリングにバイリニア補間(bilinear interpolation)のデフォルト選択を使用する。しかし、双線形補間はローパスフィルタとして効果的に機能し、超解像のために高周波コンテンツを回復する目的を阻害する。本稿では,ビデオ高分解能アライメントにおける再サンプリングの影響について検討する。大規模な実験により、アライメントを効果的にするためには、再サンプリングは特徴の本来の鋭さを保ち、歪みを防ぐ必要があることが判明した。そこで,本研究では,正弦波位置符号化により符号化されたサンプリング位置をウィンドウベースのクロスアテンションで再サンプリングする暗黙的アライメント手法を提案する。再サンプリングは学習したネットワーク重みによって暗黙的に計算される。実験によると、提案された暗黙のアライメントは、合成データセットと実世界のデータセットの両方に最小限の影響で、最先端フレームワークのパフォーマンスを向上させる。

Video super-resolution commonly uses a frame-wise alignment to support the propagation of information over time. The role of alignment is well-studied for low-level enhancement in video, but existing works have overlooked one critical step -- re-sampling. Most works, regardless of how they compensate for motion between frames, be it flow-based warping or deformable convolution/attention, use the default choice of bilinear interpolation for re-sampling. However, bilinear interpolation acts effectively as a low-pass filter and thus hinders the aim of recovering high-frequency content for super-resolution. This paper studies the impact of re-sampling on alignment for video super-resolution. Extensive experiments reveal that for alignment to be effective, the re-sampling should preserve the original sharpness of the features and prevent distortions. From these observations, we propose an implicit alignment method that re-samples through a window-based cross-attention with sampling positions encoded by sinusoidal positional encoding. The re-sampling is implicitly computed by learned network weights. Experiments show that the proposed implicit alignment enhances the performance of state-of-the-art frameworks with minimal impact on both synthetic and real-world datasets.

翻訳日:2023-05-02 16:39:42 公開日:2023-04-29
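The claim in the abstract above that bilinear interpolation acts as a low-pass filter can be checked with a one-dimensional analogue. The sketch below is only an illustration of that single observation, not the implicit alignment method from the paper; `linear_resample` is a hypothetical helper. A signal alternating between +1 and -1 holds the highest frequency a pixel grid can represent, and re-sampling it at half-pixel offsets with linear interpolation averages each neighbouring pair and wipes the pattern out.

```python
import math


def linear_resample(signal, shift):
    """Sample `signal` at positions i + shift (0 <= shift < 1) by linear interpolation.

    This is the 1D counterpart of bilinear re-sampling; the last sample is
    dropped so that every query position has a right-hand neighbour.
    """
    out = []
    for i in range(len(signal) - 1):
        pos = i + shift
        left = math.floor(pos)
        frac = pos - left
        right = min(left + 1, len(signal) - 1)
        out.append((1.0 - frac) * signal[left] + frac * signal[right])
    return out


high_frequency = [1, -1, 1, -1, 1, -1]
unshifted = linear_resample(high_frequency, 0.0)   # pattern preserved
half_pixel = linear_resample(high_frequency, 0.5)  # every output is 0.0
```

With zero shift the alternating pattern survives, while a half-pixel shift flattens it to zeros. This loss of fine detail is what the abstract argues works against recovering high-frequency content in super-resolution.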

# beyond prediction:不均一グラフに基づくリストワイズランキングを用いた路上駐車推薦

Beyond Prediction: On-street Parking Recommendation using Heterogeneous Graph-based List-wise Ranking ( http://arxiv.org/abs/2305.00162v1 )

ライセンス: Link先を確認

Hanyu Sun, Xiao Huang, Wei Ma

(参考訳) リアルタイムの駐車情報を提供するため、既存の研究は、ドライバーの走行時間を節約するための間接的なアプローチであるパーキング可用性の予測に重点を置いている。本稿では,運転者に直接駐車スペースを推薦するために,路上駐車推奨(opr)タスクを初めて提案する。この目的のために、OPR-LTRと呼ばれるLearning-to-rank(LTR)ベースのOPRモデルを構築している。具体的には、駐車勧告は、各駐車空間の「転倒イベント」と密接に関連しているため、ESGraphと呼ばれる高効率な異種グラフを設計し、歴史的かつリアルタイムなメータの転倒イベントと地理的関係を表現し、その後、畳み込みに基づくイベント列グラフネットワークを用いて異種グラフの表現を集約・更新する。ランキングモデルはさらに、特定の路上駐車クエリに対してランク付けされた駐車スポットのリストを推奨するスコア関数を学習するために利用される。この方法は、香港とサンフランシスコの路上駐車メーターデータを用いて検証される。予測のみと予測を推奨する2種類の手法を比較することにより,提案手法は様々な指標において良好な性能を実現する。大規模な実験により、提案したESGraphとレコメンデーションモデルは、計算効率の面でより効率的であり、ドライバーの路上駐車時間を節約できることを示した。

To provide real-time parking information, existing studies focus on predicting parking availability, which seems to be an indirect approach to saving drivers' cruising time. In this paper, we propose, for the first time, an on-street parking recommendation (OPR) task that directly recommends a parking space to a driver. To this end, a learning-to-rank (LTR) based OPR model called OPR-LTR is built. Specifically, parking recommendation is closely related to the "turnover events" (state switching between occupied and vacant) of each parking space, and hence we design a highly efficient heterogeneous graph called ESGraph to represent historical and real-time meters' turnover events as well as geographical relations; afterward, a convolution-based event-then-graph network is used to aggregate and update representations of the heterogeneous graph. A ranking model is further utilized to learn a score function that helps recommend a list of ranked parking spots for a specific on-street parking query. The method is verified using the on-street parking meter data in Hong Kong and San Francisco. By comparing with two other types of methods, prediction-only and prediction-then-recommendation, the proposed direct-recommendation method achieves satisfactory performance on different metrics. Extensive experiments also demonstrate that the proposed ESGraph and the recommendation model are computationally more efficient and save drivers' on-street parking time.

翻訳日:2023-05-02 16:39:20 公開日:2023-04-29

# LiDAR投影画像によるセンサの等価性

Sensor Equivariance by LiDAR Projection Images ( http://arxiv.org/abs/2305.00221v1 )

ライセンス: Link先を確認

Hannes Reichert, Manuel Hetzel, Steven Schreck, Konrad Doll, and Bernhard Sick

(参考訳) 本研究では,関連した投影特性を符号化した追加チャネルによる従来の画像データの拡張を提案する。このことは、LiDARのような射影型センサーにおけるセンサ依存のオブジェクト表現の問題に対処し、センサの解像度や視野の変化による物理的および幾何学的性質の歪みを引き起こす可能性がある。そこで我々は,このデータをインスタンスセグメンテーションフレームワークで処理するためのアーキテクチャを提案する。我々は、機械ビジョンタスクと高度自動運転(HAD)のためのキーセンサーモダリティとして、特にLiDARに焦点を当てる。制御された合成環境における実験的な設定により,センサ解像度と視野のバイアスを同定し,提案手法がlidarインスタンスのセグメンテーションにおけるバイアスを低減できることを実証する。さらに,カメラなどの他の投影型センサにも適用可能な手法を定義した。透明性を促進するため、コードとデータセットを公開しています。本手法は,プロジェクションベースセンサを用いた各種マシンビジョンタスクの性能向上とロバスト性向上の可能性を示す。

In this work, we propose an extension of conventional image data by an additional channel in which the associated projection properties are encoded. This addresses the issue of sensor-dependent object representation in projection-based sensors, such as LiDAR, which can lead to distorted physical and geometric properties due to variations in sensor resolution and field of view. To that end, we propose an architecture for processing this data in an instance segmentation framework. We focus specifically on LiDAR as a key sensor modality for machine vision tasks and highly automated driving (HAD). Through an experimental setup in a controlled synthetic environment, we identify a bias on sensor resolution and field of view and demonstrate that our proposed method can reduce said bias for the task of LiDAR instance segmentation. Furthermore, we define our method such that it can be applied to other projection-based sensors, such as cameras. To promote transparency, we make our code and dataset publicly available. This method shows the potential to improve performance and robustness in various machine vision tasks that utilize projection-based sensors.

翻訳日:2023-05-02 16:32:17 公開日:2023-04-29

# リラクシド強制選択は視覚品質評価法の性能を向上させる

Relaxed forced choice improves performance of visual quality assessment methods ( http://arxiv.org/abs/2305.00220v1 )

ライセンス: Link先を確認

Mohsen Jenadeleh, Johannes Zagermann, Harald Reiterer, Ulf-Dietrich Reips, Raouf Hamzaoui, Dietmar Saupe

(参考訳) 画像品質評価において、多数の被験者の個人評価から画像又は映像の集合的視覚品質スコアを得る。これらの実験でよく使われる形式は、二肢強制選択法である。同じ内容だが視覚品質の異なる2つの刺激を順次または並べて提示する。被験者は、より良い品質の1つを選択するように求められ、不確かな場合は、推測する必要がある。緩和された強制選択フォーマットは、第3の応答オプション、すなわち「not sure」を提供することによって、推測による認知的負荷と応答のノイズを低減することを目的としている。この研究は、これらの2つのレスポンスフォーマット、すなわち「not sure」オプションがあるものとないものを比較するために、大規模で包括的なクラウドソーシング実験を提示している。品質評価のための明確な基礎的真理を提供するため、被験者には点の数が異なる画像のペアを示し、より多くの点を持つものを選ぶように毎回求めた。クラウドソーシング研究には254人の参加者が参加し,被験者内デザインを用いて実施した。各被験者は,「not sure」応答オプションがある場合とない場合の両方で40対の比較に回答するよう求められ,各テスト条件に対する認知負荷を評価するためのアンケートに回答した。実験結果から,強制選択法に「not sure」応答オプションを組み込むことで,心理的負荷が減少し,データ適合性が向上し,基礎的真理によりよく対応するモデルが得られた。また、モデルの等価性をテストした結果、それらが異なることがわかった。データセットはhttp://database.mmsp-kn.de/cogvqa-database.htmlで利用可能である。

In image quality assessment, a collective visual quality score for an image or video is obtained from the individual ratings of many subjects. One commonly used format for these experiments is the two-alternative forced choice method. Two stimuli with the same content but differing visual quality are presented sequentially or side-by-side. Subjects are asked to select the one of better quality, and when uncertain, they are required to guess. The relaxed alternative forced choice format aims to reduce the cognitive load and the noise in the responses due to the guessing by providing a third response option, namely, "not sure". This work presents a large and comprehensive crowdsourcing experiment to compare these two response formats: the one with the "not sure" option and the one without it. To provide unambiguous ground truth for quality evaluation, subjects were shown pairs of images with differing numbers of dots and asked each time to choose the one with more dots. Our crowdsourcing study involved 254 participants and was conducted using a within-subject design. Each participant was asked to respond to 40 pair comparisons with and without the "not sure" response option and completed a questionnaire to evaluate their cognitive load for each testing condition. The experimental results show that the inclusion of the "not sure" response option in the forced choice method reduced mental load and led to models with better data fit and correspondence to ground truth. We also tested for the equivalence of the models and found that they were different. The dataset is available at http://database.mmsp-kn.de/cogvqa-database.html.

翻訳日:2023-05-02 16:32:00 公開日:2023-04-29

# 非ネイティブ話者の割合が言語複雑性に与える影響の証拠はまだない -- Kauhanen, Einhaus & Walkden (2023)に対する回答

Still no evidence for an effect of the proportion of non-native speakers on language complexity -- A response to Kauhanen, Einhaus & Walkden (2023) ( http://arxiv.org/abs/2305.00217v1 )

ライセンス: Link先を確認

Alexander Koplenig

(参考訳) Journal of Language Evolutionに掲載された最近の論文で、Kauhanen, Einhaus & Walkden (https://doi.org/10.1093/jole/lzad005, KEW)は、私の論文の1つ(Koplenig, Royal Society Open Science 6, 181274 (2019), https://doi.org/10.1098/rsos.181274)で示された結果に異議を唱えました。この目的のために、Ethnologueが言語ステータスを評価する方法に注目します。L1(第一言語)話者が使用することに加えて、かなりの数のL2ユーザを持つ必要がある場合、言語はvehicularとして特徴づけられます。 KEWは、言語がかなりの数のL2ユーザを持つかどうかを示す(バイナリ)指標として、そしてその比率の直接推定が不可能なときに、L2話者の0パーセントを非車種言語に出力するという考え方の両方を批判している。出版後論評の重要性は認識していますが,本論では両論点が明記され,私の論文で分析されていることを示します。さらに、KEWが提起した他の点についてもコメントし、KEWが提供する代替分析も、より精査に至らないことを実証します。

In a recent paper published in the Journal of Language Evolution, Kauhanen, Einhaus & Walkden (https://doi.org/10.1093/jole/lzad005, KEW) challenge the results presented in one of my papers (Koplenig, Royal Society Open Science, 6, 181274 (2019), https://doi.org/10.1098/rsos.181274), in which I tried to show through a series of statistical analyses that large numbers of L2 (second language) speakers do not seem to affect the (grammatical or statistical) complexity of a language. To this end, I focus on the way in which the Ethnologue assesses language status: a language is characterised as vehicular if, in addition to being used by L1 (first language) speakers, it should also have a significant number of L2 users. KEW criticise both the use of vehicularity as a (binary) indicator of whether a language has a significant number of L2 users and the idea of imputing a zero proportion of L2 speakers to non-vehicular languages whenever a direct estimate of that proportion is unavailable. While I recognise the importance of post-publication commentary on published research, I show in this rejoinder that both points of criticism are explicitly mentioned and analysed in my paper. In addition, I also comment on other points raised by KEW and demonstrate that both alternative analyses offered by KEW do not stand up to closer scrutiny.

翻訳日:2023-05-02 16:31:35 公開日:2023-04-29

# 実時間交流/dcパワーフロー解析のための物理誘導グラフニューラルネットワーク

Physics-Guided Graph Neural Networks for Real-time AC/DC Power Flow Analysis ( http://arxiv.org/abs/2305.00216v1 )

ライセンス: Link先を確認

Mei Yang, Gao Qiu, Yong Wu, Junyong Liu, Nina Dai, Yue Shui, Kai Liu, Lijie Ding

(参考訳) 交流電流と直流(AC/DC)ハイブリッドシステムの増大は、これまで以上に高速な電力フロー解析ツールを必要とする。本稿では,物理誘導型グラフニューラルネットワーク(PG-GNN)を提案する。 PG-GNNのトポロジ適応性を高めるために,まずACグリッドとDCグリッドの調整グラフモデリングを行う。データから信頼性の低いエミュレーションを推定するために、AC/DC物理は二重性を用いてPG-GNNに埋め込まれる。拡張されたラグランジアン法に基づく学習スキームが提示され、PG-GNNが非凸パターンを教師なしラベルフリーで学習するのに役立つ。マルチPG-GNNは、最終的に様々なDC制御モードをマスターするために実行される。ケーススタディでは、他の7つのデータ駆動型ライバルと比較して、提案手法はモデルベースベンチマークの性能と一致し、計算効率も10倍以上に向上している。

The increasing scale of alternating current and direct current (AC/DC) hybrid systems necessitates a faster power flow analysis tool than ever. This letter thus proposes a specific physics-guided graph neural network (PG-GNN). Tailored graph modelling of AC and DC grids is first introduced to enhance the topology adaptability of the PG-GNN. To avoid unreliable experience emulation from data, AC/DC physics are embedded in the PG-GNN using duality. An augmented Lagrangian method-based learning scheme is then presented to help the PG-GNN better learn nonconvex patterns in an unsupervised, label-free manner. A multi-PG-GNN scheme is finally used to handle varied DC control modes. A case study shows that, relative to 7 other data-driven rivals, only the proposed method matches the performance of the model-based benchmark, while also beating it in computational efficiency by more than 10 times.
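For readers unfamiliar with the augmented Lagrangian method mentioned in the abstract, the following is a minimal sketch on a toy problem. It is not the PG-GNN training scheme: the objective, the constraint, and all parameter values are assumptions chosen so that the answer is known in closed form (minimise x^2 + y^2 subject to x + y = 1, whose solution is x = y = 0.5 with multiplier -1).

```python
def augmented_lagrangian_demo(rho=10.0, outer_iters=10, inner_iters=200, step=0.04):
    """Minimise x^2 + y^2 subject to x + y - 1 = 0.

    Augmented Lagrangian: L = x^2 + y^2 + lam * h + (rho / 2) * h^2,
    with h = x + y - 1. Each outer iteration approximately minimises L
    over (x, y) by gradient descent, then updates the multiplier.
    """
    x, y, lam = 0.0, 0.0, 0.0
    for _ in range(outer_iters):
        # Inner loop: plain gradient descent on L for a fixed multiplier.
        for _ in range(inner_iters):
            h = x + y - 1.0
            grad_x = 2.0 * x + lam + rho * h
            grad_y = 2.0 * y + lam + rho * h
            x -= step * grad_x
            y -= step * grad_y
        # Multiplier update: move lam along the remaining constraint violation.
        lam += rho * (x + y - 1.0)
    return x, y, lam


if __name__ == "__main__":
    print(augmented_lagrangian_demo())
```

The step size is kept below 2 / (2 + 2 * rho) so that the inner gradient descent is stable for this quadratic; a larger penalty `rho` would require a smaller step.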

翻訳日:2023-05-02 16:30:55 公開日:2023-04-29

# EBLIME: ベイズ局所解釈型モデル非依存的説明

EBLIME: Enhanced Bayesian Local Interpretable Model-agnostic Explanations ( http://arxiv.org/abs/2305.00213v1 )

ライセンス: Link先を確認

Yuhao Zhong, Anirban Bhattacharya, Satish Bukkapatnam

(参考訳) ブラックボックス機械学習モデルの説明とベイジアンリッジ回帰モデルを用いた特徴量の分布を求めるため,EBLIMEを提案する。ベイズフレームワークの数学的表現とリッジパラメータの意義を含む理論的結果を提供する。ケーススタディは、ベンチマークデータセットと、製造製品の内部欠陥を見つけるための実世界の工業的応用に基づいて行われた。最新の手法と比較して、eblimeはより直感的で正確な結果を得ることができ、後方分布、信頼できる間隔、特徴重要性のランキングといった点でより不確実性が定量化される。

We propose EBLIME to explain black-box machine learning models and obtain the distribution of feature importance using Bayesian ridge regression models. We provide mathematical expressions of the Bayesian framework and theoretical outcomes including the significance of ridge parameter. Case studies were conducted on benchmark datasets and a real-world industrial application of locating internal defects in manufactured products. Compared to the state-of-the-art methods, EBLIME yields more intuitive and accurate results, with better uncertainty quantification in terms of deriving the posterior distribution, credible intervals, and rankings of the feature importance.
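The general idea of a Bayesian local surrogate can be sketched as follows. This is not the authors' EBLIME algorithm: the black-box function, the two-feature restriction, the prior precision `alpha`, and the noise precision `beta` are all assumptions made for illustration. The sketch perturbs an instance, fits a Bayesian ridge regression in closed form, and reads feature importances and 95% credible intervals off the Gaussian posterior.

```python
import math
import random


def black_box(x):
    # Hypothetical stand-in for an opaque model (assumption for this sketch).
    return 3.0 * x[0] - 2.0 * x[1]


def bayesian_ridge_2d(Z, y, alpha=1.0, beta=100.0):
    """Posterior N(mean, cov) of weights w in y ~ Z w for two features.

    Prior: w ~ N(0, I / alpha). Noise precision: beta.
    Posterior precision: A = alpha * I + beta * Z^T Z, cov = A^{-1},
    mean = beta * cov * Z^T y.
    """
    a11 = alpha + beta * sum(z[0] * z[0] for z in Z)
    a12 = beta * sum(z[0] * z[1] for z in Z)
    a22 = alpha + beta * sum(z[1] * z[1] for z in Z)
    det = a11 * a22 - a12 * a12
    cov = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    b0 = beta * sum(z[0] * t for z, t in zip(Z, y))
    b1 = beta * sum(z[1] * t for z, t in zip(Z, y))
    mean = [cov[0][0] * b0 + cov[0][1] * b1,
            cov[1][0] * b0 + cov[1][1] * b1]
    return mean, cov


def explain_locally(x0, n_samples=200, scale=0.1, seed=0):
    """Fit the surrogate on Gaussian perturbations around x0."""
    rng = random.Random(seed)
    Z = [[rng.gauss(0.0, scale), rng.gauss(0.0, scale)] for _ in range(n_samples)]
    f0 = black_box(x0)
    # Regress the change in model output on the perturbation itself.
    y = [black_box([x0[0] + z[0], x0[1] + z[1]]) - f0 for z in Z]
    mean, cov = bayesian_ridge_2d(Z, y)
    intervals = []
    for i, m in enumerate(mean):
        half_width = 1.96 * math.sqrt(cov[i][i])
        intervals.append((m - half_width, m + half_width))
    return mean, intervals


if __name__ == "__main__":
    importances, credible = explain_locally([1.0, 2.0])
    print(importances, credible)
```

Because the stand-in model is linear, the posterior mean should land close to its true coefficients, with a small shrinkage toward zero caused by the ridge prior.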

翻訳日:2023-05-02 16:30:40 公開日:2023-04-29

# ShipHullGAN:Deep Convolutional Generative Modelを用いた船体設計のための汎用パラメトリックモデル

ShipHullGAN: A generic parametric modeller for ship hull design using deep convolutional generative model ( http://arxiv.org/abs/2305.00210v1 )

ライセンス: Link先を確認

Shahroz Khan, Kosa Goucher-Lambert, Konstantinos Kostas, Panagiotis Kaklis

(参考訳) 本稿では,船殻の汎用表現と生成のために,深部畳み込み生成逆数ネットワーク(GAN)を用いて構築された汎用パラメトリック・モデルラーであるShipHullGANを紹介する。高いレベルでは、新しいモデルはパラメトリックな船の設計パラダイムにおける現在の保守性に対処することを目的としており、パラメトリックなモデラーは特定の船種しか扱えない。 shiphullganを52,591 \textit{physically validated}の設計で訓練し、コンテナ船、タンカー、ばら積み貨物船、タグボート、乗組員の補給船など、さまざまな船種から設計した。我々は、全てのトレーニングデザインを同じ解像度の共通幾何学的表現に変換するための新しい形状抽出と表現戦略を開発した。スペース充填層はジェネレータコンポーネントの直後に置かれ、トレーニングされたジェネレータがすべての設計クラスをカバーできることを保証する。トレーニング中の設計は、幾何学的モーメントを用いてコンパクトな幾何学的表現を利用する形状変化テンソル(SST)の形で提供される。我々は,ShipHullGANが拡張された特徴を持つデザインを生成できるという広範な研究と最適化事例を通じて,幾何学的に有効かつ実用的な形状の伝統的かつ斬新なデザインを創出する多目的デザイン空間を提示した。

In this work, we introduce ShipHullGAN, a generic parametric modeller built using deep convolutional generative adversarial networks (GANs) for the versatile representation and generation of ship hulls. At a high level, the new model intends to address the current conservatism in the parametric ship design paradigm, where parametric modellers can only handle a particular ship type. We trained ShipHullGAN on a large dataset of 52,591 \textit{physically validated} designs from a wide range of existing ship types, including container ships, tankers, bulk carriers, tugboats, and crew supply vessels. We developed a new shape extraction and representation strategy to convert all training designs into a common geometric representation of the same resolution, as typically GANs can only accept vectors of fixed dimension as input. A space-filling layer is placed right after the generator component to ensure that the trained generator can cover all design classes. During training, designs are provided in the form of a shape-signature tensor (SST) which harnesses the compact geometric representation using geometric moments that further enable the inexpensive incorporation of physics-informed elements in ship design. We have shown through extensive comparative studies and optimisation cases that ShipHullGAN can generate designs with augmented features resulting in versatile design spaces that produce traditional and novel designs with geometrically valid and practically feasible shapes.

翻訳日:2023-05-02 16:30:29 公開日:2023-04-29

# bi-rnnネットワークを用いた高移動度通信における深層学習に基づくチャネル推定

Deep Learning Based Channel Estimation in High Mobility Communications Using Bi-RNN Networks ( http://arxiv.org/abs/2305.00208v1 )

ライセンス: Link先を確認

Abdul Karim Gizzini, Marwa Chafii

(参考訳) 二重選択チャネル推定は、無線システムにおける通信信頼性を保証する重要な要素である。動的環境におけるマルチパス伝搬とドップラー干渉の影響により,2重選択チャネル推定が困難となる。従来のチャネル推定手法は、限られた訓練パイロットの使用により、高移動度シナリオにおける性能劣化に遭遇する。近年,畳み込みニューラルネットワーク(CNN)ネットワークを用いたフレーム・バイ・フレーム(FBF)チャネル推定において,深層学習(DL)を二重選択チャネル推定に利用している。しかし、cnnベースの推定器は高い複雑さを必要とし、実際のシナリオでは実用的でない。この目的のために,2重選択チャネルを正確に推定する最適化された双方向リカレントニューラルネットワーク (Bi-RNN) を用いたチャネル推定器を提案することにより,この問題を克服する。提案手法は,ゲートリカレントユニット(GRU)ユニットを用いてエンドツーエンドの補間を行う。広範な数値実験により、開発されたbi-gru推定器は、最近提案されたcnnベースの推定器を異なる移動シナリオで大幅に上回っていることが示され、計算の複雑さは大幅に減少する。

Doubly-selective channel estimation represents a key element in ensuring communication reliability in wireless systems. Due to the impact of multi-path propagation and Doppler interference in dynamic environments, doubly-selective channel estimation becomes challenging. Conventional channel estimation schemes encounter performance degradation in high mobility scenarios due to the usage of limited training pilots. Recently, deep learning (DL) has been utilized for doubly-selective channel estimation, where convolutional neural networks (CNNs) are employed in frame-by-frame (FBF) channel estimation. However, CNN-based estimators incur high complexity, making them impractical in real-case scenarios. To overcome this issue, we propose an optimized and robust bi-directional recurrent neural network (Bi-RNN) based channel estimator that accurately estimates the doubly-selective channel, especially in high mobility scenarios. The proposed estimator performs end-to-end interpolation using gated recurrent units (GRUs). Extensive numerical experiments demonstrate that the developed Bi-GRU estimator significantly outperforms the recently proposed CNN-based estimators in different mobility scenarios, while substantially reducing the overall computational complexity.

翻訳日:2023-05-02 16:30:01 公開日:2023-04-29

# CARLA-BSP:歩行者によるシミュレーションデータセット

CARLA-BSP: a simulated dataset with pedestrians ( http://arxiv.org/abs/2305.00204v1 )

ライセンス: Link先を確認

Maciej Wielgosz and Antonio M. López and Muhammad Naveed Riaz

(参考訳) 本稿では,CARLA (0.9.13) で新たにデータセットを生成するARCANEフレームワークを用いて,歩行者を特徴付けるサンプルデータセットを提案する。歩行者検出,自動符号化,ポーズ推定,ポーズリフトのユースケースを提供する。ベースラインの結果も紹介します。詳細はhttps://project-arcane.eu/を参照。

We present a sample dataset featuring pedestrians generated using the ARCANE framework, a new framework for generating datasets in CARLA (0.9.13). We provide use cases for pedestrian detection, autoencoding, pose estimation, and pose lifting. We also showcase baseline results. For more information, visit https://project-arcane.eu/.

翻訳日:2023-05-02 16:29:41 公開日:2023-04-29

# インストラクション-ViT:ViTにおけるインストラクション学習のためのマルチモーダルプロンプト

Instruction-ViT: Multi-Modal Prompts for Instruction Learning in ViT ( http://arxiv.org/abs/2305.00201v1 )

ライセンス: Link先を確認

Zhenxiang Xiao, Yuzhong Chen, Lu Zhang, Junjie Yao, Zihao Wu, Xiaowei Yu, Yi Pan, Lin Zhao, Chong Ma, Xinyu Liu, Wei Liu, Xiang Li, Yixuan Yuan, Dinggang Shen, Dajiang Zhu, Tianming Liu, Xi Jiang

(参考訳) プロンプトは大規模言語モデルにおいて重要な役割を果たすことが証明されており、近年では複数の下流タスクのスケーラビリティ向上のためにプロンプトも使用されている。本稿では、インストラクション-ViTと呼ばれる画像分類のための視覚変換器モデルに、命令チューニングに基づくプロンプト設計を適用することに焦点を当てる。キーとなるアイデアは、カテゴリ情報に関連するマルチモーダルプロンプト(テキストまたは画像プロンプト)を実装し、モデルの微調整を導くことである。いくつかの画像キャプションタスクの実験に基づいて、性能とドメイン適応性を改善した。我々の研究は、視覚分類モデルの性能と適応性を向上したマルチモーダルプロンプトを融合する革新的な戦略を提供した。

Prompts have been proven to play a crucial role in large language models, and in recent years, vision models have also been using prompts to improve scalability for multiple downstream tasks. In this paper, we focus on adapting prompt design based on instruction tuning into a visual transformer model for image classification, which we call Instruction-ViT. The key idea is to implement multi-modal prompts (text or image prompts) related to category information to guide the fine-tuning of the model. Based on experiments on several image captioning tasks, the performance and domain adaptability were improved. Our work provides an innovative strategy to fuse multi-modal prompts with better performance and faster adaptability for visual classification models.

翻訳日:2023-05-02 16:29:36 公開日:2023-04-29

# 新型コロナウイルスパンデミックによる中国の労働市場動態の大規模評価

Large-Scale Assessment of Labour Market Dynamics in China during the COVID-19 Pandemic ( http://arxiv.org/abs/2305.00199v1 )

ライセンス: Link先を確認

Ying Sun, Hengshu Zhu, Hui Xiong

(参考訳) 新型コロナウイルス(COVID-19)のパンデミックが中国の労働市場に前例のない影響を与え、さまざまな地域での労働供給と需要の構造を大きく変えた。政策立案者は、ポストパンデミック労働市場の新たなダイナミクスを理解し、地域経済の持続可能な発展を支援する適切な政策を提供することが重要となる。そこで本稿では,大規模オンライン求人情報検索と求人情報投稿による地域労働市場の変動動態の評価と理解を目的とした,データ駆動型アプローチを提案する。特に、地域労働市場の魅力を反映した、労働の流れと労働需要の空間的・時間的パターンをモデル化する。分析の結果,地域労働市場は劇的な変化に悩まされ,パンデミック時の回復の兆候がみられた。具体的には、大都市から小都市へ、南北地方へ移住する傾向から、労働フローの意図が急速に回復した。一方、パンデミックにより、ブルーカラー労働者の需要はホワイトカラー労働者に比べて大幅に減少した。また、青カラー雇用の需要構造も製造業からサービス産業へと変化した。以上の結果から,パンデミックは労働需要の異なる地域や規制政策に様々な影響を及ぼす可能性が示唆された。この分析は、パンデミックのような極端なイベント中の雇用市場の変化に直面する個人と組織の両方にタイムリーな情報を提供する。また、地方経済の持続的な発展を促進する上で、雇用市場に対する適切な政策の提供を政府も支援できる。

The outbreak of the COVID-19 pandemic has had an unprecedented impact on China's labour market, and has largely changed the structure of labour supply and demand in different regions. It becomes critical for policy makers to understand the emerging dynamics of the post-pandemic labour market and provide the right policies for supporting the sustainable development of regional economies. To this end, in this paper, we provide a data-driven approach to assess and understand the evolving dynamics in regions' labour markets with large-scale online job search queries and job postings. In particular, we model the spatial-temporal patterns of labour flow and labour demand which reflect the attractiveness of regional labour markets. Our analysis shows that regional labour markets suffered from dramatic changes and demonstrated unusual signs of recovery during the pandemic. Specifically, the intention of labour flow quickly recovered with a trend of migrating from large to small cities and from northern to southern regions, respectively. Meanwhile, due to the pandemic, the demand of blue-collar workers has been substantially reduced compared to that of white-collar workers. In addition, the demand structure of blue-collar jobs also changed from manufacturing to service industries. Our findings reveal that the pandemic can cause varied impacts on regions with different structures of labour demand and control policies. This analysis provides timely information for both individuals and organizations in confronting the dynamic change in job markets during the extreme events, such as pandemics. Also, the governments can be better assisted for providing the right policies on job markets in facilitating the sustainable development of regions' economies.

翻訳日:2023-05-02 16:29:23 公開日:2023-04-29

# 逆中散乱問題に対する直接サンプリングに基づく深層学習手法

A Direct Sampling-Based Deep Learning Approach for Inverse Medium Scattering Problems ( http://arxiv.org/abs/2305.00250v1 )

ライセンス: Link先を確認

Jianfeng Ning, Fuqun Han and Jun Zou

(参考訳) 本研究では,計測された散乱データに基づいて未知の散乱器を回収することを目的とした逆媒質散乱問題(imsp)に着目する。 23]で導入された効率的な直接サンプリング法(dsm)に動機づけられ,不均質な散乱器を再構成する新しい直接サンプリング型深層学習法(dsm-dl)を提案する。特に、u-netニューラルネットワークを用いて、インデックス関数と真のコントラストの関係を学習する。提案するdsm-dlは, 計算効率が高く, 雑音に頑健であり, 実装が容易であり, 高品質な再構築を実現するために複数の計測データを自然に組み込むことができる。提案手法の性能を評価するため, 各種入射波数, 騒音レベルの異なる代表実験を行った。その結果,深層学習技術とDSM for IMSPの併用による有望なメリットが示された。

In this work, we focus on the inverse medium scattering problem (IMSP), which aims to recover unknown scatterers based on measured scattered data. Motivated by the efficient direct sampling method (DSM) introduced in [23], we propose a novel direct sampling-based deep learning approach (DSM-DL) for reconstructing inhomogeneous scatterers. In particular, we use the U-Net neural network to learn the relation between the index functions and the true contrasts. Our proposed DSM-DL is computationally efficient, robust to noise, easy to implement, and able to naturally incorporate multiple measured data to achieve high-quality reconstructions. Some representative tests are carried out with varying numbers of incident waves and different noise levels to evaluate the performance of the proposed method. The results demonstrate the promising benefits of combining deep learning techniques with the DSM for IMSP.

翻訳日:2023-05-02 16:23:29 公開日:2023-04-29

# 自由生活環境におけるパーキンソン震検出改善のための複数事例学習問題における非競合データの活用

Leveraging Unlabelled Data in Multiple-Instance Learning Problems for Improved Detection of Parkinsonian Tremor in Free-Living Conditions ( http://arxiv.org/abs/2305.00249v1 )

ライセンス: Link先を確認

Alexandros Papadopoulos, Anastasios Delopoulos

(参考訳) パーキンソン病とその運動症状を遠隔で検出するためのデータ駆動アプローチは、早期診断の潜在的な臨床効果のために近年普及している。このようなアプローチの聖杯は、データが日々の生活の中で継続的に無害に収集される自由生活のシナリオである。しかし, 微粒な接地構造と, 残りは邪魔にならないことが矛盾しているため, マルチスタンス学習によって問題に対処することが普通である。しかし、大規模研究では、完全な神経学的評価が必要であるため、必要な粗い地面でも得ることは自明ではない。対照的に、根拠のない大規模なデータ収集はずっと簡単です。しかし、このトピックは研究の注目をほとんど受けていないため、複数インスタンス設定での非競合データの利用は簡単ではない。本稿では,このギャップを補うために,半教師付き学習と複数インスタンス学習を組み合わせた新しい手法を提案する。本手法は,通常の半教師あり学習における最先端のアプローチである仮想適応学習原理に基づいており,複数インスタンス設定に適応し,適切な修正を行う。まず,2つのよく知られたベンチマークデータセットから生成した合成問題に対する概念実証実験により,提案手法の有効性を検証した。次に, 完全にラベルが付かないデータが存在する場合, 手動加速度信号からpd振れを検知する実際のタスクに進む。その結果,454名の被験者の非ラベルデータを利用することで,震動が知られている45名のコーホートに対して,サブジェクト毎の震動検出において,高い性能向上(最大9%のf1-score増加)が達成できることがわかった。

Data-driven approaches for remote detection of Parkinson's Disease and its motor symptoms have proliferated in recent years, owing to the potential clinical benefits of early diagnosis. The holy grail of such approaches is the free-living scenario, in which data are collected continuously and unobtrusively during everyday life. However, obtaining fine-grained ground-truth and remaining unobtrusive is a contradiction and therefore, the problem is usually addressed via multiple-instance learning. Yet for large scale studies, obtaining even the necessary coarse ground-truth is not trivial, as a complete neurological evaluation is required. In contrast, large scale collection of data without any ground-truth is much easier. Nevertheless, utilizing unlabelled data in a multiple-instance setting is not straightforward, as the topic has received very little research attention. Here we try to fill this gap by introducing a new method for combining semi-supervised with multiple-instance learning. Our approach builds on the Virtual Adversarial Training principle, a state-of-the-art approach for regular semi-supervised learning, which we adapt and modify appropriately for the multiple-instance setting. We first establish the validity of the proposed approach through proof-of-concept experiments on synthetic problems generated from two well-known benchmark datasets. We then move on to the actual task of detecting PD tremor from hand acceleration signals collected in-the-wild, but in the presence of additional completely unlabelled data. We show that by leveraging the unlabelled data of 454 subjects we can achieve large performance gains (up to 9% increase in F1-score) in per-subject tremor detection for a cohort of 45 subjects with known tremor ground-truth.

翻訳日:2023-05-02 16:23:13 公開日:2023-04-29

# 新しい金融時系列事例表現を用いた産業分類

Industry Classification Using a Novel Financial Time-Series Case Representation ( http://arxiv.org/abs/2305.00245v1 )

ライセンス: Link先を確認

Rian Dolphin, Barry Smyth, Ruihai Dong

(参考訳) 金融分野は、予測、クラスタリング、分類など、さまざまなタスクにまたがる、機械学習の課題の肥大した源泉であることが証明されている。研究者は大量の時系列データにアクセスでき、微妙なパフォーマンス改善さえも大きな付加価値に変換できる。本研究では,この領域における重要な課題に対するケースベース推論の活用を,業界分類における過去の株価リターン時系列データを用いて検討する。本稿では,従来のケースベース推論手法において,時系列データが重要な表象的課題を呈する理由を考察し,それに対応するために,ストックリターン埋め込みに基づく新しい表現を提案し,生のストックリターンデータから容易に計算できることを示す。この表現は、事例に基づく推論に適しており、業界セクターの分類タスクに大規模な公開データセットを使用することで、従来の表現を用いた複数のベースラインのパフォーマンス向上を実証する。

The financial domain has proven to be a fertile source of challenging machine learning problems across a variety of tasks including prediction, clustering, and classification. Researchers can access an abundance of time-series data and even modest performance improvements can be translated into significant additional value. In this work, we consider the use of case-based reasoning for an important task in this domain, by using historical stock returns time-series data for industry sector classification. We discuss why time-series data can present some significant representational challenges for conventional case-based reasoning approaches, and in response, we propose a novel representation based on stock returns embeddings, which can be readily calculated from raw stock returns data. We argue that this representation is well suited to case-based reasoning and evaluate our approach using a large-scale public dataset for the industry sector classification task, demonstrating substantial performance improvements over several baselines using more conventional representations.

翻訳日:2023-05-02 16:22:37 公開日:2023-04-29

# 分割部分スキャンにおける深層学習に基づく3次元歯科メッシュ分割法の限界に関する批判的解析

A Critical Analysis of the Limitation of Deep Learning based 3D Dental Mesh Segmentation Methods in Segmenting Partial Scans ( http://arxiv.org/abs/2305.00244v1 )

ライセンス: Link先を確認

Ananya Jana, Aniruddha Maiti, Dimitris N. Metaxas

(参考訳) 口腔内スキャンによる歯のセグメンテーションは歯科医療において重要な要素である。多くのDeep Learningベースの歯のセグメンテーションアルゴリズムが開発されている。ほとんどの場合、高い精度が達成されているが、利用可能な歯のセグメンテーション技術のほとんどは、全顎モデルの暗黙的な制限的な仮定をしており、全顎モデルに基づいて精度を報告している。しかし、医学的には歯の完全なスキャンは必要とせず、あるいは使用できない場合もある。この実践的な問題を考えると、現在広く使われているDeep Learningベースの歯のセグメンテーション技術の堅牢性を理解することが重要である。そこで本研究では, 部分的口腔内スキャンに利用可能なセグメント化手法を適用し, 利用可能な深層学習技術が大幅に低下していることを発見した。この研究で示された分析と比較は、問題の深刻さを理解するのに役立ち、完全な顎モデルを強く仮定することなく、頑健な歯のセグメンテーション技術の開発を可能にする。

Tooth segmentation from intraoral scans is a crucial part of digital dentistry. Many deep learning based tooth segmentation algorithms have been developed for this task. In most cases, high accuracy has been achieved; however, most of the available tooth segmentation techniques make an implicit restrictive assumption of a full jaw model, and they report accuracy based on full jaw models. Medically, however, in certain cases a full jaw tooth scan is not required or may not be available. Given this practical issue, it is important to understand the robustness of currently available, widely used deep learning based tooth segmentation techniques. For this purpose, we applied available segmentation techniques to partial intraoral scans and discovered that the available deep learning techniques under-perform drastically. The analysis and comparison presented in this work would help us understand the severity of the problem and allow us to develop robust tooth segmentation techniques without a strong assumption of a full jaw model.

翻訳日:2023-05-02 16:22:23 公開日:2023-04-29

# ディープラーニングが多面体理論を満たすとき:調査

When Deep Learning Meets Polyhedral Theory: A Survey ( http://arxiv.org/abs/2305.00241v1 )

ライセンス: Link先を確認

Joey Huchette, Gonzalo Muñoz, Thiago Serra, Calvin Tsay

(参考訳) 過去10年間、コンピュータビジョンや自然言語処理といったタスクにおけるディープニューラルネットワークの驚くべき精度のおかげで、ディープラーニングは予測モデリングの一般的な方法論となった。一方、ニューラルネットワークの構造はより単純な表現に収束し、Rectified Linear Unit (ReLU) のような区分的定数関数と区分的線形関数がニューラルネットワークで最もよく使われるタイプのアクティベーション関数となった。これにより、ある種のネットワーク構造(一般的な完全連結フィードフォワードニューラルネットワークなど)が、多面体理論による解析や、線形計画法(LP)や混合整数線形計画法(MILP)といった手法の様々な目的への応用に適したものとなった。本稿では、この急速に進展する分野から生まれた主要なトピックを概観する。これらは、ニューラルネットワークのより詳細な理解と、ネットワークのサイズを訓練、検証、縮小するための線形最適化手法の適用に新たな視点をもたらす。

In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU), which became the most commonly used type of activation function in neural networks. That made certain types of network structure (such as the typical fully-connected feedforward neural network) amenable to analysis through polyhedral theory and to the application of methodologies such as Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this paper, we survey the main topics emerging from this fast-paced area of work, which bring a fresh perspective to understanding neural networks in more detail as well as to applying linear optimization techniques to train, verify, and reduce the size of such networks.
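A small sketch may help show why ReLU networks are amenable to MILP. A single unit y = max(0, x) with known bounds L <= x <= U (L < 0 < U) is commonly encoded with one binary variable z through the big-M constraints y >= x, y >= 0, y <= x - L(1 - z), y <= U z. The code below does not call a solver; it enumerates z and checks that the constraints leave exactly one feasible value for y. The function name and the bounds are assumptions for illustration.

```python
def relu_bigm_feasible_outputs(x, lower, upper):
    """Return (z, y_min, y_max) for each binary z that is feasible.

    Constraints checked for a given pre-activation x in [lower, upper]:
        y >= x, y >= 0, y <= x - lower * (1 - z), y <= upper * z
    """
    if not (lower < 0.0 < upper):
        raise ValueError("this sketch assumes lower < 0 < upper")
    if not (lower <= x <= upper):
        raise ValueError("x must lie within the given bounds")
    feasible = []
    for z in (0, 1):
        y_min = max(x, 0.0)                              # the two lower bounds on y
        y_max = min(x - lower * (1 - z), upper * z)      # the two upper bounds on y
        if y_min <= y_max:
            feasible.append((z, y_min, y_max))
    return feasible


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 1.5, 3.0):
        print(x, relu_bigm_feasible_outputs(x, lower=-2.0, upper=3.0))
```

For every x the feasible interval collapses to the single point max(0, x), which is why a network of such units can be handed to a MILP solver once bounds on each pre-activation are available.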

翻訳日:2023-05-02 16:22:06 公開日:2023-04-29

# 遺伝的アルゴリズムのFairy Tale

The FAIRy Tale of Genetic Algorithms ( http://arxiv.org/abs/2305.00238v1 )

ライセンス: Link先を確認

Fahad Maqbool, Muhammad Saad Razzaq, Hajira Jabeen

(参考訳) 遺伝的アルゴリズム(GA)は確率演算子を用いて最適な解を求めるメタヒューリスティック進化アルゴリズムであり、多くの複雑な最適化問題の解法(分類、最適化、スケジューリングなど)においてその効果が証明されている。しかし、その性能、人気、単純さにもかかわらず、GAの再現性と再利用性にはあまり注意が払われていない。本稿では,Finderable,Accessible,Interoperable and Reusable (FAIR)データ原則を拡張し,アルゴリズムの再現性と再利用性を実現する。提案原則の適用性を実証するためのユースケースとして,GAを選択しました。また, GAの方法論的展開と変種について概説し, 適切なソースの再現や発見を困難にしている。さらに、FAIRアルゴリズムを有効にするために、軽量RDFフォーマットを用いた語彙(例えば$evo$)を提案し、再現性を向上させる。 GAの確率的性質を考えると、この作業は多くの最適化や機械学習アルゴリズム/メソッドにまで拡張できる。

Genetic Algorithm (GA) is a popular meta-heuristic evolutionary algorithm that uses stochastic operators to find optimal solutions and has proved its effectiveness in solving many complex optimization problems (such as classification, optimization, and scheduling). However, despite its performance, popularity and simplicity, not much attention has been paid to the reproducibility and reusability of GA. In this paper, we have extended the Findable, Accessible, Interoperable and Reusable (FAIR) data principles to enable the reproducibility and reusability of algorithms. We have chosen GA as a use case to demonstrate the applicability of the proposed principles. We have also presented an overview of methodological developments and variants of GA that make it challenging to reproduce or even find the right source. Additionally, to enable FAIR algorithms, we propose a vocabulary (i.e. $evo$) using a lightweight RDF format, facilitating reproducibility. Given the stochastic nature of GAs, this work can be extended to numerous optimization and machine learning algorithms/methods.
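The reproducibility concern can be made concrete with a minimal GA whose stochastic operators all draw from one explicitly seeded random generator. This sketch is not the paper's $evo$ vocabulary or any FAIR tooling: the OneMax fitness function, the operator choices, and every parameter value are assumptions for illustration. The point is only that recording the seed and the parameters is enough to replay a run exactly.

```python
import random


def run_onemax_ga(seed, n_bits=20, pop_size=30, generations=40, mutation_rate=0.02):
    """Maximise the number of 1-bits in a bit string (OneMax).

    All randomness comes from a private generator seeded with `seed`,
    so the same arguments always produce the same result.
    """
    rng = random.Random(seed)
    fitness = sum  # fitness of a bit list is its count of ones
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)
        children = [elite[:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_bits)
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    print(run_onemax_ga(seed=1))
    print(run_onemax_ga(seed=1) == run_onemax_ga(seed=1))
```

Using a private `random.Random(seed)` instance rather than the module-level functions keeps the run independent of any other code that touches the global generator.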

翻訳日:2023-05-02 16:21:47 公開日:2023-04-29

# 教育, マーケティング, ソフトウェア工学, 医療におけるChatGPT応用の概観:利益, 欠点, 研究の方向性

A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions ( http://arxiv.org/abs/2305.00237v1 )

ライセンス: Link先を確認

Mohammad Fraiwan and Natheer Khasawneh

(参考訳) ChatGPTは、ディープラーニングアルゴリズムを使用して、テキストベースのプロンプトに対する人間的な応答を生成する人工知能言語モデルの一種である。 2022年11月に最新のchatgptバージョンが導入されたことで、産業コミュニティと学術コミュニティは、その強力な能力、多くの応用可能性、そして悪用の可能性に衝撃を与えた。この作品の執筆時点で、他のいくつかの言語モデル(google bardやmeta llamaなど)が、可能な限りの市場における足場を築こうと試みて登場した。これらのモデルには、コンピュータとの対話方法に革命を起こす能力があり、教育、ソフトウェア工学、医療、マーケティングなど、多くの分野に潜在的な応用がある。本稿では,これらの分野における高度な言語チャットボット(例えばchatgpt)を用いたアプリケーション,欠点,研究の方向性について述べる。まず、人工知能に基づく言語モデルの簡単な導入と開発スケジュールから始め、その後、そのようなモデルの応用の可能性について検討し、その後、現在の技術状況の限界と欠点について議論し、最後に、今後の研究の方向性を指摘する。

ChatGPT is a type of artificial intelligence language model that uses deep learning algorithms to generate human-like responses to text-based prompts. The introduction of the latest ChatGPT version in November of 2022 has caused shockwaves in the industrial and academic communities for its powerful capabilities, plethora of possible applications, and the great possibility for abuse. At the time of writing this work, several other language models (e.g., Google Bard and Meta LLaMA) just came out in an attempt to get a foothold in the vast possible market. These models have the ability to revolutionize the way we interact with computers and have potential applications in many fields, including education, software engineering, healthcare, and marketing. In this paper, we will discuss the possible applications, drawbacks, and research directions using advanced language Chatbots (e.g., ChatGPT) in each of these fields. We first start with a brief introduction and the development timeline of artificial intelligence based language models, then we go through possible applications of such models, after that we discuss the limitations and drawbacks of the current technological state of the art, and finally we point out future possible research directions.

翻訳日:2023-05-02 16:21:32 公開日:2023-04-29

# ベストプラクティスによる機械学習を目指して

Towards machine learning guided by best practices ( http://arxiv.org/abs/2305.00233v1 )

ライセンス: Link先を確認

Anamaria Mojica-Hanke

(参考訳) 現在、機械学習(ML)は、医学からソフトウェア工学(SE)まで、複数のアプリケーション分野を持つソフトウェアシステムで使われている。一方、業界におけるMLの人気は、その成長と普及を示す統計に見ることができる。一方、その人気は研究、特にseでも見られ、seの会議やジャーナルで複数の研究が公開されているだけでなく、ソフトウェア工学の会議において複数のワークショップや共催の会議でも取り上げられている。同時に、研究者や実践者は、機械学習には特定の課題や落とし穴があることを示した。特に、ML対応システムは従来のSEとは異なる開発プロセスを持つことが研究で示されている。特定された課題や落とし穴を軽減するために、白とグレーの文献は自身の経験に基づいて、ドメイン(例えばバイオメカニクス)に焦点を当てた一連の勧告を提案しているが、私たちの知る限りでは、seコミュニティに焦点を当てたガイドラインはない。本論文は,SE の視点による実践の集合を提示する以前の研究研究と,質問や回答などの実践の源泉を分析して,SE コミュニティの実践者や研究者が使用し,議論するプラクティスを理解するのに役立つ研究質問に答えることにより,このギャップを小さくすることを目的とする。

Nowadays, machine learning (ML) is being used in software systems with multiple application fields, from medicine to software engineering (SE). On the one hand, the popularity of ML in the industry can be seen in the statistics showing its growth and adoption. On the other hand, its popularity can also be seen in research, particularly in SE, where multiple studies have been published in SE conferences and journals, and where ML is also the subject of multiple workshops and co-located events at software engineering conferences. At the same time, researchers and practitioners have shown that machine learning has some particular challenges and pitfalls. In particular, research has shown that ML-enabled systems have a different development process than traditional SE, which also describes some of the challenges of ML applications. In order to mitigate some of the identified challenges and pitfalls, white and gray literature has proposed a set of recommendations based on their own experiences and focused on their domain (e.g., biomechanics), but to the best of our knowledge, there is no guideline focused on the SE community. This thesis aims to reduce this gap by answering research questions that help to understand the practices used and discussed by practitioners and researchers in the SE community by analyzing possible sources of practices such as question and answer communities and also previous research studies to present a set of practices with an SE perspective.

翻訳日:2023-05-02 16:21:12 公開日:2023-04-29

# 不完全機械工学的知識を有する製造プロセスのための高速化・安価な機械学習

Accelerated and Inexpensive Machine Learning for Manufacturing Processes with Incomplete Mechanistic Knowledge ( http://arxiv.org/abs/2305.00229v1 )

ライセンス: Link先を確認

Jeremy Cleeman, Kian Agrawala, Rajiv Malhotra

(参考訳) 機械学習(ML)は、製造プロセスにおけるパラメトリック効果のモデリングへの関心が高まっている。最先端のアプローチでは、トレーニングデータを生成する実験的および/または計算的コストの削減に重点を置いているが、新しいプロセスのための定性的に正確な物理ベースのモデルを開発するための本質的で重要なコストは無視されている。本稿では,この問題に対処するトランスファーラーニングに基づくアプローチを提案する。そこでは,MLモデルを物理ベースプロセスモデル(ソース)から大量の計算コストのかかるデータに基づいて訓練し,より安価な実験データ(ターゲット)に基づいて微調整を行う。この斬新さは、文献において高いと推定されるソースモデルに要求される定性的精度の境界を押し下げることであり、高モデル開発コストの根源である。溶融フィラメント製造におけるプリントライン幅のモデル化について検討した。極端な機能的・量的不正確さにもかかわらず、我々のアプローチはモデル開発コストを年々削減し、実験コストを56-76%、計算コストを桁違いに、予測誤差を16-24%削減する。

Machine Learning (ML) is of increasing interest for modeling parametric effects in manufacturing processes. But this approach is limited to established processes for which a deep physics-based understanding has been developed over time, since state-of-the-art approaches focus on reducing the experimental and/or computational costs of generating the training data but ignore the inherent and significant cost of developing qualitatively accurate physics-based models for new processes. This paper proposes a transfer learning based approach to address this issue, in which an ML model is trained on a large amount of computationally inexpensive data from a physics-based process model (source) and then fine-tuned on a smaller amount of costly experimental data (target). The novelty lies in pushing the boundaries of the qualitative accuracy demanded of the source model, which is assumed to be high in the literature, and is the root of the high model development cost. Our approach is evaluated for modeling the printed line width in Fused Filament Fabrication. Despite extreme functional and quantitative inaccuracies in the source, our approach reduces the model development cost by years, experimental cost by 56-76%, computational cost by orders of magnitude, and prediction error by 16-24%.

翻訳日:2023-05-02 16:20:49 公開日:2023-04-29

# 教師なし修復学習におけるsparsity-aware optimal transport

Sparsity-Aware Optimal Transport for Unsupervised Restoration Learning ( http://arxiv.org/abs/2305.00273v1 )

ライセンス: Link先を確認

Fei Wen, Wei Wang and Wenxian Yu

(参考訳) 近年の研究では,事前モデルを一切用いずに,教師なし復元学習問題を最適輸送(OT)問題として最適に定式化できることが示されており、ノイズ除去タスクでは教師あり手法に迫る有望な性能が得られている。しかし、超解像、雨除去(deraining)、霞除去(dehazing)といった複雑な復元タスクでは、依然として最先端の教師あり手法に大きく後れを取っている。本稿では,OTフレームワークにおいて劣化のスパース性を活用し,これらのタスクでの性能を大幅に向上させる。まず,これらのタスクにおける劣化が周波数領域で非常にスパースであるという観察を示し,教師なし復元学習のためのスパース性を考慮した最適輸送(sparsity-aware optimal transport, SOT)基準を提案する。さらに,スパース性の活用が、復元のための逆写像を求める際の曖昧さの低減に役立つことを示す解析的な例を与える。実世界の超解像、雨除去、霞除去の実験では、SOTがOTのPSNRをそれぞれ約2.6dB、2.7dB、1.3dB改善し、比較した教師あり・教師なし手法の中で最良の知覚スコアを達成することが示された。特に、これら3つのタスクにおいて、SOTは既存の教師なし手法を大幅に上回り、最先端の教師あり手法の性能に迫る。

Recent studies show that, without any prior model, the unsupervised restoration learning problem can be optimally formulated as an optimal transport (OT) problem, which has shown promising performance on denoising tasks, approaching that of supervised methods. However, it still significantly lags behind state-of-the-art supervised methods on complex restoration tasks such as super-resolution, deraining, and dehazing. In this paper, we exploit the sparsity of degradation in the OT framework to significantly boost its performance on these tasks. First, we report the observation that the degradation in these tasks is quite sparse in the frequency domain, and then propose a sparsity-aware optimal transport (SOT) criterion for unsupervised restoration learning. Further, we provide an analytic example to illustrate that exploiting the sparsity helps to reduce the ambiguity in finding an inverse map for restoration. Experiments on real-world super-resolution, deraining, and dehazing demonstrate that SOT can improve the PSNR of OT by about 2.6 dB, 2.7 dB and 1.3 dB, respectively, while achieving the best perception scores among the compared supervised and unsupervised methods. In particular, on the three tasks, SOT significantly outperforms existing unsupervised methods and approaches the performance of state-of-the-art supervised methods.

翻訳日:2023-05-02 16:13:24 公開日:2023-04-29

# 時間分数階開放系の量子速度限界

Quantum Speed Limit for Time-Fractional Open Systems ( http://arxiv.org/abs/2305.00270v1 )

ライセンス: Link先を確認

Dongmei Wei, Hailing Liu, Yongmei Li, Fei Gao, Sujuan Qin, Qiaoyan Wen

(参考訳) 時間分数階シュレーディンガー方程式(Time-Fractional Schrödinger Equation, TFSE)は、散逸的な環境と相互作用する量子系の研究に適している。量子速度限界(Quantum Speed Limit, QSL)時間は、量子系が2つの状態の間を発展するのに必要な最短時間を表し、量子過程の最大速度を評価する上で重要である。本研究では,TFSEを基本的な開放量子系モデルである共鳴散逸Jaynes-Cummings(JC)モデルに適用することで、一般的な時間分数階単一量子ビット開放系を厳密に解き、系のQSL時間を調べる。環境の非マルコフ的な記憶効果が時間分数階の量子発展を加速し、結果としてQSL時間が小さくなることを示す。さらに、与えられた駆動時間における時間分数階開放量子系の発展が加速する条件、すなわち分数階数、結合強度、光子数の間のトレードオフを明らかにする。特に、長い駆動時間に対して分数階数を調整することにより、時間分数階開放量子系の非マルコフ散逸ダイナミクスを操作する方法を示す。

The Time-Fractional Schrödinger Equation (TFSE) is well suited to studying a quantum system interacting with its dissipative environment. The Quantum Speed Limit (QSL) time captures the shortest time required for a quantum system to evolve between two states, which is significant for evaluating the maximum speed in quantum processes. In this work, we solve exactly for a generic time-fractional single-qubit open system by applying the TFSE to a basic open quantum system model, namely the resonant dissipative Jaynes-Cummings (JC) model, and investigate the QSL time for the system. It is shown that the non-Markovian memory effects of the environment can accelerate the time-fractional quantum evolution, thus resulting in a smaller QSL time. Additionally, the condition for accelerated evolution of the time-fractional open quantum system at a given driving time, i.e., a tradeoff among the fractional order, coupling strength, and photon number, is brought to light. In particular, a method to manipulate the non-Markovian dissipative dynamics of a time-fractional open quantum system by adjusting the fractional order for a long driving time is presented.

翻訳日:2023-05-02 16:12:59 公開日:2023-04-29

# 超微細構造決定のための$^{85}$Rb 4$D_{3/2}$状態の分光

Spectroscopy of the $^{85}$Rb 4$D_{3/2}$ state for hyperfine-structure determination ( http://arxiv.org/abs/2305.00265v1 )

ライセンス: Link先を確認

Alisher Duspayev and Georg Raithel

(参考訳) 我々は、2光子5$S_{1/2}\rightarrow$4$D_{3/2}$遷移を用いて、$^{85}$Rb 4$D_{3/2}$状態の超微細構造定数の測定を報告する。超微細遷移は、795nmレーザー周波数の関数として低温原子試料を介して低出力の795nm下段レーザー光の透過を測定し、上段1476nmレーザーの周波数を固定する。 4つの超微粒子成分は、記録された透過スペクトルにおいてよく分解される。 acシフトは慎重に考慮される。測定されたライン位置をゼロレーザーパワーに外挿することにより、フィールドフリーの超微細ライン位置を求める。磁気双極子と電気四極子定数である$A$と$B$はそれぞれ7.419(35)~MHzと4.19(19)~MHzと決定される。結果は,先行研究の文脈で評価される。 Rb 4$D_J$状態のRydberg-atom-physics,precision-metrology,quantum-technology への応用について論じる。

We report a measurement of the hyperfine-structure constants of the $^{85}$Rb 4$D_{3/2}$ state using a two-photon 5$S_{1/2}\rightarrow$4$D_{3/2}$ transition. The hyperfine transitions are probed by measuring the transmission of the low-power 795-nm lower-stage laser beam through a cold-atom sample as a function of 795-nm laser frequency, with the frequency of the upper-stage 1476-nm laser fixed. All 4 hyperfine components are well-resolved in the recorded transmission spectra. AC shifts are carefully considered. The field-free hyperfine line positions are obtained by extrapolating measured line positions to zero laser power. The magnetic-dipole and electric-quadrupole constants, $A$ and $B$, are determined from the hyperfine intervals to be 7.419(35)~MHz and 4.19(19)~MHz, respectively. The results are evaluated in context with previous works. Possible uses of the Rb 4$D_J$ states in Rydberg-atom-physics, precision-metrology and quantum-technology applications are discussed.
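The relation between the constants $A$ and $B$ and the hyperfine line positions can be sketched with the standard first-order hyperfine formula. The snippet below is only an illustration, not the authors' analysis code: the function name is hypothetical, the nuclear spin $I = 5/2$ of $^{85}$Rb and $J = 3/2$ are textbook values rather than statements from the abstract, and the central values of $A$ and $B$ quoted above are plugged in without their uncertainties.

```python
def hyperfine_shift(F, I, J, A, B):
    """First-order hyperfine shift of level F, in the units of A and B.

    Uses the standard magnetic-dipole (A) and electric-quadrupole (B) terms.
    The quadrupole term requires I >= 1 and J >= 1 (non-zero denominator).
    """
    K = F * (F + 1) - I * (I + 1) - J * (J + 1)
    dipole = A * K / 2
    quadrupole = B * (1.5 * K * (K + 1) - 2 * I * (I + 1) * J * (J + 1)) / (
        4 * I * (2 * I - 1) * J * (2 * J - 1)
    )
    return dipole + quadrupole


I_NUC = 2.5    # nuclear spin of 85Rb (textbook value, assumed here)
J_EL = 1.5     # electronic angular momentum of the 4D_{3/2} state
A_MHZ = 7.419  # central value quoted in the abstract
B_MHZ = 4.19   # central value quoted in the abstract

# Allowed F values run from |I - J| = 1 to I + J = 4.
shifts = {F: hyperfine_shift(F, I_NUC, J_EL, A_MHZ, B_MHZ) for F in (1, 2, 3, 4)}

for F, shift in shifts.items():
    print(f"F={F}: {shift:+.3f} MHz")
```

The four printed values are shifts relative to the hyperfine centre of gravity; differences between neighbouring values give the hyperfine intervals from which $A$ and $B$ are extracted in the measurement.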

翻訳日:2023-05-02 16:12:37 公開日:2023-04-29

# 画像線分節の検出と記述に関する総合的レビュー:分類学,比較,課題

A Comprehensive Review of Image Line Segment Detection and Description: Taxonomies, Comparisons, and Challenges ( http://arxiv.org/abs/2305.00264v1 )

ライセンス: Link先を確認

Xinyu Lin, Yingjie Zhou, Yipeng Liu, and Ce Zhu

(参考訳) ラインセグメントの検出と記述は多くの視覚タスクの基礎となった。多くの研究は線分の検出と記述を目的としているが、包括的なレビューは欠如しており、その進捗を妨げている。本研究は,二次元画像線セグメントの検出と記述に関する関連研究を包括的にレビューし,研究者に全体像と深い理解を与えることにより,このギャップを埋めている。それらの機構に基づき,線分検出と記述のための2つの分類法を提案し,これらの研究の紹介,解析,要約を行い,研究者が迅速かつ広範囲に学べるようにした。主要な問題、中核的な考え、既存手法の利点とデメリット、そして各カテゴリの潜在的な応用について分析・要約し、これまで未知の発見を含む。既存の方法の課題とそれを解決するための関連する洞察は、研究者を刺激するためにも提供される。さらに、いくつかの最先端の線分検出および記述アルゴリズムをバイアスなく評価し、評価コードを公開する。理論的解析は、実験結果と相まって、研究者が意図した視覚応用に最適な方法を選択するためのガイドとなる。最後に、この研究は、この分野の研究者からより多くの注目を集めるために、潜在的に興味深い将来の研究方向についての洞察を提供する。

Detection and description of line segments lay the basis for numerous vision tasks. Although many studies have aimed to detect and describe line segments, a comprehensive review is lacking, obstructing their progress. This study fills the gap by comprehensively reviewing related studies on detecting and describing two-dimensional image line segments to provide researchers with an overall picture and deep understanding. Based on their mechanisms, two taxonomies for line segment detection and description are presented to introduce, analyze, and summarize these studies, facilitating researchers to learn about them quickly and extensively. The key issues, core ideas, advantages and disadvantages of existing methods, and their potential applications for each category are analyzed and summarized, including previously unknown findings. The challenges in existing methods and corresponding insights for potentially solving them are also provided to inspire researchers. In addition, some state-of-the-art line segment detection and description algorithms are evaluated without bias, and the evaluation code will be publicly available. The theoretical analysis, coupled with the experimental results, can guide researchers in selecting the best method for their intended vision applications. Finally, this study provides insights for potentially interesting future research directions to attract more attention from researchers to this field.

翻訳日:2023-05-02 16:12:14 公開日:2023-04-29

# Voigt効果に基づく光ポンピング磁力計のストロボスコピックマイクロ波分光

Stroboscopic microwave spectroscopy of Voigt based optically pumped magnetometers ( http://arxiv.org/abs/2305.00263v1 )

ライセンス: Link先を確認

Hans Marin Florez, Tadas Pyragius and Thomas Fernholz

(参考訳) 高周波式光ポンピング磁気センサの分光マイクロ波分光結果について報告する。高周波装束原子と同期パルスマイクロ波場との相互作用と、Voigt効果に基づく光プローブにより、部分状態トモグラフィを行い、状態形成プロセスの効率を評価することができる。このシステムを理論的に記述するために,フロッケ展開を用いた密度行列の動的方程式を解く。我々の理論的結果は、幅広いパラメータとポンプ条件に関する実験結果とよく一致している。最後に、この研究で示された理論的および実験的分析は、複雑な状態準備技術を含む他のシステムに一般化することができる。

We present results of stroboscopic microwave spectroscopy of a radio-frequency dressed optically pumped magnetometer. Interaction between radio-frequency dressed atoms and a synchronously pulsed microwave field, followed by Voigt effect-based optical probing, allows us to perform partial state tomography and assess the efficiency of the state preparation process. To theoretically describe the system, we solve the dynamical equation of the density matrix employing Floquet expansion. Our theoretical results are in good agreement with experimental measurements over a wide range of parameters and pumping conditions. Finally, the theoretical and experimental analysis presented in this work can be generalised to other systems involving complex state preparation techniques.

翻訳日:2023-05-02 16:11:54 公開日:2023-04-29

# 特殊トークンとターンレベルの注意による階層的対話理解

Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention ( http://arxiv.org/abs/2305.00262v1 )

ライセンス: Link先を確認

Xiao Liu, Jian Zhang, Heng Zhang, Fuzhao Xue, Yang You

(参考訳) 標準的なテキストと比較すると、対話は各ターンで意味が動的かつ予期せず変化するため、機械にとって理解がより困難である。このような一貫性のない意味をモデル化するために,単純だが効果的な階層的対話理解モデルHiDialogを提案する。具体的には,まず対話に複数の特殊トークンを挿入し,ターンレベルの注意機構を提案してターン埋め込みを階層的に学習する。次に、異種グラフモジュールを利用して、学習された埋め込みを洗練する。対話関係抽出,対話感情認識,対話行為分類など,様々な対話理解タスクでモデルを評価した。その結果, この単純な手法が上記3つのタスクすべてにおいて最先端の性能を達成することが示された。ソースコードはすべて https://github.com/ShawX825/HiDialog で公開されている。

Compared with standard text, understanding dialogue is more challenging for machines because of the dynamic and unexpected semantic changes in each turn. To model such inconsistent semantics, we propose a simple but effective Hierarchical Dialogue Understanding model, HiDialog. Specifically, we first insert multiple special tokens into a dialogue and propose turn-level attention to learn turn embeddings hierarchically. Then, a heterogeneous graph module is leveraged to polish the learned embeddings. We evaluate our model on various dialogue understanding tasks including dialogue relation extraction, dialogue emotion recognition, and dialogue act classification. Results show that our simple approach achieves state-of-the-art performance on all three tasks above. All our source code is publicly available at https://github.com/ShawX825/HiDialog.

翻訳日:2023-05-02 16:11:45 公開日:2023-04-29

# CME到着時間予測のためのアンサンブル学習

Ensemble Learning for CME Arrival Time Prediction ( http://arxiv.org/abs/2305.00258v1 )

ライセンス: Link先を確認

Khalid A. Alobaid, Jason T. L. Wang

(参考訳) 太陽は常に放射とプラズマを太陽圏に放出している。また、太陽は散発的にフレアやコロナ質量放出(CME)などの太陽噴火を起こす。CMEは大量の質量と磁束を運び去る。地球に向かうCMEは、人間社会のシステムに深刻な影響をもたらす可能性があり、電力網やパイプライン、衛星、通信を破壊しうる。したがって、こうしたシステムへの被害を最小限に抑えるためには、CMEの正確な監視と予測が重要である。本研究では,太陽から地球へのCMEの到着時刻を予測するため,CMETNetというアンサンブル学習手法を提案する。1996年から2021年までの2つの太陽周期(第23周期と第24周期)の噴火イベントを収集・統合し、合計363個の地球に影響を及ぼす(geoeffective)CMEを得た。予測に使用するデータには、CMEの特徴量、太陽風パラメータ、SOHO/LASCO C2コロナグラフから得られたCME画像が含まれる。本アンサンブル学習フレームワークは,数値データ解析のための回帰アルゴリズムと,画像処理のための畳み込みニューラルネットワークから構成される。実験の結果,CMETNetはピアソンの積率相関係数0.83,平均絶対誤差9.75時間を達成し,文献で報告されている既存の機械学習手法よりも優れた性能を示した。

The Sun constantly releases radiation and plasma into the heliosphere. Sporadically, the Sun launches solar eruptions such as flares and coronal mass ejections (CMEs). CMEs carry away a huge amount of mass and magnetic flux with them. An Earth-directed CME can cause serious consequences to the human system. It can destroy power grids/pipelines, satellites, and communications. Therefore, accurately monitoring and predicting CMEs is important to minimize damages to the human system. In this study we propose an ensemble learning approach, named CMETNet, for predicting the arrival time of CMEs from the Sun to the Earth. We collect and integrate eruptive events from two solar cycles, #23 and #24, from 1996 to 2021 with a total of 363 geoeffective CMEs. The data used for making predictions include CME features, solar wind parameters and CME images obtained from the SOHO/LASCO C2 coronagraph. Our ensemble learning framework comprises regression algorithms for numerical data analysis and a convolutional neural network for image processing. Experimental results show that CMETNet performs better than existing machine learning methods reported in the literature, with a Pearson product-moment correlation coefficient of 0.83 and a mean absolute error of 9.75 hours.
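The two figures of merit quoted above, the Pearson product-moment correlation coefficient and the mean absolute error, can be computed as in the following minimal sketch. The arrival times are invented purely for illustration and are not taken from the paper; the function names are likewise hypothetical.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up Sun-to-Earth transit times in hours (illustrative only).
observed = [40.0, 55.0, 62.0, 80.0]
predicted = [45.0, 50.0, 70.0, 76.0]

print("MAE (hours):", mean_absolute_error(observed, predicted))
print("Pearson r:", round(pearson_r(observed, predicted), 3))
```

A correlation close to 1 together with a small mean absolute error indicates that predicted arrival times track the observed ones both in ordering and in magnitude.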

翻訳日:2023-05-02 16:11:33 公開日:2023-04-29

# 深層学習法を用いたmri画像からの脳腫瘍分割

Brain Tumor Segmentation from MRI Images using Deep Learning Techniques ( http://arxiv.org/abs/2305.00257v1 )

ライセンス: Link先を確認

Ayan Gupta, Mayank Dixit, Vipul Kumar Mishra, Attulya Singh, Atul Dayal

(参考訳) 脳腫瘍は、良性であれ悪性であれ生命を脅かす可能性があり、治療はもとより、その種類、発生源、位置を特定するだけでも多大な労力を要する。医療専門家による手動セグメンテーションは時間がかかるため、高い精度でプロセスを迅速化する技術の導入が求められる。医用画像セグメンテーションを目的として,脳腫瘍セグメンテーションに用いるデータセットで一貫した結果を示す有能な深層学習モデルを調査・特定した。本研究では、髄膜腫、神経膠腫、下垂体腫瘍の3種類の脳腫瘍を持つ患者233名から得られた3064枚のT1強調画像を含む公開MRI画像データセットを用いた。データセットのファイルは変換および前処理を行った後、様々なバックボーンを持つU-NetおよびAttention U-Net、Deep Residual U-Net、ResUnet++、Recurrent Residual U-Netといった、よく知られた画像セグメンテーション用深層学習モデルの実装と訓練に用いた。パラメータは、ヒトの脳腫瘍の分類とセグメンテーションに関する文献のレビューに基づいて様々に設定した。実験の結果,適用した全手法の中で,Adamオプティマイザを用いたRecurrent Residual U-NetがMean Intersection over Union(平均IoU)0.8665を達成し,比較した他の最先端深層学習モデルを上回った。視覚的な結果も、MRIスキャンからの脳腫瘍セグメンテーションの優れた結果を示しており、医師がMRIスキャンから脳腫瘍を自動的に抽出するうえで、このアルゴリズムがいかに有用であるかを示している。

A brain tumor, whether benign or malignant, can be life-threatening and requires painstaking effort to identify its type, origin, and location, let alone to cure it. Manual segmentation by medical specialists can be time-consuming, which calls for technology that speeds up the process while maintaining high accuracy. For the purpose of medical image segmentation, we inspected and identified a capable deep learning model that shows consistent results on the dataset used for brain tumor segmentation. This study uses a public MRI dataset that contains 3064 T1-weighted images from 233 patients with three variants of brain tumor, viz. meningioma, glioma, and pituitary tumor. The dataset files were converted and preprocessed before being used to implement and train several well-known deep learning models for image segmentation, namely U-Net and Attention U-Net with various backbones, Deep Residual U-Net, ResUnet++, and Recurrent Residual U-Net, with varying parameters drawn from our review of the literature on human brain tumor classification and segmentation. The experimental findings showed that, among all the applied approaches, the Recurrent Residual U-Net with the Adam optimizer reaches a Mean Intersection over Union of 0.8665 and outperforms the other state-of-the-art deep learning models compared. The visual findings also show remarkable results for brain tumor segmentation from MRI scans and demonstrate how useful the algorithm can be for physicians who need to extract brain cancers automatically from MRI scans and thereby serve humanity.
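For readers unfamiliar with the reported metric, the following minimal sketch shows one common way to compute Mean Intersection over Union on flattened label masks. It is an illustration under stated assumptions, not the evaluation code of the paper: the function names, the toy masks, and the choice to skip classes absent from both masks are all assumptions.

```python
def iou(pred, target, cls):
    """Intersection over union for one class on flattened label masks.

    Returns None when the class appears in neither mask.
    """
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else None


def mean_iou(pred, target, classes):
    """Average the per-class IoU over the classes that actually occur."""
    scores = []
    for cls in classes:
        score = iou(pred, target, cls)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores)


# Toy 6-pixel masks: 0 = background, 1 = tumor (illustrative only).
target = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]

print(mean_iou(pred, target, classes=[0, 1]))
```

In practice the masks would be 2D arrays per image and the average is often taken over a whole test set; the flattened lists here only keep the example dependency-free.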

翻訳日:2023-05-02 16:11:14 公開日:2023-04-29

# 駆動量子系のクリロフ構成と複雑性

Krylov construction and complexity for driven quantum systems ( http://arxiv.org/abs/2305.00256v1 )

ライセンス: Link先を確認

Amin A. Nizami and Ankit W. Shrestha

(参考訳) クリロフ複雑性(Krylov complexity)は、演算子の成長や量子カオスの研究に関連する重要な力学量であり、近年、様々な時間非依存系で盛んに研究されている。本研究では、時間依存(駆動)量子系におけるK複雑性の研究に着手する。周期的に時間依存する(Floquet)系に対して、クリロフ構成を行う自然な方法を与え、そのような系に対する(状態および演算子の)K複雑性を定義する。キックされた系、特にトーラス上の量子キックロータとハーパー写像に着目し、結合定数を弱結合領域から強結合領域まで変化させながら、ランチョス型係数およびK複雑性の時間依存性について詳細な数値的研究を行う。

Krylov complexity is an important dynamical quantity with relevance to the study of operator growth and quantum chaos and has recently been much studied for various time-independent systems. We initiate the study of K-complexity in time-dependent (driven) quantum systems. For periodic time-dependent (Floquet) systems, we give a natural method for doing the Krylov construction and then define (state and operator) K-complexity for such systems. Focusing on kicked systems, in particular the quantum kicked rotor on a torus and the Harper map, we undertake a detailed numerical study of the time dependence of Lanczos-like coefficients as well as of the K-complexity with the coupling constant interpolating between the weak and strong coupling regime.

翻訳日:2023-05-02 16:10:41 公開日:2023-04-29

# 半無限拘束マルコフ決定過程と効率的な強化学習

Semi-Infinitely Constrained Markov Decision Processes and Efficient Reinforcement Learning ( http://arxiv.org/abs/2305.00254v1 )

ライセンス: Link先を確認

Liangyu Zhang, Yang Peng, Wenhao Yang and Zhihua Zhang

(参考訳) 本稿では,制約付きマルコフ決定過程(CMDP)の新たな一般化を提案し,これを半無限制約付きマルコフ決定過程(semi-infinitely constrained Markov decision process, SICMDP)と呼ぶ。特に、通常のCMDPのような有限個の制約ではなく、連続体をなす無限個の制約を考える。また,SICMDPのための2つの強化学習アルゴリズム、SI-CRLとSI-CPOを考案した。SI-CRLはモデルベースの強化学習アルゴリズムである。遷移モデルの推定値が与えられると、まず強化学習問題を線形半無限計画(LSIP)問題に変換し、次にLSIPの文献にある双対交換法を用いてこれを解く。SI-CPOは方策最適化アルゴリズムである。協調確率近似のアプローチからアイデアを借用し,報酬の最大化またはコストの最小化に向けて方策パラメータを交互に更新する。我々の知る限り、制約付き強化学習問題を解くために半無限計画(SIP)のツールを適用したのは本研究が初めてである。SI-CRLとSI-CPOの理論的解析を行い,それらの反復複雑性とサンプル複雑性を明らかにした。また,SICMDPモデルを説明するために広範な数値実験を行い,最新の深層強化学習手法を活用することで,提案アルゴリズムが複雑な逐次的意思決定タスクを解けることを実証した。

We propose a novel generalization of constrained Markov decision processes (CMDPs) that we call the semi-infinitely constrained Markov decision process (SICMDP). In particular, we consider a continuum of constraints instead of the finite number of constraints found in ordinary CMDPs. We also devise two reinforcement learning algorithms for SICMDPs that we call SI-CRL and SI-CPO. SI-CRL is a model-based reinforcement learning algorithm. Given an estimate of the transition model, we first transform the reinforcement learning problem into a linear semi-infinite programming (LSIP) problem and then use the dual exchange method from the LSIP literature to solve it. SI-CPO is a policy optimization algorithm. Borrowing ideas from the cooperative stochastic approximation approach, we alternately update the policy parameters to maximize the reward or minimize the cost. To the best of our knowledge, we are the first to apply tools from semi-infinite programming (SIP) to solve constrained reinforcement learning problems. We present a theoretical analysis of SI-CRL and SI-CPO, identifying their iteration complexity and sample complexity. We also conduct extensive numerical experiments to illustrate the SICMDP model and demonstrate that our proposed algorithms can solve complex sequential decision-making tasks by leveraging modern deep reinforcement learning techniques.
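The difference from an ordinary CMDP can be written schematically as follows. The notation is an assumption made for illustration and is not taken from the paper: $V_r^{\pi}$ is the expected return of policy $\pi$ under reward $r$, $V_{c_y}^{\pi}$ the expected cumulative cost under cost function $c_y$, $u_y$ the corresponding threshold, and $\mu$ the initial state distribution.

```latex
\max_{\pi} \; V_{r}^{\pi}(\mu)
\quad \text{s.t.} \quad
V_{c_y}^{\pi}(\mu) \le u_y \qquad \forall\, y \in Y
```

An ordinary CMDP corresponds to a finite index set such as $Y = \{1, \dots, m\}$, whereas in the semi-infinite case $Y$ is a continuum, so the problem has finitely many decision variables but infinitely many constraints.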

翻訳日:2023-05-02 16:10:27 公開日:2023-04-29

# 模倣学習のための結合フローアプローチ

A Coupled Flow Approach to Imitation Learning ( http://arxiv.org/abs/2305.00303v1 )

ライセンス: Link先を確認

Gideon Freund, Elad Sarafian, Sarit Kraus

(参考訳) 強化学習と模倣学習において中心的な重要性を持つ対象は、方策によって誘導される状態分布である。状態分布は方策勾配定理において重要な役割を果たし、関連する状態-行動分布とともに文献の至るところで言及されている。その重要性にもかかわらず、状態分布は明示的にモデル化されるのではなく、主に間接的かつ理論的に議論されている。その理由は、適切な密度推定ツールが存在しなかったことにある。本研究では,これらの分布に対する正規化フローに基づくモデルの応用について検討する。特に、分布マッチングに基づく模倣学習において、Kullback-Leibler(KL)ダイバージェンスのDonsker-Varadhan表現の最適点を介して結合された一対のフローを用いる。我々のアルゴリズムであるCFIL(Coupled Flow Imitation Learning)は,単一のエキスパート軌道のみを用いるベンチマークタスクで最先端の性能を達成し,サブサンプリングされた設定や状態のみの設定を含む様々な設定に自然に拡張できる。

In reinforcement learning and imitation learning, an object of central importance is the state distribution induced by the policy. It plays a crucial role in the policy gradient theorem, and references to it, along with the related state-action distribution, can be found all across the literature. Despite its importance, the state distribution is mostly discussed indirectly and theoretically, rather than being modeled explicitly. The reason is an absence of appropriate density estimation tools. In this work, we investigate applications of a normalizing flow-based model for the aforementioned distributions. In particular, we use a pair of flows coupled through the optimality point of the Donsker-Varadhan representation of the Kullback-Leibler (KL) divergence, for distribution matching based imitation learning. Our algorithm, Coupled Flow Imitation Learning (CFIL), achieves state-of-the-art performance on benchmark tasks with a single expert trajectory and extends naturally to a variety of other settings, including the subsampled and state-only regimes.
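For reference, the Donsker-Varadhan representation mentioned above expresses the KL divergence as a supremum over test functions $f$, and the supremum is attained at the log density ratio up to an additive constant $C$. Here $p$ and $q$ denote the densities of $P$ and $Q$; how the two flows are tied to this optimality point in CFIL is described in the paper and is not reproduced here.

```latex
D_{\mathrm{KL}}(P \,\|\, Q)
  = \sup_{f} \Big\{ \mathbb{E}_{x \sim P}\big[f(x)\big]
      - \log \mathbb{E}_{x \sim Q}\big[e^{f(x)}\big] \Big\},
\qquad
f^{*}(x) = \log \frac{p(x)}{q(x)} + C
```

Because the optimum is a log density ratio, two explicit density models for $p$ and $q$ are enough to evaluate $f^{*}$ directly, which is plausibly why density estimators such as normalizing flows are a natural fit for this representation.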

翻訳日:2023-05-02 16:04:25 公開日:2023-04-29

# イデオグラフィーのパズルの認知的記述

A Cognitive Account of the Puzzle of Ideography ( http://arxiv.org/abs/2305.00296v1 )

ライセンス: Link先を確認

Xerxes D. Arsiwalla

(参考訳) モリンの「イデノグラフィーのパズル」の解説記事において、モリンの標準化を補完するイデオログラフィーのパズルの認知的記述を新たに発表した。音声言語の効率的な標準化は、認知表現のチャンキングと組み合わさったモダリティ効果に現象論的に起因し、さらに多感的な統合と注意のシリアライズされた性質によって支援される。これらの認知メカニズムは、汎用コミュニケーションにおいて言語がグラフィックコードを支配している理由を説明する上で重要である。

In this commentary article to 'The Puzzle of Ideography' by Morin, we put forth a new cognitive account of the puzzle of ideography, that complements the standardization account of Morin. Efficient standardization of spoken language is phenomenologically attributed to a modality effect coupled with chunking of cognitive representations, further aided by multi-sensory integration and the serialized nature of attention. These cognitive mechanisms are crucial for explaining why languages dominate graphic codes for general-purpose human communication.

翻訳日:2023-05-02 16:04:06 公開日:2023-04-29

# フローダイナミクス最適化深層学習法を用いた網膜眼底画像の分類の改善

Improving Classification of Retinal Fundus Image Using Flow Dynamics Optimized Deep Learning Methods ( http://arxiv.org/abs/2305.00294v1 )

ライセンス: Link先を確認

V. Banupriya, S. Anusuya

(参考訳) 糖尿病網膜症(英: diabetes retinopathy、dr)は、網膜に存在する血管ネットワークを損傷する糖尿病の障害である。これは糖尿病を患っている場合、被験者の視覚を危険にさらす可能性がある。経験豊富な臨床医は、疾患の特定に使用する画像中の腫瘍を識別する必要があるため、色眼底写真を用いてdr診断を行うのに時間がかかる。 DRの自動検出は非常に難しい作業である。畳み込みニューラルネットワーク(cnn)は、現在の状況において、特に手作りや機能的手法と比較して、画像の分類に非常に有効である。高い結果を保証するため、研究者たちは基礎画像の特徴を決定するための最先端のcnnモデルも提案した。 cnn出力の特徴は,提案システムにおける機械学習の各種分類器に応用された。このモデルは後に異なる形態の深層学習法と視覚幾何学群(vgg)ネットワークを用いて評価された。これは、一般的なKAGGLEデータセットのイメージを使用することで実現された。ここでは, 網膜眼底像検出のためのファンネットとともに, 河川形成ダイナミクス (rfd) アルゴリズムが提案されている。調査の結果、アプローチは代替アプローチよりも優れていることが示された。

Diabetic Retinopathy (DR) refers to a complication of diabetes mellitus that damages the blood vessel network in the retina. It may endanger the vision of subjects who have diabetes. Diagnosing DR from color fundus pictures can take some time because experienced clinicians are required to identify the tumors in the imagery used to identify the illness. Automated detection of DR can be an extremely challenging task. Convolutional Neural Networks (CNN) are highly effective at classifying images in this setting, particularly compared to the handcrafted, feature-based methods employed previously. In order to guarantee high results, the researchers also suggested a cutting-edge CNN model that can determine the characteristics of the fundus images. The features of the CNN output were fed to various machine learning classifiers in the proposed system. This model was later evaluated using different forms of deep learning methods and Visual Geometry Group (VGG) networks. This was done using images from a generic KAGGLE dataset. Here, the proposed River Formation Dynamics (RFD) algorithm is employed along with FUNDNET to detect retinal fundus images. The investigation's findings demonstrated that the approach performed better than alternative approaches.

翻訳日:2023-05-02 16:03:54 公開日:2023-04-29

# Polyp-SAM:ポリプセグメンテーションのためのトランスファーSAM

Polyp-SAM: Transfer SAM for Polyp Segmentation ( http://arxiv.org/abs/2305.00293v1 )

ライセンス: Link先を確認

Yuheng Li, Mingzhe Hu, and Xiaofeng Yang

(参考訳) 大腸ポリープは大腸癌の重要な前駆病変と考えられている。大腸ポリープの自動セグメンテーションは、大腸癌の誤診を大幅に減らし、医師のアノテーション効率を向上させる。ポリープセグメンテーションには多くの手法が提案されているが,限られた大腸内視鏡データで大規模なセグメンテーションネットワークを訓練することは依然として課題である。近年,Segment Anything Model (SAM) が自然画像と医用画像の両方のセグメンテーションで注目を集めている。SAMはいくつかの画像ベンチマークで優れた性能を示しており、医用画像セグメンテーションへの大きな可能性を示している。本研究では,ポリープセグメンテーションのためにファインチューニングしたSAMモデルであるPolyp-SAMを提案し,その性能を複数の最先端ポリープセグメンテーションモデルと比較する。また、SAMのエンコーダをファインチューニングする場合としない場合の2つの転移学習戦略を比較する。5つの公開データセットで評価した結果、Polyp-SAMは2つのデータセットで最先端の性能を、3つのデータセットで優れた性能を達成し、Diceスコアはいずれも88%を上回った。本研究は,SAMを医用画像セグメンテーションタスクに適用する大きな可能性を示す。本論文のコードとモデルの重みは https://github.com/ricklisz/Polyp-SAM で公開する予定である。

Colon polyps are considered important precursors for colorectal cancer. Automatic segmentation of colon polyps can significantly reduce the misdiagnosis of colon cancer and improve physician annotation efficiency. While many methods have been proposed for polyp segmentation, training large-scale segmentation networks with limited colonoscopy data remains a challenge. Recently, the Segment Anything Model (SAM) has gained much attention in both natural and medical image segmentation. SAM demonstrates superior performance in several image benchmarks and therefore shows great potential for medical image segmentation. In this study, we propose Polyp-SAM, a finetuned SAM model for polyp segmentation, and compare its performance to several state-of-the-art polyp segmentation models. We also compare two transfer learning strategies of SAM with and without finetuning its encoders. Evaluated on five public datasets, our Polyp-SAM achieves state-of-the-art performance on two datasets and impressive performance on three datasets, with dice scores all above 88%. This study demonstrates the great potential of adapting SAM to medical image segmentation tasks. We plan to release the code and model weights for this paper at: https://github.com/ricklisz/Polyp-SAM.
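The dice score used to report these results measures the overlap between a predicted mask and a ground-truth mask. The snippet below is a minimal sketch of the usual definition for binary masks, with toy data and a hypothetical function name; it is not the evaluation code released with the paper, and returning 1.0 for two empty masks is a convention chosen here.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flattened 0/1 masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly; treating that as 1.0 is a convention.
    return 2 * intersection / total if total else 1.0


# Toy 4-pixel masks: 1 = polyp, 0 = background (illustrative only).
target = [1, 0, 0, 0]
pred = [1, 1, 0, 0]

print(round(dice_score(pred, target), 4))
```

A dice score above 0.88 therefore means that the overlap is large relative to the combined size of the predicted and annotated polyp regions.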

翻訳日:2023-05-02 16:03:36 公開日:2023-04-29

# 生成aiに関する学生の声 : 高等教育における認識・利益・課題

Students' Voices on Generative AI: Perceptions, Benefits, and Challenges in Higher Education ( http://arxiv.org/abs/2305.00290v1 )

ライセンス: Link先を確認

Cecilia Ka Yuk Chan and Wenjie Hu

(参考訳) 本研究は、高等教育におけるChatGPTのような生成AI(GenAI)技術に対する大学生の認識について、親しみ、取り組みへの意欲、潜在的な利益と課題、効果的な統合に焦点を当てたものである。香港の様々な分野の大学生・大学院生399名を対象に調査を行ったところ、教育・学習におけるGenAIに対する概して肯定的な態度を示した。学生は、パーソナライズされた学習支援、執筆とブレインストーミング支援、研究と分析機能の可能性を認識した。しかし, 正確性, プライバシ, 倫理的問題, 個人の発達, キャリアの見通し, 社会的価値への影響についても懸念が表明された。 John Biggs氏の3Pモデルによると、学生の知覚は学習のアプローチや成果に大きな影響を与えている。学生の認識を理解することで、教育者や政策立案者はGenAI技術をニーズや関心に対処し、効果的な学習成果を促進することができる。本研究から得られた知見は、GenAI技術の高等教育への統合に関する政策開発に影響を及ぼす。学生の認識を理解し、その懸念に対処することで、政策立案者は、GenAIツールの責任と効果的な実装のための、しっかりとインフォームドされたガイドラインと戦略を作成し、最終的に高等教育における教育と学習の経験を向上することができる。

This study explores university students' perceptions of generative AI (GenAI) technologies, such as ChatGPT, in higher education, focusing on familiarity, their willingness to engage, potential benefits and challenges, and effective integration. A survey of 399 undergraduate and postgraduate students from various disciplines in Hong Kong revealed a generally positive attitude towards GenAI in teaching and learning. Students recognized the potential for personalized learning support, writing and brainstorming assistance, and research and analysis capabilities. However, concerns about accuracy, privacy, ethical issues, and the impact on personal development, career prospects, and societal values were also expressed. According to John Biggs' 3P model, student perceptions significantly influence learning approaches and outcomes. By understanding students' perceptions, educators and policymakers can tailor GenAI technologies to address needs and concerns while promoting effective learning outcomes. Insights from this study can inform policy development around the integration of GenAI technologies into higher education. By understanding students' perceptions and addressing their concerns, policymakers can create well-informed guidelines and strategies for the responsible and effective implementation of GenAI tools, ultimately enhancing teaching and learning experiences in higher education.

翻訳日:2023-05-02 16:03:16 公開日:2023-04-29

# LiDAR点雲上のバンドル調整のための効率的な平面抽出手法

An Efficient Plane Extraction Approach for Bundle Adjustment on LiDAR Point clouds ( http://arxiv.org/abs/2305.00287v1 )

ライセンス: Link先を確認

Zheng Liu and Fu Zhang

(参考訳) LiDAR点群に対するバンドル調整(BA)は、複数のポーズを同時に最適化でき、点群の高い精度と大域的な一貫性をもたらすため、近年広く研究されている。しかし、LiDARバンドル調整の精度と速度は、LiDAR BAに点の対応付けを提供する平面抽出の品質に依存する。本研究では,LiDARバンドル調整に点の対応付けを提供するために特別に設計された,新規かつ効率的なボクセルベースの平面抽出手法を提案する。まず、空間を固定サイズの複数のボクセルに分割し、点が同一平面上にあるかどうかに基づいて、八分木(octree)構造を用いてこれらのルートボクセルを分割する。また,主成分分析(PCA)に基づく新しい平面判定法を考案した。この方法は、点群を4つの均等な部分に分割し、それぞれの最小固有値を元の点群の最小固有値と比較する。最後に,1つのボクセル内に小さな平面が多くなりすぎるとBAに必要な最適化時間が増加するため、これを防ぐ平面マージ手法を採用する。HILTIでの実験結果から,提案手法が他の平面抽出法と比較して最高の精度と最小の時間コストを達成することを示す。

Bundle adjustment (BA) on LiDAR point clouds has been extensively investigated in recent years due to its ability to optimize multiple poses together, resulting in high accuracy and global consistency for the point cloud. However, the accuracy and speed of LiDAR bundle adjustment depend on the quality of plane extraction, which provides point association for LiDAR BA. In this study, we propose a novel and efficient voxel-based approach for plane extraction that is specially designed to provide point association for LiDAR bundle adjustment. To begin, we partition the space into multiple voxels of a fixed size and then split these root voxels based on whether the points are on the same plane, using an octree structure. We also design a novel plane determination method based on principal component analysis (PCA), which segments the points into four even quarters and compares their minimum eigenvalues with that of the initial point cloud. Finally, we adopt a plane merging method to prevent too many small planes from being in a single voxel, which can increase the optimization time required for BA. Our experimental results on HILTI demonstrate that our approach achieves the best precision and least time cost compared to other plane extraction methods.
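The basic PCA test behind plane determination can be sketched as follows: the smallest eigenvalue of the covariance matrix of a set of points is close to zero when the points lie on a plane. This is only the generic building block, written under explicit assumptions: the function names and the threshold are hypothetical, the closed-form eigenvalue routine is a standard method for symmetric 3x3 matrices, and the paper's own step of splitting the points into four quarters and comparing their eigenvalues is not reproduced.

```python
import math


def covariance_3x3(points):
    """Covariance matrix of a list of (x, y, z) points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov


def min_eigenvalue_sym3(a):
    """Smallest eigenvalue of a symmetric 3x3 matrix (trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return min(a[0][0], a[1][1], a[2][2])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2 * p1
    p = math.sqrt(p2 / 6)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2))  # clamp against rounding error
    phi = math.acos(r) / 3
    return q + 2 * p * math.cos(phi + 2 * math.pi / 3)


def is_plane(points, threshold=1e-3):
    """Hypothetical test: planar if the smallest eigenvalue is below threshold."""
    return min_eigenvalue_sym3(covariance_3x3(points)) < threshold


tilted_plane = [(x, y, x) for x in range(3) for y in range(3)]  # z = x
cube_corners = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

print(is_plane(tilted_plane), is_plane(cube_corners))
```

With real LiDAR data the threshold would have to account for sensor noise, and a library eigen-solver would normally replace the hand-written routine.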

翻訳日:2023-05-02 16:02:53 公開日:2023-04-29

# 自己監督型タスク表現学習に基づくメタ強化学習

Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning ( http://arxiv.org/abs/2305.00286v1 )

ライセンス: Link先を確認

Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Hang Su, Chenguang Yang, Kai Huang and Alois Knoll

(参考訳) メタ強化学習により、人工知能は関連するトレーニングタスクから学び、最小限のインタラクションデータで新しいタスクに効率的に適応することができる。しかし、既存の研究の多くは、まだパラメトリックで定常的な狭いタスク分布に限られており、評価中に配布外タスクを考慮せず、適用を制限している。本稿では,この課題に対処するために,自己監督型タスク表現学習に基づくコンテキストベースメタ強化学習アルゴリズムMOSSを提案する。メタRLは、これまで探索されたことのない幅広い非パラメトリックタスク分布に拡張し、非定常および非分布タスクにおける最先端結果を達成する。具体的には、MOSSはタスク推論モジュールとポリシーモジュールで構成される。タスク表現にはガウス混合モデルを用いてパラメトリックおよび非パラメトリックタスクのバリエーションを模倣する。さらに、我々のオンライン適応戦略により、エージェントはタスク変更の第一の視点で反応し、非定常的なタスクに適用できる。 MoSSはまた、信頼性と堅牢なタスク表現の恩恵を受けるアウト・オブ・ディストリビューションタスクにおいて、強力な一般化ロバスト性を示す。ポリシーはオフ・ポリシーrlアルゴリズム上に構築されており、ネットワーク全体が完全にオフ・ポリシーに訓練され、高いサンプル効率が保証される。 MuJoCo と Meta-World のベンチマークでは、MoSS は漸近的性能、サンプル効率(3-50倍高速)、適応効率、広範囲で多様なタスク分布に対する一般化ロバスト性といった点において先行研究より優れていた。

Meta-reinforcement learning enables artificial agents to learn from related training tasks and adapt to new tasks efficiently with minimal interaction data. However, most existing research is still limited to narrow task distributions that are parametric and stationary, and does not consider out-of-distribution tasks during evaluation, thus restricting its application. In this paper, we propose MoSS, a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning to address this challenge. We extend meta-RL to broad non-parametric task distributions which have never been explored before, and also achieve state-of-the-art results in non-stationary and out-of-distribution tasks. Specifically, MoSS consists of a task inference module and a policy module. We utilize a Gaussian mixture model for task representation to imitate the parametric and non-parametric task variations. Additionally, our online adaptation strategy enables the agent to react at the first sight of a task change, thus being applicable in non-stationary tasks. MoSS also exhibits strong generalization robustness in out-of-distribution tasks, which benefits from the reliable and robust task representation. The policy is built on top of an off-policy RL algorithm and the entire network is trained completely off-policy to ensure high sample efficiency. On MuJoCo and Meta-World benchmarks, MoSS outperforms prior works in terms of asymptotic performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization robustness on broad and diverse task distributions.

翻訳日:2023-05-02 16:02:32 公開日:2023-04-29

# NSLF-OL: リアルタイムインクリメンタル3次元再構成と並行したニューラルサーフェス光場のオンライン学習

NSLF-OL: Online Learning of Neural Surface Light Fields alongside Real-time Incremental 3D Reconstruction ( http://arxiv.org/abs/2305.00282v1 )

ライセンス: Link先を確認

Yijun Yuan and Andreas Nuchter

(参考訳) 没入型新規ビュー生成はグラフィックス分野における重要な技術であり,近年,操作者による人間ロボットのインタラクションにも注目されている。しかし、関連するトレーニングは時間がかかるため、現在のテスト範囲は、主にオブジェクトのキャプチャにかかっている。これは、ロボットコミュニティにおける3次元再構築のための関連するモデルの使用を制限する。(1) ロボットは、通常、目に見えない、新しい方向の任意の予測を引き起こす表面への非常に小さな視野方向のみをキャプチャし、(2) リアルタイムアルゴリズムを必要とし、(3) ロボット探索のような成長するシーンで作業するためである。そこで本研究では,視線方向の小さな方向に対応できるニューラルサーフェス光場モデルを提案する。最近のエンコーディング技術を活用することで、モデルのトレーニングは非常に効率的です。さらに,大規模に成長するシーンに対して,各小領域を並列に学習する汎用フレームワークであるMANA(Multiple Asynchronous Neural Agents)を設計した。我々のモデルは、リアルタイムな3次元再構成の他に、シーケンシャルなデータストリームを共有入力として、ニューラルネットワーク光場(NSLF)をオンラインで学習する。オンライントレーニングに加えて,可視化のためのデータストリームの完了後にリアルタイムレンダリングも提供する。我々は,有名なrgbd屋内データセットを用いて実験を行い,実時間3次元再構成にモデルを埋め込むための高い柔軟性を示し,これらのシーンに対する高忠実度な映像合成を示す。コードはgithubで入手できる。

Immersive novel view generation is an important technology in the field of graphics and has recently also received attention for operator-based human-robot interaction. However, the involved training is time-consuming, and thus the current test scope is mainly limited to object capturing. This limits the usage of related models in the robotics community for 3D reconstruction, since robots (1) usually capture only a very small range of view directions to surfaces, which causes arbitrary predictions in unseen, novel directions, (2) require real-time algorithms, and (3) work with growing scenes, e.g., in robotic exploration. The paper proposes a novel Neural Surface Light Fields model that copes with the small range of view directions while producing a good result in unseen directions. Exploiting recent encoding techniques, the training of our model is highly efficient. In addition, we design Multiple Asynchronous Neural Agents (MANA), a universal framework to learn each small region in parallel for large-scale growing scenes. Our model learns the Neural Surface Light Fields (NSLF) online, alongside real-time 3D reconstruction, with a sequential data stream as the shared input. In addition to online training, our model also provides real-time rendering after completing the data stream for visualization. We implement experiments using well-known RGBD indoor datasets, showing the high flexibility to embed our model into real-time 3D reconstruction and demonstrating high-fidelity view synthesis for these scenes. The code is available on GitHub.

翻訳日:2023-05-02 16:02:06 公開日:2023-04-29

# 大学教育・学習のための総合的AI政策教育フレームワーク

A Comprehensive AI Policy Education Framework for University Teaching and Learning ( http://arxiv.org/abs/2305.00280v1 )

ライセンス: Link先を確認

Cecilia Ka Yuk Chan

(参考訳) 本研究は,テキスト生成AI技術に対する認識とその影響を検証し,高等教育のためのAI教育政策を策定することを目的とする。データは、香港の大学のさまざまな分野の学生457人と教員・職員180人から,定量的・質的調査手法を用いて収集した。調査結果に基づき、本研究は,大学の教育と学習におけるAI統合の多面的な影響に対処する,AIエコロジカル教育政策フレームワークを提案する。このフレームワークは、教育(Pedagogical)、ガバナンス(Governance)、運用(Operational)の3つの次元で構成される。教育の次元はAIを用いた教育と学習の成果の改善に焦点を当て、ガバナンスの次元はプライバシ、セキュリティ、説明責任に関する問題に取り組む。運用の次元は、インフラストラクチャとトレーニングに関する問題に対処する。このフレームワークは、学術的な場におけるAI統合の影響についての精緻な理解を促し、ステークホルダーが自らの責任を認識し、適切な行動を取れるようにする。

This study aims to develop an AI education policy for higher education by examining the perceptions and implications of text generative AI technologies. Data was collected from 457 students and 180 teachers and staff across various disciplines in Hong Kong universities, using both quantitative and qualitative research methods. Based on the findings, the study proposes an AI Ecological Education Policy Framework to address the multifaceted implications of AI integration in university teaching and learning. This framework is organized into three dimensions: Pedagogical, Governance, and Operational. The Pedagogical dimension concentrates on using AI to improve teaching and learning outcomes, while the Governance dimension tackles issues related to privacy, security, and accountability. The Operational dimension addresses matters concerning infrastructure and training. The framework fosters a nuanced understanding of the implications of AI integration in academic settings, ensuring that stakeholders are aware of their responsibilities and can take appropriate actions accordingly.

翻訳日:2023-05-02 16:01:43 公開日:2023-04-29

# Segment Anything Model (SAM) がガラスに出会う: 鏡や透明な物体は容易に検出できない

Segment Anything Model (SAM) Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected ( http://arxiv.org/abs/2305.00278v1 )

ライセンス: Link先を確認

Dongsheng Han, Chaoning Zhang, Yu Qiao, Maryam Qamar, Yuna Jung, SeungKyu Lee, Sung-Ho Bae, Choong Seon Hong

(参考訳) Meta AI Researchが先日リリースしたSAM(Segment Anything Model)は、10億以上のマスクからなる大規模なセグメンテーションデータセットでトレーニングされている。コンピュータビジョン分野の基盤モデルとして、SAMは汎用オブジェクトセグメンテーションにおける印象的なパフォーマンスで注目を集めている。幅広いゼロショット転送タスクにおける高い能力にもかかわらず、SAMが透明なオブジェクトのような困難な設定で対象を検出できるかどうかは不明である。本研究では,鏡と透明物体という2つのガラス関連の困難なシナリオについて実証的に評価する。 SAMは両方のシナリオでガラスの検出に失敗することが多く、様々な形態のガラスが存在する安全クリティカルな状況にSAMをデプロイすることに懸念が生じる。

Meta AI Research has recently released SAM (Segment Anything Model) which is trained on a large segmentation dataset of over 1 billion masks. As a foundation model in the field of computer vision, SAM (Segment Anything Model) has gained attention for its impressive performance in generic object segmentation. Despite its strong capability in a wide range of zero-shot transfer tasks, it remains unknown whether SAM can detect things in challenging setups like transparent objects. In this work, we perform an empirical evaluation of two glass-related challenging scenarios: mirror and transparent objects. We found that SAM often fails to detect the glass in both scenarios, which raises concern for deploying the SAM in safety-critical situations that have various forms of glass.

翻訳日:2023-05-02 16:01:26 公開日:2023-04-29

# FedGrad: 局所的な最終層勾配の検査によるフェデレーション学習におけるバックドア攻撃の軽減

FedGrad: Mitigating Backdoor Attacks in Federated Learning Through Local Ultimate Gradients Inspection ( http://arxiv.org/abs/2305.00328v1 )

ライセンス: Link先を確認

Thuy Dung Nguyen, Anh Duy Nguyen, Kok-Seng Wong, Huy Hieu Pham, Thanh Hung Nguyen, Phi Le Nguyen, Truong Thao Nguyen

(参考訳) フェデレートラーニング(FL)により、複数のクライアントが機密データを危険にさらすことなくモデルをトレーニングできる。 FLの分散的な性質は、敵対的攻撃、特に訓練中のバックドア挿入を受けやすくする。近年,データ分布の裾を利用するエッジケースバックドア攻撃が強力な攻撃として提案され,現在の防御手法の堅牢性保証の不足に関する疑問が提起されている。具体的には、既存の防御の多くは、エッジケースバックドア攻撃を排除できないか、バックドア防御の有効性と主タスクにおける全体的な性能とのトレードオフに苦しむ。この課題に取り組むため,我々は,エッジケース攻撃を含む最先端のバックドア攻撃に耐性を持ち,異種クライアントデータおよび多数の侵害されたクライアントの下でも効果的に機能する,新しいFL向けバックドア防御手法であるFedGradを提案する。 FedGradは、最終層の勾配を徹底的に分析して疑わしいローカルアップデートを特定し、集約プロセスから除外する2層フィルタリングメカニズムとして設計されている。我々は、異なる攻撃シナリオ下でFedGradを評価し、最先端の防御機構を著しく上回ることを示す。特にFedGradは、悪意のある参加者をほぼ100%正しく検出することができ、主タスクの精度を低下させることなく、バックドア効果を大幅に削減することができる(例えばバックドア精度は8%未満)。

Federated learning (FL) enables multiple clients to train a model without compromising sensitive data. The decentralized nature of FL makes it susceptible to adversarial attacks, especially backdoor insertion during training. Recently, the edge-case backdoor attack employing the tail of the data distribution has been proposed as a powerful one, raising questions about the shortfall in current defenses' robustness guarantees. Specifically, most existing defenses cannot eliminate edge-case backdoor attacks or suffer from a trade-off between backdoor-defending effectiveness and overall performance on the primary task. To tackle this challenge, we propose FedGrad, a novel backdoor-resistant defense for FL that is resistant to cutting-edge backdoor attacks, including the edge-case attack, and performs effectively under heterogeneous client data and a large number of compromised clients. FedGrad is designed as a two-layer filtering mechanism that thoroughly analyzes the ultimate layer's gradient to identify suspicious local updates and remove them from the aggregation process. We evaluate FedGrad under different attack scenarios and show that it significantly outperforms state-of-the-art defense mechanisms. Notably, FedGrad can almost 100% correctly detect the malicious participants, thus providing a significant reduction in the backdoor effect (e.g., backdoor accuracy is less than 8%) while not reducing the main accuracy on the primary task.

翻訳日:2023-05-02 15:55:14 公開日:2023-04-29

# スパース行列による加法ガウス過程の表現

Representing Additive Gaussian Processes by Sparse Matrices ( http://arxiv.org/abs/2305.00324v1 )

ライセンス: Link先を確認

Lu Zou, Haoyuan Chen, Liang Ding

(参考訳) 一般化加法モデルの中で、加法的Mat\'ern ガウス過程(GP)はスケーラブルな高次元問題において最もよく用いられるものの一つである。加法構造と確率微分方程式表現のおかげで、バックフィッティングに基づくアルゴリズムは、事後平均の計算の時間計算量を$O(n^3)$から$O(n\log n)$に減らすことができる。ここで$n$はデータサイズである。しかし、これらのアルゴリズムを一般化して事後分散と最大対数尤度を効率的に計算することは未解決の問題である。本研究では,加法的Mat\'ern GP に対して,事後平均だけでなく,事後分散,対数尤度,およびこれら3つの関数の勾配も、スパース行列とスパースベクトルのみを含む式で表せることを示す。これらのスパースな式を用いてバックフィッティングに基づくアルゴリズムを一般化し,加法的GPの事後平均,事後分散,対数尤度,およびこれら3つの関数の勾配を,すべて$O(n \log n)$時間で効率的に計算する方法を示す。我々はこのアルゴリズムをベイズ最適化に適用し、ベイズ最適化における事後更新、ハイパーパラメータ学習、および獲得関数とその勾配の計算のための効率的なアルゴリズムを提案する。事後分布が与えられたとき、我々のアルゴリズムは、獲得関数とその勾配を計算する時間計算量を、一般の学習率では$O(n^2)$から$O(\log n)$に、小さな学習率では$O(1)$にまで大幅に削減する。

Among generalized additive models, additive Mat\'ern Gaussian Processes (GPs) are one of the most popular for scalable high-dimensional problems. Thanks to their additive structure and stochastic differential equation representation, back-fitting-based algorithms can reduce the time complexity of computing the posterior mean from $O(n^3)$ to $O(n\log n)$ time where $n$ is the data size. However, generalizing these algorithms to efficiently compute the posterior variance and maximum log-likelihood remains an open problem. In this study, we demonstrate that for Additive Mat\'ern GPs, not only the posterior mean, but also the posterior variance, log-likelihood, and gradient of these three functions can be represented by formulas involving only sparse matrices and sparse vectors. We show how to use these sparse formulas to generalize back-fitting-based algorithms to efficiently compute the posterior mean, posterior variance, log-likelihood, and gradient of these three functions for additive GPs, all in $O(n \log n)$ time. We apply our algorithms to Bayesian optimization and propose efficient algorithms for posterior updates, hyperparameters learning, and computations of the acquisition function and its gradient in Bayesian optimization. Given the posterior, our algorithms significantly reduce the time complexity of computing the acquisition function and its gradient from $O(n^2)$ to $O(\log n)$ for general learning rate, and even to $O(1)$ for small learning rate.

翻訳日:2023-05-02 15:54:48 公開日:2023-04-29

# データマイニングアルゴリズムを活用してソースコードの変更を推奨

Leveraging Data Mining Algorithms to Recommend Source Code Changes ( http://arxiv.org/abs/2305.00323v1 )

ライセンス: Link先を確認

AmirHossein Naghshzan, Saeed Khalilazar, Pierre Poilane, Olga Baysal, Latifa Guerrouj, Foutse Khomh

(参考訳) コンテキスト: 最近の研究では、開発者がソースコードの変更をガイドできる技術を開発するために、データマイニングが使われています。私たちの知る限りでは、データマイニング技術を調査したり、他のアルゴリズムやベースラインと比較したりする研究はほとんどありません。目的: 4つのデータマイニングアルゴリズムを用いてソースコード変更を推奨する自動手法を提案する。これらのアルゴリズムはソースコードの変更を推奨するだけでなく、実証的な評価も行います。方法: 調査には7つのオープンソースプロジェクトが含まれており、ファイルレベルでのソース変更履歴を抽出した。 4つの広範にわたるデータマイニングアルゴリズム \ie{} apriori, fp- growth, eclat, relimを用いて、アルゴリズムの性能(精度、リコール、f-測定)と実行時間の比較を行った。結果:Aprioriのような頻繁なパターンマイニングアルゴリズムが,他のアルゴリズムよりも優れている場合もあるが,研究対象のプロジェクトの性質や特性,特に変更履歴が原因で,すべてのソフトウェアプロジェクトにおいて一貫性が保たれているという実証的証拠が得られた。結論: aprioriは大規模プロジェクトに適しているが、eclatは小規模プロジェクトに適しているようだ。さらに、FP-Growthは実行時間の面で効率的なアプローチである。

Context: Recent research has used data mining to develop techniques that can guide developers through source code changes. To the best of our knowledge, very few studies have investigated data mining techniques and/or compared their results with other algorithms or a baseline. Objectives: This paper proposes an automatic method for recommending source code changes using four data mining algorithms. We not only use these algorithms to recommend source code changes, but we also conduct an empirical evaluation. Methods: Our investigation includes seven open-source projects from which we extracted source change history at the file level. We used four widely used data mining algorithms, i.e., Apriori, FP-Growth, Eclat, and Relim, and compared them in terms of performance (Precision, Recall and F-measure) and execution time. Results: Our findings provide empirical evidence that while some Frequent Pattern Mining algorithms, such as Apriori, may outperform other algorithms in some cases, the results are not consistent throughout all the software projects, which is more likely due to the nature and characteristics of the studied projects, in particular their change history. Conclusion: Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects. Moreover, FP-Growth seems an efficient approach in terms of execution time.
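
As a rough illustration of file-level co-change mining, the sketch below restricts the Apriori idea to itemsets of size two and derives "if file A changes, file B probably changes too" rules from commit history. It is not the authors' implementation; the function names, the thresholds `min_support` and `min_confidence`, and the toy commit list are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def mine_cochange_rules(commits, min_support=2, min_confidence=0.5):
    """Mine pairwise co-change rules from a list of commits.

    Each commit is a list of file paths changed together. This is a
    size-2 restriction of Apriori-style frequent itemset mining (hypothetical
    helper, not taken from the paper).
    """
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = sorted(set(files))
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = {}
    for (a, b), together in pair_counts.items():
        if together < min_support:
            continue  # the pair is not frequent enough
        for src, dst in ((a, b), (b, a)):
            confidence = together / file_counts[src]
            if confidence >= min_confidence:
                rules.setdefault(src, []).append((dst, confidence))
    for src in rules:
        # Highest confidence first, ties broken alphabetically.
        rules[src].sort(key=lambda rule: (-rule[1], rule[0]))
    return rules


def recommend(rules, changed_file):
    """Return files that tend to change together with `changed_file`."""
    return [dst for dst, _ in rules.get(changed_file, [])]


if __name__ == "__main__":
    history = [
        ["a.py", "b.py"],
        ["a.py", "b.py", "c.py"],
        ["a.py"],
        ["c.py", "d.py"],
    ]
    rules = mine_cochange_rules(history)
    print(recommend(rules, "a.py"))
```

Precision, recall and F-measure can then be estimated by mining rules on older commits and checking the recommendations against files actually changed in later commits.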

翻訳日:2023-05-02 15:54:18 公開日:2023-04-29

# 非線形関数の$L_\infty$回復に向けて: ガウス確率場に対する多項式サンプル複雑性境界

Toward $L_\infty$-recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields ( http://arxiv.org/abs/2305.00322v1 )

ライセンス: Link先を確認

Kefan Dong, Tengyu Ma

(参考訳) 多くの機械学習アプリケーションは、入力領域全体にわたる最悪ケースの誤差、すなわち$L_\infty$誤差が小さい関数を学習する必要があるが、既存の理論研究の多くは$L_2$誤差のような平均誤差での回復しか保証していない。多項式個のサンプルからの$L_\infty$回復は、定数ノルムの無限幅2層ニューラルネットのような一見単純な関数クラスに対してさえ不可能である。本稿では, 真の関数のランダム性を活用することにより, この不可能性の結果を超える最初の一歩を示す。ガウス確率場から引き出されたランダムな真の関数に対して、多項式サンプル複雑性の上界を証明する。我々の重要な技術的新規性は、ガウス確率場からの関数の次数$k$の球面調和成分がスパイク状になり得ないこと、すなわち、その$L_\infty$/$L_2$比が高い確率で$O(d \sqrt{\ln k})$で上から抑えられることを証明する点にある。対照的に、次数$k$の球面調和関数に対する最悪ケースの$L_\infty$/$L_2$比は、$\Omega(\min\{d^{k/2},k^{d/2}\})$のオーダーである。

Many machine learning applications require learning a function with a small worst-case error over the entire input domain, that is, the $L_\infty$-error, whereas most existing theoretical works only guarantee recovery in average errors such as the $L_2$-error. $L_\infty$-recovery from polynomial samples is even impossible for seemingly simple function classes such as constant-norm infinite-width two-layer neural nets. This paper makes some initial steps beyond the impossibility results by leveraging the randomness in the ground-truth functions. We prove a polynomial sample complexity bound for random ground-truth functions drawn from Gaussian random fields. Our key technical novelty is to prove that the degree-$k$ spherical harmonics components of a function from Gaussian random field cannot be spiky in that their $L_\infty$/$L_2$ ratios are upperbounded by $O(d \sqrt{\ln k})$ with high probability. In contrast, the worst-case $L_\infty$/$L_2$ ratio for degree-$k$ spherical harmonics is on the order of $\Omega(\min\{d^{k/2},k^{d/2}\})$.

翻訳日:2023-05-02 15:53:57 公開日:2023-04-29

# 破損したマルチモーダルデータを用いた実世界サーベイランスにおける可視・赤外線人物再識別のための融合

Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data ( http://arxiv.org/abs/2305.00320v1 )

ライセンス: Link先を確認

Arthur Josi, Mahdi Alehdaghi, Rafael M. O. Cruz, Eric Granger

(参考訳) V-I ReID (Visible-infrared person re-identification) は、RGBとIRカメラの分散ネットワーク上で撮影された個人の画像と一致する。 vモードとiモードの大きな違い、特に実世界の状況下では、画像がぼやけ、ノイズ、天気によって腐敗しているため、この課題は困難である。実際、最先端のV-I ReIDモデルは、破損したモダリティ情報を利用して高い精度を維持することはできない。本稿では,マルチモーダル画像に対するロバスト性を改善するために,モダリティ固有の知識を保持するマルチモーダル中間流融合(mmsf)と呼ばれるマルチモーダルv-iリードの効率的なモデルを提案する。さらに、3つの最先端の注意に基づくマルチモーダル融合モデルを用いて、v-i reidの破損したマルチモーダルデータに対処する。近年,現実シナリオにおけるReIDモデルの堅牢性を評価するための評価プロトコルが提案されている。しかしながら、これらのプロトコルはunimodal V設定に限られている。マルチモーダル(およびクロスモーダル)のV-I人物ReIDモデルの現実的な評価のために,VとIカメラが共位置(CL)であり、共位置(NCL)ではないシナリオを対象とした,新しい挑戦的破損データセットを提案する。最後に、マルチモーダル汚職に対するReIDモデルの堅牢性を改善するため、我々のMasking and Local Multimodal Data Augmentation(ML-MDA)戦略の利点を検討する。 SYSU-MM01, RegDB, および ThermalWORLD データセットのクリーンで破損したバージョンについて実験した結果, 実世界の運用条件下では良好に動作しそうなマルチモーダル V-I ReID モデルが得られた。特に,我々のML-MDAは,劣化したマルチモーダル画像を処理する際の高精度かつ堅牢性を維持するために,V-I人物ReIDシステムにとって重要な戦略である。また,マルチモーダル ReID モデル MMSF は,CL と NCL のカメラシナリオ下での全手法より優れている。

Visible-infrared person re-identification (V-I ReID) seeks to match images of individuals captured over a distributed network of RGB and IR cameras. The task is challenging due to the significant differences between V and I modalities, especially under real-world conditions, where images are corrupted by, e.g., blur, noise, and weather. Indeed, state-of-the-art V-I ReID models cannot leverage corrupted modality information to sustain a high level of accuracy. In this paper, we propose an efficient model for multimodal V-I ReID, named Multimodal Middle Stream Fusion (MMSF), that preserves modality-specific knowledge for improved robustness to corrupted multimodal images. In addition, three state-of-the-art attention-based multimodal fusion models are adapted to address corrupted multimodal data in V-I ReID, allowing the importance of each modality to be balanced dynamically. Recently, evaluation protocols have been proposed to assess the robustness of ReID models under challenging real-world scenarios. However, these protocols are limited to unimodal V settings. For realistic evaluation of multimodal (and cross-modal) V-I person ReID models, we propose new challenging corrupted datasets for scenarios where V and I cameras are co-located (CL) and not co-located (NCL). Finally, the benefits of our Masking and Local Multimodal Data Augmentation (ML-MDA) strategy are explored to improve the robustness of ReID models to multimodal corruption. Our experiments on clean and corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD datasets indicate the multimodal V-I ReID models that are more likely to perform well in real-world operational conditions. In particular, our ML-MDA is an important strategy for a V-I person ReID system to sustain high accuracy and robustness when processing corrupted multimodal images. Also, our multimodal ReID model MMSF outperforms every method under CL and NCL camera scenarios.

翻訳日:2023-05-02 15:53:33 公開日:2023-04-29

# 制約付きメタ最適輸送による再ランク学習

Learning to Re-rank with Constrained Meta-Optimal Transport ( http://arxiv.org/abs/2305.00319v1 )

ライセンス: Link先を確認

Andr\'es Hoyos-Idrobo

(参考訳) 検索システムにおける多くの再ランク付け戦略は、二重確率(Doubly-Stochastic, DS)行列として符号化された確率的ランキングポリシーに依存しており、露出の公平性(Fairness of Exposure, FOE)などの望ましいランキング制約を期待値の意味で満たす。これらの戦略は一般的に、(i) オフラインでの再ランク付けポリシーの構築と (ii) オンラインでのランキングのサンプリングという2段階のパイプラインである。再ランク付けポリシーを構築するには、発行されたクエリごとに制約付き最適化問題を繰り返し解く必要がある。したがって、新規または未知のクエリに対しては最適化手順を再計算する必要がある。サンプリングに関しては、Birkhoff-von-Neumann分解(BvND)が、DSベースのポリシーからランキングを引き出すための標準的なアプローチである。しかし、BvNDはオンラインで計算するにはコストが高すぎる。そのため、サンプリング手法としてのBvNDは、$N$個のクエリと$n$個の文書に対して$O(N\, n^2)$で増大しうるメモリを消費する。本稿では,公平な確率的再ランク付けポリシーを予測するための、新しく高速で軽量な手法である制約付きメタ最適輸送(Constrained Meta-Optimal Transport, CoMOT)を提案する。この手法は、ランク学習システムと同様に、クエリ間で共有されるニューラルネットワークを学習する。また、DSベースのポリシーからのオンラインサンプリング手法であるGumbel-Matching Sampling (GumMS)も紹介する。提案するパイプラインである CoMOT + GumMS は,単一のモデルのパラメータを格納するだけでよく、未知のクエリにも一般化する。 FOE制約の下で、TREC 2019と2020のデータセットでパイプラインを実証的に評価した。実験の結果,CoMOTは,クエリあたりの平均文書数に比例する高速化を伴って,ホールドアウトデータに対する公平な再ランク付けポリシーを迅速に予測することがわかった。また、元の最適化ベースのポリシーと同程度の公平性とランキング性能を示す。さらに,GumMS がDSベースのポリシーを期待値の意味で近似できることを実証的に検証した。

Many re-ranking strategies in search systems rely on stochastic ranking policies, encoded as Doubly-Stochastic (DS) matrices, that satisfy desired ranking constraints in expectation, e.g., Fairness of Exposure (FOE). These strategies are generally two-stage pipelines: (i) an offline re-ranking policy construction step and (ii) an online sampling of rankings step. Building a re-ranking policy requires repeatedly solving a constrained optimization problem, one for each issued query. Thus, it is necessary to recompute the optimization procedure for any new/unseen query. Regarding sampling, the Birkhoff-von-Neumann decomposition (BvND) is the favored approach to draw rankings from any DS-based policy. However, the BvND is too costly to compute online. Hence, the BvND as a sampling solution is memory-consuming as it can grow as $O(N\, n^2)$ for $N$ queries and $n$ documents. This paper offers a novel, fast, lightweight way to predict fair stochastic re-ranking policies: Constrained Meta-Optimal Transport (CoMOT). This method fits a neural network shared across queries like a learning-to-rank system. We also introduce Gumbel-Matching Sampling (GumMS), an online sampling approach from DS-based policies. Our proposed pipeline, CoMOT + GumMS, only needs to store the parameters of a single model, and it generalizes to unseen queries. We empirically evaluated our pipeline on the TREC 2019 and 2020 datasets under FOE constraints. Our experiments show that CoMOT rapidly predicts fair re-ranking policies on held-out data, with a speed-up proportional to the average number of documents per query. It also displays fairness and ranking performance similar to the original optimization-based policy. Furthermore, we empirically validate the effectiveness of GumMS to approximate DS-based policies in expectation.
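
The abstract does not spell out how GumMS works, so the following is only one plausible reading of "Gumbel-matching" sampling: perturb the log of each entry of a doubly-stochastic policy with independent Gumbel noise and return the permutation with the highest total perturbed score. The function names are hypothetical, and the brute-force search over permutations is an assumption made to keep the sketch dependency-free; a real system would use a linear assignment solver. The sampled rankings only approximate the policy's marginals.

```python
import math
import random
from itertools import permutations


def sample_gumbel(rng):
    """Draw one standard Gumbel sample, clipping u away from 0 and 1."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_matching_sample(policy, rng=None):
    """Sample a ranking from a doubly-stochastic matrix (illustrative only).

    policy[i][j] is read as the probability that document i is placed at
    rank j. Returns a list `ranking` where ranking[i] is the rank of doc i.
    """
    rng = rng or random.Random()
    n = len(policy)
    # Perturb log-probabilities; zero entries stay impossible (-inf).
    scores = [
        [(math.log(p) if p > 0 else float("-inf")) + sample_gumbel(rng) for p in row]
        for row in policy
    ]
    # Brute-force maximum-weight matching; fine only for very small n.
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(scores[i][perm[i]] for i in range(n)),
    )
    return list(best)


if __name__ == "__main__":
    uniform = [[1 / 3] * 3 for _ in range(3)]
    print(gumbel_matching_sample(uniform, random.Random(0)))
```

Unlike storing a Birkhoff-von-Neumann decomposition per query, a sampler of this kind needs only the policy matrix at sampling time, which is consistent with the memory argument made above.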

翻訳日:2023-05-02 15:52:55 公開日:2023-04-29

# 理想的な連続学習者:決して忘れないエージェント

The Ideal Continual Learner: An Agent That Never Forgets ( http://arxiv.org/abs/2305.00316v1 )

ライセンス: Link先を確認

Liangzu Peng, Paris V. Giampouras, Ren\'e Vidal

(参考訳) 連続学習の目的は、学習者に順次提示される複数の学習課題を解決するモデルを見つけることである。この設定における重要な課題は、新しいタスクを学ぶときに、学習者が以前のタスクの解き方を忘れてしまう可能性があることであり、この現象は破滅的忘却として知られている。この課題に対処するために,メモリベース,正則化ベース,拡張ベースなど,多くの実用的な手法が提案されている。しかし、これらの手法の厳密な理論的理解はいまだ得られていない。本稿では,この理論と実践のギャップを埋めるために,構成上、破滅的忘却を回避することが保証されたIdeal Continual Learner(ICL)と呼ばれる新しい連続学習フレームワークを提案する。 ICLが複数の確立された連続学習手法を統一し、これらの手法の強みと弱みに関する新たな理論的知見を与えることを示す。また、リハーサルが汎化にどのように影響するかを理論的に定量化できるICLの汎化境界も導出する。最後に、ICLをいくつかの古典的な主題および現代的な関心を集める研究トピックに結びつけることで、歴史的な考察を行い、今後の方向性を示唆する。

The goal of continual learning is to find a model that solves multiple learning tasks which are presented sequentially to the learner. A key challenge in this setting is that the learner may forget how to solve a previous task when learning a new task, a phenomenon known as catastrophic forgetting. To address this challenge, many practical methods have been proposed, including memory-based, regularization-based, and expansion-based methods. However, a rigorous theoretical understanding of these methods remains elusive. This paper aims to bridge this gap between theory and practice by proposing a new continual learning framework called Ideal Continual Learner (ICL), which is guaranteed to avoid catastrophic forgetting by construction. We show that ICL unifies multiple well-established continual learning methods and gives new theoretical insights into the strengths and weaknesses of these methods. We also derive generalization bounds for ICL which allow us to theoretically quantify how rehearsal affects generalization. Finally, we connect ICL to several classic subjects and research topics of modern interest, which allows us to make historical remarks and inspire future directions.

翻訳日:2023-05-02 15:52:22 公開日:2023-04-29

# InfraDet3D:ロードサイドインフラストラクチャカメラとLiDARセンサを用いたマルチモード3Dオブジェクト検出

InfraDet3D: Multi-Modal 3D Object Detection based on Roadside Infrastructure Camera and LiDAR Sensors ( http://arxiv.org/abs/2305.00314v1 )

ライセンス: Link先を確認

Walter Zimmer, Joseph Birkner, Marcel Brucker, Huu Tung Nguyen, Stefan Petrovski, Bohan Wang, Alois C. Knoll

(参考訳) 現在のマルチモーダル物体検出手法は車両領域に焦点をあてており、知覚範囲と処理能力に制限がある。路側センサユニット(RSU)は、知覚システムに新しいドメインを導入し、高所からの視点を利用して交通を観測する。ガントリーブリッジに搭載されたカメラとLiDARは知覚範囲を広げ、交通の完全なデジタルツインを生成する。本研究では,路側インフラストラクチャセンサのためのマルチモーダル3Dオブジェクト検出器であるInfraDet3Dを紹介する。初期融合(early fusion)により2つのLiDARを融合させ、さらに単眼カメラからの検出結果を取り入れてロバスト性を高め、小さな物体を検出する。我々の単眼3D検出モジュールはHDマップを用いて物体のヨー角の仮説を根拠づけ、最終的な知覚結果を改善する。知覚フレームワークは、ドイツのミュンヘンにあるA9テストストレッチの一部である現実世界の交差点にデプロイされている。いくつかのアブレーション研究と実験を行い、2台のLiDARを2台のカメラと融合させることで、カメラのみのソリューションに比べて+1.90 mAPの改善が得られることを示した。 A9 インフラストラクチャデータセットで結果を評価し,テストセット上で68.48 mAPを達成した。データセットとコードはhttps://a9-dataset.comで公開される予定であり、研究コミュニティが知覚結果をさらに改善し、自動運転をより安全にすることを可能にする。

Current multi-modal object detection approaches focus on the vehicle domain and are limited in the perception range and the processing capabilities. Roadside sensor units (RSUs) introduce a new domain for perception systems and leverage altitude to observe traffic. Cameras and LiDARs mounted on gantry bridges increase the perception range and produce a full digital twin of the traffic. In this work, we introduce InfraDet3D, a multi-modal 3D object detector for roadside infrastructure sensors. We fuse two LiDARs using early fusion and further incorporate detections from monocular cameras to increase the robustness and to detect small objects. Our monocular 3D detection module uses HD maps to ground object yaw hypotheses, improving the final perception results. The perception framework is deployed on a real-world intersection that is part of the A9 Test Stretch in Munich, Germany. We perform several ablation studies and experiments and show that fusing two LiDARs with two cameras leads to an improvement of +1.90 mAP compared to a camera-only solution. We evaluate our results on the A9 infrastructure dataset and achieve 68.48 mAP on the test set. The dataset and code will be available at https://a9-dataset.com to allow the research community to further improve the perception results and make autonomous driving safer.

翻訳日:2023-05-02 15:52:03 公開日:2023-04-29

# 制約付き多目的フェデレーション学習におけるプライバシ、ユーティリティ、効率の最適化

Optimizing Privacy, Utility and Efficiency in Constrained Multi-Objective Federated Learning ( http://arxiv.org/abs/2305.00312v1 )

ライセンス: Link先を確認

Yan Kang, Hanlin Gu, Xingxing Tang, Yuanqin He, Yuzhu Zhang, Jinnan He, Yuxing Han, Lixin Fan, Qiang Yang

(参考訳) 従来、連合学習は単一の目的、通常はユーティリティを最適化することを目的としていた。しかし、連合学習システムが信頼できるためには、モデル性能の最大化、プライバシのリークとトレーニングコストの最小化、悪意のある攻撃に対する堅牢性など、複数の目標を同時に満たす必要がある。複数の競合する目的を同時に最適化することを目的とした多目的最適化(MOO)は、信頼できるフェデレートラーニング(TFL)の最適化問題を解決するのに非常に適している。本稿では,制約付き多目的フェデレーション学習(CMOFL)の問題を定式化し,MOOとTFLを統一する。この定式化の下では、既存のMOOアルゴリズムをTFLに簡単に適用することができる。汎用性,効率性,公平性,堅牢性を重視した既存のcmoflとは違って,tflシステムの3つの主な目的であるユーティリティ損失とトレーニングコストとともに,プライバシリークの最適化を検討する。 NSGA-II と PSL に基づく 2 つの改良された CMOFL アルゴリズムを開発し,Pareto 最適解を効果的かつ効率的に検出し,その収束に関する理論的解析を行った。我々は、ランダム化、BatchCrypt(同型暗号化の効率的なバージョン)、スパシフィケーションの3つのプライバシ保護メカニズムに対して、プライバシー漏洩、ユーティリティ損失、トレーニングコストの具体的な測定を設計する。 3つの保護機構のそれぞれで実験を行い,提案手法の有効性を実証した。

Conventionally, federated learning aims to optimize a single objective, typically the utility. However, for a federated learning system to be trustworthy, it needs to simultaneously satisfy multiple/many objectives, such as maximizing model performance, minimizing privacy leakage and training cost, and being robust to malicious attacks. Multi-Objective Optimization (MOO) aiming to optimize multiple conflicting objectives at the same time is quite suitable for solving the optimization problem of Trustworthy Federated Learning (TFL). In this paper, we unify MOO and TFL by formulating the problem of constrained multi-objective federated learning (CMOFL). Under this formulation, existing MOO algorithms can be adapted to TFL straightforwardly. Different from existing CMOFL works focusing on utility, efficiency, fairness, and robustness, we consider optimizing privacy leakage along with utility loss and training cost, the three primary objectives of a TFL system. We develop two improved CMOFL algorithms based on NSGA-II and PSL, respectively, for effectively and efficiently finding Pareto optimal solutions, and we provide theoretical analysis on their convergence. We design specific measurements of privacy leakage, utility loss, and training cost for three privacy protection mechanisms: Randomization, BatchCrypt (An efficient version of homomorphic encryption), and Sparsification. Empirical experiments conducted under each of the three protection mechanisms demonstrate the effectiveness of our proposed algorithms.
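
To make "Pareto optimal solutions" concrete for the three objectives named above, the sketch below filters a set of candidate configurations down to the non-dominated ones, the basic dominance test that NSGA-II-style algorithms build on. It is not the authors' CMOFL algorithm; the tuple layout `(privacy_leakage, utility_loss, training_cost)`, the function names, and the numbers are illustrative assumptions, with all three objectives minimized.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # (privacy_leakage, utility_loss, training_cost) for hypothetical settings
    candidates = [
        (0.1, 0.5, 10.0),
        (0.2, 0.4, 12.0),
        (0.3, 0.6, 15.0),  # worse than the first candidate on every objective
        (0.1, 0.5, 11.0),  # same leakage and loss as the first, but costlier
    ]
    for point in pareto_front(candidates):
        print(point)
```

A constraint such as an upper bound on privacy leakage can be applied by discarding infeasible candidates before the dominance filter runs.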

翻訳日:2023-05-02 15:51:39 公開日:2023-04-29

# 典型性をもつ条件論理における多層パーセプトロンの優先的解釈

A preferential interpretation of MultiLayer Perceptrons in a conditional logic with typicality ( http://arxiv.org/abs/2305.00304v1 )

ライセンス: Link先を確認

Mario Alviano, Francesco Bartoli, Marco Botta, Roberto Esposito, Laura Giordano, Daniele Theseider Dupr\'e

(参考訳) 本稿では,知識表現における無効化可能推論(defeasible reasoning)のための多重選好セマンティクスと多層ニューラルネットワークモデルとの関係について検討する。典型性を持つ単純な記述論理に対する重み付き知識ベースを、(多値の)「concept-wise」多重選好セマンティクスの下で考察する。このセマンティクスは、MultiLayer Perceptrons(MLP)の選好的解釈を与えるために用いられる。 MLPの条件特性の検証には,モデル検査に基づくアプローチと含意(entailment)に基づくアプローチを用いる。

In this paper we investigate the relationships between a multipreferential semantics for defeasible reasoning in knowledge representation and a multilayer neural network model. Weighted knowledge bases for a simple description logic with typicality are considered under a (many-valued) "concept-wise" multipreference semantics. The semantics is used to provide a preferential interpretation of MultiLayer Perceptrons (MLPs). A model checking and an entailment based approach are exploited in the verification of conditional properties of MLPs.

翻訳日:2023-05-02 15:51:13 公開日:2023-04-29

# 計算量子秘密共有

Computational Quantum Secret Sharing ( http://arxiv.org/abs/2305.00356v1 )

ライセンス: Link先を確認

Alper \c{C}akan, Vipul Goyal, Chen-Da Liu-Zhang, Jo\~ao Ribeiro

(参考訳) 量子秘密共有(quantum secret sharing, QSS)は、ディーラーが秘密の量子状態を複数のパーティに分配し、特定の部分集合は秘密を再構成できる一方で、権限のない部分集合は何の情報も得られないようにする。 QSSは20年以上前に導入されたが、以前の研究は完全安全なスキームの存在のみに焦点を当てており、既知のスキームのシェアサイズは、多項式サイズの単調回路で計算されるアクセス構造に対してさえ指数関数的である。これは古典的な場合とは対照的である。古典的な場合には、$\mathsf{monotone~P}$のすべてのアクセス構造に対して効率的な計算量的安全なスキームが古くから知られており、完全安全性では不可能な、秘密よりもはるかに短いシェアを得ることもできる。本研究では、計算量的に安全なQSSの研究を開始し、計算量的仮定がQSSスキームの構築に大いに役立つことを示す。我々は単純なコンパイラを示し、それを用いて多様な結果を得る。まず、豊富なアクセス構造のクラスに対して、標準的な仮定の下で多項式時間QSSスキームを構築する。これには、従来のQSSの結果では指数的なシェアサイズを必要とした多くのアクセス構造が含まれる。また、シェアのサイズが秘密のサイズよりも大幅に小さいQSSスキームを構築する。古典的な場合と同様に、これは完全安全性では不可能である。さらに、計算量的QSSを超える結果を得るためにもコンパイラを用いる。情報理論的な設定では、大きなクラスのアクセス構造に対する完全QSSスキームのシェアサイズを$1.5^{n+o(n)}$に改善する。これは既知の最良のスキームを上回り、古典的な場合の一般的なアクセス構造に対する既知の最良の結果に一致する。最後に、量子秘密共有スキームに秘密のコピーが複数与えられる場合に、$\mathsf{P}$ および $\mathsf{NP}$ のすべてのアクセス構造に対する効率的なスキームを構築する。

Quantum secret sharing (QSS) allows a dealer to distribute a secret quantum state among a set of parties so that certain subsets can reconstruct the secret, while unauthorized subsets obtain no information. While QSS was introduced over twenty years ago, previous works focused only on the existence of perfectly secure schemes, and the share size of the known schemes is exponential even for access structures computed by polynomial-size monotone circuits. This stands in contrast to the classical case, where efficient computationally-secure schemes have long been known for all access structures in $\mathsf{monotone~P}$, and one can even obtain shares which are much shorter than the secret, which is impossible with perfect security. In this work, we initiate the study of computationally-secure QSS and show that computational assumptions help significantly in building QSS schemes. We present a simple compiler and use it to obtain a large variety of results: We construct polynomial-time QSS schemes under standard assumptions for a rich class of access structures. This includes many access structures for which previous results in QSS required exponential share size. We also construct QSS schemes for which the size of the shares is significantly smaller than the size of the secret. As in the classical case, this is impossible with perfect security. We also use our compiler to obtain results beyond computational QSS. In the information-theoretic setting, we improve the share size of perfect QSS schemes for a large class of access structures to $1.5^{n+o(n)}$, improving upon the best known schemes and matching the best known result for general access structures in the classical case. Finally, we construct efficient schemes for all access structures in $\mathsf{P}$ and $\mathsf{NP}$ when the quantum secret sharing scheme is given multiple copies of the secret.

翻訳日:2023-05-02 15:44:44 公開日:2023-04-29

# MH-DETR: クロスモーダルTransformerを用いたビデオモーメントとハイライトの検出

MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer ( http://arxiv.org/abs/2305.00355v1 )

ライセンス: Link先を確認

Yifang Xu, Yunzhuo Sun, Yang Li, Yilei Shi, Xiaoxiang Zhu, Sidan Du

(参考訳) ビデオ理解の需要が高まり、ビデオモーメントとハイライト検出(MHD)が重要な研究トピックとして浮上している。 MHDはすべての瞬間をローカライズし、クリップワイドのサリエンシスコアを同時に予測することを目的としている。既存のDETRに基づく手法の進歩にもかかわらず、これらの手法は時間的モード内コンテキストを弱め、結果としてモーダル間相互作用が不十分となる様々なモードから粗い特徴を融合する。本稿では,MHDに適したMH-DETR(Moment and Highlight Detection Transformer)を提案する。具体的には,ユニモーダルエンコーダ内に,グローバル・イントラモーダル・コンテキストをキャプチャする簡易かつ効率的なプーリング演算子を導入する。さらに、時間的に調整されたクロスモーダル特徴を得るために、エンコーダとデコーダ間のプラグ・アンド・プレイクロスモーダル相互作用モジュールを設計し、視覚的な特徴とテキスト的な特徴をシームレスに統合する。 QVHighlights、Charades-STA、Activity-Net、TVSumデータセットに関する総合的な実験は、MH-DETRが既存の最先端手法よりも優れており、その効果と優位性を示していることを示している。私たちのコードはhttps://github.com/YoucanBaby/MH-DETRで利用可能です。

With the increasing demand for video understanding, video moment and highlight detection (MHD) has emerged as a critical research topic. MHD aims to localize all moments and predict clip-wise saliency scores simultaneously. Despite progress made by existing DETR-based methods, we observe that these methods coarsely fuse features from different modalities, which weakens the temporal intra-modal context and results in insufficient cross-modal interaction. To address this issue, we propose MH-DETR (Moment and Highlight Detection Transformer) tailored for MHD. Specifically, we introduce a simple yet efficient pooling operator within the uni-modal encoder to capture global intra-modal context. Moreover, to obtain temporally aligned cross-modal features, we design a plug-and-play cross-modal interaction module between the encoder and decoder, seamlessly integrating visual and textual features. Comprehensive experiments on QVHighlights, Charades-STA, Activity-Net, and TVSum datasets show that MH-DETR outperforms existing state-of-the-art methods, demonstrating its effectiveness and superiority. Our code is available at https://github.com/YoucanBaby/MH-DETR.

翻訳日:2023-05-02 15:43:49 公開日:2023-04-29

# 法医学的顔比較のための埋め込みアグリゲーション

Embedding Aggregation for Forensic Facial Comparison ( http://arxiv.org/abs/2305.00352v1 )

ライセンス: Link先を確認

Rafael Oliveira Ribeiro, João C. R. Neves, Arnout C. C. Ruifrok, Flavio de Barros Vidal

(参考訳) 法医学的な顔比較では、疑わしいソース画像は、通常、制御されていない環境、不均一な照明、そして非協力的な被験者から撮影される。このような資料の質の低さは、通常法的事項の証拠としての価値を損なう。一方、法医学的なケースワークでは、対象人物の複数の画像が通常利用可能である。本稿では、顔認証の性能向上のために、同一人物のさまざまな画像からのディープニューラルネットワークの埋め込みを集約することを提案する。特に非常に低画質の画像では、性能が著しく向上した。さらなる改善は、より多くの画像の埋め込みを集約し、品質重み付けアグリゲーションを適用することで得られる。本手法の利点を、スコアベース尤度比システムの開発と検証を通じて法医学的評価の場面で示し、CCTV画像では最大95%(0.249から0.012)、ソーシャルメディア画像では最大96%(0.083から0.003)のCllr改善を報告する。

In forensic facial comparison, questioned-source images are usually captured in uncontrolled environments, with non-uniform lighting, and from non-cooperative subjects. The poor quality of such material usually compromises its value as evidence in legal matters. On the other hand, in forensic casework, multiple images of the person of interest are usually available. In this paper, we propose to aggregate deep neural network embeddings from various images of the same person to improve performance in facial verification. We observe significant performance improvements, especially for very low-quality images. Further improvements are obtained by aggregating embeddings of more images and by applying quality-weighted aggregation. We demonstrate the benefits of this approach in forensic evaluation settings with the development and validation of score-based likelihood ratio systems and report improvements in Cllr of up to 95% (from 0.249 to 0.012) for CCTV images and of up to 96% (from 0.083 to 0.003) for social media images.
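The aggregation idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`aggregate_embeddings`, `cosine_similarity`, `cllr`), the use of a weighted mean of L2-normalised embeddings, and the toy two-dimensional vectors are all hypothetical. The `cllr` function follows the standard definition of the log-likelihood-ratio cost; how the paper calibrates scores into likelihood ratios is not stated in the abstract and is not modelled here.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def aggregate_embeddings(embeddings, quality=None):
    """Fuse several embeddings of one person into a single unit vector.

    quality: optional non-negative per-image weights (hypothetical quality
    scores, not all zero). None gives a plain, unweighted mean.
    """
    if quality is None:
        quality = [1.0] * len(embeddings)
    # Normalise first so that no image dominates through its embedding norm.
    unit = [l2_normalize(e) for e in embeddings]
    dim = len(unit[0])
    fused = [sum(w * e[d] for w, e in zip(quality, unit)) for d in range(dim)]
    return l2_normalize(fused)

def cosine_similarity(a, b):
    """Comparison score between two embeddings."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def cllr(lr_same_source, lr_diff_source):
    """Log-likelihood-ratio cost (standard definition).

    Lower is better; a system that always outputs LR = 1 scores exactly 1.
    """
    same = sum(math.log2(1 + 1 / lr) for lr in lr_same_source) / len(lr_same_source)
    diff = sum(math.log2(1 + lr) for lr in lr_diff_source) / len(lr_diff_source)
    return 0.5 * (same + diff)

# Toy example: two noisy views of the same direction, perturbed in opposite ways.
reference = [1.0, 0.0]
probes = [[1.0, 0.5], [1.0, -0.5]]
single_score = cosine_similarity(probes[0], reference)
fused_score = cosine_similarity(aggregate_embeddings(probes), reference)
print(single_score, fused_score)  # the fused score is the higher of the two here
```

In this toy case the opposite perturbations cancel, which is the intuition behind aggregating more images; passing `quality` weights shifts the fused vector toward the images judged more reliable.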

翻訳日:2023-05-02 15:43:14 公開日:2023-04-29

# POUF: 大規模事前訓練モデルのためのプロンプト指向の教師なし微調整

POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models ( http://arxiv.org/abs/2305.00350v1 )

ライセンス: Link先を確認

Korawat Tanwisuth, Shujian Zhang, Huangjie Zheng, Pengcheng He, Mingyuan Zhou

(参考訳) プロンプトを通じて、大規模な事前訓練型モデルはより表現力が高く、力強くなり、近年は注目されている。これらの大きなモデルはゼロショット機能を持っているが、一般にラベル付きデータはダウンストリームタスクに適応するために必要である。この限界を克服するために、モデルを直接微調整したり、ラベルのないターゲットデータにプロンプトを付与する教師なしの微調整フレームワークを提案する。本稿では,プロンプトとターゲットデータから抽出した離散分布を整列させて,言語拡張視覚とマスキング言語モデルの両方に適用する方法を示す。提案手法の適用性を検証するため,画像分類,感情分析,自然言語推論タスクについて広範な実験を行った。 13のイメージ関連タスクと15の言語関連タスクに対して,提案手法はベースラインよりも一貫した改善を実現する。

Through prompting, large-scale pre-trained models have become more expressive and powerful, gaining significant attention in recent years. Though these big models have zero-shot capabilities, in general, labeled data are still required to adapt them to downstream tasks. To overcome this critical limitation, we propose an unsupervised fine-tuning framework to directly fine-tune the model or prompt on the unlabeled target data. We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data. To verify our approach's applicability, we conduct extensive experiments on image classification, sentiment analysis, and natural language inference tasks. Across 13 image-related tasks and 15 language-related ones, the proposed approach achieves consistent improvements over the baselines.

翻訳日:2023-05-02 15:42:56 公開日:2023-04-29

# 身体視におけるモダリティ不変の視覚計測

Modality-invariant Visual Odometry for Embodied Vision ( http://arxiv.org/abs/2305.00348v1 )

ライセンス: Link先を確認

Marius Memmel, Roman Bachmann, Amir Zamir

(参考訳) エージェントを現実的でノイズの多い環境で効果的にローカライズすることは、多くの身体性視覚タスクに不可欠である。ビジュアルオドメトリー(VO)は、特に屋内環境では、信頼性の低いGPSやコンパスセンサーの実用的な代替となる。SLAMベースの手法は、大きなデータ要求なしに安定した性能を示すが、学習ベースのアプローチに比べて、ノイズやセンサースイートの変更に対する柔軟性と堅牢性に劣る。しかし、最近のディープVOモデルは、数百万のサンプルで学習する一方で、RGBや深度などの固定された入力モダリティに制限されている。センサーが故障した場合、センサースイートが変更された場合、あるいは電力消費などの利用可能なリソースの都合でモダリティが意図的に外された場合、これらのモデルは壊滅的に失敗する。さらに、シミュレーターへのアクセスや微調整可能な既存のモデルがなければ、これらのモデルをスクラッチからトレーニングするコストはさらに高くなる。このようなシナリオはシミュレーションではほとんど無視されるが、実世界のアプリケーションではモデルの再利用性を妨げることが多い。本稿では、ナビゲーションエージェントの多様な、あるいは変化するセンサースイートに対応可能な、Transformerベースのモダリティ不変VOアプローチを提案する。我々のモデルは、ごく一部のデータのみで学習しながら、以前の方法よりも優れている。この手法が、柔軟な学習型VOモデルの恩恵を受けることができる幅広い現実世界アプリケーションへの扉を開くことを願っている。

Effectively localizing an agent in a realistic, noisy setting is crucial for many embodied vision tasks. Visual Odometry (VO) is a practical substitute for unreliable GPS and compass sensors, especially in indoor environments. While SLAM-based methods show a solid performance without large data requirements, they are less flexible and robust w.r.t. noise and changes in the sensor suite compared to learning-based approaches. Recent deep VO models, however, limit themselves to a fixed set of input modalities, e.g., RGB and depth, while training on millions of samples. When sensors fail, sensor suites change, or modalities are intentionally looped out due to available resources, e.g., power consumption, the models fail catastrophically. Furthermore, training these models from scratch is even more expensive without simulator access or suitable existing models that can be fine-tuned. While such scenarios get mostly ignored in simulation, they commonly hinder a model's reusability in real-world applications. We propose a Transformer-based modality-invariant VO approach that can deal with diverse or changing sensor suites of navigation agents. Our model outperforms previous methods while training on only a fraction of the data. We hope this method opens the door to a broader range of real-world applications that can benefit from flexible and learned VO models.

翻訳日:2023-05-02 15:42:40 公開日:2023-04-29

# 協調型AIの可能性を解き放つ: 連合機械学習の社会技術的課題について

Unlocking the Potential of Collaborative AI -- On the Socio-technical Challenges of Federated Machine Learning ( http://arxiv.org/abs/2304.13688v3 )

ライセンス: Link先を確認

Tobias Müller, Milena Zahn and Florian Matthes

(参考訳) AIシステムの破壊的なポテンシャルは、ビッグデータの出現に根ざしている。しかし、かなりの部分が散らばってデータサイロに閉じ込められ、その潜在能力は失われている。 Federated Machine Learningは、分散化された潜在的サイロデータからAIモデルを作成することができる、新しいAIパラダイムである。したがって、フェデレーション機械学習は技術的にデータサイロを開放し、経済的な可能性を開くことができる。しかし、これはデータサイロを所有する複数のパーティ間のコラボレーションを必要とする。協調型ビジネスモデルのセットアップは複雑であり、しばしば失敗の原因となる。現在の文献には、協調AIプロジェクトを成功させるために考慮すべき側面のガイドラインが欠けている。本研究では,協調型ビジネスモデルの普及の課題と,連合機械学習の異なる側面について検討する。体系的な文献レビュー、フォーカスグループ、エキスパートインタビューを通じて、社会技術的課題の体系化されたコレクションと、協調aiプロジェクトの初期実行可能性評価のための拡張ビジネスモデルキャンバスを提供する。

The disruptive potential of AI systems is rooted in the emergence of big data. Yet, a significant portion is scattered and locked in data silos, leaving its potential untapped. Federated Machine Learning is a novel AI paradigm enabling the creation of AI models from decentralized, potentially siloed data. Hence, Federated Machine Learning could technically open data silos and therefore unlock economic potential. However, this requires collaboration between multiple parties owning data silos. Setting up collaborative business models is complex and often a reason for failure. Current literature lacks guidelines on which aspects must be considered to successfully realize collaborative AI projects. This research investigates the challenges of prevailing collaborative business models and distinct aspects of Federated Machine Learning. Through a systematic literature review, focus group, and expert interviews, we provide a systemized collection of socio-technical challenges and an extended Business Model Canvas for the initial viability assessment of collaborative AI projects.

翻訳日:2023-05-02 10:43:23 公開日:2023-04-29

# 三対角トープリッツ行列と二部量子相関

Tridiagonal Toeplitz Matrices and Bipartite Quantum Correlations ( http://arxiv.org/abs/2302.10192v3 )

ライセンス: Link先を確認

Varsha S. Sambhaje, Suprabhat Sinha, Kapil K. Sharma

(参考訳) 本稿では、量子情報でよく用いられる有効なハミルトニアンの要件を満たす、三対角トープリッツエルミート行列に着目する。このような行列の挙動を調べ、二部系のWerner状態と最大エンタングル混合状態(MEMS)に対する量子相関(エンタングルメントと量子ディスコード)のダイナミクスを追う。トープリッツ行列の主対角項が、どちらの量子状態においても量子相関に影響を与えないという興味深い結果を得た。しかし、超対角項および亜対角項は力学において重要な役割を果たす。エンタングルメントの突然死の現象を調査し、エンタングルメントがない場合にも量子ディスコードが存在することを観察した。最も重要なことは、MEMSがWerner状態よりも敏感であることである。

In this article, we focus on tridiagonal Toeplitz Hermitian matrices, which fulfill the requirements of a valid Hamiltonian and are often used in quantum information. We investigate the behavior of such matrices to follow the dynamics of quantum correlations (entanglement and quantum discord) for bipartite Werner states and maximally entangled mixed states (MEMS). We find the interesting result that the main diagonal terms of the Toeplitz matrices never affect the quantum correlations in either class of states. However, the super-diagonal and sub-diagonal terms play an important role in the dynamics. We investigate the phenomenon of entanglement sudden death and also observe the presence of quantum discord in the absence of entanglement. Most importantly, we find that MEMS are more sensitive than the Werner state.
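One way to see why the main diagonal drops out is the following sketch, which rests on assumptions not stated in the abstract: that the state evolves unitarily as $\rho(t) = U \rho U^\dagger$ with $U = e^{-iHt}$ ($\hbar = 1$), and that the 4x4 matrix acts on two qubits in the basis 00, 01, 10, 11. Under those assumptions, writing $H = aI + H_0$ shows that the diagonal entry $a$ contributes only a global phase $e^{-iat}$, which cancels in $U \rho U^\dagger$. The helper names (`toeplitz_tridiagonal`, `expm`, `evolve`, `werner_state`) and the parameter values are hypothetical, and the truncated Taylor series is only meant for small matrices like this one.

```python
def toeplitz_tridiagonal(n, a, b):
    """n x n Hermitian tridiagonal Toeplitz matrix: real a on the main
    diagonal, b on the super-diagonal, conj(b) on the sub-diagonal."""
    H = [[0j] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = complex(a)
        if i + 1 < n:
            H[i][i + 1] = complex(b)
            H[i + 1][i] = complex(b).conjugate()
    return H

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(A):
    """Conjugate transpose."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def expm(A, terms=40):
    """Truncated Taylor series exp(A) = sum_k A^k / k! (fine for small norms)."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def evolve(rho, H, t):
    """rho(t) = U rho U^dagger with U = exp(-i H t)."""
    U = expm([[-1j * t * h for h in row] for row in H])
    return matmul(matmul(U, rho), dagger(U))

def werner_state(p):
    """Two-qubit Werner state p |psi-><psi-| + (1 - p) I/4."""
    rho = [[complex((1 - p) / 4 if i == j else 0) for j in range(4)]
           for i in range(4)]
    for i, j, sign in [(1, 1, 1), (2, 2, 1), (1, 2, -1), (2, 1, -1)]:
        rho[i][j] += sign * p / 2
    return rho

def max_abs_diff(A, B):
    n = len(A)
    return max(abs(A[i][j] - B[i][j]) for i in range(n) for j in range(n))

rho0 = werner_state(0.8)
rho_a = evolve(rho0, toeplitz_tridiagonal(4, 0.0, 0.5), t=1.0)
rho_b = evolve(rho0, toeplitz_tridiagonal(4, 3.0, 0.5), t=1.0)
# Same off-diagonal entry, different main diagonal: the evolved states agree
# up to floating-point round-off, so any correlation measure agrees as well.
print(max_abs_diff(rho_a, rho_b))
```

Changing `b` instead of `a` does alter the evolved state, which is consistent with the abstract's statement that the super- and sub-diagonal terms drive the dynamics.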

翻訳日:2023-05-02 10:43:09 公開日:2023-04-29

# ChatVideo: トラックレット中心のマルチモーダル・多目的ビデオ理解システム

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System ( http://arxiv.org/abs/2304.14407v2 )

ライセンス: Link先を確認

Junke Wang and Dongdong Chen and Chong Luo and Xiyang Dai and Lu Yuan and Zuxuan Wu and Yu-Gang Jiang

(参考訳) 既存のディープビデオモデルは、特定のタスク、固定された入出力空間、一般化能力に制限されているため、現実のシナリオでのデプロイが困難である。本稿では、マルチモーダル・多目的ビデオ理解のためのビジョンを示し、プロトタイプシステムであるChatVideoを提案する。本システムは、トラックレットを基本ビデオ単位として扱い、様々なビデオファウンデーションモデル(ViFM)を用いて、その特性(外見、動きなど)をアノテートする、トラックレット中心のパラダイムに基づいて構築されている。検出されたトラックレットはすべてデータベースに格納され、データベースマネージャを介してユーザと対話する。我々は、様々な形態の動画のケーススタディを行い、様々なビデオ関連問題に対処するための手法の有効性を実証した。私たちのプロジェクトはhttps://www.wangjunke.info/ChatVideo/で利用可能です。

Existing deep video models are limited by specific tasks, fixed input-output spaces, and poor generalization capabilities, making it difficult to deploy them in real-world scenarios. In this paper, we present our vision for multimodal and versatile video understanding and propose a prototype system, ChatVideo. Our system is built upon a tracklet-centric paradigm, which treats tracklets as the basic video unit and employs various Video Foundation Models (ViFMs) to annotate their properties, e.g., appearance, motion, etc. All the detected tracklets are stored in a database and interact with the user through a database manager. We have conducted extensive case studies on different types of in-the-wild videos, which demonstrate the effectiveness of our method in answering various video-related problems. Our project is available at https://www.wangjunke.info/ChatVideo/

翻訳日:2023-05-02 10:32:58 公開日:2023-04-29

# 都市空間時間予測の効率化に向けて:統一図書館と性能ベンチマーク

Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction: A Unified Library and Performance Benchmark ( http://arxiv.org/abs/2304.14343v2 )

ライセンス: Link先を確認

Jingyuan Wang, Jiawei Jiang, Wenjun Jiang, Chengkai Han, Wayne Xin Zhao

(参考訳) 深層学習技術が進歩し、都市空間時空間データが蓄積するにつれて、都市空間時空間予測問題を解決するための深層学習モデルが増えている。しかし、既存の分野には、さまざまなフォーマットで、使いづらいオープンソースのデータ、コードとデータをオープンに利用可能にする論文、さまざまなフレームワークやプラットフォームを使用するオープンソースモデルなど、制限があり、比較が難しい。これらのメソッドを実装し評価するには、標準化されたフレームワークが緊急に必要です。これらの課題に対処するため、都市空間時空間予測の総合的なレビューを行い、原子ファイルと呼ばれる空間時空間データの統一記憶形式を提案する。また、libcityは、研究者に信頼できる実験ツールと便利な開発フレームワークを提供するオープンソースライブラリである。本図書館では,65の空間-時間予測モデルを再現し,55の空間-時間データセットを収集した。 LibCityを用いて、異なるモデルやコンポーネントの有効性を検証する一連の実験を行い、将来有望な技術開発と研究の方向性を時空間予測のために要約した。公平なモデル比較を可能にし、統一されたデータストレージフォーマットを設計し、新しいモデルの開発プロセスを簡単にすることで、libcityは空間-時間予測分野に大きな貢献をする準備が整っている。

As deep learning technology advances and more urban spatial-temporal data accumulates, an increasing number of deep learning models are being proposed to solve urban spatial-temporal prediction problems. However, there are limitations in the existing field, including open-source data being in various formats and difficult to use, few papers making their code and data openly available, and open-source models often using different frameworks and platforms, making comparisons challenging. A standardized framework is urgently needed to implement and evaluate these methods. To address these issues, we provide a comprehensive review of urban spatial-temporal prediction and propose a unified storage format for spatial-temporal data called atomic files. We also propose LibCity, an open-source library that offers researchers a credible experimental tool and a convenient development framework. In this library, we have reproduced 65 spatial-temporal prediction models and collected 55 spatial-temporal datasets, allowing researchers to conduct comprehensive experiments conveniently. Using LibCity, we conducted a series of experiments to validate the effectiveness of different models and components, and we summarized promising future technology developments and research directions for spatial-temporal prediction. By enabling fair model comparisons, designing a unified data storage format, and simplifying the process of developing new models, LibCity is poised to make significant contributions to the spatial-temporal prediction field.

翻訳日:2023-05-02 10:32:43 公開日:2023-04-29

PDF登録状況（公開日: 20230429）