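This site translates into Japanese those arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.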
Title | Authors | Abstract | Publication and translation dates |
# Understanding Hackers' Work: An Empirical Study of Offensive Security Practitioners ( http://arxiv.org/abs/2308.07057v3 ) License: check the link | Andreas Happe, Jürgen Cito |
Offensive security-tests are a common way to pro-actively discover potential vulnerabilities. They are performed by specialists, often called penetration-testers or white-hat hackers. The chronic lack of available white-hat hackers prevents sufficient security test coverage of software. Research into automation tries to alleviate this problem by improving the efficiency of security testing. To achieve this, researchers and tool builders need a solid understanding of how hackers work, their assumptions, and pain points. In this paper, we present a first data-driven exploratory qualitative study of twelve security professionals, their work and problems occurring therein. We perform a thematic analysis to gain insights into the execution of security assignments, hackers' thought processes and encountered challenges. This analysis allows us to conclude with recommendations for researchers and tool builders to increase the efficiency of their automation and identify novel areas for research. | Translated: 2023-10-23 14:21:04 Published: 2023-08-23 |
# Empirical Analysis of Software Vulnerabilities Causing Timing Side Channels ( http://arxiv.org/abs/2308.11862v1 ) License: check the link | M. Mehdi Kholoosi, M. Ali Babar, Cemal Yilmaz |
Timing attacks are considered one of the most damaging side-channel attacks. These attacks exploit timing fluctuations caused by certain operations to disclose confidential information to an attacker. For instance, in asymmetric encryption, operations such as multiplication and division can cause time-varying execution times that can be ill-treated to obtain an encryption key. Whilst several efforts have been devoted to exploring the various aspects of timing attacks, particularly in cryptography, little attention has been paid to empirically studying the timing attack-related vulnerabilities in non-cryptographic software. By inspecting these software vulnerabilities, this study aims to gain an evidence-based understanding of weaknesses in non-cryptographic software that may help timing attacks succeed. We used qualitative and quantitative research approaches to systematically study the timing attack-related vulnerabilities reported in the National Vulnerability Database (NVD) from March 2003 to December 2022. Our analysis was focused on the modifications made to the code for patching the identified vulnerabilities. We found that a majority of the timing attack-related vulnerabilities were introduced due to not following known secure coding practices. The findings of this study are expected to help the software security community gain evidence-based information about the nature and causes of the vulnerabilities related to timing attacks. | Translated: 2023-10-23 13:10:02 Published: 2023-08-23 |
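Comparing secrets in constant time is a textbook example of the secure coding practices the study refers to. The sketch is illustrative only and is not taken from the paper. `naive_equals` returns at the first mismatching byte, so its running time depends on the length of the matching prefix, which is a timing fluctuation an attacker can measure. `constant_time_equals` delegates to the standard library's `hmac.compare_digest`, which is designed to avoid that early exit. `bytes_examined` is a hypothetical helper that counts the naive loop's iterations as a stand-in for running time.

```python
import hmac

def naive_equals(secret: bytes, guess: bytes) -> bool:
    """Early-exit comparison: timing reveals how many leading bytes match."""
    if len(secret) != len(guess):
        return False
    for a, b in zip(secret, guess):
        if a != b:
            return False  # returns sooner when the mismatch is earlier
    return True

def constant_time_equals(secret: bytes, guess: bytes) -> bool:
    """Comparison whose running time does not depend on where bytes differ."""
    return hmac.compare_digest(secret, guess)

def bytes_examined(secret: bytes, guess: bytes) -> int:
    """Count iterations of the naive loop: a proxy for its running time."""
    count = 0
    for a, b in zip(secret, guess):
        count += 1
        if a != b:
            break
    return count

if __name__ == "__main__":
    token = b"s3cr3t-token"
    print(bytes_examined(token, b"xxxxxxxxxxxx"))  # 1: mismatch at the first byte
    print(bytes_examined(token, b"s3cr3t-xxxxx"))  # 8: mismatch at the eighth byte
    print(constant_time_equals(token, b"s3cr3t-token"))  # True
```

The iteration count grows with every correctly guessed byte. This is what allows an attacker to recover a secret byte by byte.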
# Ownership in the Hands of Accountability at Brightsquid -- A Case Study and a Developer Survey ( http://arxiv.org/abs/2308.12455v1 ) License: check the link | Umme Ayman Koana, Francis Chew, Chris Carlson, Maleknaz Nayebi |
The COVID-19 pandemic has accelerated the adoption of digital health solutions. This has presented significant challenges for software development teams to swiftly adjust to the market need and demand. To address these challenges, product management teams have had to adapt their approach to software development, reshaping their processes to meet the demands of the pandemic. Brightsquid implemented a new task assignment process aimed at enhancing developer accountability toward the customer. To assess the impact of this change on code ownership, we conducted a code change analysis. Additionally, we surveyed 67 developers to investigate the relationship between accountability and ownership more broadly. The findings of our case study indicate that the revised assignment model not only increased the perceived sense of accountability within the production team but also improved code resilience against ownership changes. Moreover, the survey results revealed that a majority of the participating developers (67.5%) associated perceived accountability with artifact ownership. | Translated: 2023-10-23 12:56:17 Published: 2023-08-23 |
# Reflecting on the Use of the Policy-Process-Product Theory in Empirical Software Engineering ( http://arxiv.org/abs/2308.12387v1 ) License: check the link | Kelechi G. Kalu, Taylor R. Schorlemmer, Sophie Chen, Kyle Robinson, Erik Kocinare, James C. Davis |
The primary theory of software engineering is that an organization's Policies and Processes influence the quality of its Products. We call this the PPP Theory. Although empirical software engineering research has grown common, it is unclear whether researchers are trying to evaluate the PPP Theory. To assess this, we analyzed half (33) of the empirical works published over the last two years in three prominent software engineering conferences. In this sample, 70% focus on policies/processes or products, not both. Only 33% provided measurements relating policy/process and products. We make four recommendations: (1) Use PPP Theory in study design; (2) Study feedback relationships; (3) Diversify the studied feedforward relationships; and (4) Disentangle policy and process. Let us remember that research results are in the context of, and with respect to, the relationship between software products, processes, and policies. | Translated: 2023-10-23 12:55:59 Published: 2023-08-23 |
# Bugsplainer: Leveraging Code Structures to Explain Software Bugs with Neural Machine Translation ( http://arxiv.org/abs/2308.12267v1 ) License: check the link | Parvez Mahbub, Mohammad Masudur Rahman, Ohiduzzaman Shuvo, Avinash Gopal |
Software bugs cost the global economy billions of dollars each year and take up ~50% of the development time. Once a bug is reported, the assigned developer attempts to identify and understand the source code responsible for the bug and then corrects the code. Over the last five decades, there has been significant research on automatically finding or correcting software bugs. However, there has been little research on automatically explaining the bugs to the developers, which is essential but a highly challenging task. In this paper, we propose Bugsplainer, a novel web-based debugging solution that generates natural language explanations for software bugs by learning from a large corpus of bug-fix commits. Bugsplainer leverages code structures to reason about a bug and employs the fine-tuned version of a text generation model, CodeT5, to generate the explanations. Tool video: https://youtu.be/xga-ScvULpk | Translated: 2023-10-23 12:55:29 Published: 2023-08-23 |
# Resiliency Analysis of LLM generated models for Industrial Automation ( http://arxiv.org/abs/2308.12129v1 ) License: check the link | Oluwatosin Ogundare, Gustavo Quiros Araya, Ioannis Akrotirianakis, Ankit Shukla |
This paper proposes a study of the resilience and efficiency of automatically generated industrial automation and control systems using Large Language Models (LLMs). The approach involves modeling the system using percolation theory to estimate its resilience and formulating the design problem as an optimization problem subject to constraints. Techniques from stochastic optimization and regret analysis are used to find a near-optimal solution with provable regret bounds. The study aims to provide insights into the effectiveness and reliability of automatically generated systems in industrial automation and control, and to identify potential areas for improvement in their design and implementation. | Translated: 2023-10-23 12:55:14 Published: 2023-08-23 |
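Percolation-style resilience estimates are commonly computed by removing components at random and measuring how much of the system stays connected. The Monte Carlo sketch illustrates only that generic idea. The ring graph, the `keep_prob` parameter, and the largest-connected-cluster measure are illustrative assumptions, not the paper's model. The paper's stochastic-optimization and regret-analysis machinery is not reproduced.

```python
import random
from collections import deque

def largest_component_fraction(adj, keep_prob, rng):
    """Keep each node with probability keep_prob; return the share of all
    nodes that lie in the largest connected cluster of surviving nodes."""
    alive = {node for node in adj if rng.random() < keep_prob}
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over surviving neighbours
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best / len(adj)

def estimate_resilience(adj, keep_prob, trials=1000, seed=0):
    """Average the largest-cluster fraction over many random failure patterns."""
    rng = random.Random(seed)
    return sum(largest_component_fraction(adj, keep_prob, rng)
               for _ in range(trials)) / trials

if __name__ == "__main__":
    # Hypothetical 10-node ring, e.g. controllers linked in a loop.
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    for p in (1.0, 0.9, 0.5):
        print(p, round(estimate_resilience(ring, p), 3))
```

A ring fragments quickly as nodes fail. Denser, redundant topologies keep a larger connected share at the same `keep_prob`, which is the intuition behind percolation-based resilience scores.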
# Using the TypeScript compiler to fix erroneous Node.js snippets ( http://arxiv.org/abs/2308.12079v1 ) License: check the link | Brittany Reid, Christoph Treude, Markus Wagner |
Most online code snippets do not run. This means that developers looking to reuse code from online sources must manually find and fix errors. We present an approach for automatically evaluating and correcting errors in Node.js code snippets: Node Code Correction (NCC). NCC leverages the ability of the TypeScript compiler to generate errors and inform code corrections through the combination of TypeScript's built-in codefixes, our own targeted fixes, and deletion of erroneous lines. Compared to existing approaches using linters, our findings suggest that NCC is capable of detecting a larger number of errors per snippet and more error types, and it is more efficient at fixing snippets. We find that 73.7% of the code snippets in NPM documentation have errors; with the use of NCC's corrections, this number was reduced to 25.1%. Our evaluation confirms that the use of the TypeScript compiler to inform code corrections is a promising strategy to aid in the reuse of code snippets from online sources. | Translated: 2023-10-23 12:54:43 Published: 2023-08-23 |
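NCC's last-resort strategy is to delete erroneous lines until the compiler stops reporting errors. That loop can be sketched briefly. This is a stand-in, not NCC itself. To stay self-contained, it uses Python's built-in `compile()` as the error reporter instead of the TypeScript compiler, and it omits NCC's codefixes and targeted fixes. `repair_by_deletion` is a hypothetical name.

```python
def repair_by_deletion(source: str) -> tuple[str, int]:
    """Repeatedly compile `source`; on a syntax error, delete the reported line.

    Returns the repaired source and the number of deleted lines.
    Re-raises the error if it cannot be mapped to an existing line.
    """
    lines = source.splitlines()
    deleted = 0
    while True:
        candidate = "\n".join(lines)
        try:
            compile(candidate, "<snippet>", "exec")
            return candidate, deleted
        except SyntaxError as err:
            index = (err.lineno or 0) - 1  # lineno is 1-based and may be None
            if not 0 <= index < len(lines):
                raise
            del lines[index]
            deleted += 1

if __name__ == "__main__":
    snippet = "total = 0\nfor n in range(3):\n    total += n\nprint(total) )\n"
    fixed, removed = repair_by_deletion(snippet)
    print(removed)  # 1
    print(fixed)
```

Deletion always terminates, because each iteration removes one line. However, it can discard meaningful code. This is presumably why NCC tries the compiler's codefixes and its own targeted fixes first.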
# Towards Safe Autonomy in Hybrid Traffic: Detecting Unpredictable Abnormal Behaviors of Human Drivers via Information Sharing ( http://arxiv.org/abs/2309.16716v1 ) License: check the link | Jiangwei Wang, Lili Su, Songyang Han, Dongjin Song, Fei Miao |
Hybrid traffic which involves both autonomous and human-driven vehicles would be the norm of the autonomous vehicles practice for a while. On the one hand, unlike autonomous vehicles, human-driven vehicles could exhibit sudden abnormal behaviors such as unpredictably switching to dangerous driving modes, putting its neighboring vehicles under risks; such undesired mode switching could arise from numbers of human driver factors, including fatigue, drunkenness, distraction, aggressiveness, etc. On the other hand, modern vehicle-to-vehicle communication technologies enable the autonomous vehicles to efficiently and reliably share the scarce run-time information with each other. In this paper, we propose, to the best of our knowledge, the first efficient algorithm that can (1) significantly improve trajectory prediction by effectively fusing the run-time information shared by surrounding autonomous vehicles, and can (2) accurately and quickly detect abnormal human driving mode switches or abnormal driving behavior with formal assurance without hurting human drivers' privacy. To validate our proposed algorithm, we first evaluate our proposed trajectory predictor on NGSIM and Argoverse datasets and show that our proposed predictor outperforms the baseline methods. Then through extensive experiments on SUMO simulator, we show that our proposed algorithm has great detection performance in both highway and urban traffic. The best performance achieves detection rate of 97.3%, average detection delay of 1.2s, and 0 false alarm. | Translated: 2023-10-23 05:47:50 Published: 2023-08-23 |
# Vulnerability Clustering and other Machine Learning Applications of Semantic Vulnerability Embeddings ( http://arxiv.org/abs/2310.05935v1 ) License: check the link | Mark-Oliver Stehr, Minyoung Kim |
Cyber-security vulnerabilities are usually published in form of short natural language descriptions (e.g., in form of MITRE's CVE list) that over time are further manually enriched with labels such as those defined by the Common Vulnerability Scoring System (CVSS). In the Vulnerability AI (Analytics and Intelligence) project, we investigated different types of semantic vulnerability embeddings based on natural language processing (NLP) techniques to obtain a concise representation of the vulnerability space. We also evaluated their use as a foundation for machine learning applications that can support cyber-security researchers and analysts in risk assessment and other related activities. The particular applications we explored and briefly summarize in this report are clustering, classification, and visualization, as well as a new logic-based approach to evaluate theories about the vulnerability space. | Translated: 2023-10-23 04:03:23 Published: 2023-08-23 |
# DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion ( http://arxiv.org/abs/2310.05934v1 ) License: check the link | Se Jin Park, Joanna Hong, Minsu Kim, Yong Man Ro |
Speech-driven 3D facial animation has gained significant attention for its ability to create realistic and expressive facial animations in 3D space based on speech. Learning-based methods have shown promising progress in achieving accurate facial motion synchronized with speech. However, one-to-many nature of speech-to-3D facial synthesis has not been fully explored: while the lip accurately synchronizes with the speech content, other facial attributes beyond speech-related motions are variable with respect to the speech. To account for the potential variance in the facial attributes within a single speech, we propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis. DF-3DFace captures the complex one-to-many relationships between speech and 3D face based on diffusion. It concurrently achieves aligned lip motion by exploiting audio-mesh synchronization and masked conditioning. Furthermore, the proposed method jointly models identity and pose in addition to facial motions so that it can generate 3D face animation without requiring a reference identity mesh and produce natural head poses. We contribute a new large-scale 3D facial mesh dataset, 3D-HDTF to enable the synthesis of variations in identities, poses, and facial motions of 3D face mesh. Extensive experiments demonstrate that our method successfully generates highly variable facial shapes and motions from speech and simultaneously achieves more realistic facial animation than the state-of-the-art methods. | Translated: 2023-10-23 04:03:06 Published: 2023-08-23 |
# Computational models of object motion detectors accelerated using FPGA technology ( http://arxiv.org/abs/2310.06842v1 ) License: check the link | Pedro Machado |
This PhD research introduces three key contributions in the domain of object motion detection: Multi-Hierarchical Spiking Neural Network (MHSNN): A specialized four-layer Spiking Neural Network (SNN) architecture inspired by vertebrate retinas. Trained on custom lab-generated images, it exhibited 6.75% detection error for horizontal and vertical movements. While non-scalable, MHSNN laid the foundation for further advancements. Hybrid Sensitive Motion Detector (HSMD): Enhancing Dynamic Background Subtraction (DBS) using a tailored three-layer SNN, stabilizing foreground data to enhance object motion detection. Evaluated on standard datasets, HSMD outperformed OpenCV-based methods, excelling in four categories across eight metrics. It maintained real-time processing (13.82-13.92 fps) on a high-performance computer but showed room for hardware optimisation. Neuromorphic Hybrid Sensitive Motion Detector (NeuroHSMD): Building upon HSMD, this adaptation implemented the SNN component on dedicated hardware (FPGA). OpenCL simplified FPGA design and enabled portability. NeuroHSMD demonstrated an 82% speedup over HSMD, achieving 28.06-28.71 fps on CDnet2012 and CDnet2014 datasets. These contributions collectively represent significant advancements in object motion detection, from a biologically inspired neural network design to an optimized hardware implementation that outperforms existing methods in accuracy and processing speed. | Translated: 2023-10-23 03:45:10 Published: 2023-08-23 |
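The abstract does not say which spiking neuron model MHSNN, HSMD or NeuroHSMD uses. As a generic illustration of what a spiking layer computes, this is a discrete-time leaky integrate-and-fire (LIF) neuron. The model choice, the `decay` and `threshold` values and the reset-to-zero rule are all assumptions made for illustration.

```python
def lif_spikes(inputs, decay=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron in discrete time.

    Each step the membrane potential leaks (multiplied by `decay`), integrates
    the input, and emits a spike (1) when it reaches `threshold`, after which
    it is reset to zero. Returns the spike train as a list of 0/1 values.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = potential * decay + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes

if __name__ == "__main__":
    print(lif_spikes([0.5] * 6))   # [0, 0, 1, 0, 0, 1]
    print(lif_spikes([0.05] * 6))  # weak input never reaches threshold
```

Because the neuron carries state only between steps and communicates through binary spikes, updates like this map naturally onto parallel hardware such as FPGAs.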
# The legibility of the imaged human brain ( http://arxiv.org/abs/2309.07096v1 ) License: check the link | James K Ruffle, Robert J Gray, Samia Mohinta, Guilherme Pombo, Chaitanya Kaul, Harpreet Hyare, Geraint Rees, Parashkev Nachev |
Our knowledge of the organisation of the human brain at the population-level is yet to translate into power to predict functional differences at the individual-level, limiting clinical applications, and casting doubt on the generalisability of inferred mechanisms. It remains unknown whether the difficulty arises from the absence of individuating biological patterns within the brain, or from limited power to access them with the models and compute at our disposal. Here we comprehensively investigate the resolvability of such patterns with data and compute at unprecedented scale. Across 23810 unique participants from UK Biobank, we systematically evaluate the predictability of 25 individual biological characteristics, from all available combinations of structural and functional neuroimaging data. Over 4526 GPU-hours of computation, we train, optimize, and evaluate out-of-sample 700 individual predictive models, including multilayer perceptrons of demographic, psychological, serological, chronic morbidity, and functional connectivity characteristics, and both uni- and multi-modal 3D convolutional neural network models of macro- and micro-structural brain imaging. We find a marked discrepancy between the high predictability of sex (balanced accuracy 99.7%), age (mean absolute error 2.048 years, R² 0.859), and weight (mean absolute error 2.609 kg, R² 0.625), for which we set new state-of-the-art performance, and the surprisingly low predictability of other characteristics. Neither structural nor functional imaging predicted individual psychology better than the coincidence of common chronic morbidity (p<0.05). Serology predicted common morbidity (p<0.05) and was best predicted by it (p<0.001), followed by structural neuroimaging (p<0.05). Our findings suggest either more informative imaging or more powerful models will be needed to decipher individual level characteristics from the brain. | Translated: 2023-09-17 13:48:42 Published: 2023-08-23 |
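The abstract reports balanced accuracy for sex, and mean absolute error (MAE) and the coefficient of determination R² for age and weight. The sketch computes these metrics from their standard definitions. The toy inputs are invented and unrelated to the study's data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a majority class cannot dominate the score."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the target's own units (e.g. years, kg)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: the share of variance explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

if __name__ == "__main__":
    print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5, though plain accuracy is 0.75
    print(mean_absolute_error([60, 70], [62, 67]))        # 2.5
    print(r_squared([1, 2, 3], [1, 2, 3]))                # 1.0
```

Balanced accuracy rewards a classifier only when it performs well on every class, not just the majority class. MAE is reported in the target's units. R² compares the model with always predicting the mean.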
# Generative AI for End-to-End Limit Order Book Modelling: A Token-Level Autoregressive Generative Model of Message Flow Using a Deep State Space Network ( http://arxiv.org/abs/2309.00638v1 ) License: check the link | Peer Nagy, Sascha Frey, Silvia Sapora, Kang Li, Anisoara Calinescu, Stefan Zohren, Jakob Foerster |
Developing a generative model of realistic order flow in financial markets is a challenging open problem, with numerous applications for market participants. Addressing this, we propose the first end-to-end autoregressive generative model that generates tokenized limit order book (LOB) messages. These messages are interpreted by a Jax-LOB simulator, which updates the LOB state. To handle long sequences efficiently, the model employs simplified structured state-space layers to process sequences of order book states and tokenized messages. Using LOBSTER data of NASDAQ equity LOBs, we develop a custom tokenizer for message data, converting groups of successive digits to tokens, similar to tokenization in large language models. Out-of-sample results show promising performance in approximating the data distribution, as evidenced by low model perplexity. Furthermore, the mid-price returns calculated from the generated order flow exhibit a significant correlation with the data, indicating impressive conditional forecast performance. Due to the granularity of generated data, and the accuracy of the model, it offers new application areas for future work beyond forecasting, e.g. acting as a world model in high-frequency financial reinforcement learning applications. Overall, our results invite the use and extension of the model in the direction of autoregressive large financial models for the generation of high-frequency financial data and we commit to open-sourcing our code to facilitate future research. | Translated: 2023-09-10 03:57:27 Published: 2023-08-23 |
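The tokenizer idea, which splits numeric message fields into groups of successive digits that each become one token, can be illustrated with a small hypothetical sketch. The 9-digit field width, the 3-digit group size and the zero-padding are assumptions. They are not the paper's actual tokenizer settings.

```python
def digit_group_tokens(value: int, width: int = 9, group: int = 3) -> list[int]:
    """Zero-pad a non-negative integer to `width` digits and split it into
    `group`-digit chunks; each chunk's integer value is one token id
    (so a 3-digit group yields a vocabulary of 1000 ids)."""
    if value < 0 or width % group != 0:
        raise ValueError("need a non-negative value and width divisible by group")
    digits = str(value).zfill(width)
    if len(digits) > width:
        raise ValueError("value does not fit in the given width")
    return [int(digits[i:i + group]) for i in range(0, width, group)]

def detokenize(tokens: list[int], group: int = 3) -> int:
    """Inverse of digit_group_tokens: re-join zero-padded groups into an integer."""
    return int("".join(str(t).zfill(group) for t in tokens))

if __name__ == "__main__":
    price = 1234567  # hypothetical price field in integer ticks
    toks = digit_group_tokens(price)
    print(toks)                       # [1, 234, 567]
    print(detokenize(toks) == price)  # True
```

Grouping digits keeps the vocabulary small (here 1000 ids) while giving each field a fixed number of tokens, which simplifies autoregressive generation of a whole message.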
# A Systematic Study on Quantifying Bias in GAN-Augmented Data ( http://arxiv.org/abs/2308.13554v1 ) License: check the link | Denis Liu |
Generative adversarial networks (GANs) have recently become a popular data augmentation technique used by machine learning practitioners. However, they have been shown to suffer from the so-called mode collapse failure mode, which makes them vulnerable to exacerbating biases on already skewed datasets, resulting in the generated data distribution being less diverse than the training distribution. To this end, we address the problem of quantifying the extent to which mode collapse occurs. This study is a systematic effort focused on the evaluation of state-of-the-art metrics that can potentially quantify biases in GAN-augmented data. We show that, while several such methods are available, there is no single metric that quantifies bias exacerbation reliably over the span of different image domains. | Translated: 2023-09-03 21:33:03 Published: 2023-08-23 |
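A simple way to see how mode collapse can exacerbate dataset skew is to compare the class distribution of generated samples with that of the training set. The sketch uses the normalized Shannon entropy of the class histogram as a diversity score. This is an illustrative assumption, not one of the metrics evaluated in the paper. It also assumes class labels are available for the generated images, for example from a separate classifier.

```python
import math
from collections import Counter

def normalized_entropy(labels, num_classes):
    """Shannon entropy of the label histogram divided by log(num_classes):
    1.0 means all classes equally represented, 0.0 means a single class."""
    counts = Counter(labels)
    total = len(labels)
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(num_classes)

def diversity_ratio(train_labels, generated_labels, num_classes):
    """Below 1.0: generated data is less diverse than the training data."""
    return (normalized_entropy(generated_labels, num_classes)
            / normalized_entropy(train_labels, num_classes))

if __name__ == "__main__":
    train = [0] * 40 + [1] * 30 + [2] * 20 + [3] * 10  # already skewed
    generated = [0] * 70 + [1] * 25 + [2] * 5          # class 3 collapsed away
    print(round(diversity_ratio(train, generated, 4), 3))
```

A label histogram only captures collapse across known classes. Collapse within a class, such as near-identical images of one digit, needs feature-space metrics. This limitation is consistent with the paper's finding that no single metric works reliably across image domains.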
# A Lens to Pandemic Stay at Home Attitudes ( http://arxiv.org/abs/2308.13552v1 ) License: check the link | Andrew Wentzel, Lauren Levine, Vipul Dhariwal, Zahra Fatemi, Barbara Di Eugenio, Andrew Rojecki, Elena Zheleva, G.Elisabeta Marai |
We describe the design process and the challenges we met during a rapid multi-disciplinary pandemic project related to stay-at-home orders and social media moral frames. Unlike our typical design experience, we had to handle a steeper learning curve, emerging and continually changing datasets, as well as under-specified design requirements, persistent low visual literacy, and an extremely fast turnaround for new data ingestion, prototyping, testing and deployment. We describe the lessons learned through this experience. | Translated: 2023-09-03 21:32:51 Published: 2023-08-23 |
# Dance with You: The Diversity Controllable Dancer Generation via Diffusion Models ( http://arxiv.org/abs/2308.13551v1 ) License: check the link | Siyue Yao, Mingjie Sun, Bingliang Li, Fengyu Yang, Junle Wang, Ruimao Zhang |
Recently, digital humans for interpersonal interaction in virtual environments have gained significant attention. In this paper, we introduce a novel multi-dancer synthesis task called partner dancer generation, which involves synthesizing virtual human dancers capable of performing dance with users. The task aims to control the pose diversity between the lead dancer and the partner dancer. The core of this task is to ensure the controllable diversity of the generated partner dancer while maintaining temporal coordination with the lead dancer. This scenario varies from earlier research in generating dance motions driven by music, as our emphasis is on automatically designing partner dancer postures according to pre-defined diversity, the pose of lead dancer, as well as the accompanying tunes. To achieve this objective, we propose a three-stage framework called Dance-with-You (DanY). Initially, we employ a 3D Pose Collection stage to collect a wide range of basic dance poses as references for motion generation. Then, we introduce a hyper-parameter that coordinates the similarity between dancers by masking poses to prevent the generation of sequences that are over-diverse or consistent. To avoid the rigidity of movements, we design a Dance Pre-generated stage to pre-generate these masked poses instead of filling them with zeros. After that, a Dance Motion Transfer stage is adopted with leader sequences and music, in which a multi-conditional sampling formula is rewritten to transfer the pre-generated poses into a sequence with a partner style. In practice, to address the lack of multi-person datasets, we introduce AIST-M, a new dataset for partner dancer generation, which is publicly available. Comprehensive evaluations on our AIST-M dataset demonstrate that the proposed DanY can synthesize satisfactory partner dancer results with controllable diversity. | Translated: 2023-09-03 21:32:28 Published: 2023-08-23 |
# Proceedings of the Twentieth International Conference on Quantum Physics and Logic ( http://arxiv.org/abs/2308.15489v1 ) License: check the link | Shane Mansfield, Benoît Valiron, Vladimir Zamdzhiev |
This volume contains the proceedings of the 20th International Conference on Quantum Physics and Logic (QPL 2023). The aim of the QPL conference series is to bring together academic and industry researchers working on mathematical foundations of quantum computation, quantum physics, and related areas. The main focus is on the use of algebraic and categorical structures, formal languages, type systems, semantic methods, as well as other mathematical and computer scientific techniques applicable to the study of physical systems, physical processes, and their composition. | Translated: 2023-09-03 21:22:05 Published: 2023-08-23 |
# Weak-strong duality of the non-commutative Landau problem induced by a two-vortex permutation, and conformal bridge transformation ( http://arxiv.org/abs/2304.06677v2 ) License: check the link | Andrey Alcala and Mikhail S. Plyushchay |
A correspondence is established between the dynamics of the two-vortex system and the non-commutative Landau problem (NCLP) in its sub- (non-chiral), super- (chiral) and critical phases. As a result, a trivial permutation symmetry of the point vortices induces a weak-strong coupling duality in the NCLP. We show that quantum two-vortex systems with non-zero total vorticity can be generated by applying conformal bridge transformation to a two-dimensional quantum free particle or to a quantum vortex-antivortex system of zero total vorticity. The sub- and super-critical phases of the quantum NCLP are generated in a similar way from the 2D quantum free particle in a commutative or non-commutative plane. The composition of the inverse and direct transformations of the conformal bridge also makes it possible to link the non-chiral and chiral phases in each of these two systems. | Translated: 2023-08-28 23:35:30 Published: 2023-08-23 |
# An approach based on Open Research Knowledge Graph for Knowledge Acquisition from scientific papers ( http://arxiv.org/abs/2308.12981v1 ) License: check the link | Azanzi Jiomekong and Sanju Tiwari |
A scientific paper can be divided into two major constructs which are Metadata and Full-body text. Metadata provides a brief overview of the paper while the Full-body text contains key-insights that can be valuable to fellow researchers. To retrieve metadata and key-insights from scientific papers, knowledge acquisition is a central activity. It consists of gathering, analyzing and organizing knowledge embedded in scientific papers in such a way that it can be used and reused whenever needed. Given the wealth of scientific literature, manual knowledge acquisition is a cumbersome task. Thus, computer-assisted and (semi-)automatic strategies are generally adopted. Our purpose in this research was two fold: curate Open Research Knowledge Graph (ORKG) with papers related to ontology learning and define an approach using ORKG as a computer-assisted tool to organize key-insights extracted from research papers. This approach was used to document the "epidemiological surveillance systems design and implementation" research problem and to prepare the related work of this paper. It is currently used to document "food information engineering", "Tabular data to Knowledge Graph Matching" and "Question Answering" research problems and "Neuro-symbolic AI" domain. | Translated: 2023-08-28 16:34:32 Published: 2023-08-23 |
# Performance Comparison of Design Optimization and Deep Learning-based Inverse Design ( http://arxiv.org/abs/2308.13000v1 ) License: check the link | Minyoung Jwa, Jihoon Kim, Seungyeon Shin, Ah-hyeon Jin, Dongju Shin, Namwoo Kang |
Surrogate model-based optimization has been increasingly used in the field of engineering design. It involves creating a surrogate model with objective functions or constraints based on the data obtained from simulations or real-world experiments, and then finding the optimal solution from the model using numerical optimization methods. Recent advancements in deep learning-based inverse design methods have made it possible to generate real-time optimal solutions for engineering design problems, eliminating the requirement for iterative optimization processes. Nevertheless, no comprehensive study has yet closely examined the specific advantages and disadvantages of this novel approach compared to the traditional design optimization method. The objective of this paper is to compare the performance of traditional design optimization methods with deep learning-based inverse design methods by employing benchmark problems across various scenarios. Based on the findings of this study, we provide guidelines that can be taken into account for the future utilization of deep learning-based inverse design. It is anticipated that these guidelines will enhance the practical applicability of this approach to real engineering design problems. | Translated: 2023-08-28 16:20:29 Published: 2023-08-23 |
# Advancing stable set problem solutions through quantum annealers ( http://arxiv.org/abs/2308.13041v1 ) License: see link | Janez Povh and Dunja Pucher |
We assess the performance of D-Wave quantum solvers for solving the stable set problem in a graph, one of the most studied NP-hard problems. We perform computations on some instances from the literature with up to 125 vertices and compare the quality of the obtained solutions with known optimum solutions. It turns out that the hybrid solver gives very good results, while the Quantum Processing Unit solver shows rather modest performance overall. | Translated: 2023-08-28 16:01:19 Published: 2023-08-23 |
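D-Wave solvers accept problems as QUBO or Ising models. The abstract does not state which formulation the authors use; as an illustration only, a common textbook QUBO for the maximum stable set problem rewards each selected vertex and penalizes each selected edge. For a penalty weight greater than 1, every minimizer is a maximum stable set. The sketch below checks this by brute force on a 5-cycle; the function names and the penalty value 2.0 are hypothetical choices, not taken from the paper.

```python
from itertools import product

def stable_set_qubo(x, edges, penalty=2.0):
    """QUBO energy: -sum_i x_i + penalty * sum_{(i, j) in E} x_i * x_j.

    Each chosen vertex lowers the energy by 1; each chosen edge (both endpoints
    selected) raises it by `penalty`. With penalty > 1, dropping one endpoint of a
    violated edge always lowers the energy, so every minimizer is a stable set.
    """
    reward = -sum(x)
    conflicts = sum(x[i] * x[j] for i, j in edges)
    return reward + penalty * conflicts

def brute_force_minimum(n_vertices, edges, penalty=2.0):
    """Exhaustively minimize the QUBO over all 2**n binary vectors (tiny graphs only)."""
    best = min(product((0, 1), repeat=n_vertices),
               key=lambda x: stable_set_qubo(x, edges, penalty))
    return best, stable_set_qubo(best, edges, penalty)

# 5-cycle: its maximum stable set has 2 vertices, so the minimum energy is -2.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
x_opt, energy = brute_force_minimum(5, cycle5)
print(x_opt, energy)
```

An annealer samples low-energy states of the same kind of objective instead of enumerating all 2^n vectors, which is why solution quality, rather than exact optimality, is what the paper compares.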
# P1AC: Revisiting Absolute Pose From a Single Affine Correspondence ( http://arxiv.org/abs/2011.08790v4 ) License: see link | Jonathan Ventura, Zuzana Kukelova, Torsten Sattler and Dániel Baráth |
Affine correspondences have traditionally been used to improve feature matching over wide baselines. While recent work has successfully used affine correspondences to solve various relative camera pose estimation problems, less attention has been given to their use in absolute pose estimation. We introduce the first general solution to the problem of estimating the pose of a calibrated camera given a single observation of an oriented point and an affine correspondence. The advantage of our approach (P1AC) is that it requires only a single correspondence, in comparison to the three points needed by the traditional point-based approach (P3P), significantly reducing the combinatorics in robust estimation. P1AC provides a general solution that removes restrictive assumptions made in prior work and is applicable to large-scale image-based localization. We propose a minimal solution to the P1AC problem and evaluate our novel solver on synthetic data, showing its numerical stability and performance under various types of noise. On standard image-based localization benchmarks we show that P1AC achieves more accurate results than the widely used P3P algorithm. Code for our method is available at https://github.com/jonathanventura/P1AC/ . | Translated: 2023-08-25 19:17:17 Published: 2023-08-23 |
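The "reduced combinatorics" can be made concrete with the standard RANSAC iteration bound N = ceil(log(1 - p) / log(1 - w^s)), where w is the inlier ratio, s the minimal sample size and p the desired probability of drawing at least one all-inlier sample. This is the generic textbook bound, not an analysis from the paper; the inlier ratios below are illustrative.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Iterations needed so that, with probability `confidence`, at least one
    minimal sample of `sample_size` correspondences contains only inliers."""
    p_good_sample = inlier_ratio ** sample_size
    if p_good_sample >= 1.0:
        return 1  # every sample is clean; avoid log(0)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))

for w in (0.5, 0.2, 0.1):
    print(f"inlier ratio {w}: s=1 (P1AC-style) -> {ransac_iterations(w, 1)}, "
          f"s=3 (P3P-style) -> {ransac_iterations(w, 3)}")
```

Because w^s shrinks geometrically in s, the gap between one-correspondence and three-point solvers grows quickly as the inlier ratio drops.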
# BagPipe: Accelerating Deep Recommendation Model Training ( http://arxiv.org/abs/2202.12429v3 ) License: see link | Saurabh Agarwal, Chengpo Yan, Ziyi Zhang, Shivaram Venkataraman |
Deep learning based recommendation models (DLRM) are widely used in several business-critical applications. Training such recommendation models efficiently is challenging because they contain billions of embedding-based parameters, leading to significant overheads from embedding access. By profiling existing systems for DLRM training, we observe that around 75% of the iteration time is spent on embedding access and model synchronization. Our key insight in this paper is that embedding access has a specific structure which can be used to accelerate training. We observe that embedding accesses are heavily skewed, with around 1% of embeddings representing more than 92% of total accesses. Further, we observe that during offline training we can look ahead at future batches to determine exactly which embeddings will be needed at which future iteration. Based on these insights, we develop Bagpipe, a system for training deep recommendation models that uses caching and prefetching to overlap remote embedding accesses with the computation. We design an Oracle Cacher, a new component that uses a lookahead algorithm to generate optimal cache update decisions while providing strong consistency guarantees against staleness. We also design a logically replicated, physically partitioned cache and show that our design can reduce synchronization overheads in a distributed setting. Finally, we propose a disaggregated system architecture and show that our design can enable low-overhead fault tolerance. Our experiments using three datasets and four models show that Bagpipe provides a speedup of up to 5.6x compared to state-of-the-art baselines, while providing the same convergence and reproducibility guarantees as synchronous training. | Translated: 2023-08-25 19:11:51 Published: 2023-08-23 |
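Knowing future batches in advance is what makes an "optimal" cache decision possible. The classic offline policy that exploits such lookahead is Belady's MIN: on eviction, drop the cached entry whose next use is furthest in the future. The sketch below applies that textbook policy to a hypothetical stream of embedding IDs and compares it with LRU; it is not Bagpipe's Oracle Cacher, whose consistency handling and partitioned design the abstract does not specify.

```python
from collections import OrderedDict

def next_use(accesses, start, item):
    """Index of the next access to `item` at or after `start` (inf if never used again)."""
    for t in range(start, len(accesses)):
        if accesses[t] == item:
            return t
    return float("inf")

def belady_misses(accesses, capacity):
    """Belady's MIN: with the future known, evict the entry needed furthest ahead."""
    cache, misses = set(), 0
    for t, item in enumerate(accesses):
        if item in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            victim = max(cache, key=lambda c: next_use(accesses, t + 1, c))
            cache.remove(victim)
        cache.add(item)
    return misses

def lru_misses(accesses, capacity):
    """Baseline without lookahead: evict the least recently used entry."""
    cache, misses = OrderedDict(), 0
    for item in accesses:
        if item in cache:
            cache.move_to_end(item)
            continue
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)
        cache[item] = True
    return misses

# Hypothetical cyclic access pattern over three embedding IDs with room for two.
stream = [1, 2, 3, 1, 2, 3]
print(belady_misses(stream, 2), lru_misses(stream, 2))  # 4 6
```

The quadratic `next_use` scan keeps the sketch short; a practical cacher would precompute next-use indices for each batch.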
# The Interpretability of LSTM Models for Predicting Oil Company Stocks: Impact of Correlated Features ( http://arxiv.org/abs/2201.00350v4 ) License: see link | Javad T. Firouzjaee and Pouriya Khaliliyan |
Oil companies are among the largest companies in the world, and their economic indicators in the global stock market have a great impact on the world economy and market due to their relation to gold, crude oil, and the dollar. This study investigates the impact of correlated features on the interpretability of Long Short-Term Memory (LSTM) models for predicting oil company stocks. To achieve this, we designed a standard LSTM network and trained it using various correlated datasets. Our approach aims to improve the accuracy of stock price prediction by considering the multiple factors affecting the market, such as crude oil prices, gold prices, and the US dollar. The results demonstrate that adding a feature correlated with oil stocks does not improve the interpretability of LSTM models. These findings suggest that while LSTM models may be effective in predicting stock prices, their interpretability may be limited. Caution should be exercised when relying solely on LSTM models for stock price prediction, as their lack of interpretability may make it difficult to fully understand the underlying factors driving stock price movements. | Translated: 2023-08-25 19:11:05 Published: 2023-08-23 |
# Test-Time Adaptation for Visual Document Understanding ( http://arxiv.org/abs/2206.07240v2 ) License: see link | Sayna Ebrahimi, Sercan O. Arik, Tomas Pfister |
For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet effective adaptation of such representations to distribution shifts at test time remains an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents that performs source-free domain adaptation using unlabeled target document data. DocTTA leverages cross-modality self-supervised learning via masked visual language modeling, as well as pseudo-labeling, to adapt models learned on a source domain to an unlabeled target domain at test time. We introduce new benchmarks using existing public datasets for various VDU tasks, including entity recognition, key-value extraction, and document visual question answering. DocTTA shows significant improvements on these compared to the source model performance, of up to 1.89% (F1 score), 3.43% (F1 score), and 17.68% (ANLS score), respectively. Our benchmark datasets are available at https://saynaebrahimi.github.io/DocTTA.html. | Translated: 2023-08-25 19:02:06 Published: 2023-08-23 |
# StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random ( http://arxiv.org/abs/2205.04701v3 ) License: see link | Haoxuan Li, Chunyuan Zheng, Peng Wu |
In recommender systems, users always choose their favorite items to rate, which leads to data missing not at random and poses a great challenge for unbiased evaluation and learning of prediction models. Currently, doubly robust (DR) methods have been widely studied and demonstrate superior performance. However, in this paper, we show that DR methods are unstable and have unbounded bias, variance, and generalization bounds under extremely small propensities. Moreover, the fact that DR relies more on extrapolation leads to suboptimal performance. To address the above limitations while retaining double robustness, we propose a stabilized doubly robust (StableDR) learning approach with a weaker reliance on extrapolation. Theoretical analysis shows that StableDR has bounded bias, variance, and generalization error bound simultaneously under inaccurate imputed errors and arbitrarily small propensities. In addition, we propose a novel learning approach for StableDR that updates the imputation, propensity, and prediction models cyclically, achieving more stable and accurate predictions. Extensive experiments show that our approaches significantly outperform the existing methods. | Translated: 2023-08-25 19:00:48 Published: 2023-08-23 |
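The instability with small propensities can be seen directly in the standard DR estimator of the average prediction error, which adds to each imputed error an inverse-propensity-weighted correction on observed pairs. The sketch below is that standard estimator, not StableDR itself (the abstract does not give StableDR's modified form); all values are hypothetical.

```python
def dr_estimate(errors, imputed, observed, propensities):
    """Standard doubly robust (DR) estimate of the average prediction error.

    For each user-item pair k: imputed[k] is the imputation model's guess of the error,
    observed[k] is 1 if the rating was observed (then errors[k] is the true error),
    and propensities[k] is the estimated probability that it was observed.
    Unobserved pairs contribute only their imputed error.
    """
    total = 0.0
    for e, e_hat, o, p in zip(errors, imputed, observed, propensities):
        total += e_hat + o * (e - e_hat) / p
    return total / len(errors)

errors = [1.0, 0.0, 2.0, 1.0]   # hypothetical; values at unobserved pairs do not affect the result
observed = [1, 0, 1, 0]

# If the imputation happened to equal the true errors, DR is exact whatever the propensities.
print(dr_estimate(errors, errors, observed, [0.5] * 4))
# A poor imputation combined with one tiny propensity makes a single term dominate.
print(dr_estimate(errors, [0.0] * 4, observed, [0.001, 0.5, 0.5, 0.5]))
```

The second call returns about 251 even though no error exceeds 2, because the residual of 1 is divided by a propensity of 0.001; this is the unbounded behavior the abstract refers to.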
# A relativistic discrete spacetime formulation of 3+1 QED ( http://arxiv.org/abs/2205.03148v2 ) License: see link | Nathanaël Eon, Giuseppe Di Molfetta, Giuseppe Magnifico, Pablo Arrighi |
This work provides a relativistic, digital quantum simulation scheme for both $2+1$ and $3+1$ dimensional quantum electrodynamics (QED), based on a discrete spacetime formulation of the theory. It takes the form of a quantum circuit, infinitely repeating across space and time, parametrised by the discretization step $\Delta_t=\Delta_x$. Strict causality at each step is ensured as circuit wires coincide with the lightlike worldlines of QED; simulation time under decoherence is optimized. The construction replays the logic that leads to the QED Lagrangian. Namely, it starts from the Dirac quantum walk, well known to converge towards free relativistic fermions. It then extends the quantum walk into a multi-particle-sector quantum cellular automaton in a way which respects the fermionic anti-commutation relations and the discrete gauge invariance symmetry. Both requirements can only be achieved at the cost of introducing the gauge field. Lastly, the gauge field is given its own electromagnetic dynamics, which can be formulated as a quantum walk at each plaquette. | Translated: 2023-08-25 19:00:31 Published: 2023-08-23 |
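The construction starts from the Dirac quantum walk. As a toy illustration only (1+1 dimensions, a single free particle, no gauge field; not the authors' 2+1 or 3+1 QED circuit), one step applies a local unitary "coin" that mixes the two spinor components and then shifts each component one site along its light cone. Treating the coin angle `theta` as mass times the step $\Delta_t=\Delta_x$ is an assumption of this sketch; with `theta = 0` the propagation is purely lightlike.

```python
import numpy as np

def dirac_walk_step(psi, theta):
    """One step of a 1+1D Dirac quantum walk on a periodic lattice.

    psi has shape (2, L): psi[0] is the left-moving component, psi[1] the right-moving one.
    The coin exp(-i * theta * sigma_x) is unitary, so the step preserves the norm.
    """
    c, s = np.cos(theta), np.sin(theta)
    left = c * psi[0] - 1j * s * psi[1]    # local coin mixing the two components
    right = -1j * s * psi[0] + c * psi[1]
    # Shift: each component hops one site along its light cone (periodic boundary).
    return np.stack([np.roll(left, -1), np.roll(right, 1)])

L = 64
psi = np.zeros((2, L), dtype=complex)
psi[:, L // 2] = 1 / np.sqrt(2)          # start localized at the middle site
for _ in range(20):
    psi = dirac_walk_step(psi, theta=0.1)
print(np.sum(np.abs(psi) ** 2))          # stays 1 up to floating-point rounding
```

Each wire moves exactly one site per time step, which is the discrete counterpart of the statement that circuit wires coincide with lightlike worldlines.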
# Convergence of the Backward Deep BSDE Method with Applications to Optimal Stopping Problems ( http://arxiv.org/abs/2210.04118v3 ) License: see link | Chengfan Gao, Siping Gao, Ruimeng Hu, Zimu Zhu |
The optimal stopping problem is one of the core problems in financial markets, with broad applications such as pricing American and Bermudan options. The deep BSDE method [Han, Jentzen and E, PNAS, 115(34):8505-8510, 2018] has shown great power in solving high-dimensional forward-backward stochastic differential equations (FBSDEs), and has inspired many applications. However, the method solves backward stochastic differential equations (BSDEs) in a forward manner, which cannot be used for optimal stopping problems that in general require running the BSDE backward. To overcome this difficulty, a recent paper [Wang, Chen, Sudjianto, Liu and Shen, arXiv:1807.06622, 2018] proposed the backward deep BSDE method to solve the optimal stopping problem. In this paper, we provide the rigorous theory for the backward deep BSDE method. Specifically, 1. we derive the a posteriori error estimation, i.e., the error of the numerical solution can be bounded by the training loss function; and 2. we give an upper bound of the loss function, which can be made sufficiently small subject to universal approximations. We give two numerical examples, which show performance consistent with the proved theory. | Translated: 2023-08-25 18:53:54 Published: 2023-08-23 |
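Optimal stopping is solved backward in time: at each date, the value is the larger of stopping now and the discounted expected continuation value. The backward deep BSDE method approximates this recursion with neural networks in high dimension; the sketch below shows only the underlying backward recursion on a one-dimensional Cox-Ross-Rubinstein binomial tree for an American put. The model and parameters are illustrative assumptions, not taken from the paper.

```python
import math

def put_binomial(s0, strike, rate, sigma, maturity, steps, american=True):
    """Price a put on a CRR binomial tree by backward induction.

    With american=True the holder may stop (exercise) at any node, which turns the
    backward recursion into an optimal stopping problem.
    """
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(rate * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-rate * dt)

    # Payoffs at maturity, indexed by the number of up moves j.
    values = [max(strike - s0 * u**j * d**(steps - j), 0.0) for j in range(steps + 1)]

    # Step backward from maturity to time 0; values[j + 1] is still the later level's value.
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            continuation = disc * (q * values[j + 1] + (1 - q) * values[j])
            if american:
                exercise = max(strike - s0 * u**j * d**(n - j), 0.0)
                values[j] = max(exercise, continuation)   # stop now or continue
            else:
                values[j] = continuation
    return values[0]

american = put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 500)
european = put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 500, american=False)
print(round(american, 3), round(european, 3))  # the early-exercise right adds value
```

A forward-in-time solver has no natural place for the `max(exercise, continuation)` step, which is the difficulty the abstract attributes to the original deep BSDE method.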
# Quantum Simulations in Effective Model Spaces (I): Hamiltonian Learning-VQE using Digital Quantum Computers and Application to the Lipkin-Meshkov-Glick Model ( http://arxiv.org/abs/2301.05976v4 ) License: see link | Caroline E. P. Robin and Martin J. Savage |
The utility of effective model spaces in quantum simulations of non-relativistic quantum many-body systems is explored in the context of the Lipkin-Meshkov-Glick model of interacting fermions. We introduce an iterative hybrid classical-quantum algorithm, the Hamiltonian learning variational quantum eigensolver (HL-VQE), that simultaneously optimizes an effective Hamiltonian, thereby rearranging entanglement into the effective model space, and the associated ground-state wavefunction. HL-VQE is found to provide an exponential improvement in Lipkin-Meshkov-Glick model calculations, compared to a naive truncation without Hamiltonian learning, throughout a significant fraction of the Hilbert space. Quantum simulations are performed to demonstrate the HL-VQE algorithm, using an efficient mapping where the number of qubits scales with the $\log$ of the size of the effective model space, rather than the particle number, allowing for the description of large systems with small quantum circuits. Implementations on IBM's QExperience quantum computers and simulators for 1- and 2-qubit effective model spaces are shown to provide accurate and precise results, reproducing classical predictions. This work constitutes a step in the development of entanglement-driven quantum algorithms for the description of nuclear systems that leverage the potential of noisy intermediate-scale quantum (NISQ) devices. | Translated: 2023-08-25 18:32:29 Published: 2023-08-23 |
# Street-View Image Generation from a Bird's-Eye View Layout ( http://arxiv.org/abs/2301.04634v3 ) License: see link | Alexander Swerdlow, Runsheng Xu, Bolei Zhou |
Bird's-Eye View (BEV) perception has received increasing attention in recent years as it provides a concise and unified spatial representation across views and benefits a diverse set of downstream driving applications. While the focus has been placed on discriminative tasks such as BEV segmentation, the dual generative task of creating street-view images from a BEV layout has rarely been explored. The ability to generate realistic street-view images that align with a given HD map and traffic layout is critical for visualizing complex traffic scenarios and developing robust perception models for autonomous driving. In this paper, we propose BEVGen, a conditional generative model that synthesizes a set of realistic and spatially consistent surrounding images that match the BEV layout of a traffic scenario. BEVGen incorporates a novel cross-view transformation and spatial attention design which learn the relationship between cameras and map views to ensure their consistency. Our model can accurately render road and lane lines, as well as generate traffic scenes under different weather conditions and times of day. The code will be made publicly available. | Translated: 2023-08-25 18:32:01 Published: 2023-08-23 |
# Spin Squeezing by Rydberg Dressing in an Array of Atomic Ensembles ( http://arxiv.org/abs/2303.08805v3 ) License: see link | Jacob A. Hines, Shankari V. Rajagopal, Gabriel L. Moreau, Michael D. Wahrman, Neomi A. Lewis, Ognjen Marković, Monika Schleier-Smith |
We report on the creation of an array of spin-squeezed ensembles of cesium atoms via Rydberg dressing, a technique that offers optical control over local interactions between neutral atoms. We optimize the coherence of the interactions by a stroboscopic dressing sequence that suppresses super-Poissonian loss. We thereby prepare squeezed states of $N=200$ atoms with a metrological squeezing parameter $\xi^2 = 0.77(9)$ quantifying the reduction in phase variance below the standard quantum limit. We realize metrological gain across three spatially separated ensembles in parallel, with the strength of squeezing controlled by the local intensity of the dressing light. Our method can be applied to enhance the precision of tests of fundamental physics based on arrays of atomic clocks and to enable quantum-enhanced imaging of electromagnetic fields. | Translated: 2023-08-25 18:23:34 Published: 2023-08-23 |
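The metrological squeezing parameter compares the phase variance of the prepared state with that of an uncorrelated (coherent spin) state of the same atom number, so $\xi^2 < 1$ means phase noise below the standard quantum limit, with single-shot phase uncertainty $\xi/\sqrt{N}$. A minimal conversion of the reported central value $\xi^2 = 0.77$ to decibels, assuming the usual convention $10\log_{10}(1/\xi^2)$, is sketched below; it ignores the quoted uncertainty of 0.09.

```python
import math

def squeezing_db(xi_squared):
    """Metrological gain in dB relative to the standard quantum limit: 10*log10(1/xi^2)."""
    return 10.0 * math.log10(1.0 / xi_squared)

def phase_uncertainty(n_atoms, xi_squared=1.0):
    """Single-shot phase uncertainty xi / sqrt(N); xi_squared = 1 is the standard quantum limit."""
    return math.sqrt(xi_squared / n_atoms)

print(f"{squeezing_db(0.77):.2f} dB")   # roughly 1.1 dB of metrological gain
print(phase_uncertainty(200), phase_uncertainty(200, 0.77))
```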
# Quantized Radio Map Estimation Using Tensor and Deep Generative Models ( http://arxiv.org/abs/2303.01770v2 ) License: see link | Subash Timilsina, Sagar Shrestha, Xiao Fu |
Spectrum cartography (SC), also known as radio map estimation (RME), aims at crafting multi-domain (e.g., frequency and space) radio power propagation maps from limited sensor measurements. While early methods often lacked theoretical support, recent works have demonstrated that radio maps can be provably recovered using low-dimensional models -- such as the block-term tensor decomposition (BTD) model and certain deep generative models (DGMs) -- of the high-dimensional multi-domain radio signals. However, these existing provable SC approaches assume that sensors send real-valued (full-resolution) measurements to the fusion center, which is unrealistic. This work puts forth a quantized SC framework that generalizes the BTD- and DGM-based SC to scenarios where heavily quantized sensor measurements are used. A maximum likelihood estimation (MLE)-based SC framework under a Gaussian quantizer is proposed. Recoverability of the radio map using the MLE criterion is characterized under realistic conditions, e.g., imperfect radio map modeling and noisy measurements. Simulations and real-data experiments are used to showcase the effectiveness of the proposed approach. | Translated: 2023-08-25 18:22:33 Published: 2023-08-23 |
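One common reading of a Gaussian quantizer model, assumed here, is that each sensor adds Gaussian noise to the true power value and then reports only which quantization bin the result fell into; the likelihood of a reading is then a difference of two normal CDFs. The scalar sketch below estimates one value by maximizing that likelihood over a grid. The paper's MLE is over BTD or DGM radio-map parameters, which this toy does not model; the thresholds, noise level and sample size are hypothetical.

```python
import bisect
import math
import random

def normal_cdf(z):
    """Standard normal CDF via the error function (handles z = +/-inf)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def quantized_nll(x, counts, thresholds, sigma):
    """Negative log-likelihood of quantizer outputs given the noise-free value x.

    counts[k] is how many readings fell in bin k, i.e. edges[k] <= x + noise < edges[k + 1],
    where edges = [-inf, *thresholds, +inf] and noise ~ N(0, sigma**2).
    """
    edges = [-math.inf] + list(thresholds) + [math.inf]
    nll = 0.0
    for k, n_k in enumerate(counts):
        if n_k == 0:
            continue
        p = normal_cdf((edges[k + 1] - x) / sigma) - normal_cdf((edges[k] - x) / sigma)
        nll -= n_k * math.log(max(p, 1e-300))  # guard against log(0) far from the data
    return nll

# Simulate a hypothetical 2-bit sensor (3 thresholds, 4 output levels).
rng = random.Random(0)
thresholds = [-1.0, 0.0, 1.0]
true_x, sigma = 0.3, 0.5
counts = [0] * (len(thresholds) + 1)
for _ in range(5000):
    counts[bisect.bisect_right(thresholds, true_x + rng.gauss(0.0, sigma))] += 1

# Maximum likelihood by grid search (a real system would use gradient-based optimization).
grid = [i / 1000 for i in range(-2000, 2001)]
x_mle = min(grid, key=lambda x: quantized_nll(x, counts, thresholds, sigma))
print(round(x_mle, 3))
```

Binning the readings into counts first means each likelihood evaluation costs one term per quantization level rather than one per reading.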
# Conformal Prediction Regions for Time Series using Linear Complementarity Programming ( http://arxiv.org/abs/2304.01075v3 ) License: see link | Matthew Cleaveland, Insup Lee, George J. Pappas, Lars Lindemann |
Conformal prediction is a statistical tool for producing prediction regions of machine learning models that are valid with high probability. However, applying conformal prediction to time series data leads to conservative prediction regions. In fact, to obtain prediction regions over $T$ time steps with confidence $1-\delta$, previous works require that each individual prediction region be valid with confidence $1-\delta/T$. We propose an optimization-based method for reducing this conservatism to enable long-horizon planning and verification when using learning-enabled time series predictors. Instead of considering prediction errors individually at each time step, we consider a parameterized prediction error over multiple time steps. By optimizing the parameters over an additional dataset, we find prediction regions that are not conservative. We show that this problem can be cast as a mixed integer linear complementarity program (MILCP), which we then relax into a linear complementarity program (LCP). Additionally, we prove that the relaxed LP has the same optimal cost as the original MILCP. Finally, we demonstrate the efficacy of our method on case studies using pedestrian trajectory predictors and F16 fighter jet altitude predictors. | Translated: 2023-08-25 18:12:09 Published: 2023-08-23 |
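The conservative baseline the abstract describes can be made concrete. In split conformal prediction the threshold is an order statistic of calibration scores; guaranteeing all $T$ steps jointly through a union bound forces each step to use level $\delta/T$, which pushes the rank toward the largest score, or past it when the calibration set is small. The sketch shows only that baseline, not the authors' LCP-based method; the scores are hypothetical.

```python
import math

def conformal_quantile(scores, delta):
    """Split conformal threshold at miscoverage level delta.

    Returns the ceil((n + 1) * (1 - delta))-th smallest calibration score, or math.inf
    when the calibration set is too small to certify level 1 - delta.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - delta))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]

def union_bound_thresholds(scores_per_step, delta):
    """Baseline: make each of the T steps valid at level delta / T (union bound)."""
    T = len(scores_per_step)
    return [conformal_quantile(s, delta / T) for s in scores_per_step]

# Hypothetical nonconformity scores: 999 calibration trajectories, T = 10 steps.
scores = [float(i) for i in range(1, 1000)]
print(conformal_quantile(scores, 0.1))                 # one step at 90% confidence
print(union_bound_thresholds([scores] * 10, 0.1)[0])   # each of 10 steps at 99%
print(conformal_quantile(scores[:50], 0.1 / 10))       # too few points: inf
```

With only 50 calibration trajectories the per-step level cannot be certified at all, which illustrates why optimizing a joint, parameterized error over the whole horizon can give much tighter regions.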
# LD-ZNet:テキストベース画像分割のための遅延拡散手法 LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation ( http://arxiv.org/abs/2303.12343v2 ) ライセンス: Link先を確認 | Koutilya Pnvr, Bharat Singh, Pallabi Ghosh, Behjat Siddiquie, David Jacobs | (参考訳) 画像分類やキャプション、自己監督技術といった大規模な事前学習タスクは、オブジェクトの意味的境界を学ぶインセンティブを与えません。
本報告では, LDMの内部特徴が豊富な意味情報を含んでいることを示すとともに, LD-ZNet方式でテキストセグメンテーションの性能をさらに向上させる手法を提案する。
プロジェクトはhttps://koutilya-pnvr.github.io/ld-znet/で入手できる。 Large-scale pre-training tasks like image classification, captioning, or self-supervised techniques do not incentivize learning the semantic boundaries of objects. However, recent generative foundation models built using text-based latent diffusion techniques may learn semantic boundaries. This is because they have to synthesize intricate details about all objects in an image based on a text description. Therefore, we present a technique for segmenting real and AI-generated images using latent diffusion models (LDMs) trained on internet-scale datasets. First, we show that the latent space of LDMs (z-space) is a better input representation compared to other feature representations like RGB images or CLIP encodings for text-based image segmentation. By training the segmentation models on the latent z-space, which creates a compressed representation across several domains like different forms of art, cartoons, illustrations, and photographs, we are also able to bridge the domain gap between real and AI-generated images. We show that the internal features of LDMs contain rich semantic information and present a technique in the form of LD-ZNet to further boost the performance of text-based segmentation. Overall, we show up to 6% improvement over standard baselines for text-to-image segmentation on natural images. For AI-generated imagery, we show close to 20% improvement compared to state-of-the-art techniques. The project is available at https://koutilya-pnvr.github.io/LD-ZNet/. | 翻訳日:2023-08-25 18:10:45 公開日:2023-08-23 |
# What Should Be Balanced in a "Balanced" Face Recognition Dataset? ( http://arxiv.org/abs/2304.09818v2 ) License: see link | Haiyu Wu, Kevin W. Bowyer |
The issue of demographic disparities in face recognition accuracy has attracted increasing attention in recent years. Various face image datasets have been proposed as 'fair' or 'balanced' to assess the accuracy of face recognition algorithms across demographics. These datasets typically balance the number of identities and images across demographics. It is important to note that the number of identities and images in an evaluation dataset are *not* driving factors for 1-to-1 face matching accuracy. Moreover, balancing the number of identities and images does not ensure balance in other factors known to impact accuracy, such as head pose, brightness, and image quality. We demonstrate these issues using several recently proposed datasets. To improve the ability to perform less biased evaluations, we propose a bias-aware toolkit that facilitates the creation of cross-demographic evaluation datasets balanced on the factors mentioned in this paper. | Translated: 2023-08-25 18:03:00 Published: 2023-08-23 |
# BadVFL: Backdoor Attacks in Vertical Federated Learning ( http://arxiv.org/abs/2304.08847v2 ) License: see link | Mohammad Naseri, Yufei Han, Emiliano De Cristofaro |
Federated learning (FL) enables multiple parties to collaboratively train a machine learning model without sharing their data; rather, they train their own model locally and send updates to a central server for aggregation. Depending on how the data is distributed among the participants, FL can be classified into Horizontal (HFL) and Vertical (VFL). In VFL, the participants share the same set of training instances, but each hosts only a different, non-overlapping subset of the whole feature space. In HFL, by contrast, each participant shares the same set of features while the training set is split into locally owned training data subsets. VFL is increasingly used in applications like financial fraud detection; nonetheless, very little work has analyzed its security. In this paper, we focus on robustness in VFL, in particular on backdoor attacks, whereby an adversary attempts to manipulate the aggregate model during the training process to trigger misclassifications. Performing backdoor attacks in VFL is more challenging than in HFL, as the adversary i) does not have access to the labels during training and ii) cannot change the labels, as she only has access to the feature embeddings. We present a first-of-its-kind clean-label backdoor attack in VFL, which consists of two phases: a label inference phase and a backdoor phase. We demonstrate the effectiveness of the attack on three different datasets, investigate the factors involved in its success, and discuss countermeasures to mitigate its impact. | Translated: 2023-08-25 18:01:57 Published: 2023-08-23 |
# Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation ( http://arxiv.org/abs/2304.05669v2 ) License: see link | Liwen Wu, Rui Zhu, Mustafa B. Yaldiz, Yinhao Zhu, Hong Cai, Janarbek Matai, Fatih Porikli, Tzu-Mao Li, Manmohan Chandraker, Ravi Ramamoorthi |
Inverse path tracing has recently been applied to joint material and lighting estimation, given geometry and multi-view HDR observations of an indoor scene. However, it has two major limitations: path tracing is expensive to compute, and ambiguities exist between reflection and emission. Our Factorized Inverse Path Tracing (FIPT) addresses these challenges by using a factored light transport formulation and finds emitters driven by rendering errors. Our algorithm enables accurate material and lighting optimization faster than previous work, and is more effective at resolving ambiguities. Extensive experiments on synthetic scenes show that our method (1) outperforms state-of-the-art indoor inverse rendering and relighting methods, particularly in the presence of complex illumination effects; and (2) speeds up inverse path tracing optimization to less than an hour. We further demonstrate robustness to noisy inputs through material and lighting estimates that allow plausible relighting in a real scene. The source code is available at: https://github.com/lwwu2/fipt | Translated: 2023-08-25 18:01:13 Published: 2023-08-23 |
# DH-PTAM: A Deep Hybrid Stereo Events-Frames Parallel Tracking And Mapping System ( http://arxiv.org/abs/2306.01891v2 ) License: see link | Abanob Soliman, Fabien Bonardi, Désiré Sidibé, Samia Bouchafa |
This paper presents a robust approach for a visual parallel tracking and mapping (PTAM) system that excels in challenging environments. Our proposed method combines the strengths of heterogeneous multi-modal visual sensors, including stereo event-based and frame-based sensors, in a unified reference frame through a novel spatio-temporal synchronization of stereo visual frames and stereo event streams. We employ deep learning-based feature extraction and description for estimation to further enhance robustness. We also introduce an end-to-end parallel tracking and mapping optimization layer complemented by a simple loop-closure algorithm for efficient SLAM behavior. Through comprehensive experiments on both small-scale and large-scale real-world sequences of the VECtor and TUM-VIE benchmarks, our proposed method (DH-PTAM) demonstrates superior performance in terms of robustness and accuracy in adverse conditions, especially in large-scale HDR scenarios. Our implementation's research-based Python API is publicly available on GitHub for further research and development: https://github.com/AbanobSoliman/DH-PTAM. | Translated: 2023-08-25 17:52:23 Published: 2023-08-23 |
# LANISTR: Multimodal Learning from Structured and Unstructured Data ( http://arxiv.org/abs/2305.16556v2 ) License: see link | Sayna Ebrahimi, Sercan O. Arik, Yihe Dong, Tomas Pfister |
Multimodal large-scale pretraining has shown impressive performance for unstructured data including language, image, audio, and video. However, a prevalent real-world scenario involves the combination of structured data types (tabular, time series) with unstructured data, which has so far been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data. The core of LANISTR's methodology is rooted in masking-based training applied across both unimodal and multimodal levels. In particular, we introduce a new similarity-based multimodal masking loss that enables it to learn cross-modal relations from large-scale multimodal data with missing modalities. On two real-world datasets, MIMIC-IV (healthcare) and Amazon Product Review (retail), LANISTR demonstrates remarkable absolute improvements of 6.6% (AUROC) and up to 14% (accuracy) when fine-tuned on 0.1% and 0.01% of labeled data, respectively, compared to the state-of-the-art alternatives. Notably, these improvements are observed even in the presence of considerable missingness ratios of 35.7% and 99.8% in the respective datasets. | Translated: 2023-08-25 17:51:18 Published: 2023-08-23 |
# PruMUX: モデル圧縮によるデータ多重化の強化 PruMUX: Augmenting Data Multiplexing with Model Compression ( http://arxiv.org/abs/2305.14706v2 ) ライセンス: Link先を確認 | Yushan Su, Vishvak Murahari, Karthik Narasimhan, Kai Li | (参考訳) 言語モデルのサイズが日に日に大きくなるにつれ、効率的な推論の手法は様々なアプリケーションでその能力を活用するのに不可欠である。
先行研究は、精度を犠牲にすることなくモデルのスループットを向上させるために、モデルプルーニング、知識蒸留、データ多重化といった手法を調査してきた。
PruMUX は、精度閾値80%〜74%の範囲で、BERT-base モデルに対して最大7.5〜29.5倍のスループット向上を達成する。
さらに、所望の精度低下予算が与えられたときにプルーニングと多重化の高性能なパラメータを予測できるメタレベルモデル Auto-PruMUX を提案する。 As language models increase in size by the day, methods for efficient inference are critical to leveraging their capabilities for various applications. Prior work has investigated techniques like model pruning, knowledge distillation, and data multiplexing to increase model throughput without sacrificing accuracy. In this paper, we combine two such methods -- structured pruning and data multiplexing -- to compound the speedup gains obtained by either method. Our approach, PruMUX, obtains up to 7.5-29.5X throughput improvement over BERT-base model with accuracy threshold from 80% to 74%. We further study various combinations of parameters (such as sparsity and multiplexing factor) in the two techniques to provide a comprehensive analysis of the tradeoff between accuracy and throughput in the resulting models. We then propose Auto-PruMUX, a meta-level model that can predict the high-performance parameters for pruning and multiplexing given a desired accuracy loss budget, providing a practical method to leverage the combination effectively. | 翻訳日:2023-08-25 17:50:34 公開日:2023-08-23 |
# AI時代の偽情報2.0:サイバーセキュリティの観点から Disinformation 2.0 in the Age of AI: A Cybersecurity Perspective ( http://arxiv.org/abs/2306.05569v2 ) ライセンス: Link先を確認 | Wojciech Mazurczyk, Dongwon Lee, Andreas Vlachos | (参考訳) 近年のAI技術の爆発的な進歩により、偽情報研究を取り巻く状況も急速に変化すると予想される。
次に、偽情報2.0とサイバーセキュリティの関係、および偽情報2.0の脅威に包括的に対処するための多層的な対策について論じる。 With the explosive advancement of AI technologies in recent years, the scene of the disinformation research is also expected to rapidly change. In this viewpoint article, in particular, we first present the notion of "disinformation 2.0" in the age of AI where disinformation would become more targeted and personalized, its content becomes very difficult to distinguish from real news, and its creation and dissemination become more accelerated by AI. Then, we discuss how disinformation 2.0 and cybersecurity fit and a possible layered countermeasure to address the threat in disinformation 2.0 in a holistic manner. | 翻訳日:2023-08-25 17:41:36 公開日:2023-08-23 |
# エントロピー最適輸送のための最小内在次元スケーリング Minimum intrinsic dimension scaling for entropic optimal transport ( http://arxiv.org/abs/2306.03398v2 ) ライセンス: Link先を確認 | Austin J. Stromme | (参考訳) 外在次元が高いデータでも内在次元は低い場合があるという多様体仮説に動機づけられ、我々はデータの内在次元に敏感な、エントロピー最適輸送に対する精緻な統計的上界を導出する。
これを最小内在次元スケーリング(MID scaling)現象と呼び、コストが有界かつリプシッツである限り、データ分布に関する仮定なしに MID スケーリングを確立する。
本研究は、MID スケーリングが一般的な現象であることを示すとともに、エントロピー正則化の統計的効果を距離スケールとして初めて厳密に解釈することで、理論的な最先端を大きく前進させる。 Motivated by the manifold hypothesis, which states that data with a high extrinsic dimension may yet have a low intrinsic dimension, we develop refined statistical bounds for entropic optimal transport that are sensitive to the intrinsic dimension of the data. Our bounds involve a robust notion of intrinsic dimension, measured at only a single distance scale depending on the regularization parameter, and show that it is only the minimum of these single-scale intrinsic dimensions which governs the rate of convergence. We call this the Minimum Intrinsic Dimension scaling (MID scaling) phenomenon, and establish MID scaling with no assumptions on the data distributions so long as the cost is bounded and Lipschitz, and for various entropic optimal transport quantities beyond just values, with stronger analogs when one distribution is supported on a manifold. Our results significantly advance the theoretical state of the art by showing that MID scaling is a generic phenomenon, and provide the first rigorous interpretation of the statistical effect of entropic regularization as a distance scale. | 翻訳日:2023-08-25 17:40:45 公開日:2023-08-23 |
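The entropic optimal transport quantities bounded in this paper are the ones computed in practice by Sinkhorn scaling. The sketch below is a standard Sinkhorn iteration between two discrete distributions, not code from the paper (which proves statistical bounds rather than proposing an algorithm); the function name `sinkhorn` and the defaults for `eps` (the regularization parameter) and `n_iter` are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch; `sinkhorn` and its defaults are hypothetical, not from the paper.
def sinkhorn(a, b, cost, eps=0.1, n_iter=500):
    """Entropic OT plan between histograms a and b.

    Minimizes <P, cost> + eps * KL(P | a b^T) subject to P 1 = a and P^T 1 = b
    by alternately rescaling the rows and columns of the Gibbs kernel.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel; eps acts like a length scale on the cost
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)              # enforce row marginals
        v = b / (K.T @ u)            # enforce column marginals (exact after this step)
    plan = u[:, None] * K * v[None, :]
    return plan, float(np.sum(plan * cost))
```

One informal way to see `eps` as a distance scale: when `eps` is much smaller than the typical cost between neighbouring points the plan concentrates on cheap pairs, and when it is much larger the plan blurs toward the product measure `a b^T`. For very small `eps` a log-domain implementation avoids underflow in `K`.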
# ビームスプリッタアレイ上の量子ランダムウォーク Quantum random walks on a beam splitter array ( http://arxiv.org/abs/2307.04262v2 ) ライセンス: Link先を確認 | Mario Ivan Estrada Delgado and Zurika Iveth Blanco Garcia | (参考訳) ビームスプリッタアレイの一般的な行列表現を示す。
これらの演算子を用いることで、アレイ全体を記述する行列、ひいては入力光子状態の最終的な確率分布を計算できる。 The general matrix representation of a beam splitter array is presented. Each beam splitter has a transmission/reflection coefficient that determines the behavior of these individual devices and, in consequence, the whole system response. The general matrix representation of each beam splitter is given as rotations of a $2n$-dimensional space. With these operators, the matrix that describes the entire array and, consequently, the final probability distribution of an input photon state can be calculated. | 翻訳日:2023-08-25 17:30:40 公開日:2023-08-23 |
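A minimal sketch of the idea for a single photon, assuming each beam splitter mixes two neighbouring modes as a real 2x2 rotation (transmission amplitude cos θ, reflection amplitude sin θ) embedded in an n-mode identity. This real-rotation convention is an assumption; the paper's own representation (rotations of a 2n-dimensional space) may be parametrized differently, and the helper names are hypothetical.

```python
import numpy as np

# Illustrative sketch; helper names and the rotation convention are hypothetical.
def beam_splitter(n_modes, i, theta):
    """Lossless beam splitter on modes i and i+1, embedded in an n-mode unitary."""
    U = np.eye(n_modes, dtype=complex)
    c, s = np.cos(theta), np.sin(theta)
    U[i, i], U[i, i + 1] = c, -s
    U[i + 1, i], U[i + 1, i + 1] = s, c
    return U

def array_unitary(n_modes, splitters):
    """Compose (mode_index, theta) pairs in the order the photon meets them."""
    U = np.eye(n_modes, dtype=complex)
    for i, theta in splitters:
        U = beam_splitter(n_modes, i, theta) @ U   # later elements act on the left
    return U

def output_distribution(U, input_mode):
    """Single-photon detection probabilities at each output port."""
    return np.abs(U[:, input_mode]) ** 2

# Two 50/50 splitters in cascade: a photon entering mode 0 ends up in modes 0, 1, 2
# with probabilities 0.5, 0.25, 0.25.
probs = output_distribution(array_unitary(3, [(0, np.pi / 4), (1, np.pi / 4)]), 0)
```

Chaining many such splitters is what spreads the photon over the output ports, which is the random-walk picture in the title.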
# コンパクト化作用素を用いた条件付き期待値 Conditional expectation using compactification operators ( http://arxiv.org/abs/2306.10592v3 ) ライセンス: Link先を確認 | Suddhasattwa Das | (参考訳) ノイズ除去、最小二乗期待値推定、多様体学習という個別のタスクは、多くの場合、2つの確率変数の積から生じる条件付き期待値を求めるという共通の枠組みで定式化できる。
この手法全体は実装が容易であり、いくつかの実世界の問題への適用に成功した例も示す。 The separate tasks of denoising, least squares expectation, and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool, to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space. This equation is shown to have solutions that allow numerical approximation, thus guaranteeing the convergence of data-driven implementations. The overall technique is easy to implement, and its successful application to some real-world problems is also shown. | 翻訳日:2023-08-25 17:29:35 公開日:2023-08-23 |
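The paper's operator-theoretic estimator is not reproduced here. As a point of comparison, the sketch below uses a standard stand-in, kernel ridge regression, which also estimates x ↦ E[Y | X = x] in a reproducing kernel Hilbert space by solving a regularized linear system (a Tikhonov-regularized inverse problem). The Gaussian kernel and the `bandwidth` and `reg` values are assumptions chosen for a toy problem.

```python
import numpy as np

# Illustrative stand-in (kernel ridge regression), not the paper's method; names are hypothetical.
def gaussian_kernel(x, y, bandwidth):
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-d2 / (2 * bandwidth ** 2))

def fit_conditional_expectation(x, y, bandwidth=0.3, reg=1e-3):
    """Estimate x -> E[Y | X = x] in a Gaussian-kernel RKHS.

    Solving (K + n * reg * I) alpha = y is a regularized linear inverse problem;
    the estimate at new points is a kernel-weighted combination of the alphas.
    """
    n = len(x)
    K = gaussian_kernel(x, x, bandwidth)
    alpha = np.linalg.solve(K + n * reg * np.eye(n), y)
    return lambda x_new: gaussian_kernel(np.atleast_1d(x_new), x, bandwidth) @ alpha

# Toy usage: Y = sin(X) + noise, so E[Y | X = x] = sin(x).
rng = np.random.default_rng(0)
x_obs = rng.uniform(-3, 3, 400)
y_obs = np.sin(x_obs) + 0.1 * rng.standard_normal(400)
estimate = fit_conditional_expectation(x_obs, y_obs)
```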
# マルチメディアレコメンデーションのためのパレート不変表現学習 Pareto Invariant Representation Learning for Multimedia Recommendation ( http://arxiv.org/abs/2308.04706v2 ) ライセンス: Link先を確認 | Shanshan Huang, Haoxuan Li, Qingsong Li, Chunyuan Zheng, Li Liu | (参考訳) マルチメディアレコメンデーションには、パーソナライズされたランキングタスクが含まれており、通常、マルチメディアコンテンツはジェネリックエンコーダを使って表現される。
本稿では、不変表現(ユーザの注意を引く本質的要因)と可変表現(その他の要因)を同時に学習することで、IID-OOD 多目的最適化の観点から疑似相関の影響を緩和するフレームワーク Pareto Invariant Representation Learning (PaInvRL) を提案する。
提案した PaInvRL を3つの公開マルチメディア推薦データセット (Movielens, Tiktok, Kwai) 上で最先端の推薦モデルと比較し、実験結果により環境内・環境間の両方の学習における PaInvRL の有効性を検証する。 Multimedia recommendation involves personalized ranking tasks, where multimedia content is usually represented using a generic encoder. However, these generic representations introduce spurious correlations that fail to reveal users' true preferences. Existing works attempt to alleviate this problem by learning invariant representations, but overlook the balance between independent and identically distributed (IID) and out-of-distribution (OOD) generalization. In this paper, we propose a framework called Pareto Invariant Representation Learning (PaInvRL) to mitigate the impact of spurious correlations from an IID-OOD multi-objective optimization perspective, by learning invariant representations (intrinsic factors that attract user attention) and variant representations (other factors) simultaneously. Specifically, PaInvRL includes three iteratively executed modules: (i) heterogeneous identification module, which identifies the heterogeneous environments to reflect distributional shifts for user-item interactions; (ii) invariant mask generation module, which learns invariant masks based on the Pareto-optimal solutions that minimize the adaptive weighted Invariant Risk Minimization (IRM) and Empirical Risk (ERM) losses; (iii) convert module, which generates both variant representations and item-invariant representations for training a multi-modal recommendation model that mitigates spurious correlations and balances the generalization performance within and across the environmental distributions. We compare the proposed PaInvRL with state-of-the-art recommendation models on three public multimedia recommendation datasets (Movielens, Tiktok, and Kwai), and the experimental results validate the effectiveness of PaInvRL for both within- and cross-environmental learning. | 翻訳日:2023-08-25 17:21:05 公開日:2023-08-23 |
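PaInvRL balances an ERM loss against an IRM loss. The sketch below shows only those two ingredients for a squared-error loss: the IRMv1 penalty (the squared gradient of each environment's risk with respect to a scalar multiplier on the predictions, evaluated at 1) and a fixed convex weight `lam`. The paper instead selects adaptive weights from Pareto-optimal solutions and learns masks over representations, neither of which is shown; all names here are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names and the fixed weight `lam` are hypothetical.
def erm_and_irm_losses(preds_by_env, targets_by_env):
    """Squared-error ERM risk and IRMv1 penalty, each averaged over environments.

    IRMv1 penalty per environment: (d/dw mean((w * f - y)^2) at w = 1)^2,
    whose closed form for squared loss is (2 * mean(f * (f - y)))^2.
    """
    erm, irm = [], []
    for f, y in zip(preds_by_env, targets_by_env):
        erm.append(np.mean((f - y) ** 2))
        grad_w = 2.0 * np.mean(f * (f - y))
        irm.append(grad_w ** 2)
    return float(np.mean(erm)), float(np.mean(irm))

def weighted_objective(erm, irm, lam):
    """Convex scalarization: lam trades IID fit (ERM) against invariance (IRM)."""
    return (1.0 - lam) * erm + lam * irm
```

A predictor that is well calibrated in one environment but rescaled in another gets a large IRM penalty even if its average ERM risk looks modest, which is the signal an invariance objective exploits.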
# 開発ブートストラップ:単純な能力からインテリジェントな人間互換AIへ Developmental Bootstrapping: From Simple Competences to Intelligent Human-Compatible AIs ( http://arxiv.org/abs/2308.04586v4 ) ライセンス: Link先を確認 | Mark Stefik and Robert Price | (参考訳) 一部のAIは、ボードゲームのようなクローズドな人工世界で人間の能力を上回るが、現実では奇妙な間違いを犯し、気づかない。
このポジションペーパーは、堅牢で信頼性があり、人間と互換性のあるAIを作るために、開発ブートストラップの実践を拡張するための論理、見通し、ギャップ、課題を概説する。 Although some AIs surpass human abilities in closed artificial worlds such as board games, in the real world they make strange mistakes and do not notice them. They cannot be instructed easily, fail to use common sense, and lack curiosity. Mainstream approaches for creating AIs include the traditional manually-constructed symbolic AI approach and the generative and deep learning AI approaches including large language models (LLMs). Although it is outside of the mainstream, the developmental bootstrapping approach may have more potential. In developmental bootstrapping, AIs develop competences like human children do. They start with innate competences. They interact with the environment and learn from their interactions. They incrementally extend their innate competences with self-developed competences. They interact and learn from people and establish perceptual, cognitive, and common grounding. They acquire the competences they need through competence bootstrapping. However, developmental robotics has not yet produced AIs with robust adult-level competences. Projects have typically stopped before reaching the Toddler Barrier. This corresponds to human infant development at about two years of age, before infant speech becomes fluent. They also do not bridge the Reading Barrier, where they could skillfully and skeptically draw on the socially developed online information resources that power LLMs. The next competences in human cognitive development involve intrinsic motivation, imitation learning, imagination, coordination, and communication. This position paper lays out the logic, prospects, gaps, and challenges for extending the practice of developmental bootstrapping to create robust, trustworthy, and human-compatible AIs. | 翻訳日:2023-08-25 17:20:29 公開日:2023-08-23 |
# 大規模地質炭素貯留の高速モデリングのためのマルチフィデリティ・フーリエニューラルオペレータ Multi-fidelity Fourier Neural Operator for Fast Modeling of Large-Scale Geological Carbon Storage ( http://arxiv.org/abs/2308.09113v2 ) ライセンス: Link先を確認 | Hewei Tang, Qingkai Kong and Joseph P. Morris | (参考訳) 深層学習に基づくサロゲートモデルは地質炭素貯留 (GCS) 問題に広く応用され、貯留層圧力および CO2 プルーム移動の予測を高速化してきた。
Fourier Neural Operator は望ましいグリッド不変性を持ち、離散化の異なるデータセット間の転移学習の手順を単純化する。
高忠実度データが極めて限られている場合でも、マルチフィデリティ FNO モデルが妥当な精度で圧力場を予測できることを確認した。 Deep learning-based surrogate models have been widely applied in geological carbon storage (GCS) problems to accelerate the prediction of reservoir pressure and CO2 plume migration. Large amounts of data from physics-based numerical simulators are required to train a model to accurately predict the complex physical behaviors associated with this process. In practice, the available training data are always limited in large-scale 3D problems due to the high computational cost. Therefore, we propose to use a multi-fidelity Fourier Neural Operator to solve large-scale GCS problems with more affordable multi-fidelity training datasets. The Fourier Neural Operator has a desirable grid-invariant property, which simplifies the transfer learning procedure between datasets with different discretization. We first test the model efficacy on a GCS reservoir model being discretized into 110k grid cells. The multi-fidelity model can predict with accuracy comparable to a high-fidelity model trained with the same amount of high-fidelity data with 81% less data generation costs. We further test the generalizability of the multi-fidelity model on the same reservoir model with a finer discretization of 1 million grid cells. This case was made more challenging by employing high-fidelity and low-fidelity datasets generated by different geostatistical models and reservoir simulators. We observe that the multi-fidelity FNO model can predict pressure fields with reasonable accuracy even when the high-fidelity data are extremely limited. | 翻訳日:2023-08-25 17:10:24 公開日:2023-08-23 |
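The grid-invariance mentioned above comes from parametrizing the operator in Fourier space. The sketch below is a single-channel, 1-D spectral convolution in the style of an FNO layer; a full layer would also add a pointwise linear term and a nonlinearity and would act on many channels. `weights` is a hypothetical set of learned multipliers for the lowest Fourier modes, and the same weights are applied unchanged on a 64-point and a 128-point grid.

```python
import numpy as np

# Illustrative sketch of one FNO-style spectral convolution; names are hypothetical.
def spectral_conv_1d(u, weights):
    """Keep the lowest len(weights) Fourier modes of u, scale them, transform back.

    The parameters live in Fourier space, so they apply to any grid size n
    with n // 2 + 1 >= len(weights).
    """
    n = u.shape[-1]
    k = len(weights)
    u_hat = np.fft.rfft(u)                  # n // 2 + 1 complex coefficients
    out_hat = np.zeros_like(u_hat)
    out_hat[:k] = u_hat[:k] * weights       # truncate to k modes and mix them
    return np.fft.irfft(out_hat, n=n)       # back to physical space on the same grid

# The same weights evaluated on two resolutions of the same band-limited input.
w = np.linspace(1.0, 0.1, 8)
coarse = np.arange(64) / 64
fine = np.arange(128) / 128
field = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)
out_coarse = spectral_conv_1d(field(coarse), w)
out_fine = spectral_conv_1d(field(fine), w)   # out_fine[::2] matches out_coarse
```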
# ドメイン間の信頼できる表現学習 Trustworthy Representation Learning Across Domains ( http://arxiv.org/abs/2308.12315v1 ) ライセンス: Link先を確認 | Ronghang Zhu and Dongliang Guo and Daiqing Qi and Zhixuan Chu and Xiang Yu and Sheng Li | (参考訳) AIシステムは日常生活や人間社会に広く展開されるほどの性能を獲得したため、人々はこれらの技術がもたらす恩恵を享受する一方で、これらのシステムが引き起こす多くの社会的問題にも苦しんでいる。
最後に、今後の研究の方向性に関する知見と議論をもって本サーベイを締めくくる。 As AI systems have obtained significant performance to be deployed widely in our daily lives and human society, people both enjoy the benefits brought by these technologies and suffer many social issues induced by these systems. To make AI systems good enough and trustworthy, plenty of research has been done to build guidelines for trustworthy AI systems. Machine learning is one of the most important parts of AI systems and representation learning is the fundamental technology in machine learning. How to make representation learning trustworthy in real-world applications, e.g., cross-domain scenarios, is very valuable and necessary for both machine learning and AI system fields. Inspired by the concepts in trustworthy AI, we proposed the first trustworthy representation learning across domains framework which includes four concepts, i.e., robustness, privacy, fairness, and explainability, to give a comprehensive literature review on this research direction. Specifically, we first introduce the details of the proposed trustworthy framework for representation learning across domains. Second, we provide basic notions and comprehensively summarize existing methods for the trustworthy framework from four concepts. Finally, we conclude this survey with insights and discussions on future research directions. | 翻訳日:2023-08-25 17:01:06 公開日:2023-08-23 |
# Spresense による視線推定 Gaze Estimation on Spresense ( http://arxiv.org/abs/2308.12313v1 ) ライセンス: Link先を確認 | Thomas Ruegg, Pietro Bonazzi, Andrea Ronco | (参考訳) 視線推定は、人間とコンピュータの相互作用、仮想現実、医学などの分野に多くの応用がある貴重な技術である。
本報告では、Sony Spresense マイクロコントローラボードを用いた視線推定システムの実装を示し、そのレイテンシ、MAC/cycle、消費電力における性能を検討する。
我々の軽量モデルTinyTrackerSは、85.8kパラメータを使用してわずか169Kbの大きさで、Spresenseプラットフォーム上で3FPSで動作する。 Gaze estimation is a valuable technology with numerous applications in fields such as human-computer interaction, virtual reality, and medicine. This report presents the implementation of a gaze estimation system using the Sony Spresense microcontroller board and explores its performance in latency, MAC/cycle, and power consumption. The report also provides insights into the system's architecture, including the gaze estimation model used. Additionally, a demonstration of the system is presented, showcasing its functionality and performance. Our lightweight model TinyTrackerS is a mere 169Kb in size, using 85.8k parameters and runs on the Spresense platform at 3 FPS. | 翻訳日:2023-08-25 17:00:47 公開日:2023-08-23 |
# 核融合プラズマの動力学シミュレーションにおける波粒子共鳴記述への物理情報ニューラルネットワークの適用 Physics informed Neural Networks applied to the description of wave-particle resonance in kinetic simulations of fusion plasmas ( http://arxiv.org/abs/2308.12312v1 ) ライセンス: Link先を確認 | Jai Kumar (IRFM), David Zarzoso (M2P2), Virginie Grandgirard (IRFM), Jan Ebert, Stefan Kesselheim | (参考訳) Vlasov-Poisson 系を縮約形 (1D1V) で用い、物理情報ニューラルネットワーク (PINN) の波動粒子共鳴への適用可能性を検証するテストベッドとする。
第二に、Vlasov-Poisson 系の求解への PINN の適用を、特に積分部分に重点を置いて示す。この積分部分が、偏微分方程式を解くための自動微分と積分方程式を解くための自動積分に基づく PINN の変種 I-PINN (Integrable PINN) を実装する動機となっている。 The Vlasov-Poisson system is employed in its reduced form version (1D1V) as a test bed for the applicability of Physics Informed Neural Network (PINN) to the wave-particle resonance. Two examples are explored: the Landau damping and the bump-on-tail instability. PINN is first tested as a compression method for the solution of the Vlasov-Poisson system and compared to the standard neural networks. Second, the application of PINN to solving the Vlasov-Poisson system is also presented with the special emphasis on the integral part, which motivates the implementation of a PINN variant, called Integrable PINN (I-PINN), based on the automatic-differentiation to solve the partial differential equation and on the automatic-integration to solve the integral equation. | 翻訳日:2023-08-25 17:00:36 公開日:2023-08-23 |
# 影響支援カノニカルフォームを用いた高速NPN分類 Fast Exact NPN Classification with Influence-aided Canonical Form ( http://arxiv.org/abs/2308.12311v1 ) ライセンス: Link先を確認 | Yonghe Zhang, Liwei Ni, Jiaxi Zhang, Guojie Luo, Huawei Li, Shenggen Zheng | (参考訳) NPN分類はデジタル回路の合成と検証に多くの応用がある。
実験結果から、正準形の計算における変換の列挙を削減するうえで、影響度 (influence) が重要な役割を果たすことが示された。
ABC に実装された最先端のアルゴリズムと比較して、影響度を活用した正準形は厳密な NPN 分類において最大5.5倍の高速化を達成する。 NPN classification has many applications in the synthesis and verification of digital circuits. The canonical-form-based method is the most common approach, designing a canonical form as representative for the NPN equivalence class first and then computing the transformation function according to the canonical form. Most works use variable symmetries and several signatures, mainly based on the cofactor, to simplify the canonical form construction and computation. This paper describes a novel canonical form and its computation algorithm by introducing Boolean influence to NPN classification, which is a basic concept in analysis of Boolean functions. We show that influence is input-negation-independent, input-permutation-dependent, and has other structural information than previous signatures for NPN classification. Therefore, it is a significant ingredient in speeding up NPN classification. Experimental results prove that influence plays an important role in reducing the transformation enumeration in computing the canonical form. Compared with the state-of-the-art algorithm implemented in ABC, our influence-aided canonical form for exact NPN classification gains up to 5.5x speedup. | 翻訳日:2023-08-25 17:00:19 公開日:2023-08-23 |
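Boolean influence is easy to compute from a truth table. The brute-force sketch below (exponential in n, so only for small functions) follows the standard definition Inf_i(f) = Pr_x[f(x) ≠ f(x ⊕ e_i)] over uniform x; it is not the paper's canonical-form algorithm. It does illustrate the properties the abstract relies on: negating an input (or the output) leaves every influence unchanged, while permuting inputs permutes the influence vector.

```python
from itertools import product

# Illustrative brute-force sketch; `influences` is a hypothetical helper name.
def influences(f, n):
    """Influence of each input: Pr_x[f(x) != f(x with bit i flipped)], x uniform on {0,1}^n."""
    counts = [0] * n
    for x in product((0, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = list(x)
            y[i] ^= 1                      # flip input i
            if f(tuple(y)) != fx:
                counts[i] += 1
    return [c / 2 ** n for c in counts]

f = lambda x: x[0] & (x[1] | x[2])
print(influences(f, 3))                                  # [0.75, 0.25, 0.25]
print(influences(lambda x: f((x[0], 1 - x[1], x[2])), 3))  # input negation: unchanged
print(influences(lambda x: f((x[1], x[2], x[0])), 3))      # permutation: [0.25, 0.75, 0.25]
```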
# Trapdoor Claw-free関数に基づくより良い量子シール方式 Better Quantum Seal Schemes based on Trapdoor Claw-Free Functions ( http://arxiv.org/abs/2308.12310v1 ) ライセンス: Link先を確認 | Xiaogang Cheng, Ren Guo | (参考訳) 古典情報のシール(封印)は単純に不可能である。
量子情報、特に量子複製不可能定理に基づけば、量子シールを完全に構成できる可能性がある。
本稿では、LWE 仮定に基づいて構成可能な TCF (Trapdoor Claw-Free) 関数を用いることで、最適な上界を超える方法を示す。
したがって、提案方式は耐量子安全である。 Seal in classical information is simply impossible, since classical information can be easily copied any number of times. Based on quantum information, especially the quantum no-cloning theorem, a quantum seal may be constructed perfectly. But it has been shown that a perfect quantum seal is impossible, and the success probability is bounded. In this paper, we show how to exceed the optimal bound by using TCF (Trapdoor Claw-Free) functions, which can be constructed based on the LWE assumption. Hence it is post-quantum secure. | 翻訳日:2023-08-25 17:00:01 公開日:2023-08-23 |
# 制約付きシュタイン変分軌道最適化 Constrained Stein Variational Trajectory Optimization ( http://arxiv.org/abs/2308.12110v1 ) ライセンス: Link先を確認 | Thomas Power and Dmitry Berenson | (参考訳) 本稿では、軌道の集合に対して並列に制約付き軌道最適化を行うアルゴリズム Constrained Stein Variational Trajectory Optimization (CSVTO) を提案する。
提案手法では,制約に従いながら,低コスト軌道上の分布を近似する粒子の集合を見つけるために,Stein Variational Gradient Descent (SVGD) を用いる。
本研究は、多様な制約充足軌道を生成することで、ベースラインと比べて外乱や初期化に対するロバスト性が向上することを示す。 We present Constrained Stein Variational Trajectory Optimization (CSVTO), an algorithm for performing trajectory optimization with constraints on a set of trajectories in parallel. We frame constrained trajectory optimization as a novel form of constrained functional minimization over trajectory distributions, which avoids treating the constraints as a penalty in the objective and allows us to generate diverse sets of constraint-satisfying trajectories. Our method uses Stein Variational Gradient Descent (SVGD) to find a set of particles that approximates a distribution over low-cost trajectories while obeying constraints. CSVTO is applicable to problems with arbitrary equality and inequality constraints and includes a novel particle resampling step to escape local minima. By explicitly generating diverse sets of trajectories, CSVTO is better able to avoid poor local minima and is more robust to initialization. We demonstrate that CSVTO outperforms baselines in challenging highly-constrained tasks, such as a 7DoF wrench manipulation task, where CSVTO succeeds in 20/20 trials vs 13/20 for the closest baseline. Our results demonstrate that generating diverse constraint-satisfying trajectories improves robustness to disturbances and initialization over baselines. | 翻訳日:2023-08-25 16:58:32 公開日:2023-08-23 |
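For readers unfamiliar with SVGD, here is a minimal sketch of the plain, unconstrained update on 1-D particles with an RBF kernel and the common median-heuristic bandwidth. CSVTO's own contributions (trajectory-valued particles, equality and inequality constraints, particle resampling) are not shown; the Gaussian target and the step size are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of unconstrained SVGD; names, target, and step size are hypothetical.
def svgd_step(x, grad_logp, step):
    """One Stein Variational Gradient Descent update for 1-D particles x (RBF kernel)."""
    diff = x[:, None] - x[None, :]                 # diff[i, j] = x_i - x_j
    sq = diff ** 2
    h = np.median(sq) / np.log(len(x) + 1.0)       # median-heuristic bandwidth
    k = np.exp(-sq / h)                            # k[i, j] = k(x_i, x_j), symmetric
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad_logp(x_j) + d/dx_j k(x_j, x_i) ]
    drive = k @ grad_logp(x)                       # pulls particles toward high density
    repulse = np.sum(2.0 * diff / h * k, axis=1)   # pushes particles apart
    return x + step * (drive + repulse) / len(x)

# Transport particles from N(-3, 1) toward the target N(2, 1).
rng = np.random.default_rng(0)
particles = rng.normal(-3.0, 1.0, size=100)
for _ in range(2000):
    particles = svgd_step(particles, lambda z: -(z - 2.0), step=0.05)
```

The repulsive term is what keeps the particle set diverse instead of collapsing onto the mode, which is the property CSVTO exploits to produce diverse trajectories.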
# 構造不安定下における形態形成へのデータ駆動アプローチ A Data-Driven Approach to Morphogenesis under Structural Instability ( http://arxiv.org/abs/2308.11846v1 ) ライセンス: Link先を確認 | Yingjie Zhao and Zhiping Xu | (参考訳) 構造不安定下での進化パターンへの形態的発達は、生体系において至るところで見られ、しばしば工学構造にとって重要なものである。
大域的および局所的特徴から重要な分岐特性を特定し、履歴依存的な発展を予測する能力を、脳の成長と航空宇宙構造設計の例によって示す。これらは疾患の診断・予後予測や不安定性を許容する設計のための指針を与える。 Morphological development into evolutionary patterns under structural instability is ubiquitous in living systems and often of vital importance for engineering structures. Here we propose a data-driven approach to understand and predict their spatiotemporal complexities. A machine-learning framework is proposed based on the physical modeling of morphogenesis triggered by internal or external forcing. Digital libraries of structural patterns are constructed from the simulation data, which are then used to recognize the abnormalities, predict their development, and assist in risk assessment and prognosis. The capabilities to identify the key bifurcation characteristics and predict the history-dependent development from the global and local features are demonstrated by examples of brain growth and aerospace structural design, which offer guidelines for disease diagnosis/prognosis and instability-tolerant design. | 翻訳日:2023-08-25 16:58:08 公開日:2023-08-23 |
# 慣性核融合ターゲット設計のための阻止能の量子計算 Quantum computation of stopping power for inertial fusion target design ( http://arxiv.org/abs/2308.12352v1 ) ライセンス: Link先を確認 | Nicholas C. Rubin, Dominic W. Berry, Alina Kononov, Fionn D. Malone, Tanuj Khattar, Alec White, Joonho Lee, Hartmut Neven, Ryan Babbush, Andrew D. Baczewski | (参考訳) 阻止能 (stopping power) とは、物質中を通過する荷電粒子の運動エネルギーを物質が吸収する率である。
我々のアプローチは Su et al. [PRX Quantum 2, 040332 2021] の電子構造ブロック符号化を基礎とし、有限温度における複数の粒子種の非ボルン-オッペンハイマー・ダイナミクスから関心のある観測量を推定するために、それらのアルゴリズムを適応・最適化する。
我々は、科学的に興味深く古典的には扱いが困難な阻止能計算を、FeMoCo や P450 といった産業的に重要な分子の最先端量子シミュレーションとほぼ同数の論理量子ビットと、約100倍の Toffoli ゲートで量子シミュレーションできると見積もった。 Stopping power is the rate at which a material absorbs the kinetic energy of a charged particle passing through it -- one of many properties needed over a wide range of thermodynamic conditions in modeling inertial fusion implosions. First-principles stopping calculations are classically challenging because they involve the dynamics of large electronic systems far from equilibrium, with accuracies that are particularly difficult to constrain and assess in the warm-dense conditions preceding ignition. Here, we describe a protocol for using a fault-tolerant quantum computer to calculate stopping power from a first-quantized representation of the electrons and projectile. Our approach builds upon the electronic structure block encodings of Su et al. [PRX Quantum 2, 040332 2021], adapting and optimizing those algorithms to estimate observables of interest from the non-Born-Oppenheimer dynamics of multiple particle species at finite temperature. Ultimately, we report logical qubit requirements and leading-order Toffoli costs for computing the stopping power of various projectile/target combinations relevant to interpreting and designing inertial fusion experiments. We estimate that scientifically interesting and classically intractable stopping power calculations can be quantum simulated with roughly the same number of logical qubits and about one hundred times more Toffoli gates than is required for state-of-the-art quantum simulations of industrially relevant molecules such as FeMoCo or P450. | 翻訳日:2023-08-25 16:49:26 公開日:2023-08-23 |
# Schrödinger Bridge による生成モデルベースのアンフォールディングの改善 Improving Generative Model-based Unfolding with Schrödinger Bridges ( http://arxiv.org/abs/2308.12351v1 ) ライセンス: Link先を確認 | Sascha Diefenbacher, Guan-Horng Liu, Vinicius Mikuni, Benjamin Nachman, and Weili Nie | (参考訳) 機械学習に基づくアンフォールディングにより、ビン分けを行わない高次元の微分断面積測定が可能になった。
本研究では、Schrödinger Bridge と拡散モデルを用いて、識別モデルと生成モデルの両方の長所を組み合わせたアンフォールディング手法 SBUnfold を提案する。
SBUnfold は、合成 Z+jets データセットにおいて最先端の手法と比較して優れた性能を達成することを示す。 Machine learning-based unfolding has enabled unbinned and high-dimensional differential cross section measurements. Two main approaches have emerged in this research area: one based on discriminative models and one based on generative models. The main advantage of discriminative models is that they learn a small correction to a starting simulation while generative models scale better to regions of phase space with little data. We propose to use Schroedinger Bridges and diffusion models to create SBUnfold, an unfolding approach that combines the strengths of both discriminative and generative models. The key feature of SBUnfold is that its generative model maps one set of events into another without having to go through a known probability density as is the case for normalizing flows and standard diffusion models. We show that SBUnfold achieves excellent performance compared to state of the art methods on a synthetic Z+jets dataset. | 翻訳日:2023-08-25 16:48:54 公開日:2023-08-23 |
# ドメイン適応セマンティックセグメンテーションのためのラベル誘導を用いた拡散ベース画像変換 Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation ( http://arxiv.org/abs/2308.12350v1 ) ライセンス: Link先を確認 | Duo Peng, Ping Hu, Qiuhong Ke, Jun Liu | (参考訳) ターゲットモデルを学習するためのソースドメインからターゲットドメインへの変換は、ドメイン適応セマンティックセグメンテーション(DASS)において最も一般的な戦略の1つである。
広範な実験により,最先端手法に対するアプローチの優位性が実証された。 Translating images from a source domain to a target domain for learning target models is one of the most common strategies in domain adaptive semantic segmentation (DASS). However, existing methods still struggle to preserve semantically-consistent local details between the original and translated images. In this work, we present an innovative approach that addresses this challenge by using source-domain labels as explicit guidance during image translation. Concretely, we formulate cross-domain image translation as a denoising diffusion process and utilize a novel Semantic Gradient Guidance (SGG) method to constrain the translation process, conditioning it on the pixel-wise source labels. Additionally, a Progressive Translation Learning (PTL) strategy is devised to enable the SGG method to work reliably across domains with large gaps. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods. | 翻訳日:2023-08-25 16:48:29 公開日:2023-08-23 |
# 動的な多体フレアの点における分裂した共鳴の合流 The confluence of fractured resonances at points of dynamical, many-body flare ( http://arxiv.org/abs/2308.12346v1 ) ライセンス: Link先を確認 | Bitan De, Gabriela Wójtowicz, Marek M. Rams, Michael Zwolak, and Jakub Zakrzewski | (参考訳) 共鳴輸送は空間的な媒体全体で周波数が一致するときに生じ、あるリザーバから別のリザーバへ粒子を輸送する効率を高める。
Resonant transport occurs when there is a matching of frequencies across some spatial medium, increasing the efficiency of shuttling particles from one reservoir to another. We demonstrate that in a periodically driven, many-body tilted lattice there are sets of spatially fractured resonances. These "emanate" from two essential resonances due to scattering off internal surfaces created when the driving frequency and many-body interaction strength vary, a scattering reminiscent of lens flare. The confluence of these fractured resonances dramatically enhances transport. At one confluence, the interaction strength is finite and the essential resonance arises due to the interplay of interaction with the counter-rotating terms of the periodic drive. The other forms where several paths split by the many-body interaction merge in the non-interacting limit. We discuss the origin and structure of the fractured resonances, as well as the scaling of the conductance on system parameters. These results furnish a new example of the richness of open, driven, many-body systems. | 翻訳日:2023-08-25 16:48:01 公開日:2023-08-23 |
# 量子コヒーレンスの錬金術:漸近的および触媒的コヒーレンス操作における任意増幅 Alchemy of quantum coherence: Arbitrary amplification in asymptotic and catalytic coherence manipulation ( http://arxiv.org/abs/2308.12338v1 ) ライセンス: Link先を確認 | Naoto Shiraishi and Ryuji Takagi | (参考訳) 量子コヒーレンス(quantum coherence)は、古典理論と量子理論を区別する基本的な側面の1つである。
この現象を, 漸近的および触媒的変換という2つの操作条件下で示す。
触媒(変換後も局所的に元の形のまま残る補助状態)を用いた非漸近的変換において、任意の低コヒーレント状態から任意の状態が得られることを示す。
一方で、上記の増幅には小さいながらもゼロでないコヒーレンスが必要であることを示し、コヒーレンス変換の異常な能力が発揮される条件を特徴づける。 Quantum coherence is one of the fundamental aspects distinguishing classical and quantum theories. Coherence between different energy eigenstates is particularly important, as it serves as a valuable resource under the law of energy conservation. A fundamental question in this setting is how well one can prepare good coherent states from low coherent states and whether a given coherent state is convertible to another one. Here, contrary to intuition and previous expectations, we show that any low coherent state is convertible to any high coherent state arbitrarily well, implying that one can increase the amount of quantum coherence inexhaustibly. We demonstrate this remarkable phenomenon in two operational settings: asymptotic and catalytic transformations. For a variant of asymptotic coherence manipulation, the rate of transformation becomes unbounded regardless of how weak the initial coherence is. This particularly shows that the infinite rate of coherence distillation can be accomplished for all coherent states. In a non-asymptotic transformation with a catalyst -- a helper state that locally remains in the original form after the transformation, we show that an arbitrary state can be obtained from any low coherent states. Our protocol avoids the barrier of quantum coherence in state conversion and allows us to amplify quantum coherence infinitely. On its opposite side, we show that the aforementioned amplification requires small but non-zero coherence, characterizing the condition under which the anomalous power of coherence transformation is enabled. | 翻訳日:2023-08-25 16:47:21 公開日:2023-08-23 |
# 決定図を用いた混合次元量子回路シミュレーション Mixed-Dimensional Quantum Circuit Simulation with Decision Diagrams ( http://arxiv.org/abs/2308.12332v1 ) ライセンス: Link先を確認 | Kevin Mato and Stefan Hillmich and Robert Wille | (参考訳) 量子コンピュータは、いくつかのカテゴリの問題を古典コンピュータよりも高速に解くことが期待されている。
シミュレータのソースコードは MIT ライセンスの下で github.com/cda-tum/MiSiM から入手できる。 Quantum computers promise to solve several categories of problems faster than classical computers ever could. Current research mostly focuses on qubits, i.e., systems where the unit of information can assume only two levels. However, the underlying physics of most (if not all) of the technological platforms supports more than two levels, commonly referred to as qudits. Performing computations with qudits increases the overall complexity while, at the same time, reducing the number of operations and providing a lower error rate. Furthermore, qudits with different number of levels can be mixed in one system to ease the experimental control and keep representations as compact as possible. Exploiting these capabilities requires dedicated software support to tackle the increased complexity in an automated and efficient fashion. In this paper, we present a qudit simulator that handles mixed-dimensional systems based on Decision Diagrams (DDs). More precisely, we discuss the type of decision diagram introduced as underlying data structure as well as the resulting implementation. Experimental evaluations demonstrate that the proposed solution is capable of efficiently simulating mixed-dimensional quantum circuits, with specific use cases including more than 100 qudits in one circuit. The source code of the simulator is available via github.com/cda-tum/MiSiM under the MIT license. | 翻訳日:2023-08-25 16:46:40 公開日:2023-08-23 |
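To make the notion of a mixed-dimensional register concrete, the sketch below stores a qutrit-qubit (or larger) state as a dense vector and applies a single-qudit gate by reshaping. It is deliberately not a decision-diagram simulator: the dense vector has prod(dims) entries, which is exactly the growth that DD-based tools such as the one described aim to avoid. The function names are hypothetical.

```python
import numpy as np

# Illustrative dense sketch (no decision diagrams); function names are hypothetical.
def basis_state(dims, digits):
    """|digits> in a register whose qudits have local dimensions `dims`."""
    psi = np.array([1.0 + 0j])
    for d, k in zip(dims, digits):
        e = np.zeros(d, dtype=complex)
        e[k] = 1.0
        psi = np.kron(psi, e)          # first qudit is the most significant index
    return psi

def apply_local(psi, dims, target, gate):
    """Apply a d x d gate to qudit `target` by viewing psi as a tensor of shape dims."""
    t = psi.reshape(dims)
    t = np.tensordot(gate, t, axes=([1], [target]))   # gate output axis lands at axis 0
    t = np.moveaxis(t, 0, target)                       # restore the qudit order
    return t.reshape(-1)

# Cyclic shift on a qutrit: X3 |k> = |k + 1 mod 3>.
X3 = np.roll(np.eye(3), 1, axis=0)

# A qutrit-qubit register (dimension 3 * 2 = 6): |0, 1> -> |1, 1>.
dims = (3, 2)
shifted = apply_local(basis_state(dims, (0, 1)), dims, 0, X3)
```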
# Predicting Drug Solubility Using Different Machine Learning Methods -- Linear Regression Model with Extracted Chemical Features vs Graph Convolutional Neural Network ( http://arxiv.org/abs/2308.12325v1 ) License: see link | John Ho, Zhao-Heng Yin, Colin Zhang, Henry Overhauser, Kyle Swanson, Yang Ha | (reference translation) Predicting the solubility of given molecules is an important task in the pharmaceutical industry, and consequently it is a well-studied topic.
Future work should aim to combine the high performance of GCNNs with the interpretability of linear regression, unlocking new advances in next-generation high-throughput screening. Predicting the solubility of given molecules is an important task in the pharmaceutical industry, and consequently this is a well-studied topic. In this research, we revisited this problem with the advantage of modern computing resources. We applied two machine learning models, a linear regression model and a graph convolutional neural network (GCNN) model, to multiple experimental datasets. Both methods can make reasonable predictions, while the GCNN model had the best performance. However, the current GCNN model is a black box, while feature importance analysis from the linear regression model offers more insights into the underlying chemical influences. Using the linear regression model, we show how each functional group affects the overall solubility. Ultimately, knowing how chemical structure influences chemical properties is crucial when designing new drugs. Future work should aim to combine the high performance of GCNNs with the interpretability of linear regression, unlocking new advances in next-generation high-throughput screening. | Translated: 2023-08-25 16:46:21 Published: 2023-08-23 |
# Understanding Dark Scenes by Contrasting Multi-Modal Observations ( http://arxiv.org/abs/2308.12320v1 ) License: see link | Xiaoyu Dong and Naoto Yokoya | (reference translation) Understanding dark scenes based on multi-modal image data is challenging, as both the visible and auxiliary modalities provide limited semantic information for the task.
Code and pretrained models are available at https://github.com/palmdong/SMMCL. Understanding dark scenes based on multi-modal image data is challenging, as both the visible and auxiliary modalities provide limited semantic information for the task. Previous methods focus on fusing the two modalities but neglect the correlations among semantic classes when minimizing losses to align pixels with labels, resulting in inaccurate class predictions. To address these issues, we introduce a supervised multi-modal contrastive learning approach to increase the semantic discriminability of the learned multi-modal feature spaces by jointly performing cross-modal and intra-modal contrast under the supervision of the class correlations. The cross-modal contrast encourages same-class embeddings from across the two modalities to be closer and pushes different-class ones apart. The intra-modal contrast forces same-class or different-class embeddings within each modality to be together or apart. We validate our approach on a variety of tasks that cover diverse light conditions and image modalities. Experiments show that our approach can effectively enhance dark scene understanding based on multi-modal images with limited semantics by shaping semantic-discriminative feature spaces. Comparisons with previous methods demonstrate our state-of-the-art performance. Code and pretrained models are available at https://github.com/palmdong/SMMCL. | Translated: 2023-08-25 16:46:05 Published: 2023-08-23 |
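The "pull same-class embeddings together, push different-class ones apart" idea can be pictured with a supervised contrastive loss in the style of Khosla et al. (2020). For each anchor, same-class embeddings act as positives and every other embedding in the batch acts as a negative. This is a generic single-modality sketch, not the authors' SMMCL objective. The function names, temperature value and toy embeddings are assumptions. A cross-modal variant would take anchors from one modality and candidates from the other.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have at least one positive."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Scaled cosine similarities between anchor i and every other sample.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(logits[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors


labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(supcon_loss(clustered, labels) < supcon_loss(mixed, labels))  # True
```

The loss is low when same-class embeddings already point in similar directions. Minimizing it therefore shapes the semantically discriminative feature space described above.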
# RemovalNet: DNN Fingerprint Removal Attacks ( http://arxiv.org/abs/2308.12319v1 ) License: see link | Hongwei Yao, Zheng Li, Kunzhe Huang, Jian Lou, Zhan Qin, Kui Ren | (reference translation) With the performance of deep neural networks (DNNs) remarkably improving, DNNs have been widely used in many areas.
Our code is available at https://github.com/grasses/RemovalNet. With the performance of deep neural networks (DNNs) remarkably improving, DNNs have been widely used in many areas. Consequently, the DNN model has become a valuable asset, and its intellectual property is safeguarded by ownership verification techniques (e.g., DNN fingerprinting). However, the feasibility of the DNN fingerprint removal attack and its potential influence remain an open problem. In this paper, we perform the first comprehensive investigation of DNN fingerprint removal attacks. Generally, the knowledge contained in a DNN model can be categorized into general semantic and fingerprint-specific knowledge. To this end, we propose a min-max bilevel optimization-based DNN fingerprint removal attack named RemovalNet, to evade model ownership verification. The lower-level optimization is designed to remove fingerprint-specific knowledge. In the upper-level optimization, we distill the victim model's general semantic knowledge to maintain the surrogate model's performance. We conduct extensive experiments to evaluate the fidelity, effectiveness, and efficiency of RemovalNet against four advanced defense methods on six metrics. The empirical results demonstrate that (1) RemovalNet is effective: after our DNN fingerprint removal attack, the model distance between the target and surrogate models is 100 times higher than that of the baseline attacks; (2) RemovalNet is efficient: it uses only 0.2% (400 samples) of the substitute dataset and 1,000 iterations to conduct our attack, and compared with advanced model stealing attacks, it saves up to nearly 85% of computational resources; (3) RemovalNet achieves high fidelity: the created surrogate model maintains high accuracy after the DNN fingerprint removal process. Our code is available at https://github.com/grasses/RemovalNet. | Translated: 2023-08-25 16:45:40 Published: 2023-08-23 |
# Graph Neural Stochastic Differential Equations ( http://arxiv.org/abs/2308.12316v1 ) License: see link | Richard Bergna, Felix Opolka, Pietro Liò, Jose Miguel Hernandez-Lobato | (reference translation) We present a novel model, Graph Neural Stochastic Differential Equations (Graph Neural SDEs).
This technique enhances Graph Neural Ordinary Differential Equations (Graph Neural ODEs) by embedding randomness into the data representation using Brownian motion.
In our framework, we spotlight the Latent Graph Neural SDE variant and demonstrate its effectiveness.
Through empirical studies, we find that Latent Graph Neural SDEs surpass conventional models such as Graph Convolutional Networks and Graph Neural ODEs, especially in confidence prediction. We present a novel model, Graph Neural Stochastic Differential Equations (Graph Neural SDEs). This technique enhances Graph Neural Ordinary Differential Equations (Graph Neural ODEs) by embedding randomness into data representation using Brownian motion. This inclusion allows for the assessment of prediction uncertainty, a crucial aspect frequently missed in current models. In our framework, we spotlight the Latent Graph Neural SDE variant, demonstrating its effectiveness. Through empirical studies, we find that Latent Graph Neural SDEs surpass conventional models like Graph Convolutional Networks and Graph Neural ODEs, especially in confidence prediction, making them superior in handling out-of-distribution detection across both static and spatio-temporal contexts. | Translated: 2023-08-25 16:45:12 Published: 2023-08-23 |
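A minimal way to see where the Brownian motion enters is to integrate a graph SDE dX = f(X) dt + sigma dW with the Euler-Maruyama scheme, then look at the spread across sampled trajectories. In a Graph Neural SDE the drift f would be a learned graph neural network. Here it is replaced by plain graph diffusion over a fixed adjacency matrix. The graph, the noise level `sigma`, the step size and the function names are all hypothetical choices for illustration.

```python
import math
import random


def graph_drift(x, adj):
    """Hypothetical drift: graph diffusion f_i(x) = sum_j A_ij * (x_j - x_i)."""
    n = len(x)
    return [sum(adj[i][j] * (x[j] - x[i]) for j in range(n)) for i in range(n)]


def euler_maruyama(x0, adj, sigma, dt, steps, rng):
    """Integrate dX = f(X) dt + sigma dW on the node features with Euler-Maruyama."""
    x = list(x0)
    for _ in range(steps):
        f = graph_drift(x, adj)
        # Independent Brownian increments dW ~ N(0, dt) for every node.
        x = [xi + fi * dt + sigma * rng.gauss(0.0, math.sqrt(dt)) for xi, fi in zip(x, f)]
    return x


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
x0 = [1.0, 0.0, -1.0]                    # one scalar feature per node
rng = random.Random(0)
runs = [euler_maruyama(x0, adj, sigma=0.1, dt=0.01, steps=100, rng=rng) for _ in range(200)]
mean0 = sum(r[0] for r in runs) / len(runs)
std0 = math.sqrt(sum((r[0] - mean0) ** 2 for r in runs) / (len(runs) - 1))
print(f"node 0 at t=1: mean {mean0:.3f}, std {std0:.3f}")
```

The standard deviation across runs is the kind of predictive-uncertainty signal that a deterministic Graph Neural ODE (the sigma = 0 case) cannot provide.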
# FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features ( http://arxiv.org/abs/2308.12380v1 ) License: see link | Yufeng Yin, Di Chang, Guoxian Song, Shen Sang, Tiancheng Zhi, Jing Liu, Linjie Luo, Mohammad Soleymani | (reference translation) Automatic detection of facial Action Units (AUs) allows for objective facial expression analysis.
Our code will be released at https://github.com/ihp-lab/FG-Net. Automatic detection of facial Action Units (AUs) allows for objective facial expression analysis. Due to the high cost of AU labeling and the limited size of existing benchmarks, previous AU detection methods tend to overfit the dataset, resulting in a significant performance loss when evaluated across corpora. To address this problem, we propose FG-Net for generalizable facial action unit detection. Specifically, FG-Net extracts feature maps from a StyleGAN2 model pre-trained on a large and diverse face image dataset. Then, these features are used to detect AUs with a Pyramid CNN Interpreter, making the training efficient and capturing essential local features. The proposed FG-Net achieves a strong generalization ability for heatmap-based AU detection thanks to the generalizable and semantic-rich features extracted from the pre-trained generative model. Extensive experiments are conducted to evaluate within- and cross-corpus AU detection with the widely-used DISFA and BP4D datasets. Compared with the state-of-the-art, the proposed method achieves superior cross-domain performance while maintaining competitive within-domain performance. In addition, FG-Net is data-efficient and achieves competitive performance even when trained on 1000 samples. Our code will be released at https://github.com/ihp-lab/FG-Net. | Translated: 2023-08-25 16:39:38 Published: 2023-08-23 |
# Vision Transformer Adapters for Generalizable Multitask Learning ( http://arxiv.org/abs/2308.12372v1 ) License: see link | Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann | (reference translation) We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains.
Integrated into an off-the-shelf vision transformer backbone, our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner, unlike existing multitasking transformers that are parametrically expensive.
Our project page is at https://ivrl.github.io/VTAGML. We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains. Integrated into an off-the-shelf vision transformer backbone, our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner, unlike existing multitasking transformers that are parametrically expensive. In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added. We introduce a task-adapted attention mechanism within our adapter framework that combines gradient-based task similarities with attention-based ones. The learned task affinities generalize to the following settings: zero-shot task transfer, unsupervised domain adaptation, and generalization without fine-tuning to novel domains. We demonstrate that our approach outperforms not only the existing convolutional neural network-based multitasking methods but also the vision transformer-based ones. Our project page is at https://ivrl.github.io/VTAGML. | Translated: 2023-08-25 16:39:17 Published: 2023-08-23 |
# Open-set Face Recognition with Neural Ensemble, Maximal Entropy Loss and Feature Augmentation ( http://arxiv.org/abs/2308.12371v1 ) License: see link | Rafael Henrique Vareto and Manuel Günther and William Robson Schwartz | (reference translation) Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects.
We carry out experiments on the well-known LFW and IJB-C datasets, where results show that the approach is able to boost closed- and open-set identification rates. Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects. Therefore, they are expected to prevent face samples of unregistered subjects from being identified as previously enrolled identities. This watchlist context adds an arduous requirement that calls for the dismissal of irrelevant faces by focusing mainly on subjects of interest. As a response, this work introduces a novel method that associates an ensemble of compact neural networks with a margin-based cost function that explores additional samples. Supplementary negative samples can be obtained from external databases or synthetically built at the representation level in training time with a new mix-up feature augmentation approach. Deep neural networks pre-trained on large face datasets serve as the preliminary feature extraction module. We carry out experiments on the well-known LFW and IJB-C datasets, where results show that the approach is able to boost closed- and open-set identification rates. | Translated: 2023-08-25 16:39:00 Published: 2023-08-23 |
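The watchlist requirement is to accept enrolled subjects and reject everyone else. At test time this is usually enforced by thresholding the best match score against the gallery. The sketch shows only that generic thresholding baseline with cosine similarity; it is not the authors' ensemble or maximal-entropy loss. The gallery, the embeddings and the threshold are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def identify(probe, gallery, threshold):
    """Return the best-matching enrolled identity, or None to reject the probe as unknown."""
    best_id, best_score = None, -math.inf
    for identity, template in gallery.items():
        score = cosine(probe, template)
        if score > best_score:
            best_id, best_score = identity, score
    return best_id if best_score >= threshold else None


gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}  # hypothetical watchlist
print(identify([0.9, 0.1, 0.0], gallery, threshold=0.8))  # alice
print(identify([0.0, 0.2, 1.0], gallery, threshold=0.8))  # None (unknown subject)
```

Raising the threshold rejects more unknown faces but also more enrolled ones. Training methods such as the one above aim to separate the two score populations so that a single threshold works well.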
# AdVerb: Visually Guided Audio Dereverberation ( http://arxiv.org/abs/2308.12370v1 ) License: see link | Sanjoy Chowdhury, Sreyan Ghosh, Subhrajyoti Dasgupta, Anton Ratnarajah, Utkarsh Tyagi and Dinesh Manocha | (reference translation) We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio.
The effectiveness of our method is demonstrated through extensive quantitative and qualitative evaluations.
We also achieve highly satisfactory RT60 error scores on the AVSpeech dataset. We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio. Although audio-only dereverberation is a well-studied problem, our approach incorporates the complementary visual modality to perform audio dereverberation. Given an image of the environment where the reverberated sound signal has been recorded, AdVerb employs a novel geometry-aware cross-modal transformer architecture that captures scene geometry and the audio-visual cross-modal relationship to generate a complex ideal ratio mask, which, when applied to the reverberant audio, predicts the clean sound. The effectiveness of our method is demonstrated through extensive quantitative and qualitative evaluations. Our approach significantly outperforms traditional audio-only and audio-visual baselines on three downstream tasks: speech enhancement, speech recognition, and speaker verification, with relative improvements in the range of 18% to 82% on the LibriSpeech test-clean set. We also achieve highly satisfactory RT60 error scores on the AVSpeech dataset. | Translated: 2023-08-25 16:38:43 Published: 2023-08-23 |
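# Coherent states for generalized uncertainty relations as Tsallis probability amplitudes: new route to non-extensive thermostatistics ( http://arxiv.org/abs/2308.12368v1 ) License: see link | Petr Jizba, Gaetano Lambiase, Giuseppe Gaetano Luciano and Luciano Petruzziello | (reference translation) We study coherent states associated with a generalized uncertainty principle (GUP).
We separately analyze the cases of positive and negative deformation parameter $\beta$, showing that the ensuing probability distribution is a Tsallis distribution whose non-extensivity parameter $q$ is monotonically related to $\beta$.
Moreover, for $\beta < 0$ (corresponding to $q < 1$), we reformulate the GUP in terms of a one-parameter class of Tsallis entropy-power-based uncertainty relations, which are again saturated by the GUP coherent states.
The obtained $\beta$ is consistent with values predicted by both string-theory models and the naturalness principle.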
This article provides a more extended and comprehensive treatment of our recent letter [Phys. Rev. D 105, L121501 (2022)]. We study coherent states associated with a generalized uncertainty principle (GUP). We separately analyze the cases of positive and negative deformation parameter $\beta$, showing that the ensuing probability distribution is a Tsallis distribution whose non-extensivity parameter $q$ is monotonically related to $\beta$. Moreover, for $\beta < 0$ (corresponding to $q < 1$), we reformulate the GUP in terms of a one-parameter class of Tsallis entropy-power-based uncertainty relations, which are again saturated by the GUP coherent states. We argue that this combination of coherent states with Tsallis entropy offers a natural conceptual framework allowing one to study the quasi-classical regime of GUP in terms of non-extensive thermodynamics. We substantiate our claim by discussing a generalization of Verlinde's entropic force and ensuing implications in the late-inflation epoch. The corresponding dependence of the $\beta$ parameter on cosmological time is derived for the reheating epoch. The obtained $\beta$ is consistent with values predicted by both string-theory models and the naturalness principle. Further salient issues, including the derivation of new $\beta$-dependent expressions for the lowest possible value of the spin and the Immirzi parameter in Loop Quantum Gravity, and the connection of our proposal with the Magueijo-Smolin doubly special relativity, are also discussed. This article provides a more extended and comprehensive treatment of our recent letter [Phys. Rev. D 105, L121501 (2022)]. | Translated: 2023-08-25 16:38:20 Published: 2023-08-23 |
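The two Tsallis objects mentioned above can be written down directly. The first is the q-exponential, whose "q-Gaussian" form e_q(-x^2) gives the shape of a Tsallis distribution. The second is the Tsallis entropy S_q. These are the standard textbook definitions. The abstract does not give the explicit map between the GUP parameter beta and q, so no such relation is assumed here. The function names are illustrative.

```python
import math


def q_exp(x, q):
    """Tsallis q-exponential e_q(x) = [1 + (1 - q) x]^(1 / (1 - q)); e_1(x) = exp(x)."""
    if q == 1:
        return math.exp(x)
    base = 1 + (1 - q) * x
    if base <= 0:
        if q < 1:
            return 0.0  # cut-off: for q < 1 the distribution has compact support
        raise ValueError("e_q(x) diverges for q > 1 and x >= 1 / (q - 1)")
    return base ** (1 / (1 - q))


def tsallis_entropy(probs, q):
    """S_q = (1 - sum_i p_i^q) / (q - 1) in units k_B = 1; q = 1 gives Shannon entropy."""
    if q == 1:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1 - sum(p ** q for p in probs if p > 0)) / (q - 1)


# Unnormalized q-Gaussian e_q(-x^2): zero outside |x| <= 1/sqrt(1 - q) when q < 1.
for x in (0.0, 1.0, 1.5):
    print(x, q_exp(-x * x, 0.5))
```

As q approaches 1, both functions reduce to the ordinary exponential and the Shannon entropy. The q < 1 branch (the $\beta < 0$ case above) produces a distribution with compact support.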
# SafeAR: Towards Safer Algorithmic Recourse by Risk-Aware Policies ( http://arxiv.org/abs/2308.12367v1 ) License: see link | Haochen Wu, Shubham Sharma, Sunandita Patra, Sriram Gopalakrishnan | (reference translation) With the growing use of machine learning (ML) models in critical domains such as finance and healthcare, the need to offer recourse to those adversely affected by the decisions of ML models has become more important.
We apply our method to two real-world datasets and compare policies with different levels of risk aversion using risk measures and recourse desiderata (sparsity and proximity). With the growing use of machine learning (ML) models in critical domains such as finance and healthcare, the need to offer recourse to those adversely affected by the decisions of ML models has become more important; individuals ought to be provided with recommendations on actions to take for improving their situation and thus receive a favorable decision. Prior work on sequential algorithmic recourse (which recommends a series of changes) focuses on action feasibility and uses the proximity of feature changes to determine action costs. However, the uncertainties of feature changes and the risk of higher than average costs in recourse have not been considered. It is undesirable if a recourse could (with some probability) result in a worse situation from which recovery requires an extremely high cost. It is essential to incorporate risks when computing and evaluating recourse. We call the recourse computed with such risk considerations Safer Algorithmic Recourse (SafeAR). The objective is to empower people to choose a recourse based on their risk tolerance. In this work, we discuss and show how existing recourse desiderata can fail to capture the risk of higher costs. We present a method to compute recourse policies that consider variability in cost and connect the algorithmic recourse literature with risk-sensitive reinforcement learning. We also adopt the measures "Value at Risk" and "Conditional Value at Risk" from the financial literature to summarize risk concisely. We apply our method to two real-world datasets and compare policies with different levels of risk aversion using risk measures and recourse desiderata (sparsity and proximity). | Translated: 2023-08-25 16:37:46 Published: 2023-08-23 |
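The paper summarizes risk with Value at Risk (VaR) and Conditional Value at Risk (CVaR). A small empirical version of both, applied to sampled recourse costs, shows how a mean-cost criterion can hide a bad tail: the two hypothetical policies below have the same mean cost but very different CVaR. The estimator is an assumption for illustration, not the authors' setup: the Rockafellar-Uryasev form of CVaR on equally weighted samples. The cost samples and alpha = 0.9 are likewise illustrative.

```python
def value_at_risk(costs, alpha):
    """Empirical VaR_alpha: the smallest cost c with P(C <= c) >= alpha."""
    ordered = sorted(costs)
    n = len(ordered)
    for k, c in enumerate(ordered, start=1):
        if k / n >= alpha:
            return c
    return ordered[-1]


def conditional_value_at_risk(costs, alpha):
    """Empirical CVaR_alpha in Rockafellar-Uryasev form: VaR + E[(C - VaR)_+] / (1 - alpha)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    var = value_at_risk(costs, alpha)
    expected_excess = sum(max(c - var, 0.0) for c in costs) / len(costs)
    return var + expected_excess / (1 - alpha)


# Two hypothetical recourse policies with the same mean cost (5.0).
steady = [5.0] * 10
risky = [2.0] * 9 + [32.0]
for name, costs in (("steady", steady), ("risky", risky)):
    mean = sum(costs) / len(costs)
    print(name, mean, value_at_risk(costs, 0.9), conditional_value_at_risk(costs, 0.9))
```

The risky policy looks cheaper at the 90% VaR level, but its CVaR exposes the rare, very expensive outcome. That tail is what a risk-averse user would want to weigh.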
# Continual Zero-Shot Learning through Semantically Guided Generative Random Walks ( http://arxiv.org/abs/2308.12366v1 ) License: see link | Wenxuan Zhang, Paul Janson, Kai Yi, Ivan Skorokhodov, Mohamed Elhoseiny | (reference translation) Learning novel concepts, remembering previous knowledge, and adapting it to future tasks occur simultaneously throughout a human's lifetime.
Our algorithm achieves state-of-the-art performance on the AWA1, AWA2, CUB, and SUN datasets, surpassing existing CZSL methods by 3-7%.
The code has been made available at https://github.com/wx-zhang/IGCZSL. Learning novel concepts, remembering previous knowledge, and adapting it to future tasks occur simultaneously throughout a human's lifetime. To model such comprehensive abilities, continual zero-shot learning (CZSL) has recently been introduced. However, most existing methods overuse unseen semantic information that may not be continually accessible in realistic settings. In this paper, we address the challenge of continual zero-shot learning where unseen information is not provided during training, by leveraging generative modeling. The heart of the generative-based methods is to learn quality representations from seen classes to improve the generative understanding of the unseen visual space. Motivated by this, we introduce generalization-bound tools and provide the first theoretical explanation for the benefits of generative modeling to CZSL tasks. Guided by the theoretical analysis, we then propose our learning algorithm that employs a novel semantically guided Generative Random Walk (GRW) loss. The GRW loss augments the training by continually encouraging the model to generate realistic and characterized samples to represent the unseen space. Our algorithm achieves state-of-the-art performance on the AWA1, AWA2, CUB, and SUN datasets, surpassing existing CZSL methods by 3-7%. The code has been made available at https://github.com/wx-zhang/IGCZSL. | Translated: 2023-08-25 16:37:19 Published: 2023-08-23 |
# Saliency-based Video Summarization for Face Anti-spoofing ( http://arxiv.org/abs/2308.12364v1 ) License: see link | Usman Muhammad, Mourad Oussalah, Md Ziaul Hoque and Jorma Laaksonen | (reference translation) Due to the growing availability of face anti-spoofing databases, researchers are increasingly focusing on video-based methods that use hundreds to thousands of images to assess their impact on performance.
To validate the method's effectiveness, a simple deep learning architecture (CNN-RNN) was used, and the experimental results showed state-of-the-art performance on five challenging face anti-spoofing datasets. Due to the growing availability of face anti-spoofing databases, researchers are increasingly focusing on video-based methods that use hundreds to thousands of images to assess their impact on performance. However, there is no clear consensus on the exact number of frames in a video required to improve the performance of face anti-spoofing tasks. Inspired by the visual saliency theory, we present a video summarization method for face anti-spoofing tasks that aims to enhance the performance and efficiency of deep learning models by leveraging visual saliency. In particular, saliency information is extracted from the differences between the Laplacian and Wiener filter outputs of the source images, enabling identification of the most visually salient regions within each frame. Subsequently, the source images are decomposed into base and detail layers, enhancing the representation of important information. The weighting maps are then computed based on the saliency information, indicating the importance of each pixel in the image. By linearly combining the base and detail layers using the weighting maps, the method fuses the source images to create a single representative image that summarizes the entire video. The key contribution of our proposed method lies in demonstrating how visual saliency can be used as a data-centric approach to improve the performance and efficiency of face presentation attack detection models. By focusing on the most salient images or regions within the images, a more representative and diverse training set can be created, potentially leading to more effective models. To validate the method's effectiveness, a simple deep learning architecture (CNN-RNN) was used, and the experimental results showcased state-of-the-art performance on five challenging face anti-spoofing datasets. | Translated: 2023-08-25 16:36:57 Published: 2023-08-23 |
# variPEPS -- a versatile tensor network library for variational ground state simulations in two spatial dimensions ( http://arxiv.org/abs/2308.12358v1 ) License: see link | Jan Naumann, Erik Lennart Weerda, Matteo Rizzi, Jens Eisert and Philipp Schmoll | (reference translation) Tensor networks capture large classes of ground states of phases of quantum matter faithfully and efficiently.
We present and explain the functioning of an efficient, comprehensive and general tensor network library for the simulation of infinite two-dimensional systems using iPEPS, with support for flexible unit cells and different lattice geometries. Tensor networks capture large classes of ground states of phases of quantum matter faithfully and efficiently. Their manipulation and contraction have remained a challenge over the years, however. For most of the history, ground state simulations of two-dimensional quantum lattice systems using (infinite) projected entangled pair states (iPEPS) have relied on what is called a time-evolving block decimation. In recent years, multiple proposals for the variational optimization of the quantum state have been put forward, overcoming accuracy and convergence problems of previously known methods. The incorporation of automatic differentiation in tensor network algorithms has ultimately enabled a new, flexible way for variational simulation of ground states and excited states. In this work, we review the state of the art of the variational iPEPS framework. We present and explain the functioning of an efficient, comprehensive and general tensor network library for the simulation of infinite two-dimensional systems using iPEPS, with support for flexible unit cells and different lattice geometries. | Translated: 2023-08-25 16:36:23 Published: 2023-08-23 |
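# Renormalizing Diffusion Models ( http://arxiv.org/abs/2308.12355v1 ) License: see link | Jordan Cotler, Semon Rezchikov | (reference translation) We explain how to use diffusion models to learn inverse renormalization group flows of statistical and quantum field theories.
Diffusion models are a class of machine learning models which have been used to generate samples from complex distributions, such as the distribution of natural images, by learning the inverse process to a diffusion process which adds noise to the data until the distribution of the data is pure noise.
We apply some of our methods to numerically find RG flows of interacting statistical field theories.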
From the perspective of machine learning, our work provides an interpretation of multiscale diffusion models and gives physically inspired suggestions for diffusion models which should have novel properties. We explain how to use diffusion models to learn inverse renormalization group flows of statistical and quantum field theories. Diffusion models are a class of machine learning models which have been used to generate samples from complex distributions, such as the distribution of natural images, by learning the inverse process to a diffusion process which adds noise to the data until the distribution of the data is pure noise. Nonperturbative renormalization group (RG) schemes can naturally be written as diffusion processes in the space of fields. We combine these observations in a concrete framework for building ML-based models for studying field theories, in which the models learn the inverse process to an explicitly specified renormalization group scheme. We detail how these models define a class of adaptive bridge (or parallel tempering) samplers for lattice field theory. Because renormalization group schemes have a physical meaning, we provide explicit prescriptions for how to compare results derived from models associated with several different renormalization group schemes of interest. We also explain how to use diffusion models in a variational method to find ground states of quantum systems. We apply some of our methods to numerically find RG flows of interacting statistical field theories. From the perspective of machine learning, our work provides an interpretation of multiscale diffusion models, and gives physically inspired suggestions for diffusion models which should have novel properties. | Translated: 2023-08-25 16:36:08 Published: 2023-08-23 |
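The forward "add noise until the data is pure noise" process described above is often implemented in the variance-preserving form popularized by denoising diffusion probabilistic models. The sketch uses a linear noise schedule. It is a generic machine-learning example, not the renormalization-group scheme the authors construct. The schedule, its endpoints, the toy data point and the function names are assumptions. A learned model would be trained to reverse these steps.

```python
import math
import random


def linear_alpha_bars(steps, beta_min=1e-4, beta_max=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(steps):
        beta = beta_min + (beta_max - beta_min) * t / (steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noisy_sample(x0, alpha_bar, rng):
    """Draw x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1) per component."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xi + b * rng.gauss(0.0, 1.0) for xi in x0]


alpha_bars = linear_alpha_bars(1000)
rng = random.Random(0)
x0 = [3.0, -3.0]  # a toy "data point"
print(alpha_bars[0], alpha_bars[-1])           # close to 1 at the start, close to 0 at the end
print(noisy_sample(x0, alpha_bars[-1], rng))  # essentially pure N(0, 1) noise
```

Each intermediate alpha_bar_t interpolates between the data and pure noise. This ordered coarsening is the structure the paper relates to an explicitly specified RG scheme.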
# A Theory of Intelligences: Concepts, Models, Implications ( http://arxiv.org/abs/2308.12411v1 ) License: see link | Michael E. Hochberg | (reference translation) Intelligence is a human construct to represent the ability to achieve goals.
I conclude with several conceptual advances of TIS, including a compact mathematical form of surprisal and difficulty, the theoretical basis of TIS, and open questions. Intelligence is a human construct to represent the ability to achieve goals. Given this wide berth, intelligence has been defined countless times, studied in a variety of ways and quantified using numerous measures. Understanding intelligence ultimately requires theory and quantification, both of which are elusive. My main objectives are to identify some of the central elements in and surrounding intelligence, discuss some of its challenges and propose a theory based on first principles. I focus on intelligence as defined by and for humans, frequently in comparison to machines, with the intention of setting the stage for more general characterizations in life, collectives, human designs such as AI and in non-designed physical and chemical systems. I discuss key features of intelligence, including path efficiency and goal accuracy, intelligence as a Black Box, environmental influences, flexibility to deal with surprisal, the regress of intelligence, the relativistic nature of intelligence and difficulty, and temporal changes in intelligence including its evolution. I present a framework for a first-principles Theory of IntelligenceS (TIS), based on the quantifiable macro-scale system features of difficulty, surprisal and goal resolution accuracy. The proposed partitioning of uncertainty/solving and accuracy/understanding is particularly novel since it predicts that paths to a goal not only function to accurately achieve goals, but also serve as experimentation leading to higher probabilities for future attainable goals and increased breadth to enter new goal spaces. TIS can therefore explain endeavors that do not necessarily affect Darwinian fitness, such as leisure, politics, games and art. I conclude with several conceptual advances of TIS including a compact mathematical form of surprisal and difficulty, the theoretical basis of TIS, and open questions. | Translated: 2023-08-25 16:28:18 Published: 2023-08-23 |
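# Tower of two-dimensional scar states in a localized system ( http://arxiv.org/abs/2308.12409v1 ) License: see link | Michael Iversen, Jens H. Bardarson, Anne E. B. Nielsen | (reference translation) The eigenstate thermalization hypothesis describes how most isolated many-body quantum systems reach thermal equilibrium.
We study the transition from the thermal phase to localization by observing the adjacent gap ratio shifting from the Wigner surmise to the Poisson distribution with increasing disorder strength.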
Finally, we demonstrate that localization protects scar revivals for initial states with partial support in the scar subspace. The eigenstate thermalization hypothesis describes how most isolated many-body quantum systems reach thermal equilibrium. However, the hypothesis is violated by phenomena such as many-body localization and quantum many-body scars. In this work, we study a finite, two-dimensional, disordered model hosting a tower of scar states. This construction is a particular instance of a general framework, and we demonstrate its generality by constructing two disordered models hosting a different tower of scar states. At weak disorder, we find numerically that the spectra are nonthermal, and the scar states appear as exact eigenstates with high entropy for certain bipartitions. At strong disorder, the spectra localize and the scar states are identified as inverted scars, since the scar states are embedded in a localized background as opposed to a thermal background. We argue that, for the considered type of models, the localization is stronger than what would be naively expected, and we show this explicitly for one of the models. The argument also provides guidelines for obtaining similarly strong localization in other scarred models. We study the transition from the thermal phase to localization by observing the adjacent gap ratio shifting from the Wigner surmise to the Poisson distribution with increasing disorder strength. Moreover, the entanglement entropy transitions from volume-law scaling with system size at weak disorder to area-law scaling at strong disorder. Finally, we demonstrate that localization protects scar revivals for initial states with partial support in the scar subspace. | Translated: 2023-08-25 16:27:47 Published: 2023-08-23 |
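The adjacent gap ratio used to track the thermal-to-localized transition is cheap to compute from a sorted spectrum and needs no unfolding. The sketch evaluates its mean on synthetic uncorrelated levels, for which the known Poisson value is 2 ln 2 - 1 (about 0.386). Spectra following the Wigner surmise of the Gaussian orthogonal ensemble give about 0.53 instead. The synthetic spectrum and the function name are assumptions for illustration, not data from the paper.

```python
import math
import random


def mean_gap_ratio(levels):
    """Mean adjacent gap ratio <r> with r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1})."""
    e = sorted(levels)
    s = [b - a for a, b in zip(e, e[1:])]  # consecutive level spacings
    ratios = [min(a, b) / max(a, b) for a, b in zip(s, s[1:]) if max(a, b) > 0]
    return sum(ratios) / len(ratios)


# Uncorrelated (Poisson) levels: independent exponentially distributed spacings.
rng = random.Random(1)
levels, energy = [], 0.0
for _ in range(20000):
    energy += rng.expovariate(1.0)
    levels.append(energy)

print(mean_gap_ratio(levels))  # close to the Poisson value 2 ln 2 - 1, about 0.386
print(2 * math.log(2) - 1)
```

Applied to the eigenvalues of a disordered Hamiltonian, a drift of <r> from about 0.53 toward about 0.386 as disorder grows signals the crossover described above.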
# An Initial Exploration: Learning to Generate Realistic Audio for Silent Video ( http://arxiv.org/abs/2308.12408v1 ) License: see link | Matthew Martel, Jackson Wagner | (reference translation) Generating realistic audio effects for movies and other media is a challenging task that is accomplished today primarily through physical techniques known as Foley art.
These include deep-fusion CNN, dilated Wavenet CNN with visual context, and transformer-based architectures.
We find that the transformer-based architecture yields the most promising results, matching low frequencies to visual patterns effectively but failing to generate more nuanced waveforms. Generating realistic audio effects for movies and other media is a challenging task that is accomplished today primarily through physical techniques known as Foley art. Foley artists create sounds with common objects (e.g., boxing gloves, broken glass) in time with video as it is playing to generate captivating audio tracks. In this work, we aim to develop a deep-learning-based framework that does much the same: it observes video in its natural sequence and generates realistic audio to accompany it. Notably, we have reason to believe this is achievable due to advancements in realistic audio generation techniques conditioned on other inputs (e.g., Wavenet conditioned on text). We explore several different model architectures to accomplish this task that process both previously generated audio and video context. These include deep-fusion CNN, dilated Wavenet CNN with visual context, and transformer-based architectures. We find that the transformer-based architecture yields the most promising results, matching low frequencies to visual patterns effectively, but failing to generate more nuanced waveforms. | Translated: 2023-08-25 16:27:21 Published: 2023-08-23 |
# Cryogenic microwave link for quantum local area networks ( http://arxiv.org/abs/2308.12398v1 ) License: see link | M. Renger, S. Gandorfer, W. Yam, F. Fesquet, M. Handschuh, K. E. Honasoge, F. Kronowetter, Y. Nojiri, M. Partanen, M. Pfeiffer, H. van der Vliet, A. J. Matthews, J. Govenius, R. N. Jabdaraghi, M. Prunnila, A. Marx, F. Deppe, R. Gross, K. G. Fedorov | (reference translation) Scalable quantum information processing with superconducting circuits is about to advance from individual processors in single dilution refrigerators to more powerful distributed quantum computing systems located in separate cooling units in order to achieve a practical quantum advantage.
By preserving entanglement distribution at link temperatures up to 1 K, we experimentally verify the fluctuation-dissipation theorem.
Consequently, we demonstrate that our system can form the backbone for future distributed quantum computing applications. Scalable quantum information processing with superconducting circuits is about to advance from individual processors in single dilution refrigerators to more powerful distributed quantum computing systems located in separate cooling units in order to achieve a practical quantum advantage. Hence, the realization of hardware platforms for quantum local area networks (QLANs) compatible with superconducting technology is of high importance. Here, we demonstrate a basic prototype for a microwave QLAN based on a cryogenic link connecting two individual dilution cryostats over a distance of 6.6 m with a base temperature of 52 mK in the center. We provide details about the system design, installation, and performance. We employ superconducting coaxial microwave transmission lines to form a quantum communication channel and characterize its potential by demonstrating robust entanglement distribution in the form of two-mode squeezing between remote parties. By preserving entanglement distribution at link temperatures up to 1 K, we experimentally verify the fluctuation-dissipation theorem. Consequently, we demonstrate that our system can form the backbone for future distributed quantum computing applications. | Translated: 2023-08-25 16:27:02 Published: 2023-08-23 |
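Why the link temperature matters can be estimated from the mean thermal photon number of a single microwave mode, the Bose-Einstein occupation n = 1/(exp(hf/k_B T) - 1). This is the thermal noise that, via the fluctuation-dissipation theorem, accompanies loss in the channel. The abstract does not state the operating frequency. The 5 GHz below is a hypothetical value typical of superconducting circuits. The constants are the exact SI values, and the function name is illustrative.

```python
import math

H = 6.62607015e-34  # Planck constant in J s (exact SI value)
K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def thermal_photons(freq_hz, temp_k):
    """Mean Bose-Einstein occupation n = 1 / (exp(h f / k_B T) - 1) of one mode."""
    return 1.0 / math.expm1(H * freq_hz / (K_B * temp_k))


f = 5e9  # hypothetical 5 GHz mode
for t in (0.052, 1.0):
    print(f"T = {t} K: n_th = {thermal_photons(f, t):.3f}")
```

At 52 mK such a mode holds about 0.01 thermal photons, while at 1 K it holds roughly 3.7. This illustrates why distributing entanglement through a link at such temperatures is a nontrivial result.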
# 内視鏡映像解析のための自己教師あり学習 Self-Supervised Learning for Endoscopic Video Analysis ( http://arxiv.org/abs/2308.12394v1 ) ライセンス: Link先を確認 | Roy Hirsch, Mathilde Caron, Regev Cohen, Amir Livne, Ron Shapiro, Tomer Golany, Roman Goldenberg, Daniel Freedman, and Ehud Rivlin | (参考訳) 自己教師あり学習(SSL)は、大量のラベルなしデータからの学習を可能にすることで、コンピュータビジョンにおける重要なブレークスルーをもたらした。
本研究では、主要なSSLフレームワークであるMasked Siamese Networks(MSN)を、大腸内視鏡検査や腹腔鏡検査などの内視鏡映像解析に用いることを検討した。
以上より、本研究は、SSLによって内視鏡検査で必要となる注釈データを劇的に削減できるという証拠を示す。 Self-supervised learning (SSL) has led to important breakthroughs in computer vision by allowing learning from large amounts of unlabeled data. As such, it might have a pivotal role to play in biomedicine where annotating data requires a highly specialized expertise. Yet, there are many healthcare domains for which SSL has not been extensively explored. One such domain is endoscopy, minimally invasive procedures which are commonly used to detect and treat infections, chronic inflammatory diseases or cancer. In this work, we study the use of a leading SSL framework, namely Masked Siamese Networks (MSNs), for endoscopic video analysis such as colonoscopy and laparoscopy. To fully exploit the power of SSL, we create sizable unlabeled endoscopic video datasets for training MSNs. These strong image representations serve as a foundation for secondary training with limited annotated datasets, resulting in state-of-the-art performance in endoscopic benchmarks like surgical phase recognition during laparoscopy and colonoscopic polyp characterization. Additionally, we achieve a 50% reduction in annotated data size without sacrificing performance. Thus, our work provides evidence that SSL can dramatically reduce the need of annotated data in endoscopy. | 翻訳日:2023-08-25 16:26:48 公開日:2023-08-23 |
# 非線形システムのパラメータ推定における機械学習 Machine learning in parameter estimation of nonlinear systems ( http://arxiv.org/abs/2308.12393v1 ) ライセンス: Link先を確認 | Kaushal Kumar | (参考訳) 複雑な非線形システムのパラメータを正確に推定することは、科学および工学の分野で重要である。
本手法を、乗法的ノイズ下の減衰振動子、Van der Pol振動子、Lotka-Volterra系、Lorenz系に適用する。
この手法はノイズや不確実性にうまく対処しており、現実世界の課題への適応性を示している。 Accurately estimating parameters in complex nonlinear systems is crucial across scientific and engineering fields. We present a novel approach for parameter estimation using a neural network with the Huber loss function. This method taps into deep learning's abilities to uncover parameters governing intricate behaviors in nonlinear equations. We validate our approach using synthetic data and predefined functions that model system dynamics. By training the neural network with noisy time series data, it fine-tunes the Huber loss function to converge to accurate parameters. We apply our method to damped oscillators, Van der Pol oscillators, Lotka-Volterra systems, and Lorenz systems under multiplicative noise. The trained neural network accurately estimates parameters, evident from closely matching latent dynamics. Comparing true and estimated trajectories visually reinforces our method's precision and robustness. Our study underscores the Huber loss-guided neural network as a versatile tool for parameter estimation, effectively uncovering complex relationships in nonlinear systems. The method navigates noise and uncertainty adeptly, showcasing its adaptability to real-world challenges. | 翻訳日:2023-08-25 16:26:20 公開日:2023-08-23 |
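The abstract above relies on the Huber loss, which grows quadratically for small residuals and only linearly for large ones. The sketch below illustrates why that helps parameter estimation under noise and outliers. It is not the paper's method: the neural network is replaced by a plain grid search over a single parameter. The trajectory x(t) = exp(-γt)·cos(ωt), the noise level, the injected outlier and all function names are illustrative assumptions.

```python
import math
import random


def huber_loss(residuals, delta=1.0):
    """Mean Huber loss: quadratic for |r| <= delta, linear beyond it."""
    total = 0.0
    for r in residuals:
        a = abs(r)
        if a <= delta:
            total += 0.5 * a * a
        else:
            total += delta * (a - 0.5 * delta)
    return total / len(residuals)


def damped_oscillator(t, gamma, omega):
    # Assumed closed-form trajectory with x(0) = 1: x(t) = exp(-gamma t) cos(omega t)
    return math.exp(-gamma * t) * math.cos(omega * t)


def estimate_gamma(ts, xs, omega, candidates, delta=0.1):
    # Grid search over gamma (a stand-in for the paper's neural network):
    # keep the candidate whose trajectory minimizes the Huber loss.
    def loss(g):
        return huber_loss([x - damped_oscillator(t, g, omega) for t, x in zip(ts, xs)], delta)
    return min(candidates, key=loss)


random.seed(0)
true_gamma, omega = 0.3, 2.0
ts = [0.05 * i for i in range(200)]
# Multiplicative noise: each sample is scaled by (1 + eps), eps ~ N(0, 0.05^2)
xs = [damped_oscillator(t, true_gamma, omega) * (1 + random.gauss(0, 0.05)) for t in ts]
xs[50] += 5.0  # one gross outlier; the Huber loss grows only linearly in it
candidates = [0.01 * k for k in range(1, 101)]
gamma_hat = estimate_gamma(ts, xs, omega, candidates)
print(f"estimated gamma = {gamma_hat:.2f}")
```

With squared error, the single outlier would contribute a term proportional to its squared size. The Huber loss caps its influence at a linear rate, which is the robustness property the abstract appeals to.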
# FOSA: 欠損データに対する完全情報最尤法(FIML)で最適化された自己注意による補完 FOSA: Full Information Maximum Likelihood (FIML) Optimized Self-Attention Imputation for Missing Data ( http://arxiv.org/abs/2308.12388v1 ) ライセンス: Link先を確認 | Ou Deng, Qun Jin | (参考訳) データ補完においては、特に複雑なデータセットで、欠損値に効果的に対処することが極めて重要である。
本稿では、FIML最適化自己注意(FOSA)フレームワークについて詳しく述べる。これは、完全情報最尤(Full Information Maximum Likelihood, FIML)推定の強みと自己注意ニューラルネットワークの能力とを融合させる革新的なアプローチである。
興味深いことに、構造方程式モデル(Structural Equation Model, SEM)が誤って特定され、FIML推定が最適でなくなる場合であっても、FOSAの自己注意コンポーネントの頑健なアーキテクチャは補完結果を巧みに修正・最適化する。
実証的なテストにより、FOSAは最大40%のランダムな欠損に直面しても一貫して良好な予測を提供することが明らかになり、その頑健性とデータ補完における大規模応用の可能性が示された。 In data imputation, effectively addressing missing values is pivotal, especially in intricate datasets. This paper delves into the FIML Optimized Self-attention (FOSA) framework, an innovative approach that amalgamates the strengths of Full Information Maximum Likelihood (FIML) estimation with the capabilities of self-attention neural networks. Our methodology commences with an initial estimation of missing values via FIML, subsequently refining these estimates by leveraging the self-attention mechanism. Our comprehensive experiments on both simulated and real-world datasets underscore FOSA's pronounced advantages over traditional FIML techniques, encapsulating facets of accuracy, computational efficiency, and adaptability to diverse data structures. Intriguingly, even in scenarios where the Structural Equation Model (SEM) might be mis-specified, leading to suboptimal FIML estimates, the robust architecture of FOSA's self-attention component adeptly rectifies and optimizes the imputation outcomes. Our empirical tests reveal that FOSA consistently delivers commendable predictions, even in the face of up to 40% random missingness, highlighting its robustness and potential for wide-scale applications in data imputation. | 翻訳日:2023-08-25 16:26:04 公開日:2023-08-23 |
# 自分自身の過去から少し助けを借りて: 画像キャプション生成のためのプロトタイプメモリネットワーク With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning ( http://arxiv.org/abs/2308.12383v1 ) ライセンス: Link先を確認 | Manuele Barraco, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara | (参考訳) 画像キャプション生成は、現在、視覚と言語に関わる多くのタスクと同様に、画像中の意味を抽出し、言語的に一貫性のある記述に変換するTransformerベースのアーキテクチャに依存している。
ソースコードと学習済みモデルは https://github.com/aimagelab/PMA-Net で入手できる。 Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics in an image and translating it into linguistically coherent descriptions. Although successful, the attention operator only considers a weighted summation of projections of the current input sample, therefore ignoring the relevant semantic information which can come from the joint observation of other samples. In this paper, we devise a network which can perform attention over activations obtained while processing other training samples, through a prototypical memory model. Our memory models the distribution of past keys and values through the definition of prototype vectors which are both discriminative and compact. Experimentally, we assess the performance of the proposed model on the COCO dataset, in comparison with carefully designed baselines and state-of-the-art approaches, and by investigating the role of each of the proposed components. We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training in cross-entropy only and when fine-tuning with self-critical sequence training. Source code and trained models are available at: https://github.com/aimagelab/PMA-Net. | 翻訳日:2023-08-25 16:25:15 公開日:2023-08-23 |
# 深層強化学習システムの展開:課題の分類 Deploying Deep Reinforcement Learning Systems: A Taxonomy of Challenges ( http://arxiv.org/abs/2308.12438v1 ) ライセンス: Link先を確認 | Ahmed Haj Yahmed, Altaf Allah Abbassi, Amin Nikanjam, Heng Li, Foutse Khomh | (参考訳) 深層強化学習(DRL)は、強化学習において深層学習(DL)を活用するもので、ロボット工学、コンピュータビジョン、コンピュータゲームなど、幅広い分野において、人間レベルの自律性を達成する大きな可能性を示している。
本稿では,開発者にとって最も人気のあるQ&AフォーラムであるStack Overflow(SO)について,DRLシステムのデプロイにおいて実践者が直面した課題を明らかにし,理解するための実証的研究を提案する。
本研究が今後の研究を促し、DRLシステムのデプロイにおいて実践者が直面する最も一般的かつ困難な課題をコミュニティが克服する一助となることを期待している。 Deep reinforcement learning (DRL), leveraging Deep Learning (DL) in reinforcement learning, has shown significant potential in achieving human-level autonomy in a wide range of domains, including robotics, computer vision, and computer games. This potential justifies the enthusiasm and growing interest in DRL in both academia and industry. However, the community currently focuses mostly on the development phase of DRL systems, with little attention devoted to DRL deployment. In this paper, we propose an empirical study on Stack Overflow (SO), the most popular Q&A forum for developers, to uncover and understand the challenges practitioners faced when deploying DRL systems. Specifically, we categorized relevant SO posts by deployment platforms: server/cloud, mobile/embedded system, browser, and game engine. After filtering and manual analysis, we examined 357 SO posts about DRL deployment, investigated the current state, and identified the challenges related to deploying DRL systems. Then, we investigate the prevalence and difficulty of these challenges. Results show that the general interest in DRL deployment is growing, confirming the study's relevance and importance. Results also show that DRL deployment is more difficult than other DRL issues. Additionally, we built a taxonomy of 31 unique challenges in deploying DRL to different platforms. On all platforms, RL environment-related challenges are the most popular, and communication-related challenges are the most difficult among practitioners. We hope our study inspires future research and helps the community overcome the most common and difficult challenges practitioners face when deploying DRL systems. | 翻訳日:2023-08-25 16:19:06 公開日:2023-08-23 |
# 物体認識のためのリカレントニューラルネットワークにおける表現ダイナミクスの特徴付け Characterising representation dynamics in recurrent neural networks for object recognition ( http://arxiv.org/abs/2308.12435v1 ) ライセンス: Link先を確認 | Sushrut Thorat, Adrien Doerig, Tim C. Kietzmann | (参考訳) リカレントニューラルネットワーク(Recurrent Neural Network, RNN)は、困難な条件下での物体認識と、霊長類視覚の諸側面のモデリングの両方で有望な結果をもたらしてきた。
本研究では、ecosetの新たなサブセットであるMiniEcosetで物体分類を学習させたRNNにおいて、そのようなダイナミクスを調べた。
第一に、推論時には正しく分類された後も表現が変化し続けており、「分類を終えた(done with classification)」という概念が欠けていることが示唆された。
第二に、活性化の軌跡を特徴付ける方法として「読み出しゾーン(readout zone)」に着目したところ、誤分類された表現はL2ノルムの小さい活性化パターンを示し、読み出しゾーン内でより周辺部に位置することが観察された。
これらの知見は、側方結合とトップダウン結合をもつネットワークにも一般化し、ボトムアップスイープとの加算的相互作用と乗算的相互作用の両方を含む場合にも当てはまる。
この分析フレームワークが、霊長類視覚における表現ダイナミクスの理解を含め、他の種類のRNNに関する今後の研究に役立つことを期待している。 Recurrent neural networks (RNNs) have yielded promising results for both recognizing objects in challenging conditions and modeling aspects of primate vision. However, the representational dynamics of recurrent computations remain poorly understood, especially in large-scale visual models. Here, we studied such dynamics in RNNs trained for object classification on MiniEcoset, a novel subset of ecoset. We report two main insights. First, upon inference, representations continued to evolve after correct classification, suggesting a lack of the notion of being "done with classification". Second, focusing on "readout zones" as a way to characterize the activation trajectories, we observe that misclassified representations exhibit activation patterns with lower L2 norm, and are positioned more peripherally in the readout zones. Such arrangements help the misclassified representations move into the correct zones as time progresses. Our findings generalize to networks with lateral and top-down connections, and include both additive and multiplicative interactions with the bottom-up sweep. The results therefore contribute to a general understanding of RNN dynamics in naturalistic tasks. We hope that the analysis framework will aid future investigations of other types of RNNs, including understanding of representational dynamics in primate vision. | 翻訳日:2023-08-25 16:18:41 公開日:2023-08-23 |
# 交通応用のための教師なしLiDARセグメンテーションへの時空間対応アプローチ A Spatiotemporal Correspondence Approach to Unsupervised LiDAR Segmentation with Traffic Applications ( http://arxiv.org/abs/2308.12433v1 ) ライセンス: Link先を確認 | Xiao Li, Pan He, Aotian Wu, Sanjay Ranka, Anand Rangarajan | (参考訳) 多様な交通シナリオにおける屋外LiDAR点群の教師なしセマンティックセグメンテーションの問題に取り組む。
この一般的なフレームワークは、ドメイン知識を取り入れたLiDARポイントクラウドのための統一表現学習アプローチにつながる可能性がある。 We address the problem of unsupervised semantic segmentation of outdoor LiDAR point clouds in diverse traffic scenarios. The key idea is to leverage the spatiotemporal nature of a dynamic point cloud sequence and introduce drastically stronger augmentation by establishing spatiotemporal correspondences across multiple frames. We dovetail clustering and pseudo-label learning in this work. Essentially, we alternate between clustering points into semantic groups and optimizing models using point-wise pseudo-spatiotemporal labels with a simple learning objective. Therefore, our method can learn discriminative features in an unsupervised learning fashion. We show promising segmentation performance on Semantic-KITTI, SemanticPOSS, and FLORIDA benchmark datasets covering scenarios in autonomous vehicle and intersection infrastructure, which is competitive when compared against many existing fully supervised learning methods. This general framework can lead to a unified representation learning approach for LiDAR point clouds incorporating domain knowledge. | 翻訳日:2023-08-25 16:18:17 公開日:2023-08-23 |
# 超強光子-光子結合 Ultrastrong photon-photon coupling ( http://arxiv.org/abs/2308.12427v1 ) ライセンス: Link先を確認 | Fuyang Tay, Ali Mojibpour, Stephen Sanders, Shuang Liang, Hongjing Xu, Geoff C. Gardner, Andrey Baydin, Michael J. Manfra, Alessandro Alabastri, David Hagenmüller, Junichiro Kono | (参考訳) 光子間の相関は、非古典的な光の状態の重要な特徴である。
ランダウポラリトンのテラヘルツ分光測定は、微視的量子モデルに基づく我々の計算と非常によく一致した。
これらの発見は、多モード非古典状態の生成や、真空場を伴う量子光学の多体領域の探究への道を開くものである。 Correlations between photons are a key feature of nonclassical states of light. Recent studies suggest that the ground state of a cavity quantum electrodynamics system can have light-matter correlations in the form of a squeezed vacuum state in thermal equilibrium when the matter ultrastrongly couples with cavity photons. This raises a question whether different photonic modes can also be correlated in the ground state via ultrastrong light-matter coupling. Here we demonstrate ultrastrong coupling between photonic modes of a multi-mode three-dimensional terahertz photonic-crystal cavity that is mediated by their simultaneous ultrastrong coupling with the cyclotron resonance of a two-dimensional electron gas in GaAs. Terahertz spectroscopy measurements of Landau polaritons showed excellent agreement with our calculations based on a microscopic quantum model. Despite the lack of nonlinearity in the matter system, the model shows significant correlations between the photonic modes in the ground state of the system, which can be controlled by changing the matter and photon frequencies and the spatial overlap of their mode profiles. We propose a detuning-independent figure of merit to quantify all possible couplings in multi-mode systems. These findings pave the way for creating multi-mode nonclassical states and exploring the many-body regime of quantum optics with vacuum fields. | 翻訳日:2023-08-25 16:18:01 公開日:2023-08-23 |
# サンドイッチRényiダイバージェンスの連続性に関する統一的枠組み Unified framework for continuity of sandwiched Rényi divergences ( http://arxiv.org/abs/2308.12425v1 ) ライセンス: Link先を確認 | Andreas Bluhm, Angela Capel, Paul Gondolf and Tim Möbus | (参考訳) 本研究では、サンドイッチRényi条件付きエントロピーなど、サンドイッチRényiダイバージェンスに関連するエントロピー量に対する一様連続性の上界を証明する。
これにより、サンドイッチRényi条件付きエントロピーの文脈でMarwahとDupuis、およびBeigiとGoodarziが用いた戦略を拡張する。
別の貢献として、著者の一部による以前の論文で開発されたALAAF法を用いて、近似量子マルコフ連鎖の安定性を調べる。 In this work, we prove uniform continuity bounds for entropic quantities related to the sandwiched Rényi divergences such as the sandwiched Rényi conditional entropy. We follow three different approaches: The first one is the axiomatic approach, which exploits the sub-/superadditivity and joint concavity/convexity of the exponential of the divergence. In our second approach, termed the "operator space approach", we express the entropic measures as norms and utilize their properties for establishing the bounds. These norms draw inspiration from interpolation space norms. We not only demonstrate the norm properties solely relying on matrix analysis tools but also extend their applicability to a context that holds relevance in resource theories. By this, we extend the strategies of Marwah and Dupuis as well as Beigi and Goodarzi employed in the sandwiched Rényi conditional entropy context. Finally, we merge the approaches into a mixed approach that has some advantageous properties and then discuss in which regimes each bound performs best. Our results improve over the previous best continuity bounds or sometimes even give the first continuity bounds available. In a separate contribution, we use the ALAAF method, developed in a previous article by some of the authors, to study the stability of approximate quantum Markov chains. | 翻訳日:2023-08-25 16:17:42 公開日:2023-08-23 |
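For reference, here is the standard definition of the sandwiched Rényi divergence for a density operator ρ and a positive semidefinite σ, with α ∈ [1/2, 1) ∪ (1, ∞); for α > 1 it is taken to be +∞ when supp ρ ⊄ supp σ. Its α → 1 limit is also given. The conditional-entropy line shows one common convention, the optimized ("↑") variant. The abstract does not say which variant or variants the paper bounds, so treat that line as an assumption.

```latex
\begin{aligned}
\widetilde{D}_\alpha(\rho\,\|\,\sigma)
  &= \frac{1}{\alpha-1}\,\log \operatorname{Tr}\!\Big[\big(\sigma^{\frac{1-\alpha}{2\alpha}}\,\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big)^{\alpha}\Big],\\
\lim_{\alpha\to 1}\widetilde{D}_\alpha(\rho\,\|\,\sigma)
  &= D(\rho\,\|\,\sigma) = \operatorname{Tr}\big[\rho\,(\log\rho-\log\sigma)\big],\\
\widetilde{H}^{\uparrow}_\alpha(A|B)_\rho
  &= -\min_{\sigma_B}\,\widetilde{D}_\alpha\big(\rho_{AB}\,\big\|\,\mathbb{1}_A\otimes\sigma_B\big).
\end{aligned}
```

A continuity bound of the kind studied here controls |H̃_α(A|B)_ρ − H̃_α(A|B)_τ| in terms of a distance between ρ and τ, typically the trace distance.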
# 改良された複素スケーリング法による結合チャネル問題の仮想状態 Virtual states in the coupled-channel problems with an improved complex scaling method ( http://arxiv.org/abs/2308.12424v1 ) ライセンス: Link先を確認 | Yan-Ke Chen, Lu Meng, Zi-Yang Lin, Shi-Lin Zhu | (参考訳) 従来のCSMでは求めることが困難であった仮想状態を得るため、複素スケーリング法(CSM)を改良した。
この進歩により、量子系における共鳴状態と仮想状態を正確に特徴付けるCSMの能力が大きく拡張される。 We improve the complex scaling method (CSM) to obtain the virtual states, which were previously challenging in the conventional CSM. Our approach solves the Schrödinger equation in the momentum space as an eigenvalue problem by choosing the flexible contours. It proves to be highly effective in identifying the poles across the different Riemann sheets in the multi-channel scatterings. It is more straightforward and efficient than searching for the zeros of the Fredholm determinant of the Lippmann-Schwinger equation using the root-finding algorithms. This advancement significantly extends the capabilities of the CSM in accurately characterizing the resonances and virtual states in quantum systems. | 翻訳日:2023-08-25 16:17:22 公開日:2023-08-23 |
# ESGに着目したDLT研究の進化:NLPによる文献分析 Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature ( http://arxiv.org/abs/2308.12420v1 ) ライセンス: Link先を確認 | Walter Hernandez, Kamil Tylinski, Alastair Moore, Niall Roche, Nikhil Vadgama, Horst Treiblmaier, Jiangbo Shangguan, Paolo Tasca, and Jiahua Xu | (参考訳) 分散型台帳技術(Distributed Ledger Technologies, DLT)は急速に進化しており、その多様な構成要素に関する包括的な知見が求められている。
さらに、DLTおよびESG関連の研究向けに設計された、54,808個の固有表現からなる初のNERデータセットを提示する。 Distributed Ledger Technologies (DLTs) have rapidly evolved, necessitating comprehensive insights into their diverse components. However, a systematic literature review that emphasizes the Environmental, Sustainability, and Governance (ESG) components of DLT remains lacking. To bridge this gap, we selected 107 seed papers to build a citation network of 63,083 references and refined it to a corpus of 24,539 publications for analysis. Then, we labeled the named entities in 46 papers according to twelve top-level categories derived from an established technology taxonomy and enhanced the taxonomy by pinpointing DLT's ESG elements. Leveraging transformer-based language models, we fine-tuned a pre-trained language model for a Named Entity Recognition (NER) task using our labeled dataset. We used our fine-tuned language model to distill the corpus to 505 key papers, facilitating a literature review via named entities and temporal graph analysis on DLT evolution in the context of ESG. Our contributions are a methodology to conduct a machine learning-driven systematic literature review in the DLT field, placing a special emphasis on ESG aspects. Furthermore, we present a first-of-its-kind NER dataset, composed of 54,808 named entities, designed for DLT and ESG-related explorations. | 翻訳日:2023-08-25 16:17:12 公開日:2023-08-23 |
# 実世界におけるアメリカの手話処理に向けて:データ,課題,方法 Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods ( http://arxiv.org/abs/2308.12419v1 ) ライセンス: Link先を確認 | Bowen Shi | (参考訳) ジェスチャーを通して意味を伝える手話は、聴覚障害者の間でのコミュニケーションの主要な手段である。
現実的な環境における手話翻訳の課題に対処するため、事前学習のためのプリテキストタスクとしての手話検索や、口形と手形の特徴の融合を含む一連の手法を提案する。 Sign language, which conveys meaning through gestures, is the chief means of communication among deaf people. Recognizing sign language in natural settings presents significant challenges due to factors such as lighting, background clutter, and variations in signer characteristics. In this thesis, I study automatic sign language processing in the wild, using signing videos collected from the Internet. This thesis contributes new datasets, tasks, and methods. Most chapters of this thesis address tasks related to fingerspelling, an important component of sign language and yet has not been studied widely by prior work. I present three new large-scale ASL datasets in the wild: ChicagoFSWild, ChicagoFSWild+, and OpenASL. Using ChicagoFSWild and ChicagoFSWild+, I address fingerspelling recognition, which consists of transcribing fingerspelling sequences into text. I propose an end-to-end approach based on iterative attention that allows recognition from a raw video without explicit hand detection. I further show that using a Conformer-based network jointly modeling handshape and mouthing can bring performance close to that of humans. Next, I propose two tasks for building real-world fingerspelling-based applications: fingerspelling detection and search. For fingerspelling detection, I introduce a suite of evaluation metrics and a new detection model via multi-task training. To address the problem of searching for fingerspelled keywords in raw sign language videos, we propose a novel method that jointly localizes and matches fingerspelling segments to text. Finally, I will describe a benchmark for large-vocabulary open-domain sign language translation based on OpenASL. To address the challenges of sign language translation in realistic settings, we propose a set of techniques including sign search as a pretext task for pre-training and fusion of mouthing and handshape features. | 翻訳日:2023-08-25 16:16:47 公開日:2023-08-23 |
# 脳年齢予測問題を、より解釈可能で定量的なアプローチへと再定式化する Reframing the Brain Age Prediction Problem to a More Interpretable and Quantitative Approach ( http://arxiv.org/abs/2308.12416v1 ) ライセンス: Link先を確認 | Neha Gianchandani, Mahsa Dibaji, Mariana Bento, Ethan MacDonald, Roberto Souza | (参考訳) 深層学習モデルは、磁気共鳴(MR)画像から、脳の健康に関する重要なバイオマーカーである脳年齢を推定するタスクで最先端の結果を達成している。
結果は、ボクセル単位の年齢予測モデルが脳の老化過程に関する空間情報を提供し、かつ定量的であるという利点をもつため、より解釈しやすいことを示している。 Deep learning models have achieved state-of-the-art results in estimating brain age, which is an important brain health biomarker, from magnetic resonance (MR) images. However, most of these models only provide a global age prediction, and rely on techniques, such as saliency maps to interpret their results. These saliency maps highlight regions in the input image that were significant for the model's predictions, but they are hard to be interpreted, and saliency map values are not directly comparable across different samples. In this work, we reframe the age prediction problem from MR images to an image-to-image regression problem where we estimate the brain age for each brain voxel in MR images. We compare voxel-wise age prediction models against global age prediction models and their corresponding saliency maps. The results indicate that voxel-wise age prediction models are more interpretable, since they provide spatial information about the brain aging process, and they benefit from being quantitative. | 翻訳日:2023-08-25 16:16:17 公開日:2023-08-23 |
# ソースコードの大規模言語モデル解釈のためのベンチマーク因果研究 Benchmarking Causal Study to Interpret Large Language Models for Source Code ( http://arxiv.org/abs/2308.12415v1 ) ライセンス: Link先を確認 | Daniel Rodriguez-Cardenas, David N. Palacio, Dipin Khati, Henry Burke, Denys Poshyvanyk | (参考訳) コード生成にソフトウェア研究者が採用する最も一般的なソリューションの1つは、大量のソースコードでLLM(Large Language Models)をトレーニングすることである。
本ケーススタディの結果は、プロンプトのセマンティクスがChatGPTの生成性能に正の因果効果をもち、その平均処置効果が$\approx 3\%$であることを示している。
さらに、プロンプトサイズなどの交絡因子は精度指標と高い相関をもつことが判明した($\approx 0.412\%$)。
バイアスを低減することにより、分析対象の精度測定値の解釈可能な解が得られる。 One of the most common solutions adopted by software researchers to address code generation is by training Large Language Models (LLMs) on massive amounts of source code. Although a number of studies have shown that LLMs have been effectively evaluated on popular accuracy metrics (e.g., BLEU, CodeBleu), previous research has largely overlooked the role of Causal Inference as a fundamental component of the interpretability of LLMs' performance. Existing benchmarks and datasets are meant to highlight the difference between the expected and the generated outcome, but do not take into account confounding variables (e.g., lines of code, prompt size) that equally influence the accuracy metrics. The fact remains that, when dealing with generative software tasks by LLMs, no benchmark is available to tell researchers how to quantify neither the causal effect of SE-based treatments nor the correlation of confounders to the model's performance. In an effort to bring statistical rigor to the evaluation of LLMs, this paper introduces a benchmarking strategy named Galeras comprised of curated testbeds for three SE tasks (i.e., code completion, code summarization, and commit generation) to help aid the interpretation of LLMs' performance. We illustrate the insights of our benchmarking strategy by conducting a case study on the performance of ChatGPT under distinct prompt engineering methods. The results of the case study demonstrate the positive causal influence of prompt semantics on ChatGPT's generative performance by an average treatment effect of $\approx 3\%$. Moreover, it was found that confounders such as prompt size are highly correlated with accuracy metrics ($\approx 0.412\%$). The end result of our case study is to showcase causal inference evaluations, in practice, to reduce confounding bias. By reducing the bias, we offer an interpretable solution for the accuracy metric under analysis. | 翻訳日:2023-08-25 16:16:01 公開日:2023-08-23 |
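The case study above reports an average treatment effect (ATE) and a correlation between a confounder and the accuracy metric. The sketch below shows the two basic quantities involved: a naive ATE, computed as a difference in group means, and a Pearson correlation used to check a suspected confounder. All scores and prompt sizes are made-up illustrative numbers, not data from the paper. The function names are hypothetical. This is not the Galeras pipeline, which is designed to account for confounders that the naive estimate ignores.

```python
from statistics import mean


def average_treatment_effect(treated, control):
    """Naive ATE estimate: difference of mean outcomes between the two groups.

    This is unbiased only if treatment assignment is independent of confounders
    (e.g. randomized); it does not adjust for prompt size or lines of code.
    """
    return mean(treated) - mean(control)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Hypothetical accuracy scores (illustrative only, not data from the paper)
control = [0.40, 0.42, 0.38, 0.41]   # baseline prompts
treated = [0.43, 0.45, 0.41, 0.44]   # prompts with the semantic change ("treatment")
ate = average_treatment_effect(treated, control)

# Hypothetical confounder check: prompt size (tokens) vs. accuracy score
prompt_size = [120, 300, 80, 200]
scores = [0.41, 0.45, 0.38, 0.43]
r = pearson(prompt_size, scores)
print(f"ATE = {ate:.3f}, corr(prompt size, score) = {r:.2f}")
```

A strong correlation between prompt size and the metric means a naive ATE can mix the treatment's effect with the effect of prompt size. Adjusting for such confounders is the motivation for a causal benchmark.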
# ストリーム型多変量時系列からのゼロ遅延一貫性信号再構成 Zero-delay Consistent Signal Reconstruction from Streamed Multivariate Time Series ( http://arxiv.org/abs/2308.12459v1 ) ライセンス: Link先を確認 | Emilio Ruiz-Moreno, Luis Miguel López-Ramos, Baltasar Beferull-Lozano | (参考訳) 実世界のアナログ信号のデジタル化には、通常、時間方向のサンプリングと振幅方向の離散化が伴う。
実験により、提案手法は、類似しているが一貫性のない再構成と比べて、サンプリングレートに対して良好な誤差減衰率を達成することが示された。 Digitalizing real-world analog signals typically involves sampling in time and discretizing in amplitude. Subsequent signal reconstructions inevitably incur an error that depends on the amplitude resolution and the temporal density of the acquired samples. From an implementation viewpoint, consistent signal reconstruction methods have proven a profitable error-rate decay as the sampling rate increases. Despite that, these results are obtained under offline settings. Therefore, a research gap exists regarding methods for consistent signal reconstruction from data streams. This paper presents a method that consistently reconstructs streamed multivariate time series of quantization intervals under a zero-delay response requirement. On the other hand, previous work has shown that the temporal dependencies within univariate time series can be exploited to reduce the roughness of zero-delay signal reconstructions. This work shows that the spatiotemporal dependencies within multivariate time series can also be exploited to achieve improved results. Specifically, the spatiotemporal dependencies of the multivariate time series are learned, with the assistance of a recurrent neural network, to reduce the roughness of the signal reconstruction on average while ensuring consistency. Our experiments show that our proposed method achieves a favorable error-rate decay with the sampling rate compared to a similar but non-consistent reconstruction. | 翻訳日:2023-08-25 16:07:29 公開日:2023-08-23 |
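In the abstract above, "consistent" means each reconstructed sample lies inside the quantization interval it was observed in. "Zero-delay" means each sample is produced as soon as its interval arrives, using only the past. The sketch below illustrates those two constraints for a single channel. The paper learns the predictor with a recurrent neural network over multivariate series; here it is replaced by plain linear extrapolation. The uniform quantizer, the step size and the function names are hypothetical.

```python
import math


def quantize(x, step):
    """Uniform quantizer: return the cell [lower, upper) that contains x."""
    k = math.floor(x / step)
    return (k * step, (k + 1) * step)


def zero_delay_consistent(intervals):
    """Reconstruct a stream one sample at a time from quantization cells.

    Each estimate uses only past reconstructions (zero delay) and is then
    projected onto the closed cell [lo, hi], so it is consistent with it.
    """
    recon = []
    for lo, hi in intervals:
        if len(recon) >= 2:
            pred = 2 * recon[-1] - recon[-2]  # linear extrapolation (stand-in for the RNN)
        elif recon:
            pred = recon[-1]
        else:
            pred = 0.5 * (lo + hi)  # no history yet: use the cell midpoint
        recon.append(min(max(pred, lo), hi))  # projection onto [lo, hi]
    return recon


signal = [0.1, 0.3, 0.55, 0.8]
cells = [quantize(x, 0.5) for x in signal]
estimate = zero_delay_consistent(cells)
print(estimate)  # [0.25, 0.25, 0.5, 0.75]
```

A better predictor yields a smoother estimate, while the final projection step alone guarantees consistency. This separation is one simple way to read "reduce the roughness ... while ensuring consistency".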
# PFL-GAN: 個人化フェデレーション学習におけるクライアントの不均一性と生成モデル PFL-GAN: When Client Heterogeneity Meets Generative Models in Personalized Federated Learning ( http://arxiv.org/abs/2308.12454v1 ) ライセンス: Link先を確認 | Achintha Wijesinghe, Songyang Zhang, Zhi Ding | (参考訳) 近年の生成学習モデルの進歩に伴い、敵対的生成ネットワーク(GAN)モデルに基づく連合学習(FL)への関心が高まっている。
既存のGANベースのFLの多くはグローバルモデルの学習に重点を置いているが、データサンプル分布、特徴空間、ラベルが異なるといったクライアントデータの不均一性を考えると、パーソナライズドFL(Personalized FL, PFL)の方が効果的な場合がある。
いくつかのよく知られたデータセットに対する厳密な実験による実験結果は、PFL-GANの有効性を示している。 Recent advances of generative learning models are accompanied by the growing interest in federated learning (FL) based on generative adversarial network (GAN) models. In the context of FL, GAN can capture the underlying client data structure, and regenerate samples resembling the original data distribution without compromising the private raw data. Although most existing GAN-based FL works focus on training a global model, Personalized FL (PFL) sometimes can be more effective in view of client data heterogeneity in terms of distinct data sample distributions, feature spaces, and labels. To cope with client heterogeneity in GAN-based FL, we propose a novel GAN sharing and aggregation strategy for PFL. The proposed PFL-GAN addresses the client heterogeneity in different scenarios. More specially, we first learn the similarity among clients and then develop an weighted collaborative data aggregation. The empirical results through the rigorous experimentation on several well-known datasets demonstrate the effectiveness of PFL-GAN. | 翻訳日:2023-08-25 16:07:11 公開日:2023-08-23 |
# 潜在拡散モデルによる合成データを用いた医用画像分類器の拡張 Augmenting medical image classifiers with synthetic data from latent diffusion models ( http://arxiv.org/abs/2308.12453v1 ) ライセンス: Link先を確認 | Luke W. Sagers, James A. Diao, Luke Melas-Kyriazi, Matthew Groh, Pranav Rajpurkar, Adewole S. Adamson, Veronica Rotemberg, Roxana Daneshjou, Arjun K. Manrai | (参考訳) 米国食品医薬品局(FDA)はすでに数百の人工知能(AI)アルゴリズムを承認または認可しているが、多くの研究で、特に過小代表の集団に対して、汎化性能の一貫性の欠如や潜在的なバイアスが示されている。
以上の結果は、合成データがモデル開発の効果を何倍にも高める手段となり得る一方で、多様な実世界データの収集が医療AIアルゴリズムを改善するうえで依然として最も重要なステップであることを示唆している。 While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drugs Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as a useful case study in synthetic image generation due to the diversity of disease appearance, particularly across the protected attribute of skin tone. Here we show that latent diffusion models can scalably generate images of skin disease and that augmenting model training with these data improves performance in data-limited settings. These performance gains saturate at synthetic-to-real image ratios above 10:1 and are substantially smaller than the gains obtained from adding real images. As part of our analysis, we generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies. Our results suggest that synthetic data could serve as a force-multiplier for model development, but the collection of diverse real-world data remains the most important step to improve medical AI algorithms. | 翻訳日:2023-08-25 16:06:52 公開日:2023-08-23 |
# ARF-Plus: 3次元シーンスタイライゼーションのための芸術的放射輝度場における知覚的要因の制御 ARF-Plus: Controlling Perceptual Factors in Artistic Radiance Fields for 3D Scene Stylization ( http://arxiv.org/abs/2308.12452v1 ) ライセンス: Link先を確認 | Wenzhao Li, Tianhao Wu, Fangcheng Zhong, Cengiz Oztireli | (参考訳) 放射輝度場のスタイル転写は、3D再構成とビュー合成におけるニューラル放射輝度場の優れた性能のおかげで、3Dシーンのスタイライゼーション手段として最近人気を集めている新興分野である。
これは無限の可能性の領域を開放し、スタイリゼーション効果のカスタマイズと異なるスタイルの強度の柔軟なマージを可能にし、3Dシーンに斬新で目を引くスタイリスティックなエフェクトを創造する。 The radiance fields style transfer is an emerging field that has recently gained popularity as a means of 3D scene stylization, thanks to the outstanding performance of neural radiance fields in 3D reconstruction and view synthesis. We highlight a research gap in radiance fields style transfer, the lack of sufficient perceptual controllability, motivated by the existing concept in the 2D image style transfer. In this paper, we present ARF-Plus, a 3D neural style transfer framework offering manageable control over perceptual factors, to systematically explore the perceptual controllability in 3D scene stylization. Four distinct types of controls - color preservation control, (style pattern) scale control, spatial (selective stylization area) control, and depth enhancement control - are proposed and integrated into this framework. Results from real-world datasets, both quantitative and qualitative, show that the four types of controls in our ARF-Plus framework successfully accomplish their corresponding perceptual controls when stylizing 3D scenes. These techniques work well for individual style inputs as well as for the simultaneous application of multiple styles within a scene. This unlocks a realm of limitless possibilities, allowing customized modifications of stylization effects and flexible merging of the strengths of different styles, ultimately enabling the creation of novel and eye-catching stylistic effects on 3D scenes. | 翻訳日:2023-08-25 16:06:32 公開日:2023-08-23 |
# MOFO: MOtion FOcused Self-Supervision for Video Understanding ( http://arxiv.org/abs/2308.12447v1 ) License: see link | Mona Ahmadian, Frank Guerin, and Andrew Gilbert |
Self-supervised learning (SSL) techniques have recently produced outstanding results in learning visual representations from unlabeled videos. Despite the importance of motion in supervised learning techniques for action recognition, SSL methods often do not explicitly consider motion information in videos. To address this issue, we propose MOFO (MOtion FOcused), a novel SSL method for focusing representation learning on the motion area of a video, for action recognition. MOFO automatically detects motion areas in videos and uses these to guide the self-supervision task. We use a masked autoencoder which randomly masks out a high proportion of the input sequence; we force a specified percentage of the inside of the motion area to be masked and the remainder from outside. We further incorporate motion information into the finetuning step to emphasise motion in the downstream task. We demonstrate that our motion-focused innovations can significantly boost the performance of the currently leading SSL method (VideoMAE) for action recognition. Our method improves the recent self-supervised Vision Transformer (ViT), VideoMAE, by achieving +2.6%, +2.1%, +1.3% accuracy on Epic-Kitchens verb, noun and action classification, respectively, and +4.7% accuracy on Something-Something V2 action classification. Our proposed approach significantly improves the performance of the current SSL method for action recognition, indicating the importance of explicitly encoding motion in SSL. | Translated: 2023-08-25 16:06:06 Published: 2023-08-23 |
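The motion-biased masking described in the MOFO abstract (a high overall masking ratio, with a specified share of the masked tokens forced to come from the motion area) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes a per-token boolean motion flag is already available, and the function name `motion_guided_mask` and its parameters `mask_ratio` and `inside_fraction` are hypothetical.

```python
import random

def motion_guided_mask(motion, mask_ratio=0.9, inside_fraction=0.75, seed=None):
    """Choose token indices to mask, biased toward the detected motion area.

    motion: list of bools, True where a token lies inside the motion area.
    mask_ratio: fraction of all tokens to mask (high, as in masked autoencoders).
    inside_fraction: target share of masked tokens taken from the motion area.
    (All names here are illustrative assumptions, not MOFO's API.)
    """
    rng = random.Random(seed)
    n_mask = round(mask_ratio * len(motion))
    inside = [i for i, m in enumerate(motion) if m]
    outside = [i for i, m in enumerate(motion) if not m]
    # Take the requested share from inside the motion area, capped by its size.
    n_in = min(len(inside), round(inside_fraction * n_mask))
    picked_in = rng.sample(inside, n_in)
    # Draw the remainder from outside the motion area.
    n_out = min(len(outside), n_mask - n_in)
    picked_out = rng.sample(outside, n_out)
    # If the outside region was too small, top up from unused motion tokens.
    used = set(picked_in)
    remaining_in = [i for i in inside if i not in used]
    picked_in += rng.sample(remaining_in, n_mask - n_in - n_out)
    return set(picked_in) | set(picked_out)
```

The top-up step keeps the overall masking ratio fixed even when the motion area covers most of the clip. The abstract does not say how MOFO handles that case.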
# An Intentional Forgetting-Driven Self-Healing Method For Deep Reinforcement Learning Systems ( http://arxiv.org/abs/2308.12445v1 ) License: see link | Ahmed Haj Yahmed, Rached Bouchoucha, Houssem Ben Braiek, Foutse Khomh |
Deep reinforcement learning (DRL) is increasingly applied in large-scale productions like Netflix and Facebook. As with most data-driven systems, DRL systems can exhibit undesirable behaviors due to environmental drifts, which often occur in constantly-changing production settings. Continual Learning (CL) is the inherent self-healing approach for adapting the DRL agent in response to shifts in the environment's conditions. However, successive shifts of considerable magnitude may cause the production environment to drift from its original state. Recent studies have shown that these environmental drifts tend to drive CL into long, or even unsuccessful, healing cycles, which arise from inefficiencies such as catastrophic forgetting, warm-starting failure, and slow convergence. In this paper, we propose Dr. DRL, an effective self-healing approach for DRL systems that integrates a novel mechanism of intentional forgetting into vanilla CL to overcome its main issues. Dr. DRL deliberately erases the DRL system's minor behaviors to systematically prioritize the adaptation of the key problem-solving skills. Using well-established DRL algorithms, Dr. DRL is compared with vanilla CL on various drifted environments. Dr. DRL is able to reduce, on average, the healing time and fine-tuning episodes by, respectively, 18.74% and 17.72%. Dr. DRL successfully helps agents to adapt to 19.63% of drifted environments left unsolved by vanilla CL while maintaining and even enhancing by up to 45% the obtained rewards for drifted environments that are resolved by both approaches. | Translated: 2023-08-25 16:05:40 Published: 2023-08-23 |
# TAI-GAN: Temporally and Anatomically Informed GAN for early-to-late frame conversion in dynamic cardiac PET motion correction ( http://arxiv.org/abs/2308.12443v1 ) License: see link | Xueqi Guo, Luyao Shi, Xiongchao Chen, Bo Zhou, Qiong Liu, Huidong Xie, Yi-Hwa Liu, Richard Palyo, Edward J. Miller, Albert J. Sinusas, Bruce Spottiswoode, Chi Liu, Nicha C. Dvornek |
The rapid tracer kinetics of rubidium-82 ($^{82}$Rb) and high variation of cross-frame distribution in dynamic cardiac positron emission tomography (PET) raise significant challenges for inter-frame motion correction, particularly for the early frames where conventional intensity-based image registration techniques are not applicable. Alternatively, a promising approach utilizes generative methods to handle the tracer distribution changes to assist existing registration methods. To improve frame-wise registration and parametric quantification, we propose a Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) to transform the early frames into the late reference frame using an all-to-one mapping. Specifically, a feature-wise linear modulation layer encodes channel-wise parameters generated from temporal tracer kinetics information, and rough cardiac segmentations with local shifts serve as the anatomical information. We validated our proposed method on a clinical $^{82}$Rb PET dataset and found that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, motion estimation accuracy and clinical myocardial blood flow (MBF) quantification were improved compared to using the original frames. Our code is published at https://github.com/gxq1998/TAI-GAN. | Translated: 2023-08-25 16:05:11 Published: 2023-08-23 |
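The TAI-GAN abstract mentions a feature-wise linear modulation (FiLM) layer. FiLM predicts a per-channel scale (gamma) and shift (beta) from a conditioning signal and applies `y = gamma * x + beta` to every position in that channel. The sketch below shows only that generic operation. It assumes a plain linear map from a conditioning vector (standing in for the temporal tracer-kinetics information) to gamma and beta. The weights and the names `film_params` and `film` are hypothetical and do not come from the TAI-GAN repository.

```python
def film_params(cond, w_gamma, b_gamma, w_beta, b_beta):
    """Predict per-channel (gamma, beta) from a conditioning vector with linear maps.

    w_gamma / w_beta: one weight row per channel; b_gamma / b_beta: per-channel biases.
    (Illustrative stand-in for whatever network produces the FiLM parameters.)
    """
    gamma = [sum(w * c for w, c in zip(row, cond)) + b for row, b in zip(w_gamma, b_gamma)]
    beta = [sum(w * c for w, c in zip(row, cond)) + b for row, b in zip(w_beta, b_beta)]
    return gamma, beta

def film(features, gamma, beta):
    """Apply y[c][i] = gamma[c] * x[c][i] + beta[c] to a [channels][positions] map."""
    return [[g * x + b for x in channel] for channel, g, b in zip(features, gamma, beta)]
```

Because gamma and beta depend only on the conditioning input, the same image features can be modulated differently for each frame time, which is the role the abstract assigns to the temporal information.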
# Chirality-assisted enhancement of tripartite entanglement in waveguide QED ( http://arxiv.org/abs/2308.12441v1 ) License: see link | Logan Patrick, Umar Arshad, Dingyu Guo, and Imran M. Mirza |
We study the generation and control of genuine tripartite entanglement among quantum emitters (QEs) that are side coupled to one-dimensional spin-momentum locked (or chiral) waveguides. By applying the machinery of Fock state master equations along with the recently proposed concurrence fill measure of tripartite entanglement [S. Xie and J. H. Eberly, Phys. Rev. Lett. 127, 040403 (2021)], we analyze how three-photon Gaussian wavepackets can distribute entanglement among two and three QEs. We show that with a five times larger waveguide decay rate in the right direction as compared to the left direction, the maximum value of tripartite entanglement can be elevated by 35% as compared to the symmetric scenario where both left and right direction decay rates are equal. Additionally, chirality can maintain the tripartite entanglement for longer times in comparison to the corresponding symmetric decay rate situation. Finally, we study the influence of detunings and spontaneous emission on the resulting entanglement. We envision quantum networking and long-distance quantum communication as two main areas of applications of this work. | Translated: 2023-08-25 16:04:42 Published: 2023-08-23 |
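The concurrence fill cited in the abstract is, as we understand Xie and Eberly's definition, Heron's formula applied to a triangle whose sides are the three squared one-to-other concurrences. It is normalized so that a GHZ state scores 1:

$F_{123} = \left[\tfrac{16}{3} Q (Q - C^2_{1(23)})(Q - C^2_{2(13)})(Q - C^2_{3(12)})\right]^{1/2}$, with $Q = \tfrac{1}{2}(C^2_{1(23)} + C^2_{2(13)} + C^2_{3(12)})$.

The sketch below evaluates this formula. It assumes pure three-qubit states, for which $C^2_{i(jk)} = 2(1 - \mathrm{Tr}\,\rho_i^2)$. The function names are ours, and this is not the paper's simulation code.

```python
import math

def squared_concurrence_from_purity(purity):
    """C^2_{i(jk)} = 2 * (1 - Tr(rho_i^2)) for qubit i of a pure three-qubit state."""
    return 2.0 * (1.0 - purity)

def concurrence_fill(c1_sq, c2_sq, c3_sq):
    """F = sqrt(16/3 * Q (Q - c1)(Q - c2)(Q - c3)), where Q is the half-perimeter."""
    q = 0.5 * (c1_sq + c2_sq + c3_sq)
    area_term = q * (q - c1_sq) * (q - c2_sq) * (q - c3_sq)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 16.0 / 3.0 * area_term))
```

For the W state, each single-qubit reduced state has purity 5/9. That gives squared concurrences of 8/9 each and a fill of 64/81 (about 0.79). A biseparable state with one zero side collapses the triangle and gives 0.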
# HNAS-reg: hierarchical neural architecture search for deformable medical image registration ( http://arxiv.org/abs/2308.12440v1 ) License: see link | Jiong Wu and Yong Fan |
Convolutional neural networks (CNNs) have been widely used to build deep learning models for medical image registration, but manually designed network architectures are not necessarily optimal. This paper presents a hierarchical NAS framework (HNAS-Reg), consisting of both convolutional operation search and network topology search, to identify the optimal network architecture for deformable medical image registration. To mitigate the computational overhead and memory constraints, a partial channel strategy is utilized without losing optimization quality. Experiments on three datasets, consisting of 636 T1-weighted magnetic resonance images (MRIs), have demonstrated that the proposed method can build a deep learning model with improved image registration accuracy and reduced model size, compared with state-of-the-art image registration approaches, including one representative traditional approach and two unsupervised learning-based approaches. | Translated: 2023-08-25 16:04:18 Published: 2023-08-23 |
# BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection ( http://arxiv.org/abs/2308.12439v1 ) License: see link | Tinghao Xie, Xiangyu Qi, Ping He, Yiming Li, Jiachen T. Wang, Prateek Mittal |
We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs), in which adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract backdoor functionality of a given backdoored model to a backdoor expert model. The approach is straightforward: finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, and thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert (Backdoor Input Detection with Backdoor Expert), effectively mitigates 16 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer). | Translated: 2023-08-25 16:04:01 Published: 2023-08-23 |
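The first step of the BaDExpert extraction, building "a small set of intentionally mislabeled clean samples", can be sketched as below. The abstract does not say how the wrong labels are chosen. This sketch assumes integer labels 0..K-1 and gives each sample a uniformly random label different from its true one. The function name `mislabel` is hypothetical.

```python
import random

def mislabel(samples, num_classes, seed=None):
    """Return (x, wrong_label) pairs in which every wrong_label differs from the true label.

    samples: iterable of (x, y) pairs with integer y in [0, num_classes).
    The uniform choice among wrong classes is an assumption, not BaDExpert's stated rule.
    """
    rng = random.Random(seed)
    out = []
    for x, y in samples:
        # Draw from the K-1 other classes by skipping over the true label.
        r = rng.randrange(num_classes - 1)
        wrong = r if r < y else r + 1
        out.append((x, wrong))
    return out
```

Finetuning the backdoored model on such pairs is what, according to the abstract, makes it unlearn the normal functionality while the backdoor functionality survives. The finetuning loop itself is not shown here.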
# Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion ( http://arxiv.org/abs/2308.12469v1 ) License: see link | Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco |
Producing quality segmentation masks for images is a fundamental problem in computer vision. Recent research has explored large-scale supervised training to enable zero-shot segmentation on virtually any image style and unsupervised training to enable segmentation without dense annotations. However, constructing a model capable of segmenting anything in a zero-shot manner without any annotations is still challenging. In this paper, we propose to utilize the self-attention layers in stable diffusion models to achieve this goal because the pre-trained stable diffusion model has learned inherent concepts of objects within its attention layers. Specifically, we introduce a simple yet effective iterative merging process based on measuring KL divergence among attention maps to merge them into valid segmentation masks. The proposed method does not require any training or language dependency to extract quality segmentation for any images. On COCO-Stuff-27, our method surpasses the prior unsupervised zero-shot SOTA method by an absolute 26% in pixel accuracy and 17% in mean IoU. | Translated: 2023-08-25 15:56:59 Published: 2023-08-23 |
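The abstract's idea of merging attention maps by KL divergence can be illustrated on toy data. This is a simplified sketch, not the paper's procedure. It assumes each attention map is a probability distribution over pixels, and it merges greedily in a single pass, averaging a map into an existing group when their symmetric KL divergence is below a hypothetical threshold `tau`. Each pixel is then labelled by its most-attended merged map. The paper's exact distance, threshold schedule, and number of iterations are not given in the abstract.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def sym_kl(p, q):
    return 0.5 * (kl(p, q) + kl(q, p))

def merge_attention_maps(maps, tau):
    """Single greedy pass: fold each map into the first group closer than tau."""
    groups = []  # list of (mean_map, count)
    for m in maps:
        for gi, (mean, count) in enumerate(groups):
            if sym_kl(m, mean) < tau:
                # A running average keeps the merged map a valid distribution.
                new_mean = [(mi * count + xi) / (count + 1) for mi, xi in zip(mean, m)]
                groups[gi] = (new_mean, count + 1)
                break
        else:
            groups.append((list(m), 1))
    return [mean for mean, _ in groups]

def segment(merged):
    """Label each pixel with the index of the merged map that attends to it most."""
    n_pixels = len(merged[0])
    return [max(range(len(merged)), key=lambda k: merged[k][i]) for i in range(n_pixels)]
```

In a real pipeline the maps would come from the stable diffusion self-attention layers at many anchor pixels, and the merging would be repeated until the set of proposals stabilizes.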
# Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis ( http://arxiv.org/abs/2308.12466v1 ) License: see link | Akshat Gupta |
Since the introduction of ChatGPT and GPT-4, these models have been tested across a large number of tasks. Their adeptness across domains is evident, but their aptitude in playing games and specifically their aptitude in the realm of poker has remained unexplored. Poker is a game that requires decision making under uncertainty and incomplete information. In this paper, we put ChatGPT and GPT-4 through the poker test and evaluate their poker skills. Our findings reveal that while both models display an advanced understanding of poker, encompassing concepts like the valuation of starting hands, playing positions and other intricacies of game theory optimal (GTO) poker, both ChatGPT and GPT-4 are NOT game theory optimal poker players. Through a series of experiments, we first discover the characteristics of optimal prompts and model parameters for playing poker with these models. Our observations then unveil the distinct playing personas of the two models. We first conclude that GPT-4 is a more advanced poker player than ChatGPT. This exploration then sheds light on the divergent poker tactics of the two models: ChatGPT's conservativeness juxtaposed against GPT-4's aggression. In poker vernacular, when tasked to play GTO poker, ChatGPT plays like a Nit, which means that it has a propensity to only engage with premium hands and folds a majority of hands. When subjected to the same directive, GPT-4 plays like a maniac, showcasing a loose and aggressive style of play. Both strategies, although relatively advanced, are not game theory optimal. | Translated: 2023-08-25 15:56:16 Published: 2023-08-23 |
# InverseSR: 3D Brain MRI Super-Resolution Using a Latent Diffusion Model ( http://arxiv.org/abs/2308.12465v1 ) License: see link | Jueqi Wang and Jacob Levman and Walter Hugo Lopez Pinaya and Petru-Daniel Tudosiu and M. Jorge Cardoso and Razvan Marinescu |
High-resolution (HR) MRI scans obtained from research-grade medical centers provide precise information about imaged tissues. However, routine clinical MRI scans are typically in low-resolution (LR) and vary greatly in contrast and spatial resolution due to the adjustments of the scanning parameters to the local needs of the medical center. End-to-end deep learning methods for MRI super-resolution (SR) have been proposed, but they require re-training each time there is a shift in the input distribution. To address this issue, we propose a novel approach that leverages a state-of-the-art 3D brain generative model, the latent diffusion model (LDM) trained on UK BioBank, to increase the resolution of clinical MRI scans. The LDM acts as a generative prior, which has the ability to capture the prior distribution of 3D T1-weighted brain MRI. Based on the architecture of the brain LDM, we find that different methods are suitable for different settings of MRI SR, and thus propose two novel strategies: 1) for SR with more sparsity, we invert through both the decoder of the LDM and also through a deterministic Denoising Diffusion Implicit Models (DDIM), an approach we will call InverseSR(LDM); 2) for SR with less sparsity, we invert only through the LDM decoder, an approach we will call InverseSR(Decoder). These two approaches search different latent spaces in the LDM model to find the optimal latent code to map the given LR MRI into HR. The training process of the generative model is independent of the MRI under-sampling process, ensuring the generalization of our method to many MRI SR problems with different input measurements. We validate our method on over 100 brain T1w MRIs from the IXI dataset. Our method can demonstrate that powerful priors given by LDM can be used for MRI reconstruction. | Translated: 2023-08-25 15:55:49 Published: 2023-08-23 |
# Overcoming General Knowledge Loss with Selective Parameter Finetuning ( http://arxiv.org/abs/2308.12462v1 ) License: see link | Wenxuan Zhang, Paul Janson, Rahaf Aljundi, Mohamed Elhoseiny |
Foundation models encompass an extensive knowledge base and offer remarkable transferability. However, this knowledge becomes outdated or insufficient over time. The challenge lies in updating foundation models to accommodate novel information while retaining their original ability. In this paper, we present a novel approach to achieving continual model updates by effecting localized modifications to a small subset of parameters. Guided by insights gleaned from prior analyses of foundational models, we first localize a specific layer for model refinement and then introduce an importance scoring mechanism designed to update only the most crucial weights. Our method is exhaustively evaluated on foundational vision-language models, measuring its efficacy in both learning new information and preserving pre-established knowledge across a diverse spectrum of continual learning tasks, including Aircraft, Birdsnap, CIFAR-100, CUB, Cars, and GTSRB. The results show that our method improves the existing continual learning methods by 0.5% to 10% on average, and reduces the loss of pre-trained knowledge from around 5% to 0.97%. Comprehensive ablation studies substantiate our method design, shedding light on the contributions of each component to controllably learning new knowledge and mitigating the forgetting of pre-trained knowledge. | Translated: 2023-08-25 15:55:16 Published: 2023-08-23 |
# Between understanding and control: Science as a cultural product ( http://arxiv.org/abs/2308.12461v1 ) License: see link | Flavio Del Santo |
Since the early days of humankind, people have been asking questions about Nature of two kinds: why did that happen? And how can that be used? In a broad sense, science was born that day. We show indeed that science has two complementary and interdependent souls that aim, respectively, at understanding and at controlling Nature. Through a broad historical analysis, this essay aims to (1) give an account of the development of science as an oscillation and an interplay between its two intrinsic natures, (2) demonstrate that this happened already in ancient times starting from the 6th century BC, and (3) show that the fact that in different periods one of the two natures was largely favored over the other is a consequence of science being a cultural product of the different social-historical contexts. | Translated: 2023-08-25 15:54:53 Published: 2023-08-23 |
# MixNet: Toward Accurate Detection of Challenging Scene Text in the Wild ( http://arxiv.org/abs/2308.12817v1 ) License: see link | Yu-Xiang Zeng, Jun-Wei Hsieh, Xin Li, Ming-Ching Chang |
Detecting small scene text instances in the wild is particularly challenging, where the influence of irregular positions and nonideal lighting often leads to detection errors. We present MixNet, a hybrid architecture that combines the strengths of CNNs and Transformers, capable of accurately detecting small text from challenging natural scenes, regardless of the orientations, styles, and lighting conditions. MixNet incorporates two key modules: (1) the Feature Shuffle Network (FSNet) to serve as the backbone and (2) the Central Transformer Block (CTBlock) to exploit the 1D manifold constraint of the scene text. We first introduce a novel feature shuffling strategy in FSNet to facilitate the exchange of features across multiple scales, generating high-resolution features superior to popular ResNet and HRNet. The FSNet backbone has achieved significant improvements over many existing text detection methods, including PAN, DB, and FAST. Then we design a complementary CTBlock to leverage center line based features similar to the medial axis of text regions and show that it can outperform contour-based approaches in challenging cases when small scene texts appear closely. Extensive experimental results show that MixNet, which mixes FSNet with CTBlock, achieves state-of-the-art results on multiple scene text detection datasets. | Translated: 2023-08-25 13:45:17 Published: 2023-08-23 |
# LCANets++: Robust Audio Classification using Multi-layer Neural Networks with Lateral Competition ( http://arxiv.org/abs/2308.12882v1 ) License: see link | Sayanton V. Dibbo, Juston S. Moore, Garrett T. Kenyon, Michael A. Teti |
Audio classification aims at recognizing audio signals, including speech commands or sound events. However, current audio classifiers are susceptible to perturbations and adversarial attacks. In addition, real-world audio classification tasks often suffer from limited labeled data. To help bridge these gaps, previous work developed neuro-inspired convolutional neural networks (CNNs) with sparse coding via the Locally Competitive Algorithm (LCA) in the first layer (i.e., LCANets) for computer vision. LCANets learn in a combination of supervised and unsupervised learning, reducing dependency on labeled samples. Motivated by the fact that the auditory cortex is also sparse, we extend LCANets to audio recognition tasks and introduce LCANets++, which are CNNs that perform sparse coding in multiple layers via LCA. We demonstrate that LCANets++ are more robust than standard CNNs and LCANets against perturbations, e.g., background noise, as well as black-box and white-box attacks, e.g., evasion and fast gradient sign (FGSM) attacks. | Translated: 2023-08-25 13:26:49 Published: 2023-08-23 |
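The Locally Competitive Algorithm (LCA) named in the LCANets++ abstract is a sparse-coding dynamical system. In its common formulation, internal potentials u evolve as τ du/dt = Φᵀx − u − (ΦᵀΦ − I)a, and the sparse code is a = T_λ(u) for a thresholding function T_λ. The sketch below integrates these dynamics with Euler steps on plain lists. The step size, τ, λ, the step count, and the use of soft (rather than hard) thresholding are all assumptions. This is not the LCANets++ implementation, which applies LCA inside convolutional layers.

```python
import math

def soft_threshold(u, lam):
    """Shrink each potential toward zero by lam; values within [-lam, lam] become 0."""
    return [math.copysign(abs(v) - lam, v) if abs(v) > lam else 0.0 for v in u]

def lca(x, phi, lam=0.1, tau=10.0, dt=1.0, steps=500):
    """Sparse-code x with a dictionary phi (a list of atoms, each the same length as x)."""
    n = len(phi)
    # Driving input b = Phi^T x and lateral-inhibition matrix G = Phi^T Phi - I.
    b = [sum(p * xi for p, xi in zip(atom, x)) for atom in phi]
    g = [[sum(p * q for p, q in zip(phi[i], phi[j])) - (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    u = [0.0] * n
    for _ in range(steps):
        a = soft_threshold(u, lam)
        # Active units inhibit units whose atoms overlap with theirs.
        inhibition = [sum(g[i][j] * a[j] for j in range(n)) for i in range(n)]
        u = [ui + (dt / tau) * (bi - ui - inh) for ui, bi, inh in zip(u, b, inhibition)]
    return soft_threshold(u, lam)
```

With an orthonormal dictionary there is no competition, and LCA converges to soft thresholding of Φᵀx. With two identical atoms, lateral inhibition splits the response between them (0.45 each for x = [1, 0] and λ = 0.1) instead of letting both fire at 0.9. That competition is what the "lateral competition" in the title refers to.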
# Diagnosing Infeasible Optimization Problems Using Large Language Models ( http://arxiv.org/abs/2308.12923v1 ) License: see link | Hao Chen, Gonzalo E. Constante-Flores, Can Li |
Decision-making problems can be represented as mathematical optimization models, finding wide applications in fields such as economics, engineering and manufacturing, transportation, and health care. Optimization models are mathematical abstractions of the problem of making the best decision while satisfying a set of requirements or constraints. One of the primary barriers to deploying these models in practice is the challenge of helping practitioners understand and interpret such models, particularly when they are infeasible, meaning no decision satisfies all the constraints. Existing methods for diagnosing infeasible optimization models often rely on expert systems, necessitating significant background knowledge in optimization. In this paper, we introduce OptiChat, a first-of-its-kind natural language-based system equipped with a chatbot GUI for engaging in interactive conversations about infeasible optimization models. OptiChat can provide natural language descriptions of the optimization model itself, identify potential sources of infeasibility, and offer suggestions to make the model feasible. The implementation of OptiChat is built on GPT-4, which interfaces with an optimization solver to identify the minimal subset of constraints that render the entire optimization problem infeasible, also known as the Irreducible Infeasible Subset (IIS). We utilize few-shot learning, expert chain-of-thought, key-retrieve, and sentiment prompts to enhance OptiChat's reliability. Our experiments demonstrate that OptiChat assists both expert and non-expert users in improving their understanding of the optimization models, enabling them to quickly identify the sources of infeasibility. | Translated: 2023-08-25 13:04:45 Published: 2023-08-23 |
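An Irreducible Infeasible Subset (IIS), as defined in the abstract, is a set of constraints that is infeasible but becomes feasible if any single constraint is removed. The classic deletion filter finds one IIS with repeated feasibility checks: try dropping each constraint in turn, and discard it for good if the rest stays infeasible. The sketch below applies that idea to toy bound constraints on a single variable, so feasibility is trivial to test. The constraint representation and function names are ours. The abstract says OptiChat obtains the IIS from an optimization solver, not from code like this.

```python
def feasible(constraints):
    """Constraints are ('>=', a) or ('<=', b) bounds on one real variable x."""
    lo = max((v for op, v in constraints if op == ">="), default=float("-inf"))
    hi = min((v for op, v in constraints if op == "<="), default=float("inf"))
    return lo <= hi

def deletion_filter(constraints):
    """Return one irreducible infeasible subset of an infeasible constraint list."""
    if feasible(constraints):
        raise ValueError("constraint set is feasible; there is no IIS")
    keep = list(range(len(constraints)))
    for i in range(len(constraints)):
        trial = [j for j in keep if j != i]
        # If the set stays infeasible without constraint i, i is not needed.
        if not feasible([constraints[j] for j in trial]):
            keep = trial
    return [constraints[j] for j in keep]
```

For example, `x >= 5`, `x <= 10`, `x <= 3`, `x >= 0` reduces to the conflicting pair `x >= 5`, `x <= 3`. That pair is exactly the kind of minimal explanation a user needs in order to decide which requirement to relax.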
# Multi-Directional Subspace Editing in Style-Space ( http://arxiv.org/abs/2211.11825v3 ) License: see link | Chen Naveh and Yacov Hel-Or |
This paper describes a new technique for finding disentangled semantic directions in the latent space of StyleGAN. Our method identifies meaningful orthogonal subspaces that allow editing of one human face attribute, while minimizing undesired changes in other attributes. Our model is capable of editing a single attribute in multiple directions, resulting in a range of possible generated images. We compare our scheme with three state-of-the-art models and show that our method outperforms them in terms of face editing and disentanglement capabilities. Additionally, we suggest quantitative measures for evaluating attribute separation and disentanglement, and exhibit the superiority of our model with respect to those measures. | Translated: 2023-08-25 10:56:54 Published: 2023-08-23 |
# SeamlessM4T-Massively Multilingual & Multimodal Machine Translation ( http://arxiv.org/abs/2308.11596v2 ) License: see link | Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang |
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. Filtered and combined with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication | Translated: 2023-08-25 10:49:42 Published: 2023-08-23 |
# Wasserstein Geodesic Generator for Conditional Distributions ( http://arxiv.org/abs/2308.10145v2 ) License: see link | Young-geun Kim, Kyungbok Lee, Youngwon Choi, Joong-Ho Won, Myunghee Cho Paik |
Generating samples given a specific label requires estimating conditional distributions. We derive a tractable upper bound of the Wasserstein distance between conditional distributions to lay the theoretical groundwork to learn conditional distributions. Based on this result, we propose a novel conditional generation algorithm where conditional distributions are fully characterized by a metric space defined by a statistical distance. We employ optimal transport theory to propose the Wasserstein geodesic generator, a new conditional generator that learns the Wasserstein geodesic. The proposed method learns both conditional distributions for observed domains and optimal transport maps between them. The conditional distributions given unobserved intermediate domains are on the Wasserstein geodesic between conditional distributions given two observed domain labels. Experiments on face images with light conditions as domain labels demonstrate the efficacy of the proposed method. | Translated: 2023-08-25 10:46:04 Published: 2023-08-23 |
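For intuition about conditional distributions for "unobserved intermediate domains" lying "on the Wasserstein geodesic", a textbook special case has a closed form. Between two one-dimensional Gaussians N(m0, s0²) and N(m1, s1²), the 2-Wasserstein geodesic stays Gaussian, with mean and standard deviation interpolated linearly in t, and W₂ = sqrt((m0 − m1)² + (s0 − s1)²). The sketch below uses only that special case. It is not the paper's generator, which learns geodesics between high-dimensional image distributions. The function names are ours.

```python
import math

def w2_gaussian_1d(m0, s0, m1, s1):
    """2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2)."""
    return math.hypot(m0 - m1, s0 - s1)

def geodesic_point(m0, s0, m1, s1, t):
    """(mean, std) at time t in [0, 1] on the W2 geodesic (displacement interpolation)."""
    return (1 - t) * m0 + t * m1, (1 - t) * s0 + t * s1
```

The geodesic has constant speed: the point at t = 0.25 between N(0, 1) and N(4, 3) is N(1, 1.5²), exactly a quarter of the total W₂ distance from the start. In the paper's setting, an intermediate domain label plays the role of t.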
# 頭はクラウドの中: 大学のパブリッククラウドへの移行がもたらす影響の計測 Heads in the Clouds: Measuring the Implications of Universities Migrating to Public Clouds ( http://arxiv.org/abs/2104.09462v4 ) ライセンス: Link先を確認 | Tobias Fiebig, Seda Gürses, Carlos H. Gañán, Erna Kotkamp, Fernando Kuipers, Martina Lindorfer, Menghua Prisse, Taritha Sari | (参考訳) 新型コロナウイルス(COVID-19)による大学での遠隔教育・遠隔勤務の広がりに伴い、高等教育の「Zoom化」、すなわち大学のクラウドへの移行が公の議論に上るようになった。
次に、結果を分析・解釈し、その影響が個人のプライバシーを超えて、学術的独立性と誠実性の問題にまで及ぶことを見出した。 With the emergence of remote education and work in universities due to COVID-19, the 'zoomification' of higher education, i.e., the migration of universities to the clouds, reached the public discourse. Ongoing discussions reason about how this shift will take control over students' data away from universities, and may ultimately harm the privacy of researchers and students alike. However, there has been no comprehensive measurement of universities' use of public clouds and reliance on Software-as-a-Service offerings to assess how far this migration has already progressed. We perform a longitudinal study of the migration to public clouds among universities in the U.S. and Europe, as well as institutions listed in the Times Higher Education (THE) Top100 between January 2015 and October. We find that cloud adoption differs between countries, with one cluster (Germany, France, Austria, Switzerland) showing a limited move to clouds, while the other (U.S., U.K., the Netherlands, THE Top100) frequently outsources universities' core functions and services, starting long before the COVID-19 pandemic. We attribute this clustering to several socio-economic factors in the respective countries, including the general culture of higher education and the administrative paradigm taken towards running universities. We then analyze and interpret our results, finding that the implications reach beyond individuals' privacy towards questions of academic independence and integrity. | 翻訳日:2023-08-24 19:34:38 公開日:2023-08-23 |
# 転移学習に関する一般的な直観は成功することも失敗することもある: 線形回帰のケーススタディ The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression ( http://arxiv.org/abs/2103.05621v3 ) ライセンス: Link先を確認 | Yehuda Dar, Daniel LeJeune, Richard G. Baraniuk | (参考訳) 本研究では、データサンプル数よりも学習パラメータ数が多い過剰パラメータ化設定を含め、ソースの線形回帰タスクからターゲットの線形回帰タスクへの基本的な転移学習プロセスを検討する。
ターゲットタスクに対する転移学習アプローチを、これから学習するターゲットパラメータと学習済みのソースパラメータとの距離に正則化を課した線形回帰最適化として定義する。
転移学習設定に対する線形MMSE解を定式化し、転移学習の一般的な設計思想との主な違いを指摘する。 We study a fundamental transfer learning process from source to target linear regression tasks, including overparameterized settings where there are more learned parameters than data samples. The target task learning is addressed by using its training data together with the parameters previously computed for the source task. We define a transfer learning approach to the target task as a linear regression optimization with a regularization on the distance between the to-be-learned target parameters and the already-learned source parameters. We analytically characterize the generalization performance of our transfer learning approach and demonstrate its ability to resolve the peak in generalization errors in double descent phenomena of the minimum L2-norm solution to linear regression. Moreover, we show that for sufficiently related tasks, the optimally tuned transfer learning approach can outperform the optimally tuned ridge regression method, even when the true parameter vector conforms to an isotropic Gaussian prior distribution. Namely, we demonstrate that transfer learning can beat the minimum mean square error (MMSE) solution of the independent target task. Our results emphasize the ability of transfer learning to extend the solution space to the target task and, by that, to have an improved MMSE solution. We formulate the linear MMSE solution to our transfer learning setting and point out its key differences from the common design philosophy to transfer learning. | 翻訳日:2023-08-24 19:34:11 公開日:2023-08-23 |
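The regularized objective described in this abstract has a closed form: substituting delta = theta - theta_src turns it into ridge regression on the residual y - X theta_src. The sketch below assumes a plain squared loss and a scalar penalty weight lam (the names are ours), and solves the normal equations (X^T X + lam I) theta = X^T y + lam theta_src. It shows the form of the estimator only, not the paper's generalization analysis or its tuning of lam.

```python
def solve(A, b):
    """Solve A x = b for a square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def transfer_ridge(X, y, theta_src, lam):
    """argmin_theta ||y - X theta||^2 + lam * ||theta - theta_src||^2  (lam > 0)."""
    d = len(theta_src)
    lhs = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    for i in range(d):
        lhs[i][i] += lam  # X^T X + lam I
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) + lam * theta_src[i] for i in range(d)]
    return solve(lhs, rhs)

# Overparameterized toy case: one sample, two parameters (hypothetical numbers).
X, y, theta_src = [[1.0, 1.0]], [2.0], [1.0, 0.8]
print(transfer_ridge(X, y, theta_src, 1e-6))  # approximately [1.1, 0.9]
print(transfer_ridge(X, y, theta_src, 1e6))   # approximately [1.0, 0.8]
```

With a tiny lam the result is the interpolating solution closest to theta_src, which generalizes the minimum-norm solution (the case theta_src = 0); with a huge lam it collapses to theta_src itself.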
# 構造化スパンセレクタ A Structured Span Selector ( http://arxiv.org/abs/2205.03977v3 ) ライセンス: Link先を確認 | Tianyu Liu, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan | (参考訳) 共参照解決や意味役割付与など、多くの自然言語処理タスクでは、テキストスパンを選択し、それらについて判断を下す必要がある。
両タスクで実証的な改善を示す。 Many natural language processing tasks, e.g., coreference resolution and semantic role labeling, require selecting text spans and making decisions about them. A typical approach to such tasks is to score all possible spans and greedily select spans for task-specific downstream processing. This approach, however, does not incorporate any inductive bias about what sort of spans ought to be selected, e.g., that selected spans tend to be syntactic constituents. In this paper, we propose a novel grammar-based structured span selection model which learns to make use of the partial span-level annotation provided for such problems. Compared to previous approaches, our approach gets rid of the heuristic greedy span selection scheme, allowing us to model the downstream task on an optimal set of spans. We evaluate our model on two popular span prediction tasks: coreference resolution and semantic role labeling. We show empirical improvements on both. | 翻訳日:2023-08-24 19:29:29 公開日:2023-08-23 |
# 多様体上のmin-max最適化のためのリーマンハミルトニアン法 Riemannian Hamiltonian methods for min-max optimization on manifolds ( http://arxiv.org/abs/2204.11418v2 ) ライセンス: Link先を確認 | Andi Han, Bamdev Mishra, Pratik Jawanpuria, Pawan Kumar, Junbin Gao | (参考訳) 本稿では,リーマン多様体上のmin-max最適化問題について検討する。
リーマン・ハミルトニアン関数を導入し、その最小化が元のmin-max問題を解くための代理(プロキシ)として機能する。
提案するRHMの有効性を、部分空間ロバストWasserstein距離、ニューラルネットワークのロバスト学習、敵対的生成ネットワークなどの応用で示す。 In this paper, we study min-max optimization problems on Riemannian manifolds. We introduce a Riemannian Hamiltonian function, minimization of which serves as a proxy for solving the original min-max problems. Under the Riemannian Polyak-Łojasiewicz condition on the Hamiltonian function, its minimizer corresponds to the desired min-max saddle point. We also provide cases where this condition is satisfied. For geodesic-bilinear optimization in particular, solving the proxy problem leads to the correct search direction towards global optimality, which becomes challenging with the min-max formulation. To minimize the Hamiltonian function, we propose Riemannian Hamiltonian methods (RHM) and present their convergence analyses. We extend RHM to include consensus regularization and to the stochastic setting. We illustrate the efficacy of the proposed RHM in applications such as subspace robust Wasserstein distance, robust training of neural networks, and generative adversarial networks. | 翻訳日:2023-08-24 19:29:14 公開日:2023-08-23 |
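A flat-space toy example shows why minimizing a Hamiltonian can succeed where plain gradient descent-ascent (GDA) fails. For the bilinear f(x, y) = x * y, simultaneous GDA spirals outward, while gradient descent on H = 0.5 * ||grad f||^2 = 0.5 * (x^2 + y^2) converges to the saddle point (0, 0). This sketch assumes Euclidean space and a hand-picked step size; the paper's RHM operates on Riemannian manifolds, and its details are not reproduced here.

```python
import math

def gda(x, y, lr, steps):
    """Simultaneous gradient descent (in x) / ascent (in y) on f(x, y) = x * y."""
    for _ in range(steps):
        gx, gy = y, x                      # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy    # each step scales the radius by sqrt(1 + lr^2)
    return x, y

def hamiltonian_descent(x, y, lr, steps):
    """Gradient descent on H(x, y) = 0.5 * ||grad f||^2 = 0.5 * (y^2 + x^2)."""
    for _ in range(steps):
        hx, hy = x, y                      # dH/dx = x, dH/dy = y
        x, y = x - lr * hx, y - lr * hy    # each step scales the radius by (1 - lr)
    return x, y

r0 = math.hypot(1.0, 1.0)
r_gda = math.hypot(*gda(1.0, 1.0, 0.1, 100))
r_ham = math.hypot(*hamiltonian_descent(1.0, 1.0, 0.1, 100))
print(r_gda > r0, r_ham < 1e-3)  # True True
```

For a general f, the gradient of H is the Hessian of f applied to grad f, so each step needs a Hessian-vector product rather than the identity used in this bilinear case.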
# 重力波サロゲートモデリングのための深層残差誤差学習とバッグ・オブ・トリックス学習 Deep Residual Error and Bag-of-Tricks Learning for Gravitational Wave Surrogate Modeling ( http://arxiv.org/abs/2203.08434v2 ) ライセンス: Link先を確認 | Styliani-Christina Fragkouli, Paraskevi Nousi, Nikolaos Passalis, Panagiotis Iosif, Nikolaos Stergioulas, Anastasios Tefas | (参考訳) 深層学習手法は重力波天文学において、スピンが揃ったブラックホール連星のインスパイラルに対するサロゲート波形の構築を高速化するためなどに用いられている。
より一般的なサロゲート波形モデル(例えば離心率を含む場合)の残差誤差も特定の構造を持つ可能性があるため、本手法は、精度の向上が計算時間の大幅な削減につながるようなケースにも適用できると期待される。 Deep learning methods have been employed in gravitational-wave astronomy to accelerate the construction of surrogate waveforms for the inspiral of spin-aligned black hole binaries, among other applications. We face the challenge of modeling the residual error of an artificial neural network that models the coefficients of the surrogate waveform expansion (especially those of the phase of the waveform) which we demonstrate has sufficient structure to be learnable by a second network. Adding this second network, we were able to reduce the maximum mismatch for waveforms in a validation set by 13.4 times. We also explored several other ideas for improving the accuracy of the surrogate model, such as the exploitation of similarities between waveforms, the augmentation of the training set, the dissection of the input space, using dedicated networks per output coefficient and output augmentation. In several cases, small improvements can be observed, but the most significant improvement still comes from the addition of a second network that models the residual error. Since the residual error for more general surrogate waveform models (when e.g., eccentricity is included) may also have a specific structure, one can expect our method to be applicable to cases where the gain in accuracy could lead to significant gains in computational time. | 翻訳日:2023-08-24 19:28:55 公開日:2023-08-23 |
# 有限温度における少数フェルミオン系の非従来型ペアリング Unconventional pairing in few-fermion systems at finite temperature ( http://arxiv.org/abs/2202.07639v3 ) ライセンス: Link先を確認 | Daniel Pęcak and Tomasz Sowiński | (参考訳) 1次元調和トラップに閉じ込められた、引力相互作用する2成分フェルミオン混合系を調べる。
異なる不均衡をもつ系について計算を行うことで、相図上の2つの相の間のおおよその境界を決定する。 Attractively interacting two-component mixtures of fermionic particles confined in a one-dimensional harmonic trap are investigated. Properties of balanced and imbalanced systems are systematically explored with the exact diagonalization approach, focusing on the finite-temperature effects. Using single- and two-particle density distributions, specific non-classical pairing correlations are analyzed in terms of the noise correlations, a quantity directly accessible in state-of-the-art experiments with ultra-cold atoms. It is shown that along with increasing temperature, any imbalanced system hosting Fulde-Ferrell-Larkin-Ovchinnikov pairs crosses over to a standard Bardeen-Cooper-Schrieffer one characterized by zero net momentum of resulting pairs. By performing calculations for systems with different imbalances, the approximate boundary between the two phases on a phase diagram is determined. | 翻訳日:2023-08-24 19:28:35 公開日:2023-08-23 |
# 異常検出によるフェデレーション学習におけるバックドア攻撃の同定 Identifying Backdoor Attacks in Federated Learning via Anomaly Detection ( http://arxiv.org/abs/2202.04311v2 ) ライセンス: Link先を確認 | Yuxi Mi, Yiheng Sun, Jihong Guan, Shuigeng Zhou | (参考訳) フェデレートラーニング(Federated Learning)は、データプライバシの規制要求の増加に対応して、近年採用が増加している。
提案手法は,タスクユーティリティに最小限の影響を伴って,最先端のバックドア攻撃を効果的に軽減する。 Federated learning has seen increased adoption in recent years in response to the growing regulatory demand for data privacy. However, the opaque local training process of federated learning also sparks rising concerns about model faithfulness. For instance, studies have revealed that federated learning is vulnerable to backdoor attacks, whereby a compromised participant can stealthily modify the model's behavior in the presence of backdoor triggers. This paper proposes an effective defense against the attack by examining shared model updates. We begin with the observation that the embedding of backdoors influences the participants' local model weights in terms of the magnitude and orientation of their model gradients, which can manifest as distinguishable disparities. We enable a robust identification of backdoors by studying the statistical distribution of the models' subsets of gradients. Concretely, we first segment the model gradients into fragment vectors that represent small portions of model parameters. We then employ anomaly detection to locate the distributionally skewed fragments and prune the participants with the most outliers. We embody the findings in a novel defense method, ARIBA. We demonstrate through extensive analyses that our proposed methods effectively mitigate state-of-the-art backdoor attacks with minimal impact on task utility. | 翻訳日:2023-08-24 19:28:20 公開日:2023-08-23 |
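The fragment idea in this abstract can be sketched as follows. We assume each client update is a flat list of floats, fragments are contiguous chunks of a fixed size, and a median/MAD rule with threshold k = 3 flags a fragment norm as an outlier. These choices are ours for illustration; ARIBA's actual statistics and pruning rule may differ.

```python
import statistics

def fragment_norms(update, size):
    """Split a flat update vector into chunks of `size` and return each chunk's L2 norm."""
    return [sum(v * v for v in update[i:i + size]) ** 0.5 for i in range(0, len(update), size)]

def count_outliers(updates, size, k=3.0):
    """For every fragment position, flag clients whose norm has a large robust z-score."""
    norms = [fragment_norms(u, size) for u in updates]
    counts = [0] * len(updates)
    for j in range(len(norms[0])):
        col = [n[j] for n in norms]
        med = statistics.median(col)
        mad = statistics.median([abs(v - med) for v in col]) or 1e-12  # avoid division by zero
        for c, v in enumerate(col):
            if abs(v - med) / mad > k:
                counts[c] += 1
    return counts

def prune(updates, size):
    """Toy single-round policy: drop the client with the most outlying fragments."""
    counts = count_outliers(updates, size)
    worst = max(range(len(updates)), key=counts.__getitem__)
    return [u for i, u in enumerate(updates) if i != worst], worst

# Five benign clients with similar updates and one client with an inflated second half.
updates = [[1.0 + 0.01 * i] * 8 for i in range(5)] + [[1.0] * 4 + [5.0] * 4]
kept, worst = prune(updates, 4)
print(worst, len(kept))  # 5 5
```

Counting outlying fragments per client, instead of comparing whole-update norms, is what lets a localized backdoor perturbation stand out even when the overall update size looks ordinary.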
# 雑音ロバスト確率勾配最適化のための適応型t分布推定ロバストモーメント AdaTerm: Adaptive T-Distribution Estimated Robust Moments for Noise-Robust Stochastic Gradient Optimization ( http://arxiv.org/abs/2201.06714v3 ) ライセンス: Link先を確認 | Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi and Takamitsu Matsubara | (参考訳) ディープラーニングアプリケーションの実用性が向上するにつれ、測定誤差やラベルミス、最適化結果に悪影響を及ぼす可能性のある推定サロゲート入力/出力など、さまざまなソースからのノイズによって、実践者は必然的にデータセットに直面することになる。
提案手法は, ハイパーパラメータの低減やロバスト性の向上, 適応性の向上など, 従来の手法よりもいくつかの利点がある。
さらに、AMSGradに依存せずに理論的なリグレット上界を導出する新しい手法を導入し、この分野に価値ある貢献をもたらす。 With the increasing practicality of deep learning applications, practitioners are inevitably faced with datasets corrupted by noise from various sources such as measurement errors, mislabeling, and estimated surrogate inputs/outputs that can adversely impact the optimization results. It is a common practice to improve the optimization algorithm's robustness to noise, since this algorithm is ultimately in charge of updating the network parameters. Previous studies revealed that the first-order moment used in Adam-like stochastic gradient descent optimizers can be modified based on the Student's t-distribution. While this modification led to noise-resistant updates, the other associated statistics remained unchanged, resulting in inconsistencies in the assumed models. In this paper, we propose AdaTerm, a novel approach that incorporates the Student's t-distribution to derive not only the first-order moment but also all the associated statistics. This provides a unified treatment of the optimization process, offering a comprehensive framework under the statistical model of the t-distribution for the first time. The proposed approach offers several advantages over previously proposed approaches, including reduced hyperparameters and improved robustness and adaptability. This noise-adaptive behavior contributes to AdaTerm's exceptional learning performance, as demonstrated through various optimization problems with different and/or unknown noise ratios. Furthermore, we introduce a new technique for deriving a theoretical regret bound without relying on AMSGrad, providing a valuable contribution to the field. | 翻訳日:2023-08-24 19:28:01 公開日:2023-08-23 |
# 内在的フィードバックによるインタラクティブ強化学習に向けて Towards Interactive Reinforcement Learning with Intrinsic Feedback ( http://arxiv.org/abs/2112.01575v3 ) ライセンス: Link先を確認 | Benjamin Poole and Minwoo Lee | (参考訳) 強化学習(RL)と脳-コンピュータインターフェース(BCI)は、過去10年間で大きな成長を遂げてきた。
そこで、より深い理解とより効果的な活用を促すため、内在的フィードバックの動機、アプローチ、未解決問題、およびその基礎となる概念を扱うチュートリアル形式のレビューを提供する。 Reinforcement learning (RL) and brain-computer interfaces (BCI) have experienced significant growth over the past decade. With rising interest in human-in-the-loop (HITL), incorporating human input with RL algorithms has given rise to the sub-field of interactive RL. Adjacently, the field of BCI has long been interested in extracting informative brain signals from neural activity for use in human-computer interactions. A key link between these fields lies in the interpretation of neural activity as feedback such that interactive RL approaches can be employed. We denote this new and emerging medium of feedback as intrinsic feedback. Despite intrinsic feedback's ability to be conveyed automatically and even unconsciously, proper exploration surrounding this key link has largely gone unaddressed by both communities. Thus, to help facilitate a deeper understanding and a more effective utilization, we provide a tutorial-style review covering the motivations, approaches, and open problems of intrinsic feedback and its foundational concepts. | 翻訳日:2023-08-24 19:27:37 公開日:2023-08-23 |
# ストリームを用いたタスクおよびモーションプランニングにおける探索学習 Learning to Search in Task and Motion Planning with Streams ( http://arxiv.org/abs/2111.13144v6 ) ライセンス: Link先を確認 | Mohamed Khodeir and Ben Agro and Florian Shkurti | (参考訳) ロボットのタスク計画問題と動作計画問題は、離散的なタスク変数上のシンボリック計画と、連続状態とアクション変数に対する動作最適化を組み合わせる。
また,ブロックスタッキング操作タスクにおいて,このアルゴリズムを7DOFロボットアームに適用する。 Task and motion planning problems in robotics combine symbolic planning over discrete task variables with motion optimization over continuous state and action variables. Recent works such as PDDLStream have focused on optimistic planning with an incrementally growing set of objects until a feasible trajectory is found. However, this set is exhaustively expanded in a breadth-first manner, regardless of the logical and geometric structure of the problem at hand, which makes long-horizon reasoning with large numbers of objects prohibitively time-consuming. To address this issue, we propose a geometrically informed symbolic planner that expands the set of objects and facts in a best-first manner, prioritized by a Graph Neural Network that is learned from prior search computations. We evaluate our approach on a diverse set of problems and demonstrate an improved ability to plan in difficult scenarios. We also apply our algorithm on a 7DOF robotic arm in block-stacking manipulation tasks. | 翻訳日:2023-08-24 19:27:18 公開日:2023-08-23 |
# より効果的な半教師付き学習のための教師なし選択的ラベリング Unsupervised Selective Labeling for More Effective Semi-Supervised Learning ( http://arxiv.org/abs/2110.03006v4 ) ライセンス: Link先を確認 | Xudong Wang, Long Lian, Stella X. Yu | (参考訳) ラベルなしデータセットとアノテーション予算が与えられたとき、部分的にラベル付けされたデータセット上での半教師付き学習(SSL)が最も効果的になるよう、固定数のインスタンスを選択的にラベル付けする方法を研究する。
直感的には、下流タスクが何であれ、ラベル付けすべきインスタンスは代表的かつ多様でなければならない。前者はラベルなしデータへのラベル伝播を促進し、後者はデータセット全体のカバレッジを保証する。
例えば、ラベル付きデータ0.08% (0.2%) でCIFAR-10 (ImageNet-1K) におけるFixMatchの精度を10% (14%) 向上させており、特にアノテーション予算が少ない場合に、ラベル付けするデータの選択に費やすわずかな計算が大きな利益をもたらすことを示している。
私たちの仕事は、実用的で効率的なSSLの新しい標準を設定します。 Given an unlabeled dataset and an annotation budget, we study how to selectively label a fixed number of instances so that semi-supervised learning (SSL) on such a partially labeled dataset is most effective. We focus on selecting the right data to label, in addition to usual SSL's propagating labels from labeled data to the rest unlabeled data. This instance selection task is challenging, as without any labeled data we do not know what the objective of learning should be. Intuitively, no matter what the downstream task is, instances to be labeled must be representative and diverse: The former would facilitate label propagation to unlabeled data, whereas the latter would ensure coverage of the entire dataset. We capture this idea by selecting cluster prototypes, either in a pretrained feature space, or along with feature optimization, both without labels. Our unsupervised selective labeling consistently improves SSL methods over state-of-the-art active learning given labeled data, by 8 to 25 times in label efficiency. For example, it boosts FixMatch by 10% (14%) in accuracy on CIFAR-10 (ImageNet-1K) with 0.08% (0.2%) labeled data, demonstrating that small computation spent on selecting what data to label brings significant gain especially under a low annotation budget. Our work sets a new standard for practical and efficient SSL. | 翻訳日:2023-08-24 19:27:05 公開日:2023-08-23 |
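Selecting cluster prototypes in a feature space can be sketched with plain k-means: cluster the (assumed precomputed) feature vectors into as many clusters as the labeling budget, then label the sample nearest each center. The function names and the farthest-point initialization are assumptions for illustration; the paper's procedure includes further ingredients that are not shown here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: sq_dist(p, centers[j]))].append(p)
        # Keep the old center if a cluster becomes empty.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers

def select_to_label(features, budget):
    """Indices of the samples closest to each cluster center (at most `budget` of them)."""
    centers = kmeans(features, budget)
    return sorted({min(range(len(features)), key=lambda i: sq_dist(features[i], c))
                   for c in centers})

features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(select_to_label(features, 2))  # [0, 3]: one representative per cluster
```

Picking the sample nearest each center serves the "representative" requirement, while spreading the centers across the data serves the "diverse" one.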
# 集約グラフニューラルネットワークの安定性 Stability of Aggregation Graph Neural Networks ( http://arxiv.org/abs/2207.03678v2 ) ライセンス: Link先を確認 | Alejandro Parada-Mayorga, Zhiyang Wang, Fernando Gama, and Alejandro Ribeiro | (参考訳) 本稿では,グラフの摂動を考慮したアグリゲーショングラフニューラルネットワーク(Agg-GNN)の安定性について検討する。
本稿では,Agg-GNNの動作を,異なる大きさの摂動を考慮した実生活応用シナリオで検証する。 In this paper we study the stability properties of aggregation graph neural networks (Agg-GNNs) considering perturbations of the underlying graph. An Agg-GNN is a hybrid architecture where information is defined on the nodes of a graph, but it is processed block-wise by Euclidean CNNs on the nodes after several diffusions on the graph shift operator. We derive stability bounds for the mapping operator associated to a generic Agg-GNN, and we specify conditions under which such operators can be stable to deformations. We prove that the stability bounds are defined by the properties of the filters in the first layer of the CNN that acts on each node. Additionally, we show that there is a close relationship between the number of aggregations, the filter's selectivity, and the size of the stability constants. We also conclude that in Agg-GNNs the selectivity of the mapping operators is tied to the properties of the filters only in the first layer of the CNN stage. This shows a substantial difference with respect to the stability properties of selection GNNs, where the selectivity of the filters in all layers is constrained by their stability. We provide numerical evidence corroborating the results derived, testing the behavior of Agg-GNNs in real life application scenarios considering perturbations of different magnitude. | 翻訳日:2023-08-24 19:21:00 公開日:2023-08-23 |
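The aggregation step behind an Agg-GNN can be sketched in a few lines. At one node, we collect the values of x, Sx, S^2 x, ... produced by repeatedly applying the graph shift operator S, then run an ordinary 1-D convolution (a CNN's first layer) over that sequence. The path graph, signal, and filter taps below are arbitrary assumptions for illustration.

```python
def mat_vec(S, x):
    """Apply the graph shift operator S to the graph signal x."""
    return [sum(S[i][j] * x[j] for j in range(len(x))) for i in range(len(S))]

def aggregation_sequence(S, x, node, K):
    """Return [x[node], (S x)[node], (S^2 x)[node], ..., (S^(K-1) x)[node]]."""
    seq, z = [], list(x)
    for _ in range(K):
        seq.append(z[node])
        z = mat_vec(S, z)
    return seq

def conv1d_valid(seq, h):
    """1-D 'valid' convolution (cross-correlation, as in CNN layers) with filter taps h."""
    L = len(h)
    return [sum(h[k] * seq[n + k] for k in range(L)) for n in range(len(seq) - L + 1)]

S = [[0.0, 1.0, 0.0],      # adjacency matrix of the path graph 0 - 1 - 2
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
seq = aggregation_sequence(S, [1.0, 0.0, 0.0], node=2, K=4)
print(seq)                             # [0.0, 0.0, 1.0, 0.0]
print(conv1d_valid(seq, [1.0, -1.0]))  # [0.0, -1.0, 1.0]
```

Because only the first convolution touches the diffusion axis directly, it is plausible that the stability bounds summarized above hinge on that first-layer filter.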
# 適応的劣モジュラ最大化におけるグループ平等 Group Equality in Adaptive Submodular Maximization ( http://arxiv.org/abs/2207.03364v3 ) ライセンス: Link先を確認 | Shaojie Tang, Jing Yuan | (参考訳) 本稿では、非適応的設定と適応的設定の両方において、グループ平等制約の下での古典的な劣モジュラ最大化問題を研究する。
さらに、大域的な基数制約やその他の公平性の概念も取り入れるよう研究を拡張する。 In this paper, we study the classic submodular maximization problem subject to a group equality constraint under both non-adaptive and adaptive settings. It has been shown that the utility function of many machine learning applications, including data summarization, influence maximization in social networks, and personalized recommendation, satisfies the property of submodularity. Hence, maximizing a submodular function subject to various constraints can be found at the heart of many of those applications. On a high level, submodular maximization aims to select a group of most representative items (e.g., data points). However, the design of most existing algorithms does not incorporate the fairness constraint, leading to under- or over-representation of some particular groups. This motivates us to study the submodular maximization problem with group equality, where we aim to select a group of items to maximize a (possibly non-monotone) submodular utility function subject to a group equality constraint. To this end, we develop the first constant-factor approximation algorithm for this problem. The design of our algorithm is robust enough to be extended to solving the submodular maximization problem under a more complicated adaptive setting. Moreover, we further extend our study to incorporating a global cardinality constraint and other fairness notions. | 翻訳日:2023-08-24 19:20:41 公開日:2023-08-23 |
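A simple way to respect a group equality constraint is to select items round-robin, one per group per round, each time choosing the item with the largest marginal gain of a submodular function (set coverage here). This heuristic is our illustration only: it is not the paper's constant-factor approximation algorithm, carries no guarantee, and assumes every group has at least `per_group` items.

```python
def coverage(selected, sets):
    """Submodular utility: number of distinct elements covered by the selected sets."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)

def group_equal_greedy(sets, groups, per_group):
    """Pick `per_group` items from every group, greedily by marginal coverage gain.

    Cycling through the groups in each round keeps the per-group counts equal.
    """
    selected = []
    for _ in range(per_group):
        for g in sorted(set(groups)):
            candidates = [i for i in range(len(sets)) if groups[i] == g and i not in selected]
            base = coverage(selected, sets)
            best = max(candidates, key=lambda i: coverage(selected + [i], sets) - base)
            selected.append(best)
    return selected

sets = [{1, 2, 3}, {1, 2}, {4}, {4, 5, 6}, {6}, {7}]
groups = ["a", "a", "a", "b", "b", "b"]
print(group_equal_greedy(sets, groups, 1))  # [0, 3]
print(group_equal_greedy(sets, groups, 2))  # [0, 3, 1, 5]
```

Note that item 1 is forced into the second round purely to keep group "a" at parity, even though it adds no coverage; this is the kind of utility cost that a fairness constraint can impose.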
# 非平滑合成最適化のためのランダム化座標劣勾配法 Randomized Coordinate Subgradient Method for Nonsmooth Composite Optimization ( http://arxiv.org/abs/2206.14981v3 ) ライセンス: Link先を確認 | Lei Zhao and Ding Chen and Daoli Zhu and Xiao Li | (参考訳) 非平滑最適化問題に対する座標型の劣勾配法は、劣微分が集合値であるという性質のため、比較的研究が進んでいない。
本研究では、凸および弱凸(非凸・非平滑)問題の広いクラスを含む非平滑合成最適化問題に焦点を当てる。
具体的には、$f$が凸のとき、準最適性ギャップについて、期待値で$\widetilde{\mathcal{O}}(1/\sqrt{k})$の収束率と、ほぼ確実に$\tilde{o}(1/\sqrt{k})$の漸近収束率を確立する。
$f$が弱凸であり、その劣微分が大域的計量劣正則性を満たす場合には、期待値で$\mathcal{O}(\varepsilon^{-4})$の反復計算量を導出する。
最後に、劣勾配法に対するRCSの優位性の可能性を示す実験をいくつか行う。 Coordinate-type subgradient methods for addressing nonsmooth optimization problems are relatively underexplored due to the set-valued nature of the subdifferential. In this work, our study focuses on nonsmooth composite optimization problems, encompassing a wide class of convex and weakly convex (nonconvex nonsmooth) problems. By utilizing the chain rule of the composite structure properly, we introduce the Randomized Coordinate Subgradient method (RCS) for tackling this problem class. To the best of our knowledge, this is the first coordinate subgradient method for solving general nonsmooth composite optimization problems. In theory, we consider the linearly bounded subgradients assumption for the objective function, which is more general than the traditional Lipschitz continuity assumption, to account for practical scenarios. We then conduct convergence analysis for RCS in both convex and weakly convex cases based on this generalized Lipschitz-type assumption. Specifically, we establish the $\widetilde{\mathcal{O}}(1/\sqrt{k})$ convergence rate in expectation and the $\tilde{o}(1/\sqrt{k})$ almost sure asymptotic convergence rate in terms of the suboptimality gap when $f$ is convex. For the case when $f$ is weakly convex and its subdifferential satisfies the global metric subregularity property, we derive the $\mathcal{O}(\varepsilon^{-4})$ iteration complexity in expectation. We also establish an asymptotic convergence result. To justify the global metric subregularity property utilized in the analysis, we establish this error bound condition for the concrete (real-valued) robust phase retrieval problem. We also provide a convergence lemma and the relationship between the global metric subregularity properties of a weakly convex function and its Moreau envelope. Finally, we conduct several experiments to demonstrate the possible superiority of RCS over the subgradient method. | 翻訳日:2023-08-24 19:20:20 公開日:2023-08-23 |
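To show how the chain rule yields a coordinate subgradient for a composite objective, the sketch below uses the concrete instance f(x) = h(c(x)), where h is the l1 norm and c(x) = Ax - b (our assumption). Coordinate i of a subgradient is then the sum over j of A[j][i] * sign(r_j), with r = Ax - b. One coordinate is sampled uniformly per step, and the step size shrinks as step0 / sqrt(k + 1). The step size, iteration count, and test problem are illustrative assumptions, not the paper's settings.

```python
import math
import random

def residual(A, x, b):
    """r = A x - b."""
    return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]

def objective(A, x, b):
    """f(x) = ||A x - b||_1."""
    return sum(abs(r) for r in residual(A, x, b))

def sign(v):
    return (v > 0) - (v < 0)

def rcs_l1(A, b, x0, steps, step0=0.5, seed=0):
    """Randomized coordinate subgradient method on f(x) = ||A x - b||_1."""
    rng = random.Random(seed)
    x = list(x0)
    best = objective(A, x, b)
    for k in range(steps):
        r = residual(A, x, b)
        i = rng.randrange(len(x))                             # sample one coordinate
        g_i = sum(A[j][i] * sign(r[j]) for j in range(len(A)))  # chain rule, coordinate i
        x[i] -= step0 / math.sqrt(k + 1) * g_i                # diminishing step size
        best = min(best, objective(A, x, b))                  # subgradient steps are not monotone
    return x, best

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]   # consistent system: the minimizer is x = (1, 2) with f = 0
x, best = rcs_l1(A, b, [0.0, 0.0], 5000)
print(x, best)
```

Tracking the best value seen is standard for subgradient methods, because a single step can increase the objective even when the iterates are converging.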
# 一階線形論理を生成文法にする Making first order linear logic a generating grammar ( http://arxiv.org/abs/2206.08955v3 ) ライセンス: Link先を確認 | Sergey Slavnov | (参考訳) さまざまな範疇文法が、一階乗法的線形論理(MLL1)のある断片において表面表現を持つことが知られている。
本研究では、以前に導入したETTCの非自明な変種を考える。これは表記を拡充することで、より簡潔かつ透明な計算を可能にする。
カットフリーなシーケント計算と自然演繹の形式体系の両方を提示する。 It is known that different categorial grammars have surface representation in a fragment of first order multiplicative linear logic (MLL1). We show that the fragment of interest is equivalent to the recently introduced extended tensor type calculus (ETTC). ETTC is a calculus of specific typed terms, which represent tuples of strings, more precisely bipartite graphs decorated with strings. Types are derived from linear logic formulas, and rules correspond to concrete operations on these string-labeled graphs, so that they can be conveniently visualized. This provides the above mentioned fragment of MLL1 that is relevant for language modeling not only with some alternative syntax and intuitive geometric representation, but also with an intrinsic deductive system, which has been absent. In this work we consider a non-trivial notationally enriched variation of the previously introduced ETTC, which allows more concise and transparent computations. We present both a cut-free sequent calculus and a natural deduction formalism. | 翻訳日:2023-08-24 19:19:27 公開日:2023-08-23 |
# SERE: 自己教師ありTransformerのための特徴自己関係の探究 SERE: Exploring Feature Self-relation for Self-supervised Transformer ( http://arxiv.org/abs/2206.05184v2 ) ライセンス: Link先を確認 | Zhong-Yu Li, Shanghua Gao, Ming-Ming Cheng | (参考訳) 畳み込みネットワーク(CNN)において自己教師あり学習で表現を学習することは、視覚タスクに有効であることが確認されている。
私たちのソースコードは公開されます。 Learning representations with self-supervision for convolutional networks (CNN) has been validated to be effective for vision tasks. As an alternative to CNN, vision transformers (ViT) have strong representation ability with spatial self-attention and channel-level feedforward networks. Recent works reveal that self-supervised learning helps unleash the great potential of ViT. Still, most works follow self-supervised strategies designed for CNN, e.g., instance-level discrimination of samples, but they ignore the properties of ViT. We observe that relational modeling on spatial and channel dimensions distinguishes ViT from other networks. To enforce this property, we explore the feature SElf-RElation (SERE) for training self-supervised ViT. Specifically, instead of conducting self-supervised learning solely on feature embeddings from multiple views, we utilize the feature self-relations, i.e., spatial/channel self-relations, for self-supervised learning. Self-relation based learning further enhances the relation modeling ability of ViT, resulting in stronger representations that stably improve performance on multiple downstream tasks. Our source code will be made publicly available. | 翻訳日:2023-08-24 19:19:10 公開日:2023-08-23 |
# 参加ダイナミクスと複数学習者の再学習から生じるセグメンテーション Emergent segmentation from participation dynamics and multi-learner retraining ( http://arxiv.org/abs/2206.02667v2 ) ライセンス: Link先を確認 | Sarah Dean, Mihaela Curmei, Lillian J. Ratliff, Jamie Morgenstern, Maryam Fazel | (参考訳) データ駆動型サービスに参加するかどうかの選択は、多くの場合そのサービスの品質に基づいて行われるが、その選択がサービスの学習・改善能力に影響を与える。
本研究では、学習者とユーザのサブ集団の両方がリスク低減型である場合に生じる、参加と再学習のダイナミクスを研究する。リスク低減型の更新には、勾配降下法や乗法的重み更新など幅広いクラスが含まれる。
実データから初期化したシミュレーション例を通じてこの現象を示す。 The choice to participate in a data-driven service, often made on the basis of quality of that service, influences the ability of the service to learn and improve. We study the participation and retraining dynamics that arise when both the learners and sub-populations of users are risk-reducing, which cover a broad class of updates including gradient descent, multiplicative weights, etc. Suppose, for example, that individuals choose to spend their time amongst social media platforms proportionally to how well each platform works for them. Each platform also gathers data about its active users, which it uses to update parameters with a gradient step. For this example and for our general class of dynamics, we show that the only asymptotically stable equilibria are segmented, with sub-populations allocated to a single learner. Under mild assumptions, the utilitarian social optimum is a stable equilibrium. In contrast to previous work, which shows that repeated risk minimization can result in representation disparity and high overall loss for a single learner (Hashimoto et al., 2018; Miller et al., 2021), we find that repeated myopic updates with multiple learners lead to better outcomes. We illustrate the phenomena via a simulated example initialized from real data. | 翻訳日:2023-08-24 19:18:51 公開日:2023-08-23 |
# ストリーミングビデオにおけるラベル効率の高いオンライン連続物体検出 Label-Efficient Online Continual Object Detection in Streaming Video ( http://arxiv.org/abs/2206.00309v2 ) ライセンス: Link先を確認 | Jay Zhangjie Wu, David Junhao Zhang, Wynne Hsu, Mengmi Zhang, Mike Zheng Shou | (参考訳) 人間は連続したビデオストリームを視聴し、これまで学んだ経験を保ちながら、最小限の監督で新しい知識を継続的に獲得し、転送することができる。
本稿では,ストリーミングビデオにおけるより現実的で困難な問題である$\unicode{x2014}$Label-Efficient Online Continual Object Detection (LEOCOD)について検討する。
本稿では,ビデオストリームにおける物体検出のための既存の連続学習者への挿入と改良が容易で,データアノテーションコストの低減とモデルのリトレーニング時間の短縮が可能な,プラグアンドプレイモジュールである efficient-clsを提案する。
データとソースコードは https://github.com/showlab/Efficient-CLS で公開される予定である。 Humans can watch a continuous video stream and effortlessly perform continual acquisition and transfer of new knowledge with minimal supervision yet retaining previously learnt experiences. In contrast, existing continual learning (CL) methods require fully annotated labels to effectively learn from individual frames in a video stream. Here, we examine a more realistic and challenging problem: Label-Efficient Online Continual Object Detection (LEOCOD) in streaming video. We propose a plug-and-play module, Efficient-CLS, that can be easily inserted into and improve existing continual learners for object detection in video streams with reduced data annotation costs and model retraining time. We show that our method has achieved significant improvement with minimal forgetting across all supervision levels on two challenging CL benchmarks for streaming real-world videos. Remarkably, with only 25% annotated video frames, our method still outperforms the base CL learners, which are trained with 100% annotations on all video frames. The data and source code will be publicly available at https://github.com/showlab/Efficient-CLS. | 翻訳日:2023-08-24 19:18:28 公開日:2023-08-23 |
# 回帰モデルにおける削除テストと挿入テスト Deletion and Insertion Tests in Regression Models ( http://arxiv.org/abs/2205.12423v3 ) ライセンス: Link先を確認 | Naofumi Hama, Masayoshi Mase and Art B. Owen | (参考訳) 説明可能なAI(XAI)の基本的な課題は、ブラックボックス関数$f$による予測の背後にある最も重要な特徴を特定することである。
Petsiuk et al. (2018) の挿入テストと削除テストは、分類においてピクセルを重要度の高い順にランク付けするアルゴリズムの品質を評価するために利用できる。
この基準を用いて、統合勾配(IG)で計算した特徴重要度を、Kernel SHAP(KS)やLIME、DeepLIFT、vanilla gradient、input$\times$gradient法で計算したものと比較する。
ただし、ロジスティック回帰のような加法モデルの単調関数に対しては、最適な順序を与える。 A basic task in explainable AI (XAI) is to identify the most important features behind a prediction made by a black box function $f$. The insertion and deletion tests of Petsiuk et al. (2018) can be used to judge the quality of algorithms that rank pixels from most to least important for a classification. Motivated by regression problems we establish a formula for their area under the curve (AUC) criteria in terms of certain main effects and interactions in an anchored decomposition of $f$. We find an expression for the expected value of the AUC under a random ordering of inputs to $f$ and propose an alternative area above a straight line for the regression setting. We use this criterion to compare feature importances computed by integrated gradients (IG) to those computed by Kernel SHAP (KS) as well as LIME, DeepLIFT, vanilla gradient and input$\times$gradient methods. KS has the best overall performance in two datasets we consider but it is very expensive to compute. We find that IG is nearly as good as KS while being much faster. Our comparison problems include some binary inputs that pose a challenge to IG because it must use values between the possible variable levels and so we consider ways to handle binary variables in IG. We show that sorting variables by their Shapley value does not necessarily give the optimal ordering for an insertion-deletion test. It will however do that for monotone functions of additive models, such as logistic regression. | 翻訳日:2023-08-24 19:18:08 公開日:2023-08-23 |
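The deletion test itself is easy to state in code. Switch features from x to a baseline in the proposed order and record f after each switch. A good importance ranking makes the curve drop quickly, which gives a small area. The sketch uses a trapezoidal AUC on an evenly spaced grid over [0, 1] and a hypothetical linear model; the paper's alternative area-above-a-straight-line criterion is not implemented here.

```python
def deletion_curve(f, x, baseline, order):
    """Values of f as the features in `order` are switched from x to baseline, one at a time."""
    z = list(x)
    values = [f(z)]
    for i in order:
        z[i] = baseline[i]
        values.append(f(z))
    return values

def auc(values):
    """Trapezoidal area under the curve on an evenly spaced grid over [0, 1]."""
    n = len(values) - 1
    return sum((values[k] + values[k + 1]) / 2 for k in range(n)) / n

f = lambda z: 3 * z[0] + 2 * z[1] + z[2]   # hypothetical additive (linear) model
x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]

good = deletion_curve(f, x, base, [0, 1, 2])  # most important feature first
bad = deletion_curve(f, x, base, [2, 1, 0])   # least important feature first
print(good, auc(good))  # [6.0, 3.0, 1.0, 0.0] 2.3333333333333335
print(bad, auc(bad))    # [6.0, 5.0, 3.0, 0.0] 3.6666666666666665
```

For this additive model, ordering features by their contribution w_i * (x_i - baseline_i) is optimal, which is consistent with the abstract's remark about monotone functions of additive models.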
# スピン光子インタフェースを用いたエネルギー効率の高い量子非破壊測定 Energy-efficient quantum non-demolition measurement with a spin-photon interface ( http://arxiv.org/abs/2205.09623v3 ) ライセンス: Link先を確認 | Maria Maffei, Bruno O. Goes, Stephen C. Wein, Andrew N. Jordan, Loïc Lanco and Alexia Auffèves | (参考訳) スピン光子インタフェース (SPI) は量子技術の鍵となる装置であり、スピン量子ビットと伝播する偏光パルスの間で量子情報をコヒーレントに伝達することを目的としている。
提案手法は、最先端の半導体デバイスにおける不完全性に対して頑健である。 Spin-photon interfaces (SPIs) are key devices of quantum technologies, aimed at coherently transferring quantum information between spin qubits and propagating pulses of polarized light. We study the potential of a SPI for quantum non demolition (QND) measurements of a spin state. After being initialized and scattered by the SPI, the state of a light pulse depends on the spin state. It thus plays the role of a pointer state, information being encoded in the light's temporal and polarization degrees of freedom. Building on the fully Hamiltonian resolution of the spin-light dynamics, we show that quantum superpositions of zero and single photon states outperform coherent pulses of light, producing pointer states which are more distinguishable with the same photon budget. The energetic advantage provided by quantum pulses over coherent ones is maintained when information on the spin state is extracted at the classical level by performing projective measurements on the light pulses. The proposed schemes are robust against imperfections in state of the art semi-conducting devices. | 翻訳日:2023-08-24 19:17:45 公開日:2023-08-23 |
# 自由回転3次元剛体画像からの解釈可能なダイナミクスの学習 Learning Interpretable Dynamics from Images of a Freely Rotating 3D Rigid Body ( http://arxiv.org/abs/2209.11355v3 ) ライセンス: Link先を確認 | Justice Mason and Christine Allen-Blanchette and Nicholas Zolman and Elizabeth Davison and Naomi Leonard | (参考訳) 多くの実世界の状況では、衛星のような自由回転する3次元剛体について、低次元の測定値が得られない場合でも画像観測は利用できることがある。
これを多段階予測パイプラインで実現する。このパイプラインは、個々の画像を$\mathbf{SO}(3)$ に同相な潜在表現に写像し、潜在表現の対から角速度を計算し、学習したハミルトニアン表現を用いたハミルトンの運動方程式によって将来の潜在状態を予測する。
回転する立方体と直方体の系列からなり、密度が一様なものと非一様なものを含む新しい回転剛体データセットで、本手法の有効性を実証する。 In many real-world settings, image observations of freely rotating 3D rigid bodies, such as satellites, may be available when low-dimensional measurements are not. However, the high-dimensionality of image data precludes the use of classical estimation techniques to learn the dynamics and a lack of interpretability reduces the usefulness of standard deep learning methods. In this work, we present a physics-informed neural network model to estimate and predict 3D rotational dynamics from image sequences. We achieve this using a multi-stage prediction pipeline that maps individual images to a latent representation homeomorphic to $\mathbf{SO}(3)$, computes angular velocities from latent pairs, and predicts future latent states using the Hamiltonian equations of motion with a learned representation of the Hamiltonian. We demonstrate the efficacy of our approach on a new rotating rigid-body dataset with sequences of rotating cubes and rectangular prisms with uniform and non-uniform density. | 翻訳日:2023-08-24 19:09:20 公開日:2023-08-23 |
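自由回転剛体のハミルトン運動方程式がどのような形をとるかを示す最小限のスケッチ(Python)を置く。論文はハミルトニアンをニューラルネットワークで学習するが、ここでは仮定として既知のハミルトニアン $H(\Pi)=\tfrac12\sum_i \Pi_i^2/I_i$ を用い、体固定系の角運動量 $\Pi$ をオイラー方程式 $\dot\Pi = \Pi\times\nabla H(\Pi)$ に従ってRK4で積分するだけである。画像からの潜在表現の推定は含まない。関数名はすべて説明用の仮のものである。

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def hamiltonian(pi: Vec, inertia: Vec) -> float:
    # Known H(Pi) = 1/2 * sum_i Pi_i^2 / I_i (assumption; the paper learns H instead).
    return 0.5 * sum(p * p / i for p, i in zip(pi, inertia))


def rhs(pi: Vec, inertia: Vec) -> Vec:
    # dPi/dt = Pi x dH/dPi, where dH/dPi = I^{-1} Pi is the body angular velocity.
    omega = tuple(p / i for p, i in zip(pi, inertia))
    return cross(pi, omega)


def rk4_step(pi: Vec, inertia: Vec, dt: float) -> Vec:
    # Classical fourth-order Runge-Kutta step for the Euler equations above.
    k1 = rhs(pi, inertia)
    k2 = rhs(tuple(p + 0.5 * dt * k for p, k in zip(pi, k1)), inertia)
    k3 = rhs(tuple(p + 0.5 * dt * k for p, k in zip(pi, k2)), inertia)
    k4 = rhs(tuple(p + dt * k for p, k in zip(pi, k3)), inertia)
    return tuple(p + dt / 6 * (a + 2 * b + 2 * c + d)
                 for p, a, b, c, d in zip(pi, k1, k2, k3, k4))
```

学習済みハミルトニアンを使う場合は、`rhs` 内の $\nabla H$ をその勾配に置き換えることになる。エネルギー $H$ と $|\Pi|^2$ がほぼ保存されることが、積分の妥当性の簡単な確認になる。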
# EDO-Net: グラフダイナミクスによる変形可能な物体の弾性特性の学習 EDO-Net: Learning Elastic Properties of Deformable Objects from Graph Dynamics ( http://arxiv.org/abs/2209.08996v2 ) ライセンス: Link先を確認 | Alberta Longhini, Marco Moletta, Alfredo Reichlin, Michael C. Welle, David Held, Zackory Erickson, and Danica Kragic | (参考訳) 未知の物理特性に一般化する変形可能な物体のグラフ力学の学習問題について検討する。
本稿では,弾性特性の異なる多種多様なサンプルに対して学習したグラフ力学モデルであるEDO-Net(Elastic Deformable Object - Net)を提案する。
シミュレーションと実世界の両方でEDO-Netを評価し、1) 未知の物理特性への汎化、2) 学習した表現の新しい下流タスクへの転移、という2つの能力を検証する。 We study the problem of learning graph dynamics of deformable objects that generalizes to unknown physical properties. Our key insight is to leverage a latent representation of elastic physical properties of cloth-like deformable objects that can be extracted, for example, from a pulling interaction. In this paper we propose EDO-Net (Elastic Deformable Object - Net), a model of graph dynamics trained on a large variety of samples with different elastic properties that does not rely on ground-truth labels of the properties. EDO-Net jointly learns an adaptation module, and a forward-dynamics module. The former is responsible for extracting a latent representation of the physical properties of the object, while the latter leverages the latent representation to predict future states of cloth-like objects represented as graphs. We evaluate EDO-Net both in simulation and real world, assessing its capabilities of: 1) generalizing to unknown physical properties, 2) transferring the learned representation to new downstream tasks. | 翻訳日:2023-08-24 19:08:30 公開日:2023-08-23 |
# RGB-DカメラによるUAVナビゲーションと衝突回避のためのリアルタイム動的障害物追跡・マッピングシステム A real-time dynamic obstacle tracking and mapping system for UAV navigation and collision avoidance with an RGB-D camera ( http://arxiv.org/abs/2209.08258v3 ) ライセンス: Link先を確認 | Zhefan Xu, Xiaoyang Zhan, Baihan Chen, Yumeng Xiu, Chenhao Yang, and Kenji Shimada | (参考訳) 混雑した空間における自律ロボットにとって、リアルタイムな動的環境認識は不可欠である。
一般的なボクセルマッピング法は, 任意に複雑な形状の3次元障害物を効率的に表現できるが, 静的障害物と動的障害物の区別は困難であり, 障害物回避性能が制限される。
提案システムはまず、深度画像と占有ボクセルマップを用いて、動的障害物が存在しうる領域を候補(提案)として生成する。
得られた障害物領域の提案に対し、カルマンフィルタと独自の連続性フィルタを適用して各動的障害物を追跡する。
シミュレーションおよび物理実験により,本手法は動的環境における障害物をリアルタイムに追跡・表現し,障害物を安全に回避できることを示した。 The real-time dynamic environment perception has become vital for autonomous robots in crowded spaces. Although the popular voxel-based mapping methods can efficiently represent 3D obstacles with arbitrarily complex shapes, they can hardly distinguish between static and dynamic obstacles, leading to the limited performance of obstacle avoidance. While plenty of sophisticated learning-based dynamic obstacle detection algorithms exist in autonomous driving, the quadcopter's limited computation resources cannot achieve real-time performance using those approaches. To address these issues, we propose a real-time dynamic obstacle tracking and mapping system for quadcopter obstacle avoidance using an RGB-D camera. The proposed system first utilizes a depth image with an occupancy voxel map to generate potential dynamic obstacle regions as proposals. With the obstacle region proposals, the Kalman filter and our continuity filter are applied to track each dynamic obstacle. Finally, the environment-aware trajectory prediction method is proposed based on the Markov chain using the states of tracked dynamic obstacles. We implemented the proposed system with our custom quadcopter and navigation planner. The simulation and physical experiments show that our methods can successfully track and represent obstacles in dynamic environments in real-time and safely avoid obstacles. | 翻訳日:2023-08-24 19:08:16 公開日:2023-08-23 |
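動的障害物の追跡に使われるカルマンフィルタを、1次元の等速度モデルで書いた一般的なスケッチ(Python)を示す。状態は[位置, 速度]、観測は位置のみとし、プロセスノイズ $q$・観測ノイズ $r$ の値や簡略化した $Q = qI$ は説明用の仮定である。論文の連続性フィルタや3次元での実装を再現したものではない。

```python
from typing import List, Sequence, Tuple


def kalman_cv_track(measurements: Sequence[float], dt: float = 0.1,
                    q: float = 1e-2, r: float = 0.05) -> List[Tuple[float, float]]:
    """Generic 1D constant-velocity Kalman filter (assumption, not the paper's filter).

    State x = [position, velocity]; only the position is measured (H = [1, 0]).
    """
    x = [measurements[0], 0.0]
    P = [[1.0, 0.0], [0.0, 1.0]]
    estimates = []
    for z in measurements:
        # Predict: x <- F x with F = [[1, dt], [0, 1]].
        x = [x[0] + dt * x[1], x[1]]
        # P <- F P F^T + Q, with the simplified Q = q * I.
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        P = [[p00, p01], [p10, p11]]
        # Update with the position measurement z.
        s = P[0][0] + r                      # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        y = z - x[0]                         # innovation
        x = [x[0] + k0 * y, x[1] + k1 * y]
        # P <- (I - K H) P
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        estimates.append((x[0], x[1]))
    return estimates
```

位置だけを観測していても、速度成分がカルマンゲイン $k_1$ を通じて推定される点が、移動障害物の将来位置を予測するうえでの要となる。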
# 勾配に基づくBスプライン軌道最適化を用いた視覚支援型UAVナビゲーションと動的障害物回避 Vision-aided UAV navigation and dynamic obstacle avoidance using gradient-based B-spline trajectory optimization ( http://arxiv.org/abs/2209.07003v2 ) ライセンス: Link先を確認 | Zhefan Xu, Yumeng Xiu, Xiaoyang Zhan, Baihan Chen, Kenji Shimada | (参考訳) 動的環境をナビゲートするには、ロボットが衝突のない軌道を生成し、移動する障害物を積極的に回避する必要がある。
シミュレーションと物理実験により,提案手法が動的環境を安全にナビゲートするためにリアルタイムに動作できることが証明された。 Navigating dynamic environments requires the robot to generate collision-free trajectories and actively avoid moving obstacles. Most previous works designed path planning algorithms based on one single map representation, such as the geometric, occupancy, or ESDF map. Although they have shown success in static environments, due to the limitation of map representation, those methods cannot reliably handle static and dynamic obstacles simultaneously. To address the problem, this paper proposes a gradient-based B-spline trajectory optimization algorithm utilizing the robot's onboard vision. The depth vision enables the robot to track and represent dynamic objects geometrically based on the voxel map. The proposed optimization first adopts the circle-based guide-point algorithm to approximate the costs and gradients for avoiding static obstacles. Then, with the vision-detected moving objects, our receding-horizon distance field is simultaneously used to prevent dynamic collisions. Finally, the iterative re-guide strategy is applied to generate the collision-free trajectory. The simulation and physical experiments prove that our method can run in real-time to navigate dynamic environments safely. | 翻訳日:2023-08-24 19:07:55 公開日:2023-08-23 |
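Bスプライン軌道の評価と、制御点に対する勾配降下の骨格を示す最小限のスケッチ(Python)である。一様3次Bスプラインの基底関数と、平滑性コスト $J=\sum_i \lVert P_{i+1}-2P_i+P_{i-1}\rVert^2$ の勾配だけを扱い、論文の円ベースのガイドポイントによる静的障害物コストや後退ホライズン距離場による動的障害物コストは含めていない。関数名と学習率などの値は仮定である。

```python
from typing import List, Sequence, Tuple


def cubic_bspline_point(p0, p1, p2, p3, u: float) -> Tuple[float, ...]:
    """Point on one uniform cubic B-spline segment for u in [0, 1]."""
    b0 = (1 - u) ** 3 / 6
    b1 = (3 * u**3 - 6 * u**2 + 4) / 6
    b2 = (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6
    b3 = u**3 / 6
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def smoothness_cost_grad(ctrl: Sequence[Sequence[float]]):
    """J = sum_i ||P[i+1] - 2 P[i] + P[i-1]||^2 and its gradient dJ/dP."""
    n, dim = len(ctrl), len(ctrl[0])
    cost = 0.0
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(1, n - 1):
        for k in range(dim):
            a = ctrl[i + 1][k] - 2 * ctrl[i][k] + ctrl[i - 1][k]
            cost += a * a
            grad[i - 1][k] += 2 * a
            grad[i][k] -= 4 * a
            grad[i + 1][k] += 2 * a
    return cost, grad


def optimize(ctrl: Sequence[Sequence[float]], lr: float = 0.05,
             iters: int = 50) -> List[List[float]]:
    """Plain gradient descent on interior control points; endpoints stay fixed."""
    ctrl = [list(p) for p in ctrl]
    for _ in range(iters):
        _, g = smoothness_cost_grad(ctrl)
        for i in range(1, len(ctrl) - 1):
            for k in range(len(ctrl[i])):
                ctrl[i][k] -= lr * g[i][k]
    return ctrl
```

実際の手法では、障害物コストの勾配をこの平滑性勾配に加算し、衝突が解消されるまでガイドポイントの再設定と最適化を繰り返す構成になると考えられる。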
# 非IIDデータを用いたナレッジアウェアフェデレーションアクティブラーニング Knowledge-Aware Federated Active Learning with Non-IID Data ( http://arxiv.org/abs/2211.13579v2 ) ライセンス: Link先を確認 | Yu-Tong Cao, Ye Shi, Baosheng Yu, Jingya Wang, Dacheng Tao | (参考訳) フェデレーション学習は、複数の分散したクライアントが、ローカルトレーニングデータを共有せずに協調的に学習できるようにする。
上記の課題に対処するため、KSAS (Knowledge-Specialized Active Sampling) と KCFU (Knowledge-Compensatory Federated Update) からなる KAFAL (Knowledge-Aware Federated Active Learning) を提案する。
連合型アクティブラーニングフレームワークにおけるKSASの最先端のアクティブラーニング手法に対する優位性と,KCFUの効率性を示すため,大規模な実験と分析を行った。 Federated learning enables multiple decentralized clients to learn collaboratively without sharing the local training data. However, the expensive annotation cost to acquire data labels on local clients remains an obstacle in utilizing local data. In this paper, we propose a federated active learning paradigm to efficiently learn a global model with limited annotation budget while protecting data privacy in a decentralized learning way. The main challenge faced by federated active learning is the mismatch between the active sampling goal of the global model on the server and that of the asynchronous local clients. This becomes even more significant when data is distributed non-IID across local clients. To address the aforementioned challenge, we propose Knowledge-Aware Federated Active Learning (KAFAL), which consists of Knowledge-Specialized Active Sampling (KSAS) and Knowledge-Compensatory Federated Update (KCFU). KSAS is a novel active sampling method tailored for the federated active learning problem. It deals with the mismatch challenge by sampling actively based on the discrepancies between local and global models. KSAS intensifies specialized knowledge in local clients, ensuring the sampled data to be informative for both the local clients and the global model. KCFU, in the meantime, deals with the client heterogeneity caused by limited data and non-IID data distributions. It compensates for each client's ability in weak classes by the assistance of the global model. Extensive experiments and analyses are conducted to show the superiority of KSAS over the state-of-the-art active learning methods and the efficiency of KCFU under the federated active learning framework. | 翻訳日:2023-08-24 19:00:25 公開日:2023-08-23 |
# 完全多様体ガウス変分ベイズ Exact Manifold Gaussian Variational Bayes ( http://arxiv.org/abs/2210.14598v2 ) ライセンス: Link先を確認 | Martin Magris, Mostafa Shabani, Alexandros Iosifidis | (参考訳) 複雑なモデルにおける変分推論(VI)の最適化アルゴリズムを提案する。
我々のExact manifold Gaussian Variational Bayes (EMGVB) は正確な更新ルールを提供するが、簡単に実装できる。
5つのデータセットにおいて、さまざまな統計モデル、計量経済モデル、深層学習モデルに対して本手法が実用的であることを実証的に検証し、ベースライン手法と比較した性能について議論する。 We propose an optimization algorithm for Variational Inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for Gaussian Variational Inference that implicitly satisfies the positive definite constraint on the variational covariance matrix. Our Exact manifold Gaussian Variational Bayes (EMGVB) provides exact but simple update rules and is straightforward to implement. Due to its black-box nature, EMGVB stands as a ready-to-use solution for VI in complex models. Over five datasets, we empirically validate our feasible approach on different statistical, econometric, and deep learning models, discussing its performance with respect to baseline methods. | 翻訳日:2023-08-24 18:59:11 公開日:2023-08-23 |
# タスクと動作計画のためのフィードバック付きポリシガイド型遅延探索 Policy-Guided Lazy Search with Feedback for Task and Motion Planning ( http://arxiv.org/abs/2210.14055v4 ) ライセンス: Link先を確認 | Mohamed Khodeir, Atharv Sonwane, Ruthrash Hari, Florian Shkurti | (参考訳) PDDLStreamソルバはタスク・アンド・モーション・プランニング(TAMP)問題に対する実行可能なソリューションとして最近登場し、PDDLを連続的なアクション空間の問題に拡張している。
その結果、物体の数、目標、初期条件が異なる未知のテスト環境において、実行可能解の探索が大幅に高速化されることを示す。
我々は, PDDLStream問題に対する既存の解法と比較し, TAMP手法の評価を行った。 PDDLStream solvers have recently emerged as viable solutions for Task and Motion Planning (TAMP) problems, extending PDDL to problems with continuous action spaces. Prior work has shown how PDDLStream problems can be reduced to a sequence of PDDL planning problems, which can then be solved using off-the-shelf planners. However, this approach can suffer from long runtimes. In this paper we propose LAZY, a solver for PDDLStream problems that maintains a single integrated search over action skeletons, which gets progressively more geometrically informed, as samples of possible motions are lazily drawn during motion planning. We explore how learned models of goal-directed policies and current motion sampling data can be incorporated in LAZY to adaptively guide the task planner. We show that this leads to significant speed-ups in the search for a feasible solution evaluated over unseen test environments of varying numbers of objects, goals, and initial conditions. We evaluate our TAMP approach by comparing to existing solvers for PDDLStream problems on a range of simulated 7DoF rearrangement/manipulation problems. | 翻訳日:2023-08-24 18:58:59 公開日:2023-08-23 |
# MARLlib: スケーラブルで効率的なマルチエージェント強化学習ライブラリ MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library ( http://arxiv.org/abs/2210.13708v3 ) ライセンス: Link先を確認 | Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang | (参考訳) マルチエージェント強化学習(MARL: multi-agent reinforcement learning)の分野で研究者が直面する大きな課題は、マルチエージェントタスクとアルゴリズムの組み合わせに対して高速かつ互換性のある開発を可能にするライブラリを見つけることである。
MARLlibライブラリのソースコードはGitHubで公開されている: \url{https://github.com/Replicable-MARL/MARLlib}。 A significant challenge facing researchers in the area of multi-agent reinforcement learning (MARL) pertains to the identification of a library that can offer fast and compatible development for multi-agent tasks and algorithm combinations, while obviating the need to consider compatibility issues. In this paper, we present MARLlib, a library designed to address the aforementioned challenge by leveraging three key mechanisms: 1) a standardized multi-agent environment wrapper, 2) an agent-level algorithm implementation, and 3) a flexible policy mapping strategy. By utilizing these mechanisms, MARLlib can effectively disentangle the intertwined nature of the multi-agent task and the learning process of the algorithm, with the ability to automatically alter the training strategy based on the current task's attributes. The MARLlib library's source code is publicly accessible on GitHub: \url{https://github.com/Replicable-MARL/MARLlib}. | 翻訳日:2023-08-24 18:58:42 公開日:2023-08-23 |
# ProtoBandit: 多腕バンディットによる効率的なプロトタイプ選択 ProtoBandit: Efficient Prototype Selection via Multi-Armed Bandits ( http://arxiv.org/abs/2210.01860v4 ) ライセンス: Link先を確認 | Arghya Roy Chaudhuri, Pratik Jawanpuria, and Bamdev Mishra | (参考訳) 本研究では、与えられたターゲット集合$T$を最もよく表現する、情報量の多いデータインスタンス(すなわちプロトタイプ)のコンパクトな集合をソースデータセット$S$から特定する、多腕バンディットに基づくフレームワークを提案する。
我々の分析から得られる興味深い結果として、$k$-medoidsクラスタリング問題($T = S$ の設定)において、我々のアルゴリズム ProtoBandit が PAM (partitioning around medoids) 法の BUILD ステップの解を $O(k^3|S|)$ の計算量で近似することを示す。
実験的には、ProtoBanditは最先端手法と同等の品質の解を得つつ、類似度計算の呼び出し回数を数桁(100〜1000倍)削減する。 In this work, we propose a multi-armed bandit-based framework for identifying a compact set of informative data instances (i.e., the prototypes) from a source dataset $S$ that best represents a given target set $T$. Prototypical examples of a given dataset offer interpretable insights into the underlying data distribution and assist in example-based reasoning, thereby influencing every sphere of human decision-making. Current state-of-the-art prototype selection approaches require $O(|S||T|)$ similarity comparisons between source and target data points, which becomes prohibitively expensive for large-scale settings. We propose to mitigate this limitation by employing stochastic greedy search in the space of prototypical examples and multi-armed bandits for reducing the number of similarity comparisons. Our randomized algorithm, ProtoBandit, identifies a set of $k$ prototypes incurring $O(k^3|S|)$ similarity comparisons, which is independent of the size of the target set. An interesting outcome of our analysis is for the $k$-medoids clustering problem ($T = S$ setting) in which we show that our algorithm ProtoBandit approximates the BUILD step solution of the partitioning around medoids (PAM) method in $O(k^3|S|)$ complexity. Empirically, we observe that ProtoBandit reduces the number of similarity computation calls by several orders of magnitudes ($100-1000$ times) while obtaining solutions similar in quality to those from state-of-the-art approaches. | 翻訳日:2023-08-24 18:58:27 公開日:2023-08-23 |
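ProtoBanditが近似の対象とする、PAM法のBUILDステップ($T = S$ の場合)を素朴に実装したスケッチ(Python)を示す。これはProtoBandit自体ではなく、各ラウンドで全候補について全点との距離を評価するため $O(k|S|^2)$ 回の類似度計算を要する基準手法である。関数名 `pam_build` は説明用の仮のものである。

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def pam_build(points: Sequence[T], k: int, dist: Callable[[T, T], float]) -> List[int]:
    """Greedy BUILD step of PAM with T = S; returns k medoid indices.

    Each round adds the candidate that most lowers sum_t min_m dist(m, t).
    This uses O(k * n^2) distance evaluations, the cost ProtoBandit targets.
    """
    n = len(points)
    nearest = [float("inf")] * n  # distance from each point to its closest medoid so far
    medoids: List[int] = []
    for _ in range(k):
        best_idx, best_cost = -1, float("inf")
        for c in range(n):
            if c in medoids:
                continue
            cost = sum(min(nearest[t], dist(points[c], points[t])) for t in range(n))
            if cost < best_cost:
                best_idx, best_cost = c, cost
        medoids.append(best_idx)
        nearest = [min(nearest[t], dist(points[best_idx], points[t])) for t in range(n)]
    return medoids
```

ProtoBanditは、この内側の全探索を確率的貪欲探索と多腕バンディットによる推定で置き換えることで、類似度の比較回数を削減する。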
# CLIP2Point:イメージ深度事前トレーニングによるポイントクラウド分類へのCLIP転送 CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training ( http://arxiv.org/abs/2210.01055v3 ) ライセンス: Link先を確認 | Tianyu Huang, Bowen Dong, Yunhan Yang, Xiaoshui Huang, Rynson W.H. Lau, Wanli Ouyang, Wangmeng Zuo | (参考訳) 3Dビジョンと言語にまたがる事前学習は、学習データが限られているため、まだ発展途上である。
私たちのCLIP2PointはPointCLIPや他の自己監督型3Dネットワークよりも優れており、ゼロショットと少数ショットの分類で最先端の結果が得られる。 Pre-training across 3D vision and language remains under development because of limited training data. Recent works attempt to transfer vision-language pre-training models to 3D vision. PointCLIP converts point cloud data to multi-view depth maps, adopting CLIP for shape classification. However, its performance is restricted by the domain gap between rendered depth maps and images, as well as the diversity of depth distributions. To address this issue, we propose CLIP2Point, an image-depth pre-training method by contrastive learning to transfer CLIP to the 3D domain, and adapt it to point cloud classification. We introduce a new depth rendering setting that forms a better visual effect, and then render 52,460 pairs of images and depth maps from ShapeNet for pre-training. The pre-training scheme of CLIP2Point combines cross-modality learning to enforce the depth features for capturing expressive visual and textual features and intra-modality learning to enhance the invariance of depth aggregation. Additionally, we propose a novel Dual-Path Adapter (DPA) module, i.e., a dual-path structure with simplified adapters for few-shot learning. The dual-path structure allows the joint use of CLIP and CLIP2Point, and the simplified adapter can well fit few-shot tasks without post-search. Experimental results show that CLIP2Point is effective in transferring CLIP knowledge to 3D vision. Our CLIP2Point outperforms PointCLIP and other self-supervised 3D networks, achieving state-of-the-art results on zero-shot and few-shot classification. | 翻訳日:2023-08-24 18:57:52 公開日:2023-08-23 |
# 量子計測学における一般化条件付き期待値の操作的意味 Operational meanings of a generalized conditional expectation in quantum metrology ( http://arxiv.org/abs/2212.13162v5 ) ライセンス: Link先を確認 | Mankei Tsang | (参考訳) 量子力学における一般化条件付き期待値(GCE)の統一的な形式論が最近現れたが、量子観測量のレトロディクション(遡及推定)に関するその物理的意味については議論が続いている。
これらの結果により、GCEとそれに関連するダイバージェンスは、量子決定理論および制御理論において、自然で有用かつ議論の余地のない役割を担う。 A unifying formalism of generalized conditional expectations (GCEs) for quantum mechanics has recently emerged, but its physical implications regarding the retrodiction of a quantum observable remain controversial. To address the controversy, here I offer operational meanings for a version of the GCEs in the context of quantum parameter estimation. When a quantum sensor is corrupted by decoherence, the GCE is found to relate the operator-valued optimal estimators before and after the decoherence. Furthermore, the error increase, or regret, caused by the decoherence is shown to be equal to a divergence between the two estimators. The real weak value as a special case of the GCE plays the same role in suboptimal estimation -- its divergence from the optimal estimator is precisely the regret for not using the optimal measurement. For an application of the GCE, I show that it enables the use of dynamic programming for designing a controller that minimizes the estimation error. For the frequentist setting, I show that the GCE leads to a quantum Rao-Blackwell theorem, which offers significant implications for quantum metrology and thermal-light sensing in particular. These results give the GCE and the associated divergence a natural, useful, and incontrovertible role in quantum decision and control theory. | 翻訳日:2023-08-24 18:49:11 公開日:2023-08-23 |
# 低リソースの著者スタイル転送: 無名の著者を模倣できるか? Low-Resource Authorship Style Transfer: Can Non-Famous Authors Be Imitated? ( http://arxiv.org/abs/2212.08986v2 ) ライセンス: Link先を確認 | Ajay Patel, Nicholas Andrews, Chris Callison-Burch | (参考訳) 著者スタイル転送とは、元の意味を保ちながら、テキストをターゲット著者のスタイルに合わせて書き換えることである。
我々は、ターゲット著者のスタイルで書かれたテキストがごく限られた量しか存在しない、より困難な著者スタイル転送の一種である low-resource authorship style transfer タスクを導入する。
さらなる調査を促進するために、データと実装をリリースします。 Authorship style transfer involves altering text to match the style of a target author whilst preserving the original meaning. Existing unsupervised approaches like STRAP have largely focused on style transfer to target authors with many examples of their writing style in books, speeches, or other published works. This high-resource training data requirement (often greater than 100,000 words) makes these approaches primarily useful for style transfer to published authors, politicians, or other well-known figures and authorship styles, while style transfer to non-famous authors has not been well-studied. We introduce the \textit{low-resource authorship style transfer} task, a more challenging class of authorship style transfer where only a limited amount of text in the target author's style may exist. In our experiments, we specifically choose source and target authors from Reddit and style transfer their Reddit posts, limiting ourselves to just 16 posts (on average ~500 words) of the target author's style. Style transfer accuracy is typically measured by how often a classifier or human judge will classify an output as written by the target author. Recent authorship representations models excel at authorship identification even with just a few writing samples, making automatic evaluation of this task possible for the first time through evaluation metrics we propose. Our results establish an in-context learning technique we develop as the strongest baseline, though we find current approaches do not yet achieve mastery of this challenging task. We release our data and implementations to encourage further investigation. | 翻訳日:2023-08-24 18:48:48 公開日:2023-08-23 |
# ランダム化量子化: データ非依存型自己教師あり学習のための汎用的データ拡張 Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning ( http://arxiv.org/abs/2212.08663v2 ) ライセンス: Link先を確認 | Huimin Wu, Chenyang Lei, Xiao Sun, Peng-Shuai Wang, Qifeng Chen, Kwang-Ting Cheng, Stephen Lin, Zhirong Wu | (参考訳) 自己教師あり表現学習は、データの一部を伏せ、残りの部分からそれを予測するようネットワークに課すというパラダイムに従う。
コードは https://github.com/microsoft/random_quantize で入手できる。 Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part. Among many techniques, data augmentation lies at the core for creating the information gap. Towards this end, masking has emerged as a generic and powerful tool where content is withheld along the sequential dimension, e.g., spatial in images, temporal in audio, and syntactic in language. In this paper, we explore the orthogonal channel dimension for generic data augmentation by exploiting precision redundancy. The data for each channel is quantized through a non-uniform quantizer, with the quantized value sampled randomly within randomly sampled quantization bins. From another perspective, quantization is analogous to channel-wise masking, as it removes the information within each bin, but preserves the information across bins. Our approach significantly surpasses existing generic data augmentation methods, while showing on par performance against modality-specific augmentations. We comprehensively evaluate our approach on vision, audio, 3D point clouds, as well as the DABS benchmark which is comprised of various data modalities. The code is available at https://github.com/microsoft/random_quantize. | 翻訳日:2023-08-24 18:48:20 公開日:2023-08-23 |
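アブストラクトの説明(チャネルごとの非一様量子化器、ランダムに選ばれた量子化ビン、ビン内でランダムにサンプリングされる量子化値)を素直に読んだ最小限のスケッチ(Python)である。ビン境界をチャネルの値域 $[\min, \max]$ 内の一様乱数として引くこと、代表値をビン内の一様乱数とすることは仮定であり、公開コードの実装と一致するとは限らない。関数名も説明用の仮のものである。

```python
import bisect
import random
from typing import List, Optional, Sequence


def randomized_quantize_channel(values: Sequence[float], n_bins: int,
                                rng: random.Random) -> List[float]:
    """Quantize one channel with random bins and random in-bin output values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    # Random, non-uniform bin edges inside the channel's value range (assumption).
    edges = sorted(rng.uniform(lo, hi) for _ in range(n_bins - 1))
    bounds = [lo] + edges + [hi]
    # One random representative value drawn inside each bin (assumption).
    reps = [rng.uniform(bounds[i], bounds[i + 1]) for i in range(n_bins)]
    # Every value is replaced by the representative of the bin it falls into.
    return [reps[bisect.bisect_right(edges, v)] for v in values]


def randomized_quantize(channels: Sequence[Sequence[float]], n_bins: int = 8,
                        seed: Optional[int] = None) -> List[List[float]]:
    """Apply an independent random quantizer to each channel."""
    rng = random.Random(seed)
    return [randomized_quantize_channel(ch, n_bins, rng) for ch in channels]
```

ビン内の情報は失われるがビン間の順序は保たれるため、チャネル方向のマスキングに近い情報欠落を作れる、というアブストラクトの見方に対応している。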
# グラフ上の信号と畳み込み演算子の次元削減のためのグラフォンプーリング Graphon Pooling for Reducing Dimensionality of Signals and Convolutional Operators on Graphs ( http://arxiv.org/abs/2212.08171v2 ) ライセンス: Link先を確認 | Alejandro Parada-Mayorga and Zhiyang Wang and Alejandro Ribeiro | (参考訳) 本稿では、グラフォンの理論と密グラフ列の極限に基づく、グラフ上の畳み込み情報処理のためのプーリング手法を提案する。
グラフォン空間において、$[0, 1]^2$ の分割上でのグラフおよびグラフ信号の誘導グラフォン表現を利用する3つの手法を示す。
その結果、畳み込み演算子の低次元表現が得られ、信号の次元削減は $L^2([0, 1])$ における関数の単純な局所補間によって達成される。
グラフォンプーリングを用いたグラフニューラルネットワーク (GNN) による一連の数値実験で、本手法を評価する。
グラフォンプーリングは、層間の次元削減率が大きい場合、文献で提案されている他の手法よりも大幅に優れた性能を示す。
また、グラフォンプーリングを用いると、一般に過学習が少なく、計算コストも低いことを観察した。 In this paper we propose a pooling approach for convolutional information processing on graphs relying on the theory of graphons and limits of dense graph sequences. We present three methods that exploit the induced graphon representation of graphs and graph signals on partitions of $[0, 1]^2$ in the graphon space. As a result we derive low dimensional representations of the convolutional operators, while a dimensionality reduction of the signals is achieved by simple local interpolation of functions in $L^2([0, 1])$. We prove that those low dimensional representations constitute a convergent sequence of graphs and graph signals, respectively. The methods proposed and the theoretical guarantees that we provide show that the reduced graphs and signals inherit spectral-structural properties of the original quantities. We evaluate our approach with a set of numerical experiments performed on graph neural networks (GNNs) that rely on graphon pooling. We observe that graphon pooling performs significantly better than other approaches proposed in the literature when dimensionality reduction ratios between layers are large. We also observe that when graphon pooling is used we have, in general, less overfitting and lower computational cost. | 翻訳日:2023-08-24 18:47:59 公開日:2023-08-23 |
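グラフから誘導されるステップ関数グラフォン $W_G$ とステップ関数としての信号を、$[0,1]$ を $m$ 個の等長区間 $I_i$ に分けた分割上で平均化する、最も単純なプーリングのスケッチ(Python)である。論文の3手法のいずれかそのものではなく、区間平均という仮定に基づく一例で、縮約後の隣接行列を $\hat A_{ij} = m^2\int_{I_i}\int_{I_j} W_G(u,v)\,du\,dv$、信号を $\hat x_i = m\int_{I_i} x(u)\,du$ とする。関数名は説明用の仮のものである。

```python
from typing import List, Sequence, Tuple


def _overlap_matrix(m: int, n: int) -> List[List[float]]:
    # O[i][a] = length of [i/m, (i+1)/m) intersected with [a/n, (a+1)/n)
    return [[max(0.0, min((i + 1) / m, (a + 1) / n) - max(i / m, a / n))
             for a in range(n)] for i in range(m)]


def graphon_pool(adj: Sequence[Sequence[float]], x: Sequence[float],
                 m: int) -> Tuple[List[List[float]], List[float]]:
    """Block-average the induced step graphon and step signal onto m intervals."""
    n = len(adj)
    o = _overlap_matrix(m, n)
    # Reduced adjacency: mean of the step graphon W_G over block I_i x I_j.
    pooled_adj = [[m * m * sum(o[i][a] * o[j][b] * adj[a][b]
                               for a in range(n) for b in range(n))
                   for j in range(m)] for i in range(m)]
    # Reduced signal: mean of the step function x over interval I_i.
    pooled_x = [m * sum(o[i][a] * x[a] for a in range(n)) for i in range(m)]
    return pooled_adj, pooled_x
```

重なり長さで重み付けしているため、$n$ が $m$ で割り切れない場合でも、信号の積分 $\int_0^1 x(u)\,du$ は縮約の前後で保たれる。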
# 説明可能なTransformerベース時系列予測に向けた時間的顕著性検出 Temporal Saliency Detection Towards Explainable Transformer-based Timeseries Forecasting ( http://arxiv.org/abs/2212.07771v2 ) ライセンス: Link先を確認 | Nghia Duong-Trung, Duc-Manh Nguyen, Danh Le-Phuoc | (参考訳) 多くのTransformerベースのモデルが目覚ましい進歩を遂げているにもかかわらず、長期のマルチホライズン時系列予測は、特に説明可能性の面で依然として困難な課題である。
そこで本稿では,注意機構に基づく効果的なアプローチであるtsd(temporal saliency detection)を提案し,マルチホリゾン時系列予測に適用する。
この研究で示された包括的な調査は、将来の研究に貴重な洞察と利益をもたらすと信じている。 Despite the notable advancements in numerous Transformer-based models, the task of long multi-horizon time series forecasting remains a persistent challenge, especially towards explainability. Focusing on commonly used saliency maps in explaining DNN in general, our quest is to build attention-based architecture that can automatically encode saliency-related temporal patterns by establishing connections with appropriate attention heads. Hence, this paper introduces Temporal Saliency Detection (TSD), an effective approach that builds upon the attention mechanism and applies it to multi-horizon time series prediction. While our proposed architecture adheres to the general encoder-decoder structure, it undergoes a significant renovation in the encoder component, wherein we incorporate a series of information contracting and expanding blocks inspired by the U-Net style architecture. The TSD approach facilitates the multiresolution analysis of saliency patterns by condensing multi-heads, thereby progressively enhancing the forecasting of complex time series data. Empirical evaluations illustrate the superiority of our proposed approach compared to other models across multiple standard benchmark datasets in diverse far-horizon forecasting settings. The initial TSD achieves substantial relative improvements of 31% and 46% over several models in the context of multivariate and univariate prediction. We believe the comprehensive investigations presented in this study will offer valuable insights and benefits to future research endeavors. | 翻訳日:2023-08-24 18:47:42 公開日:2023-08-23 |
# Long-Range Haken-Strobl-Reinekerモデルにおける異常拡散 Anomalous diffusion in the Long-Range Haken-Strobl-Reineker model ( http://arxiv.org/abs/2212.07744v2 ) ライセンス: Link先を確認 | Alberto Giuseppe Catalano and Francesco Mattiotti and Jérôme Dubail and David Hagenmüller and Tomaž Prosen and Fabio Franchini and Guido Pupillo | (参考訳) 一般化されたHaken-Strobl-Reinekerモデルで記述されるデフェージングの存在下で、べき乗則ホッピング$\propto 1/r^\alpha$をもつ$d$次元格子における励起子の伝播を解析する。
強いデフェージング(量子ゼノン)領域では、そのダイナミクスが長距離ジャンプを伴う排他過程に対する古典的マスター方程式によって記述されることを示す。
この極限において空間分布を解析的に計算すると、その形状は減衰指数の臨界値 $\alpha_{\rm cr} = (d+2)/2$ を境に変化する。
超拡散運動は、$\alpha\leq\alpha_{\rm cr}$ では長距離の代数的テールをもつLévy安定分布に対応し、$\alpha > \alpha_{\rm cr}$ では分布が長距離の代数的テールをもつ意外な混合ガウス型プロファイルとなって、短距離の拡散と長距離のLévyフライトが共存する。
多励起子の場合、ドメインウォール型の励起子分布から出発すると、任意の$\alpha$で分布に代数的テールが現れ、これが熱平衡化に影響することを示す。ホッピング範囲が長いほど、平衡に速く到達する。
我々の結果は、冷却捕捉イオン、Rydberg原子、超分子色素凝集体を用いた実験に直接関係する。
これは、長距離ジャンプを伴う排他過程を実験的に実現する方法を与える。 We analyze the propagation of excitons in a $d$-dimensional lattice with power-law hopping $\propto 1/r^\alpha$ in the presence of dephasing, described by a generalized Haken-Strobl-Reineker model. We show that in the strong dephasing (quantum Zeno) regime the dynamics is described by a classical master equation for an exclusion process with long jumps. In this limit, we analytically compute the spatial distribution, whose shape changes at a critical value of the decay exponent $\alpha_{\rm cr} = (d+2)/2$. The exciton always diffuses anomalously: a superdiffusive motion is associated to a Lévy stable distribution with long-range algebraic tails for $\alpha\leq\alpha_{\rm cr}$, while for $\alpha > \alpha_{\rm cr}$ the distribution corresponds to a surprising mixed Gaussian profile with long-range algebraic tails, leading to the coexistence of short-range diffusion and long-range Lévy-flights. In the many-exciton case, we demonstrate that, starting from a domain-wall exciton profile, algebraic tails appear in the distributions for any $\alpha$, which affects thermalization: the longer the hopping range, the faster equilibrium is reached. Our results are directly relevant to experiments with cold trapped ions, Rydberg atoms and supramolecular dye aggregates. They provide a way to realize an exclusion process with long jumps experimentally. | 翻訳日:2023-08-24 18:47:20 公開日:2023-08-23 |
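強いデフェージング極限では、ホッピング $J_{ij}\propto 1/r^\alpha$ から得られる古典的な遷移レートがおおむね $|J_{ij}|^2/\gamma$ に比例する(コヒーレンスの断熱消去による標準的な結果で、前因子は規約に依存する)と仮定すると、1回のジャンプ長の分布は $\propto |r|^{-2\alpha}$ となる。これを $d=1$ の格子上で1粒子のランダムウォークとしてサンプリングするスケッチ(Python)を示す。排他相互作用や多励起子の効果は含めておらず、$r_{\max}$ による打ち切りと関数名は説明用の仮定である。

```python
import random
from typing import List, Tuple


def jump_distribution(alpha: float, r_max: int) -> Tuple[List[int], List[float]]:
    """Jump lengths r = +-1..+-r_max on a 1D lattice with weights ~ |r|^(-2*alpha).

    The |J|^2 scaling of the classical rate is an assumption (strong-dephasing limit).
    """
    lengths = [r for r in range(-r_max, r_max + 1) if r != 0]
    weights = [abs(r) ** (-2.0 * alpha) for r in lengths]
    total = sum(weights)
    return lengths, [w / total for w in weights]


def random_walk(alpha: float, steps: int, r_max: int = 1000, seed: int = 0) -> List[int]:
    """Single-walker trajectory; no exclusion between excitons is modeled."""
    rng = random.Random(seed)
    lengths, probs = jump_distribution(alpha, r_max)
    jumps = rng.choices(lengths, weights=probs, k=steps)
    pos, path = 0, [0]
    for j in jumps:
        pos += j
        path.append(pos)
    return path
```

$d=1$ では $\alpha_{\rm cr}=3/2$ であり、$\alpha\le\alpha_{\rm cr}$ のときジャンプ長の2次モーメント $\sum_r r^{2-2\alpha}$ が(打ち切りがなければ)発散するため、Lévy型の振る舞いが現れる。一般の $d$ でも同じ条件が $\alpha\le(d+2)/2$ となり、アブストラクトの臨界値と一致する。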
# 未知の測定ノイズ下での物理インフォームドニューラルネットワーク Physics-informed neural networks with unknown measurement noise ( http://arxiv.org/abs/2211.15498v2 ) ライセンス: Link先を確認 | Philipp Pilar, Niklas Wahlström | (参考訳) 物理インフォームドニューラルネットワーク(PINN)は、偏微分方程式の解の発見とパラメータの同定の両方に対する柔軟なアプローチである。
標準のPINNフレームワークは、非ガウスノイズの場合に破綻することを示す。
複数の例を用いて,提案手法の性能改善について述べる。 Physics-informed neural networks (PINNs) constitute a flexible approach to both finding solutions and identifying parameters of partial differential equations. Most works on the topic assume noiseless data, or data contaminated by weak Gaussian noise. We show that the standard PINN framework breaks down in case of non-Gaussian noise. We give a way of resolving this fundamental issue and we propose to jointly train an energy-based model (EBM) to learn the correct noise distribution. We illustrate the improved performance of our approach using multiple examples. | 翻訳日:2023-08-24 18:46:06 公開日:2023-08-23 |
# SEAM: 大マージン正則化による転移可能な混合精度量子化ポリシーの探索 SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization ( http://arxiv.org/abs/2302.06845v2 ) ライセンス: Link先を確認 | Chen Tang, Kai Ouyang, Zenghao Chai, Yunpeng Bai, Yuan Meng, Zhi Wang, Wenwu Zhu | (参考訳) 混合精度量子化(MPQ)では、特にILSVRC-2012のような大規模データセットを用いる場合、各層に最適なビット幅割り当て(すなわちポリシー)を探索するのに多くの時間がかかる。
我々は、大規模ターゲットデータセットと比較して、データスケールのわずか4%のプロキシデータセットで高品質MPQポリシーを検索し、後者を直接検索するのと同じ精度を達成し、MPQ検索効率を最大300倍向上させる。 Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching the optimal bit-width allocation i.e., the policy) for each layer, especially when using large-scale datasets such as ISLVRC-2012. This limits the practicality of MPQ in real-world deployment scenarios. To address this issue, this paper proposes a novel method for efficiently searching for effective MPQ policies using a small proxy dataset instead of the large-scale dataset used for training the model. Deviating from the established norm of employing a consistent dataset for both model training and MPQ policy search stages, our approach, therefore, yields a substantial enhancement in the efficiency of MPQ exploration. Nonetheless, using discrepant datasets poses challenges in searching for a transferable MPQ policy. Driven by the observation that quantization noise of sub-optimal policy exerts a detrimental influence on the discriminability of feature representations -- manifesting as diminished class margins and ambiguous decision boundaries -- our method aims to identify policies that uphold the discriminative nature of feature representations, i.e., intra-class compactness and inter-class separation. This general and dataset-independent property makes us search for the MPQ policy over a rather small-scale proxy dataset and then the policy can be directly used to quantize the model trained on a large-scale dataset. Our method offers several advantages, including high proxy data utilization, no excessive hyper-parameter tuning, and high searching efficiency. We search high-quality MPQ policies with the proxy dataset that has only 4% of the data scale compared to the large-scale target dataset, achieving the same accuracy as searching directly on the latter, improving MPQ searching efficiency by up to 300 times. | 翻訳日:2023-08-24 18:40:32 公開日:2023-08-23 |
# スパーシティの観点からの深層ニューラルネットワークのプルーニング Pruning Deep Neural Networks from a Sparsity Perspective ( http://arxiv.org/abs/2302.05601v3 ) ライセンス: Link先を確認 | Enmao Diao, Ganghua Wang, Jiawei Zhan, Yuhong Yang, Jie Ding, Vahid Tarokh | (参考訳) 近年,計算資源やメモリに制約のある小型デバイスへAIを迅速に展開するため,深層ネットワークのプルーニングが大きな注目を集めている。
本研究では,深層ニューラルネットワークの潜在的な圧縮可能性を測るPQ Index(PQI)を提案し,これを用いてSparsity-informed Adaptive Pruning(SAP)アルゴリズムを開発した。
また,提案する適応プルーニングアルゴリズムは,ハイパーパラメータを適切に選べば,圧縮効率とロバスト性の両面で,宝くじ仮説(lottery ticket)に基づくプルーニング手法などの反復プルーニングアルゴリズムより優れていることを実験で示す。 In recent years, deep network pruning has attracted significant attention in order to enable the rapid deployment of AI into small devices with computation and memory constraints. Pruning is often achieved by dropping redundant weights, neurons, or layers of a deep network while attempting to retain a comparable test performance. Many deep pruning algorithms have been proposed with impressive empirical success. However, existing approaches lack a quantifiable measure to estimate the compressibility of a sub-network during each pruning iteration and thus may under-prune or over-prune the model. In this work, we propose PQ Index (PQI) to measure the potential compressibility of deep neural networks and use this to develop a Sparsity-informed Adaptive Pruning (SAP) algorithm. Our extensive experiments corroborate the hypothesis that for a generic pruning procedure, PQI decreases first when a large model is being effectively regularized and then increases when its compressibility reaches a limit that appears to correspond to the beginning of underfitting. Subsequently, PQI decreases again when the model collapse and significant deterioration in the performance of the model start to occur. Additionally, our experiments demonstrate that the proposed adaptive pruning algorithm with proper choice of hyper-parameters is superior to the iterative pruning algorithms such as the lottery ticket-based pruning methods, in terms of both compression efficiency and robustness. | 翻訳日:2023-08-24 18:40:00 公開日:2023-08-23 |
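The abstract does not spell out how PQI is computed. As a rough illustration only, the sketch below assumes the sparsity measure $I_{p,q}(w) = 1 - d^{1/q-1/p}\,\|w\|_p/\|w\|_q$ for $0<p<q$ over a non-zero weight vector of length $d$; please check this definition against the full paper before relying on it. The defaults `p=0.5`, `q=1.0` and the function name `pq_index` are chosen here for illustration.

```python
import numpy as np

def pq_index(w, p=0.5, q=1.0):
    """Assumed PQ-style sparsity score: 1 - d**(1/q - 1/p) * ||w||_p / ||w||_q.

    Requires 0 < p < q and a non-zero w. The score is 0 for a perfectly uniform
    vector and equals 1 - d**(1/q - 1/p) for a one-hot vector (maximal sparsity).
    """
    if not 0 < p < q:
        raise ValueError("requires 0 < p < q")
    a = np.abs(np.ravel(np.asarray(w, dtype=float)))
    d = a.size
    norm_p = np.sum(a ** p) ** (1.0 / p)  # quasi-norm for p < 1
    norm_q = np.sum(a ** q) ** (1.0 / q)
    return 1.0 - d ** (1.0 / q - 1.0 / p) * norm_p / norm_q

print(pq_index(np.ones(4)))            # 0.0  (dense weights)
print(pq_index([1.0, 0.0, 0.0, 0.0]))  # 0.75 (one-hot, d = 4)
```

Higher values mean the weights are more concentrated in a few entries; tracking such a score across pruning iterations is the kind of signal the abstract describes, though how SAP turns it into a pruning ratio is not shown here.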
# 境界散逸スピン鎖におけるスケールフリー非エルミート表皮効果 Scale-free non-Hermitian skin effect in a boundary-dissipated spin chain ( http://arxiv.org/abs/2301.11896v2 ) ライセンス: Link先を確認 | He-Ran Wang, Bo Li, Fei Song, Zhong Wang | (参考訳) PT対称な非エルミート境界場を持つ開放端XXZスピン鎖について検討する。
座標Bethe仮説を用いて、相互作用に誘起されるスケールフリーな非エルミート表皮効果を見いだした。
多体スケールフリー状態と境界ストリング状態の違いを調べ、等方点における両者の間の転移について検討した。
結果を検証するための実験的なスキームについても論じる。 We study the open XXZ spin chain with a PT-symmetric non-Hermitian boundary field. We find an interaction-induced scale-free non-Hermitian skin effect by using the coordinate Bethe ansatz. The steady state and the ground state in the PT broken phase are constructed, and the formulas of their eigen-energies in the thermodynamic limit are obtained. The differences between the many-body scale-free states and the boundary string states are explored, and the transition between the two at isotropic point is investigated. We also discuss an experimental scheme to verify our results. | 翻訳日:2023-08-24 18:39:33 公開日:2023-08-23 |
# ソーシャルメディア上でのメンタルヘルス評価のための因果分析と知覚マイニングのためのレンズとしてのNLP NLP as a Lens for Causal Analysis and Perception Mining to Infer Mental Health on Social Media ( http://arxiv.org/abs/2301.11004v5 ) ライセンス: Link先を確認 | Muskan Garg and Chandni Saxena and Usman Naseem and Bonnie J Dorr | (参考訳) ソーシャルメディア上の人間同士のインタラクションは、しばしば行動の背後にある意図を伝達し、オンラインユーザーのメンタルヘルス分析(MHA)のための心理的言語資源を生み出す。
しかし, 臨床心理学の実践やパーソナライズされた精神医療に最大限の影響を与えるためには, より重要な帰結をもたらし, かつ説明可能な研究が必要であると我々は主張する。
我々は、精神状態の推論における因果関係抽出や知覚強化に関するデータセットおよび問題定式化の研究成果が増えていることを踏まえ、言語というレンズを通して計算心理学の問題をモデル化する、より説明可能なアプローチを提唱する。 Interactions among humans on social media often convey intentions behind their actions, yielding a psychological language resource for Mental Health Analysis (MHA) of online users. The success of Computational Intelligence Techniques (CIT) for inferring mental illness from such social media resources points to NLP as a lens for causal analysis and perception mining. However, we argue that more consequential and explainable research is required for optimal impact on clinical psychology practice and personalized mental healthcare. To bridge this gap, we posit two significant dimensions: (1) Causal analysis to illustrate a cause and effect relationship in the user generated text; (2) Perception mining to infer psychological perspectives of social effects on online users' intentions. Within the scope of Natural Language Processing (NLP), we further explore critical areas of inquiry associated with these two dimensions, specifically through recent advancements in discourse analysis. This position paper guides the community to explore solutions in this space and advance the state of practice in developing conversational agents for inferring mental health from social media. We advocate for a more explainable approach toward modeling computational psychology problems through the lens of language as we observe an increased number of research contributions in dataset and problem formulation for causal relation extraction and perception enhancements while inferring mental states. | 翻訳日:2023-08-24 18:39:25 公開日:2023-08-23 |
# 高次トポロジカル信号のDirac信号処理 Dirac signal processing of higher-order topological signals ( http://arxiv.org/abs/2301.10137v2 ) ライセンス: Link先を確認 | Lucille Calmon, Michael T. Schaub, Ginestra Bianconi | (参考訳) 高次ネットワークは、ノードだけでなく、リンクや三角形、さらに一般には単体複体の高次元の単体にも付随する変数であるトポロジカル信号を保持できる。
我々は,ノイズを含む合成データと海洋の漂流ブイ(ドリフター)のノイズを含むデータでアルゴリズムを検証し,本アルゴリズムが真の信号を効率的に再構成するよう学習でき,ホッジラプラシアンのみに基づくアルゴリズムを上回ることを確認した。 Higher-order networks can sustain topological signals which are variables associated not only to the nodes, but also to the links, to the triangles and in general to the higher dimensional simplices of simplicial complexes. These topological signals can describe a large variety of real systems including currents in the ocean, synaptic currents between neurons and biological transportation networks. In real scenarios topological signal data might be noisy and an important task is to process these signals by improving their signal to noise ratio. So far topological signals are typically processed independently of each other. For instance, node signals are processed independently of link signals, and algorithms that can enforce a consistent processing of topological signals across different dimensions are largely lacking. Here we propose Dirac signal processing, an adaptive, unsupervised signal processing algorithm that learns to jointly filter topological signals supported on nodes, links and triangles of simplicial complexes in a consistent way. The proposed Dirac signal processing algorithm is formulated in terms of the discrete Dirac operator which can be interpreted as "square root" of a higher-order Hodge Laplacian. We discuss in detail the properties of the Dirac operator including its spectrum and the chirality of its eigenvectors and we adopt this operator to formulate Dirac signal processing that can filter noisy signals defined on nodes, links and triangles of simplicial complexes. We test our algorithms on noisy synthetic data and noisy data of drifters in the ocean and find that the algorithm can learn to efficiently reconstruct the true signals outperforming algorithms based exclusively on the Hodge Laplacian. | 翻訳日:2023-08-24 18:39:02 公開日:2023-08-23 |
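To make the "square root of a higher-order Hodge Laplacian" remark concrete, here is a minimal sketch on a toy complex (one filled triangle), assuming the standard block form of the discrete Dirac operator built from the incidence matrices $B_1$ (node to edge) and $B_2$ (edge to triangle). It checks that $D^2$ is block diagonal with the Hodge Laplacians $L_0, L_1, L_2$ and that the chirality operator $\gamma = \mathrm{diag}(I,-I,I)$ anticommutes with $D$, which makes the spectrum symmetric. It illustrates the operator only, not the paper's adaptive filtering algorithm; the helper names are hypothetical.

```python
import numpy as np

def dirac_operator(B1, B2):
    """Discrete Dirac operator acting on node (+) edge (+) triangle signals.

    B1: node-edge incidence matrix (n0 x n1); B2: edge-triangle incidence (n1 x n2).
    """
    n0, n1 = B1.shape
    n2 = B2.shape[1]
    Z = np.zeros
    return np.block([
        [Z((n0, n0)), B1,          Z((n0, n2))],
        [B1.T,        Z((n1, n1)), B2],
        [Z((n2, n0)), B2.T,        Z((n2, n2))],
    ])

def block_diag(*blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

# Toy complex: one filled triangle [0,1,2] with edges e0=[0,1], e1=[0,2], e2=[1,2]
B1 = np.array([[-1., -1.,  0.],
               [ 1.,  0., -1.],
               [ 0.,  1.,  1.]])
B2 = np.array([[1.], [-1.], [1.]])  # boundary of [0,1,2] = e2 - e1 + e0

D = dirac_operator(B1, B2)

# Hodge Laplacians of order 0, 1, 2
L0 = B1 @ B1.T
L1 = B1.T @ B1 + B2 @ B2.T
L2 = B2.T @ B2

# "Square root" property: D @ D equals diag(L0, L1, L2) because B1 @ B2 = 0
print(np.allclose(D @ D, block_diag(L0, L1, L2)))  # True

# Chirality: gamma = diag(+I, -I, +I) anticommutes with D, so the spectrum is symmetric
gamma = block_diag(np.eye(3), -np.eye(3), np.eye(1))
print(np.allclose(gamma @ D + D @ gamma, 0))  # True
eigvals = np.linalg.eigvalsh(D)  # ascending order
print(np.allclose(eigvals, -eigvals[::-1]))  # True
```

The block structure is what lets a single operator couple node, link, and triangle signals, which is the property the abstract contrasts with processing each dimension separately through its own Hodge Laplacian.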
# 保健医療における量子コンピューティング応用の現状 The state of quantum computing applications in health and medicine ( http://arxiv.org/abs/2301.09106v2 ) ライセンス: Link先を確認 | Frederik F. Flöther | (参考訳) 医療や生命科学の分野を含む医学は、ここ数年で量子関連の活動や実験が活発に行われている(生物学と量子理論はシュレーディンガーの猫以来ずっと絡み合っていたが)。
ユースケースとアルゴリズムを要約し、技術的および倫理的課題を含む量子時代の医学の展望を提供する。 Medicine, including fields in healthcare and life sciences, has seen a flurry of quantum-related activities and experiments in the last few years (although biology and quantum theory have arguably been entangled ever since Schrödinger's cat). The initial focus was on biochemical and computational biology problems; recently, however, clinical and medical quantum solutions have drawn increasing interest. The rapid emergence of quantum computing in health and medicine necessitates a mapping of the landscape. In this review, clinical and medical proof-of-concept quantum computing applications are outlined and put into perspective. These consist of over 40 experimental and theoretical studies. The use case areas span genomics, clinical research and discovery, diagnostics, and treatments and interventions. Quantum machine learning (QML) in particular has rapidly evolved and been shown to be competitive with classical benchmarks in recent medical research. Near-term QML algorithms have been trained with diverse clinical and real-world data sets. This includes studies in generating new molecular entities as drug candidates, diagnosing based on medical image classification, predicting patient persistence, forecasting treatment effectiveness, and tailoring radiotherapy. The use cases and algorithms are summarized and an outlook on medicine in the quantum era, including technical and ethical challenges, is provided. | 翻訳日:2023-08-24 18:38:35 公開日:2023-08-23 |
# BallGAN: 球面背景を用いた3D認識画像合成 BallGAN: 3D-aware Image Synthesis with a Spherical Background ( http://arxiv.org/abs/2301.09091v2 ) ライセンス: Link先を確認 | Minjung Shin, Yunji Seo, Jeongmin Bae, Young Sun Choi, Hyunsu Kim, Hyeran Byun, Youngjung Uh | (参考訳) 3D認識(3D-aware)GANは、任意の視点でレンダリングして画像を生成できるように、リアルな3Dシーンを合成することを目指している。
2) トレーニングはより安定する。
3) 前景は異なる任意の背景の上に別々に描画することができる。 3D-aware GANs aim to synthesize realistic 3D scenes such that they can be rendered in arbitrary perspectives to produce images. Although previous methods produce realistic images, they suffer from unstable training or degenerate solutions where the 3D geometry is unnatural. We hypothesize that the 3D geometry is underdetermined due to the insufficient constraint, i.e., being classified as real image to the discriminator is not enough. To solve this problem, we propose to approximate the background as a spherical surface and represent a scene as a union of the foreground placed in the sphere and the thin spherical background. It reduces the degree of freedom in the background field. Accordingly, we modify the volume rendering equation and incorporate dedicated constraints to design a novel 3D-aware GAN framework named BallGAN. BallGAN has multiple advantages as follows. 1) It produces more reasonable 3D geometry; the images of a scene across different viewpoints have better photometric consistency and fidelity than the state-of-the-art methods. 2) The training becomes much more stable. 3) The foreground can be separately rendered on top of different arbitrary backgrounds. | 翻訳日:2023-08-24 18:38:16 公開日:2023-08-23 |
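The abstract does not give BallGAN's modified rendering equation. As a rough sketch of the general idea only, assuming the camera sits inside a sphere of radius R centred at the origin and treating the thin spherical background as an opaque surface, the code below finds where a ray exits the sphere (where a background colour would be looked up) and composites foreground samples over that background with standard front-to-back alpha blending. The function names and this simplification are hypothetical.

```python
import numpy as np

def ray_sphere_exit(origin, direction, radius):
    """Distance t > 0 at which a ray starting inside a centred sphere leaves it.

    Solves |o + t d|^2 = R^2 for unit d and takes the positive root.
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    o = np.asarray(origin, dtype=float)
    b = np.dot(o, d)
    c = np.dot(o, o) - radius ** 2  # negative when the origin is inside the sphere
    return -b + np.sqrt(b * b - c)

def composite(fg_rgb, fg_alpha, bg_rgb):
    """Front-to-back alpha compositing of foreground samples, then the background."""
    transmittance = 1.0
    color = np.zeros(3)
    for rgb, a in zip(fg_rgb, fg_alpha):
        color += transmittance * a * np.asarray(rgb, dtype=float)
        transmittance *= 1.0 - a
    # Whatever light is not absorbed by the foreground comes from the background sphere
    return color + transmittance * np.asarray(bg_rgb, dtype=float)

t = ray_sphere_exit([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], radius=2.0)  # 1.0
pixel = composite([[1.0, 0.0, 0.0]], [0.5], bg_rgb=[0.0, 0.0, 1.0])  # [0.5, 0.0, 0.5]
```

Restricting the background to a single surface means each ray needs only one background lookup instead of samples along an unbounded range, which is one way to read the abstract's point that the sphere reduces the degrees of freedom of the background field.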
# 不確実性を考慮した厳密な定量化フレームワークは再現可能かつ追試可能な機械学習ワークフローに不可欠である A Rigorous Uncertainty-Aware Quantification Framework Is Essential for Reproducible and Replicable Machine Learning Workflows ( http://arxiv.org/abs/2301.05763v3 ) ライセンス: Link先を確認 | Line Pouchard, Kristofer G. Reyes, Francis J. Alexander, Byung-Jun Yoon | (参考訳) 機械学習(ML)や人工知能(AI)モデルによる予測、およびそうしたML/AI予測を組み込んだ科学的ワークフローの結果を再現できるかどうかは、多くの要因に左右される。
我々は、構想するこのフレームワークが、多様な科学的応用に向けたより再現可能で信頼できるワークフローの設計に寄与し、最終的に科学的発見を加速することを期待している。 The ability to replicate predictions by machine learning (ML) or artificial intelligence (AI) models and results in scientific workflows that incorporate such ML/AI predictions is driven by numerous factors. An uncertainty-aware metric that can quantitatively assess the reproducibility of quantities of interest (QoI) would contribute to the trustworthiness of results obtained from scientific workflows involving ML/AI models. In this article, we discuss how uncertainty quantification (UQ) in a Bayesian paradigm can provide a general and rigorous framework for quantifying reproducibility for complex scientific workflows. Such a framework has the potential to fill a critical gap that currently exists in ML/AI for scientific workflows, as it will enable researchers to determine the impact of ML/AI model prediction variability on the predictive outcomes of ML/AI-powered workflows. We expect that the envisioned framework will contribute to the design of more reproducible and trustworthy workflows for diverse scientific applications, and ultimately, accelerate scientific discoveries. | 翻訳日:2023-08-24 18:37:57 公開日:2023-08-23 |
# 位相シフト敵対的訓練 Phase-shifted Adversarial Training ( http://arxiv.org/abs/2301.04785v3 ) ライセンス: Link先を確認 | Yeachan Kim, Seongyeon Kim, Ihyeok Seo, Bonggun Shin | (参考訳) 敵対的訓練は、ニューラルネットワークベースのアプリケーションを現実世界に安全に展開するための必須要素と考えられている。
評価のために,CIFAR-10 と ImageNet を用いて,信頼性評価のための適応攻撃を慎重に設計した実験を行った。
これにより、モデルが各データ点の近傍で滑らかな予測を行えるようになり、敵対的ロバスト性が向上する。 Adversarial training has been considered an imperative component for safely deploying neural network-based applications to the real world. To achieve stronger robustness, existing methods primarily focus on how to generate strong attacks by increasing the number of update steps, regularizing the models with the smoothed loss function, and injecting the randomness into the attack. Instead, we analyze the behavior of adversarial training through the lens of response frequency. We empirically discover that adversarial training causes neural networks to have low convergence to high-frequency information, resulting in highly oscillated predictions near each data. To learn high-frequency contents efficiently and effectively, we first prove that a universal phenomenon of frequency principle, i.e., lower frequencies are learned first, still holds in adversarial training. Based on that, we propose phase-shifted adversarial training (PhaseAT) in which the model learns high-frequency components by shifting these frequencies to the low-frequency range where the fast convergence occurs. For evaluations, we conduct the experiments on CIFAR-10 and ImageNet with the adaptive attack carefully designed for reliable evaluation. Comprehensive results show that PhaseAT significantly improves the convergence for high-frequency information. This results in improved adversarial robustness by enabling the model to have smoothed predictions near each data. | 翻訳日:2023-08-24 18:37:39 公開日:2023-08-23 |
# 誘導深度マップ超解像のための球面空間特徴分解 Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution ( http://arxiv.org/abs/2303.08942v2 ) ライセンス: Link先を確認 | Zixiang Zhao, Jiangshe Zhang, Xiang Gu, Chengli Tan, Shuang Xu, Yulun Zhang, Radu Timofte, Luc Van Gool | (参考訳) 誘導深度マップ超解像(GDSR)はマルチモーダル画像処理における注目の話題であり,同一シーンの高解像度(HR)RGB画像に含まれる付加情報を用いて,低解像度(LR)深度マップをアップサンプリングすることを目的としている。
本稿では,この問題を解決するために,Spherical Space Feature Decomposition Network (SSDNet)を提案する。
コードは https://github.com/Zhaozixiang1228/GDSR-SSDNet で公開されている。 Guided depth map super-resolution (GDSR), as a hot topic in multi-modal image processing, aims to upsample low-resolution (LR) depth maps with additional information involved in high-resolution (HR) RGB images from the same scene. The critical step of this task is to effectively extract domain-shared and domain-private RGB/depth features. In addition, three detailed issues, namely blurry edges, noisy surfaces, and over-transferred RGB texture, need to be addressed. In this paper, we propose the Spherical Space feature Decomposition Network (SSDNet) to solve the above issues. To better model cross-modality features, Restormer block-based RGB/depth encoders are employed for extracting local-global features. Then, the extracted features are mapped to the spherical space to complete the separation of private features and the alignment of shared features. Shared features of RGB are fused with the depth features to complete the GDSR task. Subsequently, a spherical contrast refinement (SCR) module is proposed to further address the detail issues. Patches that are classified according to imperfect categories are input into the SCR module, where the patch features are pulled closer to the ground truth and pushed away from the corresponding imperfect samples in the spherical feature space via contrastive learning. Extensive experiments demonstrate that our method can achieve state-of-the-art results on four test datasets, as well as successfully generalize to real-world scenes. The code is available at https://github.com/Zhaozixiang1228/GDSR-SSDNet. | 翻訳日:2023-08-24 18:30:37 公開日:2023-08-23 |
# 自己教師あり学習に基づく心血管イベント検出のための汎用検査値推移事前学習モデル Self-supervised learning based general laboratory progress pretrained model for cardiovascular event detection ( http://arxiv.org/abs/2303.06980v3 ) ライセンス: Link先を確認 | Li-Chin Chen, Kuo-Hsuan Hung, Yi-Ju Tseng, Hsin-Yao Wang, Tse-Min Lu, Wei-Chieh Huang, Yu Tsao | (参考訳) 患者データには、その本質的な性質に由来するいくつかの課題がある。
有病率の高い疾患の症例では、患者数の多さと継続的な経過観察により大量の縦断データが蓄積されるが、縦断的な検査データは不規則性、時間依存性、欠測、疎性で知られている。一方、まれな症例や特定の症例は患者数が限られ観察も断続的であるため、症例の収集がしばしば制約される。
GLP処理後、分類性能は顕著に向上し、平均精度は0.63から0.90に上昇した。
評価したすべての指標で、GLP処理前と比べて有意な優位性が示された(p < 0.01)。
このアプローチを他の疾患へ拡張できる可能性は大いに期待される。 The inherent nature of patient data poses several challenges. Prevalent cases amass substantial longitudinal data owing to their patient volume and consistent follow-ups, however, longitudinal laboratory data are renowned for their irregularity, temporality, absenteeism, and sparsity; In contrast, recruitment for rare or specific cases is often constrained due to their limited patient size and episodic observations. This study employed self-supervised learning (SSL) to pretrain a generalized laboratory progress (GLP) model that captures the overall progression of six common laboratory markers in prevalent cardiovascular cases, with the intention of transferring this knowledge to aid in the detection of specific cardiovascular events. GLP implemented a two-stage training approach, leveraging the information embedded within interpolated data and amplifying the performance of SSL. After GLP pretraining, it is transferred for TVR detection. The proposed two-stage training improved the performance of pure SSL, and the transferability of GLP exhibited distinctiveness. After GLP processing, the classification exhibited a notable enhancement, with averaged accuracy rising from 0.63 to 0.90. All evaluated metrics demonstrated substantial superiority (p < 0.01) compared to results prior to GLP processing. Our study effectively engages in translational engineering by transferring patient progression of cardiovascular laboratory parameters from one patient group to another, transcending the limitations of data availability. The transferability of disease progression optimized the strategies of examinations and treatments, and improves patient prognosis while using commonly available laboratory parameters. The potential for expanding this approach to encompass other diseases holds great promise. | 翻訳日:2023-08-24 18:30:08 公開日:2023-08-23 |
# 論理プログラミングと大規模言語モデルを用いた知識グラフ上のドメイン固有質問応答 Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models ( http://arxiv.org/abs/2303.02206v2 ) ライセンス: Link先を確認 | Navid Madani, Rohini K. Srihari, Kenneth Joseph | (参考訳) ドメイン固有のグラフ上の質問に答えるには、関係の数が限られていることとドメイン固有の性質のため、それに合わせたアプローチが必要である。
提案手法では,古典的な論理プログラミング言語を大規模言語モデル (LLM) に統合し,論理推論能力を活用してKGQAタスクに取り組む。
全体として、我々の研究は、ドメイン固有グラフ上の質問応答に対処する有望なアプローチを示し、論理プログラミング言語を組み込むことで、説明可能で堅牢なソリューションを提供する。 Answering questions over domain-specific graphs requires a tailored approach due to the limited number of relations and the specific nature of the domain. Our approach integrates classic logical programming languages into large language models (LLMs), enabling the utilization of logical reasoning capabilities to tackle the KGQA task. By representing the questions as Prolog queries, which are readable and close to natural language in representation, we facilitate the generation of programmatically derived answers. To validate the effectiveness of our approach, we evaluate it using a well-known benchmark dataset, MetaQA. Our experimental results demonstrate that our method achieves accurate identification of correct answer entities for all test questions, even when trained on a small fraction of annotated data. Overall, our work presents a promising approach to addressing question answering over domain-specific graphs, offering an explainable and robust solution by incorporating logical programming languages. | 翻訳日:2023-08-24 18:29:16 公開日:2023-08-23 |
# 超相対論的領域におけるトンネル時間へのスピンの影響 Influence of spin on tunneling times in the super-relativistic regime ( http://arxiv.org/abs/2303.01873v2 ) ライセンス: Link先を確認 | Said Lantigua, Jonas Maziero | (参考訳) ディラック方程式で記述される相対論的トンネル効果について、[Phys. Rev. A 70, 052112 (2004)]の著者らは、位相時間(群遅延)を、ポテンシャル障壁内での粒子の滞在時間と、入射波動関数と反射波動関数の相互作用に伴う自己干渉遅延との和として求められるという一般的な結果を導出した。
この表現は[Found. Phys. 45, 1586 (2015)]で導入されたものと同様である。
より具体的には、トンネル時間は、スピンアップ粒子とスピンダウン粒子それぞれのポテンシャル障壁内での滞在時間と、スピンアップ粒子の入射波動関数と反射波動関数に伴う自己相互作用時間との和として得られる。 For the relativistic tunneling effect described using Dirac's equation, in [Phys. Rev. A 70, 052112 (2004)] the authors presented the deduction of a general result that allows for the determination of the phase time (group delay) as the sum of the particle dwell time inside a potential barrier and of the self-interference delay associated with the incident and reflected wave functions interaction. In this article, a mathematical model is derived through a construction analogous to the proposal mentioned above, but based on an alternative representation for Dirac's equation. This representation is similar to the one introduced in [Found. Phys. 45, 1586 (2015)]. Thus, from the application of this model in the study of the tunneling effect in the absence of an external magnetic field, the influence of spin on the tunneling times is described. More specifically, the tunneling time is obtained as the sum of the dwell times inside the potential barrier for particles with spin up and spin down and the self-interaction time associated with the incident and reflected wave functions for particles with spin up. | 翻訳日:2023-08-24 18:29:00 公開日:2023-08-23 |
# 胸部X線画像からの疾患検出のためのコンテンツ認識型スタイル不変モデルによる未知ドメインへの汎化の学習 Learning to Generalize towards Unseen Domains via a Content-Aware Style Invariant Model for Disease Detection from Chest X-rays ( http://arxiv.org/abs/2302.13991v2 ) ライセンス: Link先を確認 | Mohammad Zunaed, Md. Aynal Haque, Taufiq Hasan | (参考訳) ソースドメインミスマッチによるパフォーマンス劣化は、特に胸部X線(CXR)の深層学習に基づく医用画像解析における長年にわたる課題である。
大規模胸部疾患データセットであるCheXpert,MIMIC-CXR,BRAXを用いた大規模な実験により,本フレームワークはドメインシフトの存在下でより堅牢であり,最先端のパフォーマンスを実現することができることを示した。 Performance degradation due to source domain mismatch is a longstanding challenge in deep learning-based medical image analysis, particularly for chest X-rays (CXRs). Several methods (e.g., adversarial training, multi-domain mixups) have been proposed to extract domain-invariant high-level features to address this domain shift. However, these methods do not explicitly regularize the content and style characteristics of the extracted domain-invariant features. Recent studies have demonstrated that CNN models exhibit a strong bias toward styles (e.g., uninformative textures) rather than content (e.g., shape), in stark contrast to the human-vision system. Radiologists tend to learn visual cues from CXRs and thus perform well across multiple domains. Therefore, in medical imaging for pathology diagnosis from CXR images, models should extract domain-invariant features that are style-invariant and content-biased. Motivated by this, we employ the novel style randomization modules (SRMs) at both image and feature levels that work together hierarchically to create rich style perturbed features on the fly while keeping the content intact. In addition, we leverage consistency regularizations between global semantic features and predicted probability distributions, respectively, for with and without style perturbed versions of the same CXR image to tweak the model's sensitivity toward content markers for accurate predictions. Extensive experiments with three large-scale thoracic disease datasets, i.e., CheXpert, MIMIC-CXR, and BRAX, demonstrate that our proposed framework is more robust in the presence of domain shift and achieves state-of-the-art performance. | 翻訳日:2023-08-24 18:28:37 公開日:2023-08-23 |
# 路上走行データによる運転者の性格特性の推定 Estimating Driver Personality Traits from On-Road Driving Data ( http://arxiv.org/abs/2302.10898v2 ) ライセンス: Link先を確認 | Ryusei Kimura and Takahiro Tanaka and Yuki Yoshihara and Kazuhiro Fujikake and Hitoshi Kanamori and Shogo Okada | (参考訳) 本稿では,運転支援システムにおける運転データを用いた運転者の心理特性の推定について述べる。
また,回帰モデリングにより,運転行動と,トレイルメイキングテスト(TMT)や有効視野(UFOV)テストを含むさまざまな認知機能との関係を調べた。
心理的運転スタイルやワークロード感受性などの一部の特性は高精度に推定されるが、多様な長さでのセグメンテーションにより精度が向上するかどうかは特性に依存し、すべての特性に有効というわけではない。 This paper focuses on the estimation of a driver's psychological characteristics using driving data for driving assistance systems. Driving assistance systems that support drivers by adapting individual psychological characteristics can provide appropriate feedback and prevent traffic accidents. As a first step toward implementing such adaptive assistance systems, this research aims to develop a model to estimate drivers' psychological characteristics, such as cognitive function, psychological driving style, and workload sensitivity, from on-road driving behavioral data using machine learning and deep learning techniques. We also investigated the relationship between driving behavior and various cognitive functions, including the Trail Making Test (TMT) and Useful Field of View (UFOV) test, through regression modeling. The proposed method focuses on road type information and captures various durations of time-series data observed from driving behaviors. First, we segment the driving time-series data into two road types, namely, arterial roads and intersections, to consider driving situations. Second, we further segment data into many sequences of various durations. Third, statistics are calculated from each sequence. Finally, these statistics are used as input features of machine learning models to estimate psychological characteristics. The experimental results show that our model can estimate a driver's cognitive function, namely, the TMT (B) and UFOV test scores, with Pearson correlation coefficients $r$ of 0.579 and 0.708, respectively. Some characteristics, such as psychological driving style and workload sensitivity, are estimated with high accuracy, but whether various duration segmentation improves accuracy depends on the characteristics, and it is not effective for all characteristics. | 翻訳日:2023-08-24 18:28:07 公開日:2023-08-23 |
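The feature pipeline in the abstract (split the time series by road type, cut it into sequences of several durations, compute statistics per sequence) can be sketched as below. The window lengths, the four statistics, and the choice to average per-window statistics into one value per duration are all assumptions made for illustration, not the paper's settings. The resulting feature vectors could feed any regressor, and agreement with test scores such as TMT or UFOV would be measured with Pearson's $r$ (for example `np.corrcoef(pred, true)[0, 1]`).

```python
import numpy as np

# Hypothetical per-window statistics; the paper's exact feature set is not listed.
STATS = {
    "mean": np.mean,
    "std": np.std,
    "min": np.min,
    "max": np.max,
}

def multi_duration_features(signal, durations=(10, 30, 60)):
    """Cut a 1-D driving signal into non-overlapping windows of several lengths
    and summarise each length by averaging the per-window statistics."""
    x = np.asarray(signal, dtype=float)
    features = {}
    for dur in durations:
        n_windows = len(x) // dur
        if n_windows == 0:
            continue  # signal is shorter than this window length
        windows = x[: n_windows * dur].reshape(n_windows, dur)
        for name, fn in STATS.items():
            features[f"{name}_{dur}"] = float(np.mean(fn(windows, axis=1)))
    return features

def road_type_features(segments_by_road_type, durations=(10, 30, 60)):
    """Prefix features with a road-type key such as 'arterial' or 'intersection'."""
    out = {}
    for road_type, signal in segments_by_road_type.items():
        for key, value in multi_duration_features(signal, durations).items():
            out[f"{road_type}/{key}"] = value
    return out

speed = np.arange(60.0)  # toy stand-in for a speed trace
feats = road_type_features({"arterial": speed, "intersection": speed[:25]}, durations=(10, 30))
```

In this toy call the 25-sample intersection segment yields only 10-sample features, since it is too short for a 30-sample window; how the paper handles short segments is not stated in the abstract.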
# 機械学習のセキュリティ防御における平等性の測定: 音声認識を事例として Measuring Equality in Machine Learning Security Defenses: A Case Study in Speech Recognition ( http://arxiv.org/abs/2302.08973v6 ) ライセンス: Link先を確認 | Luke E. Richards, Edward Raff, Cynthia Matuszek | (参考訳) 過去10年間で、機械学習のセキュリティコミュニティは回避攻撃に対する無数の防御手法を開発してきた。
このコミュニティで十分に研究されていない問いは、「これらの防御は誰を守るのか?」である。
音声コマンド認識のケーススタディでは、敵対的訓練やデータ拡張が、性別、アクセント、年齢といった社会的サブグループに対して、ユーザーカバレッジとの関係において不均等かつ複雑な保護をもたらすことを示す。
本稿では、リジェクションに基づく2つの防御法であるランダム化平滑化とニューラルリジェクションの平等性を比較し、マイノリティ集団に対するサンプリング機構のため、ランダム化平滑化の方がより公平であることを見出した。
本研究は,音声領域における敵対的ロバスト性の格差と,リジェクションに基づく防御の公平性を評価した初めての研究である。 Over the past decade, the machine learning security community has developed a myriad of defenses for evasion attacks. An understudied question in that community is: for whom do these defenses defend? This work considers common approaches to defending learned systems and how security defenses result in performance inequities across different sub-populations. We outline appropriate parity metrics for analysis and begin to answer this question through empirical results of the fairness implications of machine learning security methods. We find that many methods that have been proposed can cause direct harm, like false rejection and unequal benefits from robustness training. The framework we propose for measuring defense equality can be applied to robustly trained models, preprocessing-based defenses, and rejection methods. We identify a set of datasets with a user-centered application and a reasonable computational cost suitable for case studies in measuring the equality of defenses. In our case study of speech command recognition, we show how such adversarial training and augmentation have non-equal but complex protections for social subgroups across gender, accent, and age in relation to user coverage. We present a comparison of equality between two rejection-based defenses: randomized smoothing and neural rejection, finding randomized smoothing more equitable due to the sampling mechanism for minority groups. This represents the first work examining the disparity in the adversarial robustness in the speech domain and the fairness evaluation of rejection-based defenses. | 翻訳日:2023-08-24 18:27:41 公開日:2023-08-23 |
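For readers unfamiliar with randomized smoothing, the sketch below shows its generic prediction rule: classify many Gaussian-noised copies of the input, take a majority vote, and abstain (reject) when the vote is not decisive. It is not the speech setup evaluated in the paper. The toy classifier, `sigma`, the sample count, and the fixed abstain threshold (a simplification standing in for the binomial test used in the standard formulation) are all assumptions.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000,
                     num_classes=2, abstain_below=0.6, seed=0):
    """Monte Carlo estimate of g(x) = argmax_c P(f(x + eps) = c), eps ~ N(0, sigma^2 I).

    Returns (label, class_frequencies). The label is -1 (abstain) when the top
    class frequency falls below `abstain_below`, a simplified rejection rule.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    preds = np.array([base_classifier(x + e) for e in noise], dtype=int)
    freqs = np.bincount(preds, minlength=num_classes) / n_samples
    top = int(np.argmax(freqs))
    return (top if freqs[top] >= abstain_below else -1), freqs

def toy_classifier(v):
    """Toy base classifier: class 1 if the first feature is positive, else class 0."""
    return int(v[0] > 0)

label_far, _ = smoothed_predict(toy_classifier, [1.0, 0.0])    # far from the boundary
label_edge, _ = smoothed_predict(toy_classifier, [0.0, 0.0])   # on the boundary: abstains
```

Because the vote depends on how often noisy copies land on each side of the decision boundary, inputs from groups that sit closer to the boundary are rejected more often, which is one intuition for why the sampling mechanism matters in the paper's fairness comparison.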
# ロバスト強化学習のためのリグレットベース最適化 Regret-Based Optimization for Robust Reinforcement Learning ( http://arxiv.org/abs/2302.06912v3 ) ライセンス: Link先を確認 | Roman Belaire, Pradeep Varakantham, Thanh Nguyen, David Lo | (参考訳) 深層強化学習(DRL)の方策は、観測に加わる小さな敵対的ノイズに対して脆弱であることが示されている。
我々のリグレット基準は、既存の価値ベースおよび方策ベースの深層強化学習手法の改良に利用できる。
我々の手法は,ロバストな深層強化学習の主要な手法と比べて,多様なベンチマークで大幅な性能向上を実現する。 Deep Reinforcement Learning (DRL) policies have been shown to be vulnerable to small adversarial noise in observations. Such adversarial noise can have disastrous consequences in safety-critical environments. For instance, a self-driving car receiving adversarially perturbed sensory observations about nearby signs (e.g., a stop sign physically altered to be perceived as a speed limit sign) or objects (e.g., cars altered to be recognized as trees) can be fatal. Existing approaches for making RL algorithms robust to an observation-perturbing adversary have focused on reactive approaches that iteratively improve against adversarial examples generated at each iteration. While such approaches have been shown to provide improvements over regular RL methods, they are reactive and can fare significantly worse if certain categories of adversarial examples are not generated during training. To that end, we pursue a more proactive approach that relies on directly optimizing a well-studied robustness measure, regret instead of expected value. We provide a principled approach that minimizes maximum regret over a "neighborhood" of observations to the received "observation". Our regret criterion can be used to modify existing value- and policy-based Deep RL methods. We demonstrate that our approaches provide a significant improvement in performance across a wide variety of benchmarks against leading approaches for robust Deep RL. | 翻訳日:2023-08-24 18:27:03 公開日:2023-08-23 |
# Iteratively Coupled Multiple Instance Learning from Instance to Bag Classifier for Whole Slide Image Classification ( http://arxiv.org/abs/2303.15749v2 ) License: see link | Hongyi Wang, Luyang Luo, Fang Wang, Ruofeng Tong, Yen-Wei Chen, Hongjie Hu, Lanfen Lin, and Hao Chen | (reference translation) Whole Slide Image (WSI) classification remains a challenge due to the extremely high resolution of the images and the absence of fine-grained labels.
To address this issue, we propose a novel framework called Iteratively Coupled MIL (ICMIL), which bridges the loss back-propagation process from the bag-level classifier to the patch embedder.
The code is available at https://github.com/Dootmaan/ICMIL. Whole Slide Image (WSI) classification remains a challenge due to their extremely high resolution and the absence of fine-grained labels. Presently, WSI classification is usually regarded as a Multiple Instance Learning (MIL) problem when only slide-level labels are available. MIL methods involve a patch embedding module and a bag-level classification module, but they are prohibitively expensive to be trained in an end-to-end manner. Therefore, existing methods usually train them separately, or directly skip the training of the embedder. Such schemes hinder the patch embedder's access to slide-level semantic labels, resulting in inconsistency within the entire MIL pipeline. To overcome this issue, we propose a novel framework called Iteratively Coupled MIL (ICMIL), which bridges the loss back-propagation process from the bag-level classifier to the patch embedder. In ICMIL, we use category information in the bag-level classifier to guide the patch-level fine-tuning of the patch feature extractor. The refined embedder then generates better instance representations for achieving a more accurate bag-level classifier. By coupling the patch embedder and bag classifier at a low cost, our proposed framework enables information exchange between the two modules, benefiting the entire MIL classification model. We tested our framework on two datasets using three different backbones, and our experimental results demonstrate consistent performance improvements over state-of-the-art MIL methods. The code is available at: https://github.com/Dootmaan/ICMIL. | Translated: 2023-08-24 18:20:23 Published: 2023-08-23 |
# On the link between generative semi-supervised learning and generative open-set recognition ( http://arxiv.org/abs/2303.11702v4 ) License: see link | Emile Reyn Engelbrecht, Johan du Preez | (reference translation) This study investigates the relationship between semi-supervised learning (SSL, training on partially labelled datasets) and open-set recognition (OSR, classification with simultaneous novelty detection) in the context of generative adversarial networks (GANs).
Specifically, SSL-GANs and OSR-GANs require their generators to produce 'bad-looking' samples, which are used to regularise their classifier networks.
The results show that SSL-GANs achieve nearly identical results to OSR-GANs, proving the SSL-OSR link.
A combined SSL-OSR framework improves the practicality and cost-efficiency of classifier training, and further theoretical and applied studies are also discussed. This study investigates the relationship between semi-supervised learning (SSL, which is training off partially labelled datasets) and open-set recognition (OSR, which is classification with simultaneous novelty detection) under the context of generative adversarial networks (GANs). Although no previous study has formally linked SSL and OSR, their respective methods share striking similarities. Specifically, SSL-GANs and OSR-GANs require their generators to produce 'bad-looking' samples which are used to regularise their classifier networks. We hypothesise that the definitions of bad-looking samples in SSL and OSR represents the same concept and realises the same goal. More formally, bad-looking samples lie in the complementary space, which is the area between and around the boundaries of the labelled categories within the classifier's embedding space. By regularising a classifier with samples in the complementary space, classifiers achieve improved generalisation for SSL and also generalise the open space for OSR. To test this hypothesis, we compare a foundational SSL-GAN with the state-of-the-art OSR-GAN under the same SSL-OSR experimental conditions. Our results find that SSL-GANs achieve near identical results to OSR-GANs, proving the SSL-OSR link. Subsequently, to further this new research path, we compare several SSL-GANs various SSL-OSR setups which this first benchmark results. A combined framework of SSL-OSR certainly improves the practicality and cost-efficiency of classifier training, and so further theoretical and application studies are also discussed. | Translated: 2023-08-24 18:19:57 Published: 2023-08-23 |
# Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation ( http://arxiv.org/abs/2303.11579v2 ) License: see link | Wenkang Shan, Zhenhua Liu, Xinfeng Zhang, Zhao Wang, Kai Han, Shanshe Wang, Siwei Ma, Wen Gao | (reference translation) In this paper, a novel Diffusion-based 3D Pose Estimation (D3DP) method with Joint-wise reProjection-based Multi-hypothesis Aggregation (JPMA) is proposed for probabilistic 3D human pose estimation.
On the other hand, JPMA is proposed to assemble multiple hypotheses generated by D3DP into a single 3D pose.
Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets show that our method outperforms state-of-the-art deterministic and probabilistic approaches by 1.5% and 8.9%, respectively.
Code is available at https://github.com/paTRICK-swk/D3DP. In this paper, a novel Diffusion-based 3D Pose estimation (D3DP) method with Joint-wise reProjection-based Multi-hypothesis Aggregation (JPMA) is proposed for probabilistic 3D human pose estimation. On the one hand, D3DP generates multiple possible 3D pose hypotheses for a single 2D observation. It gradually diffuses the ground truth 3D poses to a random distribution, and learns a denoiser conditioned on 2D keypoints to recover the uncontaminated 3D poses. The proposed D3DP is compatible with existing 3D pose estimators and supports users to balance efficiency and accuracy during inference through two customizable parameters. On the other hand, JPMA is proposed to assemble multiple hypotheses generated by D3DP into a single 3D pose for practical use. It reprojects 3D pose hypotheses to the 2D camera plane, selects the best hypothesis joint-by-joint based on the reprojection errors, and combines the selected joints into the final pose. The proposed JPMA conducts aggregation at the joint level and makes use of the 2D prior information, both of which have been overlooked by previous approaches. Extensive experiments on Human3.6M and MPI-INF-3DHP datasets show that our method outperforms the state-of-the-art deterministic and probabilistic approaches by 1.5% and 8.9%, respectively. Code is available at https://github.com/paTRICK-swk/D3DP. | Translated: 2023-08-24 18:19:28 Published: 2023-08-23 |
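The joint-wise selection step of JPMA (reproject, pick the best hypothesis per joint, combine) can be sketched as follows. This is a minimal sketch under simplifying assumptions:

- a pinhole camera with unit focal length and zero principal point;
- camera-space coordinates with positive depth;
- squared Euclidean reprojection error.

The function names are hypothetical, and the actual method uses the datasets' camera parameters.

```python
def project(point, focal=1.0):
    """Pinhole projection of a camera-space 3D point (x, y, z), assuming z > 0."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def reprojection_error(point, keypoint, focal=1.0):
    """Squared 2D distance between a projected 3D joint and its detected keypoint."""
    u, v = project(point, focal)
    return (u - keypoint[0]) ** 2 + (v - keypoint[1]) ** 2


def joint_wise_aggregate(hypotheses, keypoints_2d, focal=1.0):
    """For every joint, keep that joint from the hypothesis that reprojects best."""
    pose = []
    for j, keypoint in enumerate(keypoints_2d):
        best = min(hypotheses, key=lambda h: reprojection_error(h[j], keypoint, focal))
        pose.append(best[j])
    return pose


keypoints = [(0.0, 0.0), (0.5, 0.0)]
h1 = [(0.0, 0.0, 2.0), (0.0, 0.0, 2.0)]  # joint 0 fits exactly, joint 1 does not
h2 = [(0.2, 0.0, 2.0), (1.0, 0.0, 2.0)]  # joint 1 fits exactly, joint 0 does not
pose = joint_wise_aggregate([h1, h2], keypoints)
print(pose)  # [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
```

The final pose mixes joints from different hypotheses. That is the difference from selecting a single whole hypothesis.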
# HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details ( http://arxiv.org/abs/2303.11225v2 ) License: see link | Zenghao Chai, Tianke Zhang, Tianyu He, Xu Tan, Tadas Baltrušaitis, HsiangTao Wu, Runnan Li, Sheng Zhao, Chun Yuan, Jiang Bian | (reference translation) 3D Morphable Models (3DMMs) show great potential for reconstructing faithful and animatable 3D facial surfaces from a single image.
The project page can be found at https://project-hiface.github.io. 3D Morphable Models (3DMMs) demonstrate great potential for reconstructing faithful and animatable 3D facial surfaces from a single image. The facial surface is influenced by the coarse shape, as well as the static detail (e.g., person-specific appearance) and dynamic detail (e.g., expression-driven wrinkles). Previous work struggles to decouple the static and dynamic details through image-level supervision, leading to reconstructions that are not realistic. In this paper, we aim at high-fidelity 3D face reconstruction and propose HiFace to explicitly model the static and dynamic details. Specifically, the static detail is modeled as the linear combination of a displacement basis, while the dynamic detail is modeled as the linear interpolation of two displacement maps with polarized expressions. We exploit several loss functions to jointly learn the coarse shape and fine details with both synthetic and real-world datasets, which enable HiFace to reconstruct high-fidelity 3D shapes with animatable details. Extensive quantitative and qualitative experiments demonstrate that HiFace presents state-of-the-art reconstruction quality and faithfully recovers both the static and dynamic details. Our project page can be found at https://project-hiface.github.io. | Translated: 2023-08-24 18:19:04 Published: 2023-08-23 |
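HiFace's two detail models reduce to simple linear operations on displacement maps:

- the static detail is a linear combination of a displacement basis;
- the dynamic detail is an interpolation between two polarized-expression maps.

The minimal sketch below treats each displacement map as a flat list of per-texel offsets. How HiFace predicts the coefficients and the interpolation weight from an image is not shown, and the function names are hypothetical.

```python
def static_detail(basis, coeffs):
    """Static detail: linear combination of displacement basis maps."""
    out = [0.0] * len(basis[0])
    for c, basis_map in zip(coeffs, basis):
        for i, d in enumerate(basis_map):
            out[i] += c * d
    return out


def dynamic_detail(disp_a, disp_b, alpha):
    """Dynamic detail: linear interpolation between two polarized-expression maps."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(disp_a, disp_b)]


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(static_detail(basis, [0.5, 2.0]))                           # [0.5, 2.0, 0.0]
print(dynamic_detail([0.0, 0.0, 0.0], [1.0, 2.0, 4.0], 0.25))     # [0.25, 0.5, 1.0]
```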
# Non-Exemplar Online Class-incremental Continual Learning via Dual-prototype Self-augment and Refinement ( http://arxiv.org/abs/2303.10891v2 ) License: see link | Fushuo Huo, Wenchao Xu, Jingcai Guo, Haozhao Wang, and Yunfeng Fan, Song Guo | (reference translation) This paper investigates a new, practical, but challenging problem named Non-exemplar Online Class-incremental continual Learning (NO-CL), which aims to preserve the discernibility of base classes without buffering data examples and to continuously learn novel classes from a single-pass (online) data stream.
The challenges are two-fold: 1) both base and novel classes suffer from catastrophic forgetting, as no previous samples are available for replay.
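2) As the online data can only be observed once, there is no way to fully re-train the whole model, e.g., to re-calibrate the decision boundaries via prototype alignment or feature distillation.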
In this paper, we propose a novel Dual-prototype Self-augment and Refinement method (DSR) for the NO-CL problem.
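2) Self-augment and refinement: instead of updating the whole network, we optimize high-dimensional prototypes with an extra projection module based on self-augmented vanilla prototypes, through a bi-level optimization problem.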
Extensive experiments demonstrate the effectiveness and superiority of the proposed DSR. This paper investigates a new, practical, but challenging problem named Non-exemplar Online Class-incremental continual Learning (NO-CL), which aims to preserve the discernibility of base classes without buffering data examples and efficiently learn novel classes continuously in a single-pass (i.e., online) data stream. The challenges of this task are mainly two-fold: (1) Both base and novel classes suffer from severe catastrophic forgetting as no previous samples are available for replay. (2) As the online data can only be observed once, there is no way to fully re-train the whole model, e.g., re-calibrate the decision boundaries via prototype alignment or feature distillation. In this paper, we propose a novel Dual-prototype Self-augment and Refinement method (DSR) for NO-CL problem, which consists of two strategies: 1) Dual class prototypes: vanilla and high-dimensional prototypes are exploited to utilize the pre-trained information and obtain robust quasi-orthogonal representations rather than example buffers for both privacy preservation and memory reduction. 2) Self-augment and refinement: Instead of updating the whole network, we optimize high-dimensional prototypes alternatively with the extra projection module based on self-augment vanilla prototypes, through a bi-level optimization problem. Extensive experiments demonstrate the effectiveness and superiority of the proposed DSR in NO-CL. | Translated: 2023-08-24 18:18:41 Published: 2023-08-23 |
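The abstract attributes robustness to quasi-orthogonal high-dimensional prototypes. The sketch below only illustrates why high dimension helps. It uses independently drawn random ±1 vectors as a stand-in (an assumption; DSR builds its prototypes from pre-trained information) and classifies by nearest prototype under cosine similarity. Labels and dimensions are hypothetical.

```python
import math
import random


def random_prototype(dim, rng):
    """Random +/-1 vector; independent draws are nearly orthogonal for large dim."""
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def classify(feature, prototypes):
    """Nearest-prototype classification by cosine similarity."""
    return max(prototypes, key=lambda label: cosine(feature, prototypes[label]))


rng = random.Random(0)
dim = 4096
prototypes = {label: random_prototype(dim, rng) for label in ("cat", "dog", "car")}

# Pairwise cosine is on the order of 1/sqrt(dim), i.e. close to zero.
print(round(cosine(prototypes["cat"], prototypes["dog"]), 3))

noisy_dog = [x + rng.gauss(0.0, 1.0) for x in prototypes["dog"]]
print(classify(noisy_dog, prototypes))  # dog
```

Even with noise as large as the signal, the correct prototype keeps a cosine of about 0.7, while unrelated prototypes stay near zero.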
# Deep Image Fingerprint: Towards Low Budget Synthetic Image Detection and Model Lineage Analysis ( http://arxiv.org/abs/2303.10762v3 ) License: see link | Sergey Sinitsa and Ohad Fried | (reference translation) The generation of high-quality images has become widely accessible and is a rapidly evolving process.
We tested the method on images produced by both Generative Adversarial Networks (GANs) and recent large text-to-image models (LTIMs) that rely on Diffusion Models.
Our approach outperforms others trained under identical conditions and achieves performance comparable to state-of-the-art pre-trained detection methods on images generated by Stable Diffusion and MidJourney. The generation of high-quality images has become widely accessible and is a rapidly evolving process. As a result, anyone can generate images that are indistinguishable from real ones. This leads to a wide range of applications, including malicious usage with deceptive intentions. Despite advances in detection techniques for generated images, a robust detection method still eludes us. Furthermore, model personalization techniques might affect the detection capabilities of existing methods. In this work, we utilize the architectural properties of convolutional neural networks (CNNs) to develop a new detection method. Our method can detect images from a known generative model and enable us to establish relationships between fine-tuned generative models. We tested the method on images produced by both Generative Adversarial Networks (GANs) and recent large text-to-image models (LTIMs) that rely on Diffusion Models. Our approach outperforms others trained under identical conditions and achieves comparable performance to state-of-the-art pre-trained detection methods on images generated by Stable Diffusion and MidJourney, with significantly fewer required train samples. | Translated: 2023-08-24 18:18:14 Published: 2023-08-23 |
# Controlling qubit-oscillator systems using linear parameter sweeps ( http://arxiv.org/abs/2303.09834v2 ) License: see link | Sahel Ashhab, Tomoko Fuse, Fumiki Yoshihara, Sunmi Kim, Kouichi Semba | (reference translation) We investigate the dynamics of a qubit-oscillator system under the influence of a linear sweep of system parameters.
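In the first case, we consider sweeping the parameters between the regime of a weakly correlated ground state and the regime of a strongly correlated ground state, a situation that can be viewed as a finite-duration quench between two phases of matter: the normal phase and the superradiant phase.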
In addition to establishing the conditions of validity for the theoretical formula, we find that under suitable conditions, deterministic and robust multi-photon state preparation is possible in this system. We investigate the dynamics of a qubit-oscillator system under the influence of a linear sweep of system parameters. We consider two main cases. In the first case, we consider sweeping the parameters between the regime of a weakly correlated ground state and the regime of a strongly correlated ground state, a situation that can be viewed as a finite-duration quench between two phases of matter: the normal phase and the superradiant phase. Excitations are created as a result of this quench. We investigate the dependence of the excitation probabilities on the various parameters. We find a qualitative asymmetry in the dynamics between the cases of a normal-to-superradiant and superradiant-to-normal quench. The second case of parameter sweeps that we investigate is the problem of a Landau-Zener sweep in the qubit bias term for a qubit coupled to a harmonic oscillator. We analyze a theoretical formula based on the assumption that the dynamics can be decomposed into a sequence of independent Landau-Zener transitions. In addition to establishing the conditions of validity for the theoretical formula, we find that under suitable conditions, deterministic and robust multi-photon state preparation is possible in this system. | Translated: 2023-08-24 18:17:54 Published: 2023-08-23 |
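For reference, the standard Landau-Zener result gives the probability of diabatic passage through a single avoided crossing. The sketch below assumes the two-level Hamiltonian H(t) = (v t / 2) σ_z + (Δ / 2) σ_x with ħ = 1, for which P = exp(-π Δ² / (2 v)). It also shows one simple way to chain crossings under an independence assumption that ignores interference between them. This is not the paper's formula for the qubit-oscillator system, and the function names are hypothetical.

```python
import math


def lz_diabatic_probability(delta, sweep_rate, hbar=1.0):
    """Landau-Zener probability of staying in the initial diabatic state.

    Assumes H(t) = (v t / 2) sigma_z + (delta / 2) sigma_x: the diabatic energies
    separate at rate v and the minimum adiabatic gap is delta.
    """
    return math.exp(-math.pi * delta ** 2 / (2.0 * hbar * sweep_rate))


def sequential_diabatic_probability(crossings):
    """Probability of passing every crossing diabatically, treating the crossings
    as independent LZ events (phases and interference are ignored)."""
    p = 1.0
    for delta, rate in crossings:
        p *= lz_diabatic_probability(delta, rate)
    return p


print(lz_diabatic_probability(1.0, math.pi / 2))  # exp(-1), about 0.368
print(lz_diabatic_probability(1.0, 1e3))          # fast sweep: close to 1
print(lz_diabatic_probability(1.0, 1e-2))         # slow sweep: close to 0 (adiabatic)
```

A slower sweep or a larger gap favors adiabatic following, while a faster sweep favors diabatic passage. Any sweep-based preparation scheme has to balance these two regimes.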
# On the Choice of Perception Loss Function for Learned Video Compression ( http://arxiv.org/abs/2305.19301v2 ) License: see link | Sadaf Salehkalaibar, Buu Phan, Jun Chen, Wei Yu, Ashish Khisti | (reference translation) We study causal, low-latency, sequential video compression when the output is subjected to both a mean squared-error (MSE) distortion loss and a perception loss targeting realism.
Motivated by prior approaches, we consider two different perception loss functions (PLFs).
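Using information-theoretic analysis and deep-learning-based experiments, we demonstrate that the choice of PLF can have a significant effect on the reconstruction, especially at low bit rates.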
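In particular, while the reconstruction based on PLF-JD better preserves the temporal correlation across frames, it also imposes a significant distortion penalty compared to PLF-FMD and makes it more difficult to recover from errors made in earlier output frames.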
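In particular, encoded representations generated by training a system to minimize the MSE (without requiring either PLF) can generate close to optimal reconstructions for either choice of PLF at the decoder.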
We validate our results using (one-shot) information-theoretic analysis, a detailed study of the rate-distortion-perception tradeoff of the Gauss-Markov source model, and deep-learning-based experiments on the moving MNIST and KTH datasets. We study causal, low-latency, sequential video compression when the output is subjected to both a mean squared-error (MSE) distortion loss as well as a perception loss to target realism. Motivated by prior approaches, we consider two different perception loss functions (PLFs). The first, PLF-JD, considers the joint distribution (JD) of all the video frames up to the current one, while the second metric, PLF-FMD, considers the framewise marginal distributions (FMD) between the source and reconstruction. Using information theoretic analysis and deep-learning based experiments, we demonstrate that the choice of PLF can have a significant effect on the reconstruction, especially at low-bit rates. In particular, while the reconstruction based on PLF-JD can better preserve the temporal correlation across frames, it also imposes a significant penalty in distortion compared to PLF-FMD and further makes it more difficult to recover from errors made in the earlier output frames. Although the choice of PLF decisively affects reconstruction quality, we also demonstrate that it may not be essential to commit to a particular PLF during encoding and the choice of PLF can be delegated to the decoder. In particular, encoded representations generated by training a system to minimize the MSE (without requiring either PLF) can be near universal and can generate close to optimal reconstructions for either choice of PLF at the decoder. We validate our results using (one-shot) information-theoretic analysis, detailed study of the rate-distortion-perception tradeoff of the Gauss-Markov source model as well as deep-learning based experiments on moving MNIST and KTH datasets. | Translated: 2023-08-24 18:10:42 Published: 2023-08-23 |
# Top-Ranked Cycle Flux Network Analysis of Molecular Photocells ( http://arxiv.org/abs/2305.11929v2 ) License: see link | Nikhil Gupt, Shuvadip Ghosh and Arnab Ghosh | (reference translation) We introduce a top-ranked cycle flux ranking scheme of network analysis to assess the performance of molecular junction solar cells.
These findings shed light on the intricate functionalities that govern molecular photovoltaics and offer a comprehensive approach to address them systematically. We introduce a top-ranked cycle flux ranking scheme of network analysis to assess the performance of molecular junction solar cells. By mapping the Lindblad master equation to the quantum-transition network, we propose a microscopic Hamiltonian description underpinning the rate equations commonly used to characterize molecular photocells. Our approach elucidates the paramount significance of edge flux and unveils two pertinent electron transfer pathways that play equally important roles in robust photocurrent generation. Furthermore, we demonstrate that non-radiative loss processes impede the maximum power efficiency of photocells, which may otherwise be above the Curzon-Ahlborn limit. These findings shed light on the intricate functionalities that govern molecular photovoltaics and offer a comprehensive approach to address them in a systematic way. | Translated: 2023-08-24 18:10:10 Published: 2023-08-23 |
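The Curzon-Ahlborn limit mentioned above is the efficiency at maximum power of an endoreversible heat engine, 1 - sqrt(T_c / T_h). The Carnot bound, 1 - T_c / T_h, is always at least as large. The small sketch below computes both reference values. The temperatures in the example are illustrative only and are not those of the photocell model.

```python
import math


def carnot_efficiency(t_cold, t_hot):
    """Upper bound on heat-engine efficiency between two baths (kelvin)."""
    return 1.0 - t_cold / t_hot


def curzon_ahlborn_efficiency(t_cold, t_hot):
    """Efficiency at maximum power of an endoreversible engine (kelvin)."""
    return 1.0 - math.sqrt(t_cold / t_hot)


# Illustrative temperatures: 300 K cold bath, 1200 K hot bath.
print(curzon_ahlborn_efficiency(300.0, 1200.0))  # 0.5
print(carnot_efficiency(300.0, 1200.0))          # 0.75
```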
# Topological and conventional nano-photonic waveguides for chiral integrated quantum optics ( http://arxiv.org/abs/2305.11082v2 ) License: see link | N.J Martin, M. Jalali Mehrabad, X. Chen, R. Dost, E. Nussbaum, D. Hallett, L. Hallacy, A. Foster, E. Clarke, P.K. Patil, S. Hughes, M. Hafezi, A.M Fox, M.S. Skolnick, and L.R. Wilson | (reference translation) Chirality in integrated quantum photonics has emerged as a promising route towards achieving scalable quantum technologies with quantum nonlinearity effects.
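In this work, we present a comprehensive investigation of chiral coupling in topological photonic waveguides using a combination of experimental, theoretical, and numerical analyses.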
Our results provide crucial insights into the degree and characteristics of chiral light-matter interactions in topological photonic quantum circuits and pave the way towards the on-chip implementation of quantitatively predicted quantum nonlinear effects. Chirality in integrated quantum photonics has emerged as a promising route towards achieving scalable quantum technologies with quantum nonlinearity effects. Topological photonic waveguides, which utilize helical optical modes, have been proposed as a novel approach to harnessing chiral light-matter interactions on-chip. However, uncertainties remain regarding the nature and strength of the chiral coupling to embedded quantum emitters, hindering the scalability of these systems. In this work, we present a comprehensive investigation of chiral coupling in topological photonic waveguides using a combination of experimental, theoretical, and numerical analyses. We quantitatively characterize the position-dependence nature of the light-matter coupling on several topological photonic waveguides and benchmark their chiral coupling performance against conventional line defect waveguides for chiral quantum optical applications. Our results provide crucial insights into the degree and characteristics of chiral light-matter interactions in topological photonic quantum circuits and pave the way towards the implementation of quantitatively-predicted quantum nonlinear effects on-chip. | Translated: 2023-08-24 18:09:57 Published: 2023-08-23 |
# Mixed-State Quantum Spin Liquids and Dynamical Anyon Condensations in Kitaev Lindbladians ( http://arxiv.org/abs/2305.09197v3 ) License: see link | Kyusung Hwang | (reference translation) Quantum spin liquids and anyons, which used to be subjects of condensed matter physics, are now realized in various qubit platforms, offering unprecedented opportunities to investigate the fundamental physics of many-body quantum entangled states.
We study open quantum systems of the Kitaev spin liquid and the toric code via the Lindblad master equation approach.
We also provide insight into the relationship between the Kitaev spin liquid and the toric code in the picture of anyon condensation.
Our work suggests open quantum systems to be a new venue for topological phenomena of quantum spin liquids and anyons. Quantum spin liquids and anyons, used to be subjects of condensed matter physics, now are realized in various platforms of qubits, offering unprecedented opportunities to investigate fundamental physics of many-body quantum entangled states. Qubits are inevitably exposed to environment effects such as decoherence and dissipation, which are believed to be detrimental to many-body entanglement. Here, we argue that unlike the common belief decoherence and dissipation can give rise to novel topological phenomena in quantum spin liquids. We study open quantum systems of the Kitaev spin liquid and the toric code via the Lindblad master equation approach. By using exact solutions and numerical approaches, we show the dynamical occurrence of anyon condensation by decoherence and dissipation, which results in a topological transition from the initial state spin liquid to the steady state spin liquid. The mechanism of the anyon condensation transition by the Lindblad dynamics is elucidated. We also provide an insight into the relationship between the Kitaev spin liquid and the toric code in the picture of anyon condensation. Our work suggests open quantum systems to be a new venue for topological phenomena of quantum spin liquids and anyons. | Translated: 2023-08-24 18:09:33 Published: 2023-08-23 |
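The Lindblad master equation used here has the generic form dρ/dt = -i[H, ρ] + Σ_k (L_k ρ L_k† - ½{L_k† L_k, ρ}), with ħ = 1. The sketch below illustrates only that equation, not the Kitaev or toric-code models. It integrates a single qubit under pure dephasing, L = sqrt(γ) σ_z, using forward Euler in pure Python. For an initial |+⟩ state the coherence |ρ01| should decay as 0.5 exp(-2γt), while the populations stay fixed. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    """2x2 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def add(a, b, scale=1.0):
    """Return a + scale * b for 2x2 matrices."""
    return [[a[i][j] + scale * b[i][j] for j in range(2)] for i in range(2)]


def lindblad_rhs(rho, h, jumps):
    """d(rho)/dt = -i[H, rho] + sum_k (L rho L^+ - 1/2 {L^+ L, rho}), hbar = 1."""
    commutator = add(matmul(h, rho), matmul(rho, h), -1.0)
    out = [[-1j * commutator[i][j] for j in range(2)] for i in range(2)]
    for jump in jumps:
        jump_dag = dagger(jump)
        jdj = matmul(jump_dag, jump)
        out = add(out, matmul(matmul(jump, rho), jump_dag))
        out = add(out, matmul(jdj, rho), -0.5)
        out = add(out, matmul(rho, jdj), -0.5)
    return out


def evolve(rho, h, jumps, t_final, dt):
    """Forward-Euler integration; adequate only for small dt in this toy example."""
    for _ in range(int(round(t_final / dt))):
        rho = add(rho, lindblad_rhs(rho, h, jumps), dt)
    return rho


gamma, omega = 0.5, 1.0
sigma_z = [[1.0, 0.0], [0.0, -1.0]]
h = [[0.5 * omega * x for x in row] for row in sigma_z]
jump = [[math.sqrt(gamma) * x for x in row] for row in sigma_z]  # pure dephasing
rho0 = [[0.5, 0.5], [0.5, 0.5]]  # |+><+|
rho_t = evolve(rho0, h, [jump], t_final=1.0, dt=1e-3)
print(abs(rho_t[0][1]), 0.5 * math.exp(-2 * gamma))  # both close to 0.1839
```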
# Black-box Source-free Domain Adaptation via Two-stage Knowledge Distillation ( http://arxiv.org/abs/2305.07881v3 ) License: see link | Shuai Wang, Daoan Zhang, Zipei Yan, Shitong Shao, Rui Li | (reference translation) Source-free domain adaptation aims to adapt deep neural networks using only pre-trained source models and target data.
Our method is simple and flexible, and achieves surprising results on three cross-domain segmentation tasks. Source-free domain adaptation aims to adapt deep neural networks using only pre-trained source models and target data. However, accessing the source model still has a potential concern about leaking the source data, which reveals the patient's privacy. In this paper, we study the challenging but practical problem: black-box source-free domain adaptation where only the outputs of the source model and target data are available. We propose a simple but effective two-stage knowledge distillation method. In Stage I, we train the target model from scratch with soft pseudo-labels generated by the source model in a knowledge distillation manner. In Stage II, we initialize another model as the new student model to avoid the error accumulation caused by noisy pseudo-labels. We feed the images with weak augmentation to the teacher model to guide the learning of the student model. Our method is simple and flexible, and achieves surprising results on three cross-domain segmentation tasks. | Translated: 2023-08-24 18:09:13 Published: 2023-08-23 |
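Stage I trains the target model on soft pseudo-labels from the black-box source model. A common way to express such a distillation objective is the KL divergence from the teacher's probabilities to the student's softened predictions. The sketch below assumes that loss for a single sample; it is not necessarily the exact objective used in the paper, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_probs, student_logits, temperature=1.0):
    """KL(teacher || student) for one sample, using the teacher's soft pseudo-label."""
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher_probs, student) if p > 0)


# Black-box setting: only the source model's output probabilities are visible.
teacher = [0.7, 0.2, 0.1]
print(distillation_loss(teacher, [0.0, 0.0, 0.0]))                 # > 0: student disagrees
print(distillation_loss(teacher, [math.log(p) for p in teacher]))  # ~0: student matches
```

The same loss could serve in Stage II, with the teacher now fed weakly augmented images. How the paper weights or combines the stages is not stated in the abstract.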
# Traffic Forecasting on New Roads Unseen in the Training Data Using Spatial Contrastive Pre-Training ( http://arxiv.org/abs/2305.05237v3 ) License: see link | Arian Prabowo, Wei Shao, Hao Xue, Piotr Koniusz, Flora D. Salim | (reference translation) New roads are being constructed all the time.
The code is available on GitHub: https://github.com/cruiseresearchgroup/forecasting-on-new-roads. New roads are being constructed all the time. However, the capabilities of previous deep forecasting models to generalize to new roads not seen in the training data (unseen roads) are rarely explored. In this paper, we introduce a novel setup called a spatio-temporal (ST) split to evaluate the models' capabilities to generalize to unseen roads. In this setup, the models are trained on data from a sample of roads, but tested on roads not seen in the training data. Moreover, we also present a novel framework called Spatial Contrastive Pre-Training (SCPT) where we introduce a spatial encoder module to extract latent features from unseen roads during inference time. This spatial encoder is pre-trained using contrastive learning. During inference, the spatial encoder only requires two days of traffic data on the new roads and does not require any re-training. We also show that the output from the spatial encoder can be used effectively to infer latent node embeddings on unseen roads during inference time. The SCPT framework also incorporates a new layer, named the spatially gated addition (SGA) layer, to effectively combine the latent features from the output of the spatial encoder to existing backbones. Additionally, since there is limited data on the unseen roads, we argue that it is better to decouple traffic signals to trivial-to-capture periodic signals and difficult-to-capture Markovian signals, and for the spatial encoder to only learn the Markovian signals. Finally, we empirically evaluated SCPT using the ST split setup on four real-world datasets. The results showed that adding SCPT to a backbone consistently improves forecasting performance on unseen roads. More importantly, the improvements are greater when forecasting further into the future. The codes are available on GitHub: https://github.com/cruiseresearchgroup/forecasting-on-new-roads.
| Translated: 2023-08-24 18:08:56 Published: 2023-08-23 |
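The abstract argues for separating a periodic traffic component from a Markovian one. One simple decomposition is assumed here for illustration; the paper's exact procedure may differ. The periodic part is the mean value at each time-of-day slot, and the residual is treated as the Markovian part. The example uses two days of hypothetical readings, echoing the two days of data the spatial encoder requires.

```python
def daily_profile(series, steps_per_day):
    """Mean value at each time-of-day slot: the periodic component."""
    sums = [0.0] * steps_per_day
    counts = [0] * steps_per_day
    for t, x in enumerate(series):
        sums[t % steps_per_day] += x
        counts[t % steps_per_day] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def decompose(series, steps_per_day):
    """Split a series into a periodic part and a residual (Markovian) part."""
    profile = daily_profile(series, steps_per_day)
    periodic = [profile[t % steps_per_day] for t in range(len(series))]
    residual = [x - p for x, p in zip(series, periodic)]
    return periodic, residual


# Two hypothetical "days" from one sensor, 4 readings per day.
speeds = [60.0, 40.0, 55.0, 62.0, 58.0, 44.0, 55.0, 60.0]
periodic, residual = decompose(speeds, steps_per_day=4)
print(periodic)  # [59.0, 42.0, 55.0, 61.0, 59.0, 42.0, 55.0, 61.0]
print(residual)  # [1.0, -2.0, 0.0, 1.0, -1.0, 2.0, 0.0, -1.0]
```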
# A Survey on Dataset Distillation: Approaches, Applications and Future Directions ( http://arxiv.org/abs/2305.01975v2 ) License: see link | Jiahui Geng, Zongxiong Chen, Yuandou Wang, Herbert Woisetschlaeger, Sonja Schimmler, Ruben Mayer, Zhiming Zhao and Chunming Rong | (reference translation) Dataset distillation is attracting more attention in machine learning as training sets continue to grow and the cost of training state-of-the-art models becomes increasingly high.
In addition, we summarize the challenges and discuss future directions for this field of research. Dataset distillation is attracting more attention in machine learning as training sets continue to grow and the cost of training state-of-the-art models becomes increasingly high. By synthesizing datasets with high information density, dataset distillation offers a range of potential applications, including support for continual learning, neural architecture search, and privacy protection. Despite recent advances, we lack a holistic understanding of the approaches and applications. Our survey aims to bridge this gap by first proposing a taxonomy of dataset distillation, characterizing existing approaches, and then systematically reviewing the data modalities, and related applications. In addition, we summarize the challenges and discuss future directions for this field of research. | Translated: 2023-08-24 18:08:24 Published: 2023-08-23 |
# Methods and datasets for segmentation of minimally invasive surgical instruments in endoscopic images and videos: A review of the state of the art ( http://arxiv.org/abs/2304.13014v2 ) License: see link | Tobias Rueckert (1), Daniel Rueckert (2 and 3), Christoph Palm (1 and 4) ((1) Regensburg Medical Image Computing (ReMIC), Ostbayerische Technische Hochschule Regensburg (OTH Regensburg), Germany, (2) Artificial Intelligence in Healthcare and Medicine, Klinikum rechts der Isar, Technical University of Munich, Germany, (3) Department of Computing, Imperial College London, UK, (4) Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg, Germany) | (reference translation) In the field of computer- and robot-assisted minimally invasive surgery, enormous progress has been made in recent years based on the recognition of surgical instruments in endoscopic images and videos.
The publications analyzed were identified through the platforms Google Scholar, Web of Science, and PubMed.
A discussion of the reviewed literature is provided, highlighting existing shortcomings and emphasizing the potential for future developments. In the field of computer- and robot-assisted minimally invasive surgery, enormous progress has been made in recent years based on the recognition of surgical instruments in endoscopic images and videos. In particular, the determination of the position and type of instruments is of great interest. Current work involves both spatial and temporal information, with the idea that predicting the movement of surgical tools over time may improve the quality of the final segmentations. The provision of publicly available datasets has recently encouraged the development of new methods, mainly based on deep learning. In this review, we identify and characterize datasets used for method development and evaluation and quantify their frequency of use in the literature. We further present an overview of the current state of research regarding the segmentation and tracking of minimally invasive surgical instruments in endoscopic images and videos. The paper focuses on methods that work purely visually, without markers of any kind attached to the instruments, considering both single-frame semantic and instance segmentation approaches, as well as those that incorporate temporal information. The publications analyzed were identified through the platforms Google Scholar, Web of Science, and PubMed. The search terms used were "instrument segmentation", "instrument tracking", "surgical tool segmentation", and "surgical tool tracking", resulting in a total of 741 articles published between 01/2015 and 07/2023, of which 123 were included using systematic selection criteria. A discussion of the reviewed literature is provided, highlighting existing shortcomings and emphasizing the available potential for future developments. | Translated: 2023-08-24 18:08:13 Published: 2023-08-23 |
# Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review ( http://arxiv.org/abs/2304.10410v2 ) License: see link | Shanliang Yao, Runwei Guan, Xiaoyu Huang, Zhuoxiao Li, Xiangyu Sha, Yong Yue, Eng Gee Lim, Hyungjoon Seo, Ka Lok Man, Xiaohui Zhu, Yutao Yue | (reference translation) Driven by deep learning techniques, perception technology in autonomous driving has developed rapidly in recent years, enabling vehicles to accurately detect and interpret the surrounding environment for safe and efficient navigation.
To ease the retrieval and comparison of datasets and fusion methods, we also provide an interactive website: https://radar-camera-fusion.github.io. Driven by deep learning techniques, perception technology in autonomous driving has developed rapidly in recent years, enabling vehicles to accurately detect and interpret surrounding environment for safe and efficient navigation. To achieve accurate and robust perception capabilities, autonomous vehicles are often equipped with multiple sensors, making sensor fusion a crucial part of the perception system. Among these fused sensors, radars and cameras enable a complementary and cost-effective perception of the surrounding environment regardless of lighting and weather conditions. This review aims to provide a comprehensive guideline for radar-camera fusion, particularly concentrating on perception tasks related to object detection and semantic segmentation. Based on the principles of the radar and camera sensors, we delve into the data processing process and representations, followed by an in-depth analysis and summary of radar-camera fusion datasets. In the review of methodologies in radar-camera fusion, we address interrogative questions, including "why to fuse", "what to fuse", "where to fuse", "when to fuse", and "how to fuse", subsequently discussing various challenges and potential research directions within this domain. To ease the retrieval and comparison of datasets and fusion methods, we also provide an interactive website: https://radar-camera-fusion.github.io. | Translated: 2023-08-24 18:07:46 Published: 2023-08-23 |
# Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing ( http://arxiv.org/abs/2304.02051v2 ) License: see link | Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara | (reference translation) Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body.
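Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner.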
Source code and collected multimodal annotations are publicly available at: https://github.com/aimagelab/multimodal-garment-designer. Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve the fashion design process. Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs. Source code and collected multimodal annotations are publicly available at: https://github.com/aimagelab/multimodal-garment-designer. | Translated: 2023-08-24 18:07:21 Published: 2023-08-23 |
# 夜間深度知覚のための学習可能ディファレンスセンター Learnable Differencing Center for Nighttime Depth Perception ( http://arxiv.org/abs/2306.14538v3 ) ライセンス: Link先を確認 | Zhiqiang Yan and Yupeng Zheng and Chongyi Li and Jun Li and Jian Yang | (参考訳) 深度補完は、通常カラー画像の助けを借りて、疎な深度マップから密な深度マップを復元するタスクである。
これらの課題に対処するために, LDCNet というシンプルなフレームワークを提案する。
夜間の深度補完と深度推定の両タスクにおける広範な実験により, LDCNetの有効性が実証され, 最先端の性能に到達した。 Depth completion is the task of recovering dense depth maps from sparse ones, usually with the help of color images. Existing image-guided methods perform well on daytime depth perception self-driving benchmarks, but struggle in nighttime scenarios with poor visibility and complex illumination. To address these challenges, we propose a simple yet effective framework called LDCNet. Our key idea is to use Recurrent Inter-Convolution Differencing (RICD) and Illumination-Affinitive Intra-Convolution Differencing (IAICD) to enhance the nighttime color images and reduce the negative effects of the varying illumination, respectively. RICD explicitly estimates global illumination by differencing two convolutions with different kernels, treating the small-kernel-convolution feature as the center of the large-kernel-convolution feature in a new perspective. IAICD softly alleviates local relative light intensity by differencing a single convolution, where the center is dynamically aggregated based on neighboring pixels and the estimated illumination map in RICD. On both nighttime depth completion and depth estimation tasks, extensive experiments demonstrate the effectiveness of our LDCNet, reaching the state of the art. | 翻訳日:2023-08-24 18:01:52 公開日:2023-08-23 |
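RICD is described as differencing two convolutions with different kernels, where the small-kernel feature acts as the center of the large-kernel feature. The following toy NumPy illustration shows that center-minus-surround idea. It assumes fixed box (mean) filters on a single-channel image, and the function names are hypothetical. LDCNet's actual kernels are learned and applied recurrently, so this is not its implementation. Both filters preserve constants, so a uniform brightness offset cancels out in the difference.

```python
import numpy as np

def box_filter(img: np.ndarray, k: int) -> np.ndarray:
    """Mean filter with an odd k x k kernel and edge padding (output has the input's shape)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def conv_difference(img: np.ndarray, small_k: int = 1, large_k: int = 5) -> np.ndarray:
    """Small-kernel response ("center") minus large-kernel response ("surround")."""
    return box_filter(img, small_k) - box_filter(img, large_k)

# A single bright pixel gives a strong positive center response.
spike = np.zeros((7, 7))
spike[3, 3] = 25.0
response = conv_difference(spike)
```

Here `response[3, 3]` equals 25 minus the 5x5 local mean of 1, that is 24. Adding a constant to the whole image leaves the response unchanged.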
# マルチモーダル固有表現認識とマルチモーダル関係抽出のための思考連鎖プロンプト蒸留 Chain-of-Thought Prompt Distillation for Multimodal Named Entity Recognition and Multimodal Relation Extraction ( http://arxiv.org/abs/2306.14122v3 ) ライセンス: Link先を確認 | Feng Chen and Yujian Feng | (参考訳) マルチモーダル固有表現認識(MNER)とマルチモーダル関係抽出(MRE)には、複雑な言語理解とマルチモーダル理解のための基本的な推論能力が必要である。
本研究では,中間的な推論ステップの列である思考連鎖(chain of thought, CoT)を生成することにより,大規模言語モデル(LLM)の推論能力を,よりコンパクトな学生モデルに蒸留することを検討する。
次に, LLMからコモンセンス推論能力を同化させる新しい条件付きプロンプト蒸留法を提案し, 画像やCoTの知識を必要とせず, テキストのみの入力に対処する際の学生モデルの有用性を高める。
広汎な実験により,本手法は最先端の精度を実現し,MNERおよびMREデータセット上での解釈可能性,データ効率,ドメイン間の一般化に関する多くの利点を示す。 Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE) necessitate the fundamental reasoning capacity for intricate linguistic and multimodal comprehension. In this study, we explore distilling the reasoning ability of large language models (LLMs) into a more compact student model by generating a \textit{chain of thought} (CoT) -- a sequence of intermediate reasoning steps. Specifically, we commence by exemplifying the elicitation of such reasoning ability from LLMs through CoT prompts covering multi-grain (noun, sentence, multimodality) and data-augmentation (style, entity, image) dimensions. Subsequently, we present a novel conditional prompt distillation method to assimilate the commonsense reasoning ability from LLMs, thereby enhancing the utility of the student model in addressing text-only inputs without the requisite addition of image and CoT knowledge. Extensive experiments reveal that our approach attains state-of-the-art accuracy and manifests a plethora of advantages concerning interpretability, data efficiency, and cross-domain generalization on MNER and MRE datasets. | 翻訳日:2023-08-24 18:01:28 公開日:2023-08-23 |
# 2種$k$-体埋め込みガウスユニタリアンサンブル:固有値密度の$q$-正規形式 Two species $k$-body embedded Gaussian unitary ensembles: $q$-normal form of the eigenvalue density ( http://arxiv.org/abs/2306.12513v2 ) ライセンス: Link先を確認 | Manan Vyas, V. K. B. Kota | (参考訳) 2種(例えば$\mathbf{\pi}$と$\mathbf{\nu}$)フェルミオン系に対する$k$体相互作用を持つ埋め込みガウスユニタリアンサンブルが生成する固有値密度を、最低次の6つのモーメントの公式を導出することにより調べる。
EGUE($k:\mathbf{\pi} \mathbf{\nu}$) と呼ばれるこのアンサンブルの構築では、$\mathbf{\pi}$ フェルミオン($m_1$ 個)が $N_1$ 個の縮退した単粒子(sp)状態を占め、同様に $\mathbf{\nu}$ フェルミオン($m_2$ 個)が $N_2$ 個の縮退した sp 状態を占めると仮定する。
EGUE($k:\mathbf{\pi} \mathbf{\nu}$)形式と結果は2種類のボソン系に拡張される。
その結果,同一のフェルミオン系とボーソン系で最近確立された固有値密度の$q$正規形は2種のフェルミオン系とボーソン系に拡張された。 Eigenvalue density generated by embedded Gaussian unitary ensemble with $k$-body interactions for two species (say $\mathbf{\pi}$ and $\mathbf{\nu}$) fermion systems is investigated by deriving formulas for the lowest six moments. Assumed in constructing this ensemble, called EGUE($k:\mathbf{\pi} \mathbf{\nu}$), is that the $\mathbf{\pi}$ fermions ($m_1$ in number) occupy $N_1$ number of degenerate single particle (sp) states and similarly $\mathbf{\nu}$ fermions ($m_2$ in number) in $N_2$ number of degenerate sp states. The Hamiltonian is assumed to be $k$-body preserving $(m_1,m_2)$. Formulas with finite $(N_1,N_2)$ corrections and asymptotic limit formulas both show that the eigenvalue density takes $q$-normal form with the $q$ parameter defined by the fourth moment. The EGUE($k:\mathbf{\pi} \mathbf{\nu}$) formalism and results are extended to two species boson systems. Results in this work show that the $q$-normal form of the eigenvalue density established only recently for identical fermion and boson systems extends to two species fermion and boson systems. | 翻訳日:2023-08-24 18:00:37 公開日:2023-08-23 |
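The abstract states that the $q$ parameter is defined by the fourth moment. Under the standard convention for the $q$-normal ($q$-Hermite) distribution, the standardized fourth moment is $2+q$, so $q = \mu_4/\mu_2^2 - 2$. This gives $q=1$ for a Gaussian ($\mu_4=3$) and $q=0$ for the semicircle ($\mu_4=2$). The sketch below estimates $q$ from a list of eigenvalues. It assumes the eigenvalues are already collected, for example by diagonalizing ensemble members, and it ignores the finite-$(N_1,N_2)$ corrections discussed in the paper. The function name is hypothetical.

```python
from statistics import fmean

def q_from_fourth_moment(eigenvalues):
    """Estimate q via the standardized fourth moment: q = mu4 / mu2**2 - 2."""
    mean = fmean(eigenvalues)
    centered = [x - mean for x in eigenvalues]
    mu2 = fmean(c ** 2 for c in centered)
    mu4 = fmean(c ** 4 for c in centered)
    return mu4 / mu2 ** 2 - 2.0

# Two-point spectrum {-1, +1}: mu4 / mu2**2 = 1, so q = -1 (the q -> -1 limit).
q_two_point = q_from_fourth_moment([-1.0, 1.0])
# {-2, 0, 0, 2}: mu2 = 2, mu4 = 8, so mu4 / mu2**2 = 2 and q = 0 (semicircle-like).
q_semicircle_like = q_from_fourth_moment([-2.0, 0.0, 0.0, 2.0])
```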
# 島を用いた3方向情報パラドックスとその解決 Three way information paradox and its resolution using islands ( http://arxiv.org/abs/2306.10801v2 ) ライセンス: Link先を確認 | Manish Ramchander, Sitender Pratap Kashyap, Roji Pius | (参考訳) ブラックホールの自由度は有限であるため、いかなる系についても絡み合いの際限のない成長を支えることはできない。
結合される系がホーキング放射という1つの実体である通常の情報パラドックスとは異なり、ここではブラックホール$\chi_0$を2つの無限の実体、すなわち熱浴$\chi_1$と補助系$\chi _2$に結合する。
これは、ブラックホールのエントロピーに対する重力補正が$\chi _1$と$\chi _2$エントロピーのパラドックス成長を防げないという意味で、新しい情報パラドックスを生み出す。
ブラックホールのエントロピー成長を解消する量子極値面を見つけ、絡み合いのモノガミーを用いて$\chi _1$と$\chi _2$のエントロピーが示すべき振る舞いを論じ、これらの期待を満たす島を導出する。
我々の結果から直ちに導かれるのは、$\chi _1$と$\chi _2$が互いに独立な状態から出発しても、重力がそれらの間に絡み合いを構築するということである。 Black holes possess finite degrees of freedom and thus cannot fuel unbounded entanglement growth of any system. Instead of the usual information paradox where the coupled system is one entity, the Hawking radiation, here we couple a black hole $\chi_0$ with two infinite entities: a thermal bath $\chi_1$ and an auxiliary system $\chi _2$. This produces a novel information paradox in the sense that gravitational correction to black hole entropy does not rule out paradoxical growth of $\chi _1$ and $\chi _2$ entropies. This immediately raises what kind of resolution such a paradox has, and we address this question working in the AdS$_2$ JT gravity model, using the island formula, and ideas of entanglement monogamy. We find the quantum extremal surface that cures the black hole entropy growth, argue to the nature of how $\chi _1$ and $\chi _2$ entropies must behave using monogamy, and derive an island which satisfies these expectations. A direct consequence of our results is that gravitation builds entanglement between $\chi _1$ and $\chi _2$, even though they start out independently. | 翻訳日:2023-08-24 17:59:46 公開日:2023-08-23 |
# グローバル・ローカル・マスクドオートエンコーダによる3次元医用画像セグメンテーションの高度化 Advancing Volumetric Medical Image Segmentation via Global-Local Masked Autoencoder ( http://arxiv.org/abs/2306.08913v2 ) ライセンス: Link先を確認 | Jia-Xin Zhuang, Luyang Luo, Hao Chen | (参考訳) Masked Autoencoder(MAE)は、人間の介入なしにニューラルネットワークの表現学習を改善する、有望な自己教師付き事前学習技術である。
しかし, ボリューム医療画像に直接MAEを適用することは, 2つの課題をもたらす。
これらの制約に対処するために、シンプルながら効果的な自己教師付き事前学習戦略である Global-Local Masked AutoEncoder (GL-MAE) を提案する。
コードとモデルは論文の採択後に公開される予定である。 Masked autoencoder (MAE) is a promising self-supervised pre-training technique that can improve the representation learning of a neural network without human intervention. However, applying MAE directly to volumetric medical images poses two challenges: (i) a lack of global information that is crucial for understanding the clinical context of the holistic data, (ii) no guarantee of stabilizing the representations learned from randomly masked inputs. To address these limitations, we propose the Global-Local Masked AutoEncoder (GL-MAE), a simple yet effective self-supervised pre-training strategy. In addition to reconstructing masked local views, as in previous methods, GL-MAE incorporates global context learning by reconstructing masked global views. Furthermore, a complete global view is integrated as an anchor to guide the reconstruction and stabilize the learning process through global-to-global consistency learning and global-to-local consistency learning. Finetuning results on multiple datasets demonstrate the superiority of our method over other state-of-the-art self-supervised algorithms, highlighting its effectiveness on versatile volumetric medical image segmentation tasks, even when annotations are scarce. Our codes and models will be released upon acceptance. | 翻訳日:2023-08-24 17:59:21 公開日:2023-08-23 |
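Both the local and global views in GL-MAE are reconstructed from randomly masked inputs. The following NumPy sketch shows uniform random patch masking in the style of the original MAE. It assumes the volume has already been cut into `num_patches` tokens. The masking ratio and patch grid below are hypothetical examples, and GL-MAE's exact masking scheme may differ.

```python
import numpy as np

def random_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator):
    """Return (keep_idx, mask_idx) for uniform random patch masking (MAE-style)."""
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    perm = rng.permutation(num_patches)      # random order of patch indices
    keep_idx = np.sort(perm[:num_keep])      # visible patches fed to the encoder
    mask_idx = np.sort(perm[num_keep:])      # patches the decoder must reconstruct
    return keep_idx, mask_idx

# Hypothetical example: a 96^3 volume cut into 16^3 patches gives 6*6*6 = 216 tokens.
rng = np.random.default_rng(0)
keep, masked = random_mask(216, 0.75, rng)
```

In a global-local setup, the same routine could be applied separately to the tokens of a downsampled global view and of each local crop.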
# 最先端生成モデルの信頼性景観について--包括的調査 On the Trustworthiness Landscape of State-of-the-art Generative Models: A Comprehensive Survey ( http://arxiv.org/abs/2307.16680v4 ) ライセンス: Link先を確認 | Mingyuan Fan, Cen Chen, Chengyu Wang, Jun Huang | (参考訳) 拡散モデルと大規模言語モデルが最先端生成モデルとして登場し、人間の生活の様々な側面に革命的な影響を与えた。
これらの取り組みは、これらのモデルの信頼できる展開を促進するのに不可欠であり、最終的には社会全体に利益をもたらす。 Diffusion models and large language models have emerged as leading-edge generative models and have sparked a revolutionary impact on various aspects of human life. However, the practical implementation of these models has also exposed inherent risks, highlighting their dual nature and raising concerns regarding their trustworthiness. Despite the abundance of literature on this subject, a comprehensive survey specifically delving into the intersection of large-scale generative models and their trustworthiness remains largely absent. To bridge this gap, this paper investigates both the long-standing and emerging threats associated with these models across four fundamental dimensions: privacy, security, fairness, and responsibility. In this way, we construct an extensive map outlining the trustworthiness of these models, while also providing practical recommendations and identifying future directions. These efforts are crucial for promoting the trustworthy deployment of these models, ultimately benefiting society as a whole. | 翻訳日:2023-08-24 17:50:24 公開日:2023-08-23 |
# GridMM:視覚・言語ナビゲーションのためのグリッドメモリマップ GridMM: Grid Memory Map for Vision-and-Language Navigation ( http://arxiv.org/abs/2307.12907v3 ) ライセンス: Link先を確認 | Zihan Wang and Xiangyang Li and Jiahao Yang and Yeqi Liu and Shuqiang Jiang | (参考訳) ビジョン・アンド・ランゲージナビゲーション(VLN)は、エージェントが3D環境における自然言語の指示に従って遠隔地へ移動できるようにする。
離散環境におけるREVERIE, R2R, SOONデータセット, 連続環境におけるR2R-CEデータセットについて, 実験を行い, 提案手法の優位性を示した。 Vision-and-language navigation (VLN) enables the agent to navigate to a remote location following the natural language instruction in 3D environments. To represent the previously visited environment, most approaches for VLN implement memory using recurrent states, topological maps, or top-down semantic maps. In contrast to these approaches, we build the top-down egocentric and dynamically growing Grid Memory Map (i.e., GridMM) to structure the visited environment. From a global perspective, historical observations are projected into a unified grid map in a top-down view, which can better represent the spatial relations of the environment. From a local perspective, we further propose an instruction relevance aggregation method to capture fine-grained visual clues in each grid region. Extensive experiments are conducted on the REVERIE, R2R, and SOON datasets in discrete environments and on the R2R-CE dataset in continuous environments, showing the superiority of our proposed method. | 翻訳日:2023-08-24 17:50:06 公開日:2023-08-23 |
# オープンエンド生成における自己一貫性 Self-consistency for open-ended generations ( http://arxiv.org/abs/2307.06857v2 ) ライセンス: Link先を確認 | Siddhartha Jain, Xiaofei Ma, Anoop Deoras, Bing Xiang | (参考訳) LLM(大規模言語モデル)がサンプリングする出力の品質には、かなりのばらつきが生じうる。
提案手法では, LLMへのブラックボックスアクセスのみを前提としているが, トークン確率への追加アクセスにより, さらなる性能向上が期待できる。 Large Language Models (LLMs) can exhibit considerable variation in the quality of their sampled outputs. Reranking and selecting the best generation from the sampled set is a popular way of obtaining strong gains in generation quality. In this paper, we present a novel approach for reranking LLM generations. Unlike other techniques that might involve additional inferences or training a specialized reranker, our approach relies on easy to compute pairwise statistics between the generations that have minimal compute overhead. We show that our approach can be formalized as an extension of self-consistency and analyze its performance in that framework, theoretically as well as via simulations. We show strong improvements for selecting the best $k$ generations for code generation tasks as well as robust improvements for best generation for the tasks of autoformalization, and summarization. While our approach only assumes black-box access to LLMs, we show that additional access to token probabilities can improve performance even further. | 翻訳日:2023-08-24 17:49:47 公開日:2023-08-23 |
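The approach reranks sampled generations using cheap pairwise statistics between them. The minimal sketch below assumes unigram Jaccard overlap as the pairwise statistic, which is a hypothetical stand-in; the paper's actual statistics may differ. Each generation is scored by its mean similarity to the other samples, and the top $k$ are returned. If the similarity were exact string match, this would reduce to ordinary majority voting, which is the sense in which the method extends self-consistency.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Unigram Jaccard similarity between two whitespace-tokenized strings."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def rerank(generations, k=1):
    """Return the k generations with the highest mean similarity to the others."""
    n = len(generations)
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        s = jaccard(generations[i], generations[j])
        totals[i] += s
        totals[j] += s
    scores = [t / max(n - 1, 1) for t in totals]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)  # stable for ties
    return [generations[i] for i in order[:k]]

samples = ["return a + b", "return a + b", "return a - b", "print(a)"]
best = rerank(samples, k=1)
```

Here the two identical samples agree most with the rest, so `best` is `["return a + b"]`, and the outlier `"print(a)"` is ranked last.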
# Back to Optimization: 拡散に基づくゼロショット3次元人物姿勢推定 Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation ( http://arxiv.org/abs/2307.03833v2 ) ライセンス: Link先を確認 | Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang | (参考訳) 学習に基づく手法は、従来の最適化に基づく手法よりも多くのベンチマークにおいて非常に優れた性能を持つ3Dヒューマンポーズ推定(HPE)タスクを支配している。
それにもかかわらず、訓練されたネットワークは暗黙的にカメラ固有のパラメータとドメインベースの人間のポーズの分布と統計的平均による推定ポーズを学習するため、2D-3Dリフト、画像から3D、あるいは拡散ベースの方法で学習ベースのモデルにとって、野生の3D HPEは依然として最大の課題である。
最適化と学習に基づく手法の利点を組み合わせることで、3D HPEのためのZero-shot Diffusion-based Optimization (ZeDO) パイプラインを提案し、クロスドメインと3D HPEの問題を解決する。
マルチ仮説ZeDOは、2D-3Dや画像-3Dのペアを用いた学習を一切行わずに、Human3.6MでminMPJPE 51.4 mmという最先端(SOTA)性能を達成した。
さらに,本論文では3DPWデータセット上でのSOTA性能をPA-MPJPE $42.6$mmで達成し,さらに3DPWでトレーニングした学習手法よりも優れていた。 Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge of learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. On the other hand, the optimization-based methods estimate results case-by-case, which can predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M as minMPJPE $51.4$mm without training with any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on 3DPW dataset with PA-MPJPE $42.6$mm on cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW. | 翻訳日:2023-08-24 17:49:32 公開日:2023-08-23 |
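The results are reported as minMPJPE and PA-MPJPE in millimetres. MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints. For multi-hypothesis methods, minMPJPE is commonly taken as the lowest MPJPE among a sample's hypotheses. The following minimal NumPy sketch assumes poses stored as arrays of shape (J, 3) in mm. It omits the Procrustes alignment that PA-MPJPE applies before measuring error, and the function names are illustrative.

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def min_mpjpe(hypotheses: np.ndarray, gt: np.ndarray) -> float:
    """Lowest MPJPE among H hypotheses stacked with shape (H, J, 3)."""
    return min(mpjpe(h, gt) for h in hypotheses)

# Hypothetical 17-joint pose and two hypotheses offset by 10 mm and 5 mm.
gt = np.zeros((17, 3))
hyps = np.stack([gt + [10.0, 0.0, 0.0], gt + [0.0, 3.0, 4.0]])
score = min_mpjpe(hyps, gt)
```

Over a dataset, the per-sample values would then be averaged.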
# 特異点解消への機械学習アプローチ An ML approach to resolution of singularities ( http://arxiv.org/abs/2307.00252v2 ) ライセンス: Link先を確認 | Gergely Bérczi and Honglu Fan and Mingcong Zeng | (参考訳) 多項式方程式系の解集合は、典型的には振る舞いの悪い点、すなわち特異点を含む。
特定の領域において、トレーニングされたモデルは、実行された多項式加算の総数において最先端の選択ヒューリスティックよりも優れており、近年の機械学習の発展は、シンボリック計算におけるアルゴリズムの性能を向上させる可能性があるという概念実証を提供する。 The solution set of a system of polynomial equations typically contains ill-behaved, singular points. Resolution is a fundamental process in geometry in which we replace singular points with smooth points, while keeping the rest of the solution set unchanged. Resolutions are not unique: the usual way to describe them involves repeatedly performing a fundamental operation known as "blowing-up", and the complexity of the resolution highly depends on certain choices. The process can be translated into various versions of a 2-player game, the so-called Hironaka game, and a winning strategy for the first player provides a solution to the resolution problem. In this paper we introduce a new approach to the Hironaka game that uses reinforcement learning agents to find optimal resolutions of singularities. In certain domains, the trained model outperforms state-of-the-art selection heuristics in total number of polynomial additions performed, which provides a proof-of-concept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation. | 翻訳日:2023-08-24 17:49:08 公開日:2023-08-23 |
# ソフトウェア問題をチームメンバーに割り当てるための機械学習手法の比較 Comparison of Machine Learning Methods for Assigning Software Issues to Team Members ( http://arxiv.org/abs/2307.00009v2 ) ライセンス: Link先を確認 | Büşra Tabak and Fatma Başak Aydemir | (参考訳) ソフトウェアの課題(issue)には、開発中の修正、改善、新規スレッド作成のための作業単位が含まれ、チームメンバー間のコミュニケーションを促進する。
この貢献には、5つのアノテートされた産業問題データセットの公開共有、明確で包括的な特徴セットの開発、新しいラベルセットの導入、浅い機械学習技術のアンサンブル分類器の有効性の検証が含まれる。 Software issues contain units of work to fix, improve, or create new threads during the development and facilitate communication among the team members. Assigning an issue to the most relevant team member and determining a category of an issue is a tedious and challenging task. Wrong classifications cause delays and rework in the project and trouble among the team members. This paper proposes a set of carefully curated linguistic features for shallow machine learning methods and compares the performance of shallow and ensemble methods with deep language models. Unlike the state-of-the-art, we assign issues to four roles (designer, developer, tester, and leader) rather than to specific individuals or teams to contribute to the generality of our solution. We also consider the level of experience of the developers to reflect the industrial practices in our solution formulation. We collect and annotate five industrial data sets from one of the top three global television producers to evaluate our proposal and compare it with deep language models. Our data sets contain 5324 issues in total. We show that an ensemble classifier of shallow techniques achieves 0.92 for issue assignment in accuracy which is statistically comparable to the state-of-the-art deep language models. The contributions include the public sharing of five annotated industrial issue data sets, the development of a clear and comprehensive feature set, the introduction of a novel label set, and the validation of the efficacy of an ensemble classifier of shallow machine learning techniques. | 翻訳日:2023-08-24 17:48:50 公開日:2023-08-23 |
# 量子チャネルとスーパーチャネルの双対性は基底依存性である Duality between quantum channels and super-channels is basis-dependent ( http://arxiv.org/abs/2306.16395v2 ) ライセンス: Link先を確認 | Sohail, Sahil, Ritabrata Sengupta, Ujjwal Sen | (参考訳) Choi-Jamiołkowski-Kraus-Sudarshanの量子チャネル・状態同型における完全正値性と正値性の対応は、基底の選択に依存する。
この対応が成り立つための基底に関する十分条件は Paulsen と Shult の研究で与えられており、後に Kye によって必要条件でもあることが証明された。
本研究では,スーパーマップとその Choi 型表現の間の対応が成り立つための基底に関する必要十分条件を求める。 The complete positivity vs positivity correspondence in the Choi-Jamiołkowski-Kraus-Sudarshan quantum channel-state isomorphism depends on the choice of basis. Instead of the "canonical" basis, if we use, e.g., the Pauli spin matrices along with the identity as the basis for the space of bounded operators on the two-dimensional complex Hilbert space, this correspondence breaks down. A sufficient condition on the basis for validity of this correspondence is provided in the work of Paulsen and Shult, which was later proven to be necessary by Kye. A correspondence is also present between the space of super-maps and the tensor product of the spaces of the inputs and outputs of the same. In particular, a super-map is completely CP-preserving if and only if its Choi-type representation is completely positive (CP). This correspondence also depends on a specific choice of basis. In this work, we find the necessary and sufficient condition on a basis such that this correspondence holds true. | 翻訳日:2023-08-24 17:48:04 公開日:2023-08-23 |
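In the canonical matrix-unit basis $\{E_{ij}\}$, a linear map $\Phi$ is completely positive if and only if its Choi matrix $J(\Phi)=\sum_{ij} E_{ij}\otimes\Phi(E_{ij})$ is positive semidefinite. The NumPy sketch below shows only that canonical-basis side. The transpose map on $2\times 2$ matrices is positive but not CP, and its Choi matrix is the swap operator, which has eigenvalue $-1$. The paper's point, that this correspondence can fail for other bases such as the Pauli basis, is not reproduced here. Function names are illustrative.

```python
import numpy as np

def choi_matrix(channel, d: int) -> np.ndarray:
    """Choi matrix J = sum_ij E_ij (x) channel(E_ij) in the canonical matrix-unit basis."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0
            J += np.kron(E, channel(E))
    return J

def identity_map(X):
    return X

def transpose_map(X):
    return X.T

# Identity channel (CP): J is the unnormalized maximally entangled projector, eigenvalues {0, 0, 0, 2}.
eig_id = np.linalg.eigvalsh(choi_matrix(identity_map, 2))
# Transpose (positive but not CP): J is the swap operator, eigenvalues {-1, 1, 1, 1}.
eig_t = np.linalg.eigvalsh(choi_matrix(transpose_map, 2))
```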
# マルチラベル分類に必要なのは正ラベルだけ Positive Label Is All You Need for Multi-Label Classification ( http://arxiv.org/abs/2306.16016v2 ) ライセンス: Link先を確認 | Zhixiang Yuan, Kaixin Zhang, Tao Huang | (参考訳) マルチラベル分類(MLC)は、各画像に様々な意味ラベルを注釈付けすることが困難であるため、トレーニングデータにおいて避けられないラベルノイズに悩まされる。
本稿では, 負ラベルが正ラベル以上であり, ほとんどのノイズラベルが負ラベルから来ていることを考慮し, データセット中のすべての負ラベルを直接破棄し, 正および未ラベルのマルチラベル分類(PU-MLC)と呼ばれる新しい手法を提案する。
PU-MLC は単純かつ効果的であり, MLC と部分ラベル付き MLC (MLC-PL) の両タスクに適用可能である。
MS-COCOとPASCAL VOCデータセットの大規模な実験により、私たちのPU-MLCはより少ないアノテーションで MLC と MLC-PL の設定を大幅に改善することを示した。
コードはリリースされる。 Multi-label classification (MLC) suffers from the inevitable label noise in training data due to the difficulty in annotating various semantic labels in each image. To mitigate the influence of noisy labels, existing methods mainly devote to identifying and correcting the label mistakes via a trained MLC model. However, these methods still involve annoying noisy labels in training, which can result in imprecise recognition of noisy labels and weaken the performance. In this paper, considering that the negative labels are substantially more than positive labels, and most noisy labels are from the negative labels, we directly discard all the negative labels in the dataset, and propose a new method dubbed positive and unlabeled multi-label classification (PU-MLC). By extending positive-unlabeled learning into MLC task, our method trains model with only positive labels and unlabeled data, and introduces adaptive re-balance factor and adaptive temperature coefficient in the loss function to alleviate the catastrophic imbalance in label distribution and over-smoothing of probabilities in training. Furthermore, to capture both local and global dependencies in the image, we also introduce a local-global convolution module, which supplements global information into existing convolution layers with no retraining of backbone required. Our PU-MLC is simple and effective, and it is applicable to both MLC and MLC with partial labels (MLC-PL) tasks. Extensive experiments on MS-COCO and PASCAL VOC datasets demonstrate that our PU-MLC achieves significantly improvements on both MLC and MLC-PL settings with even fewer annotations. Code will be released. | 翻訳日:2023-08-24 17:47:40 公開日:2023-08-23 |
# UTRNet: 印刷文書における高解像度ウルドゥー文字認識 UTRNet: High-Resolution Urdu Text Recognition In Printed Documents ( http://arxiv.org/abs/2306.15782v3 ) ライセンス: Link先を確認 | Abdur Rahman, Arjun Ghosh, and Chetan Arora | (参考訳) 本稿では,高解像度・マルチスケールな意味的特徴抽出を用いたUrduテキスト認識の課題に対処する新しい手法を提案する。
ウルドゥー文字の複雑さへの汎化が難しく,注釈付き実世界データも不足しているという先行研究の限界に対処するために,我々は,11,000 行以上からなる大規模な注釈付き実世界データセット UTRSet-Real と,実世界によく似た2万行の合成データセット UTRSet-Synth を導入するとともに,既存の IIITH データセットの正解ラベルを修正し,将来の研究のためのより信頼性の高いリソースとした。
さらに,UTRNetをテキスト検出モデルに統合することにより,印刷物からUrdu OCRをエンド・ツー・エンドにするためのオンラインツールを開発した。
我々の研究は、現在のUrdu OCRの限界に対処するだけでなく、この領域における今後の研究の道を開くとともに、Urdu OCR技術の継続的な進歩を促進する。
ソースコード、データセット、アノテーション、トレーニングされたモデル、オンラインツールを備えたプロジェクトページは、abdur75648.github.io/UTRNet で入手できる。 In this paper, we propose a novel approach to address the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. To address the limitations of previous works, which struggle to generalize to the intricacies of the Urdu script and the lack of sufficient annotated real-world data, we have introduced the UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines and UTRSet-Synth, a synthetic dataset with 20,000 lines closely resembling real-world and made corrections to the ground truth of the existing IIITH dataset, making it a more reliable resource for future research. We also provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR from printed documents by integrating UTRNet with a text detection model. Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research in this area and facilitates the continued advancement of Urdu OCR technology. The project page with source code, datasets, annotations, trained models, and online tool is available at abdur75648.github.io/UTRNet. | 翻訳日:2023-08-24 17:47:07 公開日:2023-08-23 |
# MemoChat: 長期間のオープンドメイン会話にメモを使用するためのLLMのチューニング MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation ( http://arxiv.org/abs/2308.08239v2 ) ライセンス: Link先を確認 | Junru Lu, Siyu An, Mingbao Lin, Gabriele Pergola, Yulan He, Di Yin, Xing Sun, Yunsheng Wu | (参考訳) 我々は,大規模言語モデル (LLM) を効果的に活用し,一貫した長距離オープンドメイン会話を維持するための命令を精錬するためのパイプラインであるMemoChatを提案する。
コード、データ、モデルは https://github.com/LuJunru/MemoChat で公開されている。 We propose MemoChat, a pipeline for refining instructions that enables large language models (LLMs) to effectively employ self-composed memos for maintaining consistent long-range open-domain conversations. We demonstrate a long-range open-domain conversation through iterative "memorization-retrieval-response" cycles. This requires us to carefully design tailored tuning instructions for each distinct stage. The instructions are reconstructed from a collection of public datasets to teach the LLMs to memorize and retrieve past dialogues with structured memos, leading to enhanced consistency when participating in future conversations. We invite experts to manually annotate a test set designed to evaluate the consistency of long-range conversation questions. Experiments on three testing scenarios involving both open-source and API-accessible chatbots at scale verify the efficacy of MemoChat, which outperforms strong baselines. Our codes, data and models are available here: https://github.com/LuJunru/MemoChat. | 翻訳日:2023-08-24 17:40:14 公開日:2023-08-23 |
# 拡散MRI信号の構造的コヒーレント連続表現のためのニューラル球高調波 Neural Spherical Harmonics for structurally coherent continuous representation of diffusion MRI signal ( http://arxiv.org/abs/2308.08210v2 ) ライセンス: Link先を確認 | Tom Hendriks, Anna Vilanova, Maxime Chamberland | (参考訳) 本研究では,拡散磁気共鳴画像(dMRI)データセットをモデル化する新しい手法を提案する。
我々は,ニューラルネットワークを用いて球面調和系(NeSH)をパラメータ化し,角領域と空間領域の両方で連続するHuman Connectome Projectデータセットから単一対象のdMRI信号を表現する。
本稿では, 平均拡散率, 分数異方性, および全繊維密度を計算するために, 再構成をどのように利用できるかを紹介する。
本稿では, 角領域と空間領域の両方におけるアップサンプリングが, 既存手法と同等以上の再現性をもたらすことを示す。 We present a novel way to model diffusion magnetic resonance imaging (dMRI) datasets, that benefits from the structural coherence of the human brain while only using data from a single subject. Current methods model the dMRI signal in individual voxels, disregarding the intervoxel coherence that is present. We use a neural network to parameterize a spherical harmonics series (NeSH) to represent the dMRI signal of a single subject from the Human Connectome Project dataset, continuous in both the angular and spatial domain. The reconstructed dMRI signal using this method shows a more structurally coherent representation of the data. Noise in gradient images is removed and the fiber orientation distribution functions show a smooth change in direction along a fiber tract. We showcase how the reconstruction can be used to calculate mean diffusivity, fractional anisotropy, and total apparent fiber density. These results can be achieved with a single model architecture, tuning only one hyperparameter. In this paper we also demonstrate how upsampling in both the angular and spatial domain yields reconstructions that are on par or better than existing methods. | 翻訳日:2023-08-24 17:39:57 公開日:2023-08-23 |
# 検証のための大規模言語モデルにおける前向き・後ろ向き推論 Forward-Backward Reasoning in Large Language Models for Verification ( http://arxiv.org/abs/2308.07758v3 ) ライセンス: Link先を確認 | Weisen Jiang and Han Shi and Longhui Yu and Zhengying Liu and Yu Zhang and Zhenguo Li and James T. Kwok | (参考訳) Chain-of-Thought (CoT)プロンプトは様々な推論タスクで有望なパフォーマンスを示している。
近年、自己整合性(Self-Consistency) (Wang et al., 2023) は、異なる回答につながりうる多様な推論チェーンをサンプリングし、最も多くの票を得た回答を選択する手法を提案した。
質問中のトークンを${\bf x}$でマスクし、単純なテンプレートによって候補回答を与えたうえで、マスクされたトークンを予測するようLLMに求める。
さらに, 候補回答の確率を推定するために, 前向き推論と後ろ向き推論を組み合わせる FOBAR を提案する。
実験結果から,FOBARは様々な推論ベンチマークで最先端の性能を達成することが示された。 Chain-of-Thought (CoT) prompting has shown promising performance in various reasoning tasks. Recently, Self-Consistency (Wang et al., 2023) proposes to sample a diverse set of reasoning chains which may lead to different answers while the answer that receives the most votes is selected. In this paper, we propose a novel method to use backward reasoning in verifying candidate answers. We mask a token in the question by ${\bf x}$ and ask the LLM to predict the masked token when a candidate answer is provided by a simple template, i.e., "If we know the answer of the above question is {a candidate answer}, what is the value of unknown variable ${\bf x}$?" Intuitively, the LLM is expected to predict the masked token successfully if the provided candidate answer is correct. We further propose FOBAR to combine forward and backward reasoning for estimating the probability of candidate answers. We conduct extensive experiments on six data sets and three LLMs. Experimental results demonstrate that FOBAR achieves state-of-the-art performance on various reasoning benchmarks. | 翻訳日:2023-08-24 17:39:38 公開日:2023-08-23 |
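The backward check masks a token in the question and appends the template quoted in the abstract. The sketch below only builds such a prompt. It assumes the masked token is a number found with a regular expression and that the caller picks which occurrence to mask; both are illustrative choices. The function name is hypothetical, and FOBAR's rule for combining forward and backward probabilities is not reproduced. If the LLM answers the prompt with the hidden value, that counts as evidence for the candidate answer.

```python
import re

TEMPLATE = ("If we know the answer of the above question is {answer}, "
            "what is the value of unknown variable x?")

def backward_question(question: str, candidate_answer: str, occurrence: int = 0):
    """Mask the `occurrence`-th number in the question with x and append the template.

    Returns (prompt, masked_value), or None if the question has too few numbers.
    """
    numbers = list(re.finditer(r"\d+(?:\.\d+)?", question))
    if occurrence >= len(numbers):
        return None
    m = numbers[occurrence]
    masked = question[:m.start()] + "x" + question[m.end():]
    prompt = masked + " " + TEMPLATE.format(answer=candidate_answer)
    return prompt, m.group()

q = "Tom has 3 apples and buys 5 more. How many apples does he have?"
prompt, hidden = backward_question(q, "8")
```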
# ガイド付き量子ウォーク Guided quantum walk ( http://arxiv.org/abs/2308.05418v2 ) ライセンス: Link先を確認 | Sebastian Schulz, Dennis Willsch, Kristel Michielsen | (参考訳) 局所振幅伝達(LAT)の理論を利用して、断熱定理を超える量子ウォーク(QW)と量子アニール(QA)の洞察を得る。
これらのスケジュールは、問題サイズで線形にスケールする進化時間内の大規模な組合せ最適化問題を解くことができるかもしれない。 We utilize the theory of local amplitude transfers (LAT) to gain insights into quantum walks (QWs) and quantum annealing (QA) beyond the adiabatic theorem. By representing the eigenspace of the problem Hamiltonian as a hypercube graph, we demonstrate that probability amplitude traverses the search space through a series of local Rabi oscillations. We argue that the amplitude movement can be systematically guided towards the ground state using a time-dependent hopping rate based solely on the problem's energy spectrum. Building upon these insights, we extend the concept of multi-stage QW by introducing the guided quantum walk (GQW) as a bridge between QW-like and QA-like procedures. We assess the performance of the GQW on exact cover, traveling salesperson and garden optimization problems with 9 to 30 qubits. Our results provide evidence for the existence of optimal annealing schedules, beyond the requirement of adiabatic time evolutions. These schedules might be capable of solving large-scale combinatorial optimization problems within evolution times that scale linearly in the problem size. | 翻訳日:2023-08-24 17:39:03 公開日:2023-08-23 |
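The paper describes amplitude moving across the hypercube through a series of local Rabi oscillations. As a toy sketch, assume just two basis states with problem energies `e1` and `e2`, coupled by a hopping rate `gamma` (with $\hbar=1$). This two-level reduction and the function names are illustrative assumptions, not the GQW schedule itself. Exact numerical evolution agrees with the closed-form Rabi formula $P(t)=\frac{4\gamma^2}{\Delta^2+4\gamma^2}\sin^2(\Omega t/2)$, where $\Omega=\sqrt{\Delta^2+4\gamma^2}$. Its prefactor shows why a hopping rate that is small compared with the energy gap $\Delta$ transfers little amplitude.

```python
import numpy as np

def transfer_probability_numeric(e1, e2, gamma, t):
    """Evolve |1> under H = [[e1, -gamma], [-gamma, e2]] (hbar = 1) and return |<2|psi(t)>|^2."""
    H = np.array([[e1, -gamma], [-gamma, e2]], dtype=float)
    evals, evecs = np.linalg.eigh(H)
    U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
    psi = U @ np.array([1.0, 0.0])
    return float(abs(psi[1]) ** 2)

def transfer_probability_rabi(e1, e2, gamma, t):
    """Closed-form Rabi formula (4 gamma^2 / Omega^2) sin^2(Omega t / 2)."""
    delta = e2 - e1
    omega = np.sqrt(delta ** 2 + 4 * gamma ** 2)
    return float(4 * gamma ** 2 / omega ** 2 * np.sin(omega * t / 2) ** 2)

# Resonant case (e1 == e2): full transfer at t = pi / (2 * gamma).
p_resonant = transfer_probability_rabi(0.0, 0.0, 0.5, np.pi / (2 * 0.5))
```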
# ロングテール認識のための新しいクラス発見 Novel Class Discovery for Long-tailed Recognition ( http://arxiv.org/abs/2308.02989v2 ) ライセンス: Link先を確認 | Zhang Chuyu, Xu Ruijie, He Xuming | (参考訳) 新たなクラス発見は、最近大きな進歩を遂げたが、既存のメソッドは通常、クラスバランスのベンチマークにおけるアルゴリズムの改善に焦点を当てている。
本手法は, 緩和した最適輸送問題を解くことにより, 新規クラスの高品質な擬似ラベルを推定し, 既知のクラスおよび新規クラスの学習におけるクラスバイアスを効果的に軽減する。
私たちのコードはhttps://github.com/kleinzcy/NCDLRで利用可能です。 While the novel class discovery has recently made great progress, existing methods typically focus on improving algorithms on class-balanced benchmarks. However, in real-world recognition tasks, the class distributions of their corresponding datasets are often imbalanced, which leads to serious performance degeneration of those methods. In this paper, we consider a more realistic setting for novel class discovery where the distributions of novel and known classes are long-tailed. One main challenge of this new problem is to discover imbalanced novel classes with the help of long-tailed known classes. To tackle this problem, we propose an adaptive self-labeling strategy based on an equiangular prototype representation of classes. Our method infers high-quality pseudo-labels for the novel classes by solving a relaxed optimal transport problem and effectively mitigates the class biases in learning the known and novel classes. We perform extensive experiments on CIFAR100, ImageNet100, Herbarium19 and large-scale iNaturalist18 datasets, and the results demonstrate the superiority of our method. Our code is available at https://github.com/kleinzcy/NCDLR. | 翻訳日:2023-08-24 17:38:38 公開日:2023-08-23 |
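The method infers pseudo-labels for novel classes by solving a relaxed optimal transport problem. For orientation, the sketch below shows the standard entropic (Sinkhorn-Knopp) version with a fixed class marginal, as used in earlier self-labeling work. The paper's relaxed, long-tail-adaptive formulation is not reproduced, and the function name and parameters are illustrative. Passing a non-uniform `class_marginal` is one simple way to encode an imbalanced class prior.

```python
import numpy as np

def sinkhorn_pseudo_labels(logits, class_marginal, eps=0.1, n_iters=100):
    """Soft assignment Q (N x K): rows sum to 1, columns approach N * class_marginal.

    `class_marginal` must be a length-K probability vector (non-negative, sums to 1).
    """
    logits = np.asarray(logits, dtype=float)
    n, _ = logits.shape
    q = np.exp((logits - logits.max()) / eps)          # Gibbs kernel, max-shifted for stability
    col_target = n * np.asarray(class_marginal, dtype=float)
    for _ in range(n_iters):
        q *= (col_target / q.sum(axis=0))[None, :]     # rescale columns toward class totals
        q /= q.sum(axis=1, keepdims=True)              # each sample's row sums to 1
    return q

rng = np.random.default_rng(0)
logits = rng.uniform(0.0, 1.0, size=(6, 3))
Q = sinkhorn_pseudo_labels(logits, [0.5, 0.3, 0.2], eps=1.0, n_iters=200)
pseudo_labels = Q.argmax(axis=1)                       # hard pseudo-labels, if needed
```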
# MSECNet: Accurate and Robust Normal Estimation for 3D Point Clouds by Multi-Scale Edge Conditioning ( http://arxiv.org/abs/2308.02237v2 ) License: see link | Haoyi Xiu, Xin Liu, Weimin Wang, Kyoung-Sook Kim, Masashi Matsuoka | (reference translation) Estimating surface normals from 3D point clouds is critical for various applications, including surface reconstruction and rendering.
To address this issue, we propose a novel approach called MSECNet, which improves estimation in normal varying regions by treating normal variation modeling as an edge detection problem.
Finally, we demonstrate the effectiveness of our approach in surface reconstruction. Estimating surface normals from 3D point clouds is critical for various applications, including surface reconstruction and rendering. While existing methods for normal estimation perform well in regions where normals change slowly, they tend to fail where normals vary rapidly. To address this issue, we propose a novel approach called MSECNet, which improves estimation in normal varying regions by treating normal variation modeling as an edge detection problem. MSECNet consists of a backbone network and a multi-scale edge conditioning (MSEC) stream. The MSEC stream achieves robust edge detection through multi-scale feature fusion and adaptive edge detection. The detected edges are then combined with the output of the backbone network using the edge conditioning module to produce edge-aware representations. Extensive experiments show that MSECNet outperforms existing methods on both synthetic (PCPNet) and real-world (SceneNN) datasets while running significantly faster. We also conduct various analyses to investigate the contribution of each component in the MSEC stream. Finally, we demonstrate the effectiveness of our approach in surface reconstruction. | Translated: 2023-08-24 17:38:18 Published: 2023-08-23 |
# AutoPoster: A Highly Automatic and Content-aware Design System for Advertising Poster Generation ( http://arxiv.org/abs/2308.01095v2 ) License: see link | Jinpeng Lin, Min Zhou, Ye Ma, Yifan Gao, Chenxi Fei, Yangjian Chen, Zhang Yu, Tiezheng Ge | (reference translation) Advertising posters, a form of information presentation, combine visual and linguistic modalities.
Qualitative and quantitative outcomes from user studies and experiments substantiate the efficacy of our system and the aesthetic superiority of the generated posters compared to other poster generation methods. Advertising posters, a form of information presentation, combine visual and linguistic modalities. Creating a poster involves multiple steps and necessitates design experience and creativity. This paper introduces AutoPoster, a highly automatic and content-aware system for generating advertising posters. With only product images and titles as inputs, AutoPoster can automatically produce posters of varying sizes through four key stages: image cleaning and retargeting, layout generation, tagline generation, and style attribute prediction. To ensure visual harmony of posters, two content-aware models are incorporated for layout and tagline generation. Moreover, we propose a novel multi-task Style Attribute Predictor (SAP) to jointly predict visual style attributes. Meanwhile, to our knowledge, we propose the first poster generation dataset that includes visual attribute annotations for over 76k posters. Qualitative and quantitative outcomes from user studies and experiments substantiate the efficacy of our system and the aesthetic superiority of the generated posters compared to other poster generation methods. | Translated: 2023-08-24 17:37:36 Published: 2023-08-23 |
# CoC-GAN: Employing Context Cluster for Unveiling a New Pathway in Image Generation ( http://arxiv.org/abs/2308.11857v1 ) License: see link | Zihao Wang, Yiming Huang, Ziyu Zhou | (reference translation) Image generation tasks are traditionally undertaken using Convolutional Neural Networks (CNN) or Transformer architectures for feature aggregating and dispatching.
We introduce this model with the novel structure as the Context Clustering Generative Adversarial Network (CoC-GAN), which offers a distinctive viewpoint in the domain of feature aggregating and dispatching.
The promising results underscore the feasibility of our method and thus warrant future investigations of applying Context Clustering to more novel and interpretable image generation. Image generation tasks are traditionally undertaken using Convolutional Neural Networks (CNN) or Transformer architectures for feature aggregating and dispatching. Despite the frequent application of convolution and attention structures, these structures are not fundamentally required to solve the problem of instability and the lack of interpretability in image generation. In this paper, we propose a unique image generation process premised on the perspective of converting images into a set of point clouds. In other words, we interpret an image as a set of points. As such, our methodology leverages simple clustering methods named Context Clustering (CoC) to generate images from unordered point sets, which defies the convention of using convolution or attention mechanisms. Hence, we exclusively depend on this clustering technique, combined with the multi-layer perceptron (MLP) in a generative model. Furthermore, we implement the integration of a module termed the 'Point Increaser' for the model. This module is just an MLP tasked with generating additional points for clustering, which are subsequently integrated within the paradigm of the Generative Adversarial Network (GAN). We introduce this model with the novel structure as the Context Clustering Generative Adversarial Network (CoC-GAN), which offers a distinctive viewpoint in the domain of feature aggregating and dispatching. Empirical evaluations affirm that our CoC-GAN, devoid of convolution and attention mechanisms, exhibits outstanding performance. Its interpretability, endowed by the CoC module, also allows for visualization in our experiments. The promising results underscore the feasibility of our method and thus warrant future investigations of applying Context Clustering to more novel and interpretable image generation. | Translated: 2023-08-24 16:19:56 Published: 2023-08-23 |
# Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 ( http://arxiv.org/abs/2308.11854v1 ) License: see link | Anmol Chaure, Ashok Kumar Behera, Sudip Bhattacharya | (reference translation) Climate projection using data-driven machine learning models acting as emulators is one of the prevailing areas of research to enable policy makers to make informed decisions.
In this direction, ClimateBench [1] is a recently curated benchmarking dataset for evaluating the performance of machine learning emulators designed for climate data.
Alternatively, Support Vector and Kernel Ridge models also deliver competitive results, but there are certain trade-offs to be addressed.
Additionally, we are actively investigating the performance of composite kernels and techniques such as variational inference to further enhance the performance of the regression models and effectively model complex non-linear patterns, including phenomena like precipitation. Climate projection using data-driven machine learning models acting as emulators is one of the prevailing areas of research to enable policy makers to make informed decisions. Use of machine learning emulators as surrogates for computationally heavy GCM simulators reduces time and carbon footprints. In this direction, ClimateBench [1] is a recently curated benchmarking dataset for evaluating the performance of machine learning emulators designed for climate data. Recent studies have reported that despite being considered fundamental, regression models offer several advantages pertaining to climate emulations. In particular, by leveraging the kernel trick, regression models can capture complex relationships and improve their predictive capabilities. This study focuses on evaluating non-linear regression models using the aforementioned dataset. Specifically, we compare the emulation capabilities of three non-linear regression models. Among them, Gaussian Process Regressor demonstrates the best-in-class performance against standard evaluation metrics used for climate field emulation studies. However, Gaussian Process Regression is computationally resource-hungry in terms of both space and time complexity. Alternatively, Support Vector and Kernel Ridge models also deliver competitive results, but there are certain trade-offs to be addressed. Additionally, we are actively investigating the performance of composite kernels and techniques such as variational inference to further enhance the performance of the regression models and effectively model complex non-linear patterns, including phenomena like precipitation. | Translated: 2023-08-24 16:19:29 Published: 2023-08-23 |
# A deep reinforcement learning approach for real-time demand-responsive railway rescheduling to mitigate station overcrowding using mobile data ( http://arxiv.org/abs/2308.11849v1 ) License: see link | Enze Liu, Zhiyuan Lin, Judith Y.T. Wang, Hong Chen | (reference translation) Real-time railway rescheduling is a timely and flexible technique to automatically alter the operation schedule in response to time-varying conditions.
This research addresses the challenges associated with this scenario, including the dynamics of arriving and leaving of passengers, station overcrowding, rolling stock shortage, open-ended disruption duration, integrated rescheduling on multiple routes, and delays due to detours.
A deep reinforcement learning (DRL) framework is proposed to determine the optimal rescheduled timetable, route stops, and rolling stock allocation, while considering real-time demand satisfaction, station overcrowding, train capacity utilization, and headway safety. Real-time railway rescheduling is a timely and flexible technique to automatically alter the operation schedule in response to time-varying conditions. Current research lacks data-driven approaches that capture real-time passenger mobility during railway disruptions, relying mostly on OD-based data and model-based methods for estimating demands of trains. Meanwhile, the schedule-updating principles for a long-term disruption overlook the uneven distribution of demand over time. To fill this gap, this paper proposes a demand-responsive approach by inferring real-world passenger mobility from mobile data (MD) to facilitate real-time rescheduling. Unlike network-level approaches, this paper focuses on a heavy-demand station upstream of the disrupted area. The objective is to reschedule all trains on multiple routes passing through this target station, which have been affected by a severe emergency event such as a natural disaster. Particular attention should be given to avoiding the accumulation of overcrowded passengers at this station, to prevent additional accidents arising from overcrowding. This research addresses the challenges associated with this scenario, including the dynamics of arriving and leaving of passengers, station overcrowding, rolling stock shortage, open-ended disruption duration, integrated rescheduling on multiple routes, and delays due to detours. A deep reinforcement learning (DRL) framework is proposed to determine the optimal rescheduled timetable, route stops, and rolling stock allocation, while considering real-time demand satisfaction, station overcrowding, train capacity utilization, and headway safety. | Translated: 2023-08-24 16:19:09 Published: 2023-08-23 |
# Parameter space geometry of the quartic oscillator and the double well potential: Classical and quantum description ( http://arxiv.org/abs/2308.11848v1 ) License: see link | Diego Gonzalez, Jorge Chávez-Carlos, Jorge G. Hirsch, J. David Vergara | (reference translation) We compute both analytically and numerically the geometry of the parameter space of the anharmonic oscillator employing the quantum metric tensor and its scalar curvature.
Although the perturbative method is unable to describe the region where the quartic potential vanishes, it is remarkable that both the perturbative and semiclassical formalisms recognize the negative oscillator parameter at which the ground state starts to be delocalized in two wells. We compute both analytically and numerically the geometry of the parameter space of the anharmonic oscillator employing the quantum metric tensor and its scalar curvature. A novel semiclassical treatment based on a Fourier decomposition allows us to construct classical analogues of the quantum metric tensor and of the expectation values of the transition matrix elements. A detailed comparison is presented between exact quantum numerical results, a perturbative quantum description and the semiclassical analysis. They are shown to coincide for both positive and negative quadratic potentials, where the potential displays a double well. Although the perturbative method is unable to describe the region where the quartic potential vanishes, it is remarkable that both the perturbative and semiclassical formalisms recognize the negative oscillator parameter at which the ground state starts to be delocalized in two wells. | Translated: 2023-08-24 16:18:38 Published: 2023-08-23 |
# SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks ( http://arxiv.org/abs/2308.11845v1 ) License: see link | Yue Gao, Ilia Shumailov, Kassem Fawaz | (reference translation) Machine Learning (ML) systems are vulnerable to adversarial examples, particularly those from query-based black-box attacks.
SEA leverages the Hidden Markov Models framework to attribute the observed query sequence to known attacks.
For example, we discover that the SignOPT and Square attacks implementation in ART v1.14 sends over 50% specific zero difference queries.
We thoroughly evaluate SEA on a variety of settings and demonstrate that it can recognize the same attack's second occurrence with 90+% Top-1 and 95+% Top-3 accuracy. Machine Learning (ML) systems are vulnerable to adversarial examples, particularly those from query-based black-box attacks. Despite various efforts to detect and prevent such attacks, there is a need for a more comprehensive approach to logging, analyzing, and sharing evidence of attacks. While classic security benefits from well-established forensics and intelligence sharing, Machine Learning is yet to find a way to profile its attackers and share information about them. In response, this paper introduces SEA, a novel ML security system to characterize black-box attacks on ML systems for forensic purposes and to facilitate human-explainable intelligence sharing. SEA leverages the Hidden Markov Models framework to attribute the observed query sequence to known attacks. It thus understands the attack's progression rather than just focusing on the final adversarial examples. Our evaluations reveal that SEA is effective at attack attribution, even on their second occurrence, and is robust to adaptive strategies designed to evade forensics analysis. Interestingly, SEA's explanations of the attack behavior allow us even to fingerprint specific minor implementation bugs in attack libraries. For example, we discover that the SignOPT and Square attacks implementation in ART v1.14 sends over 50% specific zero difference queries. We thoroughly evaluate SEA on a variety of settings and demonstrate that it can recognize the same attack's second occurrence with 90+% Top-1 and 95+% Top-3 accuracy. | Translated: 2023-08-24 16:18:24 Published: 2023-08-23 |
# A link between shape dependent lifetimes and thermal escape in quantum dots and rings ( http://arxiv.org/abs/2308.11843v1 ) License: see link | H. T. Sullivan and J. H. Cole | (reference translation) Understanding the optical emission characteristics of semiconductor nanostructures is important when determining their device applicability.
This suggests that geometry may be a significant factor in determining the dominant thermal escape process in quantum structures. Understanding the optical emission characteristics of semiconductor nanostructures is important when determining their device applicability. The emission depends on the material and its geometry, but also depends on other processes such as thermal escape from the nanostructure. Although it is widely accepted that scattering involving longitudinal optical phonons is the key process in thermal escape, it remains unclear why some quantum structures thermally emit excitons while others emit single charge carriers. To investigate this phenomenon we theoretically determine the energy levels and temperature-lifetime relationships of quantum dots and rings. We find that replicating the observed temperature dependence of the exciton lifetime requires both an eigenspectrum and a thermal escape mechanism which are geometry dependent. This suggests that geometry may be a significant factor in determining the dominant thermal escape process in quantum structures. | Translated: 2023-08-24 16:17:56 Published: 2023-08-23 |
# 協調型マルチエージェント強化学習のための${\rm E}(3)$-equivariant Actor-Critic法 ${\rm E}(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2308.11842v1 ) ライセンス: Link先を確認 | Dingyang Chen, Qi Zhang | (参考訳) 自然界における対称的パターンの同定と分析は、物理学における重力法則の定式化や化学構造の研究の進展など、様々な科学分野において重要な発見をもたらした。
コードはhttps://github.com/dchen48/e3ac。 Identification and analysis of symmetrical patterns in the natural world have led to significant discoveries across various scientific fields, such as the formulation of gravitational laws in physics and advancements in the study of chemical structures. In this paper, we focus on exploiting Euclidean symmetries inherent in certain cooperative multi-agent reinforcement learning (MARL) problems and prevalent in many applications. We begin by formally characterizing a subclass of Markov games with a general notion of symmetries that admits the existence of symmetric optimal values and policies. Motivated by these properties, we design neural network architectures with symmetric constraints embedded as an inductive bias for multi-agent actor-critic methods. This inductive bias results in superior performance in various cooperative MARL benchmarks and impressive generalization capabilities such as zero-shot learning and transfer learning in unseen scenarios with repeated symmetric patterns. The code is available at: https://github.com/dchen48/E3AC. | 翻訳日:2023-08-24 16:17:42 公開日:2023-08-23 |
# A Survey for Federated Learning Evaluations: Goals and Measures ( http://arxiv.org/abs/2308.11841v1 ) License: see link | Di Chai, Leye Wang, Liu Yang, Junxue Zhang, Kai Chen, and Qiang Yang | (reference translation) Evaluation is a systematic approach to assessing how well a system achieves its intended purpose.
Federated learning (FL) is a novel paradigm for privacy-preserving machine learning that allows multiple parties to collaboratively train models without sharing sensitive data.
Finally, we discuss several challenges and future research directions for FL evaluation. Evaluation is a systematic approach to assessing how well a system achieves its intended purpose. Federated learning (FL) is a novel paradigm for privacy-preserving machine learning that allows multiple parties to collaboratively train models without sharing sensitive data. However, evaluating FL is challenging due to its interdisciplinary nature and diverse goals, such as utility, efficiency, and security. In this survey, we first review the major evaluation goals adopted in the existing studies and then explore the evaluation metrics used for each goal. We also introduce FedEval, an open-source platform that provides a standardized and comprehensive evaluation framework for FL algorithms in terms of their utility, efficiency, and security. Finally, we discuss several challenges and future research directions for FL evaluation. | Translated: 2023-08-24 16:17:24 Published: 2023-08-23 |
# Compressed Models Decompress Race Biases: What Quantized Models Forget for Fair Face Recognition ( http://arxiv.org/abs/2308.11840v1 ) License: see link | Pedro C. Neto, Eduarda Caldeira, Jaime S. Cardoso, Ana F. Sequeira | (reference translation) With the ever-growing complexity of deep learning models for face recognition, it becomes hard to deploy these systems in real life.
1) use smaller models;
2) compress their current models.
We investigate the overall performance, the performance on each ethnicity subgroup and the racial bias of a State-of-the-Art quantization approach when used with synthetic and real data.
The models were evaluated on a fourth dataset which was collected to infer and compare the performance of face recognition models across different ethnicities. With the ever-growing complexity of deep learning models for face recognition, it becomes hard to deploy these systems in real life. Researchers have two options: 1) use smaller models; 2) compress their current models. Since the usage of smaller models might lead to concerning biases, compression gains relevance. However, compressing might be also responsible for an increase in the bias of the final model. We investigate the overall performance, the performance on each ethnicity subgroup and the racial bias of a State-of-the-Art quantization approach when used with synthetic and real data. This analysis provides a few more details on potential benefits of performing quantization with synthetic data, for instance, the reduction of biases on the majority of test scenarios. We tested five distinct architectures and three different training datasets. The models were evaluated on a fourth dataset which was collected to infer and compare the performance of face recognition models across different ethnicities. | Translated: 2023-08-24 16:17:11 Published: 2023-08-23 |
# A Benchmark Study on Calibration ( http://arxiv.org/abs/2308.11838v1 ) License: see link | Linwei Tao, Younan Zhu, Haolan Guo, Minjing Dong, Chang Xu | (reference translation) Deep neural networks are increasingly utilized in various machine learning tasks.
As far as we are aware, our research represents the first large-scale investigation into calibration properties and the premier study of calibration issues within NAS. Deep neural networks are increasingly utilized in various machine learning tasks. However, as these models grow in complexity, they often face calibration issues, despite enhanced prediction accuracy. Many studies have endeavored to improve calibration performance through data preprocessing, the use of specific loss functions, and training frameworks. Yet, investigations into calibration properties have been somewhat overlooked. Our study leverages the Neural Architecture Search (NAS) search space, offering an exhaustive model architecture space for thorough calibration properties exploration. We specifically create a model calibration dataset. This dataset evaluates 90 bin-based and 12 additional calibration measurements across 117,702 unique neural networks within the widely employed NATS-Bench search space. Our analysis aims to answer several longstanding questions in the field, using our proposed dataset: (i) Can model calibration be generalized across different tasks? (ii) Can robustness be used as a calibration measurement? (iii) How reliable are calibration metrics? (iv) Does a post-hoc calibration method affect all models uniformly? (v) How does calibration interact with accuracy? (vi) What is the impact of bin size on calibration measurement? (vii) Which architectural designs are beneficial for calibration? Additionally, our study bridges an existing gap by exploring calibration within NAS. By providing this dataset, we enable further research into NAS calibration. As far as we are aware, our research represents the first large-scale investigation into calibration properties and the premier study of calibration issues within NAS. | Translated: 2023-08-24 16:16:55 Published: 2023-08-23 |
# Adversarial Training Using Feedback Loops ( http://arxiv.org/abs/2308.11881v1 ) License: see link | Ali Haisam Muhammad Rafid, Adrian Sandu | (reference translation) Deep neural networks (DNN) have found wide applicability in numerous fields due to their ability to accurately learn very complex input-output relations.
A neural network architecture that incorporates feedback control, named Feedback Neural Networks, is proposed.
Numerical results on standard test problems empirically show that our FLAT method is more effective than the state-of-the-art to guard against adversarial attacks. Deep neural networks (DNN) have found wide applicability in numerous fields due to their ability to accurately learn very complex input-output relations. Despite their accuracy and extensive use, DNNs are highly susceptible to adversarial attacks due to limited generalizability. For future progress in the field, it is essential to build DNNs that are robust to any kind of perturbations to the data points. In the past, many techniques have been proposed to robustify DNNs using first-order derivative information of the network. This paper proposes a new robustification approach based on control theory. A neural network architecture that incorporates feedback control, named Feedback Neural Networks, is proposed. The controller is itself a neural network, which is trained using regular and adversarial data such as to stabilize the system outputs. The novel adversarial training approach based on the feedback control architecture is called Feedback Looped Adversarial Training (FLAT). Numerical results on standard test problems empirically show that our FLAT method is more effective than the state-of-the-art to guard against adversarial attacks. | Translated: 2023-08-24 16:10:14 Published: 2023-08-23 |
# SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets ( http://arxiv.org/abs/2308.11880v1 ) License: see link | Cody Simons, Dripta S. Raychaudhuri, Sk Miraj Ahmed, Suya You, Konstantinos Karydis, Amit K. Roy-Chowdhury | (reference translation) Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation.
Our proposed approach solves this problem through a switching framework which automatically chooses between two complementary methods of cross-modal pseudo-label fusion -- agreement filtering and entropy weighting -- based on the estimated domain gap.
Our code is publicly available at https://github.com/csimo005/SUMMIT. Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both these assumptions may be problematic for many applications. Source data may not be available due to privacy, security, or economic concerns. Assuming the existence of paired multi-modal data for training also entails significant data collection costs and fails to take advantage of widely available freely distributed pre-trained uni-modal models. In this work, we relax both of these assumptions by addressing the problem of adapting a set of models trained independently on uni-modal data to a target domain consisting of unlabeled multi-modal data, without having access to the original source dataset. Our proposed approach solves this problem through a switching framework which automatically chooses between two complementary methods of cross-modal pseudo-label fusion -- agreement filtering and entropy weighting -- based on the estimated domain gap. We demonstrate our work on the semantic segmentation problem. Experiments across seven challenging adaptation scenarios verify the efficacy of our approach, achieving results comparable to, and in some cases outperforming, methods which assume access to source data. Our method achieves an improvement in mIoU of up to 12% over competing baselines. Our code is publicly available at https://github.com/csimo005/SUMMIT. | Translated: 2023-08-24 16:09:58 Published: 2023-08-23 |
# Cabrita: closing the gap for foreign languages ( http://arxiv.org/abs/2308.11878v1 ) License: see link | Celio Larcher, Marcos Piau, Paulo Finardi, Pedro Gengo, Piero Esposito, Vinicius Caridá | (reference translation) The strategy of training the model from scratch in a specific language or domain serves two essential purposes:
i) enhancing performance in the particular linguistic or domain context, and
ii) ensuring effective tokenization.
To validate the study, we conducted continuous pre-training exclusively using Portuguese text on a 3-billion-parameter model known as OpenLLaMA, resulting in a model named openCabrita 3B.
The openCabrita 3B also features a new tokenizer that results in a significant reduction in the number of tokens required to represent the text.
In our assessment, for few-shot learning tasks, we achieved similar results with this 3B model compared to a traditional continuous pre-training approach as well as to 7B English pre-trained models. The strategy of training the model from scratch in a specific language or domain serves two essential purposes: i) enhancing performance in the particular linguistic or domain context, and ii) ensuring effective tokenization. The main limitation inherent to this approach lies in the associated cost, which can reach six to seven-digit dollar values, depending on the model size and the number of parameters involved. The main solution to overcome the cost challenge is to rely on available pre-trained models, which, despite recent advancements such as the LLaMA and LLaMA-2 models, still demonstrate inefficiency for certain specific domain problems or prove ineffective in scenarios involving conversational memory resources, given the large number of tokens required to represent text. To overcome this issue, we present a methodology named Cabrita, which, as our research demonstrates, successfully addresses the performance and efficient tokenization problem, all at an affordable cost. We believe that this methodology can be applied to any transformer-like architecture model. To validate the study, we conducted continuous pre-training exclusively using Portuguese text on a 3-billion-parameter model known as OpenLLaMA, resulting in a model named openCabrita 3B. The openCabrita 3B also features a new tokenizer that results in a significant reduction in the number of tokens required to represent the text. In our assessment, for few-shot learning tasks, we achieved similar results with this 3B model compared to a traditional continuous pre-training approach as well as to 7B English pre-trained models. | Translated: 2023-08-24 16:09:30 Published: 2023-08-23 |
# Integrated Image and Location Analysis for Wound Classification: A Deep Learning Approach ( http://arxiv.org/abs/2308.11877v1 ) License: see link | Yash Patel, Tirth Shah, Mrinal Kanti Dhar, Taiyu Zhang, Jeffrey Niezgoda, Sandeep Gopalkrishnan, Zeyun Yu | (reference translation) The global burden of acute and chronic wounds presents a compelling case for enhancing wound classification methods, a vital step in diagnosing and determining optimal treatments.
A unique aspect of our methodology is incorporating a body map system that facilitates accurate wound location tagging, improving upon traditional wound image classification techniques.
Notably, our proposed network outperformed traditional methods, reaching an accuracy range of 74.79% to 100% for Region of Interest (ROI) without location classifications, 73.98% to 100% for ROI with location classifications, and 78.10% to 100% for whole image classifications.
Our results indicate the potential of our multi-modal network as an effective decision-support tool for wound image classification, paving the way for its application in various clinical contexts. The global burden of acute and chronic wounds presents a compelling case for enhancing wound classification methods, a vital step in diagnosing and determining optimal treatments. Recognizing this need, we introduce an innovative multi-modal network based on a deep convolutional neural network for categorizing wounds into four categories: diabetic, pressure, surgical, and venous ulcers. Our multi-modal network uses wound images and their corresponding body locations for more precise classification. A unique aspect of our methodology is incorporating a body map system that facilitates accurate wound location tagging, improving upon traditional wound image classification techniques. A distinctive feature of our approach is the integration of models such as VGG16, ResNet152, and EfficientNet within a novel architecture. This architecture includes elements like spatial and channel-wise Squeeze-and-Excitation modules, Axial Attention, and an Adaptive Gated Multi-Layer Perceptron, providing a robust foundation for classification. Our multi-modal network was trained and evaluated on two distinct datasets comprising relevant images and corresponding location information. Notably, our proposed network outperformed traditional methods, reaching an accuracy range of 74.79% to 100% for Region of Interest (ROI) without location classifications, 73.98% to 100% for ROI with location classifications, and 78.10% to 100% for whole image classifications. This marks a significant enhancement over previously reported performance metrics in the literature. Our results indicate the potential of our multi-modal network as an effective decision-support tool for wound image classification, paving the way for its application in various clinical contexts. | Translated: 2023-08-24 16:09:07 Published: 2023-08-23 |
# Motion-to-Matching: A Mixed Paradigm for 3D Single Object Tracking ( http://arxiv.org/abs/2308.11875v1 ) License: see link | Zhiheng Li, Yu Lin, Yubo Cui, Shuo Li, Zheng Fang | (reference translation) 3D single object tracking with LiDAR points is an important task in the computer vision field.
The code will be open soon at https://github.com/LeoZhiheng/MTM-Tracker.git. 3D single object tracking with LiDAR points is an important task in the computer vision field. Previous methods usually adopt the matching-based or motion-centric paradigms to estimate the current target status. However, the former is sensitive to the similar distractors and the sparseness of point cloud due to relying on appearance matching, while the latter usually focuses on short-term motion clues (e.g., two frames) and ignores the long-term motion pattern of target. To address these issues, we propose a mixed paradigm with two stages, named MTM-Tracker, which combines motion modeling with feature matching into a single network. Specifically, in the first stage, we exploit the continuous historical boxes as motion prior and propose an encoder-decoder structure to locate target coarsely. Then, in the second stage, we introduce a feature interaction module to extract motion-aware features from consecutive point clouds and match them to refine target movement as well as regress other target states. Extensive experiments validate that our paradigm achieves competitive performance on large-scale datasets (70.9% in KITTI and 51.70% in NuScenes). The code will be open soon at https://github.com/LeoZhiheng/MTM-Tracker.git. | Translated: 2023-08-24 16:08:19 Published: 2023-08-23 |
# Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch ( http://arxiv.org/abs/2308.11874v1 ) License: see link | Pan Du, Suyun Zhao, Zisen Sheng, Cuiping Li, Hong Chen | (reference translation) Semi-Supervised Learning (SSL) under class distribution mismatch aims to tackle a challenging problem wherein unlabeled data contain lots of unknown categories unseen in the labeled ones.
To alleviate the SSL error, we propose a robust SSL framework called Weight-Aware Distillation (WAD) that, by weights, selectively transfers knowledge beneficial to the target task from unsupervised contrastive representation to the target classifier.
The code is available at https://github.com/RUC-DWBI-ML/research/tree/main/WAD-master. Semi-Supervised Learning (SSL) under class distribution mismatch aims to tackle a challenging problem wherein unlabeled data contain lots of unknown categories unseen in the labeled ones. In such mismatch scenarios, traditional SSL suffers severe performance damage due to the harmful invasion of the instances with unknown categories into the target classifier. In this study, by strict mathematical reasoning, we reveal that the SSL error under class distribution mismatch is composed of pseudo-labeling error and invasion error, both of which jointly bound the SSL population risk. To alleviate the SSL error, we propose a robust SSL framework called Weight-Aware Distillation (WAD) that, by weights, selectively transfers knowledge beneficial to the target task from unsupervised contrastive representation to the target classifier. Specifically, WAD captures adaptive weights and high-quality pseudo labels to target instances by exploring pointwise mutual information (PMI) in representation space to maximize the role of unlabeled data and filter unknown categories. Theoretically, we prove that WAD has a tight upper bound of population risk under class distribution mismatch. Experimentally, extensive results demonstrate that WAD outperforms five state-of-the-art SSL approaches and one standard baseline on two benchmark datasets, CIFAR10 and CIFAR100, and an artificial cross-dataset. The code is available at https://github.com/RUC-DWBI-ML/research/tree/main/WAD-master. | Translated: 2023-08-24 16:07:43 Published: 2023-08-23 |
# 大規模言語モデルをDebugging C Compilerに統合した文脈に応じたエラー説明の生成 Integrating Large Language Models into the Debugging C Compiler for generating contextual error explanations ( http://arxiv.org/abs/2308.11873v1 ) ライセンス: Link先を確認 | Andrew Taylor and Alexandra Vassar and Jake Renzella and Hammond Pearce | (参考訳) 本稿では、我々のDebugging C Compiler (DCC) 内で、大規模言語モデル(LLM)を用いて平易な言葉による拡張されたコンパイラエラー説明を生成する手法を提案する。
我々のツールをオープンソースとしてコミュニティに公開する。 This paper introduces a method for Large Language Models (LLM) to produce enhanced compiler error explanations, in simple language, within our Debugging C Compiler (DCC). It is well documented that compiler error messages have been known to present a barrier for novices learning how to program. Although our initial use of DCC in introductory programming (CS1) has been instrumental in teaching C to novice programmers by providing safeguards to commonly occurring errors and translating the usually cryptic compiler error messages at both compile- and run-time, we proposed that incorporating LLM-generated explanations would further enhance the learning experience for novice programmers. Through an expert evaluation, we observed that LLM-generated explanations for compiler errors were conceptually accurate in 90% of compile-time errors, and 75% of run-time errors. Additionally, the new DCC-help tool has been increasingly adopted by students, with an average of 1047 unique runs per week, demonstrating a promising initial assessment of using LLMs to complement compiler output to enhance programming education for beginners. We release our tool as open-source to the community. | 翻訳日:2023-08-24 16:07:04 公開日:2023-08-23 |
# 導体円錐に対する電磁カシミール-ポルダー相互作用 Electromagnetic Casimir-Polder Interaction for a Conducting Cone ( http://arxiv.org/abs/2308.11869v1 ) ライセンス: Link先を確認 | Noah Graham | (参考訳) 解析接続された角運動量による完全導体円錐の電磁グリーン関数の定式化を用いて、円錐と分極可能な粒子との間のカシミール-ポルダー相互作用エネルギーを計算する。
まず完全導体ウェッジに対する類似のアプローチを概観することでこの形式を導入し、次に得られた積分を数値的に評価して計算を実演する。 Using the formulation of the electromagnetic Green's function of a perfectly conducting cone in terms of analytically continued angular momentum, we compute the Casimir-Polder interaction energy of the cone with a polarizable particle. We introduce this formalism by first reviewing the analogous approach for a perfectly conducting wedge, and then demonstrate the calculation through numerical evaluation of the resulting integrals. | 翻訳日:2023-08-24 16:06:39 公開日:2023-08-23 |
# KinSPEAK: 半教師付き学習手法によるキニヤルワンダ語音声認識の改善 KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods ( http://arxiv.org/abs/2308.11863v1 ) ライセンス: Link先を確認 | Antoine Nzeyimana | (参考訳) 近年、書き起こし付きの大規模なキニヤルワンダ語音声データが利用可能になったにもかかわらず、キニヤルワンダ語の頑健な音声認識はいまだに困難である。
最終モデルは、新しいデータセットで単語誤り率(WER)3.2%、Mozilla Common VoiceベンチマークでWER 15.9%を達成した。
また、文字ベースのトークン化ではなく音節単位のトークン化を用いることで、キニヤルワンダ語の音声認識性能が向上することを示す。 Despite recent availability of large transcribed Kinyarwanda speech data, achieving robust speech recognition for Kinyarwanda is still challenging. In this work, we show that using self-supervised pre-training, following a simple curriculum schedule during fine-tuning and using semi-supervised learning to leverage large unlabelled speech data significantly improve speech recognition performance for Kinyarwanda. Our approach focuses on using public domain data only. A new studio-quality speech dataset is collected from a public website, then used to train a clean baseline model. The clean baseline model is then used to rank examples from a more diverse and noisy public dataset, defining a simple curriculum training schedule. Finally, we apply semi-supervised learning to label and learn from large unlabelled data in four successive generations. Our final model achieves 3.2% word error rate (WER) on the new dataset and 15.9% WER on Mozilla Common Voice benchmark, which is state-of-the-art to the best of our knowledge. Our experiments also indicate that using syllabic rather than character-based tokenization results in better speech recognition performance for Kinyarwanda. | 翻訳日:2023-08-24 16:06:30 公開日:2023-08-23 |
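The results are reported as word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Below is a minimal sketch of that standard computation. It assumes plain whitespace tokenization and is not the authors' evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j]: edit distance between the first i reference words and first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

example = word_error_rate("the cat sat on the mat", "the cat sat on mat")  # one deletion
```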
# 複合パルスシステムにおけるロバスト量子制御のための教師付き学習 Supervised Learning for Robust Quantum Control in Composite-Pulse Systems ( http://arxiv.org/abs/2308.11861v1 ) ライセンス: Link先を確認 | Zhi-Cheng Shi, Jun-Tong Ding, Ye-Hong Chen, Jie Song, Yan Xia, X. X. Yi, and Franco Nori | (参考訳) 本研究では,複合パルスシステムにおいてロバストな量子制御を実現するための教師あり学習モデルを開発した。
この研究は、様々な物理パラメータをトレーニングすることで、フォールトトレラント量子計算のための高効率な学習モデルを提供する。 In this work, we develop a supervised learning model for implementing robust quantum control in composite-pulse systems, where the training parameters can be either phases, detunings, or Rabi frequencies. This model exhibits great resistance to all kinds of systematic errors, including single, multiple, and time-varying errors. We propose a modified gradient descent algorithm for adapting the training of phase parameters, and show that different sampling methods result in different robust performances. In particular, there is a tradeoff between high-fidelity and robustness for a given number of training parameters, and both of them can be simultaneously enhanced by increasing the number of training parameters (pulses). For its applications, we demonstrate that the current model can be used for achieving high-fidelity arbitrary superposition states and universal quantum gates in a robust manner. This work provides a highly efficient learning model for fault-tolerant quantum computation by training various physical parameters. | 翻訳日:2023-08-24 16:06:08 公開日:2023-08-23 |
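The abstract concerns the robustness of composite pulses to systematic errors. The sketch below is a hedged illustration only and does not use the paper's learned sequences. It compares a single 180_x pulse with the textbook 90_x-180_y-90_x composite inversion pulse. The error model is an assumption: a systematic amplitude error that scales every rotation angle by (1 + eps). The composite sequence keeps the |0> to |1> transfer close to 1 under this error.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)     # Pauli X
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)  # Pauli Y
I2 = np.eye(2, dtype=complex)

def rotation(axis: np.ndarray, theta: float) -> np.ndarray:
    """U = exp(-i theta (n . sigma) / 2) = cos(theta/2) I - i sin(theta/2) (n . sigma)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * axis

def inversion_probability(pulses, eps: float) -> float:
    """Population moved from |0> to |1> when every rotation angle is scaled by (1 + eps)."""
    psi = np.array([1, 0], dtype=complex)
    for axis, theta in pulses:
        psi = rotation(axis, theta * (1 + eps)) @ psi
    return float(abs(psi[1]) ** 2)

single = [(SX, np.pi)]                                        # one 180_x pulse
composite = [(SX, np.pi / 2), (SY, np.pi), (SX, np.pi / 2)]   # 90_x 180_y 90_x

for eps in (0.0, 0.1):
    print(eps, inversion_probability(single, eps), inversion_probability(composite, eps))
```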
# 教師なしドメイン適応人物再同定のためのカメラ駆動表現学習 Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification ( http://arxiv.org/abs/2308.11901v1 ) ライセンス: Link先を確認 | Geon Lee, Sanghoon Lee, Dohyung Kim, Younghoon Shin, Yongsang Yoon, Bumsub Ham | (参考訳) 本稿では、ラベル付きソースドメインで学習したモデルをラベルなしターゲットドメインへ汎化する、人物再同定(reID)のための新しい教師なしドメイン適応手法を提案する。
実データ間(real-to-real)および合成データから実データへ(synthetic-to-real)のシナリオを含む標準ベンチマークでの実験結果により、本フレームワークの有効性を実証する。 We present a novel unsupervised domain adaption method for person re-identification (reID) that generalizes a model trained on a labeled source domain to an unlabeled target domain. We introduce a camera-driven curriculum learning (CaCL) framework that leverages camera labels of person images to transfer knowledge from source to target domains progressively. To this end, we divide target domain dataset into multiple subsets based on the camera labels, and initially train our model with a single subset (i.e., images captured by a single camera). We then gradually exploit more subsets for training, according to a curriculum sequence obtained with a camera-driven scheduling rule. The scheduler considers maximum mean discrepancies (MMD) between each subset and the source domain dataset, such that the subset closer to the source domain is exploited earlier within the curriculum. For each curriculum sequence, we generate pseudo labels of person images in a target domain to train a reID model in a supervised way. We have observed that the pseudo labels are highly biased toward cameras, suggesting that person images obtained from the same camera are likely to have the same pseudo labels, even for different IDs. To address the camera bias problem, we also introduce a camera-diversity (CD) loss encouraging person images of the same pseudo label, but captured across various cameras, to involve more for discriminative feature learning, providing person representations robust to inter-camera variations. Experimental results on standard benchmarks, including real-to-real and synthetic-to-real scenarios, demonstrate the effectiveness of our framework. | 翻訳日:2023-08-24 16:00:31 公開日:2023-08-23 |
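The camera-driven scheduler orders camera subsets by their maximum mean discrepancy (MMD) to the source domain, so that subsets closer to the source are used earlier. Below is a minimal sketch under several assumptions: pre-extracted feature vectors, a Gaussian kernel with a fixed bandwidth, and the simple biased MMD estimator. The camera names and synthetic features are hypothetical.

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between samples x and y."""
    return float(rbf_kernel(x, x, sigma).mean()
                 + rbf_kernel(y, y, sigma).mean()
                 - 2 * rbf_kernel(x, y, sigma).mean())

def camera_curriculum(source: np.ndarray, subsets: dict) -> list:
    """Order camera subsets from closest to farthest from the source domain."""
    return sorted(subsets, key=lambda cam: mmd2(source, subsets[cam]))

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(300, 2))
subsets = {
    "cam_a": rng.normal(6.0, 1.0, size=(200, 2)),   # far from the source features
    "cam_b": rng.normal(0.0, 1.0, size=(200, 2)),   # same distribution as the source
    "cam_c": rng.normal(1.5, 1.0, size=(200, 2)),   # moderately shifted
}
order = camera_curriculum(source, subsets)
```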
# HashReID: 効率的な人物再同定のためのバイナリコード付き動的ネットワーク HashReID: Dynamic Network with Binary Codes for Efficient Person Re-identification ( http://arxiv.org/abs/2308.11900v1 ) ライセンス: Link先を確認 | Kshitij Nikhal, Yujunrong Ma, Shuvra S. Bhattacharyya, Benjamin S. Riggan | (参考訳) 人物再同定(ReID)のような生体認証アプリケーションは、エネルギー制約のあるデバイスに展開されることが多い。
提案手法の広範な分析を、Market1501、MSMT17 (Multi-Scene Multi-Time)、BGC1 (BRIAR Government Collection) の3つのデータセットで行った。
コードは利用可能になる。 Biometric applications, such as person re-identification (ReID), are often deployed on energy constrained devices. While recent ReID methods prioritize high retrieval performance, they often come with large computational costs and high search time, rendering them less practical in real-world settings. In this work, we propose an input-adaptive network with multiple exit blocks, that can terminate computation early if the retrieval is straightforward or noisy, saving a lot of computation. To assess the complexity of the input, we introduce a temporal-based classifier driven by a new training strategy. Furthermore, we adopt a binary hash code generation approach instead of relying on continuous-valued features, which significantly improves the search process by a factor of 20. To ensure similarity preservation, we utilize a new ranking regularizer that bridges the gap between continuous and binary features. Extensive analysis of our proposed method is conducted on three datasets: Market1501, MSMT17 (Multi-Scene Multi-Time), and the BGC1 (BRIAR Government Collection). Using our approach, more than 70% of the samples with compact hash codes exit early on the Market1501 dataset, saving 80% of the networks computational cost and improving over other hash-based methods by 60%. These results demonstrate a significant improvement over dynamic networks and showcase comparable accuracy performance to conventional ReID methods. Code will be made available. | 翻訳日:2023-08-24 15:59:43 公開日:2023-08-23 |
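HashReID replaces continuous features with binary hash codes to speed up search. This sketch does not reproduce the paper's learned hashing or its ranking regularizer. It only shows why binary codes are cheap to search. Codes are obtained here by sign thresholding (an assumption), packed eight bits per byte, and compared by Hamming distance using XOR and bit counting.

```python
import numpy as np

def to_hash(features: np.ndarray) -> np.ndarray:
    """Binarise continuous embeddings by sign and pack 8 bits per byte."""
    bits = (features > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming_distances(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Number of differing bits between one packed query code and each gallery code."""
    xor = np.bitwise_xor(gallery, query[None, :])
    return np.unpackbits(xor, axis=-1).sum(axis=-1)

rng = np.random.default_rng(0)
gallery_feats = rng.normal(size=(1000, 256))
query_feat = gallery_feats[42] + 0.1 * rng.normal(size=256)   # noisy view of item 42

gallery_codes = to_hash(gallery_feats)     # shape (1000, 32): 256 bits in 32 bytes
query_code = to_hash(query_feat)
ranking = np.argsort(hamming_distances(query_code, gallery_codes), kind="stable")
```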
# 4波混合過程による表面プラズモンポラリトンの増幅と励起 Amplification and Excitation of Surface Plasmon Polaritons via Four-Wave Mixing Process ( http://arxiv.org/abs/2308.11899v1 ) ライセンス: Link先を確認 | Andleeb Zahra, Muqaddar Abbas, Rahmat Ullah | (参考訳) 4波混合(FWM)過程を用いて、金属と半導体量子井戸(SQW)の界面に沿った表面プラズモンポラリトン(SPP)の励起と増幅を行う方式を提案する。
さらに,長距離および短距離SPPにおける利得の影響を解析し,両タイプのSPPの伝播距離と寿命が向上することが確認された。 We suggest a scheme for the excitation and amplification of surface plasmon polaritons (SPPs) along the interface between metal and semiconductor quantum well (SQW), employing a four-wave mixing (FWM) process. The SQW consists of four-level asymmetric double quantum wells that exhibit quantum interference effects, which leads to the coupler-free excitation of SPPs. In our proposed system, the inherent losses of SPPs are compensated by introducing gain through the FWM process. This results in a significant enhancement in the propagation length and large penetration depth of SPPs. We further analyze the effect of gain on the long-range and short-range SPPs and observe that the propagation distance and lifetime of both types of SPPs are enhanced. | 翻訳日:2023-08-24 15:59:19 公開日:2023-08-23 |
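The gain-compensation argument can be illustrated with the textbook dispersion relation for an SPP at a single flat interface, k_spp = k0 sqrt(εm εd / (εm + εd)). Its imaginary part sets the propagation length L = 1 / (2 Im k_spp). This hedged sketch is not the paper's four-level quantum-well FWM model. It assumes the e^{-iωt} convention and represents gain as a small negative imaginary part of the dielectric permittivity. All permittivity values are illustrative.

```python
import cmath
import math

def spp_wavevector(eps_metal: complex, eps_diel: complex, wavelength: float) -> complex:
    """SPP wavevector at a flat metal/dielectric interface: k0 * sqrt(em * ed / (em + ed))."""
    k0 = 2 * math.pi / wavelength
    return k0 * cmath.sqrt(eps_metal * eps_diel / (eps_metal + eps_diel))

def propagation_length(k_spp: complex) -> float:
    """1/e decay length of the SPP intensity, L = 1 / (2 Im k)."""
    return 1.0 / (2 * k_spp.imag)

wavelength = 1.0e-6                      # 1 um, illustrative
eps_metal = -20 + 1.5j                   # illustrative lossy metal (Im > 0 means loss)
passive = spp_wavevector(eps_metal, 2.25 + 0.0j, wavelength)
with_gain = spp_wavevector(eps_metal, 2.25 - 0.01j, wavelength)  # Im < 0 models gain

print(propagation_length(passive), propagation_length(with_gain))
```

The real part of k_spp exceeds k0 sqrt(εd), so the mode is bound to the interface. The gain term lowers Im k_spp and therefore lengthens L.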
# 異常検出のための一クラス分類の最適化目的の探究 Exploring the Optimization Objective of One-Class Classification for Anomaly Detection ( http://arxiv.org/abs/2308.11898v1 ) ライセンス: Link先を確認 | Han Gao, Huiyuan Luo, Fei Shen, Zhengtao Zhang | (参考訳) 一クラス分類(one-class classification, OCC)は、長年用いられてきた異常検出手法である。
広範な実験により,本研究の信頼性と有効性が検証され,一級分類と産業用視覚異常検出とセグメンテーション課題の両方において最先端のパフォーマンスが得られた。 One-class classification (OCC) is a longstanding method for anomaly detection. With the powerful representation capability of the pre-trained backbone, OCC methods have witnessed significant performance improvements. Typically, most of these OCC methods employ transfer learning to enhance the discriminative nature of the pre-trained backbone's features, thus achieving remarkable efficacy. While most current approaches emphasize feature transfer strategies, we argue that the optimization objective space within OCC methods could also be an underlying critical factor influencing performance. In this work, we conducted a thorough investigation into the optimization objective of OCC. Through rigorous theoretical analysis and derivation, we unveil a key insights: any space with the suitable norm can serve as an equivalent substitute for the hypersphere center, without relying on the distribution assumption of training samples. Further, we provide guidelines for determining the feasible domain of norms for the OCC optimization objective. This novel insight sparks a simple and data-agnostic deep one-class classification method. Our method is straightforward, with a single 1x1 convolutional layer as a trainable projector and any space with suitable norm as the optimization objective. Extensive experiments validate the reliability and efficacy of our findings and the corresponding methodology, resulting in state-of-the-art performance in both one-class classification and industrial vision anomaly detection and segmentation tasks. | 翻訳日:2023-08-24 15:59:07 公開日:2023-08-23 |
# コントラスト学習による顔画像からの年齢予測 Age Prediction From Face Images Via Contrastive Learning ( http://arxiv.org/abs/2308.11896v1 ) ライセンス: Link先を確認 | Yeongnam Chae, Poulami Raha, Mijung Kim, Bjorn Stenger | (参考訳) 本稿では,顔画像から年齢を正確に推定するための新しいアプローチを提案する。
本稿では,FG-NET と MORPH-II の2つの公開データセット上での最先端性能の実現による提案手法の有効性を示す。 This paper presents a novel approach for accurately estimating age from face images, which overcomes the challenge of collecting a large dataset of individuals with the same identity at different ages. Instead, we leverage readily available face datasets of different people at different ages and aim to extract age-related features using contrastive learning. Our method emphasizes these relevant features while suppressing identity-related features using a combination of cosine similarity and triplet margin losses. We demonstrate the effectiveness of our proposed approach by achieving state-of-the-art performance on two public datasets, FG-NET and MORPH-II. | 翻訳日:2023-08-24 15:58:41 公開日:2023-08-23 |
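The method combines cosine similarity with triplet margin losses. The abstract does not specify how the paper forms triplets or weights the terms. The sketch below shows only a triplet margin loss that uses cosine distance. The margin value and toy vectors are assumptions.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """1 - cosine similarity, computed row-wise."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - (a * b).sum(axis=-1)

def triplet_margin_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """mean(max(0, d(a, p) - d(a, n) + margin)) with cosine distance d."""
    d_ap = cosine_distance(anchor, positive)
    d_an = cosine_distance(anchor, negative)
    return float(np.maximum(0.0, d_ap - d_an + margin).mean())

anchor = np.array([[1.0, 0.0]])
positive = np.array([[1.0, 0.1]])   # hypothetical embedding of a similar-age face
negative = np.array([[0.0, 1.0]])   # hypothetical embedding of a different-age face
good = triplet_margin_loss(anchor, positive, negative)   # already separated: zero loss
bad = triplet_margin_loss(anchor, negative, positive)    # roles swapped: large loss
```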
# 物理的敵対的サンプルは自動運転にとって本当に重要か? 敵対的物体回避攻撃のシステムレベルの効果に向けて Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack ( http://arxiv.org/abs/2308.11894v1 ) ライセンス: Link先を確認 | Ningfei Wang, Yunpeng Luo, Takami Sato, Kaidi Xu, Qi Alfred Chen | (参考訳) 自動運転(AD)では、安全かつセキュアな運転を実現するために正確な知覚が不可欠である。
本研究では、既存の設計がシステムレベルの効果をもたらし得るか、またどの程度効果的かについて、初めての測定研究を行う。特に、その普及度と深刻さからSTOP標識回避攻撃に着目する。
そこで我々は,AD文脈における新たなシステム駆動攻撃設計であるSysAdvを提案し,その評価結果から,システムレベルの効果が大幅に向上し,違反率が約70%向上することを示す。 In autonomous driving (AD), accurate perception is indispensable to achieving safe and secure driving. Due to its safety-criticality, the security of AD perception has been widely studied. Among different attacks on AD perception, the physical adversarial object evasion attacks are especially severe. However, we find that all existing literature only evaluates their attack effect at the targeted AI component level but not at the system level, i.e., with the entire system semantics and context such as the full AD pipeline. Thereby, this raises a critical research question: can these existing researches effectively achieve system-level attack effects (e.g., traffic rule violations) in the real-world AD context? In this work, we conduct the first measurement study on whether and how effectively the existing designs can lead to system-level effects, especially for the STOP sign-evasion attacks due to their popularity and severity. Our evaluation results show that all the representative prior works cannot achieve any system-level effects. We observe two design limitations in the prior works: 1) physical model-inconsistent object size distribution in pixel sampling and 2) lack of vehicle plant model and AD system model consideration. Then, we propose SysAdv, a novel system-driven attack design in the AD context and our evaluation results show that the system-level effects can be significantly improved, i.e., the violation rate increases by around 70%. | 翻訳日:2023-08-24 15:58:30 公開日:2023-08-23 |
# ギャップを埋める: 大規模言語モデルを用いた表形式データの解読 Bridging the Gap: Deciphering Tabular Data Using Large Language Model ( http://arxiv.org/abs/2308.11891v1 ) ライセンス: Link先を確認 | Hengyuan Zhang, Peng Chang, Zongcheng Ji | (参考訳) 自然言語処理の分野において、表形式データの理解は長らく学術的研究の焦点であり続けてきた。
本研究は,大規模言語モデルを表型質問応答タスクに適用し,表構造と内容の理解を深めた最初の事例である。 In the realm of natural language processing, the understanding of tabular data has perpetually stood as a focal point of scholarly inquiry. The emergence of expansive language models, exemplified by the likes of ChatGPT, has ushered in a wave of endeavors wherein researchers aim to harness these models for tasks related to table-based question answering. Central to our investigative pursuits is the elucidation of methodologies that amplify the aptitude of such large language models in discerning both the structural intricacies and inherent content of tables, ultimately facilitating their capacity to provide informed responses to pertinent queries. To this end, we have architected a distinctive module dedicated to the serialization of tables for seamless integration with expansive language models. Additionally, we've instituted a corrective mechanism within the model to rectify potential inaccuracies. Experimental results indicate that, although our proposed method trails the SOTA by approximately 11.7% in overall metrics, it surpasses the SOTA by about 1.2% in tests on specific datasets. This research marks the first application of large language models to table-based question answering tasks, enhancing the model's comprehension of both table structures and content. | 翻訳日:2023-08-24 15:58:09 公開日:2023-08-23 |
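The paper builds a dedicated module that serializes tables before passing them to an LLM. The abstract does not describe its exact format. As a hedged illustration, one common choice is a Markdown-style serialization with a header row. The function name and format below are assumptions, not the authors' module.

```python
def serialize_table(header: list, rows: list, caption: str = "") -> str:
    """Flatten a table into Markdown-style text that can be placed in an LLM prompt."""
    lines = [f"Table: {caption}"] if caption else []
    lines.append("| " + " | ".join(header) + " |")
    lines.append("|" + "---|" * len(header))
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)

# Hypothetical table and question
prompt = (
    serialize_table(["item", "count"], [["A", 10], ["B", 20]], caption="example")
    + "\nQuestion: Which item has the larger count?"
)
```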
# 等変拡散モデルによる形状条件付き3次元分子生成 Shape-conditioned 3D Molecule Generation via Equivariant Diffusion Models ( http://arxiv.org/abs/2308.11890v1 ) ライセンス: Link先を確認 | Ziqi Chen, Bo Peng, Srinivasan Parthasarathy, Xia Ning | (参考訳) リガンドベースの薬物設計は、既知の活性分子と類似した形状の新しい薬物候補を特定することを目的としている。
本稿では、インシリコの形状条件付き分子生成問題を定式化し、与えられた分子の形状を条件として3次元分子構造を生成する。
この問題に対処するために、並進および回転に対して同変な形状誘導生成モデルShapeMolを開発した。
これらの結果は、タンパク質標的ポケットに結合する所望の3D形状を持つ薬物候補の設計におけるShapeMolの可能性を示している。 Ligand-based drug design aims to identify novel drug candidates of similar shapes with known active molecules. In this paper, we formulated an in silico shape-conditioned molecule generation problem to generate 3D molecule structures conditioned on the shape of a given molecule. To address this problem, we developed a translation- and rotation-equivariant shape-guided generative model ShapeMol. ShapeMol consists of an equivariant shape encoder that maps molecular surface shapes into latent embeddings, and an equivariant diffusion model that generates 3D molecules based on these embeddings. Experimental results show that ShapeMol can generate novel, diverse, drug-like molecules that retain 3D molecular shapes similar to the given shape condition. These results demonstrate the potential of ShapeMol in designing drug candidates of desired 3D shapes binding to protein target pockets. | 翻訳日:2023-08-24 15:57:51 公開日:2023-08-23 |
# 3D点群ビジュアルグラウンディングのための統一フレームワーク A Unified Framework for 3D Point Cloud Visual Grounding ( http://arxiv.org/abs/2308.11887v1 ) ライセンス: Link先を確認 | Haojia Lin, Yongdong Luo, Xiawu Zheng, Lijiang Li, Fei Chao, Taisong Jin, Donghao Luo, Chengjie Wang, Yan Wang, Liujuan Cao | (参考訳) 3D点群のビジュアルグラウンディングは3Dシーン理解において重要な役割を担い、3D参照表現理解(3DREC)とセグメンテーション(3DRES)を含む。
そこで本研究では,3DRECと3DRESを統合した3D Referring Transformer(3DRefTR)を提案する。
(i) CPUとGPUの異種並列性を活用し、GPUが視覚トークンの生成に占有されている間にCPUが並行してスーパーポイントを生成することで、実質的にアップサンプリング計算を担う。
(ii) スーパーポイントと点群の間に本来備わっている対応関係を利用して、アップサンプリングのための高解像度視覚特徴に対する重い計算オーバーヘッドを排除する。
具体的には、ScanReferデータセットにおいて、3DRefTRは最先端の3DRES法を12.43%mIoUで上回り、SOTA 3DREC法を0.6%Acc@0.25IoUで改善する。 3D point cloud visual grounding plays a critical role in 3D scene comprehension, encompassing 3D referring expression comprehension (3DREC) and segmentation (3DRES). We argue that 3DREC and 3DRES should be unified in one framework, which is also a natural progression in the community. To explain, 3DREC can help 3DRES locate the referent, while 3DRES can also facilitate 3DREC via more finegrained language-visual alignment. To achieve this, this paper takes the initiative step to integrate 3DREC and 3DRES into a unified framework, termed 3D Referring Transformer (3DRefTR). Its key idea is to build upon a mature 3DREC model and leverage ready query embeddings and visual tokens from the 3DREC model to construct a dedicated mask branch. Specially, we propose Superpoint Mask Branch, which serves a dual purpose: i) By leveraging the heterogeneous CPU-GPU parallelism, while the GPU is occupied generating visual tokens, the CPU concurrently produces superpoints, equivalently accomplishing the upsampling computation; ii) By harnessing on the inherent association between the superpoints and point cloud, it eliminates the heavy computational overhead on the high-resolution visual features for upsampling. This elegant design enables 3DRefTR to achieve both well-performing 3DRES and 3DREC capacities with only a 6% additional latency compared to the original 3DREC model. Empirical evaluations affirm the superiority of 3DRefTR. Specifically, on the ScanRefer dataset, 3DRefTR surpasses the state-of-the-art 3DRES method by 12.43% in mIoU and improves upon the SOTA 3DREC method by 0.6% Acc@0.25IoU. | 翻訳日:2023-08-24 15:57:36 公開日:2023-08-23 |
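Acc@0.25IoU counts a referring expression as correctly grounded when the predicted 3D box overlaps the ground-truth box with an IoU of at least 0.25. Below is a minimal sketch that assumes axis-aligned boxes in (xmin, ymin, zmin, xmax, ymax, zmax) form. It is not the official ScanRefer evaluation code.

```python
def box_volume(box) -> float:
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

def box_iou_3d(a, b) -> float:
    """IoU of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        overlap = min(a[i + 3], b[i + 3]) - max(a[i], b[i])
        if overlap <= 0:
            return 0.0                     # no overlap along this axis
        inter *= overlap
    return inter / (box_volume(a) + box_volume(b) - inter)

def acc_at_iou(predictions, ground_truths, threshold: float = 0.25) -> float:
    """Fraction of referring expressions whose predicted box reaches the IoU threshold."""
    hits = sum(box_iou_3d(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)
```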
# Wikidataの分類体系をYAGOに統合する Integrating the Wikidata Taxonomy into YAGO ( http://arxiv.org/abs/2308.11884v1 ) ライセンス: Link先を確認 | Fabian Suchanek, Mehwish Alam, Thomas Bonald, Pierre-Henri Paris, Jules Soria | (参考訳) Wikidataは、最大級の公開汎用知識ベース(KB)の1つである。
YAGO 4 KBでは、WikidataとSchema.orgのオントロジーを組み合わせることで分類体系と制約を削減・整理し、データ上で自動推論器を実行できるようにした。
本稿では、Wikidataの分類体系全体を可能な限りYAGO KBに統合する取り組みについて述べる。
本研究ではYAGO 4.5を構築した。これは、KBの論理的一貫性を保ちつつ、情報量の多いクラスの層をYAGOに追加するものである。 Wikidata is one of the largest public general-purpose Knowledge Bases (KBs). Yet, due to its collaborative nature, its schema and taxonomy have become convoluted. For the YAGO 4 KB, we combined Wikidata with the ontology from Schema.org, which reduced and cleaned up the taxonomy and constraints and made it possible to run automated reasoners on the data. However, it also cut away large parts of the Wikidata taxonomy. In this paper, we present our effort to merge the entire Wikidata taxonomy into the YAGO KB as much as possible. We pay particular attention to logical constraints and a careful distinction of classes and instances. Our work creates YAGO 4.5, which adds a rich layer of informative classes to YAGO, while at the same time keeping the KB logically consistent. | 翻訳日:2023-08-24 15:57:01 公開日:2023-08-23 |
# 説明可能な医用画像分類のための視覚概念フィルタリングを用いた概念ボトルネック Concept Bottleneck with Visual Concept Filtering for Explainable Medical Image Classification ( http://arxiv.org/abs/2308.11920v1 ) ライセンス: Link先を確認 | Injae Kim, Jongha Kim, Joonmyung Choi, Hyunwoo J. Kim | (参考訳) 解釈性は、様々な医療応用のために信頼できるモデルを構築する上で重要な要素である。
概念セットを構築するために大規模な人的労働を必要とする従来の手法とは異なり、概念を生成するためにLLM(Large Language Models)を利用する最近の研究は、自動概念生成を可能にした。
さらに,視覚的アクティベーションスコアを用いて,視覚的関連概念の選択に成功していることを示す。 Interpretability is a crucial factor in building reliable models for various medical applications. Concept Bottleneck Models (CBMs) enable interpretable image classification by utilizing human-understandable concepts as intermediate targets. Unlike conventional methods that require extensive human labor to construct the concept set, recent works leveraging Large Language Models (LLMs) for generating concepts made automatic concept generation possible. However, those methods do not consider whether a concept is visually relevant or not, which is an important factor in computing meaningful concept scores. Therefore, we propose a visual activation score that measures whether the concept contains visual cues or not, which can be easily computed with unlabeled image data. Computed visual activation scores are then used to filter out the less visible concepts, thus resulting in a final concept set with visually meaningful concepts. Our experimental results show that adopting the proposed visual activation score for concept filtering consistently boosts performance compared to the baseline. Moreover, qualitative analyses also validate that visually relevant concepts are successfully selected with the visual activation score. | 翻訳日:2023-08-24 15:49:24 公開日:2023-08-23 |
# AMSP-UOD: 渦畳み込みと確率的摂動が水中物体検出に出会うとき AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection ( http://arxiv.org/abs/2308.11918v1 ) ライセンス: Link先を確認 | Jingchun Zhou, Zongxin He, Kin-Man Lam, Yudong Wang, Weishi Zhang, ChunLe Guo, Chongyi Li | (参考訳) 本稿では,水中物体検出のためのAmplitude-Modulated Stochastic Perturbation and Vortex Convolutional Network, AMSP-UODを提案する。
AMSP Vortex Convolution (AMSP-VConv) は, 物体検出性能に対するノイズの影響を軽減するため, ノイズ分布の破壊, 特徴抽出能力の向上, パラメータの効果的削減, ネットワークロバスト性の向上を目的としている。
本研究では、長距離特徴と短距離特徴の関連を強化し、複雑な水中環境下でのネットワーク性能を向上させるFAD-CSPモジュールを設計する。
さらに、アスペクト比類似度閾値を用いた非最大抑制に基づく高度な後処理手法により、水草や魚群などの密集した場面における検出を最適化し、物体検出精度を向上させる。
コードは公開される予定だ。 In this paper, we present a novel Amplitude-Modulated Stochastic Perturbation and Vortex Convolutional Network, AMSP-UOD, designed for underwater object detection. AMSP-UOD specifically addresses the impact of non-ideal imaging factors on detection accuracy in complex underwater environments. To mitigate the influence of noise on object detection performance, we propose AMSP Vortex Convolution (AMSP-VConv) to disrupt the noise distribution, enhance feature extraction capabilities, effectively reduce parameters, and improve network robustness. We design the Feature Association Decoupling Cross Stage Partial (FAD-CSP) module, which strengthens the association of long and short-range features, improving the network performance in complex underwater environments. Additionally, our sophisticated post-processing method, based on non-maximum suppression with aspect-ratio similarity thresholds, optimizes detection in dense scenes, such as waterweed and schools of fish, improving object detection accuracy. Extensive experiments on the URPC and RUOD datasets demonstrate that our method outperforms existing state-of-the-art methods in terms of accuracy and noise immunity. AMSP-UOD proposes an innovative solution with the potential for real-world applications. Code will be made publicly available. | 翻訳日:2023-08-24 15:49:04 公開日:2023-08-23 |
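The post-processing step is described as non-maximum suppression with aspect-ratio similarity thresholds. The abstract does not give the exact rule. The sketch below is one hypothetical reading. An overlapping box is suppressed only when its aspect ratio is also similar to that of a higher-scoring kept box, so overlapping objects of different shapes can both survive. The thresholds and the similarity measure are assumptions.

```python
def box_area(r) -> float:
    """Area of a box (x1, y1, x2, y2)."""
    return (r[2] - r[0]) * (r[3] - r[1])

def iou(a, b) -> float:
    """IoU of two boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    return inter / (box_area(a) + box_area(b) - inter)

def aspect_similarity(a, b) -> float:
    """Ratio of the smaller to the larger width/height aspect ratio, in (0, 1]."""
    ra = (a[2] - a[0]) / (a[3] - a[1])
    rb = (b[2] - b[0]) / (b[3] - b[1])
    return min(ra, rb) / max(ra, rb)

def aspect_aware_nms(boxes, scores, iou_thr=0.5, ar_thr=0.8):
    """Greedy NMS that suppresses a box only if it overlaps a kept box AND has a similar shape."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(not (iou(boxes[i], boxes[k]) > iou_thr
                    and aspect_similarity(boxes[i], boxes[k]) > ar_thr) for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 6)]  # hypothetical detections
scores = [0.9, 0.8, 0.7]
```

Setting ar_thr=0.0 recovers plain IoU-based NMS, because every aspect similarity is then above the threshold.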
# LFS-GAN: 生涯少数ショット画像生成 LFS-GAN: Lifelong Few-Shot Image Generation ( http://arxiv.org/abs/2308.11917v1 ) ライセンス: Link先を確認 | Juwon Seo, Ji-Su Kang, Gyeong-Moon Park | (参考訳) 我々は、困難な生涯少数ショット画像生成タスクに初めて取り組む。
そこで本稿では、これらの問題を緩和するために、Lifelong Few-Shot GAN (LFS-GAN) と呼ぶフレームワークを提案する。
提案フレームワークは、効率的なタスク固有の変調器であるLearnable Factorized Tensor (LeFT) を用いて各タスクを学習する。
コードはgithubで入手できる。 We address a challenging lifelong few-shot image generation task for the first time. In this situation, a generative model learns a sequence of tasks using only a few samples per task. Consequently, the learned model encounters both catastrophic forgetting and overfitting problems at a time. Existing studies on lifelong GANs have proposed modulation-based methods to prevent catastrophic forgetting. However, they require considerable additional parameters and cannot generate high-fidelity and diverse images from limited data. On the other hand, the existing few-shot GANs suffer from severe catastrophic forgetting when learning multiple tasks. To alleviate these issues, we propose a framework called Lifelong Few-Shot GAN (LFS-GAN) that can generate high-quality and diverse images in lifelong few-shot image generation task. Our proposed framework learns each task using an efficient task-specific modulator - Learnable Factorized Tensor (LeFT). LeFT is rank-constrained and has a rich representation ability due to its unique reconstruction technique. Furthermore, we propose a novel mode seeking loss to improve the diversity of our model in low-data circumstances. Extensive experiments demonstrate that the proposed LFS-GAN can generate high-fidelity and diverse images without any forgetting and mode collapse in various domains, achieving state-of-the-art in lifelong few-shot image generation task. Surprisingly, we find that our LFS-GAN even outperforms the existing few-shot GANs in the few-shot image generation task. The code is available at Github. | 翻訳日:2023-08-24 15:48:40 公開日:2023-08-23 |
# 部分変形一貫性による意味認識型暗黙的テンプレート学習 Semantic-Aware Implicit Template Learning via Part Deformation Consistency ( http://arxiv.org/abs/2308.11916v1 ) ライセンス: Link先を確認 | Sihyeon Kim, Minseok Joo, Jaewon Lee, Juyeon Ko, Juhan Cha, Hyunwoo J. Kim | (参考訳) 暗黙的テンプレートをニューラル場として学習する手法は、教師なし形状対応において印象的な性能を示している。
コードはhttps://github.com/mlvlab/pdcで入手できる。 Learning implicit templates as neural fields has recently shown impressive performance in unsupervised shape correspondence. Despite the success, we observe current approaches, which solely rely on geometric information, often learn suboptimal deformation across generic object shapes, which have high structural variability. In this paper, we highlight the importance of part deformation consistency and propose a semantic-aware implicit template learning framework to enable semantically plausible deformation. By leveraging semantic prior from a self-supervised feature extractor, we suggest local conditioning with novel semantic-aware deformation code and deformation consistency regularizations regarding part deformation, global deformation, and global scaling. Our extensive experiments demonstrate the superiority of the proposed method over baselines in various tasks: keypoint transfer, part label transfer, and texture transfer. More interestingly, our framework shows a larger performance gain under more challenging settings. We also provide qualitative analyses to validate the effectiveness of semantic-aware deformation. The code is available at https://github.com/mlvlab/PDC. | 翻訳日:2023-08-24 15:48:16 公開日:2023-08-23 |
# CausalGPTに向けて: LLMにおける因果的一貫性の促進による忠実な知識推論のためのマルチエージェント手法 Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs ( http://arxiv.org/abs/2308.11914v1 ) ライセンス: Link先を確認 | Ziyi Tang, Ruilin Wang, Weixing Chen, Keze Wang, Yang Liu, Tianshui Chen, Liang Lin | (参考訳) LLMの進歩にもかかわらず、知識の想起と推論の脆さのために、知識に基づく推論は依然として長年の課題である。
様々な知識推論タスク(科学質問応答やコモンセンス推論など)に関する広範囲かつ包括的な評価によると、我々のフレームワークは、最先端のアプローチを大きなマージンで比較する上で優れています。 Despite advancements in LLMs, knowledge-based reasoning remains a longstanding issue due to the fragility of knowledge recall and inference. Existing methods primarily encourage LLMs to autonomously plan and solve problems or to extensively sample reasoning chains without addressing the conceptual and inferential fallacies. Attempting to alleviate inferential fallacies and drawing inspiration from multi-agent collaboration, we present a framework to increase faithfulness and causality for knowledge-based reasoning. Specifically, we propose to employ multiple intelligent agents (i.e., reasoner and causal evaluator) to work collaboratively in a reasoning-and-consensus paradigm for elevated reasoning faithfulness. The reasoners focus on providing solutions with human-like causality to solve open-domain problems. On the other hand, the causal evaluator agent scrutinizes if the answer in a solution is causally deducible from the question and vice versa, with a counterfactual answer replacing the original. According to the extensive and comprehensive evaluations on a variety of knowledge reasoning tasks (e.g., science question answering and commonsense reasoning), our framework outperforms all compared state-of-the-art approaches by large margins. | 翻訳日:2023-08-24 15:47:56 公開日:2023-08-23 |
# コンピュータ適応テストにおける選択バイアスへの対処: ユーザ単位の集約影響関数アプローチ Addressing Selection Bias in Computerized Adaptive Testing: A User-Wise Aggregate Influence Function Approach ( http://arxiv.org/abs/2308.11912v1 ) ライセンス: Link先を確認 | Soonwoo Kwon, Sojung Kim, Seunghyun Lee, Jin-Young Kim, Suyeong An, and Kyuseok Kim | (参考訳) コンピュータ適応テスト(computerized adaptive testing、CAT)は、テスト領域における受験者の熟達度に適応する、広く用いられている効率的なテスト形式である。
提案手法の優位性を示すため、3つの公開データセットと、実世界のCAT応答データを含む1つのデータセットに基づく広範な実験を行った。 Computerized Adaptive Testing (CAT) is a widely used, efficient test mode that adapts to the examinee's proficiency level in the test domain. CAT requires pre-trained item profiles, for CAT iteratively assesses the student real-time based on the registered items' profiles, and selects the next item to administer using candidate items' profiles. However, obtaining such item profiles is a costly process that involves gathering a large, dense item-response data, then training a diagnostic model on the collected data. In this paper, we explore the possibility of leveraging response data collected in the CAT service. We first show that this poses a unique challenge due to the inherent selection bias introduced by CAT, i.e., more proficient students will receive harder questions. Indeed, when naively training the diagnostic model using CAT response data, we observe that item profiles deviate significantly from the ground-truth. To tackle the selection bias issue, we propose the user-wise aggregate influence function method. Our intuition is to filter out users whose response data is heavily biased in an aggregate manner, as judged by how much perturbation the added data will introduce during parameter estimation. This way, we may enhance the performance of CAT while introducing minimal bias to the item profiles. We provide extensive experiments to demonstrate the superiority of our proposed method based on the three public datasets and one dataset that contains real-world CAT response data. | 翻訳日:2023-08-24 15:47:33 公開日:2023-08-23 |
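CAT relies on item profiles, which are typically the parameters of an item response model. The paper's diagnostic model may differ. As a hedged background sketch, the two-parameter logistic (2PL) model with maximum-Fisher-information item selection illustrates the selection bias described in the abstract: an examinee with a higher ability estimate is steered toward harder items. Item names and parameters are hypothetical.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct | ability theta, discrimination a, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat: float, items: dict, administered: set) -> str:
    """Pick the unadministered item that is most informative at the current ability estimate."""
    candidates = [name for name in items if name not in administered]
    return max(candidates, key=lambda name: fisher_information(theta_hat, *items[name]))

# Hypothetical item profiles: name -> (discrimination a, difficulty b)
items = {"easy": (1.0, -2.0), "medium": (1.0, 0.0), "hard": (1.0, 2.0)}
```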
# ACLS:適応型および条件付きラベル平滑化によるネットワーク校正 ACLS: Adaptive and Conditional Label Smoothing for Network Calibration ( http://arxiv.org/abs/2308.11911v1 ) ライセンス: Link先を確認 | Hyekang Park, Jongyoun Noh, Youngmin Oh, Donghyeon Baek, Bumsub Ham | (参考訳) 本稿では,ディープニューラルネットワークの信頼度を補正するネットワークキャリブレーションの問題に対処する。
CIFAR10, Tiny-ImageNet, ImageNet, PASCAL VOCなど, 標準ベンチマークにおける画像分類とセマンティックセグメンテーションの広範な実験結果を示し, 提案する損失関数の有効性を示した。 We address the problem of network calibration adjusting miscalibrated confidences of deep neural networks. Many approaches to network calibration adopt a regularization-based method that exploits a regularization term to smooth the miscalibrated confidences. Although these approaches have shown the effectiveness on calibrating the networks, there is still a lack of understanding on the underlying principles of regularization in terms of network calibration. We present in this paper an in-depth analysis of existing regularization-based methods, providing a better understanding on how they affect to network calibration. Specifically, we have observed that 1) the regularization-based methods can be interpreted as variants of label smoothing, and 2) they do not always behave desirably. Based on the analysis, we introduce a novel loss function, dubbed ACLS, that unifies the merits of existing regularization methods, while avoiding the limitations. We show extensive experimental results for image classification and semantic segmentation on standard benchmarks, including CIFAR10, Tiny-ImageNet, ImageNet, and PASCAL VOC, demonstrating the effectiveness of our loss function. | 翻訳日:2023-08-24 15:47:09 公開日:2023-08-23 |
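The abstract interprets regularization-based calibration methods as variants of label smoothing. For reference, here is a minimal sketch of plain uniform label smoothing with cross-entropy. This is the generic baseline, not ACLS itself, whose adaptive and conditional form is not specified here. Function names are illustrative.

```python
import math

def label_smoothing_targets(num_classes, true_class, eps):
    """Uniform label smoothing: (1 - eps) on the true class plus eps / K on every class."""
    off = eps / num_classes
    return [off + (1.0 - eps if k == true_class else 0.0) for k in range(num_classes)]

def log_softmax(logits):
    """Numerically stable log-softmax (subtracts the max logit first)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def smoothed_cross_entropy(logits, true_class, eps=0.1):
    """Cross-entropy against the smoothed target; eps = 0 gives ordinary cross-entropy."""
    targets = label_smoothing_targets(len(logits), true_class, eps)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

With eps > 0, a very confident correct prediction still pays a penalty on the non-target classes, which is what pushes overconfident logits back toward calibrated values.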
# 脳画像データのためのエッジ対応ハードクラスタリンググラフプーリング Edge-aware Hard Clustering Graph Pooling for Brain Imaging Data ( http://arxiv.org/abs/2308.11909v1 ) ライセンス: Link先を確認 | Cheng Zhu, Jiayi Zhu, Lijuan Zhang, Xi Wu, Shuqi Yang, Ping Liang, Honghan Chen, Ying Tan | (参考訳) グラフ畳み込みネットワーク(GCN)は、異なる脳領域間の非ユークリッド空間依存性を捉えることができ、GCNにおけるグラフプーリング演算子は、表現学習能力を高め、異常な脳地図を取得する鍵となる。
この手法は、データ駆動の観点から異なるタイプの異常な機能的脳ネットワークを探索する可能性を秘めた最初のディープラーニングツールであると考えている。 Graph Convolutional Networks (GCNs) can capture non-Euclidean spatial dependence between different brain regions, and the graph pooling operator in GCNs is key to enhancing the representation learning capability and acquiring abnormal brain maps. However, the majority of existing research designs graph pooling operators only from the perspective of nodes while disregarding the original edge features, in a way that not only confines graph pooling application scenarios, but also diminishes its ability to capture critical substructures. In this study, a clustering graph pooling method that first supports multidimensional edge features, called Edge-aware hard clustering graph pooling (EHCPool), is developed. EHCPool proposes the first 'Edge-to-node' score evaluation criterion based on edge features to assess node feature significance. To more effectively capture the critical subgraphs, a novel Iteration n-top strategy is further designed to adaptively learn sparse hard clustering assignments for graphs. Subsequently, an innovative N-E Aggregation strategy is presented to aggregate node and edge feature information in each independent subgraph. The proposed model was evaluated on multi-site brain imaging public datasets and yielded state-of-the-art performance. We believe this method is the first deep learning tool with the potential to probe different types of abnormal functional brain networks from data-driven perspective. | 翻訳日:2023-08-24 15:46:50 公開日:2023-08-23 |
# ヒューリスティック学習のための許容境界の利用 Utilizing Admissible Bounds for Heuristic Learning ( http://arxiv.org/abs/2308.11905v1 ) ライセンス: Link先を確認 | Carlos Núñez-Molina and Masataro Asai | (参考訳) 近年,現代の機械学習技術を用いて前方探索アルゴリズムのヒューリスティック関数を学習することが注目されているが,何を(\emph{what})学習すべきか,どのように(\emph{how})訓練すべきか,なぜ(\emph{why})そうするのかについての理論的理解は乏しい。
さらに、学習で得られるヒューリスティックは許容的でないため、学習中(\emph{during})の許容性の役割にはほとんど注目されてこなかった。
本稿では、許容的ヒューリスティックを、通常のガウス分布と比べて仮説空間を狭める切断ガウス分布(Truncated Gaussian distribution)のパラメータとして用いることで、教師ありヒューリスティック学習における許容的ヒューリスティックの役割を明らかにする。
この数学的モデルは最大エントロピーの原理に忠実に従い、結果としてより正確なヒューリスティックが得られ、訓練中により早く収束することを示す。 While learning a heuristic function for forward search algorithms with modern machine learning techniques has been gaining interest in recent years, there has been little theoretical understanding of \emph{what} they should learn, \emph{how} to train them, and \emph{why} we do so. This lack of understanding leads to various literature performing an ad-hoc selection of datasets (suboptimal vs optimal costs or admissible vs inadmissible heuristics) and optimization metrics (e.g., squared vs absolute errors). Moreover, due to the lack of admissibility of the resulting trained heuristics, little focus has been put on the role of admissibility \emph{during} learning. This paper articulates the role of admissible heuristics in supervised heuristic learning using them as parameters of Truncated Gaussian distributions, which tightens the hypothesis space compared to ordinary Gaussian distributions. We argue that this mathematical model faithfully follows the principle of maximum entropy and empirically show that, as a result, it yields more accurate heuristics and converges faster during training. | 翻訳日:2023-08-24 15:46:26 公開日:2023-08-23 |
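The abstract uses admissible heuristics as parameters of a truncated Gaussian. A minimal sketch, assuming the admissible value acts as the lower truncation bound on the true cost-to-go and that the learner outputs a mean and a standard deviation, gives the following per-sample negative log-likelihood. The paper's exact parameterization may differ, and the function names are illustrative.

```python
import math

def gaussian_nll(y, mu, sigma):
    """NLL of y under an ordinary Gaussian N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)

def truncated_gaussian_nll(y, mu, sigma, lower):
    """NLL of target cost y under N(mu, sigma^2) truncated to [lower, +inf).
    `lower` plays the role of an admissible heuristic (a lower bound on the cost)."""
    if y < lower:
        return math.inf  # costs below the admissible bound get zero density
    alpha = (lower - mu) / sigma
    tail_mass = 0.5 * math.erfc(alpha / math.sqrt(2.0))  # 1 - Phi(alpha)
    # Renormalizing by the tail mass is what narrows the hypothesis space.
    # (For very large alpha tail_mass underflows to 0; a log-erfc routine would be needed.)
    return gaussian_nll(y, mu, sigma) + math.log(tail_mass)
```

When the bound is far below the mean, the truncation has no effect and the loss reduces to the ordinary Gaussian NLL. When the bound sits at the mean, half the mass is cut away and the loss drops by exactly log 2.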
# 半教師あり医用画像分割のためのデータ摂動とモデル安定化の再考 Rethinking Data Perturbation and Model Stabilization for Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2308.11903v1 ) ライセンス: Link先を確認 | Zhen Zhao, Ye Liu, Meng Zhao, Di Yin, Yixuan Yuan, Luping Zhou | (参考訳) 近年,半教師あり医用画像分割(SSMIS)の研究が急速に進展している。
一方,強いデータ拡張を適用する場合にEMA教師モデルを用いても,必ずしも性能が向上するわけではない。
その単純さにもかかわらず、DPMSは公開されている2D ACDCおよび3D LAデータセットで新たな最先端の性能を達成でき、例えば5%のラベルを用いたACDCでは従来のSOTAに対して22.62%の改善を得ている。 Studies on semi-supervised medical image segmentation (SSMIS) have seen fast progress recently. Due to the limited labelled data, SSMIS methods mainly focus on effectively leveraging unlabeled data to enhance the segmentation performance. However, despite their promising performance, current state-of-the-art methods often prioritize integrating complex techniques and loss terms rather than addressing the core challenges of semi-supervised scenarios directly. We argue that the key to SSMIS lies in generating substantial and appropriate prediction disagreement on unlabeled data. To this end, we emphasize the cruciality of data perturbation and model stabilization in semi-supervised segmentation, and propose a simple yet effective approach to boost SSMIS performance significantly, dubbed DPMS. Specifically, we first revisit SSMIS from three distinct perspectives: the data, the model, and the loss, and conduct a comprehensive study of corresponding strategies to examine their effectiveness. Based on these examinations, we then propose DPMS, which adopts a plain teacher-student framework with a standard supervised loss and unsupervised consistency loss. To produce appropriate prediction disagreements, DPMS perturbs the unlabeled data via strong augmentations to enlarge prediction disagreements considerably. On the other hand, using EMA teacher when strong augmentation is applied does not necessarily improve performance. DPMS further utilizes a forwarding-twice and momentum updating strategies for normalization statistics to stabilize the training on unlabeled data effectively. Despite its simplicity, DPMS can obtain new state-of-the-art performance on the public 2D ACDC and 3D LA datasets across various semi-supervised settings, e.g. obtaining a remarkable 22.62% improvement against previous SOTA on ACDC with 5% labels. | 翻訳日:2023-08-24 15:46:03 公開日:2023-08-23 |
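The abstract contrasts an EMA teacher with momentum updating of normalization statistics. Both rely on the same exponential moving average rule. The minimal sketch below uses plain lists in place of tensors and treats a single feature channel for the statistics. It shows the generic BatchNorm-style update, not DPMS's exact forwarding-twice procedure, and the function names are illustrative.

```python
def ema_update(teacher, student, momentum=0.99):
    """EMA teacher: teacher <- momentum * teacher + (1 - momentum) * student, per parameter."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]

def momentum_update_stats(running_mean, running_var, batch, momentum=0.1):
    """BatchNorm-style running statistics for one channel: blend in the batch mean and
    the (population) batch variance with weight `momentum`."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    new_mean = (1.0 - momentum) * running_mean + momentum * mean
    new_var = (1.0 - momentum) * running_var + momentum * var
    return new_mean, new_var
```

The two rules differ only in what is being averaged: model weights in the first case, and per-batch normalization statistics computed on (for example, strongly augmented) unlabeled inputs in the second.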
# 小売需要予測:多変量時系列の比較研究 Retail Demand Forecasting: A Comparative Study for Multivariate Time Series ( http://arxiv.org/abs/2308.11939v1 ) ライセンス: Link先を確認 | Md Sabbirul Haque, Md Shahedul Amin, Jonayet Miah | (参考訳) 小売業における正確な需要予測は、金融パフォーマンスとサプライチェーン効率の重要な決定要因である。
本研究では,CPI(Consumer Price Index),ICS(Index of Consumer Sentiment),失業率などのマクロ経済変数による顧客需要の時系列データを統合することで,このギャップを埋める。
この包括的なデータセットを利用して、様々な回帰モデルと機械学習モデルを開発し比較し、小売需要を正確に予測する。 Accurate demand forecasting in the retail industry is a critical determinant of financial performance and supply chain efficiency. As global markets become increasingly interconnected, businesses are turning towards advanced prediction models to gain a competitive edge. However, existing literature mostly focuses on historical sales data and ignores the vital influence of macroeconomic conditions on consumer spending behavior. In this study, we bridge this gap by enriching time series data of customer demand with macroeconomic variables, such as the Consumer Price Index (CPI), Index of Consumer Sentiment (ICS), and unemployment rates. Leveraging this comprehensive dataset, we develop and compare various regression and machine learning models to predict retail demand accurately. | 翻訳日:2023-08-24 15:40:52 公開日:2023-08-23 |
# イベント画像・ボクセル特徴融合に基づく分類のためのボトルネックトランスフォーマーの学習 Learning Bottleneck Transformer for Event Image-Voxel Feature Fusion based Classification ( http://arxiv.org/abs/2308.11937v1 ) ライセンス: Link先を確認 | Chengguo Yuan, Yu Jin, Zongzhen Wu, Fanting Wei, Yangzirui Wang, Lan Chen, and Xiao Wang | (参考訳) 近年,イベントベースカメラを用いた対象物体の認識が注目されている。
本研究のソースコードは \url{https://github.com/Event-AHU/EFV_event_classification} で公開されている。 Recognizing target objects using an event-based camera draws more and more attention in recent years. Existing works usually represent the event streams into point-cloud, voxel, image, etc, and learn the feature representations using various deep neural networks. Their final results may be limited by the following factors: monotonous modal expressions and the design of the network structure. To address the aforementioned challenges, this paper proposes a novel dual-stream framework for event representation, extraction, and fusion. This framework simultaneously models two common representations: event images and event voxels. By utilizing Transformer and Structured Graph Neural Network (GNN) architectures, spatial information and three-dimensional stereo information can be learned separately. Additionally, a bottleneck Transformer is introduced to facilitate the fusion of the dual-stream information. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance on two widely used event-based classification datasets. The source code of this work is available at: \url{https://github.com/Event-AHU/EFV_event_classification} | 翻訳日:2023-08-24 15:40:40 公開日:2023-08-23 |
# 連続時間線形力学系のシステム同定 System Identification for Continuous-time Linear Dynamical Systems ( http://arxiv.org/abs/2308.11933v1 ) ライセンス: Link先を確認 | Peter Halmos, Jonathan Pillow, David A. Knowles | (参考訳) カルマンフィルタのシステム同定の問題は、力学系の基本パラメータを学習するための期待最大化(EM)法に依存しており、観測が等間隔の時間点でサンプリングされることを前提に研究が進められている。
本稿では、潜在状態と共分散のダイナミクスに関する連続時間の伊藤(Itô)確率微分方程式(SDE)の解を用いてカルマンフィルタの学習を一般化することを目的として、連続離散フィルタのシステム同定に取り組む。
生物学的に現実的なパラメータを用いて、トグルスイッチ遺伝子回路を表す潜在多変量フォッカー・プランクSDEのパラメータを学習し、ステップサイズの不規則性とダイナミクス行列のスペクトル半径が増加するにつれて、離散時間カルマンフィルタと比べた学習の有効性を比較する。 The problem of system identification for the Kalman filter, relying on the expectation-maximization (EM) procedure to learn the underlying parameters of a dynamical system, has largely been studied assuming that observations are sampled at equally-spaced time points. However, in many applications this is a restrictive and unrealistic assumption. This paper addresses system identification for the continuous-discrete filter, with the aim of generalizing learning for the Kalman filter by relying on a solution to a continuous-time Itô stochastic differential equation (SDE) for the latent state and covariance dynamics. We introduce a novel two-filter, analytical form for the posterior with a Bayesian derivation, which yields analytical updates which do not require the forward-pass to be pre-computed. Using this analytical and efficient computation of the posterior, we provide an EM procedure which estimates the parameters of the SDE, naturally incorporating irregularly sampled measurements. Generalizing the learning of latent linear dynamical systems (LDS) to continuous-time may extend the use of the hybrid Kalman filter to data which is not regularly sampled or has intermittent missing values, and can extend the power of non-linear system identification methods such as switching LDS (SLDS), which rely on EM for the linear discrete-time Kalman filter as a sub-unit for learning locally linearized behavior of a non-linear system. We apply the method by learning the parameters of a latent, multivariate Fokker-Planck SDE representing a toggle-switch genetic circuit using biologically realistic parameters, and compare the efficacy of learning relative to the discrete-time Kalman filter as the step-size irregularity and spectral-radius of the dynamics-matrix increases. | 翻訳日:2023-08-24 15:40:24 公開日:2023-08-23 |
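To see why a continuous-time model handles irregular sampling naturally, consider the simplest linear SDE, a scalar Ornstein-Uhlenbeck process. Its transition over any gap dt has a closed form, so each prediction step adapts to the actual gap. The sketch below is only a scalar Kalman filter under that assumption. It does not implement the paper's two-filter posterior or its EM procedure, and the function name and parameters are illustrative.

```python
import math

def kalman_irregular_ou(times, ys, theta, sigma, r, m0=0.0, p0=1.0):
    """Scalar Kalman filter for an Ornstein-Uhlenbeck latent state
    dx = -theta * x dt + sigma dW (theta > 0), observed as y = x + v with v ~ N(0, r),
    at possibly irregular, ascending time stamps `times`.
    Returns the filtered means and variances at each observation time."""
    means, variances = [], []
    m, p = m0, p0
    t_prev = times[0]  # the prior (m0, p0) is taken to hold at the first time stamp
    for t, y in zip(times, ys):
        dt = t - t_prev
        # Exact transition of the linear SDE over a gap of length dt.
        a = math.exp(-theta * dt)
        q = sigma ** 2 / (2.0 * theta) * (1.0 - math.exp(-2.0 * theta * dt))
        # Predict: propagate mean and variance across the gap.
        m, p = a * m, a * a * p + q
        # Update: standard Kalman correction with the measurement y.
        k = p / (p + r)
        m, p = m + k * (y - m), (1.0 - k) * p
        means.append(m)
        variances.append(p)
        t_prev = t
    return means, variances
```

A longer gap yields a larger process noise q and therefore a larger gain k, so the filter relies more on the next measurement. A fixed-step discrete-time filter would apply the same transition regardless of how much time has actually passed.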
# 水中画像強調のための固有スーパービジョンによる相乗的マルチスケールディテール微細化 Synergistic Multiscale Detail Refinement via Intrinsic Supervision for Underwater Image Enhancement ( http://arxiv.org/abs/2308.11932v1 ) ライセンス: Link先を確認 | Dehuan Zhang, Jingchun Zhou, Weishi Zhang, ChunLe Guo, Chongyi Li | (参考訳) 水中シーンの視覚復元は視覚タスクにとって不可欠であり、水中媒体からの干渉を避けることが重要な関心事となっている。
低劣化段階(low-degradation stage)は元の段階(original stage)にマルチスケールの詳細情報を提供し、適応的選択型内在教師付き特徴モジュール(ASISF)による特徴伝播を通じて、相乗的なマルチスケール詳細改良を実現する。
コードは公開される予定だ。 Visual restoration of underwater scenes is crucial for visual tasks, and avoiding interference from underwater media has become a prominent concern. In this work, we present a synergistic multiscale detail refinement via intrinsic supervision (SMDR-IS) to recover underwater scene details. The low-degradation stage provides multiscale detail for original stage, which achieves synergistic multiscale detail refinement through feature propagation via the adaptive selective intrinsic supervised feature module (ASISF), which achieves synergistic multiscale detail refinement. ASISF is developed using intrinsic supervision to precisely control and guide feature transmission in the multi-degradation stages. ASISF improves the multiscale detail refinement while reducing interference from irrelevant scene information from the low-degradation stage. Additionally, within the multi-degradation encoder-decoder of SMDR-IS, we introduce a bifocal intrinsic-context attention module (BICA). This module is designed to effectively leverage multi-scale scene information found in images, using intrinsic supervision principles as its foundation. BICA facilitates the guidance of higher-resolution spaces by leveraging lower-resolution spaces, considering the significant dependency of underwater image restoration on spatial contextual relationships. During the training process, the network gains advantages from the integration of a multi-degradation loss function. This function serves as a constraint, enabling the network to effectively exploit information across various scales. When compared with state-of-the-art methods, SMDR-IS demonstrates its outstanding performance. Code will be made publicly available. | 翻訳日:2023-08-24 15:39:49 公開日:2023-08-23 |
# 亜熱帯都市山岳地域における地すべり原因の変動解明に向けた過去30年間の動的地すべり感受性マッピング Dynamic landslide susceptibility mapping over recent three decades to uncover variations in landslide causes in subtropical urban mountainous areas ( http://arxiv.org/abs/2308.11929v1 ) ライセンス: Link先を確認 | Peifeng Ma, Li Chen, Chang Yu, Qing Zhu, Yulin Ding | (参考訳) 地すべりリスクの軽減には地すべり感受性評価(LSA)が重要である。
本研究は, 年ごとのLSAに複数の予測モデルを用いるだけの, 動的な地すべり感受性マッピングを提案する。
さらに, MT-InSAR を LSA 結果の強化と検証に応用した。
また, 地すべり原因の変動は, 地球規模の気候変動による極度の降雨現象と, 香港政府による地すべり防止緩和計画(LPMitP)の実施によるものであることが示唆された。 Landslide susceptibility assessment (LSA) is of paramount importance in mitigating landslide risks. Recently, there has been a surge in the utilization of data-driven methods for predicting landslide susceptibility due to the growing availability of aerial and satellite data. Nonetheless, the rapid oscillations within the landslide-inducing environment (LIE), primarily due to significant changes in external triggers such as rainfall, pose difficulties for contemporary data-driven LSA methodologies to accommodate LIEs over diverse timespans. This study presents dynamic landslide susceptibility mapping that simply employs multiple predictive models for annual LSA. In practice, this will inevitably encounter small sample problems due to the limited number of landslide samples in certain years. Another concern arises owing to the majority of the existing LSA approaches train black-box models to fit distinct datasets, yet often failing in generalization and providing comprehensive explanations concerning the interactions between input features and predictions. Accordingly, we proposed to meta-learn representations with fast adaptation ability using a few samples and gradient updates; and apply SHAP for each model interpretation and landslide feature permutation. Additionally, we applied MT-InSAR for LSA result enhancement and validation. The chosen study area is Lantau Island, Hong Kong, where we conducted a comprehensive dynamic LSA spanning from 1992 to 2019. The model interpretation results demonstrate that the primary factors responsible for triggering landslides in Lantau Island are terrain slope and extreme rainfall. 
The results also indicate that the variation in landslide causes can be primarily attributed to extreme rainfall events, which result from global climate change, and the implementation of the Landslip Prevention and Mitigation Programme (LPMitP) by the Hong Kong government. | 翻訳日:2023-08-24 15:38:46 公開日:2023-08-23 |
# OFVL-MS: 複数の屋内シーンにまたがる視覚的ローカライゼーション OFVL-MS: Once for Visual Localization across Multiple Indoor Scenes ( http://arxiv.org/abs/2308.11928v1 ) ライセンス: Link先を確認 | Tao Xie, Kun Dai, Siyi Lu, Ke Wang, Zhiqiang Jiang, Jinghan Gao, Dedong Liu, Jie Xu, Lijun Zhao, Ruifeng Li | (参考訳) 本研究では,各シーンの局所化を新たなタスクとして捉えたマルチタスク学習手法を用いて,シーン間のカメラポーズの予測を試みる。
また,OFVL-MSがはるかに少ないパラメータで新たなシーンに汎化し,優れた局所化性能を達成できることを検証した。 In this work, we seek to predict camera poses across scenes with a multi-task learning manner, where we view the localization of each scene as a new task. We propose OFVL-MS, a unified framework that dispenses with the traditional practice of training a model for each individual scene and relieves gradient conflict induced by optimizing multiple scenes collectively, enabling efficient storage yet precise visual localization for all scenes. Technically, in the forward pass of OFVL-MS, we design a layer-adaptive sharing policy with a learnable score for each layer to automatically determine whether the layer is shared or not. Such sharing policy empowers us to acquire task-shared parameters for a reduction of storage cost and task-specific parameters for learning scene-related features to alleviate gradient conflict. In the backward pass of OFVL-MS, we introduce a gradient normalization algorithm that homogenizes the gradient magnitude of the task-shared parameters so that all tasks converge at the same pace. Furthermore, a sparse penalty loss is applied on the learnable scores to facilitate parameter sharing for all tasks without performance degradation. We conduct comprehensive experiments on multiple benchmarks and our new released indoor dataset LIVL, showing that OFVL-MS families significantly outperform the state-of-the-arts with fewer parameters. We also verify that OFVL-MS can generalize to a new scene with much fewer parameters while gaining superior localization performance. | 翻訳日:2023-08-24 15:38:15 公開日:2023-08-23 |
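The backward pass described above "homogenizes the gradient magnitude of the task-shared parameters". One simple way to do this is to rescale each task's gradient on the shared parameters to a common norm before summing. The sketch below takes the mean norm across tasks as that common target. This choice is an assumption for illustration and may not be the paper's exact rule. Function names are hypothetical.

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def normalize_task_gradients(task_grads, eps=1e-12):
    """Rescale each task's gradient (on the shared parameters) to the mean norm across
    tasks, then sum them. Returns (rescaled per-task gradients, combined update)."""
    norms = [l2_norm(g) for g in task_grads]
    target = sum(norms) / len(norms)  # assumed common magnitude: the mean norm
    # eps guards against division by zero; an all-zero gradient stays zero.
    rescaled = [[(target / (n + eps)) * x for x in g] for g, n in zip(task_grads, norms)]
    combined = [sum(components) for components in zip(*rescaled)]
    return rescaled, combined
```

After rescaling, no single scene can dominate the shared update merely because its raw loss produces larger gradients, which is the "same pace" property the abstract refers to.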
# 液相電子顕微鏡による分子の3次元ダイナミクスの復元 Recovering a Molecule's 3D Dynamics from Liquid-phase Electron Microscopy Movies ( http://arxiv.org/abs/2308.11927v1 ) ライセンス: Link先を確認 | Enze Ye, Yuhang Wang, Hong Zhang, Yiqin Gao, Huan Wang, He Sun | (参考訳) 生体分子の動態は、生物系におけるその機能を理解する上で重要である。
本手法は、構造生物学において分子の3Dダイナミクスを研究するための有望な新しいアプローチを提供する。 The dynamics of biomolecules are crucial for our understanding of their functioning in living systems. However, current 3D imaging techniques, such as cryogenic electron microscopy (cryo-EM), require freezing the sample, which limits the observation of their conformational changes in real time. The innovative liquid-phase electron microscopy (liquid-phase EM) technique allows molecules to be placed in the native liquid environment, providing a unique opportunity to observe their dynamics. In this paper, we propose TEMPOR, a Temporal Electron MicroscoPy Object Reconstruction algorithm for liquid-phase EM that leverages an implicit neural representation (INR) and a dynamical variational auto-encoder (DVAE) to recover time series of molecular structures. We demonstrate its advantages in recovering different motion dynamics from two simulated datasets, 7bcq and Cas9. To our knowledge, our work is the first attempt to directly recover 3D structures of a temporally-varying particle from liquid-phase EM movies. It provides a promising new approach for studying molecules' 3D dynamics in structural biology. | 翻訳日:2023-08-24 15:37:50 公開日:2023-08-23 |
# 物理情報ニューラルネットワークを用いた楕円最適制御問題の解法 Solving Elliptic Optimal Control Problems using Physics Informed Neural Networks ( http://arxiv.org/abs/2308.11925v1 ) ライセンス: Link先を確認 | Bangti Jin and Ramesh Sau and Luowei Yin and Zhi Zhou | (参考訳) 本研究では,線形および半線形2階楕円型問題に対する最適制御問題(ボックス制約なし/あり)の数値解法を提案し,解析する。
手法を例示するためにいくつかの数値例を示し,既存の3つの手法と比較する。 In this work, we present and analyze a numerical solver for optimal control problems (without / with box constraint) for linear and semilinear second-order elliptic problems. The approach is based on a coupled system derived from the first-order optimality system of the optimal control problem, and applies physics informed neural networks (PINNs) to solve the coupled system. We present an error analysis of the numerical scheme, and provide $L^2(\Omega)$ error bounds on the state, control and adjoint state in terms of deep neural network parameters (e.g., depth, width, and parameter bounds) and the number of sampling points in the domain and on the boundary. The main tools in the analysis include offset Rademacher complexity and boundedness and Lipschitz continuity of neural network functions. We present several numerical examples to illustrate the approach and compare it with three existing approaches. | 翻訳日:2023-08-24 15:37:32 公開日:2023-08-23 |
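The "coupled system derived from the first-order optimality system" can be made concrete for the linear case. Assume the standard tracking-type model problem: minimize $J(y,u)=\tfrac12\|y-y_d\|_{L^2(\Omega)}^2+\tfrac{\alpha}{2}\|u\|_{L^2(\Omega)}^2$ subject to $-\Delta y=f+u$ in $\Omega$ with $y=0$ on $\partial\Omega$. This is an assumed textbook setting, and the paper's exact formulation, including the semilinear case, may differ. Under that assumption, the optimal state $y$, adjoint $p$ and control $u$ satisfy:

```latex
\begin{aligned}
-\Delta y &= f + u \quad \text{in } \Omega, & y &= 0 \quad \text{on } \partial\Omega,\\
-\Delta p &= y - y_d \quad \text{in } \Omega, & p &= 0 \quad \text{on } \partial\Omega,\\
u &= -\tfrac{1}{\alpha}\, p \quad \text{(no constraint)}, &
u &= P_{[a,b]}\!\left(-\tfrac{1}{\alpha}\, p\right) \quad \text{(box constraint } a \le u \le b\text{)},
\end{aligned}
\qquad P_{[a,b]}(v) = \min\{\max\{v, a\}, b\}.
```

Eliminating $u$ leaves a coupled PDE system in $(y, p)$. A PINN can then represent $y$ and $p$ as networks and penalize the interior residuals and boundary values at sampled points, which is where the dependence of the error bounds on the numbers of domain and boundary sampling points enters.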
# 報酬なしマルコフ決定過程において多様な方策は収束する Diverse Policies Converge in Reward-free Markov Decision Processe ( http://arxiv.org/abs/2308.11924v1 ) ライセンス: Link先を確認 | Fanqi Lin, Shiyu Huang, Weiwei Tu | (参考訳) 強化学習は多くの意思決定タスクで大きな成功を収めており、従来の強化学習アルゴリズムは主に一つの最適解を得るために設計されている。
最後に,本手法の有効性を数値実験により検証する。 Reinforcement learning has achieved great success in many decision-making tasks, and traditional reinforcement learning algorithms are mainly designed for obtaining a single optimal solution. However, recent works show the importance of developing diverse policies, which makes it an emerging research topic. Despite the variety of diversity reinforcement learning algorithms that have emerged, none of them theoretically answer the question of how the algorithm converges and how efficient the algorithm is. In this paper, we provide a unified diversity reinforcement learning framework and investigate the convergence of training diverse policies. Under such a framework, we also propose a provably efficient diversity reinforcement learning algorithm. Finally, we verify the effectiveness of our method through numerical experiments. | 翻訳日:2023-08-24 15:37:13 公開日:2023-08-23 |
# 類似性・差異の分離(disentanglement)を利用した音声差分キャプション Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement ( http://arxiv.org/abs/2308.11923v1 ) ライセンス: Link先を確認 | Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino | (参考訳) 類似しているがわずかに異なる音声クリップの入力対の間の意味的差異を記述するために,音声キャプションの新たな拡張タスクとしてADC(Audio Difference Captioning)を提案する。
AudioDiffCapsデータセットを用いた実験により,提案手法がADCタスクを効果的に解くこと,またトランスフォーマーエンコーダ内のアテンション重みを可視化すると,差分を抽出するようにアテンション重みが改善されていることを示した。 We proposed Audio Difference Captioning (ADC) as a new extension task of audio captioning for describing the semantic differences between input pairs of similar but slightly different audio clips. The ADC solves the problem that conventional audio captioning sometimes generates similar captions for similar audio clips, failing to describe the difference in content. We also propose a cross-attention-concentrated transformer encoder to extract differences by comparing a pair of audio clips and a similarity-discrepancy disentanglement to emphasize the difference in the latent space. To evaluate the proposed methods, we built an AudioDiffCaps dataset consisting of pairs of similar but slightly different audio clips with human-annotated descriptions of their differences. The experiment with the AudioDiffCaps dataset showed that the proposed methods solve the ADC task effectively and improve the attention weights to extract the difference by visualizing them in the transformer encoder. | 翻訳日:2023-08-24 15:37:02 公開日:2023-08-23 |
# グラフェン・マッハツェンダー干渉計におけるナノバブルによるひずみ効果の検出 Detecting Strain Effects due to Nanobubbles in Graphene Mach-Zehnder Interferometers ( http://arxiv.org/abs/2308.11954v1 ) ライセンス: Link先を確認 | Nojoon Myoung, Taegeun Song, Hee Chul Park | (参考訳) 量子ホール領域においてグラフェンp-n接合によって形成されるマッハツェンダー(MZ)干渉計に対する弾性ひずみの影響について検討した。
本研究の結果は,グラフェンを用いたデバイス作製・測定技術の発展に向けて,グラフェンをひずみセンサとして利用できる可能性を示唆している。 We investigate the effect of elastic strain on a Mach-Zehnder (MZ) interferometer created by graphene p-n junction in quantum Hall regime. We demonstrate that a Gaussian-shaped nanobubble causes detuning of the quantum Hall conductance oscillations across the p-n junction, due to the strain-induced local pseudo-magnetic fields. By performing a machine-learning-based Fourier analysis, we differentiate the nanobubble-induced Fourier component from the conductance oscillations originating from the external magnetic fields. We show that the detuning of the conductance oscillations is due to the altered pathway of quantum Hall interface channels caused by the strain-induced pseudo-magnetic fields. In the presence of the nanobubble, a new Fourier component for a magnetic flux $\Phi_{0}/2$ appears, and the corresponding MZ interferometry indicates that the enclosed area is reduced by half due to the strain-mediated pathway between two quantum Hall interface channels. Our findings suggest the potential of using graphene as a strain sensor for developments in graphene-based device fabrications and measurements technologies. | 翻訳日:2023-08-24 15:28:45 公開日:2023-08-23 |
# MiniBatch SGDがSplitFed Learningに出会ったとき:収束分析と性能評価 When MiniBatch SGD Meets SplitFed Learning:Convergence Analysis and Performance Evaluation ( http://arxiv.org/abs/2308.11953v1 ) ライセンス: Link先を確認 | Chao Huang, Geng Tian, Ming Tang | (参考訳) フェデレーション学習(FL)は、生データを共有することなく、分散クライアント(エッジデバイスなど)間で協調的なモデルトレーニングを可能にする。
SplitFed Learning(SFL)は、カット層でモデルを2つに分割し、クライアントがモデルの一部のみを学習すればよいようにすることで、クライアントデバイスの計算負荷を軽減する。
しかし、クライアントのデータが強く非IIDである場合、SFLは依然としてクライアントドリフト(client drift)問題に悩まされる。
このアルゴリズムはMiniBatch SGDをSFLに組み込み、クライアントはFL方式でクライアントサイドモデルを訓練し、サーバはMiniBatch SGDに似たサーバサイドモデルを訓練する。
強く非IIDなデータにおいて、従来のSFLおよびFLに対する精度の向上はそれぞれ最大24.1%と17.1%に達する。 Federated learning (FL) enables collaborative model training across distributed clients (e.g., edge devices) without sharing raw data. Yet, FL can be computationally expensive as the clients need to train the entire model multiple times. SplitFed learning (SFL) is a recent distributed approach that alleviates computation workload at the client device by splitting the model at a cut layer into two parts, where clients only need to train part of the model. However, SFL still suffers from the \textit{client drift} problem when clients' data are highly non-IID. To address this issue, we propose MiniBatch-SFL. This algorithm incorporates MiniBatch SGD into SFL, where the clients train the client-side model in an FL fashion while the server trains the server-side model similar to MiniBatch SGD. We analyze the convergence of MiniBatch-SFL and show that the bound of the expected loss can be obtained by analyzing the expected server-side and client-side model updates, respectively. The server-side updates do not depend on the non-IID degree of the clients' datasets and can potentially mitigate client drift. However, the client-side model relies on the non-IID degree and can be optimized by properly choosing the cut layer. Perhaps counter-intuitive, our empirical result shows that a latter position of the cut layer leads to a smaller average gradient divergence and a better algorithm performance. Moreover, numerical results show that MiniBatch-SFL achieves higher accuracy than conventional SFL and FL. The accuracy improvement can be up to 24.1\% and 17.1\% with highly non-IID data, respectively. | 翻訳日:2023-08-24 15:28:26 公開日:2023-08-23 |
# ビデオからのポーズ変調アバター Pose Modulated Avatars from Video ( http://arxiv.org/abs/2308.11951v1 ) ライセンス: Link先を確認 | Chunjin Song, Bastian Wandt, Helge Rhodin | (参考訳) 基礎となる骨格によって駆動されるニューラル・ラディアンス・フィールド(Neural Radiance Fields, NeRF)を用いて、少数のカメラから人間の動的な動きと形状を再構成することが可能になっている。
実験により,ネットワークが最先端の手法よりも詳細保持と一般化能力の面で優れていることを実証した。 It is now possible to reconstruct dynamic human motion and shape from a sparse set of cameras using Neural Radiance Fields (NeRF) driven by an underlying skeleton. However, a challenge remains to model the deformation of cloth and skin in relation to skeleton pose. Unlike existing avatar models that are learned implicitly or rely on a proxy surface, our approach is motivated by the observation that different poses necessitate unique frequency assignments. Neglecting this distinction yields noisy artifacts in smooth areas or blurs fine-grained texture and shape details in sharp regions. We develop a two-branch neural network that is adaptive and explicit in the frequency domain. The first branch is a graph neural network that models correlations among body parts locally, taking skeleton pose as input. The second branch combines these correlation features to a set of global frequencies and then modulates the feature encoding. Our experiments demonstrate that our network outperforms state-of-the-art methods in terms of preserving details and generalization capabilities. | 翻訳日:2023-08-24 15:27:57 公開日:2023-08-23 |
# 拡散モデルを用いた高品質画像デハジング High-quality Image Dehazing with Diffusion Model ( http://arxiv.org/abs/2308.11949v1 ) ライセンス: Link先を確認 | Hu Yu, Jie Huang, Kaiwen Zheng, Man Zhou and Feng Zhao | (参考訳) 濃いヘイズ(霞)のシナリオでは、画像のデハジングは非常に難しい。
最近登場したDenoising Diffusion Probabilistic Model (DDPM)は、強力な生成能力を示し、この問題を解決する可能性を示している。
大規模実験により,本手法は合成データと実世界のヘイジーデータセットの両方において最先端の性能が得られることを示した。 Image dehazing is quite challenging in dense-haze scenarios, where quite less original information remains in the hazy image. Though previous methods have made marvelous progress, they still suffer from information loss in content and color in dense-haze scenarios. The recently emerged Denoising Diffusion Probabilistic Model (DDPM) exhibits strong generation ability, showing potential for solving this problem. However, DDPM fails to consider the physics property of dehazing task, limiting its information completion capacity. In this work, we propose DehazeDDPM: A DDPM-based and physics-aware image dehazing framework that applies to complex hazy scenarios. Specifically, DehazeDDPM works in two stages. The former stage physically models the dehazing task with the Atmospheric Scattering Model (ASM), pulling the distribution closer to the clear data and endowing DehazeDDPM with fog-aware ability. The latter stage exploits the strong generation ability of DDPM to compensate for the haze-induced huge information loss, by working in conjunction with the physical modelling. Extensive experiments demonstrate that our method attains state-of-the-art performance on both synthetic and real-world hazy datasets. | 翻訳日:2023-08-24 15:27:37 公開日:2023-08-23 |
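The first stage of DehazeDDPM models haze with the Atmospheric Scattering Model (ASM). A minimal per-pixel sketch of the standard ASM, assuming a single grayscale intensity, a known airlight and transmission t = exp(-beta * depth), is shown below. The inversion clamps t from below, as classic prior-based dehazers do. This is the textbook model, not the paper's learned pipeline, and the function names are illustrative.

```python
import math

def transmission(depth, beta):
    """Transmission t = exp(-beta * depth); denser haze or farther scenes give smaller t."""
    return math.exp(-beta * depth)

def hazy_pixel(j, airlight, t):
    """Atmospheric scattering model: I = J * t + A * (1 - t)."""
    return j * t + airlight * (1.0 - t)

def dehaze_pixel(i, airlight, t, t_min=0.1):
    """Invert the model for J, clamping t to avoid amplifying noise where haze is dense."""
    return (i - airlight) / max(t, t_min) + airlight
```

When t is moderate, the inversion recovers the clear pixel exactly. When t falls below the clamp (dense haze), most of the signal J * t is gone and the estimate is poor. This is the information loss that the abstract assigns the DDPM stage to compensate for.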
# 敵対的ノイズによる拡散モデルの効率的な転移学習 Efficient Transfer Learning in Diffusion Models via Adversarial Noise ( http://arxiv.org/abs/2308.11948v1 ) ライセンス: Link先を確認 | Xiyu Wang, Baijiong Lin, Daochang Liu, Chang Xu | (参考訳) 拡散確率モデル(Diffusion Probabilistic Models, DPMs)は画像生成タスクにおいて大きな可能性を示しているが、大量の学習データが利用可能であることに大きく依存している。
画像生成タスクの文脈における広範囲な実験により,本手法は効率だけでなく,既存のGAN法やDDPM法と比較して画像品質や多様性も優れていることが示された。 Diffusion Probabilistic Models (DPMs) have demonstrated substantial promise in image generation tasks but heavily rely on the availability of large amounts of training data. Previous works, like GANs, have tackled the limited data problem by transferring pre-trained models learned with sufficient data. However, those methods are hard to be utilized in DPMs since the distinct differences between DPM-based and GAN-based methods, showing in the unique iterative denoising process integral and the need for many timesteps with no-targeted noise in DPMs. In this paper, we propose a novel DPMs-based transfer learning method, TAN, to address the limited data problem. It includes two strategies: similarity-guided training, which boosts transfer with a classifier, and adversarial noise selection which adaptive chooses targeted noise based on the input image. Extensive experiments in the context of few-shot image generation tasks demonstrate that our method is not only efficient but also excels in terms of image quality and diversity when compared to existing GAN-based and DDPM-based methods. | 翻訳日:2023-08-24 15:27:17 公開日:2023-08-23 |
# 多変量時系列予測のためのマルチスケール変圧器ピラミッドネットワーク Multi-scale Transformer Pyramid Networks for Multivariate Time Series Forecasting ( http://arxiv.org/abs/2308.11946v1 ) ライセンス: Link先を確認 | Yifan Zhang, Rui Wu, Sergiu M. Dascalu, Frederick C. Harris Jr | (参考訳) 多変量時系列(MTS)予測では、歴史的記録内の時間的依存関係をモデル化する。
さらに,複数の制約のないスケールで時間的依存関係を効果的に捉えるよう設計された,新しいマルチスケール・トランスフォーマー・ピラミッド・ネットワーク(MTPNet: Multi-scale Transformer Pyramid Network)を提案する。
9つのベンチマークデータセットに対する大規模な実験は、提案されたMTPNetが最近の最先端の手法より優れていることを示している。 Multivariate Time Series (MTS) forecasting involves modeling temporal dependencies within historical records. Transformers have demonstrated remarkable performance in MTS forecasting due to their capability to capture long-term dependencies. However, prior work has been confined to modeling temporal dependencies at either a fixed scale or multiple scales that exponentially increase (mostly with base 2). This limitation hinders their effectiveness in capturing diverse seasonalities, such as hourly and daily patterns. In this paper, we introduce a dimension-invariant embedding technique that captures short-term temporal dependencies and projects MTS data into a higher-dimensional space, while preserving the dimensions of time steps and variables in MTS data. Furthermore, we present a novel Multi-scale Transformer Pyramid Network (MTPNet), specifically designed to effectively capture temporal dependencies at multiple unconstrained scales. The predictions are inferred from multi-scale latent representations obtained from transformers at various scales. Extensive experiments on nine benchmark datasets demonstrate that the proposed MTPNet outperforms recent state-of-the-art methods. | 翻訳日:2023-08-24 15:26:59 公開日:2023-08-23 |
# LongDanceDiff:条件付き拡散モデルによる長期ダンス生成 LongDanceDiff: Long-term Dance Generation with Conditional Diffusion Model ( http://arxiv.org/abs/2308.11945v1 ) ライセンス: Link先を確認 | Siqi Yang, Zejun Yang, Zhisheng Wang | (参考訳) 音楽で踊ることは感情を表現するのに不可欠な人間の芸術形式である。
また,GTM(Global-Trajectory Modulation)層を通した空間制約と運動知覚損失を取り入れることで,フットスライディングや滑らかでない動きなどのダンス生成における一般的な視覚的品質問題にも対処し,運動生成の滑らかさと自然さを向上させる。
私たちはまもなくコードとモデルをリリースする予定です。 Dancing to music has always been an essential human art form for expressing emotion. Due to the high temporal-spatial complexity, generating long-term, realistic 3D dance synchronized with music is challenging. Existing methods suffer from the freezing problem when generating long-term dances due to error accumulation and a training-inference discrepancy. To address this, we design a conditional diffusion model, LongDanceDiff, for this sequence-to-sequence long-term dance generation, addressing the challenges of temporal coherency and spatial constraint. LongDanceDiff contains a transformer-based diffusion model, where the input is a concatenation of music, past motions, and noised future motions. This partial noising strategy leverages the full-attention mechanism and learns the dependencies among music and past motions. To enhance the diversity of generated dance motions and mitigate the freezing problem, we introduce a mutual information minimization objective that regularizes the dependency between past and future motions. We also address common visual quality issues in dance generation, such as foot sliding and unsmooth motion, by incorporating spatial constraints through a Global-Trajectory Modulation (GTM) layer and motion perceptual losses, thereby improving the smoothness and naturalness of motion generation. Extensive experiments demonstrate that our approach significantly improves over the existing state-of-the-art methods. We plan to release our code and models soon. | 翻訳日:2023-08-24 15:26:41 公開日:2023-08-23 |
# RamseyRL: Ramsey数の反例を知的に探索するためのフレームワーク RamseyRL: A Framework for Intelligent Ramsey Number Counterexample Searching ( http://arxiv.org/abs/2308.11943v1 ) ライセンス: Link先を確認 | Steve Vott, Adam M. Lehavi | (参考訳) ラムゼー数 (Ramsey number) とは、位数 $n$ のすべての無向単純グラフが位数 $s$ のクリークまたは位数 $t$ の独立集合を含むような最小のノード数 $n = R(s, t)$ である。
コードとメソッドは、PyPIパッケージとGitHubリポジトリを通じて利用できる。 The Ramsey number is the minimum number of nodes, $n = R(s, t)$, such that every undirected simple graph of order $n$ contains a clique of order $s$ or an independent set of order $t$. This paper explores the application of a best-first search algorithm and reinforcement learning (RL) techniques to find counterexamples to specific Ramsey numbers. We incrementally improve over prior search methods such as random search by introducing a graph vectorization and deep neural network (DNN)-based heuristic, which gauges the likelihood of a graph being a counterexample. The paper also proposes algorithmic optimizations to keep the search runtime polynomial. This paper does not aim to present new counterexamples but rather introduces and evaluates a framework supporting Ramsey counterexample exploration using other heuristics. Code and methods are made available through a PyPI package and GitHub repository. | 翻訳日:2023-08-24 15:26:17 公開日:2023-08-23 |
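A graph on $n$ vertices with no $s$-clique and no independent set of size $t$ is a counterexample certifying $R(s, t) > n$. The brute-force Python sketch below illustrates only this definition. It is not RamseyRL's best-first search or its DNN heuristic, and all function names are hypothetical. It checks the 5-cycle against $R(3, 3)$, then exhaustively confirms that no counterexample exists on 6 vertices, i.e. $R(3, 3) = 6$.

```python
from itertools import combinations

def _all_pairs(vertices, edges, want_edge):
    """True if every pair in `vertices` is an edge (want_edge=True) or a non-edge."""
    return all((frozenset(p) in edges) == want_edge for p in combinations(vertices, 2))

def has_clique(n, edges, k):
    return any(_all_pairs(S, edges, True) for S in combinations(range(n), k))

def has_independent_set(n, edges, k):
    return any(_all_pairs(S, edges, False) for S in combinations(range(n), k))

def is_counterexample(n, edges, s, t):
    """No s-clique and no t-independent set on n vertices shows R(s, t) > n."""
    return not has_clique(n, edges, s) and not has_independent_set(n, edges, t)

def cycle(n):
    """Edge set of the n-cycle 0-1-...-(n-1)-0."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}

def exists_counterexample(n, s, t):
    """Exhaustive search over all 2^(n choose 2) graphs; only feasible for tiny n."""
    pairs = [frozenset(p) for p in combinations(range(n), 2)]
    for mask in range(1 << len(pairs)):
        edges = {p for i, p in enumerate(pairs) if (mask >> i) & 1}
        if is_counterexample(n, edges, s, t):
            return True
    return False

c5_ok = is_counterexample(5, cycle(5), 3, 3)  # C5: no triangle, no 3 pairwise non-adjacent vertices
any_on_6 = exists_counterexample(6, 3, 3)
print(c5_ok, any_on_6)  # True False, i.e. R(3, 3) = 6
```

Enumerating all $2^{\binom{n}{2}}$ graphs becomes infeasible almost immediately as $n$ grows. This is what motivates heuristic search of the kind the abstract describes.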
# 適応モーメンタムサンプラーによる拡散モデルの性能向上 Boosting Diffusion Models with an Adaptive Momentum Sampler ( http://arxiv.org/abs/2308.11941v1 ) ライセンス: Link先を確認 | Xiyu Wang, Anh-Dung Dinh, Daochang Liu, Chang Xu | (参考訳) 拡散確率モデル (DPM) は, 繊細な敵対的学習を必要とせず, 高品質な画像を生成できることが示されている。
提案するサンプラーは事前学習済みの拡散モデルに容易に適用でき, モーメンタム機構と適応的な更新を利用して逆サンプリング過程を平滑化し安定した生成を実現することで, 出力の品質を向上させる。
ソースコードを利用可能にします。 Diffusion probabilistic models (DPMs) have been shown to generate high-quality images without the need for delicate adversarial training. However, the current sampling process in DPMs is prone to violent shaking. In this paper, we present a novel reverse sampler for DPMs inspired by the widely-used Adam optimizer. Our proposed sampler can be readily applied to a pre-trained diffusion model, utilizing momentum mechanisms and adaptive updating to smooth the reverse sampling process and ensure stable generation, resulting in outputs of enhanced quality. By implicitly reusing update directions from early steps, our proposed sampler achieves a better balance between high-level semantics and low-level details. Additionally, this sampler is flexible and can be easily integrated into pre-trained DPMs regardless of the sampler used during training. Our experimental results on multiple benchmarks demonstrate that our proposed reverse sampler yields remarkable improvements over different baselines. We will make the source code available. | 翻訳日:2023-08-24 15:26:00 公開日:2023-08-23 |
# 複数条件拡散モデルによる音声生成 Audio Generation with Multiple Conditional Diffusion Model ( http://arxiv.org/abs/2308.11940v1 ) ライセンス: Link先を確認 | Zhifang Guo, Jianguo Mao, Rui Tao, Long Yan, Kazushige Ouchi, Hong Liu, Xiangdong Wang | (参考訳) テキストベースの音声生成モデルは、音声中のすべての情報を包含できないため制限があり、テキストのみに依存する場合の制御性を制限する。
オーディオサンプルとデータセットはhttps://conditionaudiogen.github.io/conditionaudiogen/で公開されています。 Text-based audio generation models have limitations as they cannot encompass all the information in audio, leading to restricted controllability when relying solely on text. To address this issue, we propose a novel model that enhances the controllability of existing pre-trained text-to-audio models by incorporating additional conditions including content (timestamp) and style (pitch contour and energy contour) as supplements to the text. This approach achieves fine-grained control over the temporal order, pitch, and energy of generated audio. To preserve the diversity of generation, we employ a trainable control condition encoder that is enhanced by a large language model and a trainable Fusion-Net to encode and fuse the additional conditions while keeping the weights of the pre-trained text-to-audio model frozen. Due to the lack of suitable datasets and evaluation metrics, we consolidate existing datasets into a new dataset comprising the audio and corresponding conditions and use a series of evaluation metrics to evaluate the controllability performance. Experimental results demonstrate that our model successfully achieves fine-grained control to accomplish controllable audio generation. Audio samples and our dataset are publicly available at https://conditionaudiogen.github.io/conditionaudiogen/ | 翻訳日:2023-08-24 15:25:45 公開日:2023-08-23 |
# 回転不変補完ネットワーク Rotation-Invariant Completion Network ( http://arxiv.org/abs/2308.11979v1 ) ライセンス: Link先を確認 | Yu Chen and Pengcheng Shi | (参考訳) 実世界の点群は通常不完全であり、さまざまな姿勢をとる。
本稿では,DPCNet (Dual Pipeline Completion Network) と拡張モジュールの2つの部分から構成されるRotation-Invariant Completion Network (RICNet) を提案する。
実験の結果, RICNet は既存手法に比べて性能が優れていることがわかった。 Real-world point clouds usually suffer from incompleteness and display different poses. While current point cloud completion methods excel in reproducing complete point clouds with consistent poses as seen in the training set, their performance tends to be unsatisfactory when handling point clouds with diverse poses. We propose a network named Rotation-Invariant Completion Network (RICNet), which consists of two parts: a Dual Pipeline Completion Network (DPCNet) and an enhancing module. Firstly, DPCNet generates a coarse complete point cloud. The feature extraction module of DPCNet can extract consistent features, no matter if the input point cloud has undergone rotation or translation. Subsequently, the enhancing module refines the fine-grained details of the final generated point cloud. RICNet achieves better rotation invariance in feature extraction and incorporates structural relationships in man-made objects. To assess the performance of RICNet and existing methods on point clouds with various poses, we applied random transformations to the point clouds in the MVP dataset and conducted experiments on them. Our experiments demonstrate that RICNet exhibits superior completion performance compared to existing methods. | 翻訳日:2023-08-24 15:20:38 公開日:2023-08-23 |
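One elementary way to obtain features that do not change under rotation or translation is to use only pairwise distances. The Python sketch below is a generic illustration of that idea. It is not RICNet's or DPCNet's feature extractor, which the abstract does not detail. The toy point cloud, rotation angle and offset are arbitrary assumptions.

```python
import math
from itertools import combinations

def rotate_z(points, theta):
    """Rotate 3D points by angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def translate(points, offset):
    """Shift every point by the same 3D offset."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in points]

def distance_signature(points):
    """Sorted pairwise distances: unchanged by any rotation or translation."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
moved = translate(rotate_z(cloud, math.pi / 3), (5.0, -1.0, 2.0))

sig_cloud = distance_signature(cloud)
sig_moved = distance_signature(moved)
print(max(abs(a - b) for a, b in zip(sig_cloud, sig_moved)))
```

Sorting also makes the signature independent of point order. However, distances are blind to reflections as well, so a descriptor like this cannot tell an object from its mirror image.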
# より表現力のあるグラフニューラルネットワークは生成タスクを改善するか? Will More Expressive Graph Neural Networks do Better on Generative Tasks? ( http://arxiv.org/abs/2308.11978v1 ) ライセンス: Link先を確認 | Xiandong Zou, Xiangyu Zhao, Pietro Liò, Yiren Zhao | (参考訳) グラフ生成は、与えられたラベルに基づいて、複数のノードとエッジを持つ完全なグラフを予測するため、大きな課題となる。
さらに, 高度なGNNを用いたGCPNとGraphAFが, 提案する分子生成目標 (DRD2, Median1, Median2) において, 変分オートエンコーダやベイズ最適化モデルなど17の非GNNベースのグラフ生成手法と比較して最先端の結果を達成できることを示す。 Graph generation poses a significant challenge as it involves predicting a complete graph with multiple nodes and edges based only on a given label. This task is also of fundamental importance to numerous real-world applications, including de-novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suffer from two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures used in these methods are often underexplored; and (2) these methods are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs in the context of the molecular graph generation task, by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs in two different generative frameworks (GCPN and GraphAF), on six different molecular generative objectives on the ZINC-250k dataset. Through our extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN and GraphAF on molecular generation tasks, but GNN expressiveness is not a necessary condition for a good GNN-based generative model. Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results compared with 17 other non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de-novo molecular design. | 翻訳日:2023-08-24 15:20:22 公開日:2023-08-23 |
# Jaynes-Cummings-Hubbard配列における多体相の研究 Study on many-body phases in Jaynes-Cummings-Hubbard arrays ( http://arxiv.org/abs/2308.11976v1 ) ライセンス: Link先を確認 | Jin-Lou Ma, Bobo Liu, Qing Li, Zexian Guo, Lei Tan, and Lei Ying | (参考訳) 1次元(1D)多体系における乱れは, 多体局在(MBL)や熱化など多彩な相を生じさせる。
本稿では, Jaynes-Cummings-Hubbard (JCH) アレイとして知られる単純なボソン-スピンハイブリッドモデルに基づき, 原子-光子結合強度を変化させながら, 乱れの効果をクリーンな系における現象と比較して調べる。
この研究は、1D JCHモデルにおける多彩な多体相を体系的に明らかにし、乱れのある系とない系の熱化特性の違いを明確にする。 Disorder in one-dimensional (1D) many-body systems gives rise to a rich variety of phases, such as many-body localization (MBL) and thermalization. However, their existence and behavior in hybrid quantum systems remain unclear. Here, based on a simple bosonic-spin hybrid model, also known as the Jaynes-Cummings-Hubbard (JCH) array, we investigate the effect of disorder, compared with the phenomena in the clean system, as the atom-photon coupling strength is varied. By using the level-spacing ratio, entanglement entropy, and the properties of the diagonal and off-diagonal matrix elements of observables, we find that strong disorder leads to an MBL phase in the JCH model that strongly violates the eigenstate thermalization hypothesis (ETH), while a conditional prethermal behavior can exist in the weak-disorder or weak-coupling regime. This conditional prethermal dynamics depends on the choice of initial product states. This work systematically reveals a rich variety of many-body phases in the 1D JCH model and clarifies the discrepancies in the thermalization properties of systems with and without disorder. | 翻訳日:2023-08-24 15:19:52 公開日:2023-08-23 |
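The level-spacing ratio mentioned in the abstract is commonly defined from consecutive gaps $s_n = E_{n+1} - E_n$ as $r_n = \min(s_n, s_{n+1}) / \max(s_n, s_{n+1})$. Its mean is $2\ln 2 - 1 \approx 0.386$ for uncorrelated (Poisson) levels, which is typical of an MBL phase. It is about $0.53$ for GOE statistics, which is typical of an ergodic phase obeying ETH. The Python sketch below computes $\langle r \rangle$ from a list of eigenvalues and checks the Poisson value on synthetic uncorrelated levels. It is a generic diagnostic. Applying it to the JCH model would also require diagonalizing the Hamiltonian within a single symmetry sector (for instance, a fixed total excitation number); that step is assumed here rather than shown.

```python
import math
import random

def mean_gap_ratio(energies):
    """Mean of r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}) over consecutive gaps."""
    levels = sorted(energies)
    gaps = [b - a for a, b in zip(levels, levels[1:])]
    ratios = [min(g1, g2) / max(g1, g2)
              for g1, g2 in zip(gaps, gaps[1:]) if max(g1, g2) > 0]  # skip exact degeneracies
    return sum(ratios) / len(ratios)

POISSON_R = 2 * math.log(2) - 1  # about 0.386, uncorrelated levels (localized)
GOE_R = 0.5307                   # commonly quoted GOE value (ergodic)

# Sorted i.i.d. uniform points have Poisson-like level statistics.
rng = random.Random(0)
poisson_levels = [rng.random() for _ in range(200_000)]
r_poisson = mean_gap_ratio(poisson_levels)
print(round(r_poisson, 3), round(POISSON_R, 3))
```

The ratio needs no unfolding of the spectrum, because the local density of states cancels between neighbouring gaps. This is one reason it is a common choice for MBL studies.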
# コンフォーマル回帰を用いたスコアベース説明手法の近似 Approximating Score-based Explanation Techniques Using Conformal Regression ( http://arxiv.org/abs/2308.11975v1 ) ライセンス: Link先を確認 | Amr Alkhatib, Henrik Boström, Sofiane Ennadir, Ulf Johansson | (参考訳) スコアベースの説明可能な機械学習技術は、ブラックボックスモデルの背後にあるロジックを理解するためにしばしば使用される。
さらに,提案手法では,異なる近似手法の説明を比較し,予測区間がどれだけ情報量に富む(狭い)かに基づいて手法を選択することができる。 Score-based explainable machine-learning techniques are often used to understand the logic behind black-box models. However, such explanation techniques are often computationally expensive, which limits their application in time-critical contexts. Therefore, we propose and investigate the use of computationally less costly regression models for approximating the output of score-based explanation techniques, such as SHAP. Moreover, validity guarantees for the approximated values are provided by the employed inductive conformal prediction framework. We propose several non-conformity measures designed to take the difficulty of approximating the explanations into account while keeping the computational cost low. We present results from a large-scale empirical investigation, in which the approximate explanations generated by our proposed models are evaluated with respect to efficiency (interval size). The results indicate that the proposed method can significantly improve execution time compared to the fast version of SHAP, TreeSHAP. The results also suggest that the proposed method can produce tight intervals, while providing validity guarantees. Moreover, the proposed approach allows for comparing explanations of different approximation methods and selecting a method based on how informative (tight) the predicted intervals are. | 翻訳日:2023-08-24 15:19:30 公開日:2023-08-23 |
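Inductive (split) conformal regression, as named in the abstract, wraps any regressor in three steps:

1. Fit the regressor on a proper training set.
2. Compute nonconformity scores on a separate calibration set.
3. Widen each prediction by the $\lceil (n+1)(1-\alpha) \rceil$-th smallest score.

The Python sketch below uses the plain absolute residual as the nonconformity measure. A toy linear target stands in for an explanation score such as a SHAP value. The paper's own difficulty-aware measures are not reproduced, and all data are synthetic assumptions.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]

rng = random.Random(42)

def make_data(m):
    xs = [rng.uniform(0, 10) for _ in range(m)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]  # synthetic target
    return xs, ys

x_train, y_train = make_data(500)  # proper training set
x_cal, y_cal = make_data(500)      # calibration set
x_test, y_test = make_data(2000)

slope, intercept = fit_line(x_train, y_train)

def predict(x):
    return slope * x + intercept

cal_scores = [abs(y - predict(x)) for x, y in zip(x_cal, y_cal)]  # absolute residuals
q = conformal_quantile(cal_scores, alpha=0.1)

intervals = [(predict(x) - q, predict(x) + q) for x in x_test]
coverage = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, y_test)) / len(y_test)
print(f"half-width q={q:.3f}, empirical coverage={coverage:.3f}")
```

The interval width $2q$ corresponds to the efficiency measure (interval size) in the abstract. Dividing each residual by an estimate of how hard a point is to approximate is the usual way to make such intervals adaptive rather than constant-width.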
# Blending-NeRF: ニューラル放射輝度場におけるテキスト駆動型局所編集 Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields ( http://arxiv.org/abs/2308.11974v1 ) ライセンス: Link先を確認 | Hyeonseop Song, Seokhun Choi, Hoseok Do, Chul Lee, Taehyeong Kim | (参考訳) テキスト駆動による3Dオブジェクトの局所的編集は、元の3Dオブジェクトに意図した新しいオブジェクトやスタイル効果を、オブジェクトの形状を歪ませずに局所的に混合することが容易ではないため、特に困難である。
そこで本研究では,2つのNeRFネットワーク – 事前学習されたNeRFと編集可能なNeRF – で構成される,新しいNeRFベースモデルであるBlending-NeRFを提案する。
Blending-NeRFは様々なテキストプロンプトから自然および局所的に編集された3Dオブジェクトを生成する。 Text-driven localized editing of 3D objects is particularly difficult as locally mixing the original 3D object with the intended new object and style effects without distorting the object's form is not a straightforward process. To address this issue, we propose a novel NeRF-based model, Blending-NeRF, which consists of two NeRF networks: pretrained NeRF and editable NeRF. Additionally, we introduce new blending operations that allow Blending-NeRF to properly edit target regions which are localized by text. By using a pretrained vision-language aligned model, CLIP, we guide Blending-NeRF to add new objects with varying colors and densities, modify textures, and remove parts of the original object. Our extensive experiments demonstrate that Blending-NeRF produces naturally and locally edited 3D objects from various text prompts. | 翻訳日:2023-08-24 15:19:12 公開日:2023-08-23 |
# EVE:Masked PredictionとModality-Aware MoEを用いた高能率ビジョンランゲージ事前トレーニング EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE ( http://arxiv.org/abs/2308.11971v1 ) ライセンス: Link先を確認 | Junyi Chen, Longteng Guo, Jia Sun, Shuai Shao, Zehuan Yuan, Liang Lin, Dongyu Zhang | (参考訳) 多様なマルチモーダルデータから学ぶためのスケーラブルなビジョン言語モデルの構築は、まだ未解決の課題である。
本稿では,1つの統合事前学習タスクのみで事前学習された1つの統合マルチモーダルトランスフォーマであるEVE(Efficient Vision-languagE foundation model)を紹介する。
具体的には、EVEは、Modality-aware sparse Mixture-of-Experts (MoE)モジュールと統合された共有トランスフォーマーネットワーク内の視覚と言語の両方をエンコードする。
その単純さにもかかわらず、EVEは視覚的質問応答、視覚的推論、画像テキスト検索など、様々な視覚言語下流タスクで最先端のパフォーマンスを達成する。 Building scalable vision-language models to learn from diverse, multimodal data remains an open challenge. In this paper, we introduce an Efficient Vision-languagE foundation model, namely EVE, which is one unified multimodal Transformer pre-trained solely by one unified pre-training task. Specifically, EVE encodes both vision and language within a shared Transformer network integrated with modality-aware sparse Mixture-of-Experts (MoE) modules, which capture modality-specific information by selectively switching to different experts. To unify pre-training tasks of vision and language, EVE performs masked signal modeling on image-text pairs to reconstruct masked signals, i.e., image pixels and text tokens, given visible signals. This simple yet effective pre-training objective accelerates training by 3.5x compared to the model pre-trained with Image-Text Contrastive and Image-Text Matching losses. Owing to the combination of the unified architecture and pre-training task, EVE is easy to scale up, enabling better downstream performance with fewer resources and faster training speed. Despite its simplicity, EVE achieves state-of-the-art performance on various vision-language downstream tasks, including visual question answering, visual reasoning, and image-text retrieval. | 翻訳日:2023-08-24 15:18:54 公開日:2023-08-23 |
# 不確実性定量化を伴う肝腫瘍セグメンテーションのための異方性ハイブリッドネットワーク Anisotropic Hybrid Networks for liver tumor segmentation with uncertainty quantification ( http://arxiv.org/abs/2308.11969v1 ) ライセンス: Link先を確認 | Benjamin Lambert, Pauline Roca, Florence Forbes, Senan Doyle and Michel Dojat | (参考訳) 肝腫瘍による負担は大きく、がん死亡原因の第4位を占める。
肝細胞癌 (HCC) の場合, 造影MRI (CE-MRI) 上で肝臓と腫瘍を描出することが治療戦略の指針となる。
どちらのソリューションも肝臓と腫瘍のセグメンテーションに関するMICCAI 2023 Atlasチャレンジに提出された。 The burden of liver tumors is high, as they rank as the fourth leading cause of cancer mortality. In the case of hepatocellular carcinoma (HCC), the delineation of the liver and tumor on contrast-enhanced magnetic resonance imaging (CE-MRI) is performed to guide the treatment strategy. As this task is time-consuming, needs high expertise and could be subject to inter-observer variability, there is a strong need for automatic tools. However, challenges arise from the lack of available training data, as well as the high variability in terms of image resolution and MRI sequence. In this work we propose to compare two different pipelines based on anisotropic models to obtain the segmentation of the liver and tumors. The first pipeline corresponds to a baseline multi-class model that performs the simultaneous segmentation of the liver and tumor classes. In the second approach, we train two distinct binary models, one segmenting the liver only and the other the tumors. Our results show that both pipelines exhibit different strengths and weaknesses. Moreover, we propose an uncertainty quantification strategy allowing the identification of potential false positive tumor lesions. Both solutions were submitted to the MICCAI 2023 Atlas challenge regarding liver and tumor segmentation. | 翻訳日:2023-08-24 15:18:16 公開日:2023-08-23 |
# 反$\mathcal{PT}$対称な虚数結合による可変アハラノフ-ボームケージ Tunable Aharonov-Bohm cages through anti-$\mathcal{PT}$-symmetric imaginary couplings ( http://arxiv.org/abs/2308.11968v1 ) ライセンス: Link先を確認 | S. M. Zhang, H. S. Xu, L. Jin | (参考訳) Aharonov-Bohm (AB)ケージは任意の励起に対して非回折伝播を伴う局所的な閉じ込めを可能にする。
本研究では,一般化されたクロイツラダーに反パリティ時間 (anti-$\mathcal{PT}$) 対称な虚数結合を導入し,フラットバンドエネルギーを調整可能な非エルミートABケージを構成する。
本手法は,より一般的な状況に広く適用でき,物理における局所化の操作を容易にする。 The Aharonov-Bohm (AB) cage enables localized confinement with nondiffractive propagation for arbitrary excitation. In this study, we introduce an anti-parity-time (anti-$\mathcal{PT}$) symmetric imaginary coupling in a generalized Creutz ladder to construct a non-Hermitian AB cage with tunable flat-band energy. We investigate compact localized states and complete localization dynamics, and show that non-Hermiticity affects the localization probability distributions and increases the oscillation period of the AB cage dynamics. Non-Hermitian engineering of the decoupled core of the AB cage is the essential point in our proposal. Our approach is widely applicable to a more general situation and can facilitate the manipulation of localization in physics. | 翻訳日:2023-08-24 15:17:41 公開日:2023-08-23 |
# 移動エージェントに対する支援の価値 Value of Assistance for Mobile Agents ( http://arxiv.org/abs/2308.11961v1 ) ライセンス: Link先を確認 | Adi Amuzig, David Dovrat and Sarah Keren | (参考訳) 移動ロボットエージェントは、時間とエージェントの移動とともに増大する自己位置推定の不確実性に悩まされることが多い。
我々は,提案するVOA指標が有効となる条件を明示し,シミュレーションと実世界の両方のロボット環境において,支援を受けた際のエージェントの平均コスト削減を指標が予測できることを実証的に示す。 Mobile robotic agents often suffer from localization uncertainty which grows with time and with the agents' movement. This can hinder their ability to accomplish their task. In some settings, it may be possible to perform assistive actions that reduce uncertainty about a robot's location. For example, in a collaborative multi-robot system, a wheeled robot can request assistance from a drone that can fly to its estimated location and reveal its exact location on the map or accompany it to its intended location. Since assistance may be costly and limited, and may be requested by different members of a team, there is a need for principled ways to support the decision of which assistance to provide to an agent and when, as well as to decide which agent to help within a team. For this purpose, we propose Value of Assistance (VOA) to represent the expected cost reduction that assistance will yield at a given point of execution. We offer ways to compute VOA based on estimations of the robot's future uncertainty, modeled as a Gaussian process. We specify conditions under which our VOA measures are valid and empirically demonstrate the ability of our measures to predict the agent's average cost reduction when receiving assistance in both simulated and real-world robotic settings. | 翻訳日:2023-08-24 15:17:06 公開日:2023-08-23 |
# 再生的正則化による可塑性の維持 Maintaining Plasticity via Regenerative Regularization ( http://arxiv.org/abs/2308.11958v1 ) ライセンス: Link先を確認 | Saurabh Kumar, Henrik Marklund, Benjamin Van Roy | (参考訳) 連続学習において、可塑性とは、エージェントが新しい情報に迅速に適応できる能力を指す。
本稿では,損失関数に初期パラメータへ向けたL2正則化を組み込むことで可塑性を維持する,非常に簡単な手法L2 Initを提案する。
これは標準的なL2正則化 (L2) とよく似ており、唯一の違いは L2 が原点に向けて正則化する点である。
L2 Initは実装が簡単で、単一のハイパーパラメータのみを選択する必要がある。
連続学習における様々な種類の非定常性を代表する単純な問題において,L2 Initが可塑性の喪失を一貫して軽減することを示す。
さらに、この正則化項はパラメータの大きさを小さくし、高い有効特徴ランクを維持することがわかった。 In continual learning, plasticity refers to the ability of an agent to quickly adapt to new information. Neural networks are known to lose plasticity when processing non-stationary data streams. In this paper, we propose L2 Init, a very simple approach for maintaining plasticity by incorporating L2 regularization toward the initial parameters into the loss function. This is very similar to standard L2 regularization (L2), the only difference being that L2 regularizes toward the origin. L2 Init is simple to implement and requires selecting only a single hyper-parameter. The motivation for this method is the same as that of methods that reset neurons or parameter values. Intuitively, when recent losses are insensitive to particular parameters, these parameters drift toward their initial values. This prepares parameters to adapt quickly to new tasks. On simple problems representative of different types of nonstationarity in continual learning, we demonstrate that L2 Init consistently mitigates plasticity loss. We additionally find that our regularization term reduces parameter magnitudes and maintains a high effective feature rank. | 翻訳日:2023-08-24 15:16:42 公開日:2023-08-23 |
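L2 Init and standard L2 regularization differ only in the anchor of the penalty: $\lambda \lVert \theta - \theta_0 \rVert^2$ versus $\lambda \lVert \theta \rVert^2$. The Python sketch below uses plain lists, a hypothetical two-parameter model, and an assumed $\lambda$ and learning rate; the paper's exact scaling convention is not given here. It shows the intuition from the abstract. When the task gradient is zero for some parameters, L2 Init pulls them back toward their initial values, while standard L2 pulls them toward the origin.

```python
def l2_init_penalty(theta, theta0, lam):
    """lam * ||theta - theta0||^2 (L2 Init); with theta0 = 0 this is plain L2."""
    return lam * sum((t - t0) ** 2 for t, t0 in zip(theta, theta0))

def sgd_step(theta, task_grad, anchor, lam, lr):
    """One SGD step on task_loss + lam * ||theta - anchor||^2."""
    return [t - lr * (g + 2.0 * lam * (t - a))
            for t, g, a in zip(theta, task_grad, anchor)]

theta0 = [0.5, -0.3]    # hypothetical parameters saved at initialization
drifted = [2.0, 1.0]    # hypothetical parameters after earlier tasks
zero_grad = [0.0, 0.0]  # recent losses insensitive to these parameters

theta_init, theta_plain = drifted, drifted
for _ in range(200):
    # each step contracts the distance to the anchor by (1 - 2 * lr * lam) = 0.9
    theta_init = sgd_step(theta_init, zero_grad, theta0, lam=0.5, lr=0.1)
    theta_plain = sgd_step(theta_plain, zero_grad, [0.0, 0.0], lam=0.5, lr=0.1)

print(theta_init)   # pulled back toward theta0
print(theta_plain)  # pulled toward the origin
```

In a real network, `theta0` would be a frozen copy of the weights taken at initialization. The penalty would then be added to the task loss before backpropagation.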
# 連続時間ハイブリッド量子回路モデルにおける純粋化ダイナミクス Purification Dynamics in a Continuous-time Hybrid Quantum Circuit Model ( http://arxiv.org/abs/2308.12003v1 ) ライセンス: Link先を確認 | Sebastian Leontica and Max McGinley | (参考訳) 本稿では,無限小のランダムユニタリ演算と射影測定を組み合わせた,多体量子ダイナミクスの連続時間モデルを導入する。
このモデルにおいて, 系が混合状態で初期化され, 測定の結果として時間とともに純粋化していく純粋化ダイナミクスを考える。
以上の結果は, 系のサイズに対して指数関数的な時間スケールで純粋化が起こる相と, 定数の時間スケールで純粋化が起こる相という, 2つの異なる動的相の存在を裏付ける。
この微視的モデルに対する解析式と,これらの測定誘起相転移を捉えた場理論の結果を比較し,両者の定量的一致を求める。 We introduce a continuous time model of many-body quantum dynamics based on infinitesimal random unitary operations, combined with projective measurements. We consider purification dynamics in this model, where the system is initialized in a mixed state, which then purifies over time as a result of the measurements. By mapping our model to a family of effective 1D quantum Hamiltonians, we are able to derive analytic expressions that capture how the entropy of the system decays in time. Our results confirm the existence of two distinct dynamical phases, where purification occurs over a timescale that is exponential vs. constant in system size. We compare our analytic expressions for this microscopic model to results derived from field theories that are expected to capture such measurement-induced phase transitions, and find quantitative agreement between the two. | 翻訳日:2023-08-24 15:08:17 公開日:2023-08-23 |
# 磁気ヒステリシスモデリングのためのニューラルオシレータ Neural oscillators for magnetic hysteresis modeling ( http://arxiv.org/abs/2308.12002v1 ) ライセンス: Link先を確認 | Abhishek Chandra, Taniya Kapoor, Bram Daniels, Mitrofan Curti, Koen Tiels, Daniel M. Tartakovsky, Elena A. Lomonova | (参考訳) ヒステリシスは科学や工学においてユビキタスな現象であり、そのモデリングと識別は様々なシステムの振る舞いを理解し最適化するために重要である。
本研究は、従来のレート依存手法では固有の非線形性を捉えきれない磁性材料の複雑なヒステリシスパターンを捉える上で、ニューラルオシレータが従来のRNNベースの手法より優れていることを示す。 Hysteresis is a ubiquitous phenomenon in science and engineering; its modeling and identification are crucial for understanding and optimizing the behavior of various systems. We develop an ordinary differential equation-based recurrent neural network (RNN) approach to model and quantify the hysteresis, which manifests itself in sequentiality and history-dependence. Our neural oscillator, HystRNN, draws inspiration from coupled-oscillatory RNN and phenomenological hysteresis models to update the hidden states. HystRNN is evaluated on predicting generalized scenarios involving first-order reversal curves and minor loops. The findings show the ability of HystRNN to generalize its behavior to previously untrained regions, an essential feature that hysteresis models must have. This research highlights the advantage of neural oscillators over the traditional RNN-based methods in capturing complex hysteresis patterns in magnetic materials, where traditional rate-dependent methods are inadequate to capture intrinsic nonlinearity. | 翻訳日:2023-08-24 15:08:03 公開日:2023-08-23 |
# 画像品質評価のための効率的な変圧器適応を考慮した局所歪み認識 Local Distortion Aware Efficient Transformer Adaptation for Image Quality Assessment ( http://arxiv.org/abs/2308.12001v1 ) ライセンス: Link先を確認 | Kangmin Xu, Liang Liao, Jing Xiao, Chaofeng Chen, Haoning Wu, Qiong Yan, Weisi Lin | (参考訳) 画像品質評価(IQA)はコンピュータビジョンの分野における基本的な課題であるが、複雑な歪み条件、多様な画像の内容、データの可用性の制限により未解決の課題である。
抽出器とインジェクタのみを訓練することにより,提案手法は強力な基礎モデルの豊富な知識を生かし,IQAデータセットの最先端性能を達成し,IQAが低レベル問題であるだけでなく,大規模事前学習モデルから引き出されたより強力な高レベル特徴の恩恵を受けることを示す。 Image Quality Assessment (IQA) constitutes a fundamental task within the field of computer vision, yet it remains an unresolved challenge, owing to the intricate distortion conditions, diverse image contents, and limited availability of data. Recently, the community has witnessed the emergence of numerous large-scale pretrained foundation models, which greatly benefit from dramatically increased data and parameter capacities. However, it remains an open problem whether the scaling law in high-level tasks is also applicable to the IQA task, which is closely related to low-level cues. In this paper, we demonstrate that with proper injection of local distortion features, a larger pretrained and fixed foundation model performs better in IQA tasks. Specifically, to compensate for the vision transformer's (ViT) lack of local distortion structure and inductive bias, alongside the large-scale pretrained ViT we use another pretrained convolutional neural network (CNN), which is well known for capturing the local structure, to extract multi-scale image features. Further, we propose a local distortion extractor to obtain local distortion features from the pretrained CNN and a local distortion injector to inject the local distortion features into ViT. By only training the extractor and injector, our method can benefit from the rich knowledge in the powerful foundation models and achieve state-of-the-art performance on popular IQA datasets, indicating that IQA is not only a low-level problem but also benefits from stronger high-level features drawn from large-scale pretrained models. | 翻訳日:2023-08-24 15:07:44 公開日:2023-08-23 |
# 固定予算の2腕バンディットにおける最良腕識別のための一様最適アルゴリズムについて On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget ( http://arxiv.org/abs/2308.12000v1 ) ライセンス: Link先を確認 | Po-An Wang, Kaito Ariu, Alexandre Proutiere | (参考訳) ベルヌーイ報酬をもつ確率的2腕バンディットにおける、固定予算での最良腕識別問題を研究する。
(ii) このアルゴリズムを少なくとも1つのインスタンスで厳密に上回る。
この結果は, \cite{qin2022open} で提示された2つの未解決問題に対する解答を与える。 We study the problem of best-arm identification with a fixed budget in stochastic two-armed bandits with Bernoulli rewards. We prove that, surprisingly, there is no algorithm that (i) performs as well as the algorithm sampling each arm equally (referred to as the uniform sampling algorithm) on all instances, and that (ii) strictly outperforms this algorithm on at least one instance. In short, there is no algorithm better than the uniform sampling algorithm. Towards this result, we introduce the natural class of consistent and stable algorithms, and show that any algorithm that performs as well as the uniform sampling algorithm on all instances belongs to this class. The proof is completed by deriving a lower bound on the error rate satisfied by any consistent and stable algorithm, and by showing that the uniform sampling algorithm matches this lower bound. Our results provide a solution to the two open problems presented in \cite{qin2022open}. | 翻訳日:2023-08-24 15:07:16 公開日:2023-08-23 |
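The uniform sampling algorithm discussed above spends the fixed budget equally on both arms. The Python sketch below assumes that it then recommends the arm with the higher empirical mean, breaking ties at random; the abstract does not spell out the recommendation rule. It estimates the error rate by Monte Carlo on a hypothetical instance with means 0.6 and 0.4.

```python
import random

def uniform_sampling_bai(means, budget, rng):
    """Split the budget equally between two Bernoulli arms and recommend the
    arm with more successes (equal pulls, so this compares empirical means);
    ties are broken uniformly at random."""
    pulls = budget // 2
    wins = [sum(rng.random() < mu for _ in range(pulls)) for mu in means]
    if wins[0] == wins[1]:
        return rng.randrange(2)
    return 0 if wins[0] > wins[1] else 1

def error_rate(means, budget, runs=2000, seed=0):
    """Monte Carlo estimate of P(recommended arm is not the best arm)."""
    rng = random.Random(seed)
    best = max(range(2), key=lambda a: means[a])
    mistakes = sum(uniform_sampling_bai(means, budget, rng) != best for _ in range(runs))
    return mistakes / runs

for budget in (20, 100, 400):
    print(budget, error_rate((0.6, 0.4), budget))
```

The error rate shrinks as the budget grows. The abstract's claim is that no algorithm can match this rule on every instance while strictly beating it on at least one.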
# Topical-Chat: 知識に基づくオープンドメイン会話に向けて Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations ( http://arxiv.org/abs/2308.11995v1 ) ライセンス: Link先を確認 | Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, Dilek Hakkani-Tur | (参考訳) 人間と深く魅力的なオープンドメイン会話ができるソーシャルボットの構築は、人工知能(AI)の大きな課題の一つである。
我々は,オープンドメイン会話型AIのさらなる研究を支援するため,知識に基づく人間同士の対話データセットであるTopical-Chatを導入する。その基盤となる知識は8つの幅広いトピックにまたがり,会話の参加者には明示的に定義された役割がない。
また,Topical-Chat上で最先端のエンコーダ・デコーダ型会話モデルを複数訓練し,ベンチマークのために自動評価と人手評価を行う。 Building socialbots that can have deep, engaging open-domain conversations with humans is one of the grand challenges of artificial intelligence (AI). To this end, bots need to be able to leverage world knowledge spanning several domains effectively when conversing with humans who have their own world knowledge. Existing knowledge-grounded conversation datasets are primarily stylized with explicit roles for conversation partners. These datasets also do not explore depth or breadth of topical coverage with transitions in conversations. We introduce Topical-Chat, a knowledge-grounded human-human conversation dataset where the underlying knowledge spans 8 broad topics and conversation partners don't have explicitly defined roles, to help further research in open-domain conversational AI. We also train several state-of-the-art encoder-decoder conversational models on Topical-Chat and perform automated and human evaluation for benchmarking. | 翻訳日:2023-08-24 15:06:59 公開日:2023-08-23 |
# プログレッシブ特徴マイニングと外部知識支援テキストペデストリアン画像検索 Progressive Feature Mining and External Knowledge-Assisted Text-Pedestrian Image Retrieval ( http://arxiv.org/abs/2308.11994v1 ) ライセンス: Link先を確認 | Huafeng Li, Shedan Yang, Yafei Zhang, Dapeng Tao, Zhengtao Yu | (参考訳) Text-Pedestrian Image Retrievalは、歩行者の外観を記述するテキストを使用して、対応する歩行者画像を取得することを目的としている。
そこで本研究では, プログレッシブ機能マイニングと外部知識支援機能浄化手法を提案する。
3つの難易度の高いデータセットに対する大規模な実験により提案手法の有効性と優位性を実証し,大規模データセットにおいては大規模モデルベースの手法をも上回る検索性能を示した。 Text-Pedestrian Image Retrieval aims to use the text describing pedestrian appearance to retrieve the corresponding pedestrian image. This task involves not only modality discrepancy, but also the challenge of the textual diversity of pedestrians with the same identity. Although progress has been made in text-pedestrian image retrieval, existing methods do not comprehensively consider these problems. To address them, this paper proposes a progressive feature mining and external knowledge-assisted feature purification method. Specifically, we use a progressive mining mode to enable the model to mine discriminative features from neglected information, thereby avoiding the loss of discriminative information and improving the expressive ability of features. In addition, to further reduce the negative impact of modality discrepancy and text diversity on cross-modal matching, we propose to use the knowledge of other samples of the same modality, i.e., external knowledge, to enhance identity-consistent features and weaken identity-inconsistent features. This process purifies features and alleviates the interference caused by textual diversity and by the correlated features of negative samples from the same modality. Extensive experiments on three challenging datasets demonstrate the effectiveness and superiority of the proposed method, and the retrieval performance even surpasses that of the large-scale model-based method on large-scale datasets. | 翻訳日:2023-08-24 15:06:43 公開日:2023-08-23 |
# 前立腺癌病理における病理医のデジタルツインとしての人工知能の批判的評価 Critical Evaluation of Artificial Intelligence as Digital Twin of Pathologist for Prostate Cancer Pathology ( http://arxiv.org/abs/2308.11992v1 ) ライセンス: Link先を確認 | Okyaz Eminaga, Mahmoud Abbas, Christian Kunder, Yuri Tolkach, Ryan Han, James D. Brooks, Rosalie Nolley, Axel Semjonow, Martin Boegemann, Robert West, Jin Long, Richard Fan, Olaf Bettendorf | (参考訳) 前立腺癌の病理診断は臨床管理において重要な役割を担っているが、時間を要する。
人工知能(AI)は、前立腺癌の検出とグレーディングパターンの判定において有望である。
特に、導管型、篩状構造、神経、血管、リンパ球浸潤などの補完的な組織学的特徴の同定において、中程度から相当程度の一致が認められた。
しかし、腫瘍グレーディングの一致度は、生検コア (kappa = 0.70) と比較して前立腺摘出標本 (kappa = 0.44) では低下した。
二次グリソンパターンの判定閾値を5%から10%に調整すると、前立腺摘出標本の腫瘍グレーディングにおける病理医とvPathoの一致度が改善した(kappa: 0.44から0.64)。
特に、vPathoとのグレード不一致は、日常臨床のグレーディングに携わる6人の病理医のいずれかに特有のものではなかった。
このアプローチは、AI導入の限界と、前立腺癌病理の現在の段階付けシステムを明らかにするのに役立つ。 Prostate cancer pathology plays a crucial role in clinical management but is time-consuming. Artificial intelligence (AI) shows promise in detecting prostate cancer and grading patterns. We tested an AI-based digital twin of a pathologist, vPatho, on 2,603 histology images of prostate tissue stained with hematoxylin and eosin. We analyzed various factors influencing tumor-grade disagreement between vPatho and six human pathologists. Our results demonstrated that vPatho achieved comparable performance in prostate cancer detection and tumor volume estimation, as reported in the literature. Concordance levels between vPatho and human pathologists were examined. Notably, moderate to substantial agreement was observed in identifying complementary histological features such as ductal, cribriform, nerve, blood vessels, and lymph cell infiltrations. However, concordance in tumor grading showed a decline when applied to prostatectomy specimens (kappa = 0.44) compared to biopsy cores (kappa = 0.70). Adjusting the decision threshold for the secondary Gleason pattern from 5% to 10% improved the concordance level between pathologists and vPatho for tumor grading on prostatectomy specimens (kappa from 0.44 to 0.64). Potential causes of grade discordance included the vertical extent of tumors toward the prostate boundary and the proportions of slides with prostate cancer. Gleason pattern 4 was particularly associated with discordance. Notably, grade discordance with vPatho was not specific to any of the six pathologists involved in routine clinical grading. In conclusion, our study highlights the potential utility of AI in developing a digital twin of a pathologist. This approach can help uncover limitations in AI adoption and the current grading system for prostate cancer pathology. | 翻訳日:2023-08-24 15:06:20 公開日:2023-08-23 |
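The concordance figures in this abstract (kappa = 0.44, 0.64, 0.70) are Cohen's kappa values, which correct raw agreement for the agreement expected by chance. A minimal sketch of unweighted Cohen's kappa for two raters follows; the study may have used a weighted variant or another implementation, which the abstract does not specify.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is
        # undefined here, and 1.0 is returned by convention.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, two raters who agree on 3 of 4 binary labels, with marginals (2, 2) and (1, 3), reach 75% raw agreement but a kappa of only 0.5, because half of the agreement is expected by chance.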
# 関係概念に基づくモデル Relational Concept Based Models ( http://arxiv.org/abs/2308.11991v1 ) ライセンス: Link先を確認 | Pietro Barbiero, Francesco Giannini, Gabriele Ciravegna, Michelangelo Diligenti, Giuseppe Marra | (参考訳) 概念ベースモデル(CBM)のような解釈可能なディープラーニング手法は、関係的な問題を解決するために設計されていないが、リレーショナルモデルはCBMほど解釈できない。
(ii) 定量化された概念ベースの説明の生成を支援すること。
(iv) 分布外のシナリオ、限られた訓練データ、概念の教師信号の不足など、厳しい設定にも耐えること。 The design of interpretable deep learning models working in relational domains poses an open challenge: interpretable deep learning methods, such as Concept-Based Models (CBMs), are not designed to solve relational problems, while relational models are not as interpretable as CBMs. To address this problem, we propose Relational Concept-Based Models, a family of relational deep learning methods providing interpretable task predictions. Our experiments, ranging from image classification to link prediction in knowledge graphs, show that relational CBMs (i) match the generalization performance of existing relational black-boxes (as opposed to non-relational CBMs), (ii) support the generation of quantified concept-based explanations, (iii) effectively respond to test-time interventions, and (iv) withstand demanding settings including out-of-distribution scenarios, limited training data regimes, and scarce concept supervisions. | 翻訳日:2023-08-24 15:05:51 公開日:2023-08-23 |
# RankMixup: ネットワークキャリブレーションのためのランキングベースのMixup学習 RankMixup: Ranking-Based Mixup Training for Network Calibration ( http://arxiv.org/abs/2308.11990v1 ) ライセンス: Link先を確認 | Jongyoun Noh, Hyekang Park, Junghyup Lee and Bumsub Ham | (参考訳) ネットワークキャリブレーションは信頼度を正確に推定することを目的としており、これは実世界のシステムでディープニューラルネットワークを利用する上で特に重要である。
本論文では、ネットワークは拡張されたサンプルよりも元のサンプルに対して高い信頼度を推定すべきであるという仮説を立てる(図1)。
この考えを実現するため、mixupベースのランキング損失(MRL)を導入し、ランキング関係を維持しながら、拡張サンプルに対する信頼度を元のサンプルより低くするよう促す。
ネットワークキャリブレーションの標準ベンチマークにおける広範な実験結果が、RankMixupの有効性を示している。 Network calibration aims to accurately estimate the level of confidence, which is particularly important for employing deep neural networks in real-world systems. Recent approaches leverage mixup to calibrate the network's predictions during training. However, they do not consider the problem that mixtures of labels in mixup may not accurately represent the actual distribution of augmented samples. In this paper, we present RankMixup, a novel mixup-based framework alleviating the problem of the mixture of labels for network calibration. To this end, we propose to use an ordinal ranking relationship between raw and mixup-augmented samples as an alternative supervisory signal to the label mixtures for network calibration. We hypothesize that the network should estimate a higher level of confidence for the raw samples than the augmented ones (Fig. 1). To implement this idea, we introduce a mixup-based ranking loss (MRL) that encourages lower confidences for augmented samples compared to raw ones, maintaining the ranking relationship. We also propose to leverage the ranking relationship among multiple mixup-augmented samples to further improve the calibration capability. Augmented samples with larger mixing coefficients are expected to have higher confidences and vice versa (Fig. 1). That is, the order of confidences should be aligned with that of mixing coefficients. To this end, we introduce a novel loss, M-NDCG, in order to reduce the number of misaligned pairs of the coefficients and confidences. Extensive experimental results on standard benchmarks for network calibration demonstrate the effectiveness of RankMixup. | 翻訳日:2023-08-24 15:05:38 公開日:2023-08-23 |
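The two ranking ideas in this abstract can be illustrated in plain Python: a hinge-style penalty when a mixed sample is not less confident than its raw counterpart, and a count of the (mixing coefficient, confidence) pairs whose order disagrees, which is the quantity M-NDCG is described as reducing. This is a sketch under assumptions: the hinge form, the margin of 0.1 and all function names are illustrative and do not reproduce the paper's exact MRL or M-NDCG definitions.

```python
def mixup(x1, x2, lam):
    """Convex combination of two inputs given as lists of floats."""
    return [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]


def mixup_ranking_loss(conf_raw, conf_mix, margin=0.1):
    """Hinge penalty unless the mixed sample is at least `margin` less
    confident than its raw counterpart (assumed form, not the paper's)."""
    return max(0.0, conf_mix - conf_raw + margin)


def misaligned_pairs(mix_coeffs, confidences):
    """Count pairs whose confidence order disagrees with their
    mixing-coefficient order (larger coefficient should mean higher confidence)."""
    count = 0
    n = len(mix_coeffs)
    for i in range(n):
        for j in range(i + 1, n):
            if (mix_coeffs[i] - mix_coeffs[j]) * (confidences[i] - confidences[j]) < 0:
                count += 1
    return count
```

For instance, coefficients `[0.9, 0.7, 0.5]` with confidences `[0.8, 0.85, 0.4]` contain exactly one misaligned pair: the sample mixed at 0.7 is more confident than the one mixed at 0.9.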
# マルチモーダル・マルチタスク(3MT)道路セグメンテーション Multi-Modal Multi-Task (3MT) Road Segmentation ( http://arxiv.org/abs/2308.11983v1 ) ライセンス: Link先を確認 | Erkan Milli, Özgür Erkent, Asım Egemen Yılmaz | (参考訳) マルチモーダルシステムは、シーンの異なる側面を知覚できるため、道路検出において単一モダリティのシステムよりも信頼性の高い結果を生み出す能力を持つ。
また,LiDARカメラと同期して収集・キャリブレーションされたIMU/GNSS (inertial measurement unit/global navigation satellite system) 慣性航法システムのデータを用いて,集約された密なLiDAR深度画像を計算する。
また、生のLiDARデータが利用できないCityscapesにおいても、提案手法の性能を示す。
すべての実験で得られた推論時間は、リアルタイムでの利用に向けて非常に有望である。 Multi-modal systems can produce more reliable road-detection results than single-modality systems because they perceive different aspects of the scene. Instead of leveraging architectures that require high pre-processing costs, such as surface normals or dense depth predictions, as is typically done in many SOTA works, we focus on using raw sensor inputs. By using raw sensor inputs, we aim to utilize a low-cost model that minimizes both the pre-processing and model computation costs. This study presents a cost-effective and highly accurate solution for road segmentation by integrating data from multiple sensors within a multi-task learning architecture. A fusion architecture is proposed in which RGB and LiDAR depth images constitute the inputs of the network. Another contribution of this study is the use of an IMU/GNSS (inertial measurement unit/global navigation satellite system) inertial navigation system, whose data is collected synchronously and calibrated with a LiDAR-camera, to compute aggregated dense LiDAR depth images. Experiments on the KITTI dataset demonstrate that the proposed method offers fast and high-performance solutions. We have also shown the performance of our method on Cityscapes, where raw LiDAR data is not available. The segmentation results obtained for both full- and half-resolution images are competitive with existing methods. Therefore, we conclude that our method is not dependent only on raw LiDAR data; rather, it can be used with different sensor modalities. The inference times obtained in all experiments are very promising for real-time applications. | 翻訳日:2023-08-24 15:05:15 公開日:2023-08-23 |
# リストコンテキスト情報を用いた粗から細へのニューラル検索器によるパッセージのリランキング Reranking Passages with Coarse-to-Fine Neural Retriever using List-Context Information ( http://arxiv.org/abs/2308.12022v1 ) ライセンス: Link先を確認 | Hongyin Zhu | (参考訳) パッセージのリランキングは、多くのアプリケーション、特に大規模な文書を扱う場合に重要なタスクである。
本稿では,他の候補からのlist-context情報を取り込むことにより,パッセージ表現を増強するlist-context attentionメカニズムを提案する。
提案手法の有効性を示す実験を行った。 Passage reranking is a crucial task in many applications, particularly when dealing with large-scale documents. Traditional neural architectures are limited in retrieving the best passage for a question because they usually match the question to each passage separately, seldom considering contextual information in other passages that can provide comparison and reference information. This paper presents a list-context attention mechanism to augment the passage representation by incorporating the list-context information from other candidates. The proposed coarse-to-fine (C2F) neural retriever addresses the out-of-memory limitation of the passage attention mechanism by dividing the list-context modeling process into two sub-processes, allowing for efficient encoding of context information from a large number of candidate answers. This method can be generally used to encode context information from any number of candidate answers in one pass. Different from most multi-stage information retrieval architectures, this model integrates the coarse and fine rankers into the joint optimization process, allowing for feedback between the two layers to update the model simultaneously. Experiments demonstrate the effectiveness of the proposed approach. | 翻訳日:2023-08-24 14:59:22 公開日:2023-08-23 |
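The list-context idea, letting each candidate passage see the other candidates before it is scored, can be illustrated with a single-head dot-product attention sketch in plain Python. This is a simplification under assumptions: there are no learned projections, the context vector is simply added to the passage vector, and the coarse-to-fine (C2F) split the paper uses to avoid out-of-memory errors is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def list_context_attention(passages):
    """Augment each passage vector with an attention-weighted summary of all candidates.

    passages: list of equal-length float vectors, one per candidate passage.
    """
    d = len(passages[0])
    scale = 1.0 / math.sqrt(d)
    augmented = []
    for q in passages:
        # Scaled dot-product score of this passage against every candidate.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in passages]
        weights = softmax(scores)
        # Attention-weighted average of the candidate vectors.
        context = [sum(w * k[t] for w, k in zip(weights, passages)) for t in range(d)]
        # Residual combination: original representation plus list context.
        augmented.append([qi + ci for qi, ci in zip(q, context)])
    return augmented
```

Because every passage attends to every candidate, the cost of this naive version grows quadratically with the number of candidates, which is the kind of memory pressure the C2F decomposition is described as addressing.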
# ハイゼンベルク模型のスケーラブルな量子基底状態準備: 変分量子固有解法アプローチ Scalable Quantum Ground State Preparation of the Heisenberg Model: A Variational Quantum Eigensolver Approach ( http://arxiv.org/abs/2308.12020v1 ) ライセンス: Link先を確認 | Jinao Wang, Rimika Jaiswal | (参考訳) 量子システムは歴史的に、特にシステムのサイズが大きくなるにつれて、古典的な計算手法を用いてシミュレートすることが難しい。
変分量子固有解法 (VQE) アルゴリズムは、変分パラメータを反復的に最適化することでハイゼンベルク基底状態を効率的に作成できる古典最適化器と同様に量子回路からなるシステムである。
ハイゼンベルク模型の基底状態の準備において、本論文はより効率的な量子アルゴリズムへの道を開き、凝縮物質物理学の幅広い分野に寄与する。 Quantum systems have historically been formidable to simulate using classical computational methods, particularly as the system size grows. The Heisenberg Model, pivotal in understanding magnetic materials, is a quintessential example where classical simulations face scalability issues. The Variational Quantum Eigensolver (VQE) algorithm is a system composed of a quantum circuit as well as a classical optimizer that can efficiently prepare the Heisenberg ground state by iteratively optimizing the variational parameters. We assess the efficacy and scalability of VQE by preparing the ground states of isotropic and anisotropic Heisenberg models. This paper also aims to provide insights into the precision and time consumption involved in classical and optimized sampling approaches in the calculation of expectation values. In preparing the ground state for the Heisenberg models, this paper paves the way for more efficient quantum algorithms and contributes to the broader field of condensed matter physics. | 翻訳日:2023-08-24 14:59:00 公開日:2023-08-23 |
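To make the VQE loop concrete, the following NumPy sketch prepares the ground state of the smallest case, a two-site Heisenberg model $H = J_x XX + J_y YY + J_z ZZ$ with $J_x = J_y = J_z = 1$ (isotropic), whose exact ground-state energy is $-3$ (the singlet). It simulates the state vector classically rather than running on a quantum device, and the one-parameter ansatz $\cos\theta\,|01\rangle - \sin\theta\,|10\rangle$, the finite-difference gradient and the learning rate are assumptions chosen for illustration, not the circuit or optimizer used in the paper.

```python
import numpy as np

# Single-qubit Pauli matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def heisenberg_2q(jx=1.0, jy=1.0, jz=1.0):
    """Two-site Heisenberg Hamiltonian H = Jx XX + Jy YY + Jz ZZ."""
    return jx * np.kron(X, X) + jy * np.kron(Y, Y) + jz * np.kron(Z, Z)


def ansatz(theta):
    """Hypothetical one-parameter ansatz cos(theta)|01> - sin(theta)|10>."""
    psi = np.zeros(4, dtype=complex)
    psi[1] = np.cos(theta)   # basis index 1 is |01>
    psi[2] = -np.sin(theta)  # basis index 2 is |10>
    return psi


def energy(theta, h):
    """Expectation value <psi(theta)|H|psi(theta)> (the 'quantum' step, simulated)."""
    psi = ansatz(theta)
    return float(np.real(np.conj(psi) @ h @ psi))


def vqe(h, theta0=0.1, lr=0.1, steps=200, eps=1e-4):
    """Classical outer loop: gradient descent with central finite differences."""
    theta = theta0
    for _ in range(steps):
        grad = (energy(theta + eps, h) - energy(theta - eps, h)) / (2 * eps)
        theta -= lr * grad
    return theta, energy(theta, h)
```

For this ansatz the energy is $E(\theta) = -1 - 2\sin 2\theta$, so gradient descent converges to $\theta = \pi/4$, where the state is the singlet. Larger chains need a multi-qubit ansatz and expectation values estimated from measurement samples, which is where the sampling cost discussed in the abstract arises.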
# バイアス認識最小化: プライベートSGDにおける推定量バイアスの理解と緩和 Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD ( http://arxiv.org/abs/2308.12018v1 ) ライセンス: Link先を確認 | Moritz Knolle, Robert Dorfman, Alexander Ziller, Daniel Rueckert and Georgios Kaissis | (参考訳) 差分プライベートSGD(DP-SGD)は、機密データセットに対して機械学習を安全かつ責任ある形で適用できるようにすると期待されている。
しかし、DP-SGD はミニバッチ勾配のバイアスを含むノイズ付き推定しか提供しない。
最後に、BAMがバイアスを減らすだけでなく、CIFAR-10、CIFAR-100、ImageNet-32データセットにおけるプライバシーと有用性のトレードオフを大幅に改善するという実証的な証拠を示す。 Differentially private SGD (DP-SGD) holds the promise of enabling the safe and responsible application of machine learning to sensitive datasets. However, DP-SGD only provides a biased, noisy estimate of a mini-batch gradient. This renders optimisation steps less effective and limits model utility as a result. With this work, we show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD. Here, we propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias. We show how to efficiently compute quantities needed for BAM to scale to large neural networks and highlight similarities to closely related methods such as Sharpness-Aware Minimisation. Finally, we provide empirical evidence that BAM not only reduces bias but also substantially improves privacy-utility trade-offs on the CIFAR-10, CIFAR-100, and ImageNet-32 datasets. | 翻訳日:2023-08-24 14:58:45 公開日:2023-08-23 |
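The bias that BAM targets originates in the clipping step of DP-SGD. The NumPy sketch below implements the standard DP-SGD aggregation (clip each per-sample gradient to L2 norm C, sum, add Gaussian noise with standard deviation σC, average) and shows that, with the noise switched off, the result differs from the true mean gradient whenever some per-sample norms exceed C. It does not implement BAM itself; the function name and arguments are hypothetical.

```python
import numpy as np


def dp_sgd_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Standard DP-SGD aggregation of per-sample gradients.

    Clips each row to L2 norm `clip_norm`, sums, adds Gaussian noise with
    standard deviation noise_multiplier * clip_norm, and divides by the batch size.
    """
    grads = np.asarray(per_sample_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Per-sample scale factor min(1, C / ||g||), guarding against zero norms.
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * factors
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(grads)
```

For example, with per-sample gradients `[3, 4]` and `[0, 1]`, C = 1 and no noise, the private estimate is `[0.3, 0.9]` while the true mean is `[1.5, 2.5]`. Because the noise has zero mean, this gap is estimator bias rather than variance, and it depends on how far per-sample gradient norms exceed C.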
# ノイズバウンディングボックスを用いた物体検出のための分布認識キャリブレーション Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes ( http://arxiv.org/abs/2308.12017v1 ) ライセンス: Link先を確認 | Donghao Zhou, Jialin Li, Jinpeng Li, Jiancheng Huang, Qiang Nie, Yong Liu, Bin-Bin Gao, Qiong Wang, Pheng-Ann Heng, Guangyong Chen | (参考訳) 大規模な注釈付きデータセットは、効果的なオブジェクト検出器のトレーニングにおいて非常に重要である。
本研究では, 真の正解ボックスは通常, ノイズを含む正解ボックスに割り当てられた提案(proposal)の集約領域に位置するという観察に基づき, 教師信号を較正するために提案の空間分布をモデル化する分布認識キャリブレーション(DISCO)を提案する。
大規模なノイズを含む画像データセット(Pascal VOCおよびMS-COCO)に対する広範な実験により、DISCOが特に高いノイズレベルにおいて最先端の検出性能を達成できることが示された。 Large-scale well-annotated datasets are of great importance for training an effective object detector. However, obtaining accurate bounding box annotations is laborious and demanding. Unfortunately, the resultant noisy bounding boxes could corrupt supervision signals and thus diminish detection performance. Motivated by the observation that the real ground-truth is usually situated in the aggregation region of the proposals assigned to a noisy ground-truth, we propose DIStribution-aware CalibratiOn (DISCO) to model the spatial distribution of proposals for calibrating supervision signals. In DISCO, spatial distribution modeling is performed to statistically extract the potential locations of objects. Based on the modeled distribution, three distribution-aware techniques, i.e., distribution-aware proposal augmentation (DA-Aug), distribution-aware box refinement (DA-Ref), and distribution-aware confidence estimation (DA-Est), are developed to improve classification, localization, and interpretability, respectively. Extensive experiments on large-scale noisy image datasets (i.e., Pascal VOC and MS-COCO) demonstrate that DISCO can achieve state-of-the-art detection performance, especially at high noise levels. | 翻訳日:2023-08-24 14:58:25 公開日:2023-08-23 |
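The core intuition, that the true box lies where the proposals assigned to a noisy box concentrate, can be illustrated with a score-weighted mean and spread of proposal coordinates. This Python sketch is a hypothetical stand-in for DISCO's spatial distribution modeling and its DA-Ref step: the weighting by proposal scores and the linear blend controlled by `alpha` are assumptions, not the paper's formulas.

```python
def proposal_distribution(boxes, scores):
    """Score-weighted mean and standard deviation of proposal coordinates.

    boxes: list of (x1, y1, x2, y2); scores: non-negative weights, one per box.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    weights = [s / total for s in scores]
    mean = [sum(w * b[i] for w, b in zip(weights, boxes)) for i in range(4)]
    std = [
        sum(w * (b[i] - mean[i]) ** 2 for w, b in zip(weights, boxes)) ** 0.5
        for i in range(4)
    ]
    return mean, std


def refine_box(noisy_box, mean, alpha=0.5):
    """Hypothetical refinement: move the noisy ground-truth box toward the proposal mean."""
    return [(1 - alpha) * g + alpha * m for g, m in zip(noisy_box, mean)]
```

The per-coordinate standard deviation is a natural uncertainty signal: a large spread suggests the object's location is ambiguous, which is the kind of information a distribution-aware confidence estimate could use.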
# MKL-$L_{0/1}$-SVM MKL-$L_{0/1}$-SVM ( http://arxiv.org/abs/2308.12016v1 ) ライセンス: Link先を確認 | Bin Zhu and Yijie Shi | (参考訳) 本稿では,$(0, 1)$損失関数を持つサポートベクターマシン(svm)のためのマルチカーネル学習(mkl)フレームワークを提案する。
合成データセットおよび実データセットに対する広範な数値実験により、MKL-$L_{0/1}$-SVMの性能が、Rakotomamonjy、Bach、Canu、Grandvaletによって開発されたSimpleMKLと呼ばれる主要な手法の一つに匹敵することが示された。 This paper presents a Multiple Kernel Learning (abbreviated as MKL) framework for the Support Vector Machine (SVM) with the $(0, 1)$ loss function. Some first-order optimality conditions are given and then exploited to develop a fast ADMM solver to deal with the nonconvex and nonsmooth optimization problem. Extensive numerical experiments on synthetic and real datasets show that the performance of our MKL-$L_{0/1}$-SVM is comparable with that of SimpleMKL, one of the leading approaches, developed by Rakotomamonjy, Bach, Canu, and Grandvalet [Journal of Machine Learning Research, vol. 9, pp. 2491-2521, 2008]. | 翻訳日:2023-08-24 14:58:05 公開日:2023-08-23 |
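The two ingredients named in this abstract, a multiple-kernel combination and the $(0, 1)$ loss, can be sketched in plain Python. The sketch assumes the common convention in which the combined kernel is a non-negative weighted sum of base kernels and the $(0, 1)$ loss counts samples with $1 - y_i f(x_i) > 0$; it does not reproduce the paper's optimality conditions or ADMM solver, and the kernel choices are illustrative.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def linear_kernel(x, z):
    """Plain inner product."""
    return sum(a * b for a, b in zip(x, z))


def combined_kernel(x, z, kernels, weights):
    """MKL-style kernel: non-negative weighted sum of base kernels."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * k(x, z) for k, w in zip(kernels, weights))


def zero_one_loss(ys, decision_values):
    """(0, 1) loss: number of samples with 1 - y * f(x) > 0 (assumed convention)."""
    return sum(1 for y, f in zip(ys, decision_values) if 1 - y * f > 0)
```

Unlike the hinge loss, this count is flat almost everywhere and jumps at the margin, which is why the resulting problem is nonconvex and nonsmooth and calls for a dedicated solver such as the ADMM approach the abstract mentions.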
# 指示から人間の本質的価値へ: 大規模モデルのアライメント目標に関するサーベイ From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models ( http://arxiv.org/abs/2308.12014v1 ) ライセンス: Link先を確認 | Jing Yao, Xiaoyuan Yi, Xiting Wang, Jindong Wang and Xing Xie | (参考訳) 大規模言語モデル(LLM)に代表される大規模モデルは、通常、膨大なデータで事前学習され、膨大な数のパラメータから構成されるモデルである。
それでもなお、「何にアライメントするか」は十分に議論されておらず、不適切なアライメント目標はかえって逆効果をもたらすことさえある。
これらの結果を踏まえて,本質的価値アライメントを実現するための課題をさらに議論し,大規模モデルのアライメントに関する今後の研究のために利用可能なリソースのコレクションを提供する。 Big models, exemplified by Large Language Models (LLMs), are models typically pre-trained on massive data and comprising an enormous number of parameters, which not only obtain significantly improved performance across diverse tasks but also present emergent capabilities absent in smaller models. However, the growing intertwining of big models with everyday human lives poses potential risks and might cause serious social harm. Therefore, many efforts have been made to align LLMs with humans to make them better follow user instructions and satisfy human preferences. Nevertheless, 'what to align with' has not been fully discussed, and inappropriate alignment goals might even backfire. In this paper, we conduct a comprehensive survey of different alignment goals in existing work and trace their evolution paths to help identify the most essential goal. Particularly, we investigate related works from two perspectives: the definition of alignment goals and alignment evaluation. Our analysis encompasses three distinct levels of alignment goals and reveals a goal transformation from fundamental abilities to value orientation, indicating the potential of intrinsic human values as the alignment goal for enhanced LLMs. Based on such results, we further discuss the challenges of achieving such intrinsic value alignment and provide a collection of available resources for future research on the alignment of big models. | 翻訳日:2023-08-24 14:57:48 公開日:2023-08-23 |
# 量子ノイズ駆動生成拡散モデル Quantum-Noise-driven Generative Diffusion Models ( http://arxiv.org/abs/2308.12013v1 ) ライセンス: Link先を確認 | Marco Parigi, Stefano Martina, Filippo Caruso | (参考訳) 機械学習技術で実現された生成モデルは、新しい合成データを生成するために、有限個のトレーニングサンプルから複雑な未知のデータ分布を推測する強力なツールである。
そこで,本研究では,気候予測から神経科学,交通の流れ解析から金融予測まで,現実世界に広く応用されるデータ生成/予測といった古典的タスクにより強力に取り組む,量子インスパイアあるいは量子ベースの新たな生成拡散アルゴリズムへの道を開くことが期待される。 Generative models realized with machine learning techniques are powerful tools to infer complex and unknown data distributions from a finite number of training samples in order to produce new synthetic data. Diffusion models are an emerging framework that has recently surpassed the performance of generative adversarial networks in creating synthetic text and high-quality images. Here, we propose and discuss the quantum generalization of diffusion models, i.e., three quantum-noise-driven generative diffusion models that could be experimentally tested on real quantum systems. The idea is to harness unique quantum features, in particular the non-trivial interplay among coherence, entanglement and noise that currently available noisy quantum processors unavoidably suffer from, in order to overcome the main computational burdens of classical diffusion models during inference. Hence, we suggest exploiting quantum noise not as an issue to be detected and solved but as a remarkably beneficial key ingredient to generate much more complex probability distributions that would be difficult or even impossible to express classically, and from which a quantum processor might sample more efficiently than a classical one. Therefore, our results are expected to pave the way for new quantum-inspired or quantum-based generative diffusion algorithms that address classical tasks such as data generation/prediction more powerfully, with widespread real-world applications ranging from climate forecasting to neuroscience, from traffic flow analysis to financial forecasting. | 翻訳日:2023-08-24 14:57:24 公開日:2023-08-23 |
# StofNet: 超解像Time of Flightネットワーク StofNet: Super-resolution Time of Flight Network ( http://arxiv.org/abs/2308.12009v1 ) ライセンス: Link先を確認 | Christopher Hahne, Michel Hayoz, Raphael Sznitman | (参考訳) Time of Flight (ToF) は、ロボット工学、医用画像、非破壊検査の分野で広く使われている深度センシング技術である。
私たちのコードはhttps://github.com/hahnec/stofnetで利用可能です。 Time of Flight (ToF) is a prevalent depth sensing technology in the fields of robotics, medical imaging, and non-destructive testing. Yet, ToF sensing faces challenges from complex ambient conditions making an inverse modelling from the sparse temporal information intractable. This paper highlights the potential of modern super-resolution techniques to learn varying surroundings for a reliable and accurate ToF detection. Unlike existing models, we tailor an architecture for sub-sample precise semi-global signal localization by combining super-resolution with an efficient residual contraction block to balance between fine signal details and large scale contextual information. We consolidate research on ToF by conducting a benchmark comparison against six state-of-the-art methods for which we employ two publicly available datasets. This includes the release of our SToF-Chirp dataset captured by an airborne ultrasound transducer. Results showcase the superior performance of our proposed StofNet in terms of precision, reliability and model complexity. Our code is available at https://github.com/hahnec/stofnet. | 翻訳日:2023-08-24 14:56:59 公開日:2023-08-23 |
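A classical baseline for the ToF detection problem StofNet addresses is matched filtering: cross-correlate the received signal with the emitted chirp, take the lag of the correlation peak as the round-trip time, and halve the acoustic path length. The NumPy sketch below assumes a noise-free echo, a linear chirp, and a speed of sound of 343 m/s (air at about 20 °C); it is not StofNet and does not reproduce the SToF-Chirp data.

```python
import numpy as np


def chirp(fs, duration, f0, f1):
    """Linear chirp sweeping from f0 to f1 Hz over `duration` seconds at sample rate fs."""
    t = np.arange(int(fs * duration)) / fs
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))


def estimate_tof(received, template, fs):
    """Round-trip time from the peak of the matched-filter (cross-correlation) output."""
    corr = np.correlate(received, template, mode="valid")
    lag = int(np.argmax(np.abs(corr)))
    return lag / fs


def tof_to_distance(tof, speed=343.0):
    """Convert a round-trip time to a one-way distance in metres."""
    return speed * tof / 2
```

For example, an echo of a 1 ms, 20-60 kHz chirp sampled at 200 kHz and delayed by 500 samples yields a round-trip time of 2.5 ms, or about 0.43 m. The lag is quantized to whole samples (5 µs here, roughly 0.86 mm of range); sub-sample precision of the kind the abstract describes requires interpolation or a learned model.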
# Graecia capta ferum victorem cepit. 古代ギリシア文学へのラテン語による引喩の検出 Graecia capta ferum victorem cepit. Detecting Latin Allusions to Ancient Greek Literature ( http://arxiv.org/abs/2308.12008v1 ) ライセンス: Link先を確認 | Frederick Riemenschneider and Anette Frank | (参考訳) 間テクスト的な引喩は古典文献学において重要な役割を担っており、ラテン語の著者はしばしば古代ギリシアのテキストに言及している。
私たちのモデルとリソースはhttps://github.com/heidelberg-nlp/ancient-language-modelsで利用可能です。 Intertextual allusions hold a pivotal role in Classical Philology, with Latin authors frequently referencing Ancient Greek texts. Until now, the automatic identification of these intertextual references has been constrained to monolingual approaches, seeking parallels solely within Latin or Greek texts. In this study, we introduce SPhilBERTa, a trilingual Sentence-RoBERTa model tailored for Classical Philology, which excels at cross-lingual semantic comprehension and identification of identical sentences across Ancient Greek, Latin, and English. We generate new training data by automatically translating English texts into Ancient Greek. Further, we present a case study, demonstrating SPhilBERTa's capability to facilitate automated detection of intertextual parallels. Our models and resources are available at https://github.com/Heidelberg-NLP/ancient-language-models. | 翻訳日:2023-08-24 14:56:41 公開日:2023-08-23 |
# RGB-D動作およびジェスチャー認識のための多段階分解時空間表現 Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition ( http://arxiv.org/abs/2308.12006v1 ) ライセンス: Link先を確認 | Yujun Ma, Benjia Zhou, Ruili Wang, Pichao Wang | (参考訳) RGB-D動作およびジェスチャー認識は、主に人間の動きの多様な粒度と大きな変動のために、人間中心のシーン理解において依然として興味深いトピックである。
以上の課題に対処するため、RGB-Dアクションとジェスチャー認識のためのMFST(Multi-stage Factorized Spatio-Temporal)と呼ばれる革新的なヒューリスティックアーキテクチャを提案する。
これらの革新的な設計をシームレスに統合することで、RGB-D動作およびジェスチャー認識データセットにおいて最先端の手法を上回る頑健な時空間表現が得られる。 RGB-D action and gesture recognition remains an interesting topic in human-centered scene understanding, primarily due to the multiple granularities and large variation in human motion. Although many RGB-D based action and gesture recognition approaches have demonstrated remarkable results by utilizing highly integrated spatio-temporal representations across multiple modalities (i.e., RGB and depth data), they still encounter several challenges. Firstly, vanilla 3D convolution makes it hard to capture fine-grained motion differences between local clips under different modalities. Secondly, the intricate nature of highly integrated spatio-temporal modeling can lead to optimization difficulties. Thirdly, duplicate and unnecessary information can add complexity and complicate entangled spatio-temporal modeling. To address the above issues, we propose an innovative heuristic architecture called Multi-stage Factorized Spatio-Temporal (MFST) for RGB-D action and gesture recognition. The proposed MFST model comprises a 3D Central Difference Convolution Stem (CDC-Stem) module and multiple factorized spatio-temporal stages. The CDC-Stem enriches fine-grained temporal perception, and the multiple hierarchical spatio-temporal stages construct dimension-independent higher-order semantic primitives. Specifically, the CDC-Stem module captures bottom-level spatio-temporal features and passes them successively to the following spatio-temporal factored stages to capture the hierarchical spatial and temporal features through the Multi-Scale Convolution and Transformer (MSC-Trans) hybrid block and Weight-shared Multi-Scale Transformer (WMS-Trans) block. The seamless integration of these innovative designs results in a robust spatio-temporal representation that outperforms state-of-the-art approaches on RGB-D action and gesture recognition datasets. | 翻訳日:2023-08-24 14:56:28 公開日:2023-08-23 |
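The CDC-Stem builds on central difference convolution, which blends a vanilla convolution with a term driven by differences between each neighbour and the window centre: $y(p_0) = \sum_n w(p_n)\,x(p_0 + p_n) - \theta\, x(p_0) \sum_n w(p_n)$. The NumPy sketch below is a simplified 1-D (for example, temporal-only) version of that formula; the paper uses a 3-D variant inside a learned network, and the default `theta = 0.7` is an assumed value.

```python
import numpy as np


def central_difference_conv1d(x, w, theta=0.7):
    """1-D central difference convolution with 'valid' padding.

    Computes sum_n w[n] * (x[i+n] - theta * x[centre]) for every window,
    i.e. a vanilla correlation minus theta * (centre value * sum of weights).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if len(w) % 2 == 0:
        raise ValueError("kernel length must be odd so each window has a centre")
    r = len(w) // 2
    vanilla = np.correlate(x, w, mode="valid")  # sum_n x[i+n] * w[n]
    centres = x[r:len(x) - r]                   # x at each window centre
    return vanilla - theta * centres * w.sum()
```

Setting `theta = 0` recovers a plain correlation, while `theta = 1` makes the output invariant to constant offsets of the input, so only local differences (for example, frame-to-frame motion) contribute.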
# 大規模多言語モデルによる言語間のゼロショットマルチモーダル学習 Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages ( http://arxiv.org/abs/2308.12038v1 ) ライセンス: Link先を確認 | Jinyi Hu, Yuan Yao, Chongyi Wang, Shan Wang, Yinxu Pan, Qianyu Chen, Tianyu Yu, Hanghao Wu, Yue Zhao, Haoye Zhang, Xu Han, Yankai Lin, Jiao Xue, Dahai Li, Zhiyuan Liu, Maosong Sun | (参考訳) 近年,画像・テキスト・テキスト・画像生成の両面で,マルチモーダル学習が著しく増加している。
将来の研究を促進するため、私たちはhttps://github.com/OpenBMB/VisCPM.gitでコードとモデルの重みをオープンソース化しました。 Recently there has been a significant surge in multimodal learning in terms of both image-to-text and text-to-image generation. However, the success is typically limited to English, leaving other languages largely behind. Building a competitive counterpart in other languages is highly challenging due to the low-resource nature of non-English multimodal data (i.e., lack of large-scale, high-quality image-text data). In this work, we propose MPM, an effective training paradigm for training large multimodal models in low-resource languages. MPM demonstrates that Multilingual language models can Pivot zero-shot Multimodal learning across languages. Specifically, based on a strong multilingual large language model, multimodal models pretrained on English-only image-text data can well generalize to other languages in a zero-shot manner for both image-to-text and text-to-image generation, even surpassing models trained on image-text data in native languages. Taking Chinese as a practice of MPM, we build large multimodal models VisCPM in image-to-text and text-to-image generation, which achieve state-of-the-art (open-source) performance in Chinese. To facilitate future research, we open-source codes and model weights at https://github.com/OpenBMB/VisCPM.git. | 翻訳日:2023-08-24 14:48:20 公開日:2023-08-23 |
# RefEgo: Ego4Dの一人称視点の知覚に基づく参照表現理解データセット RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D ( http://arxiv.org/abs/2308.12035v1 ) ライセンス: Link先を確認 | Shuhei Kurita, Naoki Katsura, Eri Onami | (参考訳) 一人称視点からシーン中の物体に対してテキスト表現を接地することは、周囲を認識し直感的なテキスト指示に従って行動するエージェントを開発するうえで、真に求められる能力である。
実験では、最先端の2D参照表現理解モデルとオブジェクト追跡アルゴリズムを組み合わせ、困難な状況下(動画の途中で参照オブジェクトがフレーム外に出る、あるいは動画に複数の類似オブジェクトが現れる)でも動画単位での参照オブジェクト追跡を実現する。 Grounding textual expressions on scene objects from first-person views is a truly demanding capability in developing agents that are aware of their surroundings and behave following intuitive text instructions. Such a capability is necessary for glasses-type devices or autonomous robots to localize referred objects in the real world. In conventional image-based referring expression comprehension tasks, however, datasets are mostly constructed from web-crawled data and do not reflect the diverse real-world structures involved in grounding textual expressions in diverse real-world objects. Recently, the massive-scale egocentric video dataset Ego4D was proposed. Ego4D covers diverse real-world scenes from around the world, including numerous indoor and outdoor situations such as shopping, cooking, walking, talking, and manufacturing. Based on egocentric videos of Ego4D, we constructed RefEgo, a broad-coverage video-based referring expression comprehension dataset. Our dataset includes more than 12k video clips and 41 hours of video-based referring expression comprehension annotations. In experiments, we combine state-of-the-art 2D referring expression comprehension models with an object tracking algorithm, achieving video-wise referred object tracking even in difficult conditions: when the referred object goes out of frame in the middle of the video or when multiple similar objects are present in the video. | 翻訳日:2023-08-24 14:47:56 公開日:2023-08-23 |
# PREFER: フィードバック・リフレクト・リファインによるプロンプトアンサンブル学習 PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine ( http://arxiv.org/abs/2308.12033v1 ) ライセンス: Link先を確認 | Chenrui Zhang, Lin Liu, Jinpeng Wang, Chuyuan Wang, Xiao Sun, Hongyu Wang, Mingchen Cai | (参考訳) 大規模言語モデル(LLM)の能力を引き出す有効な手段として、プロンプトは最近、さまざまな複雑なタスクにおいて前例のない能力を示している。
本稿では,これらの制約に対処するため,PREFER (Prompt Ensemble learning via Feedback-Reflect-Refine) という,シンプルかつ汎用的で自動的な手法を提案する。
# PREFER: フィードバック・リフレクト・リファインによるプロンプトアンサンブル学習 PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine ( http://arxiv.org/abs/2308.12033v1 ) ライセンス: Link先を確認 | Chenrui Zhang, Lin Liu, Jinpeng Wang, Chuyuan Wang, Xiao Sun, Hongyu Wang, Mingchen Cai | (参考訳) 大規模言語モデル(LLM)の能力を引き出す効果的なツールとして、プロンプトは最近、様々な複雑なタスクで前例のない能力を示している。
本稿では,上述の制限に対処するために,PREFER(Prompt Ensemble learning via Feedback-Reflect-Refine)という,単純で汎用的かつ自動的な手法を提案する。
さらに,プロンプト効果の評価の安定性を高めるために,前向き・後ろ向きの思考を取り入れた新しいプロンプトバギング手法を提案する。この手法は多数決よりも優れており,ブースティングにおけるフィードバックと重み計算の両方に有益である。
私たちはコードを公開しました。 As an effective tool for eliciting the power of Large Language Models (LLMs), prompting has recently demonstrated unprecedented abilities across a variety of complex tasks. To further improve the performance, prompt ensembling has attracted substantial interest for tackling the hallucination and instability of LLMs. However, existing methods usually adopt a two-stage paradigm, which requires a pre-prepared set of prompts with substantial manual effort, and is unable to perform directed optimization for different weak learners. In this paper, we propose a simple, universal, and automatic method named PREFER (Prompt Ensemble learning via Feedback-Reflect-Refine) to address the stated limitations. Specifically, given the fact that weak learners are supposed to focus on hard examples during boosting, PREFER builds a feedback mechanism for reflecting on the inadequacies of existing weak learners. Based on this, the LLM is required to automatically synthesize new prompts for iterative refinement. Moreover, to enhance the stability of the prompt effect evaluation, we propose a novel prompt bagging method involving forward and backward thinking, which is superior to majority voting and is beneficial for both feedback and weight calculation in boosting. Extensive experiments demonstrate that PREFER achieves state-of-the-art performance in multiple types of tasks by a significant margin. We have made our code publicly available. | 翻訳日:2023-08-24 14:47:35 公開日:2023-08-23 |
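The boosting loop that PREFER builds on can be illustrated with a classic AdaBoost-style round in which each prompt plays the role of a weak learner. This is a minimal sketch, not PREFER's implementation: the abstract does not give its weight formula, so the standard binary AdaBoost update below, the "hardest examples as feedback" helper, and all function names are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def boosting_round(weights: Sequence[float], correct: Sequence[bool]) -> Tuple[float, List[float]]:
    """One AdaBoost-style round treating a single prompt as a weak learner.

    weights: current example weights (assumed to sum to 1).
    correct: whether the prompt answered each example correctly.
    Returns the prompt's vote weight alpha and the re-normalized example weights.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    err = min(max(err, 1e-10), 1 - 1e-10)  # clamp so the log stays finite
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified examples are up-weighted, correctly answered ones down-weighted.
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    z = sum(new)
    return alpha, [w / z for w in new]


def hardest_examples(examples: Sequence[str], weights: Sequence[float], k: int = 2) -> List[str]:
    """Hypothetical feedback step: the k highest-weight examples, which could be shown
    to the LLM when asking it to reflect on a prompt's failures and write a new one."""
    order = sorted(range(len(examples)), key=lambda i: weights[i], reverse=True)
    return [examples[i] for i in order[:k]]
```

With four equally weighted examples of which only the last is answered wrongly, `boosting_round` returns alpha = 0.5 ln 3 and moves half of the total weight onto that example, so `hardest_examples(..., k=1)` selects it as the feedback sample for the next prompt.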
# 量から品質へ:インストラクションチューニングのための自己ガイドデータ選択によるLLM性能向上 From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning ( http://arxiv.org/abs/2308.12032v1 ) ライセンス: Link先を確認 | Ming Li, Yong Zhang, Zhitao Li, Jiuhai Chen, Lichang Chen, Ning Cheng, Jianzong Wang, Tianyi Zhou, Jing Xiao | (参考訳) 大規模言語モデルの領域では、命令データの品質と量とのバランスが焦点となっている。
私たちの重要なイノベーションであるIFD(Instruction-Following Difficulty)指標は、モデルに期待される応答と、モデル自身の自律的な生成能力との間の乖離を識別するための重要なツールとなる。
この自己誘導型チェリーピッキングとIFD指標の組み合わせは、LLMの最適化における変革的な飛躍であり、効率性とリソース面の双方での進歩が期待される。 In the realm of Large Language Models, the balance between instruction data quality and quantity has become a focal point. Recognizing this, we introduce a self-guided methodology for LLMs to autonomously discern and select cherry samples from vast open-source datasets, effectively minimizing manual curation and potential cost for instruction tuning an LLM. Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal tool to identify discrepancies between a model's expected responses and its autonomous generation prowess. Through the adept application of IFD, cherry samples are pinpointed, leading to a marked uptick in model training efficiency. Empirical validations on renowned datasets like Alpaca and WizardLM underpin our findings; with a mere 10% of conventional data input, our strategy showcases improved results. This synthesis of self-guided cherry-picking and the IFD metric signifies a transformative leap in the optimization of LLMs, promising both efficiency and resource-conscious advancements. | 翻訳日:2023-08-24 14:47:11 公開日:2023-08-23 |
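The IFD metric can be sketched as a ratio of two answer losses. The abstract above does not state the formula, so this sketch assumes IFD is the model's average per-token loss on the answer when conditioned on the instruction, divided by its average loss on the answer alone; a higher score then means the instruction helps the model less, marking a "harder" sample. Per-token log-probabilities are assumed to be precomputed by some language model, and `ifd_score` and `select_cherry` are hypothetical names.

```python
from typing import Dict, List, Sequence


def mean_nll(token_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood (cross-entropy) over the answer tokens."""
    if not token_logprobs:
        raise ValueError("need at least one answer token")
    return -sum(token_logprobs) / len(token_logprobs)


def ifd_score(logprobs_given_instruction: Sequence[float], logprobs_alone: Sequence[float]) -> float:
    """Assumed IFD definition: loss of the answer given the instruction divided by
    the loss of the answer on its own. Higher means the instruction helps less."""
    return mean_nll(logprobs_given_instruction) / mean_nll(logprobs_alone)


def select_cherry(samples: List[Dict], fraction: float = 0.1) -> List[Dict]:
    """Keep the top `fraction` of samples ranked by IFD (hypothetical selection rule;
    0.1 mirrors the 10% figure quoted in the abstract)."""
    ranked = sorted(samples, key=lambda s: ifd_score(s["cond"], s["uncond"]), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]
```

Each sample dict is assumed to carry the answer-token log-probabilities with (`"cond"`) and without (`"uncond"`) the instruction in the context.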
# CACTUS: 構造を明らかにするための包括的な抽象化と分類ツール CACTUS: a Comprehensive Abstraction and Classification Tool for Uncovering Structures ( http://arxiv.org/abs/2308.12031v1 ) ライセンス: Link先を確認 | Luca Gherardini, Varun Ravi Varma, Karol Capala, Roger Woods, Jose Sousa | (参考訳) 大規模データセットが利用可能になったことが、現在の人工知能開発の原動力となっている。
その性能は、ウィスコンシン診断乳がん(Wisconsin Diagnostic Breast Cancer)データセットとThyroid0387データセットに適用して評価される。 The availability of large data sets is providing an impetus for driving current artificial intelligence developments. There are, however, challenges for developing solutions with small data sets due to practical and cost-effective deployment and the opacity of deep learning models. The Comprehensive Abstraction and Classification Tool for Uncovering Structures called CACTUS is presented for improved secure analytics by effectively employing explainable artificial intelligence. It provides additional support for categorical attributes, preserving their original meaning, optimising memory usage, and speeding up the computation through parallelisation. It shows the user the frequency of the attributes in each class and ranks them by their discriminative power. Its performance is assessed by application to the Wisconsin diagnostic breast cancer and Thyroid0387 data sets. | 翻訳日:2023-08-24 14:46:51 公開日:2023-08-23 |
# 強化学習によるプロンプトベース長制御生成 Prompt-Based Length Controlled Generation with Reinforcement Learning ( http://arxiv.org/abs/2308.12030v1 ) ライセンス: Link先を確認 | Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu | (参考訳) 近年,ChatGPT や GPT-4 のような大規模言語モデル (LLM) が注目されている。
我々は,この長さ制御能力がLLMの時代にさらなる可能性をもたらすと考えている。 Recently, large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising improvement and performance. Length controlled generation of LLMs emerges as an important topic, which also enables users to fully leverage the capability of LLMs in more real-world scenarios like generating a proper answer or essay of a desired length. In addition, the autoregressive generation in LLMs is extremely time-consuming, while the ability to control the generated length makes it possible to reduce the inference cost by limiting the length, and thus to satisfy different needs. Therefore, we propose a prompt-based length control method to achieve this length controlled generation, which can also be widely applied in GPT-style LLMs. In particular, we adopt reinforcement learning with the reward signal given by either a trainable or a rule-based reward model, which in turn affects the generation of LLMs by rewarding a pre-defined target length. Experiments show that our method significantly improves the accuracy of prompt-based length control for the summarization task on popular datasets like CNNDM and NYT. We believe this length-controllable ability opens up further possibilities in the era of LLMs. | 翻訳日:2023-08-24 14:46:40 公開日:2023-08-23 |
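A rule-based reward for a target length might look like the sketch below. The abstract does not specify the rule the authors use; the `in N words` prompt convention, whitespace word counting, and the linear decay floored at -1 are all assumptions chosen only to make the idea concrete. A reward of this shape would be the scalar fed to the RL update for each sampled output.

```python
import re


def target_length_from_prompt(prompt: str) -> int:
    """Read the target word count from a prompt such as '... in 50 words'.
    The 'in N words' phrasing is a hypothetical convention, not the paper's template."""
    match = re.search(r"\bin (\d+) words\b", prompt)
    if match is None:
        raise ValueError("prompt contains no 'in N words' instruction")
    return int(match.group(1))


def length_reward(output: str, target: int) -> float:
    """Rule-based reward: 1.0 at the exact target length, falling linearly with the
    relative length error and floored at -1.0. Words are counted by whitespace split."""
    n_words = len(output.split())
    rel_err = abs(n_words - target) / target
    return max(-1.0, 1.0 - rel_err)
```

Using a relative rather than absolute error keeps the reward on a comparable scale for short and long targets, which is one reasonable design choice among several.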
# マルチタスク学習のためのスケール不変タスクバランシングアプローチ A Scale-Invariant Task Balancing Approach for Multi-Task Learning ( http://arxiv.org/abs/2308.12029v1 ) ライセンス: Link先を確認 | Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu | (参考訳) 複数のタスクを同時に学習する学習パラダイムであるマルチタスク学習(MTL)は,様々な分野で大きな成功を収めている。
本稿では,損失と勾配の両面からタスクバランス問題を緩和するために,SI-MTL(Scale-Invariant Multi-Task Learning)法を提案する。
いくつかのベンチマークデータセットで実施された大規模な実験は、SI-Gの有効性とSI-MTLの最先端性能を一貫して実証している。 Multi-task learning (MTL), a learning paradigm to learn multiple related tasks simultaneously, has achieved great success in various fields. However, task-balancing remains a significant challenge in MTL, with the disparity in loss/gradient scales often leading to performance compromises. In this paper, we propose a Scale-Invariant Multi-Task Learning (SI-MTL) method to alleviate the task-balancing problem from both loss and gradient perspectives. Specifically, SI-MTL contains a logarithm transformation which is performed on all task losses to ensure scale invariance at the loss level, and a gradient balancing method, SI-G, which normalizes all task gradients to the same magnitude as the maximum gradient norm. Extensive experiments conducted on several benchmark datasets consistently demonstrate the effectiveness of SI-G and the state-of-the-art performance of SI-MTL. | 翻訳日:2023-08-24 14:46:23 公開日:2023-08-23 |
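The two SI-MTL ingredients described above can be sketched on plain Python vectors. By the chain rule, the gradient of log L_t is grad L_t / L_t, so multiplying a task's loss by a constant leaves its log-loss gradient unchanged; SI-G then rescales every task gradient to the largest task-gradient norm. Summing the rescaled gradients into one update direction is our assumption about how they are combined, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def log_loss_gradients(losses: Sequence[float], grads: Sequence[Vector]) -> List[Vector]:
    """Gradient of log(L_t) is grad(L_t) / L_t by the chain rule; assumes every L_t > 0."""
    return [[g / loss for g in grad] for loss, grad in zip(losses, grads)]


def si_g(task_grads: Sequence[Vector]) -> Vector:
    """Rescale every task gradient to the largest task-gradient norm, then sum them."""
    norms = [l2_norm(g) for g in task_grads]
    target = max(norms)
    combined = [0.0] * len(task_grads[0])
    for g, n in zip(task_grads, norms):
        if n == 0.0:
            continue  # a zero gradient has no direction to rescale
        scale = target / n
        for i, gi in enumerate(g):
            combined[i] += scale * gi
    return combined
```

Scaling one task's loss by 10 (and hence its gradient by 10) produces exactly the same combined update, which is the scale invariance the method is named for.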
# LKPNR:パーソナライズされたニュースレコメンデーションフレームワークのためのLLMとKG LKPNR: LLM and KG for Personalized News Recommendation Framework ( http://arxiv.org/abs/2308.12028v1 ) ライセンス: Link先を確認 | Chen hao, Xie Runfeng, Cui Xiangyang, Yan Zhou, Wang Xin, Xuan Zhanwei, Zhang Kai | (参考訳) 候補となるニュース記事を正確にユーザに推薦することは、パーソナライズされたニュースレコメンデーションシステムが直面する基本的な課題である。
これらの問題に対処するため,本研究では,大規模言語モデル(LLM)と知識グラフ(KG)を従来手法の意味表現に組み込む,新しい汎用フレームワークを提案する。
私たちのコードはhttps://github.com/Xuan-ZW/LKPNRで利用可能です。 Accurately recommending candidate news articles to users is a basic challenge faced by personalized news recommendation systems. Traditional methods usually struggle to grasp the complex semantic information in news texts, resulting in unsatisfactory recommendation results. Moreover, these traditional methods favor active users with rich historical behaviors. However, they cannot effectively solve the "long tail problem" of inactive users. To address these issues, this research presents a novel general framework that combines Large Language Models (LLM) and Knowledge Graphs (KG) into semantic representations of traditional methods. In order to improve semantic understanding in complex news texts, we use LLMs' powerful text understanding ability to generate news representations containing rich semantic information. In addition, our method combines the information about news entities and mines high-order structural information through multiple hops in KG, thus alleviating the challenge of long tail distribution. Experimental results demonstrate that compared with various traditional models, the framework significantly improves recommendation performance. The successful integration of LLM and KG in our framework has established a feasible path for achieving more accurate personalized recommendations in the news field. Our code is available at https://github.com/Xuan-ZW/LKPNR. | 翻訳日:2023-08-24 14:46:08 公開日:2023-08-23 |
# トポロジカル絶縁体に基づくパリティ保護超伝導量子ビット Parity-protected superconducting qubit based on topological insulators ( http://arxiv.org/abs/2308.12027v1 ) ライセンス: Link先を確認 | Guo-Liang Guo, Han-Bing Leng and Xin Liu | (参考訳) トポロジカルジョセフソン接合に基づく2つの0-$\pi$ qubitを用いて、パリティ保護された超伝導量子ビットを実装する新しいアーキテクチャを提案する。
トポロジカルジョセフソン接合は製造ばらつきに対する保護を提供し、0-$\pi$ qubit の実装に必要な同一のジョセフソン接合を保証する。
0-$\pi$ qubit の偶パリティおよび奇パリティの基底状態を spin-$\frac{1}{2}$ 状態とみなすことにより、2つの 0-$\pi$ qubit の全パリティが奇である部分空間を用いて論理量子ビット状態を構成する。
電荷ノイズと磁束ノイズの両方から同時に保護されることにより,$T_1$ と $T_2$ の両方のコヒーレンス時間が劇的に向上することを示す。
本研究は,対称性で保護された超伝導量子ビットを設計するための新しいアプローチを提示する。 We propose a novel architecture that utilizes two 0-$\pi$ qubits based on topological Josephson junctions to implement a parity-protected superconducting qubit. The topological Josephson junctions provide protection against fabrication variations, which ensures the identical Josephson junctions required to implement the 0-$\pi$ qubit. By viewing the even and odd parity ground states of a 0-$\pi$ qubit as spin-$\frac{1}{2}$ states, we construct the logical qubit states using the total parity odd subspace of two 0-$\pi$ qubits. This parity-protected qubit exhibits robustness against charge noise, similar to a singlet-triplet qubit's immunity to global magnetic field fluctuations. Meanwhile, flux noise cannot directly couple two states with the same total parity and therefore is greatly suppressed. Benefiting from the simultaneous protection from both charge and flux noise, we demonstrate a dramatic enhancement of both $T_1$ and $T_2$ coherence times. Our work presents a new approach to engineer symmetry-protected superconducting qubits. | 翻訳日:2023-08-24 14:45:44 公開日:2023-08-23 |
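The logical encoding described above can be written out explicitly. As a sketch in our own notation (not taken from the paper): let $|e\rangle$ and $|o\rangle$ be the even- and odd-parity ground states of a single 0-$\pi$ qubit, identified with spin-$\frac{1}{2}$ states, and let $P_i = \pm 1$ be the parity of qubit $i$ ($+1$ even, $-1$ odd). The total-parity-odd subspace of two 0-$\pi$ qubits is spanned by two states, in analogy with the $S_z = 0$ subspace used by a singlet-triplet qubit:

$$
|0_L\rangle = |e\rangle_1 |o\rangle_2 \equiv |{\uparrow\downarrow}\rangle, \qquad
|1_L\rangle = |o\rangle_1 |e\rangle_2 \equiv |{\downarrow\uparrow}\rangle, \qquad
P_1 P_2 \, |j_L\rangle = -\,|j_L\rangle \quad (j = 0, 1).
$$

In this notation, the abstract's statement that flux noise cannot directly couple two states with the same total parity reads $\langle 1_L | \, \delta H_\Phi \, | 0_L \rangle = 0$, where $\delta H_\Phi$ (again our notation) denotes the flux-noise perturbation.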
# 中国のバイオメディカルエンティティ正規化のための知識注入型プロンプト学習 Knowledge-injected Prompt Learning for Chinese Biomedical Entity Normalization ( http://arxiv.org/abs/2308.12025v1 ) ライセンス: Link先を確認 | Songhua Yang, Chenghao Zhang, Hongfei Xu and Yuxiang Jia | (参考訳) バイオメディカルエンティティ正規化(BEN)タスクは、生の非構造化医療エンティティを標準エンティティに整合させ、データの一貫性を促進し、下流の医療アプリケーションを改善することを目的としている。
提案手法は既存のベースラインを上回り,少数ショットでは平均12.96%,フルデータでは平均0.94%の精度向上を示し,BENタスクにおける優位性を示した。 The Biomedical Entity Normalization (BEN) task aims to align raw, unstructured medical entities to standard entities, thus promoting data coherence and facilitating better downstream medical applications. Recently, prompt learning methods have shown promising results in this task. However, existing research falls short in tackling the more complex Chinese BEN task, especially in the few-shot scenario with limited medical data, and the vast potential of the external medical knowledge base has yet to be fully harnessed. To address these challenges, we propose a novel Knowledge-injected Prompt Learning (PL-Knowledge) method. Specifically, our approach consists of five stages: candidate entity matching, knowledge extraction, knowledge encoding, knowledge injection, and prediction output. By effectively encoding the knowledge items contained in medical entities and incorporating them into our tailor-made knowledge-injected templates, the additional knowledge enhances the model's ability to capture latent relationships between medical entities, thus achieving a better match with the standard entities. We extensively evaluate our model on a benchmark dataset in both few-shot and full-scale scenarios. Our method outperforms existing baselines, with an average accuracy boost of 12.96% in few-shot and 0.94% in full-data cases, showcasing its excellence in the BEN task. | 翻訳日:2023-08-24 14:45:21 公開日:2023-08-23 |
# 層間フィードバック伝搬 Layer-wise Feedback Propagation ( http://arxiv.org/abs/2308.12053v1 ) ライセンス: Link先を確認 | Leander Weber, Jim Berend, Alexander Binder, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin | (参考訳) 本稿では、ニューラルネットワークのような予測器のための新しいトレーニング手法LFP(Layer-wise Feedback Propagation)を提案し、説明可能性、特にLayer-wise Relevance Propagation(LRP)を利用して、与えられたタスクの解決へのそれぞれの貢献に基づいて、個々のコネクションに対する報酬を割り当てる。
さらに,さまざまなLRPルールをLFPにどのように拡張できるか,それらが学習に与える影響,ならびに意味のある導関数を持たないモデル(例えばステップ関数で活性化されるスパイキングニューラルネットワーク(SNN))の学習や,既存の知識を効率的に活用するための転移学習といった潜在的な応用について検討する。 In this paper, we present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors that utilizes explainability, specifically Layer-wise Relevance Propagation (LRP), to assign rewards to individual connections based on their respective contributions to solving a given task. This differs from traditional gradient descent, which updates parameters towards an estimated loss minimum. LFP distributes a reward signal throughout the model without the need for gradient computations. It then strengthens structures that receive positive feedback while reducing the influence of structures that receive negative feedback. We establish the convergence of LFP theoretically and empirically, and demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets. Notably, LFP overcomes certain limitations associated with gradient-based methods, such as reliance on meaningful derivatives. We further investigate how the different LRP-rules can be extended to LFP, what their effects are on training, as well as potential applications, such as training models with no meaningful derivatives, e.g., step-function activated Spiking Neural Networks (SNNs), or for transfer learning, to efficiently utilize existing knowledge. | 翻訳日:2023-08-24 14:39:45 公開日:2023-08-23 |
# 冷却原子SO(5)ディラック場における高次トポロジカルなひねり A higher-order topological twist on cold-atom SO(5) Dirac fields ( http://arxiv.org/abs/2308.12051v1 ) ライセンス: Link先を確認 | A. Bermudez, D. González-Cuadra, S. Hands | (参考訳) スピン3/2原子の超低温フェルミ気体は、4フェルミ相互作用のSO(5)モデルを実験室で実現するためのクリーンなプラットフォームを提供する。
我々の研究は、相互作用を調整可能な相関した高次トポロジカル状態を実装する道を開くものであり、これは $D = 2 + 1$ 次元のディラックフェルミオンの非自明な相対論的QFTと興味深い関係を持つ。 Ultracold Fermi gases of spin-3/2 atoms provide a clean platform to realise SO(5) models of 4-Fermi interactions in the laboratory. By confining the atoms in a two-dimensional Raman lattice, we show how this system can be used as a flexible quantum simulator of Dirac quantum field theories (QFTs) that combine Gross-Neveu and Thirring interactions with a higher-order topological twist. We show that the lattice model corresponds to a regularization of this QFT with an anisotropic twisted Wilson mass. This allows us to access higher-order topological states protected by a hidden SO(5) symmetry, a remnant of the original rotational symmetry of the 4-Fermi interactions that is not explicitly broken by the lattice discretization. Using large-$N$ methods, we show that the 4-Fermi interactions lead to a rich phase diagram with various competing fermion condensates. Our work opens a route for the implementation of correlated higher-order topological states with tunable interactions that has interesting connections to non-trivial relativistic QFTs of Dirac fermions in $D = 2 + 1$ dimensions. | 翻訳日:2023-08-24 14:39:23 公開日:2023-08-23 |
# 人間のフィードバックからのオフライン強化学習による言語モデルのアラインメント Aligning Language Models with Offline Reinforcement Learning from Human Feedback ( http://arxiv.org/abs/2308.12050v1 ) ライセンス: Link先を確認 | Jian Hu, Li Tao, June Yang, Chandler Zhou | (参考訳) 人間の好みから学ぶことは、言語モデル(LM)が人間のニーズや社会的価値に効果的に応えるために重要である。
しかし、これらのアプローチは主にPPO(Proximal Policy Optimization)のようなオンライン強化学習(RL)技術に依存しているが、これらは言語モデルに対しては不安定でチューニングが難しいことが示されている。
実験の結果,DTアライメントは他のオフラインRLHF手法を上回り,PPOよりも優れていた。 Learning from human preferences is crucial for language models (LMs) to effectively cater to human needs and societal values. Previous research has made notable progress by leveraging human feedback to follow instructions. However, these approaches rely primarily on online reinforcement learning (RL) techniques like Proximal Policy Optimization (PPO), which have been proven unstable and challenging to tune for language models. Moreover, PPO requires complex distributed system implementation, hindering the efficiency of large-scale distributed training. In this study, we propose an offline reinforcement learning from human feedback (RLHF) framework to align LMs using pre-generated samples without interacting with RL environments. Specifically, we explore maximum likelihood estimation (MLE) with filtering, reward-weighted regression (RWR), and Decision Transformer (DT) to align language models to human preferences. By employing a loss function similar to supervised fine-tuning, our methods ensure more stable model training than PPO with a simple machine learning system (MLSys) and far fewer (around 12.3%) computing resources. Experimental results demonstrate that the DT alignment outperforms other offline RLHF methods and is better than PPO. | 翻訳日:2023-08-24 14:39:03 公開日:2023-08-23 |
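Reward-weighted regression, one of the three offline options listed above, can be sketched as supervised fine-tuning in which each pre-generated response's negative log-likelihood is weighted by its exponentiated reward. Normalizing the weights with a softmax over the batch and the temperature `beta` are assumptions for illustration; the abstract does not state the exact weighting the authors use, and the sequence NLLs are assumed to come from the model being trained.

```python
import math
from typing import List, Sequence


def rwr_weights(rewards: Sequence[float], beta: float = 1.0) -> List[float]:
    """Exponentiated rewards exp(r / beta), normalized to sum to 1 over the batch."""
    m = max(rewards)  # subtracting the max avoids overflow and cancels after normalization
    exps = [math.exp((r - m) / beta) for r in rewards]
    z = sum(exps)
    return [e / z for e in exps]


def rwr_loss(sequence_nlls: Sequence[float], rewards: Sequence[float], beta: float = 1.0) -> float:
    """Reward-weighted negative log-likelihood over pre-generated responses.
    No environment interaction is needed: rewards are precomputed offline."""
    return sum(w * nll for w, nll in zip(rwr_weights(rewards, beta), sequence_nlls))
```

A smaller `beta` concentrates the weight on the highest-reward responses, approaching the "MLE with filtering" variant that trains only on the best samples.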
# 深層教師なしRGB2Depth適応によるプライバシに配慮した転倒検出に向けて Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation ( http://arxiv.org/abs/2308.12049v1 ) ライセンス: Link先を確認 | Hejun Xiao, Kunyu Peng, Xiangsheng Huang, Alina Roitberg, Hao Li, Zhaohui Wang and Rainer Stiefelhagen | (参考訳) 転倒検知は、転倒が発生した際にシステムが警告を発し、迅速な介入を可能にするため、健康モニタリングにおいて重要なタスクである。
クロスモーダルな転倒検出を実現するために,ラベル付きRGBデータとラベルなし深度データを活用した教師なしRGB to Depth (RGB2Depth)クロスモーダルドメイン適応手法を提案する。
コードはhttps://github.com/1015206533/privacy_supporting_fall_detectionで入手できる。 Fall detection is a vital task in health monitoring, as it allows the system to trigger an alert and therefore enabling faster interventions when a person experiences a fall. Although most previous approaches rely on standard RGB video data, such detailed appearance-aware monitoring poses significant privacy concerns. Depth sensors, on the other hand, are better at preserving privacy as they merely capture the distance of objects from the sensor or camera, omitting color and texture information. In this paper, we introduce a privacy-supporting solution that makes the RGB-trained model applicable in depth domain and utilizes depth data at test time for fall detection. To achieve cross-modal fall detection, we present an unsupervised RGB to Depth (RGB2Depth) cross-modal domain adaptation approach that leverages labelled RGB data and unlabelled depth data during training. Our proposed pipeline incorporates an intermediate domain module for feature bridging, modality adversarial loss for modality discrimination, classification loss for pseudo-labeled depth data and labeled source data, triplet loss that considers both source and target domains, and a novel adaptive loss weight adjustment method for improved coordination among various losses. Our approach achieves state-of-the-art results in the unsupervised RGB2Depth domain adaptation task for fall detection. Code is available at https://github.com/1015206533/privacy_supporting_fall_detection. | 翻訳日:2023-08-24 14:38:42 公開日:2023-08-23 |
# 無バイアスシーングラフ生成のためのヘッドテール協調学習ネットワーク Head-Tail Cooperative Learning Network for Unbiased Scene Graph Generation ( http://arxiv.org/abs/2308.12048v1 ) ライセンス: Link先を確認 | Lei Wang, Zejian Yuan, Yao Lu, Badong Chen | (参考訳) 画像理解における重要なタスクであるシーングラフ生成(SGG)は,述語のロングテール分布に起因する,ヘッドに偏った予測という課題に直面している。
VG150、Open Images V6、GQA200データセット上の様々なSGGモデルに適用することで、HTCLの有効性を実証する。
私たちのコードはhttps://github.com/wanglei0618/HTCLで利用可能です。 Scene Graph Generation (SGG), a critical task in image understanding, faces the challenge of head-biased prediction caused by the long-tail distribution of predicates. However, current unbiased SGG methods can easily prioritize improving the prediction of tail predicates while ignoring the substantial sacrifice in the prediction of head predicates, leading to a shift from head bias to tail bias. To address this issue, we propose a model-agnostic Head-Tail Collaborative Learning (HTCL) network that includes head-prefer and tail-prefer feature representation branches that collaborate to achieve accurate recognition of both head and tail predicates. We also propose a self-supervised learning approach to enhance the prediction ability of the tail-prefer feature representation branch by constraining tail-prefer predicate features. Specifically, self-supervised learning converges head predicate features to their class centers while dispersing tail predicate features as much as possible through contrastive learning and head center loss. We demonstrate the effectiveness of our HTCL by applying it to various SGG models on VG150, Open Images V6 and GQA200 datasets. The results show that our method achieves higher mean Recall with a minimal sacrifice in Recall and achieves a new state-of-the-art overall performance. Our code is available at https://github.com/wanglei0618/HTCL. | 翻訳日:2023-08-24 14:38:15 公開日:2023-08-23 |
# CgT-GAN: 画像キャプション生成のためのCLIP誘導テキストGAN CgT-GAN: CLIP-guided Text GAN for Image Captioning ( http://arxiv.org/abs/2308.12045v1 ) ライセンス: Link先を確認 | Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu and Xiangnan He | (参考訳) 大規模視覚言語事前訓練モデルであるContrastive Language-Image Pre-training (CLIP) は、人間のアノテーションのないシナリオにおける画像キャプションを大幅に改善した。
実世界の画像を得るのは容易なことを考えると、トレーニングプロセスにイメージを組み込んだCLIP-Guided Text GAN(CgT-GAN)を提案する。
コードはhttps://github.com/Lihr747/CgtGANで入手できる。 The large-scale visual-language pre-trained model, Contrastive Language-Image Pre-training (CLIP), has significantly improved image captioning for scenarios without human-annotated image-caption pairs. Recent advanced CLIP-based image captioning without human annotations follows a text-only training paradigm, i.e., reconstructing text from shared embedding space. Nevertheless, these approaches are limited by the training/inference gap or huge storage requirements for text embeddings. Given that it is trivial to obtain images in the real world, we propose CLIP-guided text GAN (CgT-GAN), which incorporates images into the training process to enable the model to "see" real visual modality. Particularly, we use adversarial training to teach CgT-GAN to mimic the phrases of an external text corpus and CLIP-based reward to provide semantic guidance. The caption generator is jointly rewarded based on the caption naturalness to human language calculated from the GAN's discriminator and the semantic guidance reward computed by the CLIP-based reward module. In addition to the cosine similarity as the semantic guidance reward (i.e., CLIP-cos), we further introduce a novel semantic guidance reward called CLIP-agg, which aligns the generated caption with a weighted text embedding by attentively aggregating the entire corpus. Experimental results on three subtasks (ZS-IC, In-UIC and Cross-UIC) show that CgT-GAN outperforms state-of-the-art methods significantly across all metrics. Code is available at https://github.com/Lihr747/CgtGAN. | 翻訳日:2023-08-24 14:37:50 公開日:2023-08-23 |
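One possible reading of the CLIP-agg reward, given only as a sketch: use the image embedding as an attention query over the text embeddings of the whole corpus, form a softmax-weighted aggregate text embedding, and reward the caption by its cosine similarity to that aggregate. The choice of query, the temperature `tau`, and the function names are our assumptions; the abstract says only that the corpus is aggregated attentively. All embeddings are assumed to be precomputed, non-zero CLIP vectors.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def clip_agg_target(image_emb: Sequence[float], corpus_embs: Sequence[Sequence[float]],
                    tau: float = 0.1) -> List[float]:
    """Softmax(cos(image, text_j) / tau)-weighted sum of the corpus text embeddings."""
    logits = [cosine(image_emb, t) / tau for t in corpus_embs]
    m = max(logits)  # stabilize the softmax
    exps = [math.exp(s - m) for s in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(corpus_embs[0])
    return [sum(w * t[i] for w, t in zip(weights, corpus_embs)) for i in range(dim)]


def clip_agg_reward(caption_emb: Sequence[float], image_emb: Sequence[float],
                    corpus_embs: Sequence[Sequence[float]], tau: float = 0.1) -> float:
    """Reward a generated caption by its similarity to the aggregated corpus embedding."""
    return cosine(caption_emb, clip_agg_target(image_emb, corpus_embs, tau))
```

Compared with plain CLIP-cos, which compares the caption directly with the image embedding, this variant compares text with text, which may reduce the mismatch between the image and text regions of CLIP's embedding space.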
# ディープニューラルネットワークの正則化経路を計算するための多目的継続法 A multiobjective continuation method to compute the regularization path of deep neural networks ( http://arxiv.org/abs/2308.12044v1 ) ライセンス: Link先を確認 | Augustina C. Amakor, Konstantin Sontag and Sebastian Peitz | (参考訳) スパース性は深層ニューラルネットワーク(DNN)において非常に望ましい性質であり、数値効率を確保し、(関連する特徴の数が少なくなるため)モデルの解釈性を高め、堅牢性も向上させる。
線形モデルに基づく機械学習手法では、$\ell^1$ ノルムの意味で最もスパースな解(すなわち重みがすべてゼロの解)と、正則化を行わない解とを結ぶ経路が存在することがよく知られており、これは正則化経路と呼ばれる。
ごく最近になって、経験損失とスパース性($\ell^1$ ノルム)を2つの相反する基準として扱い、結果として生じる多目的最適化問題を解くことで、正則化経路の概念をDNNに拡張する最初の試みがなされた。
しかし、$\ell^1$ ノルムの非滑らかさとパラメータ数の多さのため、このアプローチは計算の観点からはあまり効率的ではない。
さらに,正則化経路の知識により,汎化性能の高いネットワークのパラメータ化が得られることを示す。 Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency and improves both the interpretability of models (due to the smaller number of relevant features) and their robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization. | 翻訳日:2023-08-24 14:37:15 公開日:2023-08-23 |
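The bi-objective problem described above can be written out as follows. This is a sketch in our own notation ($f_\theta$ for the network, $\ell$ for a per-sample loss, $N$ for the number of training pairs $(x_i, y_i)$, $\lambda$ for a penalty weight), not the paper's exact formulation:

$$
\min_{\theta \in \mathbb{R}^{n}} \;
\begin{pmatrix} \mathcal{L}(\theta) \\ \|\theta\|_1 \end{pmatrix},
\qquad
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_\theta(x_i),\, y_i\big),
\qquad
\|\theta\|_1 = \sum_{j=1}^{n} |\theta_j| .
$$

The point $\theta = 0$ minimizes the second objective and an unregularized minimizer of $\mathcal{L}$ minimizes the first; the Pareto set connecting them is the regularization path. For linear models with a convex loss, the same path can equivalently be traced by the penalized problem

$$
\theta^\star(\lambda) = \arg\min_{\theta} \; \mathcal{L}(\theta) + \lambda \, \|\theta\|_1, \qquad \lambda \ge 0,
$$

which gives $\theta^\star(\lambda) = 0$ once $\lambda \ge \|\nabla \mathcal{L}(0)\|_\infty$ (for differentiable $\mathcal{L}$) and the unregularized solution at $\lambda = 0$.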
# IncreLoRA:パラメータ効率の良い微調整のためのインクリメンタルパラメータ割り当て法 IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning ( http://arxiv.org/abs/2308.12043v1 ) ライセンス: Link先を確認 | Feiyu Zhang, Liangzhi Li, Junhao Chen, Zhouqiang Jiang, Bowen Wang, Yiming Qian | (参考訳) 事前学習された言語モデル(PLM)のサイズが大きくなるにつれて、モデル内のすべてのパラメータを微調整することは効率的ではなくなっている。
私たちのコードは公開されています。 With the increasing size of pre-trained language models (PLMs), fine-t |