Inferring Latent Market Forces: Evaluating LLM Detection of Gamma Exposure Patterns via Obfuscation Testing
- URL: http://arxiv.org/abs/2512.17923v1
- Date: Mon, 08 Dec 2025 15:48:57 GMT
- Title: Inferring Latent Market Forces: Evaluating LLM Detection of Gamma Exposure Patterns via Obfuscation Testing
- Authors: Christopher Regan, Ying Xie
- Abstract summary: Tests three dealer hedging constraint patterns on 242 trading days (95.6% coverage) of S&P 500 options data. We find LLMs achieve a 71.5% detection rate using unbiased prompts that provide only raw gamma exposure values.
- Score: 0.0937899315060426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce obfuscation testing, a novel methodology for validating whether large language models detect structural market patterns through causal reasoning rather than temporal association. Testing three dealer hedging constraint patterns (gamma positioning, stock pinning, 0DTE hedging) on 242 trading days (95.6% coverage) of S&P 500 options data, we find LLMs achieve a 71.5% detection rate using unbiased prompts that provide only raw gamma exposure values without regime labels or temporal context. The WHO-WHOM-WHAT causal framework forces models to identify the economic actors (dealers), affected parties (directional traders), and structural mechanisms (forced hedging) underlying observed market dynamics. Critically, detection accuracy (91.2%) remains stable even as economic profitability varies quarterly, demonstrating that models identify structural constraints rather than profitable patterns. When prompted with regime labels, detection increases to 100%, but the 71.5% unbiased rate validates genuine pattern recognition. Our findings suggest LLMs possess emergent capabilities for detecting complex financial mechanisms through pure structural reasoning, with implications for systematic strategy development, risk management, and our understanding of how transformer architectures process financial market dynamics.
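The paper's harness is not public, but the obfuscation-testing loop the abstract describes can be sketched as follows. Everything here is illustrative: the real detector is an LLM given unbiased prompts containing only raw gamma exposure values, while the stand-in below is a trivial sign rule; the data and field names are hypothetical.

```python
# Illustrative obfuscation-testing loop (not the paper's code): the detector
# sees only raw gamma-exposure values -- no regime labels, no dates -- and the
# detection rate is the fraction of days it names the true structural pattern.

def stub_detector(gamma):
    """Stand-in for an LLM call. Negative net dealer gamma forces dealers to
    hedge with the trend (buy rallies, sell declines), amplifying moves;
    positive net gamma means hedging dampens moves."""
    net = sum(gamma) / len(gamma)
    return "short_gamma" if net < 0 else "long_gamma"

def detection_rate(days, detector):
    """days: list of {'gamma': [...], 'pattern': str} records with the
    ground-truth pattern kept hidden from the detector."""
    hits = sum(1 for d in days if detector(d["gamma"]) == d["pattern"])
    return hits / len(days)

days = [
    {"gamma": [-1.2, -0.8, -0.5], "pattern": "short_gamma"},
    {"gamma": [0.9, 1.1, 0.4],    "pattern": "long_gamma"},
    {"gamma": [-0.3, -0.1, -0.2], "pattern": "short_gamma"},
]
print(detection_rate(days, stub_detector))  # 1.0 on this toy data
```

The point of the obfuscation is in what the detector is *not* given: with regime labels in the prompt the task collapses to lookup, whereas the unbiased rate measures whether the pattern itself is recognized.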
Related papers
- Interpretable Hypothesis-Driven Trading:A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals [0.0]
We develop a walk-forward validation framework for algorithmic trading designed to guard against overfitting and lookahead bias. Our methodology combines interpretable hypothesis-driven signal generation with reinforcement learning and strict out-of-sample testing. The framework enforces strict information-set discipline, employs rolling-window validation across 34 independent test periods, and maintains complete interpretability through natural-language hypothesis explanations.
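The rolling-window scheme in that summary can be sketched generically (this is not the paper's 34-period configuration; sizes and names are hypothetical). The key property is that every model is fit only on data strictly before its test window, which is what rules out lookahead bias.

```python
# Minimal walk-forward split generator: each test window lies strictly
# after its training window, and test windows never overlap.

def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train_indices, test_indices) pairs over n_obs observations."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # roll forward by one test period

splits = list(walk_forward_splits(n_obs=100, train_size=60, test_size=10))
print(len(splits))       # 4 independent test periods
print(splits[0][1][:3])  # [60, 61, 62]
```

Stacking the test windows end to end yields one continuous out-of-sample track record, which is the usual way such frameworks report results.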
arXiv Detail & Related papers (2025-12-15T02:20:42Z)
- Bayesian Modeling for Uncertainty Management in Financial Risk Forecasting and Compliance [0.0]
We develop an integrated approach that consistently enhances the handling of risk in market volatility forecasting, fraud detection, and compliance monitoring. We evaluate the performance of one-day-ahead 95% Value-at-Risk (VaR) forecasts on daily S&P 500 returns, with a training period from 2000 to 2019 and an out-of-sample test period spanning 2020 to 2024. Our proposed discount-factor DLM model produces a slightly liberal VaR estimate, with evidence of clustered violations.
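The backtest logic behind that evaluation is standard and easy to sketch (the DLM model itself is not shown; the numbers below are made up): a one-day-ahead 95% VaR forecast is "violated" when the realized return falls below it, and a well-calibrated model should violate on roughly 5% of days, without the violations clustering in time.

```python
# Count VaR violations: a breach occurs when the realized return is worse
# than the (negative) forecast VaR level for that day.

def var_violations(returns, var_forecasts):
    """Number of days on which the realized return breaches the VaR level."""
    return sum(1 for r, v in zip(returns, var_forecasts) if r < v)

returns   = [-0.021, 0.004, -0.013, 0.009, -0.030]   # realized daily returns
forecasts = [-0.018, -0.017, -0.016, -0.017, -0.019]  # 95% one-day VaR levels
breaches = var_violations(returns, forecasts)
print(breaches, breaches / len(returns))  # 2 0.4
```

A "slightly liberal" model in this vocabulary is one whose empirical breach rate exceeds the nominal 5%, and "clustered violations" means breaches arrive in runs rather than independently.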
arXiv Detail & Related papers (2025-12-06T23:00:19Z)
- A FEDformer-Based Hybrid Framework for Anomaly Detection and Risk Forecasting in Financial Time Series [0.8065001399110248]
This study proposes a FEDformer-based hybrid framework for anomaly detection and risk forecasting in financial time series. It integrates the Frequency Enhanced Decomposed Transformer (FEDformer) with a residual-based anomaly detector and a risk forecasting head. Experiments conducted on the S&P 500, NASDAQ Composite, and Brent Crude Oil datasets (2000-2024) demonstrate the superiority of the proposed model over benchmark methods.
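The residual-based detection principle in that summary can be illustrated with a deliberately simple stand-in forecaster (a trailing moving average replaces the FEDformer; window and threshold are arbitrary choices, not the paper's): points whose forecast residual is unusually large relative to the residual distribution are flagged as anomalies.

```python
# Residual-based anomaly flagging with a naive moving-average forecaster.
# The forecaster is a placeholder for any model (e.g. a transformer); only
# the residual z-score logic is the point being illustrated.

def flag_anomalies(series, window=3, z_thresh=2.0):
    """Return indices whose residual against a trailing moving-average
    forecast lies more than z_thresh standard deviations from the mean."""
    resids = [series[t] - sum(series[t - window:t]) / window
              for t in range(window, len(series))]
    mean = sum(resids) / len(resids)
    std = (sum((r - mean) ** 2 for r in resids) / len(resids)) ** 0.5
    return [t for t, r in zip(range(window, len(series)), resids)
            if std and abs(r - mean) / std > z_thresh]

series = [1, 1, 1, 1, 1, 1, 10, 1, 1, 1, 1, 1]
print(flag_anomalies(series))  # [6] -- the spike
```

Swapping in a stronger forecaster tightens the residuals on normal days, which is exactly why a hybrid of a good forecaster plus a simple residual detector can work well.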
arXiv Detail & Related papers (2025-11-17T04:09:04Z)
- Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions [51.56484100374058]
Current large language models (LLMs) excel in verifiable domains where outputs can be checked before action but prove less reliable for high-stakes strategic decisions with uncertain outcomes. This gap, driven by cognitive biases in both humans and artificial intelligence (AI) systems, threatens the defensibility of valuations and the sustainability of investments in the sector. This report describes a framework emerging from systematic qualitative assessment across 7 frontier-grade LLMs and 3 market-facing venture vignettes under time pressure.
arXiv Detail & Related papers (2025-11-10T22:24:21Z)
- Robust Reinforcement Learning in Finance: Modeling Market Impact with Elliptic Uncertainty Sets [57.179679246370114]
In financial applications, reinforcement learning (RL) agents are commonly trained on historical data, where their actions do not influence prices. During deployment, these agents trade in live markets where their own transactions can shift asset prices, a phenomenon known as market impact. Traditional robust RL approaches address this model misspecification by optimizing worst-case performance over a set of uncertainties. We develop a novel class of elliptic uncertainty sets, enabling efficient and tractable robust policy evaluation.
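Why elliptic sets make robust evaluation tractable can be seen from a textbook closed form (this is a generic sketch, not the paper's construction, and it ignores the probability-simplex constraint a full treatment must handle): minimizing the expected value of a payoff vector v over the ellipsoid of radius kappa around a nominal distribution p_hat with shape matrix Sigma gives p_hat·v − kappa·||Sigma^{1/2} v||, so no optimization loop is needed.

```python
# Worst-case expected value over an elliptic uncertainty set
# { p : (p - p_hat)^T Sigma^{-1} (p - p_hat) <= kappa^2 }.
# Diagonal Sigma is assumed for simplicity.

import math

def worst_case_value(p_hat, v, sigma_diag, kappa):
    """Closed-form worst-case E[v]: nominal value minus a kappa-scaled
    Sigma-weighted norm of the payoff vector."""
    nominal = sum(p * x for p, x in zip(p_hat, v))
    norm = math.sqrt(sum(s * x * x for s, x in zip(sigma_diag, v)))  # ||Sigma^{1/2} v||
    return nominal - kappa * norm

# Two-outcome toy example: symmetric payoffs, small per-outcome variance.
print(worst_case_value([0.5, 0.5], [1.0, -1.0], [0.04, 0.04], 1.0))  # about -0.2828
```

The kappa radius is the knob trading nominal performance against robustness; kappa = 0 recovers the nominal (non-robust) evaluation.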
arXiv Detail & Related papers (2025-10-22T18:22:25Z)
- Predicting Market Troughs: A Machine Learning Approach with Causal Interpretation [0.0]
This paper provides robust, new evidence on the causal drivers of market troughs. We move beyond restrictive linear models with a flexible double machine learning (DML) framework that estimates average partial effects.
arXiv Detail & Related papers (2025-09-07T04:38:40Z)
- ProteuS: A Generative Approach for Simulating Concept Drift in Financial Markets [44.76567557906836]
A fundamental problem in developing and validating adaptive algorithms is the lack of a ground truth in real-world financial data. This paper introduces a novel framework, named ProteuS, for generating semi-synthetic financial time series with pre-defined structural breaks. An analysis of the generated data confirms the complexity of the task, revealing significant overlap between the different market states.
arXiv Detail & Related papers (2025-08-30T21:01:47Z)
- Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models [87.66870367661342]
Large language models (LLMs) are used in AI applications in healthcare. A red-teaming framework that continuously stress-tests LLMs can reveal significant weaknesses in four safety-critical domains. A suite of adversarial agents autonomously mutates test cases, identifies and evolves unsafe-triggering strategies, and evaluates responses. Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.
arXiv Detail & Related papers (2025-07-30T08:44:22Z)
- Your AI, Not Your View: The Bias of LLMs in Investment Analysis [62.388554963415906]
In finance, Large Language Models (LLMs) face frequent knowledge conflicts arising from discrepancies between their pre-trained parametric knowledge and real-time market data. These conflicts are especially problematic in real-world investment services, where a model's inherent biases can misalign with institutional objectives. We propose an experimental framework to investigate emergent behaviors in such conflict scenarios, offering a quantitative analysis of bias in investment analysis.
arXiv Detail & Related papers (2025-07-28T16:09:38Z)
- Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [71.7892165868749]
Commercial Large Language Model (LLM) APIs create a fundamental trust problem: users pay for specific models but have no guarantee that providers deliver them faithfully. We formalize this model substitution problem and evaluate detection methods under realistic adversarial conditions. We propose and evaluate the use of Trusted Execution Environments (TEEs) as one practical and robust solution.
arXiv Detail & Related papers (2025-04-07T03:57:41Z)
- AiRacleX: Automated Detection of Price Oracle Manipulations via LLM-Driven Knowledge Mining and Prompt Generation [30.312011441118194]
Decentralized finance applications depend on accurate price oracles to ensure secure transactions. Price oracles are highly vulnerable to manipulation, enabling attackers to exploit smart contract vulnerabilities. We propose a novel framework that automates the detection of price oracle manipulations.
arXiv Detail & Related papers (2025-02-10T10:58:09Z)
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.