Enabling Equitable Access to Trustworthy Financial Reasoning
- URL: http://arxiv.org/abs/2508.21051v1
- Date: Thu, 28 Aug 2025 17:55:07 GMT
- Title: Enabling Equitable Access to Trustworthy Financial Reasoning
- Authors: William Jurayj, Nils Holzenberger, Benjamin Van Durme
- Abstract summary: Tax filing requires complex reasoning, combining application of overlapping rules with numerical calculations. We propose an approach that integrates LLMs with a symbolic solver to calculate tax obligations. We show how up-front translation of plain-text rules into formal logic programs, combined with intelligently retrieved exemplars for formal case representations, can dramatically improve performance.
- Score: 50.73061215297832
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: According to the United States Internal Revenue Service, "the average American spends $270 and 13 hours filing their taxes". Even beyond the U.S., tax filing requires complex reasoning, combining application of overlapping rules with numerical calculations. Because errors can incur costly penalties, any automated system must deliver high accuracy and auditability, making modern large language models (LLMs) poorly suited for this task. We propose an approach that integrates LLMs with a symbolic solver to calculate tax obligations. We evaluate variants of this system on the challenging StAtutory Reasoning Assessment (SARA) dataset, and include a novel method for estimating the cost of deploying such a system based on real-world penalties for tax errors. We further show how up-front translation of plain-text rules into formal logic programs, combined with intelligently retrieved exemplars for formal case representations, can dramatically improve performance on this task and reduce costs to well below real-world averages. Our results demonstrate the promise and economic feasibility of neuro-symbolic architectures for increasing equitable access to reliable tax assistance.
Related papers
- Training LLMs with LogicReward for Faithful and Rigorous Reasoning [75.30425553246177]
We propose LogicReward, a reward system that guides model training by enforcing step-level logical correctness with a theorem prover. An 8B model trained on data constructed with LogicReward surpasses GPT-4o and o4-mini by 11.6% and 2% on natural language inference and logical reasoning tasks.
arXiv Detail & Related papers (2025-12-20T03:43:02Z)
- TaxCalcBench: Evaluating Frontier Models on the Tax Calculation Task [0.11999555634662631]
Calculating US personal income taxes is a task that requires building an understanding of vast amounts of English text. We propose TaxCalcBench, a benchmark for determining models' abilities to calculate personal income tax returns.
arXiv Detail & Related papers (2025-07-22T00:37:59Z)
- TaxAgent: How Large Language Model Designs Fiscal Policy [22.859190941594296]
This study introduces TaxAgent, a novel integration of large language models (LLMs) with agent-based modeling (ABM) to design adaptive tax policies. In our macroeconomic simulation, heterogeneous H-Agents (households) simulate real-world taxpayer behaviors while the TaxAgent (government) utilizes LLMs to iteratively optimize tax rates, balancing equity and productivity. Benchmarked against Saez Optimal Taxation, U.S. federal income taxes, and free markets, TaxAgent achieves superior equity-efficiency trade-offs.
arXiv Detail & Related papers (2025-06-03T13:06:19Z)
- Technical Challenges in Maintaining Tax Prep Software with Large Language Models [6.419602857618507]
We focus on identifying, understanding, and tackling technical challenges in leveraging Large Language Models (LLMs), specifically ChatGPT and Llama, to faithfully extract code differentials from IRS publications.
arXiv Detail & Related papers (2025-04-25T21:00:20Z)
- A Taxation Perspective for Fair Re-ranking [61.946428892727795]
We introduce a new fair re-ranking method named Tax-rank, which levies taxes based on the difference in utility between two items.
Our model Tax-rank offers a superior tax policy for fair re-ranking, theoretically demonstrating both continuity and controllability over accuracy loss.
arXiv Detail & Related papers (2024-04-27T08:21:29Z)
- Learning Optimal Tax Design in Nonatomic Congestion Games [56.85292809260111]
In multiplayer games, self-interested behavior among the players can harm the social welfare. We take the initial step of learning the optimal tax that can induce social welfare with limited feedback in congestion games.
arXiv Detail & Related papers (2024-02-12T06:32:53Z)
- On the Potential and Limitations of Few-Shot In-Context Learning to Generate Metamorphic Specifications for Tax Preparation Software [12.071874385139395]
Nearly 50% of taxpayers filed their individual income taxes using tax software in the U.S. in FY22.
This paper formulates the task of generating metamorphic specifications as a translation task between properties extracted from tax documents.
arXiv Detail & Related papers (2023-11-20T18:12:28Z)
- Algorithmic Fairness and Vertical Equity: Income Fairness with IRS Tax Audit Models [73.24381010980606]
This study examines issues of algorithmic fairness in the context of systems that inform tax audit selection by the IRS.
We show how the use of more flexible machine learning methods for selecting audits may affect vertical equity.
Our results have implications for the design of algorithmic tools across the public sector.
arXiv Detail & Related papers (2022-06-20T16:27:06Z)
- Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning via the method of matching pursuit.
Compared to the state of the art, the proposed scheme imposes a drastically lower computational load on the system.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z)
- Metamorphic Testing and Debugging of Tax Preparation Software [2.185694185279913]
We focus on an open-source tax preparation software for our case study.
We develop a randomized test-case generation strategy to systematically validate the correctness of tax preparation software.
arXiv Detail & Related papers (2022-05-10T16:10:10Z)
- Integrating Reward Maximization and Population Estimation: Sequential Decision-Making for Internal Revenue Service Audit Selection [2.2182596728059116]
We introduce a new setting, optimize-and-estimate structured bandits.
This setting is inherent to many public and private sector applications.
We demonstrate its importance on real data from the United States Internal Revenue Service.
arXiv Detail & Related papers (2022-04-25T18:28:55Z)
- The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies [119.07163415116686]
We train social planners that discover tax policies that can effectively trade off economic equality and productivity.
We present an economic simulation environment that features competitive pressures and market dynamics.
We show that AI-driven tax policies improve the trade-off between equality and productivity by 16% over baseline policies.
arXiv Detail & Related papers (2020-04-28T06:57:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.