Related papers: On the Performance of LLMs for Real Estate Appraisal

URL: http://arxiv.org/abs/2506.11812v1
Date: Fri, 13 Jun 2025 14:14:40 GMT
Title: On the Performance of LLMs for Real Estate Appraisal
Authors: Margot Geerts, Manon Reusens, Bart Baesens, Seppe vanden Broucke, Jochen De Weerdt
Abstract summary: This study examines how Large Language Models (LLMs) can democratize access to real estate insights by generating competitive and interpretable house price estimates. We evaluate leading LLMs on diverse international housing datasets, comparing zero-shot, few-shot, market report-enhanced, and hybrid prompting techniques. Our results show that LLMs effectively leverage hedonic variables, such as property size and amenities, to produce meaningful estimates.
Score: 5.812129569528997
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The real estate market is vital to global economies but suffers from significant information asymmetry. This study examines how Large Language Models (LLMs) can democratize access to real estate insights by generating competitive and interpretable house price estimates through optimized In-Context Learning (ICL) strategies. We systematically evaluate leading LLMs on diverse international housing datasets, comparing zero-shot, few-shot, market report-enhanced, and hybrid prompting techniques. Our results show that LLMs effectively leverage hedonic variables, such as property size and amenities, to produce meaningful estimates. While traditional machine learning models remain strong for pure predictive accuracy, LLMs offer a more accessible, interactive and interpretable alternative. Although self-explanations require cautious interpretation, we find that LLMs explain their predictions in agreement with state-of-the-art models, confirming their trustworthiness. Carefully selected in-context examples based on feature similarity and geographic proximity significantly enhance LLM performance, yet LLMs struggle with overconfidence in price intervals and limited spatial reasoning. We offer practical guidance for structured prediction tasks through prompt optimization. Our findings highlight LLMs' potential to improve transparency in real estate appraisal and provide actionable insights for stakeholders.
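
Illustrative sketch (not from the paper): the abstract reports that in-context examples chosen by feature similarity and geographic proximity improve few-shot estimates, but it does not describe the selection procedure. The Python below shows one possible way such a selection could work. The field names (area_m2, bedrooms, lat, lon, price), the scoring formula, the geo_weight parameter, and the prompt wording are all assumptions made for illustration only.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two coordinates."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def select_examples(target, pool, k=3, geo_weight=0.5):
    """Return the k pool listings with the lowest combined distance.

    The score is a hypothetical heuristic: relative area difference plus
    bedroom difference, blended with the distance in kilometres.
    """
    def score(listing):
        feature_gap = (
            abs(listing["area_m2"] - target["area_m2"]) / target["area_m2"]
            + abs(listing["bedrooms"] - target["bedrooms"])
        )
        geo_gap = haversine_km(target["lat"], target["lon"], listing["lat"], listing["lon"])
        return (1 - geo_weight) * feature_gap + geo_weight * geo_gap

    return sorted(pool, key=score)[:k]


def describe(listing):
    """Render the hedonic variables of one listing as a single line."""
    return (
        f"Area: {listing['area_m2']} m2, Bedrooms: {listing['bedrooms']}, "
        f"Location: ({listing['lat']}, {listing['lon']})"
    )


def build_prompt(target, examples):
    """Assemble a few-shot prompt: priced examples first, then the target."""
    parts = ["Estimate the sale price of the last property."]
    for example in examples:
        parts.append(f"{describe(example)}\nPrice: {example['price']}")
    parts.append(f"{describe(target)}\nPrice:")
    return "\n\n".join(parts)


# Made-up listings, purely for demonstration.
target = {"area_m2": 100, "bedrooms": 3, "lat": 50.85, "lon": 4.35}
pool = [
    {"area_m2": 105, "bedrooms": 3, "lat": 50.86, "lon": 4.36, "price": 400000},
    {"area_m2": 100, "bedrooms": 3, "lat": 51.22, "lon": 4.40, "price": 380000},
    {"area_m2": 200, "bedrooms": 6, "lat": 50.85, "lon": 4.35, "price": 900000},
]
chosen = select_examples(target, pool, k=2)
prompt = build_prompt(target, chosen)
```

With these made-up listings, the nearby and similar first listing ranks ahead of the identical but distant second one, because geo_weight lets a few tens of kilometres dominate small feature gaps. A real pipeline would likely normalise both terms before blending them.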

Related papers

Measuring What LLMs Think They Do: SHAP Faithfulness and Deployability on Financial Tabular Classification [4.0057196015831495]
Large Language Models (LLMs) have attracted significant attention for classification tasks. Their reliability for structured data remains unclear, particularly in high stakes applications like financial risk assessment. Our study systematically evaluates LLMs and generates their SHAP values on financial classification tasks.
arXiv Detail & Related papers (2025-11-28T19:04:25Z)
Interpreting LLMs as Credit Risk Classifiers: Do Their Feature Explanations Align with Classical ML? [4.0057196015831495]
Large Language Models (LLMs) are increasingly explored as flexible alternatives to classical machine learning models for classification tasks through zero-shot prompting. This study conducts a systematic comparison between zero-shot LLM-based classifiers and LightGBM, a state-of-the-art gradient-boosting model, on a real-world loan default prediction task. We evaluate their predictive performance, analyze feature attributions using SHAP, and assess the reliability of LLM-generated self-explanations.
arXiv Detail & Related papers (2025-10-29T17:05:00Z)
Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs [57.82819770709032]
Large language models (LLMs) can be effective context-aided forecasters via naïve direct prompting. ReDP improves interpretability by eliciting explicit reasoning traces, allowing us to assess the model's reasoning over the context. CorDP leverages LLMs solely to refine existing forecasts with context, enhancing their applicability in real-world forecasting pipelines. IC-DP proposes embedding historical examples of context-aided forecasting tasks in the prompt, substantially improving accuracy even for the largest models.
arXiv Detail & Related papers (2025-08-13T16:02:55Z)
DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective [10.932591941137698]
This paper introduces DeepFund, a comprehensive platform for evaluating Large Language Models (LLMs) in a simulated live environment. Our approach implements a multi agent framework where LLMs serve as both analysts and managers, creating a realistic simulation of investment decision making. We provide a web interface that visualizes model performance across different market conditions and investment parameters, enabling detailed comparative analysis.
arXiv Detail & Related papers (2025-03-24T03:32:13Z)
Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models [50.16340812031201]
We show that large language models (LLMs) do not update their beliefs as expected from the Bayesian framework. We teach the LLMs to reason in a Bayesian manner by training them to mimic the predictions of an optimal Bayesian model.
arXiv Detail & Related papers (2025-03-21T20:13:04Z)
Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents [69.58565132975504]
Large language models (LLMs) have demonstrated remarkable capabilities in natural language tasks. We present the Agent Trading Arena, a virtual zero-sum stock market in which LLM-based agents engage in competitive multi-agent trading.
arXiv Detail & Related papers (2025-02-25T08:41:01Z)
Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values [13.798198972161657]
A number of societal problems involve the distribution of resources, where fairness, along with economic efficiency, plays a critical role in the desirability of outcomes. This paper examines whether large language models (LLMs) adhere to fundamental fairness concepts and investigates their alignment with human preferences.
arXiv Detail & Related papers (2025-02-01T04:24:47Z)
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation [45.059818539256426]
We propose the Chain-of-Embedding (CoE) in the latent space to enable LLMs to perform output-free self-evaluation. CoE consists of all progressive hidden states produced during the inference time, which can be treated as the latent thinking path of LLMs.
arXiv Detail & Related papers (2024-10-17T15:09:24Z)
Financial Statement Analysis with Large Language Models [0.0]
We provide standardized and anonymous financial statements to GPT4 and instruct the model to analyze them. The model outperforms financial analysts in its ability to predict earnings changes directionally. Our trading strategies based on GPT's predictions yield a higher Sharpe ratio and alphas than strategies based on other models.
arXiv Detail & Related papers (2024-07-25T08:36:58Z)
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs). We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions. A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations. Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
Benchmarking LLMs via Uncertainty Quantification [91.72588235407379]
The proliferation of open-source Large Language Models (LLMs) has highlighted the urgent need for comprehensive evaluation methods. We introduce a new benchmarking approach for LLMs that integrates uncertainty quantification. Our findings reveal that: I) LLMs with higher accuracy may exhibit lower certainty; II) Larger-scale LLMs may display greater uncertainty compared to their smaller counterparts; and III) Instruction-finetuning tends to increase the uncertainty of LLMs.
arXiv Detail & Related papers (2024-01-23T14:29:17Z)
A Comparative Analysis of Fine-Tuned LLMs and Few-Shot Learning of LLMs for Financial Sentiment Analysis [0.0]
We employ two approaches: in-context learning and fine-tuning LLMs on a finance-domain dataset. Our results demonstrate that fine-tuned smaller LLMs can achieve comparable performance to state-of-the-art fine-tuned LLMs. There is no observed enhancement in performance for finance-domain sentiment analysis when the number of shots for in-context learning is increased.
arXiv Detail & Related papers (2023-12-14T08:13:28Z)
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z)
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models [11.154814189699735]
Large Language Models (LLMs) pre-trained on extensive corpora have demonstrated superior performance across various NLP tasks. We introduce a retrieval-augmented LLMs framework for financial sentiment analysis. Our approach achieves 15% to 48% performance gain in accuracy and F1 score.
arXiv Detail & Related papers (2023-10-06T05:40:23Z)
Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines. We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.