Unveiling Location-Specific Price Drivers: A Two-Stage Cluster Analysis for Interpretable House Price Predictions
- URL: http://arxiv.org/abs/2508.03156v1
- Date: Tue, 05 Aug 2025 07:00:42 GMT
- Title: Unveiling Location-Specific Price Drivers: A Two-Stage Cluster Analysis for Interpretable House Price Predictions
- Authors: Paul Gümmer, Julian Rosenberger, Mathias Kraus, Patrick Zschech, Nico Hambauer,
- Abstract summary: House price valuation remains challenging due to localized market variations.<n>Existing approaches often rely on black-box machine learning models, which lack interpretability.<n>We propose a machine learning approach that applies two-stage clustering, first grouping properties based on minimal location-based features.
- Score: 3.738325076054202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: House price valuation remains challenging due to localized market variations. Existing approaches often rely on black-box machine learning models, which lack interpretability, or simplistic methods like linear regression (LR), which fail to capture market heterogeneity. To address this, we propose a machine learning approach that applies two-stage clustering, first grouping properties based on minimal location-based features before incorporating additional features. Each cluster is then modeled using either LR or a generalized additive model (GAM), balancing predictive performance with interpretability. Constructing and evaluating our models on 43,309 German house property listings from 2023, we achieve a 36% improvement for the GAM and 58% for LR in mean absolute error compared to models without clustering. Additionally, graphical analyses unveil pattern shifts between clusters. These findings emphasize the importance of cluster-specific insights, enhancing interpretability and offering practical value for buyers, sellers, and real estate analysts seeking more reliable property valuations.
Related papers
- Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation [49.45822979879046]
Generative recommendation (GR) typically encodes behavioral or semantic aspects of item information into discrete tokens.<n>We present Chunk AutoRegressive Modeling (CAR), a new generation paradigm following the decision pattern that users usually think semantic aspects of items.
arXiv Detail & Related papers (2025-06-30T09:13:54Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance.<n>This paper addresses the question of how to optimally combine the model's predictions and the provided labels.<n>Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models [61.467781476005435]
skill-wise performance is obscured when inspecting aggregate accuracy, under-utilizing the rich signal modern benchmarks contain.
We propose an automatic approach to recover the underlying skills relevant for any evaluation instance, by way of inspecting model-generated rationales.
Our skill-slices and framework open a new avenue in model evaluation, leveraging skill-specific analyses to unlock a more granular and actionable understanding of model capabilities.
arXiv Detail & Related papers (2024-10-17T17:51:40Z) - From Predictive Importance to Causality: Which Machine Learning Model Reflects Reality? [0.0]
We find a moderate Spearman rank correlation of 0.48 between SHAP-based feature importance and causally significant features.
This work underscores the need for integrated approaches that combine predictive power with causal insights in real estate valuation.
arXiv Detail & Related papers (2024-09-01T22:37:47Z) - Utilizing Model Residuals to Identify Rental Properties of Interest: The
Price Anomaly Score (PAS) and Its Application to Real-time Data in Manhattan [0.0]
Drawing from data collected of all possible available properties for rent in Manhattan as of September 2023, this paper aims to strengthen our understanding of model residuals.
To harness these insights, we introduce the Price Anomaly Score (PAS), a metric capable of capturing boundaries between irregularly predicted prices.
arXiv Detail & Related papers (2023-11-29T00:14:30Z) - Measuring Self-Supervised Representation Quality for Downstream
Classification using Discriminative Features [56.89813105411331]
We study the representation space of state-of-the-art self-supervised models including SimCLR, SwaV, MoCo, BYOL, DINO, SimSiam, VICReg and Barlow Twins.
We propose Self-Supervised Representation Quality Score (or Q-Score), an unsupervised score that can reliably predict if a given sample is likely to be mis-classified.
Fine-tuning with Q-Score regularization can boost the linear probing accuracy of SSL models by up to 5.8% on ImageNet-100 and 3.7% on ImageNet-1K.
arXiv Detail & Related papers (2022-03-03T17:48:23Z) - Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs)
We present Efficient Ensemble of Experts (E$3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z) - Correlation Clustering Reconstruction in Semi-Adversarial Models [70.11015369368272]
Correlation Clustering is an important clustering problem with many applications.
We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
arXiv Detail & Related papers (2021-08-10T14:46:17Z) - Explaining a Series of Models by Propagating Local Feature Attributions [9.66840768820136]
Pipelines involving several machine learning models improve performance in many domains but are difficult to understand.
We introduce a framework to propagate local feature attributions through complex pipelines of models based on a connection to the Shapley value.
Our framework enables us to draw higher-level conclusions based on groups of gene expression features for Alzheimer's and breast cancer histologic grade prediction.
arXiv Detail & Related papers (2021-04-30T22:20:58Z) - Interpretable Learning-to-Rank with Generalized Additive Models [78.42800966500374]
Interpretability of learning-to-rank models is a crucial yet relatively under-examined research area.
Recent progress on interpretable ranking models largely focuses on generating post-hoc explanations for existing black-box ranking models.
We lay the groundwork for intrinsically interpretable learning-to-rank by introducing generalized additive models (GAMs) into ranking tasks.
arXiv Detail & Related papers (2020-05-06T01:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.