Metadata Conditioned Large Language Models for Localization
- URL: http://arxiv.org/abs/2601.15236v1
- Date: Wed, 21 Jan 2026 18:20:59 GMT
- Title: Metadata Conditioned Large Language Models for Localization
- Authors: Anjishnu Mukherjee, Ziwei Zhu, Antonios Anastasopoulos
- Abstract summary: We show that metadata conditioning consistently improves in-region performance without sacrificing cross-region generalization. Our ablation studies demonstrate that URL-level metadata alone captures much of the geographic signal. After instruction tuning, metadata conditioned global models achieve accuracy comparable to LLaMA-3.2-1B-Instruct, despite being trained on substantially less data.
- Score: 25.913929585741034
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models are typically trained by treating text as a single global distribution, often resulting in geographically homogenized behavior. We study metadata conditioning as a lightweight approach for localization, pre-training 31 models (at 0.5B and 1B parameter scales) from scratch on large-scale English news data annotated with verified URLs, country tags, and continent tags, covering 4 continents and 17 countries. Across four controlled experiments, we show that metadata conditioning consistently improves in-region performance without sacrificing cross-region generalization, enables global models to recover localization comparable to region-specific models, and improves learning efficiency. Our ablation studies demonstrate that URL-level metadata alone captures much of the geographic signal, while balanced regional data coverage remains essential, as metadata cannot fully compensate for missing regions. Finally, we introduce a downstream benchmark of 800 localized news MCQs and show that after instruction tuning, metadata conditioned global models achieve accuracy comparable to LLaMA-3.2-1B-Instruct, despite being trained on substantially less data. Together, these results establish metadata conditioning as a practical and compute-efficient approach for localization of language models.
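The abstract describes conditioning pre-training documents on their source metadata (verified URL, country tag, continent tag). A minimal sketch of the idea is below; the tag template and field names are illustrative assumptions, not the paper's exact format.

```python
# Minimal sketch of metadata conditioning for pre-training data.
# The tag template below is a hypothetical illustration; the paper's
# actual conditioning format may differ.

def condition_document(text: str, url: str, country: str, continent: str) -> str:
    """Prepend metadata tags so the model can associate text with its origin.

    At inference time the same prefix can be supplied (to steer the model
    toward a region's distribution) or omitted (for a generic prompt).
    """
    prefix = (
        f"<url>{url}</url> "
        f"<country>{country}</country> "
        f"<continent>{continent}</continent>\n"
    )
    return prefix + text

# Example: a hypothetical Australian news snippet conditioned on its source.
example = condition_document(
    "Local elections were held across the state on Saturday.",
    url="example-news.com.au",
    country="Australia",
    continent="Oceania",
)
```

Because the metadata lives entirely in the token prefix, this adds no parameters or architectural changes, which is what makes the approach lightweight.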
Related papers
- Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts [20.100765943688454]
GRAM is a two-phase test-time adaptation framework that enables robust slum segmentation without requiring labeled data from target regions. We use a million-scale satellite imagery dataset from 12 cities across four continents for source training. During adaptation, prediction consistency across experts filters out unreliable pseudo-labels, allowing the model to generalize effectively to previously unseen regions.
arXiv Detail & Related papers (2025-11-13T13:35:50Z)
- RainShift: A Benchmark for Precipitation Downscaling Across Geographies [21.183274939386088]
We introduce RainShift, a dataset and benchmark for evaluating downscaling under geographic distribution shifts. We evaluate state-of-the-art downscaling approaches including GANs and diffusion models in generalizing across data gaps between the Global North and Global South. Our work advances the global applicability of downscaling methods and represents a step toward reducing inequities in access to high-resolution climate information.
arXiv Detail & Related papers (2025-07-07T12:25:14Z)
- Enhancing the Performance of Global Model by Improving the Adaptability of Local Models in Federated Learning [5.783667435751743]
Federated learning enables clients to collaboratively train a global model, which is aggregated from local models. Due to the heterogeneous data distributions over clients and data privacy in federated learning, it is difficult to train local models to achieve a well-performed global model. We introduce the adaptability of local models, and enhance the performance of the global model by improving the adaptability of local models.
arXiv Detail & Related papers (2025-05-15T09:51:47Z)
- Subgraph Federated Learning for Local Generalization [41.64806982207585]
Federated Learning (FL) on graphs enables collaborative model training to enhance performance without compromising the privacy of each client. Existing methods often overlook the mutable nature of graph data, which frequently introduces new nodes and leads to shifts in label distribution. Our proposed method, FedLoG, effectively tackles this issue by mitigating local overfitting.
arXiv Detail & Related papers (2025-03-06T01:08:01Z)
- Metadata Conditioning Accelerates Language Model Pre-training [76.54265482251454]
We propose a new method, termed Metadata Conditioning then Cooldown (MeCo), to incorporate additional learning cues during pre-training. MeCo significantly accelerates pre-training across different model scales (600M to 8B parameters) and training sources (C4, RefinedWeb, and DCLM). MeCo is remarkably simple, adds no computational overhead, and demonstrates promise in producing more capable and steerable language models.
arXiv Detail & Related papers (2025-01-03T18:59:23Z)
- Contrasting local and global modeling with machine learning and satellite data: A case study estimating tree canopy height in African savannas [23.868986217962373]
Small models trained only with locally-collected data outperform published global TCH maps.
We identify specific points of conflict and synergy between local and global modeling paradigms.
arXiv Detail & Related papers (2024-11-21T17:53:27Z)
- Recognize Any Regions [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model. Experiments in open-world object recognition show that our RegionSpot achieves significant performance gains over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
- Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations [58.442103936918805]
We show that Attention Mask Consistency produces superior visual grounding results compared to previous methods.
AMC is effective, easy to implement, and is general as it can be adopted by any vision-language model.
arXiv Detail & Related papers (2022-06-30T17:55:12Z)
- Federated and Generalized Person Re-identification through Domain and Feature Hallucinating [88.77196261300699]
We study the problem of federated domain generalization (FedDG) for person re-identification (re-ID).
We propose a novel method, called "Domain and Feature Hallucinating (DFH)", to produce diverse features for learning generalized local and global models.
Our method achieves the state-of-the-art performance for FedDG on four large-scale re-ID benchmarks.
arXiv Detail & Related papers (2022-03-05T09:15:13Z)
- Jalisco's multiclass land cover analysis and classification using a novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present our novel lightweight (only 89k parameters) Convolutional Neural Network (ConvNet) for land cover (LC) classification and analysis.
In this work, we combine three real-world open data sources to obtain 13 channels.
Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar ones.
arXiv Detail & Related papers (2022-01-26T14:58:51Z)
- Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets.
We leverage a largely ignored source of information: the behavior of the model on individual instances during training.
Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
arXiv Detail & Related papers (2020-09-22T20:19:41Z)
- Think Locally, Act Globally: Federated Learning with Local and Global Representations [92.68484710504666]
Federated learning is a method of training models on private data distributed over multiple devices.
We propose a new federated learning algorithm that learns compact local representations on each device.
We also evaluate on the task of personalized mood prediction from real-world mobile data where privacy is key.
arXiv Detail & Related papers (2020-01-06T12:40:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of its content (including all information) and is not responsible for any consequences of its use.