Metadata Conditioned Large Language Models for Localization
- URL: http://arxiv.org/abs/2601.15236v1
- Date: Wed, 21 Jan 2026 18:20:59 GMT
- Title: Metadata Conditioned Large Language Models for Localization
- Authors: Anjishnu Mukherjee, Ziwei Zhu, Antonios Anastasopoulos
- Abstract summary: We show that metadata conditioning consistently improves in-region performance without sacrificing cross-region generalization. Our ablation studies demonstrate that URL-level metadata alone captures much of the geographic signal. After instruction tuning, metadata conditioned global models achieve accuracy comparable to LLaMA-3.2-1B-Instruct, despite being trained on substantially less data.
- Score: 25.913929585741034
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models are typically trained by treating text as a single global distribution, often resulting in geographically homogenized behavior. We study metadata conditioning as a lightweight approach for localization, pre-training 31 models (at 0.5B and 1B parameter scales) from scratch on large-scale English news data annotated with verified URLs, country tags, and continent tags, covering 4 continents and 17 countries. Across four controlled experiments, we show that metadata conditioning consistently improves in-region performance without sacrificing cross-region generalization, enables global models to recover localization comparable to region-specific models, and improves learning efficiency. Our ablation studies demonstrate that URL-level metadata alone captures much of the geographic signal, while balanced regional data coverage remains essential, as metadata cannot fully compensate for missing regions. Finally, we introduce a downstream benchmark of 800 localized news MCQs and show that after instruction tuning, metadata conditioned global models achieve accuracy comparable to LLaMA-3.2-1B-Instruct, despite being trained on substantially less data. Together, these results establish metadata conditioning as a practical and compute-efficient approach for localization of language models.
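The abstract describes conditioning pre-training documents on their source metadata (verified URL, country tag, continent tag). A minimal sketch of the idea is below; the tag template and field names are illustrative assumptions, not the paper's exact format.

```python
# Minimal sketch of metadata conditioning for pre-training data.
# The tag template below is a hypothetical illustration; the paper's
# actual conditioning format may differ.

def condition_document(text: str, url: str, country: str, continent: str) -> str:
    """Prepend metadata tags so the model can associate text with its origin.

    At inference time the same prefix can be supplied (to steer the model
    toward a region's distribution) or omitted (for a generic prompt).
    """
    prefix = (
        f"<url>{url}</url> "
        f"<country>{country}</country> "
        f"<continent>{continent}</continent>\n"
    )
    return prefix + text

# Example: a hypothetical Australian news snippet conditioned on its source.
example = condition_document(
    "Local elections were held across the state on Saturday.",
    url="example-news.com.au",
    country="Australia",
    continent="Oceania",
)
```

Because the metadata lives entirely in the token prefix, this adds no parameters or architectural changes, which is what makes the approach lightweight.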
Related papers
- Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts [20.100765943688454]
GRAM is a two-phase test-time adaptation framework that enables robust slum segmentation without requiring labeled data from target regions. We use a million-scale satellite imagery dataset from 12 cities across four continents for source training. During adaptation, prediction consistency across experts filters out unreliable pseudo-labels, allowing the model to generalize effectively to previously unseen regions.
arXiv Detail & Related papers (2025-11-13T13:35:50Z)
- RainShift: A Benchmark for Precipitation Downscaling Across Geographies [21.183274939386088]
We introduce RainShift, a dataset and benchmark for evaluating downscaling under geographic distribution shifts. We evaluate state-of-the-art downscaling approaches including GANs and diffusion models in generalizing across data gaps between the Global North and Global South. Our work advances the global applicability of downscaling methods and represents a step toward reducing inequities in access to high-resolution climate information.
arXiv Detail & Related papers (2025-07-07T12:25:14Z)
- Enhancing the Performance of Global Model by Improving the Adaptability of Local Models in Federated Learning [5.783667435751743]
Federated learning enables clients to collaboratively train a global model, which is aggregated from local models. Due to the heterogeneous data distributions over clients and data privacy in federated learning, it is difficult to train local models to achieve a well-performed global model. We introduce the adaptability of local models, and enhance the performance of the global model by improving the adaptability of local models.
arXiv Detail & Related papers (2025-05-15T09:51:47Z)
- Subgraph Federated Learning for Local Generalization [41.64806982207585]
Federated Learning (FL) on graphs enables collaborative model training to enhance performance without compromising the privacy of each client. Existing methods often overlook the mutable nature of graph data, which frequently introduces new nodes and leads to shifts in label distribution. Our proposed method, FedLoG, effectively tackles this issue by mitigating local overfitting.
arXiv Detail & Related papers (2025-03-06T01:08:01Z)
- Metadata Conditioning Accelerates Language Model Pre-training [76.54265482251454]
We propose a new method, termed Metadata Conditioning then Cooldown (MeCo), to incorporate additional learning cues during pre-training. MeCo significantly accelerates pre-training across different model scales (600M to 8B parameters) and training sources (C4, RefinedWeb, and DCLM). MeCo is remarkably simple, adds no computational overhead, and demonstrates promise in producing more capable and steerable language models.
arXiv Detail & Related papers (2025-01-03T18:59:23Z)
- Contrasting local and global modeling with machine learning and satellite data: A case study estimating tree canopy height in African savannas [23.868986217962373]
Small models trained only with locally-collected data outperform published global TCH maps.
We identify specific points of conflict and synergy between local and global modeling paradigms.
arXiv Detail & Related papers (2024-11-21T17:53:27Z)
- Recognize Any Regions [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model. Experiments in open-world object recognition show that our RegionSpot achieves significant performance gains over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
- Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations [58.442103936918805]
We show that Attention Mask Consistency produces superior visual grounding results compared to previous methods.
AMC is effective, easy to implement, and is general as it can be adopted by any vision-language model.
arXiv Detail & Related papers (2022-06-30T17:55:12Z)
- Federated and Generalized Person Re-identification through Domain and Feature Hallucinating [88.77196261300699]
We study the problem of federated domain generalization (FedDG) for person re-identification (re-ID).
We propose a novel method, called "Domain and Feature Hallucinating (DFH)", to produce diverse features for learning generalized local and global models.
Our method achieves the state-of-the-art performance for FedDG on four large-scale re-ID benchmarks.
arXiv Detail & Related papers (2022-03-05T09:15:13Z)
- Jalisco's multiclass land cover analysis and classification using a novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present our novel lightweight (only 89k parameters) Convolutional Neural Network (ConvNet) for land cover (LC) classification and analysis.
In this work, we combine three real-world open data sources to obtain 13 channels.
Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar ones.
arXiv Detail & Related papers (2022-01-26T14:58:51Z)
- Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets.
We leverage a largely ignored source of information: the behavior of the model on individual instances during training.
Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
arXiv Detail & Related papers (2020-09-22T20:19:41Z)
- Think Locally, Act Globally: Federated Learning with Local and Global Representations [92.68484710504666]
Federated learning is a method of training models on private data distributed over multiple devices.
We propose a new federated learning algorithm that learns compact local representations on each device.
We also evaluate on the task of personalized mood prediction from real-world mobile data where privacy is key.
arXiv Detail & Related papers (2020-01-06T12:40:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of its content (including all information) and is not responsible for any consequences of its use.