Geolocation differences of language use in urban areas
- URL: http://arxiv.org/abs/2108.00533v1
- Date: Sun, 1 Aug 2021 19:55:45 GMT
- Title: Geolocation differences of language use in urban areas
- Authors: Olga Kellert and Nicholas H. Matlis
- Abstract summary: We explore the use of Twitter data with precise geolocation information to resolve spatial variations in language use on an urban scale down to single city blocks.
Our work shows that analysis of small-scale variations can provide unique information on correlations between language use and social context.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The explosion in the availability of natural language data in the era of
social media has given rise to a host of applications such as sentiment
analysis and opinion mining. Simultaneously, the growing availability of
precise geolocation information is enabling visualization of global phenomena
such as environmental changes and disease propagation. Opportunities for
tracking spatial variations in language use, however, have largely been
overlooked, especially on small spatial scales. Here we explore the use of
Twitter data with precise geolocation information to resolve spatial variations
in language use on an urban scale down to single city blocks. We identify
several categories of language tokens likely to show distinctive patterns of
use and develop quantitative methods to visualize the spatial distributions
associated with these patterns. Our analysis concentrates on comparison of
contrasting pairs of Tweet distributions from the same category, each defined
by a set of tokens. Our work shows that analysis of small-scale variations can
provide unique information on correlations between language use and social
context which are highly valuable to a wide range of fields from linguistic
science and commercial advertising to social services.
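The comparison of contrasting token sets described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the grid-cell binning, the 0.001-degree cell size (roughly 100 m in latitude), the normalized-difference score, and all function names are assumptions made for illustration. Tweets are assumed to arrive as `(lat, lon, text)` tuples.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.001):
    """Map a coordinate to a hypothetical square grid cell index."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def count_by_cell(tweets, tokens, cell_deg=0.001):
    """Count, per grid cell, the tweets containing at least one token."""
    wanted = {t.lower() for t in tokens}
    counts = Counter()
    for lat, lon, text in tweets:
        # Naive whitespace tokenization; real tweets need more care.
        if wanted & set(text.lower().split()):
            counts[grid_cell(lat, lon, cell_deg)] += 1
    return counts


def contrast_by_cell(tweets, tokens_a, tokens_b, cell_deg=0.001):
    """Normalized difference (a - b) / (a + b) per cell, in [-1, 1].

    Only cells with at least one matching tweet are returned, so the
    denominator is never zero.
    """
    a = count_by_cell(tweets, tokens_a, cell_deg)
    b = count_by_cell(tweets, tokens_b, cell_deg)
    return {
        cell: (a[cell] - b[cell]) / (a[cell] + b[cell])
        for cell in a.keys() | b.keys()
    }
```

A score near 1 marks a cell dominated by the first token set, a score near -1 marks one dominated by the second, and values near 0 indicate mixed use. Cells with few tweets give noisy scores, so a minimum-count threshold would likely be needed in practice.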
Related papers
- Exploring language relations through syntactic distances and geographic proximity [0.4369550829556578]
We explore linguistic distances using series of parts of speech (POS) extracted from the Universal Dependencies dataset.
We find definite clusters that correspond to well known language families and groups, with exceptions explained by distinct morphological typologies.
arXiv Detail & Related papers (2024-03-27T10:36:17Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - Comparing Measures of Linguistic Diversity Across Social Media Language
Data and Census Data at Subnational Geographic Areas [1.0128808054306186]
This paper describes the comparative linguistic ecology of online spaces (i.e., social media language data) and real-world spaces in Aotearoa New Zealand.
We compare measures of linguistic diversity between these different spaces and discuss how social media users align with real-world populations.
arXiv Detail & Related papers (2023-08-21T03:54:23Z) - GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark [56.08664336835741]
We propose a GeoGraphic Language Understanding Evaluation benchmark, named GeoGLUE.
We collect data from open-released geographic resources and introduce six natural language understanding tasks.
We provide evaluation experiments and analysis of general baselines, indicating the effectiveness and significance of the GeoGLUE benchmark.
arXiv Detail & Related papers (2023-05-11T03:21:56Z) - Building Dynamic Ontological Models for Place using Social Media Data
from Twitter and Sina Weibo [3.662177902714955]
We use social media data (Twitter, Weibo) to build a dynamic ontology model in two separate areas: Beijing, China and San Diego, the U.S.A.
We identify types of place name from geotagged social media data and classify them by comparing the default search radius of their geo-tagged points.
We also investigate the semantic meaning of each place name by examining Pointwise Mutual Information (PMI) scores of word clouds.
arXiv Detail & Related papers (2023-03-02T00:33:47Z) - SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity
Representation [25.52363878314735]
SpaBERT provides a general-purpose geo-entity representation based on neighboring entities in geospatial data.
SpaBERT is pretrained with masked language modeling and masked entity prediction tasks.
We apply SpaBERT to two downstream tasks: geo-entity typing and geo-entity linking.
arXiv Detail & Related papers (2022-10-21T19:42:32Z) - Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep
Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z) - Measuring Geographic Performance Disparities of Offensive Language
Classifiers [12.545108947857802]
We ask two questions: "Does language, dialect, and topical content vary across geographical regions?" and "If there are differences across the regions, do they impact model performance?"
We find that current models do not generalize across locations. Likewise, we show that while offensive language models produce false positives on African American English, model performance is not correlated with each city's minority population proportions.
arXiv Detail & Related papers (2022-09-15T15:08:18Z) - SIRI: Spatial Relation Induced Network For Spatial Description
Resolution [64.38872296406211]
We propose a novel relationship induced (SIRI) network for language-guided localization.
We show that our method is around 24% better than the state-of-the-art method in terms of accuracy, measured by an 80-pixel radius.
Our method also generalizes well on our proposed extended dataset collected using the same settings as Touchdown.
arXiv Detail & Related papers (2020-10-27T14:04:05Z) - Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization is to spot images of the same geographic target from different platforms.
Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.