Bhasacitra: Visualising the dialect geography of South Asia
- URL: http://arxiv.org/abs/2105.14082v3
- Date: Thu, 19 Oct 2023 05:39:52 GMT
- Title: Bhasacitra: Visualising the dialect geography of South Asia
- Authors: Aryaman Arora, Adam Farris, Gopalakrishnan R, Samopriya Basu
- Abstract summary: Bhasacitra is a dialect mapping system for South Asia built on a database of linguistic studies of languages of the region annotated for topic and location data.
We analyse language coverage and look towards applications to typology by visualising example datasets.
- Score: 5.30875181537382
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present Bhasacitra, a dialect mapping system for South Asia built on a
database of linguistic studies of languages of the region annotated for topic
and location data. We analyse language coverage and look towards applications
to typology by visualising example datasets. The application is not only meant
to be useful for feature mapping, but also serves as a new kind of interactive
bibliography for linguists of South Asian languages.
Related papers
- The First Swahili Language Scene Text Detection and Recognition Dataset [55.83178123785643]
There is a significant gap in low-resource languages, especially the Swahili language.
Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition.
We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models.
Date: 2024-05-19T03:55:02Z
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
Date: 2024-04-09T11:39:53Z
- Content-Localization based System for Analyzing Sentiment and Hate Behaviors in Low-Resource Dialectal Arabic: English to Levantine and Gulf [5.2957928879391]
This paper proposes to localize content of resources in high-resourced languages into under-resourced Arabic dialects.
We utilize content-localization based neural machine translation to develop sentiment and hate classifiers for two low-resourced Arabic dialects: Levantine and Gulf.
Our findings highlight the importance of considering the unique nature of dialects within the same language; ignoring the dialectal aspect would lead to misleading analysis.
Date: 2023-11-27T15:37:33Z
- Logographic Information Aids Learning Better Representations for Natural Language Inference [3.677231059555795]
We present a novel study which explores the benefits of providing language models with logographic information in learning better semantic representations.
Our evaluation results in six languages suggest significant benefits of using multi-modal embeddings in languages with logographic systems.
Date: 2022-11-03T20:40:14Z
- MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi [11.042037758273226]
We use language models to attempt automatic labelling of SNACS supersenses in Hindi.
We look towards upstream applications in semantic role labelling and extension to related languages such as Gujarati.
Date: 2022-05-08T21:13:33Z
- Urdu Morphology, Orthography and Lexicon Extraction [0.0]
This paper describes an implementation of the Urdu language as a software API.
We deal with orthography, morphology and the extraction of the lexicon.
Date: 2022-04-06T20:14:01Z
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
Date: 2021-09-13T21:05:37Z
- SIGTYP 2020 Shared Task: Prediction of Typological Features [78.95376120154083]
A major drawback hampering broader adoption of typological KBs is that they are sparsely populated.
As typological features often correlate with one another, it is possible to predict them and thus automatically populate typological KBs.
Overall, the task attracted 8 submissions from 5 teams, out of which the most successful methods make use of such feature correlations.
Date: 2020-10-16T08:47:24Z
- PhraseCut: Language-based Image Segmentation in the Wild [62.643450401286]
We consider the problem of segmenting image regions given a natural language phrase.
Our dataset is collected on top of the Visual Genome dataset.
Our experiments show that the scale and diversity of concepts in our dataset pose significant challenges to the existing state-of-the-art.
Date: 2020-08-03T20:58:53Z
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
Date: 2020-04-30T16:25:39Z
This list is automatically generated from the titles and abstracts of the papers on this site.