TLA: Twitter Linguistic Analysis
- URL: http://arxiv.org/abs/2107.09710v1
- Date: Tue, 20 Jul 2021 18:25:48 GMT
- Title: TLA: Twitter Linguistic Analysis
- Authors: Tushar Sarkar, Nishant Rajadhyaksha
- Abstract summary: TLA (Twitter Linguistic Analysis) is a framework for collecting, labeling, and analyzing Twitter data for a corpus of languages.
We describe the framework and provide detailed labeled datasets for all the languages, along with models trained on those datasets.
The analysis provided by TLA is intended to help in understanding the sentiments of different linguistic communities.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linguistics has been instrumental in developing a deeper understanding of
human nature. Words are indispensable for conveying the thoughts, emotions, and
purpose of any human interaction, and critically analyzing these words can
elucidate the social and psychological behavior and characteristics of these
social animals. Social media has become a platform for human interaction on a
large scale and thus gives us scope for collecting and using that data for our
study. However, collecting, labeling, and analyzing this data iteratively is
cumbersome. To make the process easier and more structured, we introduce TLA
(Twitter Linguistic Analysis). In this paper, we describe TLA, provide a basic
understanding of the framework, and discuss the process of collecting, labeling,
and analyzing data from Twitter for a corpus of languages. We also provide
detailed labeled datasets for all the languages, along with models trained on
these datasets. The analysis provided by TLA should also help in
understanding the sentiments of different linguistic communities and in coming up
with new and innovative solutions for their problems.
Related papers
- X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs [55.80189506270598]
X-PARADE is the first cross-lingual dataset of paragraph-level information divergences.
Annotators label a paragraph in a target language at the span level and evaluate it with respect to a corresponding paragraph in a source language.
Aligned paragraphs are sourced from Wikipedia pages in different languages.
arXiv Detail & Related papers (2023-09-16T04:34:55Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Neural Approaches to Entity-Centric Information Extraction [2.8935588665357077]
We introduce a radically different, entity-centric view of the information in text.
We argue that instead of using individual mentions in text to understand their meaning, we should build applications that would work in terms of entity concepts.
In our work, we show that this task can be improved by considering performing entity linking at the coreference cluster level rather than each of the mentions individually.
arXiv Detail & Related papers (2023-04-15T20:07:37Z)
- Types of Approaches, Applications and Challenges in the Development of Sentiment Analysis Systems [0.0]
Sentiment analysis is one of the important applications of natural language processing and machine learning.
Millions of comments are recorded daily and it creates a huge volume of unstructured text data.
arXiv Detail & Related papers (2023-03-09T15:18:34Z)
- LSA-T: The first continuous Argentinian Sign Language dataset for Sign Language Translation [52.87578398308052]
Sign language translation (SLT) is an active field of study that encompasses human-computer interaction, computer vision, natural language processing and machine learning.
This paper presents the first continuous Argentinian Sign Language (LSA) dataset.
It contains 14,880 sentence level videos of LSA extracted from the CN Sordos YouTube channel with labels and keypoints annotations for each signer.
arXiv Detail & Related papers (2022-11-14T14:46:44Z)
- Identifying concept libraries from language about object structure [56.83719358616503]
We leverage natural language descriptions for a diverse set of 2K procedurally generated objects to identify the parts people use.
We formalize our problem as search over a space of program libraries that contain different part concepts.
By combining naturalistic language at scale with structured program representations, we discover a fundamental information-theoretic tradeoff governing the part concepts people name.
arXiv Detail & Related papers (2022-05-11T17:49:25Z)
- Metaphors in Pre-Trained Language Models: Probing and Generalization Across Datasets and Languages [6.7126373378083715]
Large pre-trained language models (PLMs) are assumed to encode metaphorical knowledge useful for NLP systems.
We present studies in multiple metaphor detection datasets and in four languages.
Our experiments suggest that contextual representations in PLMs do encode metaphorical knowledge, and mostly in their middle layers.
arXiv Detail & Related papers (2022-03-26T19:05:24Z)
- Social Analysis of Young Basque Speaking Communities in Twitter [0.9445512376558136]
We take into account both social and linguistic aspects to perform demographic analysis by processing a large amount of tweets in Basque language.
The study of demographic characteristics and social relationships are approached by applying machine learning and modern deep-learning Natural Language Processing (NLP) techniques.
arXiv Detail & Related papers (2021-09-08T08:19:08Z)
- Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset- Vyaktitv.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features, like income, cultural orientation, amongst several others, for all the participants.
arXiv Detail & Related papers (2020-08-31T17:44:28Z)
- Presentation and Analysis of a Multimodal Dataset for Grounded Language Learning [32.28310581819443]
Grounded language acquisition involves learning how language-based interactions refer to the world around them.
In practice the data used for learning tends to be cleaner, clearer, and more grammatical than actual human interactions.
We present a dataset of common household objects described by people using either spoken or written language.
arXiv Detail & Related papers (2020-07-29T17:58:04Z)
- Experience Grounds Language [185.73483760454454]
Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates.
Despite the incredible effectiveness of language processing models to tackle tasks after being trained on text alone, successful linguistic communication relies on a shared experience of the world.
arXiv Detail & Related papers (2020-04-21T16:56:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.