Community Needs and Assets: A Computational Analysis of Community Conversations
- URL: http://arxiv.org/abs/2403.13272v1
- Date: Wed, 20 Mar 2024 03:14:54 GMT
- Title: Community Needs and Assets: A Computational Analysis of Community Conversations
- Authors: Md Towhidul Absar Chowdhury, Naveen Sharma, Ashiqur R. KhudaBukhsh
- Abstract summary: We introduce the first dataset about community needs and assets consisting of 3,511 conversations from Reddit.
Using this dataset, we evaluate an utterance-level classification model compared to sentiment classification and a popular large language model.
We observe through our study that conversations about needs have negative sentiments and emotions, while conversations about assets focus on location and entities.
- Score: 11.456416081243654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A community needs assessment is a tool used by non-profits and government agencies to quantify the strengths and issues of a community, allowing them to allocate their resources better. Such approaches are transitioning towards leveraging social media conversations to analyze the needs of communities and the assets already present within them. However, manual analysis of exponentially increasing social media conversations is challenging. There is a gap in the present literature in computationally analyzing how community members discuss the strengths and needs of the community. To address this gap, we introduce the task of identifying, extracting, and categorizing community needs and assets from conversational data using sophisticated natural language processing methods. To facilitate this task, we introduce the first dataset about community needs and assets consisting of 3,511 conversations from Reddit, annotated using crowdsourced workers. Using this dataset, we evaluate an utterance-level classification model compared to sentiment classification and a popular large language model (in a zero-shot setting), where we find that our model outperforms both baselines at an F1 score of 94% compared to 49% and 61% respectively. Furthermore, we observe through our study that conversations about needs have negative sentiments and emotions, while conversations about assets focus on location and entities. The dataset is available at https://github.com/towhidabsar/CommunityNeeds.
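The abstract compares the three systems by F1 score. As a minimal sketch of how such a score can be computed for utterance-level labels, the Python below derives per-label F1 and their unweighted (macro) average. The label names `need`, `asset`, and `other` and the toy predictions are hypothetical, made up for illustration; the abstract states neither the dataset's label set nor which F1 averaging the authors report.

```python
from collections import Counter


def f1_per_label(gold, pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true label but was missed
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-label F1 scores (an assumed averaging choice)."""
    scores = f1_per_label(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy data, not taken from the dataset.
gold = ["need", "need", "asset", "asset", "other", "other"]
pred = ["need", "asset", "asset", "asset", "other", "need"]

per_label = f1_per_label(gold, pred)
overall = macro_f1(gold, pred)
print(per_label, round(overall, 3))
```

Macro averaging weights every label equally regardless of how often it occurs; a micro or weighted average would give different numbers on imbalanced data, so any reported F1 should be read alongside the averaging scheme its authors used.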
Related papers
- Building Better: Avoiding Pitfalls in Developing Language Resources when Data is Scarce [27.918975040084387]
Data in a given language should be viewed as more than a collection of tokens.
Good data collection and labeling practices are key to building more human-centered and socially aware technologies.
arXiv Detail & Related papers (2024-10-16T15:51:18Z) - Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future [59.78608958395464]
We build a Social AI Data Infrastructure, which consists of a comprehensive social AI taxonomy and a data library of 480 NLP datasets.
Our infrastructure allows us to analyze existing dataset efforts, and also evaluate language models' performance in different social intelligence aspects.
We show there is a need for multifaceted datasets, increased diversity in language and culture, more long-tailed social situations, and more interactive data in future social intelligence data efforts.
arXiv Detail & Related papers (2024-02-28T00:22:42Z) - Detecting value-expressive text posts in Russian social media [0.0]
We aimed to find a model that can accurately detect value-expressive posts in Russian social media VKontakte.
A training dataset of 5,035 posts was annotated by three experts, 304 crowd-workers and ChatGPT.
ChatGPT was more consistent but struggled with spam detection.
arXiv Detail & Related papers (2023-12-14T14:18:27Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - ValueNet: A New Dataset for Human Value Driven Dialogue System [103.2044265617704]
We present a new large-scale human value dataset called ValueNet, which contains human attitudes on 21,374 text scenarios.
Comprehensive empirical results show that the learned value model could benefit a wide range of dialogue tasks.
ValueNet is the first large-scale text dataset for human value modeling.
arXiv Detail & Related papers (2021-12-12T23:02:52Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater users' responsiveness and task completion.
The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element.
Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
arXiv Detail & Related papers (2020-12-29T08:22:48Z) - Discovering and Categorising Language Biases in Reddit [5.670038395203354]
This paper proposes a data-driven approach to automatically discover language biases encoded in the vocabulary of online discourse communities on Reddit.
We use word embeddings to transform text into high-dimensional dense vectors and capture semantic relations between words.
We successfully discover gender bias, religion bias, and ethnic bias in different Reddit communities.
arXiv Detail & Related papers (2020-08-06T16:42:10Z) - Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z) - A Framework for Pre-processing of Social Media Feeds based on Integrated Local Knowledge Base [1.5749416770494706]
This paper proposes an improved framework for pre-processing of social media feeds for better performance.
The framework had an accuracy of 94.07% on a standardized dataset, and 99.78% on a localised dataset when used to extract sentiments from tweets.
arXiv Detail & Related papers (2020-06-29T07:56:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.