Classifying Organizations for Food System Ontologies using Natural
Language Processing
- URL: http://arxiv.org/abs/2309.10880v1
- Date: Tue, 19 Sep 2023 19:07:48 GMT
- Title: Classifying Organizations for Food System Ontologies using Natural
Language Processing
- Authors: Tianyu Jiang, Sonia Vinogradova, Nathan Stringham, E. Louise Earl,
Allan D. Hollander, Patrick R. Huber, Ellen Riloff, R. Sandra Schillo,
Giorgio A. Ubbiali, Matthew Lange
- Abstract summary: We have created NLP models that can automatically classify organizations into categories associated with environmental issues.
As input, the NLP models are provided with text snippets retrieved by the Google search engine for each organization.
We believe NLP models represent a promising approach for harvesting information to populate knowledge graphs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our research explores the use of natural language processing (NLP) methods to
automatically classify entities for the purpose of knowledge graph population
and integration with food system ontologies. We have created NLP models that
can automatically classify organizations with respect to categories associated
with environmental issues as well as Standard Industrial Classification (SIC)
codes, which are used by the U.S. government to characterize business
activities. As input, the NLP models are provided with text snippets retrieved
by the Google search engine for each organization, which serves as a textual
description of the organization that is used for learning. Our experimental
results show that NLP models can achieve reasonably good performance for these
two classification tasks, and they rely on a general framework that could be
applied to many other classification problems as well. We believe that NLP
models represent a promising approach for automatically harvesting information
to populate knowledge graphs and aligning the information with existing
ontologies through shared categories and concepts.
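The pipeline described in the abstract (search-engine snippets in, organization categories out) can be illustrated with a deliberately minimal sketch. The paper's actual model architecture is not given here, so the following is only a toy bag-of-words Naive Bayes classifier in plain Python; the example snippets, the "environment" label, and the "SIC 5141" code are hypothetical placeholders, not data from the paper.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def train(examples):
    """examples: list of (snippet, label). Returns Naive Bayes counts."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()             # label -> number of snippets
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def classify(model, text):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        # Log prior plus Laplace-smoothed log likelihoods.
        lp = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical search-snippet training data.
snippets = [
    ("nonprofit restoring wetland habitats and watershed health", "environment"),
    ("conservation group protecting forests and wildlife", "environment"),
    ("wholesale distributor of packaged grocery products", "SIC 5141"),
    ("company supplying canned and frozen foods to retailers", "SIC 5141"),
]
model = train(snippets)
print(classify(model, "trust dedicated to protecting rivers and wildlife"))  # → environment
```

A real system would replace the toy tokenizer and counts with a trained NLP model over far larger snippet collections, but the shape of the task is the same: snippet text in, category label out.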
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing [3.3916160303055567]
NLP-KG is a feature-rich system designed to support the exploration of research literature in unfamiliar natural language processing fields.
In addition to a semantic search, NLP-KG allows users to easily find survey papers that provide a quick introduction to a field of interest.
A Fields of Study hierarchy graph enables users to familiarize themselves with a field and its related areas.
arXiv Detail & Related papers (2024-06-21T16:38:22Z)
- Understanding Survey Paper Taxonomy about Large Language Models via Graph Representation Learning [2.88268082568407]
We develop a method to automatically assign survey papers to a taxonomy.
Our work indicates that leveraging graph structure information on co-category graphs can significantly outperform language models.
arXiv Detail & Related papers (2024-02-16T02:21:59Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- To Classify is to Interpret: Building Taxonomies from Heterogeneous Data through Human-AI Collaboration [0.39160947065896795]
We explore how taxonomy building can be supported by systems that integrate machine learning (ML).
We propose an approach that allows the user to iteratively take multiple models' outputs into account as part of their sensemaking process.
arXiv Detail & Related papers (2023-07-31T08:24:29Z)
- Company classification using zero-shot learning [0.0]
We propose an approach for company classification using NLP and zero-shot learning.
We evaluate our approach on a dataset obtained through the Wharton Research Data Services (WRDS).
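The zero-shot idea summarized above (assigning companies to categories the model was never trained on) is typically implemented by scoring a company description against natural-language label descriptions with a pretrained entailment model. As a self-contained stand-in for that idea, the toy sketch below scores descriptions by bag-of-words cosine similarity; the label names and descriptions are hypothetical, not taken from the paper.

```python
import math
from collections import Counter

def bow(text):
    # Bag-of-words vector over alphabetic tokens.
    return Counter(w for w in text.lower().split() if w.isalpha())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical label descriptions; a real zero-shot system would score
# entailment with a pretrained NLI model rather than lexical overlap.
labels = {
    "technology": "software hardware computing internet services",
    "healthcare": "hospital medical pharmaceutical health services",
    "energy": "oil gas renewable power utilities",
}

def zero_shot_classify(description):
    vec = bow(description)
    return max(labels, key=lambda l: cosine(vec, bow(labels[l])))

print(zero_shot_classify("a developer of cloud software and internet platforms"))  # → technology
```

Because no labeled company data is used, new categories can be added by writing a new label description rather than retraining.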
arXiv Detail & Related papers (2023-05-01T18:36:06Z)
- O-Dang! The Ontology of Dangerous Speech Messages [53.15616413153125]
We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
arXiv Detail & Related papers (2022-07-13T11:50:05Z)
- Learning Mutual Fund Categorization using Natural Language Processing [0.5249805590164901]
We learn the categorization system directly from the unstructured data as depicted in the forms, using natural language processing (NLP).
We show that the categorization system can indeed be learned with high accuracy.
arXiv Detail & Related papers (2022-07-11T15:40:18Z)
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
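The mutual information-based measure mentioned above quantifies how much information about the input words each layer preserves; the paper's exact estimator is not given in this summary. As a self-contained illustration of the underlying quantity, the sketch below computes I(X; Y) between two discrete variables from joint counts on toy data (word presence vs. a sentiment label, both hypothetical).

```python
import math
from collections import Counter

def mutual_information(samples):
    """samples: list of (x, y) pairs of discrete values.
    Returns the plug-in estimate of I(X; Y) in bits."""
    n = len(samples)
    joint = Counter(samples)
    px = Counter(x for x, _ in samples)
    py = Counter(y for _, y in samples)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        # pxy * log2( p(x,y) / (p(x) * p(y)) )
        mi += pxy * math.log2(pxy * n * n / (px[x] * py[y]))
    return mi

# Toy example: does the presence of the word "refund" (1/0) carry
# information about a positive/negative label?
data = [(1, "neg"), (1, "neg"), (1, "pos"), (0, "pos"), (0, "pos"), (0, "neg")]
print(round(mutual_information(data), 3))  # → 0.082
```

A value of 0 bits would mean word presence and label are independent; DeepNLPVis applies this kind of measure per layer to track where a model discards input-word information.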
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- Meta Learning for Natural Language Processing: A Survey [88.58260839196019]
Deep learning has been the mainstream technique in the natural language processing (NLP) area.
However, deep learning requires large amounts of labeled data and generalizes less well across domains.
Meta-learning is an emerging field in machine learning that studies approaches to learning better algorithms.
arXiv Detail & Related papers (2022-05-03T13:58:38Z)
- FedNLP: A Research Platform for Federated Learning in Natural Language Processing [55.01246123092445]
We present FedNLP, a research platform for federated learning in NLP.
FedNLP supports various popular task formulations in NLP such as text classification, sequence tagging, question answering, seq2seq generation, and language modeling.
Preliminary experiments with FedNLP reveal that there exists a large performance gap between learning on decentralized and centralized datasets.
arXiv Detail & Related papers (2021-04-18T11:04:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.