Extracting Cultural Commonsense Knowledge at Scale
- URL: http://arxiv.org/abs/2210.07763v3
- Date: Wed, 10 May 2023 12:35:06 GMT
- Title: Extracting Cultural Commonsense Knowledge at Scale
- Authors: Tuan-Phong Nguyen, Simon Razniewski, Aparna Varde, Gerhard Weikum
- Abstract summary: CANDLE is an end-to-end methodology for extracting high-quality cultural commonsense knowledge at scale.
It organizes assertions into coherent clusters for 3 domains of subjects (geography, religion, occupation) and several cultural facets.
It includes judicious techniques for classification-based filtering and scoring of interestingness.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured knowledge is important for many AI applications. Commonsense
knowledge, which is crucial for robust human-centric AI, is covered by a small
number of structured knowledge projects. However, they lack knowledge about
human traits and behaviors conditioned on socio-cultural contexts, which is
crucial for situative AI. This paper presents CANDLE, an end-to-end methodology
for extracting high-quality cultural commonsense knowledge (CCSK) at scale.
CANDLE extracts CCSK assertions from a huge web corpus and organizes them into
coherent clusters, for 3 domains of subjects (geography, religion, occupation)
and several cultural facets (food, drinks, clothing, traditions, rituals,
behaviors). CANDLE includes judicious techniques for classification-based
filtering and scoring of interestingness. Experimental evaluations show the
superiority of the CANDLE CCSK collection over prior works, and an extrinsic
use case demonstrates the benefits of CCSK for the GPT-3 language model. Code
and data can be accessed at https://candle.mpi-inf.mpg.de/.
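The abstract describes a pipeline of candidate extraction, classification-based filtering, clustering, and interestingness scoring. A toy sketch of that flow is below; the keyword lexicons, function names, and size-based scoring are illustrative stand-ins, not the authors' actual classifiers or scoring model.

```python
from collections import defaultdict

# Toy facet lexicons standing in for CANDLE's learned facet classifiers (hypothetical).
FACET_KEYWORDS = {
    "food": {"dish", "eat", "cuisine", "rice"},
    "clothing": {"wear", "dress", "kimono"},
    "traditions": {"festival", "celebrate", "holiday"},
}

def extract_assertions(sentences, subject):
    """Crude candidate extraction: keep sentences mentioning the cultural subject."""
    return [s for s in sentences if subject.lower() in s.lower()]

def classify_facet(sentence):
    """Assign the facet whose lexicon overlaps the sentence most (toy filter);
    sentences matching no facet are dropped, mimicking classification-based filtering."""
    words = set(sentence.lower().split())
    scores = {facet: len(words & kw) for facet, kw in FACET_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def cluster_and_score(sentences, subject):
    """Group surviving assertions by facet and score each cluster by size —
    a crude proxy for CANDLE's interestingness scoring."""
    clusters = defaultdict(list)
    for s in extract_assertions(sentences, subject):
        facet = classify_facet(s)
        if facet:
            clusters[facet].append(s)
    return {f: {"assertions": c, "score": len(c)} for f, c in clusters.items()}

corpus = [
    "In Japan, people eat rice with most meals.",
    "Many people in Japan wear a kimono on special occasions.",
    "Japan celebrates the cherry blossom festival every spring.",
    "The stock market opened higher today.",
]
result = cluster_and_score(corpus, "Japan")
```

Here the off-topic market sentence is filtered out, and the remaining assertions land in the food, clothing, and traditions facet clusters.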
Related papers
- A Knowledge-Injected Curriculum Pretraining Framework for Question Answering
We propose a general Knowledge-Injected Curriculum Pretraining framework (KICP) to achieve comprehensive KG learning and exploitation for knowledge-based question answering tasks.
The KI module first injects knowledge into the LM by generating KG-centered pretraining corpus, and generalizes the process into three key steps.
The KA module learns knowledge from the generated corpus with LM equipped with an adapter as well as keeps its original natural language understanding ability.
The CR module follows human reasoning patterns to construct three corpora with increasing difficulties of reasoning, and further trains the LM from easy to hard in a curriculum manner.
arXiv Detail & Related papers (2024-03-11T03:42:03Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning
CANDLE is a framework that iteratively performs conceptualization and instantiation over commonsense knowledge bases.
By applying CANDLE to ATOMIC, we construct a comprehensive knowledge base comprising six million conceptualizations and instantiated commonsense knowledge triples.
arXiv Detail & Related papers (2024-01-14T13:24:30Z)
- Visually Grounded Commonsense Knowledge Acquisition
Large-scale commonsense knowledge bases empower a broad range of AI applications.
Visual perception contains rich commonsense knowledge about real-world entities.
We present CLEVER, which formulates CKE as a distantly supervised multi-instance learning problem.
arXiv Detail & Related papers (2022-11-22T07:00:16Z)
- Refined Commonsense Knowledge from Large-Scale Web Contents
Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications.
This paper presents a method, called ASCENT++, to automatically build a large-scale knowledge base (KB) of CSK assertions.
arXiv Detail & Related papers (2021-11-30T20:26:09Z)
- A Data-Driven Study of Commonsense Knowledge using the ConceptNet Knowledge Base
Acquiring commonsense knowledge and reasoning is recognized as an important frontier in achieving general Artificial Intelligence (AI).
In this paper, we propose and conduct a systematic study to enable a deeper understanding of commonsense knowledge by doing an empirical and structural analysis of the ConceptNet knowledge base.
Detailed experimental results on three carefully designed research questions, using state-of-the-art unsupervised graph representation learning ('embedding') and clustering techniques, reveal deep substructures in ConceptNet relations.
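The study summarized above clusters learned relation embeddings to surface substructure. As a toy illustration of that clustering step (not the paper's actual embeddings or algorithm choice), here is a plain k-means over made-up 2-D "relation embedding" vectors:

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic initialization from the first k points —
    a toy stand-in for clustering learned relation embeddings."""
    centers = [points[i] for i in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep old center if empty).
        centers = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Toy 2-D "relation embeddings": two well-separated groups.
embeddings = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, groups = kmeans(embeddings, k=2)
```

On this toy input the two well-separated groups are recovered as the two clusters; in the study, analogous cluster structure over real ConceptNet relation embeddings is what reveals the "deep substructures" mentioned above.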
arXiv Detail & Related papers (2020-11-28T08:08:25Z)
- Advanced Semantics for Commonsense Knowledge Extraction
Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots.
This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions.
Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets.
arXiv Detail & Related papers (2020-11-02T11:37:17Z)
- CoLAKE: Contextualized Language and Knowledge Embedding
We propose the Contextualized Language and Knowledge Embedding (CoLAKE).
CoLAKE jointly learns contextualized representation for both language and knowledge with the extended objective.
We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks.
arXiv Detail & Related papers (2020-10-01T11:39:32Z)
- TransOMCS: From Linguistic Graphs to Commonsense Knowledge
Conventional methods of acquiring commonsense knowledge require laborious and costly human annotations.
We explore a practical way of mining commonsense knowledge from linguistic graphs, with the goal of transferring cheap knowledge obtained with linguistic patterns into expensive commonsense knowledge.
Experimental results demonstrate the transferability of linguistic knowledge to commonsense knowledge and the effectiveness of the proposed approach in terms of quantity, novelty, and quality.
arXiv Detail & Related papers (2020-05-01T04:03:58Z)
- Joint Reasoning for Multi-Faceted Commonsense Knowledge
Commonsense knowledge (CSK) supports a variety of AI applications, from visual understanding to chatbots.
Prior works on acquiring CSK have compiled statements that associate concepts, like everyday objects or activities, with properties that hold for most or some instances of the concept.
This paper introduces a multi-faceted model of CSK statements and methods for joint reasoning over sets of inter-related statements.
arXiv Detail & Related papers (2020-01-13T11:34:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.