Développement automatique de lexiques pour les concepts émergents : une exploration méthodologique
- URL: http://arxiv.org/abs/2406.10253v1
- Date: Mon, 10 Jun 2024 12:58:56 GMT
- Title: Développement automatique de lexiques pour les concepts émergents : une exploration méthodologique
- Authors: Revekka Kyriakoglou, Anna Pappa, Jilin He, Antoine Schoen, Patricia Laurens, Markarit Vartampetian, Philippe Laredo, Tita Kyriacopoulou,
- Abstract summary: This paper presents the development of a lexicon centered on emerging concepts, focusing on non-technological innovation.
It introduces a four-step methodology that combines human expertise, statistical analysis, and machine learning techniques to establish a model that can be generalized across multiple domains.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents the development of a lexicon centered on emerging concepts, focusing on non-technological innovation. It introduces a four-step methodology that combines human expertise, statistical analysis, and machine learning techniques to establish a model that can be generalized across multiple domains. This process includes the creation of a thematic corpus, the development of a Gold Standard Lexicon, annotation and preparation of a training corpus, and finally, the implementation of learning models to identify new terms. The results demonstrate the robustness and relevance of our approach, highlighting its adaptability to various contexts and its contribution to lexical research. The developed methodology promises applicability in conceptual fields.
Related papers
- From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models [17.04716417556556]
This review visits foundational concepts such as the distributional hypothesis and contextual similarity.
We examine both static and contextualized embeddings, underscoring advancements in models such as ELMo, BERT, and GPT.
The discussion extends to sentence and document embeddings, covering aggregation methods and generative topic models.
Advanced topics such as model compression, interpretability, numerical encoding, and bias mitigation are analyzed, addressing both technical challenges and ethical implications.
arXiv Detail & Related papers (2024-11-06T15:40:02Z) - Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review [3.0712840129998513]
This literature review proposes a taxonomy and framework that encapsulates recent methodological advances in this field.
We introduce a novel data fusion category -- mid fusion -- and a graph-based technique for refining literature reviews, termed citation graph pruning.
There remains a need for further research to bridge the divide between multimodal learning and training studies and foundational AI research.
arXiv Detail & Related papers (2024-08-22T22:42:23Z) - Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and meta data.
One straightforward solution is to integrate statistical analysis and machine learning.
Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z) - Learning Interpretable Concepts: Unifying Causal Representation Learning
and Foundation Models [51.43538150982291]
We study how to learn human-interpretable concepts from data.
Weaving together ideas from both fields, we show that concepts can be provably recovered from diverse data.
arXiv Detail & Related papers (2024-02-14T15:23:59Z) - A Data Generation Perspective to the Mechanism of In-Context Learning [37.933016939520684]
In-Context Learning (ICL) empowers Large Language Models (LLMs) with the capacity to learn in context.
Despite the encouraging empirical success, the underlying mechanism of ICL remains unclear.
arXiv Detail & Related papers (2024-02-03T17:13:03Z) - A Survey of Text Representation Methods and Their Genealogy [0.0]
In recent years, with the advent of highly scalable artificial-neural-network-based text representation methods the field of natural language processing has seen unprecedented growth and sophistication.
We provide a survey of current approaches, by arranging them in a genealogy, and by conceptualizing a taxonomy of text representation methods to examine and explain the state-of-the-art.
arXiv Detail & Related papers (2022-11-26T15:22:01Z) - Biologically-informed deep learning models for cancer: fundamental
trends for encoding and interpreting oncology data [0.0]
We provide a structured literature analysis focused on Deep Learning (DL) models used to support inference in cancer biology.
The work focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability.
arXiv Detail & Related papers (2022-07-02T12:11:35Z) - Video Summarization Using Deep Neural Networks: A Survey [72.98424352264904]
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content.
This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization.
arXiv Detail & Related papers (2021-01-15T11:41:29Z) - Formalising Concepts as Grounded Abstractions [68.24080871981869]
This report shows how representation learning can be used to induce concepts from raw data.
The main technical goal of this report is to show how techniques from representation learning can be married with a lattice-theoretic formulation of conceptual spaces.
arXiv Detail & Related papers (2021-01-13T15:22:01Z) - A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z) - Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.