Monitoring Transformative Technological Convergence Through LLM-Extracted Semantic Entity Triple Graphs
- URL: http://arxiv.org/abs/2510.25370v1
- Date: Wed, 29 Oct 2025 10:41:03 GMT
- Title: Monitoring Transformative Technological Convergence Through LLM-Extracted Semantic Entity Triple Graphs
- Authors: Alexander Sternfeld, Andrei Kucharavy, Dimitri Percia David, Alain Mermoud, Julian Jang-Jaccard, Nathan Monnet
- Abstract summary: We propose a novel, data-driven pipeline to monitor the emergence of transformative technologies. Our approach leverages advances in Large Language Models (LLMs) to extract semantic triples from unstructured text. The pipeline includes multi-stage filtering, domain-specific clustering, and a temporal trend analysis of topic co-occurrence.
- Score: 35.70283902821063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Forecasting transformative technologies remains a critical but challenging task, particularly in fast-evolving domains such as Information and Communication Technologies (ICTs). Traditional expert-based methods struggle to keep pace with short innovation cycles and ambiguous early-stage terminology. In this work, we propose a novel, data-driven pipeline to monitor the emergence of transformative technologies by identifying patterns of technological convergence. Our approach leverages advances in Large Language Models (LLMs) to extract semantic triples from unstructured text and construct a large-scale graph of technology-related entities and relations. We introduce a new method for grouping semantically similar technology terms (noun stapling) and develop graph-based metrics to detect convergence signals. The pipeline includes multi-stage filtering, domain-specific keyword clustering, and a temporal trend analysis of topic co-occurrence. We validate our methodology on two complementary datasets: 278,625 arXiv preprints (2017-2024) to capture early scientific signals, and 9,793 USPTO patent applications (2018-2024) to track downstream commercial developments. Our results demonstrate that the proposed pipeline can identify both established and emerging convergence patterns, offering a scalable and generalizable framework for technology forecasting grounded in full-text analysis.
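As a rough illustration of the temporal co-occurrence idea described in the abstract, the sketch below counts how often two entities are linked by a triple in each year and fits a simple linear trend to those counts. It is not the authors' pipeline: the triple format, the example data, and the helper names `cooccurrence_by_year` and `trend_slope` are all assumptions made for this example, and the paper's actual graph-based metrics may differ.

```python
from collections import Counter

# Hypothetical input: (subject, relation, object, year) triples.
# In the paper these would come from LLM extraction; here they are made up.
triples = [
    ("blockchain", "uses", "public-key cryptography", 2018),
    ("blockchain", "secures", "iot", 2019),
    ("blockchain", "uses", "public-key cryptography", 2019),
    ("blockchain", "uses", "public-key cryptography", 2020),
    ("blockchain", "uses", "public-key cryptography", 2020),
    ("blockchain", "uses", "public-key cryptography", 2020),
]


def cooccurrence_by_year(triples):
    """Count, per year, how often each unordered entity pair is linked."""
    counts = {}
    for subj, _rel, obj, year in triples:
        if subj == obj:
            continue  # ignore self-loops
        pair = tuple(sorted((subj, obj)))
        counts.setdefault(pair, Counter())[year] += 1
    return counts


def trend_slope(yearly):
    """Least-squares slope of count versus year; missing years count as 0."""
    years = list(range(min(yearly), max(yearly) + 1))
    if len(years) < 2:
        return 0.0  # a single year carries no trend information
    ys = [yearly.get(y, 0) for y in years]
    mean_x = sum(years) / len(years)
    mean_y = sum(ys) / len(ys)
    var = sum((x - mean_x) ** 2 for x in years)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ys))
    return cov / var


counts = cooccurrence_by_year(triples)
slopes = {pair: trend_slope(yearly) for pair, yearly in counts.items()}
```

A positive slope for a pair suggests the two terms are being linked more often over time, which is one simple way to read a convergence signal; a real system would also need normalization by corpus size per year.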
Related papers
- Advances and Challenges in Semantic Textual Similarity: A Comprehensive Survey [0.0]
This survey reviews progress across six key areas: transformer-based models, contrastive learning, domain-focused solutions, multi-modal methods, graph-based approaches, and knowledge-enhanced techniques. It aims to guide researchers and practitioners alike in navigating rapid advancements, highlighting emerging trends and future opportunities in the evolving field of Semantic Textual Similarity.
arXiv Detail & Related papers (2025-12-19T18:07:36Z) - Chunking Strategies for Multimodal AI Systems [0.0]
This survey provides a comprehensive taxonomy and technical analysis of chunking strategies tailored for each modality. We examine classical and modern approaches such as fixed-size token windowing, object-centric visual chunking, silence-based audio segmentation, and scene detection in videos. We explore emerging cross-modal chunking strategies that aim to preserve alignment and semantic consistency across disparate data types.
arXiv Detail & Related papers (2025-11-28T19:48:14Z) - METER: Multi-modal Evidence-based Thinking and Explainable Reasoning -- Algorithm and Benchmark [48.78602579128459]
We introduce METER, a unified benchmark for interpretable forgery detection spanning images, videos, audio, and audio-visual content. Our dataset comprises four tracks, each requiring not only real-vs-fake classification but also evidence-chain-based explanations.
arXiv Detail & Related papers (2025-07-22T03:42:51Z) - Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [57.19302613163439]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z) - Understanding 6G through Language Models: A Case Study on LLM-aided Structured Entity Extraction in Telecom Domain [55.627646392044824]
This work proposes a novel language model-based information extraction technique, aiming to extract structured entities from the telecom context. The proposed telecom structured entity extraction (TeleSEE) technique applies a token-efficient representation method to predict entity types and attribute keys, aiming to reduce the number of output tokens and improve prediction accuracy.
arXiv Detail & Related papers (2025-05-20T21:00:08Z) - SAFT: Structure-aware Transformers for Textual Interaction Classification [15.022958096869734]
Textual interaction networks (TINs) are an omnipresent data structure used to model the interplay between users and items on e-commerce websites, social networks, etc. We propose SAFT, a new architecture that integrates language- and graph-based modules for the effective fusion of textual and structural semantics in the representation learning of interactions.
arXiv Detail & Related papers (2025-04-07T09:19:12Z) - Technology Mapping with Large Language Models [1.1900482352079937]
STARS (Semantic Technology and Retrieval System) is a novel framework that harnesses Large Language Models (LLMs) and Sentence-BERT. It pinpoints relevant technologies within unstructured content, builds comprehensive company profiles, and ranks each firm's technologies according to their operational importance. Experimental results show that STARS markedly boosts retrieval accuracy, offering a versatile and high-performance solution for cross-industry technology mapping.
arXiv Detail & Related papers (2025-01-25T08:18:15Z) - A Comprehensive Framework for Semantic Similarity Analysis of Human and AI-Generated Text Using Transformer Architectures and Ensemble Techniques [40.704014941800594]
Traditional methods fail to capture nuanced semantic differences between human and machine-generated content. We propose a novel approach that combines a pre-trained DeBERTa-v3-large model, Bi-directional LSTMs, and linear attention pooling to capture both local and global semantic patterns. Experimental results show that this approach works better than traditional methods, proving its usefulness for AI-generated text detection and other text comparison tasks.
arXiv Detail & Related papers (2025-01-24T07:07:37Z) - Finding frames with BERT: A transformer-based approach to generic news frame detection [0.0]
We introduce a transformer-based approach for generic news frame detection in Anglophone online content.
We discuss the composition of the training and test datasets, the model architecture, and the validation of the approach.
arXiv Detail & Related papers (2024-08-30T22:05:01Z) - Measuring Technological Convergence in Encryption Technologies with Proximity Indices: A Text Mining and Bibliometric Analysis using OpenAlex [46.3643544723237]
This study identifies technological convergence among emerging technologies in cybersecurity.
The proposed method integrates text mining and bibliometric analyses to formulate and predict technological proximity indices.
Our case study findings highlight a significant convergence between blockchain and public-key cryptography, evidenced by the increasing proximity indices.
arXiv Detail & Related papers (2024-03-03T20:03:03Z) - Embedding in Recommender Systems: A Survey [54.55152033023537]
This survey presents a comprehensive analysis of advances in recommender system embedding techniques. In matrix-based scenarios, collaborative filtering generates embeddings that effectively model user-item preferences. We introduce emerging approaches, including AutoML, hashing techniques, and quantization methods, to enhance performance.
arXiv Detail & Related papers (2023-10-28T06:31:06Z) - Contextualizing MLP-Mixers Spatiotemporally for Urban Data Forecast at Scale [54.15522908057831]
We propose an adapted version of the computationally efficient MLP-Mixer for STTD forecast at scale.
Our results surprisingly show that this simple-yet-effective solution can rival SOTA baselines when tested on several traffic benchmarks.
Our findings contribute to the exploration of simple-yet-effective models for real-world STTD forecasting.
arXiv Detail & Related papers (2023-07-04T05:19:19Z)