Related papers: TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced Semantic Analysis

URL: http://arxiv.org/abs/2012.15639v1
Date: Thu, 31 Dec 2020 14:58:01 GMT
Title: TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced Semantic Analysis
Authors: Haisong Zhang, Lemao Liu, Haiyun Jiang, Yangming Li, Enbo Zhao, Kun Xu, Linfeng Song, Suncong Zheng, Botong Zhou, Jianchen Zhu, Xiao Feng, Tao Chen, Tao Yang, Dong Yu, Feng Zhang, Zhanhui Kang, Shuming Shi
Abstract summary: This technical report introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities. TexSmart has some unique features. First, the NER function of TexSmart supports over 1,000 entity types, while most other public tools typically support several to (at most) dozens of entity types. Second, TexSmart introduces new semantic analysis functions, such as semantic expansion and deep semantic representation, that are absent in most previous systems.
Score: 61.28407236720969
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This technical report introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities. Compared to most previous publicly available text understanding systems and tools, TexSmart has some unique features. First, the NER function of TexSmart supports over 1,000 entity types, while most other public tools typically support several to (at most) dozens of entity types. Second, TexSmart introduces new semantic analysis functions, such as semantic expansion and deep semantic representation, that are absent in most previous systems. Third, a spectrum of algorithms (from very fast algorithms to those that are relatively slow but more accurate) is implemented for a single function in TexSmart, to fulfill the requirements of different academic and industrial applications. The adoption of unsupervised or weakly-supervised algorithms is especially emphasized, with the goal of easily updating our models to include fresh data with less human annotation effort. The main contents of this report include the major functions of TexSmart, algorithms for achieving these functions, how to use the TexSmart toolkit and Web APIs, and evaluation results of some key algorithms.

Related papers

BugSweeper: Function-Level Detection of Smart Contract Vulnerabilities Using Graph Neural Networks [3.9933521189187693]
We introduce BugSweeper, an end-to-end deep learning framework that detects vulnerabilities directly from the source code without manual engineering. BugSweeper represents each Solidity function as a Function-Level Abstract Syntax Graph (FLAG), a novel graph that combines its Abstract Syntax Tree (AST) with enriched control-flow and data-flow semantics. In our two-stage Graph Neural Network (GNN), the first stage filters noise from the syntax graphs, while the second-stage GNN conducts high-level reasoning to detect diverse vulnerabilities.
arXiv Detail & Related papers (2025-12-10T07:30:03Z)
$A^2R^2$: Advancing Img2LaTeX Conversion via Visual Reasoning with Attention-Guided Refinement [53.14935624161711]
Vision-language models (VLMs) have achieved remarkable progress across a range of visual understanding tasks. We propose $A^2R^2$: Advancing Img2LaTeX Conversion via Visual Reasoning with Attention-Guided Refinement. For effective evaluation, we introduce a new dataset, Img2LaTex-Hard-1K, consisting of 1,100 carefully curated and challenging examples.
arXiv Detail & Related papers (2025-07-28T14:41:57Z)
LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification [63.07563443280147]
We propose a novel framework named LATex for AG-ReID. It adopts prompt-tuning strategies to leverage attribute-based text knowledge. Our framework can fully leverage attribute-based text knowledge to improve AG-ReID.
arXiv Detail & Related papers (2025-03-31T04:47:05Z)
Towards Semantic Markup of Mathematical Documents via User Interaction [0.0]
We present an approach to semantic markup of formulas by (semi-)automatically generating grammars from existing sTeX macro definitions and parsing formulas with them. We also present a GUI-based tool for the disambiguation of parse results and showcase its potential using a grammar for parsing untyped $\lambda$-terms.
arXiv Detail & Related papers (2024-08-05T12:36:40Z)
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models [58.08177466768262]
Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. We introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously. Experimental results on the LV-Eval dataset reveal that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k across context lengths from 16k to 256k by a large margin.
arXiv Detail & Related papers (2024-06-20T17:57:51Z)
Learning Multiplex Representations on Text-Attributed Graphs with One Language Model Encoder [55.24276913049635]
We propose METAG, a new framework for learning Multiplex rEpresentations on Text-Attributed Graphs. In contrast to existing methods, METAG uses one text encoder to model the shared knowledge across relations. We conduct experiments on nine downstream tasks in five graphs from both academic and e-commerce domains.
arXiv Detail & Related papers (2023-10-10T14:59:22Z)
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture. TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling. It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
Static Analysis Driven Enhancements for Comprehension in Machine Learning Notebooks [7.142786325863891]
Jupyter notebooks enable developers to interleave code snippets with rich text and in-line visualizations. Recent studies have demonstrated that a large portion of Jupyter notebooks are undocumented and lack a narrative structure. This paper presents HeaderGen, a novel tool-based approach that automatically annotates code cells with categorical markdown headers.
arXiv Detail & Related papers (2023-01-11T11:57:52Z)
Gradient Backpropagation based Feature Attribution to Enable Explainable-AI on the Edge [1.7338677787507768]
In this work, we analyze the dataflow of gradient backpropagation based feature attribution algorithms to determine the resource overhead required over inference. We develop a High-Level Synthesis (HLS) based FPGA design that is targeted for edge devices and supports three feature attribution algorithms. Our design methodology demonstrates a pathway to repurpose inference accelerators to support feature attribution with minimal overhead, thereby enabling real-time XAI on the edge.
arXiv Detail & Related papers (2022-10-19T22:58:59Z)
Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications. Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture. We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z)
Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Virtual Assistants [10.500933545429202]
In intelligent voice assistants, where NER is an important component, input to NER may be noisy because of user or speech recognition error. We describe an NER system intended to address these problems. We show that this technique improves related tasks, such as semantic parsing, with an improvement of up to 5% in error rate.
arXiv Detail & Related papers (2021-08-15T00:14:47Z)
LaTeX-Numeric: Language-agnostic Text attribute eXtraction for E-commerce Numeric Attributes [0.25782420501870296]
We present a high-precision, fully automated, scalable framework for extracting E-commerce numeric attributes from product text. We propose a multi-task architecture to deal with missing labels in attribute data, leading to an F1 improvement of 9.2% for numeric attributes over a single-task architecture. We propose an automated algorithm for alias creation using attribute values, leading to a 20.2% F1 improvement.
arXiv Detail & Related papers (2021-04-19T19:14:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.