TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced
Semantic Analysis
- URL: http://arxiv.org/abs/2012.15639v1
- Date: Thu, 31 Dec 2020 14:58:01 GMT
- Title: TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced
Semantic Analysis
- Authors: Haisong Zhang, Lemao Liu, Haiyun Jiang, Yangming Li, Enbo Zhao, Kun
Xu, Linfeng Song, Suncong Zheng, Botong Zhou, Jianchen Zhu, Xiao Feng, Tao
Chen, Tao Yang, Dong Yu, Feng Zhang, Zhanhui Kang, Shuming Shi
- Abstract summary: This technical report introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities.
TexSmart holds some unique features. First, the NER function of TexSmart supports over 1,000 entity types, while most other public tools typically support several to (at most) dozens of entity types.
Second, TexSmart introduces new semantic analysis functions, such as semantic expansion and deep semantic representation, which are absent in most previous systems.
- Score: 61.28407236720969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report introduces TexSmart, a text understanding system that
supports fine-grained named entity recognition (NER) and enhanced semantic
analysis functionalities. Compared to most previous publicly available text
understanding systems and tools, TexSmart holds some unique features. First,
the NER function of TexSmart supports over 1,000 entity types, while most other
public tools typically support several to (at most) dozens of entity types.
Second, TexSmart introduces new semantic analysis functions like semantic
expansion and deep semantic representation, which are absent in most previous
systems. Third, a spectrum of algorithms (from very fast algorithms to those
that are relatively slow but more accurate) is implemented for a single function in
TexSmart, to meet the requirements of different academic and industrial
applications. The adoption of unsupervised or weakly supervised algorithms is
especially emphasized, with the goal of easily updating our models to include
fresh data with less human annotation effort.
The main contents of this report include major functions of TexSmart,
algorithms for achieving these functions, how to use the TexSmart toolkit and
Web APIs, and evaluation results of some key algorithms.
Related papers
- Towards Semantic Markup of Mathematical Documents via User Interaction [0.0]
We present an approach to semantic markup of formulas by (semi-)automatically generating grammars from existing sTeX macro definitions and parsing formulas with them.
We also present a GUI-based tool for the disambiguation of parse results and showcase its potential using a grammar for parsing untyped $\lambda$-terms.
arXiv Detail & Related papers (2024-08-05T12:36:40Z)
- GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models [58.08177466768262]
Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks.
We introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously.
Experimental results on the LV-Eval dataset reveal that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k across context lengths from 16k to 256k by a large margin.
arXiv Detail & Related papers (2024-06-20T17:57:51Z)
- Learning Multiplex Representations on Text-Attributed Graphs with One Language Model Encoder [55.24276913049635]
We propose METAG, a new framework for learning Multiplex rEpresentations on Text-Attributed Graphs.
In contrast to existing methods, METAG uses one text encoder to model the shared knowledge across relations.
We conduct experiments on nine downstream tasks in five graphs from both academic and e-commerce domains.
arXiv Detail & Related papers (2023-10-10T14:59:22Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Static Analysis Driven Enhancements for Comprehension in Machine Learning Notebooks [7.142786325863891]
Jupyter notebooks enable developers to interleave code snippets with rich-text and in-line visualizations.
Recent studies have demonstrated that a large portion of Jupyter notebooks are undocumented and lack a narrative structure.
This paper presents HeaderGen, a novel tool-based approach that automatically annotates code cells with categorical markdown headers.
arXiv Detail & Related papers (2023-01-11T11:57:52Z)
- Gradient Backpropagation based Feature Attribution to Enable Explainable-AI on the Edge [1.7338677787507768]
In this work, we analyze the dataflow of gradient backpropagation based feature attribution algorithms to determine the resource overhead required over inference.
We develop a High-Level Synthesis (HLS) based FPGA design that is targeted for edge devices and supports three feature attribution algorithms.
Our design methodology demonstrates a pathway to repurpose inference accelerators to support feature attribution with minimal overhead, thereby enabling real-time XAI on the edge.
arXiv Detail & Related papers (2022-10-19T22:58:59Z)
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Virtual Assistants [10.500933545429202]
In intelligent voice assistants, where NER is an important component, input to NER may be noisy because of user or speech recognition error.
We describe an NER system intended to address these problems.
We show that this technique improves related tasks, such as semantic parsing, with an improvement of up to 5% in error rate.
arXiv Detail & Related papers (2021-08-15T00:14:47Z)
- LaTeX-Numeric: Language-agnostic Text attribute eXtraction for E-commerce Numeric Attributes [0.25782420501870296]
We present a high-precision, fully automated, scalable framework for extracting E-commerce numeric attributes from product text.
We propose a multi-task architecture to deal with missing labels in attribute data, leading to F1 improvement of 9.2% for numeric attributes over single-task architecture.
We propose an automated algorithm for alias creation using attribute values, leading to a 20.2% F1 improvement.
arXiv Detail & Related papers (2021-04-19T19:14:32Z)