Learning Context-Aware Representations of Subtrees
- URL: http://arxiv.org/abs/2111.04308v1
- Date: Mon, 8 Nov 2021 07:43:14 GMT
- Title: Learning Context-Aware Representations of Subtrees
- Authors: Cedric Cook
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This thesis tackles the problem of learning efficient representations of
complex, structured data, with a natural application to web page and element
classification. We hypothesise that the context around an element inside the
web page is of high value to the classification problem and is currently
underexploited. This thesis aims to classify web elements, represented as
subtrees of a DOM tree, by also considering their context.
To achieve this, we first discuss current expert-knowledge systems that operate
on structures, such as the Tree-LSTM. We then propose context-aware extensions to
this model. We show that the new model achieves an average F1-score of 0.7973
on a multi-class web classification task. The model generates better
representations for a variety of subtrees and may be used for applications such
as element classification, state estimation in reinforcement learning over the Web,
and more.
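The Tree-LSTM the thesis builds on composes a subtree's representation bottom-up from its children. Below is a minimal sketch of a child-sum Tree-LSTM cell (Tai et al., 2015) applied to a DOM-like tree; the class name, the plain-dict tree format, and all shapes are illustrative assumptions, not the thesis's actual implementation (which additionally injects context from around the subtree).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ChildSumTreeLSTM:
    """Minimal child-sum Tree-LSTM cell: the base model the thesis
    extends with context-awareness. Weights are randomly initialised;
    shapes and the dict-based tree format are illustrative."""

    def __init__(self, x_dim, h_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One (W, U, b) triple per gate: input, forget, output, update.
        self.params = {
            g: (rng.normal(0, 0.1, (h_dim, x_dim)),
                rng.normal(0, 0.1, (h_dim, h_dim)),
                np.zeros(h_dim))
            for g in ("i", "f", "o", "u")
        }
        self.h_dim = h_dim

    def _gate(self, g, x, h):
        W, U, b = self.params[g]
        return W @ x + U @ h + b

    def encode(self, node):
        """node = {"x": feature vector, "children": [subnodes]};
        returns (h, c) for the subtree rooted at node."""
        states = [self.encode(ch) for ch in node.get("children", [])]
        # Child hidden states are summed before the i/o/u gates.
        h_sum = sum((h for h, _ in states), np.zeros(self.h_dim))
        x = node["x"]
        i = sigmoid(self._gate("i", x, h_sum))
        o = sigmoid(self._gate("o", x, h_sum))
        u = np.tanh(self._gate("u", x, h_sum))
        c = i * u
        # One forget gate per child, conditioned on that child's state.
        for h_k, c_k in states:
            f_k = sigmoid(self._gate("f", x, h_k))
            c = c + f_k * c_k
        return o * np.tanh(c), c

# Usage: encode a toy two-child "DOM" node; h summarises the subtree.
cell = ChildSumTreeLSTM(x_dim=4, h_dim=8)
tree = {"x": np.zeros(4),
        "children": [{"x": np.ones(4)}, {"x": np.ones(4)}]}
h, c = cell.encode(tree)
```

A context-aware extension, as the abstract suggests, would feed additional features describing the subtree's surroundings in the page into this composition rather than encoding the subtree in isolation.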
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models [43.56153167864033]
We propose a novel approach to harnessing structured knowledge in large language models (LLMs)
We introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning.
In addition, by incorporating high-level and global-level prompts, the proposed hierarchical structure forges cross-level interlinks and empowers the model to handle more complex and long-term relationships.
arXiv Detail & Related papers (2023-12-11T12:14:06Z)
- PLM-GNN: A Webpage Classification Method based on Joint Pre-trained Language Model and Graph Neural Network [19.75890828376791]
We propose a representation and classification method based on a pre-trained language model and graph neural network, named PLM-GNN.
It is based on the joint encoding of text and HTML DOM trees in web pages. It performs well on the KI-04 and SWDE datasets, and on the practical AHS dataset for a scholar-homepage crawling project.
arXiv Detail & Related papers (2023-05-09T12:19:10Z)
- Visualization Of Class Activation Maps To Explain AI Classification Of Network Packet Captures [0.0]
The growing number of connections and the addition of new applications in our networks generate a vast amount of log data.
Deep learning methods provide both feature extraction and classification from data in a single system.
We present a visual interactive tool that combines the classification of network data with an explanation technique to form an interface between experts, algorithms, and data.
arXiv Detail & Related papers (2022-09-05T16:34:43Z)
- Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation [27.868736254566397]
We focus on how to learn additional feature representations for few-shot image classification through pretext tasks.
This additional knowledge can further improve the performance of few-shot learning.
We present a plug-in Hierarchical Tree Structure-aware (HTS) method, which learns the relationship of FSL and pretext tasks.
arXiv Detail & Related papers (2022-07-14T15:17:19Z)
- Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies [1.5990720051907859]
We introduce Active Predictive Coding Networks (APCNs).
APCNs are a new class of neural networks that solve a major problem posed by Hinton and others in the fields of artificial intelligence and brain modeling.
We demonstrate that APCNs can (a) learn to parse images into part-whole hierarchies, (b) learn compositional representations, and (c) transfer their knowledge to unseen classes of objects.
arXiv Detail & Related papers (2022-01-14T21:22:48Z)
- CoVA: Context-aware Visual Attention for Webpage Information Extraction [65.11609398029783]
We propose to reformulate WIE as a context-aware Webpage Object Detection task.
We develop a Context-aware Visual Attention-based (CoVA) detection pipeline which combines appearance features with syntactical structure from the DOM tree.
We show that the proposed CoVA approach establishes a new, challenging baseline that improves upon prior state-of-the-art methods.
arXiv Detail & Related papers (2021-10-24T00:21:46Z)
- Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring [60.55025339250815]
We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling.
In our technique, we take advantage of the fact that oral proficiency tests rate multiple responses per candidate. We extract context from these responses and feed it as additional speaker-specific context to our network to score a particular response.
arXiv Detail & Related papers (2021-08-30T07:00:28Z)
- Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks [61.23408995934415]
We propose a novel framework for minimally supervised categorization by learning from the text-rich network.
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- Neural Entity Linking: A Survey of Models Based on Deep Learning [82.43751915717225]
This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015.
Its goal is to systematise the design features of neural entity linking systems and compare their performance to notable classic methods on common benchmarks.
The survey touches on applications of entity linking, focusing on the recently emerged use-case of enhancing deep pre-trained masked language models.
arXiv Detail & Related papers (2020-05-31T18:02:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.