Improving Human-Object Interaction Detection via Phrase Learning and
Label Composition
- URL: http://arxiv.org/abs/2112.07383v1
- Date: Tue, 14 Dec 2021 13:22:16 GMT
- Title: Improving Human-Object Interaction Detection via Phrase Learning and
Label Composition
- Authors: Zhimin Li, Cheng Zou, Yu Zhao, Boxun Li, Sheng Zhong
- Abstract summary: Human-Object Interaction (HOI) detection is a fundamental task in high-level human-centric scene understanding.
We propose PhraseHOI, containing a HOI branch and a novel phrase branch, to leverage language prior and improve relation expression.
- Score: 14.483347746239055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-Object Interaction (HOI) detection is a fundamental task in high-level
human-centric scene understanding. We propose PhraseHOI, containing a HOI
branch and a novel phrase branch, to leverage language prior and improve
relation expression. Specifically, the phrase branch is supervised by semantic
embeddings, whose ground truths are automatically converted from the original
HOI annotations without extra human effort. Meanwhile, a novel label
composition method is proposed to deal with the long-tailed problem in HOI,
which composes novel phrase labels from semantic neighbors. Further, to
optimize the phrase branch, a loss composed of a distilling loss and a balanced
triplet loss is proposed. Extensive experiments are conducted to prove the
effectiveness of the proposed PhraseHOI, which achieves significant improvement
over the baseline and surpasses previous state-of-the-art methods on Full and
NonRare on the challenging HICO-DET benchmark.
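The loss used to optimize the phrase branch combines a distilling loss with a balanced triplet loss. The abstract does not spell out the exact formulation, but as a rough, illustrative sketch, a standard margin-based triplet loss over phrase embeddings could look like the following pure-Python example (the embedding vectors, margin value, and Euclidean distance are assumptions for illustration, not the paper's actual implementation):

```python
import math

def euclidean(a, b):
    # Euclidean distance between two embedding vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge-style triplet loss: pull the positive phrase embedding
    # toward the anchor and push the negative at least `margin` farther away.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy phrase embeddings (illustrative only)
anchor   = [1.0, 0.0]
positive = [0.9, 0.1]   # semantically close phrase
negative = [0.0, 1.0]   # unrelated phrase

loss = triplet_loss(anchor, positive, negative)
```

A "balanced" variant would additionally reweight triplets so that rare HOI categories contribute comparably to frequent ones, which is one common way to counter the long-tailed label distribution the paper targets.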
Related papers
- Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
- Unsupervised Robust Cross-Lingual Entity Alignment via Neighbor Triple Matching with Entity and Relation Texts [17.477542644785483]
Cross-lingual entity alignment (EA) enables the integration of multiple knowledge graphs (KGs) across different languages.
The proposed EA pipeline jointly performs entity-level and relation-level alignment via a neighbor triple matching strategy.
arXiv Detail & Related papers (2024-07-22T12:25:48Z)
- Negation Triplet Extraction with Syntactic Dependency and Semantic Consistency [37.99421732397288]
SSENE is built on a generative pretrained language model (PLM) with an Encoder-Decoder architecture and a multi-task learning framework.
We have constructed a high-quality Chinese dataset NegComment based on the users' reviews from the real-world platform of Meituan.
arXiv Detail & Related papers (2024-04-15T14:28:33Z)
- A semantically enhanced dual encoder for aspect sentiment triplet extraction [0.7291396653006809]
Aspect sentiment triplet extraction (ASTE) is a crucial subtask of aspect-based sentiment analysis (ABSA)
Previous research has focused on enhancing ASTE through innovative table-filling strategies.
We propose a framework that leverages both a basic encoder, primarily based on BERT, and a particular encoder comprising a Bi-LSTM network and graph convolutional network (GCN)
Experiments conducted on benchmark datasets demonstrate the state-of-the-art performance of our proposed framework.
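The particular encoder above includes a graph convolutional network (GCN). As a rough illustration of that component only (toy matrices and dimensions, unrelated to the paper's actual code), a single GCN layer computes ReLU(Â·H·W) over a normalized adjacency matrix Â and node features H:

```python
def matmul(a, b):
    # Naive matrix multiply, sufficient for small toy examples
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj_norm, features, weights):
    # One graph-convolution step: ReLU(A_hat @ H @ W)
    h = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in h]

# Toy 2-node graph with self-loops, rows already normalized
adj_norm = [[0.5, 0.5],
            [0.5, 0.5]]
features = [[1.0, 0.0],   # node 0 embedding
            [0.0, 1.0]]   # node 1 embedding
weights  = [[1.0, -1.0],
            [1.0,  1.0]]

out = gcn_layer(adj_norm, features, weights)
```

In an ASTE setting the graph would typically come from a syntactic dependency parse, so that each word's representation mixes in its syntactic neighbors.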
arXiv Detail & Related papers (2023-06-14T09:04:14Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations [59.10748929158525]
Abstract Meaning Representations (AMR) can greatly improve the performance of unsupervised syntactically controlled paraphrase generation.
Our proposed model, the AMR-enhanced Paraphrase Generator (AMRPG), encodes the AMR graph and the constituency parse of the input sentence into two disentangled semantic and syntactic embeddings.
Experiments show that AMRPG generates more accurate syntactically controlled paraphrases, both quantitatively and qualitatively, compared to the existing unsupervised approaches.
arXiv Detail & Related papers (2022-11-02T04:58:38Z)
- Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning achieves powerful enough performance improvement and outperforms the current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism, which can unify hybrid granularities semantic meaning in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
arXiv Detail & Related papers (2022-05-26T13:26:03Z)
- Entailment Relation Aware Paraphrase Generation [17.6146622291895]
We propose a reinforcement learning-based weakly-supervised paraphrasing system, ERAP, that can be trained using existing paraphrase and natural language inference corpora.
A combination of automated and human evaluations show that ERAP generates paraphrases conforming to the specified entailment relation.
arXiv Detail & Related papers (2022-03-20T08:02:09Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
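The mask-and-predict strategy above first identifies the longest common sequence of words shared by the paired texts; those words become the "neighboring words" whose positions are then scored with MLM. A pure-Python sketch of that first step (the function name and example sentences are illustrative) might look like:

```python
def lcs_words(a, b):
    # Dynamic-programming longest common subsequence over word lists.
    # dp[i][j] holds the LCS of a[:i] and b[:j] as a list of words.
    m, n = len(a), len(b)
    dp = [[[] for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + [a[i]]
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j], key=len)
    return dp[m][n]

s1 = "the cat sat on the mat".split()
s2 = "the cat lay on the rug".split()
common = lcs_words(s1, s2)   # words shared in order by both sentences
```

Each shared word would then be masked in turn and the MLM's predicted distributions at its two positions compared, so that the distance reflects how differently the two contexts constrain the same word.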
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Training Bi-Encoders for Word Sense Disambiguation [4.149972584899897]
State-of-the-art approaches in Word Sense Disambiguation leverage lexical information along with embeddings from pre-trained language models to achieve results comparable to human inter-annotator agreement on standard evaluation benchmarks.
We further the state of the art in Word Sense Disambiguation through our multi-stage pre-training and fine-tuning pipeline.
arXiv Detail & Related papers (2021-05-21T06:06:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.