StructCoh: Structured Contrastive Learning for Context-Aware Text Semantic Matching
- URL: http://arxiv.org/abs/2509.02033v1
- Date: Tue, 02 Sep 2025 07:21:36 GMT
- Title: StructCoh: Structured Contrastive Learning for Context-Aware Text Semantic Matching
- Authors: Chao Xue, Ziyuan Gao
- Abstract summary: StructCoh is a graph-enhanced contrastive learning framework. A hierarchical contrastive objective enforces consistency at multiple granularities. Experiments on three legal document matching benchmarks and academic plagiarism detection datasets demonstrate significant improvements.
- Score: 10.000850856259866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text semantic matching requires nuanced understanding of both structural relationships and fine-grained semantic distinctions. While pre-trained language models excel at capturing token-level interactions, they often overlook hierarchical structural patterns and struggle with subtle semantic discrimination. In this paper, we propose StructCoh, a graph-enhanced contrastive learning framework that synergistically combines structural reasoning with representation space optimization. Our approach features two key innovations: (1) A dual-graph encoder constructs semantic graphs via dependency parsing and topic modeling, then employs graph isomorphism networks to propagate structural features across syntactic dependencies and cross-document concept nodes. (2) A hierarchical contrastive objective enforces consistency at multiple granularities: node-level contrastive regularization preserves core semantic units, while graph-aware contrastive learning aligns inter-document structural semantics through both explicit and implicit negative sampling strategies. Experiments on three legal document matching benchmarks and academic plagiarism detection datasets demonstrate significant improvements over state-of-the-art methods. Notably, StructCoh achieves 86.7% F1-score (+6.2% absolute gain) on legal statute matching by effectively identifying argument structure similarities.
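The hierarchical contrastive objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the InfoNCE-style formulation, the `alpha` weighting between node-level and graph-level terms, and the temperature value are all assumptions for illustration, and the embeddings here stand in for the outputs of the paper's dual-graph encoder.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: pull the positive close, push negatives away.
    Inputs are L2-normalized embedding vectors (1-D numpy arrays)."""
    pos_sim = anchor @ positive / temperature
    neg_sims = np.array([anchor @ n for n in negatives]) / temperature
    logits = np.concatenate([[pos_sim], neg_sims])
    # Cross-entropy with the positive in slot 0: -log softmax probability.
    return float(-pos_sim + np.log(np.exp(logits).sum()))

def hierarchical_loss(node_pairs, graph_pair, graph_negatives, alpha=0.5):
    """Combine a node-level term (averaged over aligned semantic units) with a
    graph-level term (whole-document embeddings); alpha is an assumed weight."""
    node_term = np.mean([info_nce(a, p, negs) for a, p, negs in node_pairs])
    graph_term = info_nce(graph_pair[0], graph_pair[1], graph_negatives)
    return alpha * node_term + (1 - alpha) * graph_term
```

Under this sketch, a matched pair of documents (similar graph embeddings) yields a lower loss than a mismatched pair, which is the behavior the contrastive objective is meant to enforce.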
Related papers
- Grammaticality Judgments in Humans and Language Models: Revisiting Generative Grammar with LLMs [0.0]
In traditional generative grammar, systematic contrasts in grammaticality such as subject-auxiliary inversion and the licensing of parasitic gaps are taken as evidence for an internal, hierarchical grammar. We test whether large language models (LLMs), trained only on surface forms, reproduce these contrasts in ways that imply an underlying structural representation.
arXiv Detail & Related papers (2025-12-11T09:17:35Z)
- Counting trees: A treebank-driven exploration of syntactic variation in speech and writing across languages [0.0]
We define syntactic structures as delexicalized dependency (sub)trees and extract them from spoken and written Universal Dependencies treebanks. For each corpus, we analyze the size, diversity, and distribution of syntactic inventories, their overlap across modalities, and the structures most characteristic of speech. Results show that, across both languages, spoken corpora contain fewer and less diverse syntactic structures than their written counterparts.
arXiv Detail & Related papers (2025-05-28T18:43:26Z)
- DISRetrieval: Harnessing Discourse Structure for Long Document Retrieval [51.89673002051528]
DISRetrieval is a novel hierarchical retrieval framework that leverages linguistic discourse structure to enhance long document understanding. Our studies confirm that discourse structure significantly enhances retrieval effectiveness across different document lengths and query types.
arXiv Detail & Related papers (2025-05-26T14:45:12Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition [27.977742959064916]
We propose a prompt-based path prediction method to utilize the interactive information and intrinsic senses among the hierarchy in IDRR.
This is the first work that injects such structure information into pre-trained language models via prompt tuning.
arXiv Detail & Related papers (2023-05-06T08:16:07Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- Syntactic Substitutability as Unsupervised Dependency Syntax [31.488677474152794]
We model a more general property implicit in the definition of dependency relations, syntactic substitutability.
This property captures the fact that words at either end of a dependency can be substituted with words from the same category.
We show that increasing the number of substitutions used improves parsing accuracy on natural data.
arXiv Detail & Related papers (2022-11-29T09:01:37Z)
- Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [60.62039705180484]
We propose a hierarchical contrastive learning mechanism that unifies hybrid-granularity semantic meaning in the input text. Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
arXiv Detail & Related papers (2022-05-26T13:26:03Z)
- Transformer-based Dual Relation Graph for Multi-label Image Recognition [56.12543717723385]
We propose a novel Transformer-based Dual Relation learning framework.
We explore two aspects of correlation, i.e., structural relation graph and semantic relation graph.
Our approach achieves new state-of-the-art on two popular multi-label recognition benchmarks.
arXiv Detail & Related papers (2021-10-10T07:14:52Z)
- A Self-supervised Representation Learning of Sentence Structure for Authorship Attribution [3.5991811164452923]
We propose a self-supervised framework for learning structural representations of sentences.
We evaluate the learned structural representations of sentences using different probing tasks, and subsequently utilize them in the authorship attribution task.
arXiv Detail & Related papers (2020-10-14T02:57:10Z)
- Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z)
- Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion [53.31911669146451]
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks.
These graphs are usually incomplete, urging auto-completion of them.
Graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements as dense embeddings.
Textual encoding approaches, e.g., KG-BERT, resort to graph triples' text and triple-level contextualized representations.
arXiv Detail & Related papers (2020-04-30T13:50:34Z)
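The TransE approach mentioned in the last entry scores a knowledge-graph triple (h, r, t) by treating the relation as a translation between entity embeddings. A minimal sketch follows; the choice of L1 norm and the margin value are illustrative assumptions, not details from the paper above.

```python
import numpy as np

def transe_score(head, relation, tail):
    """TransE models a triple (h, r, t) as a translation h + r ≈ t;
    a smaller distance means a more plausible triple (L1 norm assumed)."""
    return float(np.linalg.norm(head + relation - tail, ord=1))

def margin_ranking_loss(pos_triple, neg_triple, margin=1.0):
    """Hinge loss pushing true triples to score lower than corrupted ones."""
    return max(0.0, margin + transe_score(*pos_triple) - transe_score(*neg_triple))
```

A true triple whose tail embedding sits exactly at head + relation scores zero, while a corrupted (negative-sampled) triple scores higher, which is what the ranking loss exploits during training.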
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.