$\texttt{InfoHier}$: Hierarchical Information Extraction via Encoding and Embedding
- URL: http://arxiv.org/abs/2501.08717v1
- Date: Wed, 15 Jan 2025 10:58:32 GMT
- Title: $\texttt{InfoHier}$: Hierarchical Information Extraction via Encoding and Embedding
- Authors: Tianru Zhang, Li Ju, Prashant Singh, Salman Toor,
- Abstract summary: $texttInfoHier$ is a framework for learning robust latent representations and hierarchical structures.<n>It provides adaptive representations, enhancing HC's ability to capture complex patterns.<n>It integrates HC loss to refine SSL training, resulting in representations more attuned to the underlying information hierarchy.
- Score: 0.7499722271664147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing large-scale datasets, especially involving complex and high-dimensional data like images, is particularly challenging. While self-supervised learning (SSL) has proven effective for learning representations from unlabelled data, it typically focuses on flat, non-hierarchical structures, missing the multi-level relationships present in many real-world datasets. Hierarchical clustering (HC) can uncover these relationships by organizing data into a tree-like structure, but it often relies on rigid similarity metrics that struggle to capture the complexity of diverse data types. To address these we envision $\texttt{InfoHier}$, a framework that combines SSL with HC to jointly learn robust latent representations and hierarchical structures. This approach leverages SSL to provide adaptive representations, enhancing HC's ability to capture complex patterns. Simultaneously, it integrates HC loss to refine SSL training, resulting in representations that are more attuned to the underlying information hierarchy. $\texttt{InfoHier}$ has the potential to improve the expressiveness and performance of both clustering and representation learning, offering significant benefits for data analysis, management, and information retrieval.
Related papers
- Incorporating Attributes and Multi-Scale Structures for Heterogeneous Graph Contrastive Learning [8.889313669713918]
We propose a novel contrastive learning framework for heterogeneous graphs (ASHGCL)
ASHGCL incorporates three distinct views, each focusing on node attributes, high-order and low-order structural information, respectively.
We introduce an attribute-enhanced positive sample selection strategy that combines both structural information and attribute information.
arXiv Detail & Related papers (2025-03-18T05:15:21Z) - HyperG: Hypergraph-Enhanced LLMs for Structured Knowledge [25.279158571663036]
HyperG is a hypergraph-based generation framework aimed at enhancing Large Language Models' ability to process structured knowledge.
Specifically, HyperG first augments sparse data with contextual information, and incorporate a prompt-attentive hypergraph learning network to encode both the augmented information and the intricate structural relationships within the data.
To validate the effectiveness and generalization of HyperG, we conduct extensive experiments across two different downstream tasks requiring structured knowledge.
arXiv Detail & Related papers (2025-02-25T11:47:32Z) - Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - Struct-X: Enhancing Large Language Models Reasoning with Structured Data [38.558614152006975]
Struct-X operates through five key phases: read-model-fill-reflect-reason''
It encodes structured data into a topological space using graph embeddings.
It fills in missing entity information with knowledge retrieval modules.
The final phase involves constructing a topological network with selected tokens.
arXiv Detail & Related papers (2024-07-17T13:06:25Z) - SLRL: Structured Latent Representation Learning for Multi-view Clustering [24.333292079699554]
Multi-View Clustering (MVC) aims to exploit the inherent consistency and complementarity among different views to improve clustering outcomes.
Despite extensive research in MVC, most existing methods focus predominantly on harnessing complementary information across views to enhance clustering effectiveness.
We introduce a novel framework, termed Structured Latent Representation Learning based Multi-View Clustering method.
arXiv Detail & Related papers (2024-07-11T09:43:57Z) - How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model [4.215221129670858]
We show that by introducing sparsity to generative hierarchical models of data, the task acquires insensitivity to spatial transformations that are discrete versions of smooth transformations.
We quantify how the sample complexity of CNNs learning the SRHM depends on both the sparsity and hierarchical structure of the task.
arXiv Detail & Related papers (2024-04-16T17:01:27Z) - GraphEdit: Large Language Models for Graph Structure Learning [62.618818029177355]
Graph Structure Learning (GSL) focuses on capturing intrinsic dependencies and interactions among nodes in graph-structured data.
Existing GSL methods heavily depend on explicit graph structural information as supervision signals.
We propose GraphEdit, an approach that leverages large language models (LLMs) to learn complex node relationships in graph-structured data.
arXiv Detail & Related papers (2024-02-23T08:29:42Z) - Scalable Incomplete Multi-View Clustering with Structure Alignment [71.62781659121092]
In this paper, we propose a novel incomplete anchor graph learning framework.
We construct the view-specific anchor graph to capture the complementary information from different views.
The time and space complexity of the proposed SIMVC-SA is proven to be linearly correlated with the number of samples.
arXiv Detail & Related papers (2023-08-31T08:30:26Z) - Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z) - Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z) - Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal
Structured Representations [70.41385310930846]
We present an end-to-end framework Structure-CLIP to enhance multi-modal structured representations.
We use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations.
A Knowledge-Enhance (KEE) is proposed to leverage SGK as input to further enhance structured representations.
arXiv Detail & Related papers (2023-05-06T03:57:05Z) - Deep Hierarchical Semantic Segmentation [76.40565872257709]
hierarchical semantic segmentation (HSS) aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.
HSSN casts HSS as a pixel-wise multi-label classification task, only bringing minimal architecture change to current segmentation models.
With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into subspace, we show that our method can address the large-scale and out-of-sample problem.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.