StructInbet: Integrating Explicit Structural Guidance into Inbetween Frame Generation
- URL: http://arxiv.org/abs/2507.13377v1
- Date: Tue, 15 Jul 2025 01:01:29 GMT
- Title: StructInbet: Integrating Explicit Structural Guidance into Inbetween Frame Generation
- Authors: Zhenglin Pan, Haoran Xie
- Abstract summary: We propose StructInbet, an inbetweening system designed to generate controllable transitions over explicit structural guidance. We adopt a temporal attention mechanism that incorporates visual identity from both the preceding and succeeding keyframes, ensuring consistency in character appearance.
- Score: 3.528466385159056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose StructInbet, an inbetweening system designed to generate controllable transitions over explicit structural guidance. StructInbet introduces two key contributions. First, we propose explicit structural guidance to the inbetweening problem to reduce the ambiguity inherent in pixel trajectories. Second, we adopt a temporal attention mechanism that incorporates visual identity from both the preceding and succeeding keyframes, ensuring consistency in character appearance.
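The abstract does not include code, but the temporal attention idea it describes can be sketched: the inbetween frame's features attend jointly over both keyframes so appearance cues are pooled from either side. This is a minimal illustrative sketch in plain numpy; all function and variable names are ours, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(query, key_prev, key_next):
    """Attend from the inbetween frame's tokens (query) to tokens of
    the preceding and succeeding keyframes, drawing visual identity
    from both. Shapes: (tokens, dim). Illustrative only."""
    dim = query.shape[-1]
    # Stack both keyframes along the token axis so a single attention
    # pass can weigh appearance cues from either side.
    keys = np.concatenate([key_prev, key_next], axis=0)
    scores = query @ keys.T / np.sqrt(dim)   # (q_tokens, 2 * k_tokens)
    weights = softmax(scores, axis=-1)
    return weights @ keys                    # (q_tokens, dim)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k_prev = rng.normal(size=(4, 8))
k_next = rng.normal(size=(4, 8))
out = temporal_attention(q, k_prev, k_next)
print(out.shape)  # (4, 8)
```

In a real system the keys and values would be learned projections of diffusion or encoder features rather than raw tokens; the point here is only the joint attention over both keyframes.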
Related papers
- ffstruc2vec: Flat, Flexible and Scalable Learning of Node Representations from Structural Identities [0.0]
This paper introduces ffstruc2vec, a scalable deep-learning framework for learning node embedding vectors that preserve structural identities. Its flat, efficient architecture allows high flexibility in capturing diverse types of structural patterns, enabling broad adaptability to various downstream application tasks. The proposed framework significantly outperforms existing approaches across diverse unsupervised and supervised tasks in practical applications.
arXiv Detail & Related papers (2025-04-01T18:47:16Z)
- "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space. Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z)
- Composing or Not Composing? Towards Distributional Construction Grammars [47.636049672406145]
Building the meaning of a linguistic utterance is incremental, step-by-step, based on a compositional process. It is therefore necessary to propose a framework bringing together both approaches. We present an approach based on Construction Grammars and completing this framework in order to account for these different mechanisms.
arXiv Detail & Related papers (2024-12-10T11:17:02Z)
- Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting [50.181824673039436]
We propose a Graph Structure Self-Contrasting (GSSC) framework that learns graph structural information without message passing.
The proposed framework is based purely on Multi-Layer Perceptrons (MLPs), where the structural information is only implicitly incorporated as prior knowledge.
It first applies structural sparsification to remove potentially uninformative or noisy edges in the neighborhood, and then performs structural self-contrasting in the sparsified neighborhood to learn robust node representations.
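The two-step recipe in this summary, sparsify the neighborhood, then contrast within it, can be sketched in a few lines. This is a rough illustration in numpy with our own simplifications (feature-similarity pruning and a global-mean negative), not the paper's learned sparsification or sampling scheme.

```python
import numpy as np

def sparsify(adj, feats, keep_ratio=0.5):
    """Structural sparsification: for each node, keep only its most
    feature-similar neighbors. A crude stand-in for the learned edge
    pruning the paper describes; keep_ratio is our own knob."""
    sim = feats @ feats.T
    out = np.zeros_like(adj)
    for i in range(adj.shape[0]):
        nbrs = np.flatnonzero(adj[i])
        if nbrs.size == 0:
            continue
        k = max(1, int(round(keep_ratio * nbrs.size)))
        keep = nbrs[np.argsort(sim[i, nbrs])[-k:]]  # top-k similar neighbors
        out[i, keep] = 1
    return out

def contrast_loss(feats, adj):
    """Structural self-contrasting, minimal form: each node embedding
    should agree with the mean of its sparsified neighbors and disagree
    with the global mean (a simplified negative)."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    pos = (adj @ feats) / deg                 # mean neighbor embedding
    neg = feats.mean(axis=0, keepdims=True)   # crude negative sample
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos_term = np.log(sig((feats * pos).sum(axis=1)))
    neg_term = np.log(sig(-(feats * neg).sum(axis=1)))
    return -(pos_term + neg_term).mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
adj = (rng.random((6, 6)) < 0.5).astype(float)
np.fill_diagonal(adj, 0)
loss = contrast_loss(feats, sparsify(adj, feats))
print(float(loss))
```

In the paper the embeddings come from an MLP trained with this kind of objective, so structure is only a training signal and message passing is avoided at inference.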
arXiv Detail & Related papers (2024-09-09T12:56:02Z)
- Compositional Structures in Neural Embedding and Interaction Decompositions [101.40245125955306]
We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks.
We introduce a characterization of compositional structures in terms of "interaction decompositions".
We establish necessary and sufficient conditions for the presence of such structures within the representations of a model.
arXiv Detail & Related papers (2024-07-12T02:39:50Z)
- Learning Correlation Structures for Vision Transformers [93.22434535223587]
We introduce a new attention mechanism, dubbed structural self-attention (StructSA).
We generate attention maps by recognizing space-time structures of key-query correlations via convolution.
This effectively leverages rich structural patterns in images and videos such as scene layouts, object motion, and inter-object relations.
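The idea of generating attention maps by convolving key-query correlations can be sketched simply: instead of using raw correlation scores as attention logits, each query's correlation map is passed through a small convolution so that local correlation patterns, rather than single peaks, drive the weights. Below is a 1D numpy simplification of the space-time idea; the kernel and all names are ours, not StructSA's actual parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def structural_attention(q, k, v, kernel):
    """Structure-aware attention sketch: convolve each query's
    correlation map with a small kernel before the softmax, so
    neighboring correlations jointly shape the attention weights."""
    corr = q @ k.T / np.sqrt(q.shape[-1])        # (n_q, n_k) correlation maps
    logits = np.stack([np.convolve(row, kernel, mode="same") for row in corr])
    return softmax(logits) @ v                   # (n_q, dim)

rng = np.random.default_rng(1)
q = rng.normal(size=(5, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
kernel = np.array([0.25, 0.5, 0.25])  # smooths correlations over neighbors
out = structural_attention(q, k, v, kernel)
print(out.shape)  # (5, 8)
```

In images and videos the convolution would be 2D or 3D over the spatial(-temporal) correlation volume, which is what lets the mechanism pick up scene layouts and motion patterns.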
arXiv Detail & Related papers (2024-04-05T07:13:28Z)
- ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation [82.88378582161717]
State-of-the-art vision-language models (VLMs) still have limited performance in structural knowledge extraction.
We present ViStruct, a training framework to learn VLMs for effective visual structural knowledge extraction.
arXiv Detail & Related papers (2023-11-22T09:23:34Z)
- StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure [5.2869308707704255]
StrAE is a Structured Autoencoder framework that, through strict adherence to explicit structure, enables effective learning of multi-level representations.
We show that our results are directly attributable to the informativeness of the structure provided as input, and show that this is not the case for existing tree models.
We then extend StrAE to allow the model to define its own compositions using a simple localised-merge algorithm.
arXiv Detail & Related papers (2023-05-09T16:20:48Z)
- Learning to Incorporate Structure Knowledge for Image Inpainting [20.93448933499842]
This paper develops a multi-task learning framework that attempts to incorporate the image structure knowledge to assist image inpainting.
The primary idea is to train a shared generator to simultaneously complete the corrupted image and corresponding structures.
In the meantime, we also introduce a structure embedding scheme to explicitly embed the learned structure features into the inpainting process.
arXiv Detail & Related papers (2020-02-11T02:22:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.