An attempt to generate new bridge types from latent space of PixelCNN
- URL: http://arxiv.org/abs/2401.05964v1
- Date: Thu, 11 Jan 2024 15:06:25 GMT
- Title: An attempt to generate new bridge types from latent space of PixelCNN
- Authors: Hongjun Zhang
- Abstract summary: PixelCNN can capture the statistical structure of the images and calculate the probability distribution of the next pixel.
From the obtained latent space sampling, new bridge types different from the training dataset can be generated.
- Score: 2.05750372679553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work attempts to generate new bridge types using generative
artificial intelligence technology. Using a symmetric structured image dataset
of three-span beam bridges, arch bridges, cable-stayed bridges, and suspension
bridges, PixelCNN is constructed and trained based on the Python programming
language and the TensorFlow and Keras deep learning frameworks. The model can
capture the statistical structure of the images and calculate the probability
distribution of the next pixel given the previous pixels. By sampling from the
obtained latent space, new bridge types different from the training dataset
can be generated. PixelCNN can organically combine different structural
components on the basis of human-designed bridge types, creating new bridge
types with a certain degree of original creative ability. Autoregressive
models cannot understand the meaning of a sequence, whereas multimodal models
combine regression and autoregressive models to understand it. Multimodal
models should be the way to achieve artificial general intelligence in the
future.
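The abstract describes building a PixelCNN in TensorFlow and Keras, learning the distribution of the next pixel given the previous pixels, and then sampling new bridge images pixel by pixel. The paper's actual architecture and image resolution are not given here, so the following is only a minimal sketch of the masked-convolution idea: the 64x64 grayscale image shape, layer sizes, and training call are illustrative assumptions, not the paper's configuration.

```python
# Minimal PixelCNN sketch in TensorFlow/Keras (the stack named in the
# abstract). Image shape, depth, and filter counts are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow import keras

H = W = 64  # assumed image size

class MaskedConv(keras.layers.Layer):
    """2D convolution whose kernel is zeroed below and to the right of the
    centre, so each output depends only on already-generated pixels.
    Mask type 'A' (first layer only) also hides the centre pixel."""

    def __init__(self, mask_type, **conv_kwargs):
        super().__init__()
        self.mask_type = mask_type
        self.conv = keras.layers.Conv2D(**conv_kwargs)

    def build(self, input_shape):
        self.conv.build(input_shape)
        kh, kw = self.conv.kernel.shape[0], self.conv.kernel.shape[1]
        mask = np.zeros(self.conv.kernel.shape, dtype="float32")
        mask[: kh // 2, ...] = 1.0           # rows strictly above the centre
        mask[kh // 2, : kw // 2, ...] = 1.0  # same row, left of the centre
        if self.mask_type == "B":
            mask[kh // 2, kw // 2, ...] = 1.0  # centre pixel itself
        self.mask = tf.constant(mask)

    def call(self, x):
        self.conv.kernel.assign(self.conv.kernel * self.mask)
        return self.conv(x)

def build_pixelcnn():
    inp = keras.Input(shape=(H, W, 1))  # intensities scaled to [0, 1]
    x = MaskedConv("A", filters=64, kernel_size=7, padding="same",
                   activation="relu")(inp)
    for _ in range(5):
        x = MaskedConv("B", filters=64, kernel_size=3, padding="same",
                       activation="relu")(x)
    # a 256-way softmax per pixel: the probability distribution of the
    # next pixel value given all previous pixels
    out = keras.layers.Conv2D(256, 1, activation="softmax")(x)
    return keras.Model(inp, out)

model = build_pixelcnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# images: uint8 bridge images of shape (N, H, W, 1); the input is the
# normalised image, the target is every pixel's raw integer value:
# model.fit(images / 255.0, images, epochs=50, batch_size=32)

def sample(model):
    """Generate one image pixel by pixel in raster order (slow but simple)."""
    img = np.zeros((1, H, W, 1), dtype="float32")
    for r in range(H):
        for c in range(W):
            probs = model.predict(img, verbose=0)[0, r, c].astype("float64")
            probs /= probs.sum()  # guard against float rounding
            img[0, r, c, 0] = np.random.choice(256, p=probs) / 255.0
    return img
```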
Related papers
- Zero-Shot Detection of AI-Generated Images [54.01282123570917]
We propose a zero-shot entropy-based detector (ZED) to detect AI-generated images.
Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is compared to a model of real images (a sketch of this surprisal test appears after this list).
ZED achieves an average improvement of more than 3% over the SoTA in terms of accuracy.
arXiv Detail & Related papers (2024-09-24T08:46:13Z)
- PixelBytes: Catching Unified Representation for Multimodal Generation [0.0]
PixelBytes is an approach for unified multimodal representation learning.
We explore integrating text, audio, action-state, and pixelated images (sprites) into a cohesive representation.
We conducted experiments on a PixelBytes Pokemon dataset and an Optimal-Control dataset.
arXiv Detail & Related papers (2024-09-16T09:20:13Z)
- An attempt to generate new bridge types from latent space of denoising diffusion Implicit model [2.05750372679553]
A denoising diffusion implicit model is used for bridge-type innovation.
The process of adding noise to an image and then denoising it can be likened to a corpse decaying and a detective reconstructing the scene of the crime, an analogy intended to help beginners understand.
arXiv Detail & Related papers (2024-02-11T08:54:37Z)
- An attempt to generate new bridge types from latent space of energy-based model [2.05750372679553]
An energy function is trained on a symmetric structured image dataset of three-span beam bridges, arch bridges, cable-stayed bridges, and suspension bridges.
Langevin dynamics is then used to generate new samples with low energy values (a sketch of this sampling procedure appears after this list).
arXiv Detail & Related papers (2024-01-31T08:21:35Z)
- An attempt to generate new bridge types from latent space of generative adversarial network [2.05750372679553]
A symmetric structured image dataset of three-span beam bridges, arch bridges, cable-stayed bridges, and suspension bridges is used.
Based on the Python programming language and the Keras deep learning framework, a generative adversarial network is constructed and trained.
arXiv Detail & Related papers (2024-01-01T08:46:29Z)
- An attempt to generate new bridge types from latent space of variational autoencoder [2.05750372679553]
A variational autoencoder can combine two human-designed bridge types into one new bridge type (a sketch of this latent-space blending appears after this list).
Generative artificial intelligence technology can assist bridge designers in bridge-type innovation and can be used as a copilot.
arXiv Detail & Related papers (2023-11-02T08:18:37Z)
- BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning [91.93547262073213]
Vision-Language (VL) models with the Two-Tower architecture have dominated visual representation learning in recent years.
We propose BridgeTower, which introduces multiple bridge layers that build a connection between the top layers of uni-modal encoders and each layer of the cross-modal encoder.
BridgeTower achieves an accuracy of 78.73%, outperforming the previous state-of-the-art model METER by 1.09% with the same pre-training data.
arXiv Detail & Related papers (2022-06-17T09:42:35Z)
- Rethinking Semantic Segmentation: A Prototype View [126.59244185849838]
We present a nonparametric semantic segmentation model based on non-learnable prototypes.
Our framework yields compelling results over several datasets.
We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
arXiv Detail & Related papers (2022-03-28T21:15:32Z)
- ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z)
- WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training [71.37731379031487]
We propose a two-tower pre-training model called BriVL within the cross-modal contrastive learning framework.
Unlike OpenAI's CLIP, which adopts a simple contrastive learning method, we devise a more advanced algorithm by adapting the latest method MoCo to the cross-modal scenario.
By building a large queue-based dictionary, our BriVL can incorporate more negative samples within limited GPU resources (a sketch of such a queue appears after this list).
arXiv Detail & Related papers (2021-03-11T09:39:49Z)
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
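The zero-shot detector summarized first in this list scores an image by how surprising it is under a model of real images. The following is a minimal sketch of that surprisal test, not ZED's actual detector: the probability source, threshold value, and decision direction are illustrative assumptions.

```python
# Hedged sketch of surprisal-based detection: score an image by its
# average negative log-likelihood (bits per pixel) under a model of real
# images. `pixel_probs` would come from such a model; the threshold is
# an arbitrary placeholder.
import numpy as np

def bits_per_pixel(pixel_probs):
    """pixel_probs: the model's probability for each observed pixel."""
    p = np.clip(np.asarray(pixel_probs, dtype="float64"), 1e-12, 1.0)
    return float(-np.mean(np.log2(p)))

def flag_as_ai_generated(pixel_probs, threshold=4.0):
    # images that are unusually unsurprising to a model of real images
    # are flagged; 4.0 bits/pixel is a placeholder, not ZED's setting
    return bits_per_pixel(pixel_probs) < threshold
```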
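The energy-based-model entry above samples new bridge types with Langevin dynamics. A minimal sketch of that procedure follows, assuming a trained energy network `energy_model`; the step size, step count, and clipping values are illustrative, not the paper's settings.

```python
# Hedged sketch of Langevin-dynamics sampling from an energy-based model:
# start from noise and repeatedly step downhill on the energy while adding
# small Gaussian noise. `energy_model` stands in for the trained energy
# function; all hyperparameters here are illustrative.
import tensorflow as tf

def langevin_sample(energy_model, shape=(1, 64, 64, 1),
                    steps=60, step_size=10.0, noise_std=0.005):
    x = tf.random.uniform(shape, minval=-1.0, maxval=1.0)
    for _ in range(steps):
        x = x + noise_std * tf.random.normal(shape)
        with tf.GradientTape() as tape:
            tape.watch(x)
            energy = tf.reduce_sum(energy_model(x))
        grad = tape.gradient(energy, x)
        # move toward lower energy; clipping keeps the steps stable
        x = x - step_size * tf.clip_by_value(grad, -0.03, 0.03)
        x = tf.clip_by_value(x, -1.0, 1.0)
    return x  # a new low-energy sample, i.e. a candidate bridge image
```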
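The variational-autoencoder entry combines two human-designed bridge types into a new one, which amounts to blending their latent vectors. A minimal sketch under that assumption follows; `encoder` and `decoder` stand in for the paper's trained networks, and their interfaces are assumed here.

```python
# Hedged sketch of the VAE "combine two bridge types" idea: encode two
# bridge images, interpolate their latent vectors, and decode the blend.
import numpy as np

def blend_bridge_types(encoder, decoder, img_a, img_b, alpha=0.5):
    # encoder is assumed to return the latent mean (the sampled noise
    # term of a VAE is ignored for this deterministic blend)
    z_a = encoder(img_a[None, ...])
    z_b = encoder(img_b[None, ...])
    z_mix = (1.0 - alpha) * z_a + alpha * z_b  # point between the two
    return decoder(z_mix)[0]  # a candidate new bridge type
```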
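The WenLan/BriVL entry credits a large queue-based dictionary, adapted from MoCo, for supplying many negative samples within limited GPU memory. A minimal sketch of such a FIFO embedding queue follows; the dimensions and queue size are illustrative assumptions, not BriVL's configuration.

```python
# Hedged sketch of a MoCo-style negative-sample queue: a fixed-size FIFO
# of past embeddings lets contrastive training see many negatives without
# a huge batch.
import numpy as np

class NegativeQueue:
    def __init__(self, dim=256, size=8192):
        self.queue = np.random.randn(size, dim).astype("float32")
        self.queue /= np.linalg.norm(self.queue, axis=1, keepdims=True)
        self.ptr = 0

    def negatives(self):
        return self.queue  # used as extra negatives in the contrastive loss

    def enqueue(self, embeddings):
        """Overwrite the oldest entries with the newest batch."""
        n = embeddings.shape[0]
        idx = (self.ptr + np.arange(n)) % len(self.queue)
        self.queue[idx] = embeddings
        self.ptr = (self.ptr + n) % len(self.queue)
```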
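The final entry's locally masked convolution applies a different mask to the convolution weights at every image location, so the generation order can be arbitrary rather than fixed raster order. A deliberately naive sketch of that idea follows; the paper's implementation is an efficient batched one, which this loop does not reproduce.

```python
# Hedged sketch of locally masked convolution: unlike PixelCNN's single
# global kernel mask, each output location applies its own binary mask to
# the kernel, so each pixel can condition on a different context.
import numpy as np

def locally_masked_conv2d(image, kernel, masks):
    """image: (H, W); kernel: (k, k); masks: (H, W, k, k) binary,
    one mask per output location."""
    H, W = image.shape
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad)
    out = np.zeros((H, W))
    for r in range(H):
        for c in range(W):
            patch = padded[r:r + k, c:c + k]
            # each location sees only the inputs its own mask allows
            out[r, c] = np.sum(patch * kernel * masks[r, c])
    return out
```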