Latent Variable Models in the Era of Industrial Big Data: Extension and
Beyond
- URL: http://arxiv.org/abs/2208.10847v1
- Date: Tue, 23 Aug 2022 09:58:37 GMT
- Title: Latent Variable Models in the Era of Industrial Big Data: Extension and
Beyond
- Authors: Xiangyin Kong, Xiaoyu Jiang, Bingxin Zhang, Jinsong Yuan, Zhiqiang Ge
- Abstract summary: Latent variable models (LVMs) and their counterparts account for a major share and play a vital role in many industrial modeling areas.
We propose a novel concept called the lightweight deep LVM (LDLVM).
- Score: 7.361977372607915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A rich supply of data and innovative algorithms have made data-driven
modeling a popular technique in modern industry. Among various data-driven
methods, latent variable models (LVMs) and their counterparts account for a
major share and play a vital role in many industrial modeling areas. LVMs can
be broadly divided into classic LVMs based on statistical learning and deep
LVMs (DLVMs) based on neural networks. We first discuss the definitions,
theories and applications of classic LVMs in detail, which serves as both a
comprehensive tutorial and a brief application survey on classic LVMs. We then
present a thorough introduction to current mainstream DLVMs, with emphasis on
their theories and model architectures, followed by a detailed survey of the
industrial applications of DLVMs. These two types of LVM have obvious
advantages and disadvantages. Classic LVMs have concise principles and good
interpretability, but their limited model capacity cannot handle complicated
tasks. Neural-network-based DLVMs have sufficient model capacity to achieve
satisfactory performance in complex scenarios, but this comes at the cost of
model interpretability and efficiency. Aiming to combine the virtues and
mitigate the drawbacks of these two types of LVMs, and to explore
non-neural-network ways of building deep models, we propose a novel concept
called the lightweight deep LVM (LDLVM). After introducing this new idea, the
article first elaborates on the motivation and connotation of the LDLVM, then
presents two novel LDLVMs, along with thorough descriptions of their
principles, architectures and merits. Finally, outlooks and opportunities are
discussed, including important open questions and possible research
directions.
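To make the classic-LVM side of this contrast concrete, the sketch below fits a PCA model (a representative statistical-learning LVM) to process data and computes the Hotelling's T² and SPE (Q) monitoring statistics widely used with such models for industrial fault detection. It is a minimal illustration under assumed conditions: the simulated data, variable counts, and function names are hypothetical and not code or results from the paper.

```python
# Minimal sketch: PCA as a classic latent variable model for process monitoring.
# All data, dimensions, and thresholds here are hypothetical illustrations.
import numpy as np

def fit_pca(X, n_latent):
    """Fit a PCA latent variable model: X ~ T @ P.T + E (scores T, loadings P)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                       # standardize each process variable
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_latent].T                         # loadings (d x k)
    latent_var = (S ** 2) / (X.shape[0] - 1)    # variance captured by each component
    return {"mean": mean, "std": std, "P": P, "latent_var": latent_var[:n_latent]}

def monitoring_stats(model, x_new):
    """Hotelling's T^2 (variation inside the latent subspace) and SPE/Q (residual)."""
    xs = (x_new - model["mean"]) / model["std"]
    t = xs @ model["P"]                         # latent scores of the new sample
    t2 = float(np.sum(t ** 2 / model["latent_var"]))
    residual = xs - t @ model["P"].T            # part not explained by the latent model
    spe = float(residual @ residual)            # SPE (Q) statistic
    return t2, spe

# Hypothetical usage on simulated "normal operation" data with 10 process variables.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10))
model = fit_pca(X_train, n_latent=3)
t2, spe = monitoring_stats(model, X_train[0])
print(f"T2 = {t2:.2f}, SPE = {spe:.2f}")        # compared against control limits in practice
```

A deep LVM such as a variational autoencoder would replace this linear projection with learned nonlinear encoders and decoders, gaining the capacity the abstract attributes to DLVMs at the cost of this kind of closed-form interpretability.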
Related papers
- GenRecal: Generation after Recalibration from Large to Small Vision-Language Models [63.27511432647797]
Recent advancements in vision-language models (VLMs) have leveraged large language models (LLMs) to achieve performance on par with closed-source systems like GPT-4V.
arXiv Detail & Related papers (2025-06-18T17:59:49Z) - From Images to Signals: Are Large Vision Models Useful for Time Series Analysis? [62.58235852194057]
Transformer-based models have gained increasing attention in time series research.
As the field moves toward multi-modality, Large Vision Models (LVMs) are emerging as a promising direction.
arXiv Detail & Related papers (2025-05-29T22:05:28Z) - Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging [103.98582374569789]
Model merging aims to combine multiple expert models into a single model, thereby reducing storage and serving costs.
Previous studies have primarily focused on merging visual classification models or Large Language Models (LLMs) for code and math tasks.
We introduce the model merging benchmark for MLLMs, which includes multiple tasks such as VQA, Geometry, Chart, OCR, and Grounding, providing both LoRA and full fine-tuning models.
arXiv Detail & Related papers (2025-05-26T12:23:14Z) - Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives [36.297745473653166]
Vision-language modeling (VLM) aims to bridge the information gap between images and natural language.
Under the new paradigm of first pre-training on massive image-text pairs and then fine-tuning on task-specific data, VLM in the remote sensing domain has made significant progress.
arXiv Detail & Related papers (2025-05-20T13:47:40Z) - RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z) - VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks [60.5257456681402]
We build universal embedding models capable of handling a wide range of downstream tasks.
Our contributions are twofold: (1) MMEB (Massive Multimodal Embedding Benchmark), which covers 4 meta-tasks (i.e. classification, visual question answering, multimodal retrieval, and visual grounding) and 36 datasets, including 20 training and 16 evaluation datasets, and (2) VLM2Vec (Vision-Language Model -> Vector), a contrastive training framework that converts any state-of-the-art vision-language model into an embedding model via training on MMEB.
arXiv Detail & Related papers (2024-10-07T16:14:05Z) - NVLM: Open Frontier-Class Multimodal LLMs [64.00053046838225]
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks.
We propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities.
We develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks.
arXiv Detail & Related papers (2024-09-17T17:59:06Z) - LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
With proper strategies, evaluated across different benchmarks, even a 2.7B small-scale model can perform on par with larger models with 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z) - A Single Transformer for Scalable Vision-Language Modeling [74.05173379908703]
We present SOLO, a single transformer for visiOn-Language mOdeling.
A unified single Transformer architecture, like SOLO, effectively addresses these scalability concerns in LVLMs.
In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM.
arXiv Detail & Related papers (2024-07-08T22:40:15Z) - Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models [42.182009352159]
We present a new efficient LLVM, Mamba-based traversal of rationales (Meteor)
To embed lengthy rationales containing abundant information, we employ the Mamba architecture, capable of processing sequential data with linear time complexity.
Subsequently, the backbone multimodal language model (MLM) is trained to generate answers with the aid of rationale.
arXiv Detail & Related papers (2024-05-24T14:04:03Z) - Local Binary and Multiclass SVMs Trained on a Quantum Annealer [0.8399688944263844]
In the last years, with the advent of working quantum annealers, hybrid SVM models characterised by quantum training and classical execution have been introduced.
These models have demonstrated comparable performance to their classical counterparts.
However, they are limited in the training set size due to the restricted connectivity of the current quantum annealers.
arXiv Detail & Related papers (2024-03-13T14:37:00Z) - MoAI: Mixture of All Intelligence for Large Language and Vision Models [42.182009352159]
Mixture of All Intelligence (MoAI) is an instruction-tuned large language and vision model (LLVM)
MoAI uses auxiliary visual information obtained from the outputs of external segmentation, detection, SGG, and OCR models.
MoAI significantly outperforms both open-source and closed-source LLVMs in numerous zero-shot vision language (VL) tasks.
arXiv Detail & Related papers (2024-03-12T10:44:13Z) - Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models [87.47400128150032]
We propose a novel LMM architecture named Lumen, a Large multimodal model with versatile vision-centric capability enhancement.
Lumen first promotes fine-grained vision-language concept alignment.
Then the task-specific decoding is carried out by flexibly routing the shared representation to lightweight task decoders.
arXiv Detail & Related papers (2024-03-12T04:13:45Z) - Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions [11.786387517781328]
Vision-Language Models (VLMs) are advanced models that can tackle more intricate tasks such as image captioning and visual question answering.
Our classification organizes VLMs into three distinct categories: models dedicated to vision-language understanding, models that process multimodal inputs to generate unimodal (textual) outputs and models that both accept and produce multimodal inputs and outputs.
We meticulously dissect each model, offering an extensive analysis of its foundational architecture, training data sources, as well as its strengths and limitations wherever possible.
arXiv Detail & Related papers (2024-02-20T18:57:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.