Dependent Latent Class Models
- URL: http://arxiv.org/abs/2205.08677v2
- Date: Thu, 27 Apr 2023 12:44:14 GMT
- Title: Dependent Latent Class Models
- Authors: Jesse Bowers, Steve Culpepper
- Abstract summary: Latent Class Models (LCMs) are used to cluster multivariate categorical data.
We develop a novel Bayesian model called a Dependent Latent Class Model (DLCM).
We demonstrate the effectiveness of DLCMs in both simulations and real-world applications.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent Class Models (LCMs) are used to cluster multivariate categorical data
(e.g. group participants based on survey responses). Traditional LCMs assume a
property called conditional independence. This assumption can be restrictive,
leading to model misspecification and overparameterization. To combat this
problem, we developed a novel Bayesian model called a Dependent Latent Class
Model (DLCM), which permits conditional dependence. We verify identifiability
of DLCMs. We also demonstrate the effectiveness of DLCMs in both simulations
and real-world applications. Compared to traditional LCMs, DLCMs are effective
in applications with time series, overlapping items, and structural zeroes.
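To make the conditional-independence assumption concrete, here is a minimal sketch (illustrative only, not the authors' DLCM): in a traditional LCM, the within-class likelihood of a response vector factors into a product over items, and the marginal likelihood mixes over classes. All numbers and variable names below are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes, n_items = 3, 5
class_probs = np.array([0.5, 0.3, 0.2])                        # mixing weights pi_c
item_probs = rng.uniform(0.1, 0.9, size=(n_classes, n_items))  # P(y_j = 1 | class c)

y = np.array([1, 0, 1, 1, 0])  # one respondent's binary responses

# Conditional independence: P(y | class c) = prod_j p_cj^y_j (1 - p_cj)^(1 - y_j)
within_class = np.prod(item_probs**y * (1 - item_probs)**(1 - y), axis=1)

# Marginal likelihood mixes over classes: P(y) = sum_c pi_c P(y | c)
likelihood = class_probs @ within_class

# Posterior class membership (responsibilities), used for clustering
posterior = class_probs * within_class / likelihood
```

A DLCM relaxes exactly the product step above, allowing items to be dependent within a class; this sketch only shows the traditional assumption being relaxed.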
Related papers
- The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples.
We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts.
Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z) - Mamba-PTQ: Outlier Channels in Recurrent Large Language Models [49.1574468325115]
We show that Mamba models exhibit the same pattern of outlier channels observed in attention-based LLMs.
We show that the difficulty of quantizing SSMs is caused by activation outliers, similar to those observed in transformer-based LLMs.
arXiv Detail & Related papers (2024-07-17T08:21:06Z) - Equivalence Set Restricted Latent Class Models (ESRLCM) [0.0]
We propose a novel Bayesian model called the Equivalence Set Restricted Latent Class Model (ESRLCM).
This model identifies clusters that have common item response probabilities, and does so more generically than traditional restricted latent attribute models.
arXiv Detail & Related papers (2024-06-05T23:35:37Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no data or additional training while delivering impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - LLMs can learn self-restraint through iterative self-reflection [57.26854891567574]
Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics.
This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach.
We devise a utility function that can encourage the model to produce responses only when it is confident in them.
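One way to picture such a utility function (a hedged sketch; the exact form and values are illustrative assumptions, not the paper's function): answering is rewarded only when correct, a wrong answer is penalized, and abstaining carries a small fixed cost, so answering is rational only above a confidence threshold.

```python
def utility(answered, correct, reward=1.0, wrong_cost=1.0, abstain_cost=0.2):
    """Utility of a single response decision (illustrative values)."""
    if not answered:
        return -abstain_cost          # self-restraint: small, bounded cost
    return reward if correct else -wrong_cost

def should_answer(p_correct, reward=1.0, wrong_cost=1.0, abstain_cost=0.2):
    # Answer only when the expected utility of answering beats abstaining.
    expected = p_correct * reward - (1 - p_correct) * wrong_cost
    return expected > -abstain_cost
```

With these illustrative costs, the model answers only when its estimated probability of being correct exceeds 0.4; raising the wrong-answer cost raises that threshold.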
arXiv Detail & Related papers (2024-05-15T13:35:43Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - Induced Model Matching: How Restricted Models Can Help Larger Ones [1.7676816383911753]
We consider scenarios where a very accurate predictive model using restricted features is available at the time of training of a larger, full-featured, model.
How can the restricted model be useful to the full model?
We propose an approach for transferring the knowledge of the restricted model to the full model, by aligning the full model's context-restricted performance with that of the restricted model.
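The alignment idea can be sketched as an auxiliary matching term (a hedged illustration; the names and exact loss are my assumptions, not the paper's formulation): alongside the usual data loss, the full model is penalized when its predictions under a restricted context diverge from the accurate restricted model's predictions.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # KL(p || q) for discrete distributions, clipped for numerical safety.
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def imm_loss(full_probs_restricted_ctx, restricted_probs, data_loss, lam=0.5):
    # Total loss = ordinary data loss + lambda * matching term that pulls the
    # full model's restricted-context predictions toward the restricted model.
    return data_loss + lam * kl(restricted_probs, full_probs_restricted_ctx)
```

When the two distributions agree, the matching term vanishes and training reduces to the ordinary data loss.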
arXiv Detail & Related papers (2024-02-19T20:21:09Z) - Latent class analysis with weighted responses [0.0]
We propose a novel generative model, the weighted latent class model (WLCM).
Our model allows the response matrix to be generated from an arbitrary distribution with a latent class structure.
We investigate the identifiability of the model and propose an efficient algorithm for estimating the latent classes and other model parameters.
arXiv Detail & Related papers (2023-10-17T04:16:20Z) - Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning [52.29522018586365]
We study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.
Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, and intermediate and hidden dimensions in an end-to-end manner, and (2) dynamic batch loading, which dynamically updates the composition of sampled data in each training batch based on varying losses across different domains.
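The dynamic-batch-loading idea can be sketched roughly as follows (a hedged illustration; the update rule and all values are my assumptions, not the paper's implementation): domains whose current loss lags their reference loss get upweighted in the next batch's sampling distribution.

```python
import numpy as np

def update_domain_weights(weights, losses, ref_losses, step=1.0):
    # Exponentiated-update sketch: boost domains whose loss exceeds its
    # reference, then renormalize so the weights remain a distribution.
    gap = np.maximum(losses - ref_losses, 0.0)
    new = weights * np.exp(step * gap)
    return new / new.sum()

# Toy example: four domains, uniform weights; domain 2 lags its reference most.
w = np.array([0.25, 0.25, 0.25, 0.25])
losses = np.array([2.0, 1.5, 3.0, 1.0])
refs = np.array([1.8, 1.6, 2.0, 1.2])
w = update_domain_weights(w, losses, refs)
```

After the update, the worst-lagging domain receives the largest share of the next training batch, which is the qualitative behavior the description above calls for.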
arXiv Detail & Related papers (2023-10-10T15:13:30Z) - Attitudes and Latent Class Choice Models using Machine learning [0.0]
We present a method of efficiently incorporating attitudinal indicators in the specification of Latent Class Choice Models (LCCM).
This formulation goes beyond structural equation models in its ability to explore the relationship between the attitudinal indicators and the choice decision.
We test our proposed framework for estimating a Car-Sharing (CS) service subscription choice with stated preference data from Copenhagen, Denmark.
arXiv Detail & Related papers (2023-02-20T10:03:01Z) - The Ordered Matrix Dirichlet for Modeling Ordinal Dynamics [54.96229007229786]
We propose the Ordered Matrix Dirichlet (OMD) to map latent states to observed action types.
Models built on the OMD recover interpretable latent states and show superior forecasting performance in few-shot settings.
arXiv Detail & Related papers (2022-12-08T08:04:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.