Related papers: Multi-environment Topic Models

URL: http://arxiv.org/abs/2410.24126v2
Date: Fri, 01 Nov 2024 01:49:56 GMT
Title: Multi-environment Topic Models
Authors: Dominic Sobhani, Amir Feder, David Blei
Abstract summary: We introduce the Multi-environment Topic Model (MTM), an unsupervised probabilistic model that separates global and environment-specific terms. We show that the MTM produces interpretable global topics with distinct environment-specific words. It also enables the discovery of accurate causal effects.
Score: 8.609587510471943
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Probabilistic topic models are a powerful tool for extracting latent themes from large text datasets. In many text datasets, we also observe per-document covariates (e.g., source, style, political affiliation) that act as environments that modulate a "global" (environment-agnostic) topic representation. Accurately learning these representations is important for prediction on new documents in unseen environments and for estimating the causal effect of topics on real-world outcomes. To this end, we introduce the Multi-environment Topic Model (MTM), an unsupervised probabilistic model that separates global and environment-specific terms. Through experimentation on various political content, from ads to tweets and speeches, we show that the MTM produces interpretable global topics with distinct environment-specific words. On multi-environment data, the MTM outperforms strong baselines both in-distribution and out-of-distribution. It also enables the discovery of accurate causal effects.
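Illustration: one way to picture the separation of global and environment-specific terms described in the abstract is an additive decomposition of a topic's word scores. The sketch below is an assumption made for illustration only, not the paper's specification; the function names, the vocabulary, the environment labels, and all numbers are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of real-valued scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_distribution(global_logits, env_offsets=None):
    """Word distribution for one topic.

    Assumption (not from the paper): an environment shifts the shared
    global word scores by an additive, mostly-zero offset vector.
    With no offsets, the environment-agnostic distribution is returned.
    """
    if env_offsets is None:
        return softmax(global_logits)
    return softmax([g + d for g, d in zip(global_logits, env_offsets)])


# Hypothetical toy vocabulary and scores for a single topic.
vocab = ["tax", "health", "border", "jobs"]
global_logits = [1.0, 0.5, 0.0, 0.8]

# Hypothetical sparse offsets: each environment boosts only a few words.
env_offsets = {
    "ads": [0.0, 0.0, 1.5, 0.0],
    "speeches": [0.0, 1.2, 0.0, 0.0],
}

global_dist = topic_word_distribution(global_logits)
ads_dist = topic_word_distribution(global_logits, env_offsets["ads"])
speech_dist = topic_word_distribution(global_logits, env_offsets["speeches"])

for word, g, a, s in zip(vocab, global_dist, ads_dist, speech_dist):
    print(f"{word:8s} global={g:.3f} ads={a:.3f} speeches={s:.3f}")
```

Under this assumed decomposition, a document from an unseen environment can fall back on the global distribution alone, which is the intuition behind the out-of-distribution claim in the abstract.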

Related papers

Capturing research literature attitude towards Sustainable Development Goals: an LLM-based topic modeling approach [0.7806050661713976]
The Sustainable Development Goals were formulated by the United Nations in 2015 to address global challenges by 2030. Natural language processing techniques can help uncover discussions on SDGs within research literature. We propose a completely automated pipeline to fetch content from the Scopus database and prepare datasets dedicated to five groups of SDGs.
arXiv Detail & Related papers (2024-11-05T09:37:23Z)
Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions [68.92637077909693]
This paper investigates the faithfulness of multimodal large language model (MLLM) agents in the graphical user interface (GUI) environment. A general setting is proposed where both the user and the agent are benign, and the environment, while not malicious, contains unrelated content. Experimental results reveal that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions.
arXiv Detail & Related papers (2024-08-05T15:16:22Z)
WorldGPT: Empowering LLM as Multimodal World Model [51.243464216500975]
We introduce WorldGPT, a generalist world model built upon a Multimodal Large Language Model (MLLM). WorldGPT acquires an understanding of world dynamics through analyzing millions of videos across various domains. We conduct evaluations on WorldNet, a multimodal state transition prediction benchmark.
arXiv Detail & Related papers (2024-04-28T14:42:02Z)
EcoVerse: An Annotated Twitter Dataset for Eco-Relevance Classification, Environmental Impact Analysis, and Stance Detection [0.0]
EcoVerse is an annotated English Twitter dataset of 3,023 tweets spanning a wide spectrum of environmental topics. We propose a three-level annotation scheme designed for Eco-Relevance Classification and Stance Detection, and introduce an original approach for Environmental Impact Analysis.
arXiv Detail & Related papers (2024-04-08T01:21:11Z)
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models [25.047123247476016]
LITE is a large language model for environmental ecosystems modeling. It unifies different environmental variables by transforming them into natural language descriptions and line graph images. During this step, the incomplete features are imputed by a sparse Mixture-of-Experts framework.
arXiv Detail & Related papers (2024-04-01T15:14:07Z)
FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems [28.166089112650926]
FREE maps available environmental data into a text space and then converts the traditional predictive modeling task in environmental science to a semantic recognition problem. When used for long-term prediction, FREE has the flexibility to incorporate newly collected observations to enhance future prediction. FREE is evaluated in the context of two societally important real-world applications: predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa.
arXiv Detail & Related papers (2023-11-17T00:53:09Z)
Deep Generative Model for Simultaneous Range Error Mitigation and Environment Identification [29.827191184889898]
This paper proposes a deep generative model (DGM) for simultaneous range error mitigation and environment identification. Experiments on a general Ultra-wideband dataset demonstrate the superior performance on range error mitigation, scalability to different environments, and novel capability on simultaneous environment identification.
arXiv Detail & Related papers (2023-05-23T10:16:22Z)
Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation. We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
Multi-Environment Pretraining Enables Transfer to Action Limited Datasets [129.24823721649028]
In reinforcement learning, available data of decision making is often not annotated with actions. We propose combining large but sparsely-annotated datasets from a target environment of interest with fully-annotated datasets from various other source environments. We show that utilizing even one additional environment dataset of sequential labelled data during IDM pretraining gives rise to substantial improvements in generating action labels for unannotated sequences.
arXiv Detail & Related papers (2022-11-23T22:48:22Z)
Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings. Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
DeepClimGAN: A High-Resolution Climate Data Generator [60.59639064716545]
Earth system models (ESMs) are often used to generate future projections of climate change scenarios. As a compromise, emulators are substantially less expensive but may not have all of the complexity of an ESM. Here we demonstrate the use of a conditional generative adversarial network (GAN) to act as an ESM emulator.
arXiv Detail & Related papers (2020-11-23T20:13:37Z)
Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations [67.4375210552593]
We design experiments to understand an important but often ignored problem in visually grounded language generation. Given that humans have different utilities and visual attention, how will the sample variance in multi-reference datasets affect the models' performance? We show that it is of paramount importance to report variance in experiments; that human-generated references could vary drastically in different datasets/tasks, revealing the nature of each task.
arXiv Detail & Related papers (2020-10-07T20:45:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.