AutoTM 2.0: Automatic Topic Modeling Framework for Documents Analysis
- URL: http://arxiv.org/abs/2410.00655v1
- Date: Tue, 1 Oct 2024 13:13:15 GMT
- Title: AutoTM 2.0: Automatic Topic Modeling Framework for Documents Analysis
- Authors: Maria Khodorchenko, Nikolay Butakov, Maxim Zuev, Denis Nasonov,
- Abstract summary: AutoTM 2.0 is a framework for optimizing additively regularized topic models.
We show that AutoTM 2.0 achieves better performance compared to the previous AutoTM.
- Score: 0.17999333451993949
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present AutoTM 2.0, a framework for optimizing additively regularized topic models. Compared to the previous version, it adds a novel optimization pipeline, LLM-based quality metrics, and a distributed mode. AutoTM 2.0 is a convenient tool for specialists and non-specialists alike to work with text documents, whether to conduct exploratory data analysis or to perform a clustering task on an interpretable set of features. Quality evaluation is based on specially developed metrics such as coherence and GPT-4-based approaches. Researchers and practitioners can easily integrate new optimization algorithms and adapt novel metrics to enhance modeling quality and extend their experiments. We show that AutoTM 2.0 achieves better performance than the previous AutoTM by providing results on 5 datasets with different features and in two different languages.
Related papers
- AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning [79.65732142949014]
Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories.
Existing approaches assume a fixed inventory of tools, limiting LLM agents' adaptability to new or evolving toolsets.
We present AutoTool, a framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories.
arXiv Detail & Related papers (2025-12-15T12:38:04Z)
- AutoSynth: Automated Workflow Optimization for High-Quality Synthetic Dataset Generation via Monte Carlo Tree Search [19.631058407921728]
Supervised fine-tuning (SFT) of large language models (LLMs) for specialized tasks requires high-quality datasets.
Existing automated workflow methods face a cold start problem: they require labeled datasets for reward modeling.
We introduce AutoSynth, a framework that automates workflow discovery and optimization without reference datasets.
arXiv Detail & Related papers (2025-11-12T17:02:03Z)
- AutoOpt: A Dataset and a Unified Framework for Automating Optimization Problem Solving [0.17205106391379024]
The AutoOpt-11k dataset is a unique image dataset of over 11,000 single-objective, multi-objective, and handwritten mathematical optimization problems.
The dataset is created by 25 experts to avoid errors in data creation.
We develop AutoOpt, a machine learning based automated approach for optimization problems.
arXiv Detail & Related papers (2025-10-24T13:14:53Z)
- Automated Algorithm Design for Auto-Tuning Optimizers [0.3459227740065624]
We introduce a new paradigm: using large language models to automatically generate optimization algorithms tailored to auto-tuning problems.
We evaluate these algorithms on four real-world auto-tuning applications across six hardware platforms.
Our best-performing generated optimization algorithms achieve, on average, 72.4% improvement over state-of-the-art optimizers for auto-tuning.
arXiv Detail & Related papers (2025-10-19T09:38:15Z)
- AutoMaAS: Self-Evolving Multi-Agent Architecture Search for Large Language Models [4.720605681761044]
AutoMaAS is a self-evolving multi-agent architecture search framework.
It uses neural architecture search principles to automatically discover optimal agent configurations.
It achieves 1.0-7.1% performance improvement and reduces inference costs by 3-5% compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-10-03T01:57:07Z)
- Auto-nnU-Net: Towards Automated Medical Image Segmentation [14.342326020477723]
Medical Image Segmentation (MIS) includes diverse tasks, from bone to organ segmentation, each with its own challenges in finding the best segmentation model.
The state-of-the-art AutoML-related MIS framework nnU-Net automates many aspects of model configuration.
We propose Auto-nnU-Net, a novel nnU-Net variant enabling hyperparameter optimization (HPO), neural architecture search (NAS), and hierarchical NAS.
arXiv Detail & Related papers (2025-05-22T11:52:16Z)
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
- Self-Steering Optimization: Autonomous Preference Optimization for Large Language Models [79.84205827056907]
We present Self-Steering Optimization ($SSO$), an algorithm that autonomously generates high-quality preference data.
$SSO$ employs a specialized optimization objective to build a data generator from the policy model itself, which is used to produce accurate and on-policy data.
Our evaluation shows that $SSO$ consistently outperforms baselines in human preference alignment and reward optimization.
arXiv Detail & Related papers (2024-10-22T16:04:03Z)
- Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation [18.077562738603792]
We propose an approach that leverages the best of both worlds.
We first collect sentence-level quality assessments from professional linguists on translations generated by multiple high-quality MT systems.
We then use this analysis to curate a new dataset, MT-Pref, which comprises 18k instances covering 18 language directions.
arXiv Detail & Related papers (2024-10-10T10:09:54Z)
- FeatNavigator: Automatic Feature Augmentation on Tabular Data [29.913561808461612]
FeatNavigator is a framework that explores and integrates high-quality features in relational tables for machine learning (ML) models.
We show that FeatNavigator outperforms state-of-the-art solutions on five public datasets by up to 40.1% in ML model performance.
arXiv Detail & Related papers (2024-06-13T18:44:48Z)
- AutoFT: Learning an Objective for Robust Fine-Tuning [60.641186718253735]
Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning.
Current approaches to robust fine-tuning use hand-crafted regularization techniques.
We propose AutoFT, a data-driven approach for robust fine-tuning.
arXiv Detail & Related papers (2024-01-18T18:58:49Z)
- AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
- AutoMix: Automatically Mixing Language Models [62.51238143437967]
Large language models (LLMs) are now available from cloud API providers in various sizes and configurations.
We present Automix, an approach that strategically routes queries to larger LMs, based on the approximate correctness of outputs from a smaller LM.
arXiv Detail & Related papers (2023-10-19T17:57:39Z)
- Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL [53.40030379661183]
Auto-PyTorch is a framework to enable fully automated deep learning (AutoDL).
It combines multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs).
We show that Auto-PyTorch performs better than several state-of-the-art competitors on average.
arXiv Detail & Related papers (2020-06-24T15:15:17Z)
- AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction [75.16836697734995]
We propose a two-stage algorithm called Automatic Feature Interaction Selection (AutoFIS).
AutoFIS can automatically identify important feature interactions for factorization models with computational cost just equivalent to training the target model to convergence.
AutoFIS has been deployed onto the training platform of Huawei App Store recommendation service.
arXiv Detail & Related papers (2020-03-25T06:53:54Z)