HTLM: Hyper-Text Pre-Training and Prompting of Language Models
- URL: http://arxiv.org/abs/2107.06955v1
- Date: Wed, 14 Jul 2021 19:39:31 GMT
- Title: HTLM: Hyper-Text Pre-Training and Prompting of Language Models
- Authors: Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu,
Gargi Ghosh, Luke Zettlemoyer
- Abstract summary: We introduce HTLM, a hyper-text language model trained on a large-scale web crawl.
We show that pretraining with a BART-style denoising loss directly on simplified HTML provides highly effective transfer for a wide range of end tasks and supervision levels.
We find that hyper-text prompts provide more value to HTLM, in terms of data efficiency, than plain text prompts do for existing LMs.
- Score: 52.32659647159799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce HTLM, a hyper-text language model trained on a large-scale web
crawl. Modeling hyper-text has a number of advantages: (1) it is easily
gathered at scale, (2) it provides rich document-level and end-task-adjacent
supervision (e.g. class and id attributes often encode document category
information), and (3) it allows for new structured prompting that follows the
established semantics of HTML (e.g. to do zero-shot summarization by infilling
title tags for a webpage that contains the input text). We show that
pretraining with a BART-style denoising loss directly on simplified HTML
provides highly effective transfer for a wide range of end tasks and
supervision levels. HTLM matches or exceeds the performance of comparably sized
text-only LMs for zero-shot prompting and fine-tuning for classification
benchmarks, while also setting new state-of-the-art performance levels for
zero-shot summarization. We also find that hyper-text prompts provide more
value to HTLM, in terms of data efficiency, than plain text prompts do for
existing LMs, and that HTLM is highly effective at auto-prompting itself, by
simply generating the most likely hyper-text formatting for any available
training data. We will release all code and models to support future HTLM
research.
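As an illustration of the structured prompting described above (zero-shot summarization by infilling title tags), the Python sketch below wraps an input document in simplified HTML and asks a BART-style infilling model to generate the masked <title> element; the generated title serves as the summary. The checkpoint name "facebook/bart-large" and the exact prompt template are stand-ins, not the released HTLM artifacts.

    # Minimal sketch of HTML-structured zero-shot summarization prompting:
    # wrap the input text in a simplified HTML document and let a BART-style
    # infilling model fill in the masked <title> element.
    # Assumption: "facebook/bart-large" is a stand-in checkpoint; the released
    # HTLM weights (and the paper's exact prompt template) may differ.
    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

    article = "HTLM is a hyper-text language model trained on a large-scale web crawl ..."

    # The summary is obtained by infilling the masked <title> tag.
    prompt = f"<html><head><title><mask></title></head><body>{article}</body></html>"

    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
    summary_ids = model.generate(**inputs, num_beams=4, max_length=64, early_stopping=True)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))

The same HTML interface supports the other uses mentioned in the abstract: class and id attributes can carry label information for classification prompts, and auto-prompting amounts to letting the model generate the most likely hyper-text wrapper for a task's training data.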
Related papers
- Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long-form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments on two datasets from different domains demonstrate that LLMs fine-tuned with the auxiliary task generate higher-quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
- Fine-Tuned 'Small' LLMs (Still) Significantly Outperform Zero-Shot Generative AI Models in Text Classification [0.0]
Generative AI offers a simple, prompt-based alternative to fine-tuning smaller BERT-style LLMs for text classification tasks.
We show that smaller, fine-tuned LLMs consistently and significantly outperform larger, zero-shot prompted models in text classification.
arXiv Detail & Related papers (2024-06-12T21:46:13Z)
- 5W1H Extraction With Large Language Models [27.409473072672277]
The extraction of essential news elements through the 5W1H framework is critical for event extraction and text summarization.
ChatGPT has encountered challenges in processing longer news texts and analyzing specific attributes in context.
We design several strategies, ranging from zero-shot/few-shot prompting to efficient fine-tuning, to extract the 5W1H aspects from the original news documents.
arXiv Detail & Related papers (2024-05-25T09:42:58Z)
- A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis [69.15016747150868]
We introduce WebAgent, an agent that learns from self-experience to complete tasks on real websites.
WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites.
We empirically demonstrate that our modular recipe improves the success rate on real websites by over 50%, and that HTML-T5 is the best model for solving various HTML understanding tasks.
arXiv Detail & Related papers (2023-07-24T14:56:30Z)
- Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
arXiv Detail & Related papers (2023-07-13T05:03:26Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- Understanding HTML with Large Language Models [73.92747433749271]
Large language models (LLMs) have shown exceptional performance on a variety of natural language tasks.
We contribute HTML understanding models (fine-tuned LLMs) and an in-depth analysis of their capabilities under three tasks.
We show that LLMs pretrained on standard natural language corpora transfer remarkably well to HTML understanding tasks.
arXiv Detail & Related papers (2022-10-08T07:27:17Z)
- DOM-LM: Learning Generalizable Representations for HTML Documents [33.742833774918786]
We introduce a novel representation learning approach for web pages, dubbed DOM-LM, which addresses the limitations of existing approaches.
We evaluate DOM-LM on a variety of webpage understanding tasks, including Attribute Extraction, Open Information Extraction, and Question Answering.
arXiv Detail & Related papers (2022-01-25T20:10:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.