Related papers: State of AI: An Empirical 100 Trillion Token Study with OpenRouter

State of AI: An Empirical 100 Trillion Token Study with OpenRouter

URL: http://arxiv.org/abs/2601.10088v1
Date: Thu, 15 Jan 2026 05:28:39 GMT
Title: State of AI: An Empirical 100 Trillion Token Study with OpenRouter
Authors: Malika Aubakirova, Alex Atallah, Chris Clark, Justin Summerville, Anjney Midha,
Abstract summary: We use the Open platform, an AI inference provider, to analyze over 100 trillion tokens of real-world LLM interactions.<n>We observe substantial adoption of open-weight models, the outsized popularity of creative roleplay, and the rise of agentic inference.<n>Our retention analysis identifies cohorts: early users whose engagement persists far longer than later cohorts.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The past year has marked a turning point in the evolution and real-world use of large language models (LLMs). With the release of the first widely adopted reasoning model, o1, on December 5th, 2024, the field shifted from single-pass pattern generation to multi-step deliberation inference, accelerating deployment, experimentation, and new classes of applications. As this shift unfolded at a rapid pace, our empirical understanding of how these models have actually been used in practice has lagged behind. In this work, we leverage the OpenRouter platform, which is an AI inference provider across a wide variety of LLMs, to analyze over 100 trillion tokens of real-world LLM interactions across tasks, geographies, and time. In our empirical study, we observe substantial adoption of open-weight models, the outsized popularity of creative roleplay (beyond just the productivity tasks many assume dominate) and coding assistance categories, plus the rise of agentic inference. Furthermore, our retention analysis identifies foundational cohorts: early users whose engagement persists far longer than later cohorts. We term this phenomenon the Cinderella "Glass Slipper" effect. These findings underscore that the way developers and end-users engage with LLMs "in the wild" is complex and multifaceted. We discuss implications for model builders, AI developers, and infrastructure providers, and outline how a data-driven understanding of usage can inform better design and deployment of LLM systems.

Related papers

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems.<n>We build a benchmark consisting 1,260 samples of 42 challenging synthetic tasks.<n>We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z)
General-Reasoner: Advancing LLM Reasoning Across All Domains [64.70599911897595]
Reinforcement learning (RL) has recently demonstrated strong potential in enhancing the reasoning capabilities of large language models (LLMs)<n>We propose General-Reasoner, a novel training paradigm designed to enhance LLM reasoning capabilities across diverse domains.<n>We train a series of models and evaluate them on a wide range of datasets covering wide domains like physics, chemistry, finance, electronics etc.
arXiv Detail & Related papers (2025-05-20T17:41:33Z)
On Path to Multimodal Generalist: General-Level and General-Bench [153.9720740167528]
This project introduces General-Level, an evaluation framework that defines 5-scale levels of MLLM performance and generality.<n>At the core of the framework is the concept of Synergy, which measures whether models maintain consistent capabilities across comprehension and generation.<n>The evaluation results that involve over 100 existing state-of-the-art MLLMs uncover the capability rankings of generalists.
arXiv Detail & Related papers (2025-05-07T17:59:32Z)
Large Action Models: From Inception to Implementation [51.81485642442344]
Large Action Models (LAMs) are designed for action generation and execution within dynamic environments.<n>LAMs hold the potential to transform AI from passive language understanding to active task completion.<n>We present a comprehensive framework for developing LAMs, offering a systematic approach to their creation, from inception to deployment.
arXiv Detail & Related papers (2024-12-13T11:19:56Z)
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems. The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness. This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z)
An Empirical Study on Challenges for LLM Application Developers [28.69628251749012]
We crawl and analyze 29,057 relevant questions from a popular OpenAI developer forum.<n>After manually analyzing 2,364 sampled questions, we construct a taxonomy of challenges faced by LLM developers.
arXiv Detail & Related papers (2024-08-06T05:46:28Z)
A Survey on Self-Evolution of Large Language Models [116.54238664264928]
Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications. To address this issue, self-evolution approaches that enable LLMs to autonomously acquire, refine, and learn from experiences generated by the model itself are rapidly growing.
arXiv Detail & Related papers (2024-04-22T17:43:23Z)
CogBench: a large language model walks into a psychology lab [12.981407327149679]
This paper introduces CogBench, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. We apply CogBench to 35 large language models (LLMs) and analyze this data using statistical multilevel modeling techniques. We find that open-source models are less risk-prone than proprietary models and that fine-tuning on code does not necessarily enhance LLMs' behavior.
arXiv Detail & Related papers (2024-02-28T10:43:54Z)
Large Language Model based Multi-Agents: A Survey of Progress and Challenges [44.92286030322281]
Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation.
arXiv Detail & Related papers (2024-01-21T23:36:14Z)
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey [18.930417261395906]
Transformer-based Large Language Models (LLMs) have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents. This article offers a survey of the recent advancement in Transformer-based LLM architectures aimed at enhancing the long-context capabilities of LLMs.
arXiv Detail & Related papers (2023-11-21T04:59:17Z)
On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented. Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.