The Web unpacked: a quantitative analysis of global Web usage
- URL: http://arxiv.org/abs/2404.17095v2
- Date: Wed, 23 Oct 2024 17:00:38 GMT
- Title: The Web unpacked: a quantitative analysis of global Web usage
- Authors: Henrique S. Xavier
- Abstract summary: We estimate the total web traffic and investigate its distribution among domains and industry sectors.
Our analysis reveals a significant concentration of web traffic, with a small number of top websites capturing the majority of visits.
Much of the traffic goes to for-profit but mostly free-of-charge websites, highlighting the dominance of business models not based on paywalls.
- Abstract: This paper presents a comprehensive analysis of global web usage patterns based on data from SimilarWeb, a leading source for estimating web traffic. Leveraging a dataset comprising over 250,000 websites, we estimate the total web traffic and investigate its distribution among domains and industry sectors. We detail the characteristics of the top 116 domains, which comprise an estimated one-third of all web traffic. Our analysis scrutinizes various attributes of these domains, including their content sources and types, access requirements, offline presence, and ownership features. Our analysis reveals a significant concentration of web traffic, with a small number of top websites capturing the majority of visits. Search engines, news and media, social networks, streaming, and adult content emerge as primary attractors of web traffic, which is also highly concentrated on platforms and USA-owned websites. Much of the traffic goes to for-profit but mostly free-of-charge websites, highlighting the dominance of business models not based on paywalls.
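The concentration finding above can be illustrated with a toy calculation: given per-domain visit counts, what fraction of all visits do the top k domains capture? The following is a minimal sketch with invented numbers, not SimilarWeb data:

```python
# Hypothetical sketch of measuring traffic concentration across domains.
# The visit counts below are made-up illustrative numbers.

def top_k_share(visits, k):
    """Fraction of total visits captured by the k most-visited domains."""
    ranked = sorted(visits, reverse=True)
    return sum(ranked[:k]) / sum(ranked)

# Toy distribution: a few huge domains plus a long tail of small ones.
visits = [5000, 3000, 1000, 400, 200] + [10] * 100
share = top_k_share(visits, 3)
print(f"Top-3 domains capture {share:.0%} of visits")  # → Top-3 domains capture 85% of visits
```

On real data the same computation over a ranked list of domains would yield statements like the paper's "top 116 domains comprise one-third of all traffic."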
Related papers
- AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space.
AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
- Fuzzy Logic Approach For Visual Analysis Of Websites With K-means Clustering-based Color Extraction [0.0]
This paper examines the importance of website design aesthetics in enhancing user experience.
It emphasizes the significant impact of first impressions, often formed within 50 milliseconds, on users' perceptions of a website's appeal and usability.
We introduce a novel method for measuring website aesthetics based on color harmony and font popularity.
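Clustering-based color extraction of the kind this paper's title refers to can be sketched roughly: run k-means over an image's RGB pixels and treat the centroids as the dominant colors. This is a minimal, self-contained illustration; the pixel values, k, and deterministic initialization are assumptions, not the paper's implementation:

```python
# Tiny k-means over RGB pixels; centroids approximate the dominant colors.
def kmeans_colors(pixels, k=2, iters=10):
    """Cluster RGB pixel tuples and return the k centroid colors."""
    # Deterministic init: use the first k distinct pixel colors as centroids.
    centers = list(dict.fromkeys(pixels))[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            # Assign each pixel to its nearest centroid (squared RGB distance).
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        for j, c in enumerate(clusters):
            if c:  # Recompute each centroid as the mean of its cluster.
                centers[j] = tuple(sum(ch) / len(c) for ch in zip(*c))
    return centers

# Toy "image": mostly red pixels plus some blue ones.
pixels = [(250, 10, 10)] * 8 + [(10, 10, 240)] * 4
print(kmeans_colors(pixels, k=2))  # → [(250.0, 10.0, 10.0), (10.0, 10.0, 240.0)]
```

A production version would sample pixels from a decoded screenshot and use a library implementation (e.g. scikit-learn's KMeans) rather than this hand-rolled loop.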
arXiv Detail & Related papers (2024-07-16T06:56:05Z)
- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models [65.18602126334716]
Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots.
We introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites.
We show that WebVoyager achieves a 59.1% task success rate on our benchmark, significantly surpassing the performance of both GPT-4 (All Tools) and the WebVoyager (text-only) setups.
arXiv Detail & Related papers (2024-01-25T03:33:18Z)
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks [93.85005277463802]
VisualWebArena is a benchmark designed to assess the performance of multimodal web agents on realistic tasks.
To perform on this benchmark, agents need to accurately process image-text inputs, interpret natural language instructions, and execute actions on websites to accomplish user-defined objectives.
arXiv Detail & Related papers (2024-01-24T18:35:21Z)
- Mind2Web: Towards a Generalist Agent for the Web [25.363429937913065]
Mind2Web is the first dataset for developing and evaluating generalist agents for the web.
With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains, Mind2Web provides three necessary ingredients for building generalist web agents.
Based on Mind2Web, we conduct an initial exploration of using large language models (LLMs) for building generalist web agents.
arXiv Detail & Related papers (2023-06-09T17:44:31Z)
- Multimodal Web Navigation with Instruction-Finetuned Foundation Models [99.14209521903854]
We study data-driven offline training for web agents with vision-language foundation models.
We propose an instruction-following multimodal agent, WebGUM, that observes both webpage screenshots and HTML pages.
We empirically demonstrate that this recipe improves the agent's capabilities in grounded multimodal perception, HTML comprehension, and multi-step reasoning.
arXiv Detail & Related papers (2023-05-19T17:44:34Z)
- Leveraging Google's Publisher-specific IDs to Detect Website Administration [3.936965297430477]
We propose a novel, graph-based methodology to detect administration of websites on the Web.
We apply our methodology across the top 1 million websites and study the characteristics of the created graphs of website administration.
Our findings show that approximately 90% of websites are each associated with a single publisher, and that small publishers tend to manage less popular websites.
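The graph-based grouping this paper describes can be sketched as a union-find over websites: any two sites exposing the same publisher ID are linked, and each connected component is treated as one administration. The site names and IDs below are invented for illustration, not the paper's actual pipeline:

```python
# Hedged sketch: group websites that share at least one publisher ID.
from collections import defaultdict

def administration_groups(site_to_ids):
    """Return connected groups of sites linked by shared publisher IDs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # Path compression.
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    id_owner = {}
    for site, ids in site_to_ids.items():
        find(site)  # Register the site even if it shares no IDs.
        for pub_id in ids:
            if pub_id in id_owner:
                union(site, id_owner[pub_id])  # A shared ID links the sites.
            else:
                id_owner[pub_id] = site

    groups = defaultdict(set)
    for site in site_to_ids:
        groups[find(site)].add(site)
    return sorted(map(sorted, groups.values()))

sites = {
    "news-a.example": ["pub-111"],
    "news-b.example": ["pub-111", "pub-222"],
    "shop.example": ["pub-333"],
}
print(administration_groups(sites))
# → [['news-a.example', 'news-b.example'], ['shop.example']]
```

Applied to the top 1 million websites, component statistics of such a graph would yield findings like the "90% single-publisher" figure above.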
arXiv Detail & Related papers (2022-02-10T14:59:17Z)
- Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach [115.91099791629104]
We construct two new benchmark webly supervised fine-grained datasets, WebFG-496 and WebiNat-5089.
WebiNat-5089 contains 5,089 sub-categories and more than 1.1 million web training images, making it the largest webly supervised fine-grained dataset to date.
As a minor contribution, we also propose a novel webly supervised method (termed "Peer-learning") for benchmarking these datasets.
arXiv Detail & Related papers (2021-08-05T06:28:32Z)
- Service Quality Analysis of the Bukalapak E-Commerce Website on the Satisfaction of Universitas Bina Darma Student Users Using the Webqual 4.0 Method [0.0]
One factor supporting the growth of online activity is online buying-and-selling sites, i.e., electronic commerce.
A website, commonly shortened to "the web," is a medium consisting of a collection of pages.
This study applies the Webqual 4.0 method, which measures user satisfaction along three dimensions: usability, information quality, and interaction quality.
arXiv Detail & Related papers (2021-06-23T10:57:04Z)
- A Large Visual, Qualitative and Quantitative Dataset of Web Pages [4.5002924206836]
We have created a large dataset of 49,438 Web pages.
It consists of visual, textual and numerical data types, includes all countries worldwide, and considers a broad range of topics.
arXiv Detail & Related papers (2021-05-15T01:31:25Z)
- Open Domain Generalization with Domain-Augmented Meta-Learning [83.59952915761141]
We study the novel and practical problem of Open Domain Generalization (OpenDG).
We propose a Domain-Augmented Meta-Learning framework to learn open-domain generalizable representations.
Experiment results on various multi-domain datasets demonstrate that the proposed Domain-Augmented Meta-Learning (DAML) outperforms prior methods for unseen domain recognition.
arXiv Detail & Related papers (2021-04-08T09:12:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.