"Way back then": A Data-driven View of 25+ years of Web Evolution
- URL: http://arxiv.org/abs/2202.08239v1
- Date: Wed, 16 Feb 2022 18:36:03 GMT
- Title: "Way back then": A Data-driven View of 25+ years of Web Evolution
- Authors: Vibhor Agarwal, Nishanth Sastry
- Abstract summary: We look at the top 100 Alexa websites for over 25 years from the Internet Archive or the "Wayback Machine", archive.org.
We study the changes in popularity, from Geocities and Yahoo! in the mid-to-late 1990s to the likes of Google, Facebook, and TikTok of today.
We also look at different categories of websites and their popularity over the years and find evidence for the decline in popularity of news and education-related websites.
- Score: 4.055696230852368
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Since the inception of the first web page three decades back, the Web has
evolved considerably, from static HTML pages in the beginning to the dynamic
web pages of today, from mainly the text-based pages of the 1990s to today's
multimedia-rich pages, etc. Although much of this is known anecdotally, to our
knowledge, there is no quantitative documentation of the extent and timing of
these changes. This paper attempts to address this gap in the literature by
looking at the top 100 Alexa websites for over 25 years from the Internet
Archive or the "Wayback Machine", archive.org. We study the changes in
popularity, from Geocities and Yahoo! in the mid-to-late 1990s to the likes of
Google, Facebook, and TikTok of today. We also look at different categories of
websites and their popularity over the years and find evidence for the decline
in popularity of news and education-related websites, which have been replaced
by streaming media and social networking sites. We explore the emergence and
relative prevalence of different MIME-types (text vs. image vs. video vs.
JavaScript and JSON) and study whether the use of text on the Internet is
declining.
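As a rough illustration of the MIME-type comparison mentioned in the abstract, the sketch below groups MIME-type strings into the categories the abstract names (text, image, video, JavaScript/JSON) and computes their relative shares. It is not the authors' method: the function names `mime_category` and `category_shares`, the grouping rules, and the sample input are all assumptions made for this example.

```python
from collections import Counter


def mime_category(mime_type: str) -> str:
    """Map a MIME-type string to a coarse category (hypothetical grouping)."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = mime_type.split(";")[0].strip().lower()
    # Treat any JavaScript or JSON subtype as one category, as in the abstract.
    if mime.endswith("javascript") or mime.endswith("json"):
        return "javascript/json"
    top_level = mime.split("/")[0]
    if top_level in ("text", "image", "video"):
        return top_level
    return "other"


def category_shares(mime_types):
    """Return the fraction of entries falling into each category."""
    counts = Counter(mime_category(m) for m in mime_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Illustrative input only; these values are not data from the paper.
    sample = ["text/html; charset=UTF-8", "image/png",
              "application/json", "video/mp4"]
    print(category_shares(sample))
```

Computing such shares separately for each archived year would give one possible way to look at whether the share of text is declining, under the assumptions stated above.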
Related papers
- Bridging Social Media and Search Engines: Dredge Words and the Detection of Unreliable Domains [3.659498819753633]
We develop a website credibility classification and discovery system that integrates webgraph and social media contexts.
We introduce the concept of dredge words, terms or phrases for which unreliable domains rank highly on search engines.
We release a novel dataset of dredge words, highlighting their strong connections to both social media and online commerce platforms.
arXiv Detail & Related papers (2024-06-17T11:22:04Z)
- Exploring Embeddings for Measuring Text Relatedness: Unveiling Sentiments and Relationships in Online Comments [1.7230140898679147]
This paper investigates sentiment and semantic relationships among comments across various social media platforms.
It uses word embeddings to analyze components in sentences and documents.
Our analysis will enable a deeper understanding of the interconnectedness of online comments and will investigate the notion of the internet functioning as a large interconnected brain.
arXiv Detail & Related papers (2023-09-15T04:57:23Z)
- Forgotten Knowledge: Examining the Citational Amnesia in NLP [63.13508571014673]
We examine how far back in time we tend to go to cite papers, how that has changed over time, and what factors correlate with this citational attention/amnesia.
We show that around 62% of cited papers are from the immediate five years prior to publication, whereas only about 17% are more than ten years old.
We show that the median age and age diversity of cited papers were steadily increasing from 1990 to 2014, but since then, the trend has reversed, and current NLP papers have an all-time low temporal citation diversity.
arXiv Detail & Related papers (2023-05-29T18:30:34Z)
- Web 3.0: The Future of Internet [53.234101208024335]
Web 3.0 is a decentralized Web architecture that is more intelligent and safer than before.
Web 3.0 is capable of addressing web data ownership according to distributed technology.
It will optimize the internet world from the perspectives of economy, culture, and technology.
arXiv Detail & Related papers (2023-03-23T15:37:42Z)
- Web3: The Next Internet Revolution [50.16560061003771]
Next internet revolution: Web3 is going to open new opportunities for traditional social models.
Decentralized finance will be global, and open with financial inclusiveness for unbanked people.
Several worthwhile future research directions of Web3 are discussed.
arXiv Detail & Related papers (2023-03-22T23:37:43Z)
- Leveraging Google's Publisher-specific IDs to Detect Website Administration [3.936965297430477]
We propose a novel, graph-based methodology to detect administration of websites on the Web.
We apply our methodology across the top 1 million websites and study the characteristics of the created graphs of website administration.
Our findings show that approximately 90% of the websites are each associated with a single publisher, and that small publishers tend to manage less popular websites.
arXiv Detail & Related papers (2022-02-10T14:59:17Z)
- Prediction of new outlinks for focused Web crawling [0.0]
This work provides a methodology for detecting new links effectively using a short history.
We provide statistical models for three targets: the link change rate, the presence of new links, and the number of new links.
A notable finding is that, if the history of the target page is not available, then our new features, which represent the history of related pages, are the most predictive of new links in the target page.
arXiv Detail & Related papers (2021-11-09T11:36:21Z)
- hBert + BiasCorp -- Fighting Racism on the Web [58.768804813646334]
We are releasing BiasCorp, a dataset containing 139,090 comments and news segments from three specific sources: Fox News, BreitbartNews, and YouTube.
In this work, we present hBERT, where we modify certain layers of the pretrained BERT model with the new Hopfield Layer.
We are also releasing a JavaScript library and a Chrome Extension Application, to help developers make use of our trained model in web applications.
arXiv Detail & Related papers (2021-04-06T02:17:20Z)
- The Rise and Fall of Fake News sites: A Traffic Analysis [62.51737815926007]
We investigate the online presence of fake news websites and characterize their behavior in comparison to real news websites.
Based on our findings, we build a content-agnostic ML classifier for automatic detection of fake news websites.
arXiv Detail & Related papers (2021-03-16T18:10:22Z)
- Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms.
We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms, like Facebook, may elicit the emergence of echo chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.