UniRank: A Multi-Agent Calibration Pipeline for Estimating University Rankings from Anonymized Bibliometric Signals
- URL: http://arxiv.org/abs/2602.18824v1
- Date: Sat, 21 Feb 2026 12:50:55 GMT
- Title: UniRank: A Multi-Agent Calibration Pipeline for Estimating University Rankings from Anonymized Bibliometric Signals
- Authors: Pedram Riyazimehr, Seyyed Ehsan Mahmoudi
- Abstract summary: We present UniRank, a multi-agent pipeline that estimates university positions across global ranking systems. The system employs a three-stage architecture: zero-shot estimation from anonymized institutional metrics, per-system tool-augmented calibration against real ranked universities, and final synthesis. On the Times Higher Education (THE) World University Rankings ($n=352$), the system achieves MAE = 251.5 rank positions, Median AE = 131.5, PNMAE = 12.03%, Spearman $\rho = 0.769$, Kendall $\tau = 0.591$, hit rate @50 = 20.7%, and hit rate @100 = 39.8%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present UniRank, a multi-agent LLM pipeline that estimates university positions across global ranking systems using only publicly available bibliometric data from OpenAlex and Semantic Scholar. The system employs a three-stage architecture: (a) zero-shot estimation from anonymized institutional metrics, (b) per-system tool-augmented calibration against real ranked universities, and (c) final synthesis. Critically, institutions are anonymized -- names, countries, DOIs, paper titles, and collaboration countries are all redacted -- and their actual ranks are hidden from the calibration tools during evaluation, preventing LLM memorization from confounding results. On the Times Higher Education (THE) World University Rankings ($n=352$), the system achieves MAE = 251.5 rank positions, Median AE = 131.5, PNMAE = 12.03%, Spearman $\rho = 0.769$, Kendall $\tau = 0.591$, hit rate @50 = 20.7%, hit rate @100 = 39.8%, and a Memorization Index of exactly zero (no exact-match zero-width predictions among all 352 universities). The systematic positive-signed error (+190.1 positions, indicating the system consistently predicts worse ranks than actual) and monotonic performance degradation from elite tier (MAE = 60.5, hit@100 = 90.5%) to tail tier (MAE = 328.2, hit@100 = 20.8%) provide strong evidence that the pipeline performs genuine analytical reasoning rather than recalling memorized rankings. A live demo is available at https://unirank.scinito.ai.
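The rank-level evaluation metrics quoted above can be computed directly from paired predicted/actual rank lists. The sketch below is illustrative, not the authors' code: `rank_metrics` is a hypothetical helper, Kendall $\tau$ is computed as the tau-a variant (no tie correction, an assumption), and the Memorization Index is read as the fraction of exact-match predictions, which is one plausible interpretation of the abstract's parenthetical.

```python
from statistics import mean, median

def rank_metrics(pred, actual, k=50):
    """Compare predicted vs. actual rank positions (1 = best rank)."""
    errs = [p - a for p, a in zip(pred, actual)]
    abs_errs = [abs(e) for e in errs]
    mae = mean(abs_errs)                    # mean absolute error, in rank positions
    med_ae = median(abs_errs)               # median absolute error
    signed = mean(errs)                     # > 0: system predicts worse (larger) ranks
    hit_k = mean(e <= k for e in abs_errs)  # hit rate @k: fraction within k positions
    mem = mean(e == 0 for e in errs)        # fraction of exact-match predictions
    # Kendall tau (tau-a, no tie correction): (concordant - discordant) / all pairs
    n = len(pred)
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pred[i] - pred[j]) * (actual[i] - actual[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    tau = (conc - disc) / (n * (n - 1) / 2)
    return {"MAE": mae, "MedianAE": med_ae, "signed": signed,
            f"hit@{k}": hit_k, "mem_index": mem, "tau": tau}
```

On the paper's numbers, a positive `signed` value (+190.1) together with `mem_index` = 0 is the combination the authors use to argue against memorization.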
Related papers
- Linear-PAL: A Lightweight Ranker for Mitigating Shortcut Learning in Personalized, High-Bias Tabular Ranking (arXiv, 2025-12-15)
  In e-commerce ranking, implicit user feedback is systematically confounded by position bias. We propose a lightweight framework that enforces de-biasing through structural constraints. We show that Linear-PAL achieves robust, personalized ranking in near real-time.
- Preliminary Ranking of WMT25 General Machine Translation Systems (arXiv, 2025-08-11)
  We present the preliminary rankings of machine translation (MT) systems submitted to the WMT25 General Machine Translation Shared Task. The official WMT25 ranking will be based on human evaluation, which is more reliable and will supersede these results.
- CoRanking: Collaborative Ranking with Small and Large Ranking Agents (arXiv, 2025-03-30)
  Large Language Models (LLMs) have demonstrated superior listwise ranking performance. CoRanking combines small and large ranking models for efficient and effective ranking.
- A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look (arXiv, 2024-11-13)
  This paper reports on the results of a large-scale evaluation (the TREC 2024 RAG Track) where four different relevance assessment approaches were deployed. We find that automatically generated UMBRELA judgments can replace fully manual judgments to accurately capture run-level effectiveness. Surprisingly, we find that LLM assistance does not appear to increase correlation with fully manual assessments, suggesting that the costs associated with human-in-the-loop processes do not bring obvious tangible benefits.
- Soft Condorcet Optimization for Ranking of General Agents (arXiv, 2024-10-31)
  We describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO). SCO rankings are on average 0 to 0.043 away from the optimal ranking in normalized Kendall-tau distance across 865 preference profiles from the PrefLib open ranking archive. SCO provides the best approximation to the optimal ranking, measured on held-out test sets, in a problem containing 52,958 human players across 31,049 games of the classic seven-player game of Diplomacy.
- The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review (arXiv, 2024-08-24)
  Author-provided rankings could be leveraged to improve peer review processes at machine learning conferences. We focus on the Isotonic Mechanism, which calibrates raw review scores using the author-provided rankings. We propose several cautious, low-risk applications of the Isotonic Mechanism and author-provided rankings in peer review.
- Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling (arXiv, 2024-06-12)
  We propose a Constrained Active Sampling Framework (CASF) for reliable human judgment. Experiment results show CASF achieves 93.18% top-ranked system recognition accuracy.
- Predicting article quality scores with machine learning: The UK Research Excellence Framework (arXiv, 2022-12-11)
  Accuracy is highest in the medical and physical sciences Units of Assessment (UoAs) and economics. Prediction accuracies above the baseline for the social science, mathematics, engineering, arts, and humanities UoAs were much lower or close to zero. We increased accuracy with an active learning strategy and by selecting articles with higher prediction probabilities, as estimated by the algorithms, but this substantially reduced the number of scores predicted.
- Data Driven and Visualization based Strategization for University Rank Improvement using Decision Trees (arXiv, 2021-10-18)
  We present a novel idea of classifying the rankings data using Decision Tree (DT) based algorithms and retrieving decision paths for rank improvement using data visualization techniques. The proposed methodology can aid HEIs to quantitatively assess the scope of improvement, outline a fine-grained long-term action plan, and prepare a suitable road-map.
- PiRank: Learning To Rank via Differentiable Sorting (arXiv, 2020-12-12)
  We propose PiRank, a new class of differentiable surrogates for ranking. We show that PiRank exactly recovers the desired metrics in the limit of zero temperature.
- How Reliable are University Rankings? (arXiv, 2020-04-20)
  We take a fresh look at this ranking scheme using the public College dataset. We show in multiple ways that this ranking scheme is not reliable and cannot be trusted as authoritative. We conclude by making the case that all data and methods used for rankings should be made open for validation and repeatability.
This list is automatically generated from the titles and abstracts of the papers in this site.