EZ-Sort: Efficient Pairwise Comparison via Zero-Shot CLIP-Based Pre-Ordering and Human-in-the-Loop Sorting
- URL: http://arxiv.org/abs/2508.21550v1
- Date: Fri, 29 Aug 2025 12:06:49 GMT
- Title: EZ-Sort: Efficient Pairwise Comparison via Zero-Shot CLIP-Based Pre-Ordering and Human-in-the-Loop Sorting
- Authors: Yujin Park, Haejun Chung, Ikbeom Jang,
- Abstract summary: Pairwise comparison is often favored over absolute rating or ordinal classification in subjective or difficult annotation tasks.<n>Recent work has greatly reduced the annotation burden (O(n log n)) by actively sampling pairwise comparisons using a sorting algorithm.<n>We further improve annotation efficiency by (1) roughly pre-ordering items using the Contrastive Language-Image Pre-training (CLIP) model hierarchically without training, and (2) replacing easy, obvious human comparisons with automated comparisons.
- Score: 0.5861893391102716
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pairwise comparison is often favored over absolute rating or ordinal classification in subjective or difficult annotation tasks due to its improved reliability. However, exhaustive comparisons require a massive number of annotations (O(n^2)). Recent work has greatly reduced the annotation burden (O(n log n)) by actively sampling pairwise comparisons using a sorting algorithm. We further improve annotation efficiency by (1) roughly pre-ordering items using the Contrastive Language-Image Pre-training (CLIP) model hierarchically without training, and (2) replacing easy, obvious human comparisons with automated comparisons. The proposed EZ-Sort first produces a CLIP-based zero-shot pre-ordering, then initializes bucket-aware Elo scores, and finally runs an uncertainty-guided human-in-the-loop MergeSort. Validation was conducted using various datasets: face-age estimation (FGNET), historical image chronology (DHCI), and retinal image quality assessment (EyePACS). It showed that EZ-Sort reduced human annotation cost by 90.5% compared to exhaustive pairwise comparisons and by 19.8% compared to prior work (when n = 100), while improving or maintaining inter-rater reliability. These results demonstrate that combining CLIP-based priors with uncertainty-aware sampling yields an efficient and scalable solution for pairwise ranking.
Related papers
- K-Sort Eval: Efficient Preference Evaluation for Visual Generation via Corrected VLM-as-a-Judge [51.93484138861584]
The rapid development of visual generative models raises the need for more scalable and human-aligned evaluation methods.<n>We propose K-Sort Eval, a reliable and efficient VLM-based evaluation framework that integrates posterior correction and dynamic matching.<n>Experiments show that K-Sort Eval delivers evaluation results consistent with K-Sort Arena, typically requiring fewer than 90 model runs.
arXiv Detail & Related papers (2026-02-10T05:07:46Z) - Using tournaments to calculate AUROC for zero-shot classification with LLMs [4.270472870948892]
Large language models perform surprisingly well on many zero-shot classification tasks.<n>We propose and evaluate a method that converts binary classification tasks into pairwise comparison tasks.<n>Repeated pairwise comparisons can be used to score instances using the Elo rating system.
arXiv Detail & Related papers (2025-02-20T20:13:20Z) - Active Preference Learning for Ordering Items In- and Out-of-sample [7.0774164818430565]
actively sampling item pairs can reduce the number of annotations necessary for learning an accurate ordering.
Many algorithms ignore shared structure between items.
It is also common to disregard how noise in comparisons varies between item pairs.
arXiv Detail & Related papers (2024-05-05T21:44:03Z) - Sorting with Predictions [1.7042264000899532]
We explore the fundamental problem of sorting through the lens of learning-augmented algorithms.
We design new and simple algorithms using only $O(sum_i log eta_i)$ exact comparisons.
We prove that the comparison complexity is theoretically optimal with respect to the examined error measures.
arXiv Detail & Related papers (2023-11-01T18:00:03Z) - Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z) - Learning by Sorting: Self-supervised Learning with Group Ordering
Constraints [75.89238437237445]
This paper proposes a new variation of the contrastive learning objective, Group Ordering Constraints (GroCo)
It exploits the idea of sorting the distances of positive and negative pairs and computing the respective loss based on how many positive pairs have a larger distance than the negative pairs, and thus are not ordered correctly.
We evaluate the proposed formulation on various self-supervised learning benchmarks and show that it not only leads to improved results compared to vanilla contrastive learning but also shows competitive performance to comparable methods in linear probing and outperforms current methods in k-NN performance.
arXiv Detail & Related papers (2023-01-05T11:17:55Z) - Efficient computation of rankings from pairwise comparisons [0.0]
We describe an alternative and similarly simple iteration that provably returns identical results but does so much faster.
We demonstrate this algorithm with applications to a range of example data sets and derive a number of results regarding its convergence.
arXiv Detail & Related papers (2022-06-30T19:39:09Z) - Decreasing Annotation Burden of Pairwise Comparisons with
Human-in-the-Loop Sorting: Application in Medical Image Artifact Rating [3.5314411880556063]
Ranking by pairwise comparisons has shown improved reliability over ordinal classification.
We propose a method for reducing the number of pairwise comparisons required to rank by a quantitative metric.
arXiv Detail & Related papers (2022-02-10T04:02:45Z) - Learning to Hash Naturally Sorts [84.90210592082829]
We introduce Naturally-Sorted Hashing (NSH) to train a deep hashing model with sorted results end-to-end.
NSH sort the Hamming distances of samples' hash codes and accordingly gather their latent representations for self-supervised training.
We describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning.
arXiv Detail & Related papers (2022-01-31T16:19:02Z) - Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise
Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.