Abstract: With the exponential growth of online marketplaces and user-generated content
therein, aspect-based sentiment analysis has become more important than ever.
In this work, we critically review a representative sample of the models
published during the past six years through the lens of a practitioner, with an
eye towards deployment in production. First, our rigorous empirical evaluation
reveals poor reproducibility: test accuracy drops by 4-5% on average across the
sample relative to the reported figures. Second, to further bolster our confidence
in the empirical evaluation, we report experiments on two challenging data slices
and observe a consistent 12-55% drop in accuracy. Third, we study the possibility of transfer across
domains and observe that as little as 10-25% of the domain-specific training
dataset, when used in conjunction with datasets from other domains within the
same locale, largely closes the gap between complete cross-domain and complete
in-domain predictive performance. Lastly, we open-source two large-scale
annotated review corpora from a large e-commerce portal in India to aid the
study of replicability and transfer, in the hope that they will fuel further
growth of the field.