Abstract: Gender information is no longer a mandatory input when registering for an
account at many leading Internet companies. However, prediction of demographic
information such as gender and age remains an important task, especially for
intervening against unintentional gender/age bias in recommender systems. It is
therefore necessary to infer the gender of users who did not provide this
information during registration. We consider the problem of predicting the
gender of registered users based on their declared names. By analyzing the first
names of 100M+ users, we found that gender can be classified very effectively
using the composition of the name strings. We propose a number of
character-based machine learning models and demonstrate that they are able to
infer the gender of users with much higher accuracy than baseline models.
Moreover, we show that using last names in addition to first names further
improves classification performance.