Abstract: Written language contains stylistic cues that can be exploited to
automatically infer a variety of potentially sensitive author information.
Adversarial stylometry aims to attack such inference models by rewriting an
author's text. Our research proposes several components to facilitate deployment of
these adversarial attacks in the wild, where neither data nor target models are
accessible. We introduce a transformer-based extension of a lexical replacement
attack and show that it achieves high transferability when trained on a weakly
labeled corpus, decreasing target model performance to below chance. While not
completely inconspicuous, our more successful attacks also prove notably less
detectable by humans. Our framework therefore provides a promising direction
for future privacy-preserving adversarial attacks.