BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (2019)
BART is a denoising autoencoder for pre-training sequence-to-sequence models. It is trained on corrupted text and updates its parameters to reconstruct the original text. The authors experimented with several noising functions that corrupt the text: token masking, token deletion, text infilling, sentence permutation, and document rotation. Text infilling, in which sampled text spans (with span lengths drawn from a Poisson distribution, \(\lambda = 3\)) are each replaced by a single mask token, demonstrated the most consistently strong performance; a sketch of this corruption step is shown below.
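The following is a minimal, illustrative sketch of the text-infilling corruption (not the authors' implementation): span lengths are drawn from a Poisson distribution with \(\lambda = 3\) and each sampled span is replaced by a single mask token, with a length-0 span amounting to inserting a mask. The `mask_ratio` of 0.3, the `<mask>` token string, and the greedy sampling loop are assumptions made for the example.

```python
import numpy as np

def text_infilling(tokens, mask_token="<mask>", mask_ratio=0.3, lam=3.0, seed=0):
    """Sketch of BART-style text infilling.

    Repeatedly samples a span length from Poisson(lam) and a random start
    position, then replaces that span with a single mask token. A length-0
    span inserts a mask token without removing anything. Stops once roughly
    mask_ratio of the original tokens have been corrupted.
    """
    rng = np.random.default_rng(seed)
    out = list(tokens)
    budget = int(round(len(tokens) * mask_ratio))
    masked = 0
    while masked < budget and len(out) > 0:
        span_len = int(rng.poisson(lam))
        span_len = min(span_len, len(out))                    # keep span inside the sequence
        start = int(rng.integers(0, len(out) - span_len + 1)) # random span start
        out[start:start + span_len] = [mask_token]            # whole span -> one mask token
        masked += max(span_len, 1)                            # count progress even for 0-length spans
    return out

tokens = "the quick brown fox jumps over the lazy dog".split()
print(text_infilling(tokens))
```

During pre-training, the corrupted sequence is fed to the encoder and the decoder is trained to reconstruct the original, uncorrupted token sequence.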