Blanket
Transformer
Poly-Encoders: Architectures and Pre-Training Strategies for Fast and Accurate Multi-Sentence Scoring (2019)
LoRA: Low-Rank Adaptation of Large Language Models (2021)
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (2019)
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (2019)
Self-Attention with Relative Position Representations (2018)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (2020)
An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale (2021)
Generating Long Sequences with Sparse Transformers (2019)