Supervised Learning of Universal Sentence Representations From Natural Language Inference Data (2017)
May 30, 2025

Supervised Learning of Universal Sentence Representations From Natural Language Inference Data proposes a supervised approach to train neural networks on the Stanford Natural Language Inference (SNLI) dataset to produce general-purpose sentence embeddings applicable to a wide range of downstream tasks. The authors evaluated seven different encoder architectures across twelve NLP transfer tasks and found that a bidirectional LSTM with max pooling achieved the best performance.
They chose to train on a natural language inference dataset because they hypothesized that NLI forces a model to capture high-level semantic relationships between sentence pairs. The SNLI dataset consists of 570,000 sentence pairs, each comprising a premise and a hypothesis, annotated with one of three labels: entailment, neutral, or contradiction.
While the encoder component varies across the seven architectures, the classification layer is shared. Each model generates two sentence embeddings, $u$ and $v$, for the premise and hypothesis, respectively. These embeddings are combined into a single feature vector $(u,\ v,\ |u - v|,\ u \odot v)$, i.e., the concatenation of the two embeddings, their element-wise absolute difference, and their element-wise product. This vector is then passed through fully connected layers and a softmax classifier to predict the relationship label.
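A minimal PyTorch sketch of this shared classification head, assuming the feature construction above; the hidden size, number of layers, and class ordering are illustrative choices, not taken from the paper:

```python
import torch
import torch.nn as nn

class NLIClassifier(nn.Module):
    """Shared classification head over premise/hypothesis sentence embeddings."""

    def __init__(self, embed_dim: int, hidden_dim: int = 512, num_classes: int = 3):
        super().__init__()
        # The feature vector (u, v, |u - v|, u * v) has size 4 * embed_dim.
        self.mlp = nn.Sequential(
            nn.Linear(4 * embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # u, v: (batch, embed_dim) embeddings for premise and hypothesis
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)
        # Returns logits over {entailment, neutral, contradiction};
        # softmax is applied implicitly by a cross-entropy loss during training.
        return self.mlp(features)
```

Because this head only ever sees the two sentence vectors, any encoder that outputs a fixed-size embedding can be plugged in underneath it, which is what makes comparing the seven architectures straightforward.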
In the best-performing architecture, the bidirectional LSTM with max pooling, each word is represented by concatenating the hidden states of a forward and a backward LSTM at that position. Max pooling is then applied across all token vectors to produce a fixed-size sentence embedding, where each dimension takes the maximum value observed across the sequence.
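A minimal sketch of such an encoder, again assuming PyTorch; for simplicity it omits padding and variable-length handling, and the dimensions are placeholders:

```python
import torch
import torch.nn as nn

class BiLSTMMaxPoolEncoder(nn.Module):
    """Encodes a sentence with a bidirectional LSTM, then max-pools over time."""

    def __init__(self, word_dim: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(word_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, word_embeddings: torch.Tensor) -> torch.Tensor:
        # word_embeddings: (batch, seq_len, word_dim)
        # outputs: (batch, seq_len, 2 * hidden_dim), forward and backward states concatenated per token
        outputs, _ = self.lstm(word_embeddings)
        # Max over the time dimension: each embedding dimension keeps its largest value in the sequence.
        sentence_embedding, _ = outputs.max(dim=1)
        return sentence_embedding  # (batch, 2 * hidden_dim)
```

Applying this encoder to the premise and the hypothesis yields the $u$ and $v$ vectors consumed by the classification head above.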