Dense Passage Retrieval for Open-Domain Question Answering (2020)
December 30, 2023
Open-domain question answering (QA) involves answering fact-based questions using a large collection of documents. An open-domain QA system can be divided into two components: one that retrieves relevant passages and another that extracts the answer spans from those passages (Chen et al., 2017). While traditional approaches use sparse vector space models like BM25 for the retrieval step, Dense Passage Retrieval for Open-Domain Question Answering shows that retrieval can be practically implemented using dense representations alone.
The embeddings are learned from the training dataset by maximizing the inner products between question vectors and the vectors of their relevant passages. Training is essentially metric learning, so each question also needs irrelevant (negative) passages. In experiments, the best performance came from combining two kinds of negatives: top passages returned by BM25 that do not contain the answer, and positive passages paired with other questions.
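As a rough illustration of how the two kinds of negatives could be combined for one mini-batch, here is a minimal sketch. The dictionary keys `positive` and `bm25_negatives` and the function `build_negatives` are hypothetical names chosen for this example, and the BM25 results are assumed to be precomputed and already filtered to passages that do not contain the answer.

```python
def build_negatives(batch):
    """Return, for each question in the batch, its list of negative passages.

    Assumption: each example is a dict with the hypothetical keys
    'positive' (the relevant passage) and 'bm25_negatives' (precomputed
    BM25 hits that do not contain the answer).
    """
    negatives = []
    for i, example in enumerate(batch):
        # Positives paired with the *other* questions act as negatives here.
        in_batch = [other["positive"] for j, other in enumerate(batch) if j != i]
        # Add this question's own BM25 hard negatives.
        negatives.append(in_batch + list(example["bm25_negatives"]))
    return negatives


batch = [
    {"question": "q0", "positive": "p0", "bm25_negatives": ["h0"]},
    {"question": "q1", "positive": "p1", "bm25_negatives": ["h1"]},
    {"question": "q2", "positive": "p2", "bm25_negatives": ["h2"]},
]
negs = build_negatives(batch)
```

Reusing the other questions' positives costs nothing extra, since those passages are already in the batch.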
Each passage \(p_i\) is a sequence of tokens, \(w_1^{(i)}, w_2^{(i)}, \dots, w_{|p_i|}^{(i)}\), extracted from a document. Given a question \(q\), the task is to find a span \(w_s^{(i)}, w_{s+1}^{(i)}, \dots, w_{e}^{(i)}\) from one of the passages \(p_i\) that can answer the question.
The proposed approach employs BERT as the encoder. The similarity between a question and a passage is computed as the dot product of their vectors: $$ \text{sim}(q, p) = E_Q(q)^{\top} E_P(p) $$ where \(E_P(\cdot)\) maps any text passage to a \(d\)-dimensional vector, and \(E_Q(\cdot)\) maps the input question to a \(d\)-dimensional vector.
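To make the scoring step concrete, the following sketch computes the dot-product similarity and ranks a handful of passages for one question. It assumes the encoder outputs are already available as plain lists of floats; the vectors below are made-up toy values, not real BERT embeddings, and `rank_passages` is a hypothetical helper.

```python
def sim(q_vec, p_vec):
    """Dot product between a question vector and a passage vector."""
    if len(q_vec) != len(p_vec):
        raise ValueError("question and passage vectors must have the same dimension")
    return sum(a * b for a, b in zip(q_vec, p_vec))


def rank_passages(q_vec, passage_vecs):
    """Return passage indices sorted from highest to lowest similarity."""
    scores = [sim(q_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy 3-dimensional vectors standing in for E_Q(q) and E_P(p).
q = [1.0, 0.0, 2.0]
passages = [
    [0.5, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 3.0, 0.1],
]
ranking = rank_passages(q, passages)
```

With these toy vectors, the second passage scores highest because it is the one most aligned with the question vector.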
Each training instance consists of one question \(q_i\) and one relevant passage \(p_i^+\), along with \(n\) irrelevant passages \(p_{i,j}^-\). The loss for an instance is the negative log likelihood of the positive passage: $$ L(q_i,p^+_i,p^-_{i,1},\dots,p^-_{i,n})=-\log\frac{e^{\text{sim}(q_i,p^+_i)}}{e^{\text{sim}(q_i,p^+_i)}+\sum^n_{j=1}e^{\text{sim}(q_i,p^-_{i,j})}} $$
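The loss above is a softmax cross-entropy over one positive score and \(n\) negative scores. A minimal sketch, assuming the similarity scores have already been computed, is shown below. The function name `nll_loss` is hypothetical, and subtracting the maximum score before exponentiating is a standard numerical-stability trick added here rather than something stated in the text.

```python
import math


def nll_loss(pos_score, neg_scores):
    """Negative log likelihood of the positive passage.

    pos_score: sim(q_i, p_i^+)
    neg_scores: iterable of sim(q_i, p_{i,j}^-) for j = 1..n
    """
    scores = [pos_score] + list(neg_scores)
    # Log-sum-exp with the max subtracted to avoid overflow in exp().
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    # -log(e^pos / sum_k e^score_k) = log(sum_k e^score_k) - pos
    return log_denominator - pos_score


uniform = nll_loss(0.0, [0.0, 0.0, 0.0])    # all four passages tie
confident = nll_loss(10.0, [0.0, 0.0, 0.0])  # positive clearly ahead
```

When all scores tie, the loss equals \(\log(n+1)\), and it approaches zero as the positive score pulls away from the negatives.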