FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022)
Let $N$ be the sequence length. In a standard Transformer, self-attention takes $O(N^2)$ time and $O(N^2)$ memory. Approximate methods such as Reformer reduce FLOPs by approximating attention, but they often fail to deliver significant wall-clock speedups and are therefore not widely used. FLASHATTENTION instead targets memory traffic: it reduces data movement between GPU high-bandwidth memory (HBM) and on-chip SRAM, which can be a major bottleneck. HBM is large but has relatively low bandwidth; SRAM is fast but small. FLASHATTENTION tiles the computation: it loads blocks of $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ into SRAM, updates a block of the output $\mathbf{O}$, and then writes that block back to HBM. Because softmax normalizes over an entire row of scores, the algorithm maintains running row maxima and normalizers (the online softmax trick) and rescales partial outputs as each new block is processed. By repeating this over all blocks, FLASHATTENTION computes exact attention without ever materializing the full $N \times N$ score matrix.
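The tiling idea can be illustrated with a minimal NumPy sketch. This is a simplification of the actual algorithm (which also tiles the queries and runs on-chip): here all queries are kept resident and only the $\mathbf{K}$/$\mathbf{V}$ blocks are streamed, with running maxima and normalizers making the result exact.

```python
import numpy as np

def reference_attention(Q, K, V):
    """Standard attention that materializes the full N x N matrices S and P."""
    S = (Q @ K.T) / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    """Exact attention over K/V blocks using the online softmax trick."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row maximum of the scores
    l = np.zeros(N)           # running softmax normalizer
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]   # "load" one K/V block
        S = (Q @ Kj.T) * scale                    # scores for this block only
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)                 # rescale previous partial sums
        P = np.exp(S - m_new[:, None])
        l = alpha * l + P.sum(axis=1)
        O = alpha[:, None] * O + P @ Vj
        m = m_new
    return O / l[:, None]                          # final normalization
```

The partial output `O` and statistics `m`, `l` are all $O(N \cdot d)$ or $O(N)$ in size; the $O(N^2)$ matrices exist only one block at a time.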
FLASHATTENTION also reduces HBM–SRAM traffic in the backward pass. In a typical implementation, one materializes the attention score matrix $\mathbf{S}$ and the softmax output $\mathbf{P}$ in the forward pass so they can be reused to compute the gradients with respect to $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ from the gradient of $\mathbf{O}$. When $\mathbf{Q}$, $\mathbf{K}$, $\mathbf{V}\in \mathbb{R}^{N\times d}$, both $\mathbf{S}$ and $\mathbf{P}$ are $O(N^2)$ in size. FLASHATTENTION avoids storing these $N\times N$ matrices: during backpropagation it recomputes $\mathbf{S}$ and $\mathbf{P}$ block by block from $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$, using $\mathbf{O}$ and the softmax normalization statistics saved from the forward pass.
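The recomputation idea can be sketched for one of the gradients. Assuming the per-row logsumexp statistics $L_i = m_i + \log \ell_i$ were saved from the forward pass, the softmax block $\mathbf{P}$ is cheaply rebuilt and $d\mathbf{V} = \mathbf{P}^\top \, d\mathbf{O}$ accumulated block by block; $d\mathbf{Q}$ and $d\mathbf{K}$ are handled analogously in the full algorithm. The function name is illustrative, not from the paper.

```python
import numpy as np

def backward_dV(Q, K, dO, L, block=4):
    """Accumulate dV without storing S or P.

    L[i] is the logsumexp of row i of the score matrix, saved in the
    forward pass; with it, P can be recomputed per block on the fly.
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    dV = np.zeros((N, d))
    for j in range(0, N, block):
        S = (Q @ K[j:j + block].T) * scale   # recompute one score block
        P = np.exp(S - L[:, None])           # recompute one softmax block
        dV[j:j + block] = P.T @ dO           # gradient for this V block
    return dV
```

The extra FLOPs from recomputation are outweighed in practice by the HBM reads and writes saved by never touching an $N \times N$ matrix.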