BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (2019)
BART is a denoising autoencoder for pre-training sequence-to-sequence models. It is trained on corrupted text and updates its parameters to reconstruct the original text. The authors experimented with several noising functions that corrupt the text: token masking, token deletion, text infilling, sentence permutation, and document rotation. Text infilling, in which sampled text spans (with span lengths drawn from a Poisson distribution, \(\lambda = 3\)) are each replaced by a single mask token, demonstrated the most consistently strong performance; a sketch of this corruption step is shown below.
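The following is a minimal, illustrative sketch of the text-infilling corruption (not the authors' implementation): span lengths are drawn from a Poisson distribution with \(\lambda = 3\) and each sampled span is replaced by a single mask token, with a length-0 span amounting to inserting a mask. The `mask_ratio` of 0.3, the `<mask>` token string, and the greedy sampling loop are assumptions made for the example.

```python
import numpy as np

def text_infilling(tokens, mask_token="<mask>", mask_ratio=0.3, lam=3.0, seed=0):
    """Sketch of BART-style text infilling.

    Repeatedly samples a span length from Poisson(lam) and a random start
    position, then replaces that span with a single mask token. A length-0
    span inserts a mask token without removing anything. Stops once roughly
    mask_ratio of the original tokens have been corrupted.
    """
    rng = np.random.default_rng(seed)
    out = list(tokens)
    budget = int(round(len(tokens) * mask_ratio))
    masked = 0
    while masked < budget and len(out) > 0:
        span_len = int(rng.poisson(lam))
        span_len = min(span_len, len(out))                    # keep span inside the sequence
        start = int(rng.integers(0, len(out) - span_len + 1)) # random span start
        out[start:start + span_len] = [mask_token]            # whole span -> one mask token
        masked += max(span_len, 1)                            # count progress even for 0-length spans
    return out

tokens = "the quick brown fox jumps over the lazy dog".split()
print(text_infilling(tokens))
```

During pre-training, the corrupted sequence is fed to the encoder and the decoder is trained to reconstruct the original, uncorrupted token sequence.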