GAUSSIAN ERROR LINEAR UNITS (GELUs) (2016)
October 21, 2023

The Gaussian Error Linear Unit (GELU), a neural network activation function, is \(x\Phi(x)\), where \(\Phi(x)\) is the standard Gaussian cumulative distribution function. GELUs combine properties of Dropout, Zoneout, and ReLUs. Zoneout is a method for regularizing RNNs that stochastically forces some hidden units to maintain their previous values; ReLUs introduce nonlinearity into neural networks; Dropout is a regularizer that stochastically zeroes activations. GELUs merge these functionalities by multiplying the neuron input \(x\) by \(m \sim \text{Bernoulli}(\Phi(x))\), where \(\Phi(x) = P(X \le x),\ X \sim \mathcal{N}(0, 1)\). The GELU is the expected value of this stochastic transformation of an input \(x\), which is \(\Phi(x)\cdot 1x + (1-\Phi(x))\cdot 0x = x\Phi(x)\).
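As a quick illustration (not from the original paper), here is a minimal Python sketch of the exact GELU, using the identity \(\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)\); the function name `gelu_exact` is just a placeholder:

```python
import math

def gelu_exact(x: float) -> float:
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF,
    # computed exactly via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Example: GELU is close to 0 for negative inputs and close to x for large positive inputs.
print(gelu_exact(-2.0), gelu_exact(0.0), gelu_exact(2.0))
```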
GELUs can be approximated with: $$ 0.5x(1+\tanh\left[\sqrt{2/\pi}(x+0.044715x^3)\right]) $$
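A small sketch of that tanh-based approximation, under the same placeholder naming convention as above:

```python
import math

def gelu_tanh_approx(x: float) -> float:
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# The approximation tracks the exact form closely, e.g. at x = 1.0:
print(gelu_tanh_approx(1.0))  # ~0.8412, versus ~0.8413 for the exact GELU
```

The approximation avoids evaluating the Gaussian CDF directly, which is why it is often used in practice when an erf primitive is slow or unavailable.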