Advanced Activation Functions

This library implements several advanced activation functions used in modern neural networks. These functions are particularly important in transformer architectures, where smooth and gated activations often outperform traditional choices such as ReLU.

Overview

The activation functions in this library are implemented with a focus on:

  • Numerical stability
  • Performance optimization
  • Memory efficiency
  • Clear mathematical foundations

Function Implementations

Swish

Swish is an activation function introduced by researchers at Google Brain that often outperforms ReLU. Its smoothness yields better-behaved gradients and improved information flow through deep networks.

swish(x) = x * sigmoid(β * x)

Where β is typically set to 1.0; with β = 1 the function coincides with the SiLU (sigmoid-weighted linear unit).

Key properties:

  • Non-monotonic function
  • Smooth gradient
  • Bounded below but unbounded above
  • Learns the degree of non-linearity needed when β is treated as a trainable parameter
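
For reference, here is a minimal NumPy sketch of the formula above. It is illustrative only and does not reflect this library's actual API; the function name, signature, and use of NumPy are assumptions.

```python
import numpy as np

def swish(x: np.ndarray, beta: float = 1.0) -> np.ndarray:
    # swish(x) = x * sigmoid(beta * x); beta defaults to 1.0 as noted above
    return x * (1.0 / (1.0 + np.exp(-beta * x)))
```

Note that for large negative inputs the naive np.exp call can overflow and emit warnings; a more careful formulation of the sigmoid is sketched under Implementation Details below.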

GLU (Gated Linear Unit)

GLU was introduced in "Language Modeling with Gated Convolutional Networks" and is used in many transformer architectures. It provides a multiplicative interaction between the input and a gating mechanism.

GLU(x, W, V, b, c) = (x·W + b) ⊗ σ(x·V + c)

Where:

  • ⊗ is element-wise multiplication
  • σ is the sigmoid function
  • W, V are weight matrices
  • b, c are bias vectors

Key properties:

  • Provides controlled information flow
  • Helps mitigate vanishing gradient problem
  • Effective for sequence modeling tasks
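
To make the shapes concrete, a minimal NumPy sketch of the GLU formula might look as follows, assuming x has shape (batch, d_in), W and V have shape (d_in, d_out), and b and c have shape (d_out,). This is an illustration under those assumptions, not the library's actual implementation.

```python
import numpy as np

def glu(x: np.ndarray, W: np.ndarray, V: np.ndarray,
        b: np.ndarray, c: np.ndarray) -> np.ndarray:
    # GLU(x, W, V, b, c) = (x·W + b) ⊗ sigmoid(x·V + c)
    value = x @ W + b                          # linear "value" path
    gate = 1.0 / (1.0 + np.exp(-(x @ V + c)))  # sigmoid "gate" path in (0, 1)
    return value * gate                        # element-wise product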

SwiGLU (Swish Gated Linear Unit)

SwiGLU is a variant of GLU, introduced in "GLU Variants Improve Transformer", that uses the Swish activation function in place of the sigmoid gate. This combination has shown improved performance in transformer feed-forward layers.

SwiGLU(x, W, V, b, c) = (x·W + b) ⊗ swish(x·V + c)

Key properties:

  • Combines benefits of Swish and gating mechanisms
  • Better gradient flow than standard GLU
  • Particularly effective in deep transformer networks
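
Continuing the same illustrative setup as the GLU sketch above (same hypothetical shapes for x, W, V, b, and c), SwiGLU only changes the gate's activation from sigmoid to Swish:

```python
import numpy as np

def swiglu(x, W, V, b, c, beta=1.0):
    # SwiGLU(x, W, V, b, c) = (x·W + b) ⊗ swish(x·V + c)
    gate_pre = x @ V + c
    gate = gate_pre * (1.0 / (1.0 + np.exp(-beta * gate_pre)))  # Swish-activated gate
    return (x @ W + b) * gate
```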

Implementation Details

In our implementation, these activation functions are:

  1. Vectorized for performance
  2. Numerically stable through careful handling of edge cases
  3. Memory efficient with in-place operations where possible
  4. Thoroughly tested with comprehensive unit tests
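
As an example of the kind of edge-case handling point 2 refers to: the sigmoid used by Swish and GLU can overflow when exp is evaluated on large positive arguments (which happens for large negative inputs in the naive 1 / (1 + exp(-x)) form). One common remedy, shown here as a sketch rather than this library's actual code, is to split on the sign of the input so exp only ever sees non-positive values.

```python
import numpy as np

def stable_sigmoid(x: np.ndarray) -> np.ndarray:
    # Evaluate exp only on non-positive arguments to avoid overflow.
    out = np.empty_like(x, dtype=np.float64)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))   # exp(-x) is safe when x >= 0
    exp_x = np.exp(x[~pos])                    # exp(x) is safe when x < 0
    out[~pos] = exp_x / (1.0 + exp_x)
    return out
```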