https://arxiv.org/pdf/2006.04768.pdf
- The main efficiency bottleneck in Transformer models is the self-attention mechanism. Here, each token’s representation is updated by attending to all other tokens in the previous layer
- this is O(n^2) in the sequence length n, for both time and memory
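
A minimal sketch of standard single-head self-attention in PyTorch (the dimensions `n`, `d` and the random tensors are illustrative placeholders, not values from the paper), showing where the n × n matrix and the O(n^2) cost come from:

```python
import torch
import torch.nn.functional as F

n, d = 1024, 64  # sequence length, head dimension (illustrative values)
Q = torch.randn(n, d)
K = torch.randn(n, d)
V = torch.randn(n, d)

# The score matrix is n x n: every token attends to every other token,
# so both time and memory scale as O(n^2) in the sequence length.
scores = Q @ K.T / d**0.5          # (n, n)
P = F.softmax(scores, dim=-1)      # (n, n) attention matrix
out = P @ V                        # (n, d)
print(P.shape)                     # torch.Size([1024, 1024])
```
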
https://www.youtube.com/watch?v=-_2AF9Lhweo
- if the attention matrix is (approximately) low rank, we can approximate it with smaller projections and speed attention up from O(n^2) to roughly O(n·k) (see the projection sketch after this list)
- how to tell if a matrix is low rank? by looking at its spectrum (eigenvalues / singular values)
- a low-rank matrix has only a few large singular values, because only a few directions carry most of the information in the matrix
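
A quick sketch of that check (random `Q`, `K` here are just stand-ins for real activations): compute the singular values of the softmax attention matrix and see how few of them carry most of the spectrum.

```python
import torch
import torch.nn.functional as F

n, d = 1024, 64
Q, K = torch.randn(n, d), torch.randn(n, d)
P = F.softmax(Q @ K.T / d**0.5, dim=-1)   # n x n attention matrix

# Singular values of P, sorted in descending order; a (near) low-rank
# matrix concentrates most of its "energy" in the first few of them.
s = torch.linalg.svdvals(P)
energy = torch.cumsum(s**2, dim=0) / torch.sum(s**2)
k = int(torch.searchsorted(energy, torch.tensor(0.90))) + 1
print(f"{k} of {n} singular values capture 90% of the spectrum")
```

A sketch of the Linformer-style speedup (here `E` and `F_proj` are random stand-ins for the learned projection matrices from the paper, and `k = 256` is an arbitrary choice): keys and values are projected along the sequence dimension from n down to k, so the attention matrix shrinks to n × k and the cost becomes O(n·k).

```python
import torch
import torch.nn.functional as F

n, d, k = 1024, 64, 256  # k << n is the projection dimension (illustrative)
Q, K, V = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)

# Project keys and values along the sequence dimension from n to k.
# E and F_proj stand in for the learned projections in the paper.
E = torch.randn(k, n) / n**0.5
F_proj = torch.randn(k, n) / n**0.5

scores = Q @ (E @ K).T / d**0.5     # (n, k) instead of (n, n)
P = F.softmax(scores, dim=-1)
out = P @ (F_proj @ V)              # (n, d), same output shape as before
print(scores.shape)                 # torch.Size([1024, 256])
```
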