- https://paperswithcode.com/method/alibi
- ALiBi (Attention with Linear Biases) lets transformers consume, at inference time, sequences longer than the ones they were trained on.
- instead of positional embeddings, it just adds a static, non-learned bias to the query-key attention scores: a penalty that grows linearly with the distance between query and key, scaled by a per-head slope (see the sketch after this list).
- this method was motivated by the simple intuition that nearby words matter much more than faraway ones.
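
A minimal PyTorch sketch of that bias, assuming a causal decoder and a head count that is a power of two (the slope recipe from the paper); the names `alibi_slopes` and `alibi_bias` are illustrative, not from any library:

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Per-head slopes form a geometric sequence: 2^(-8/n), 2^(-16/n), ...
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Static, non-learned bias of shape (n_heads, seq, seq).
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]  # j - i: 0 on the diagonal, more negative further left
    # Entries above the diagonal come out positive, but a causal mask hides them anyway.
    return alibi_slopes(n_heads)[:, None, None] * distance[None, :, :]

# Usage: add the bias to the raw scores before the causal mask and softmax.
q = k = torch.randn(1, 4, 6, 16)              # (batch, heads, seq, head_dim)
scores = q @ k.transpose(-2, -1) / 16 ** 0.5  # (batch, heads, seq, seq)
scores = scores + alibi_bias(n_heads=4, seq_len=6)
```

Because the bias is a fixed function of distance rather than a learned table indexed by absolute position, it is defined for any `seq_len`, which is what enables the length extrapolation in the first bullet.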