GitHub - egmaminta/attnhut: A collection of Transformer Attention mechanisms in PyTorch, all in one place.
Initially, I developed this so I could easily switch between different attention mechanisms for my Small Language Model (SLM) experiments and benchmarking. However, I realized that these implementations can also be applied in computer vision (for example, to modernize vision encoders), RL, and other areas. I hope this helps researchers, students, and educators.

I also included MiniMax M3's sparse attention. This can be integrated with Andrej Karpathy's autoresearch framework.

For contributing: please open a PR. I would like to see and learn from implementations of other attention mechanisms I haven't covered in this repo. Thank you!

GitHub Link:

Submitted by /u/AnyIce3007