Repo for implementations of various Transformer Attention mechanisms [P] (github.com)
Initially, I developed this so I could easily switch between different attention mechanisms for my Small Language Model (SLM) experiments and benchmarking. However, I also realized that these implementations can apply to computer vision (for example, modernizing vision encoders), RL, and other areas. I hope this helps researchers, students, and educators. I also included MiniMax M3's sparse attention. This can be integrated with Andrej Karpathy's autoresearch framework. To contribute, please open a PR; I would like to see and learn from implementations of attention mechanisms I haven't covered in this repo. Thank you! GitHub Link: submitted by /u/AnyIce3007 [link] [comments]
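For readers wondering what "easily switch between attention mechanisms" can look like in practice, here is a minimal sketch of a name-based registry. It is not taken from the repo: the names `ATTENTION`, `register`, `attention`, `full_attention`, and `sliding_window_attention` are hypothetical, and the code uses plain Python lists instead of tensors so that it runs without any dependencies. The sliding-window variant is a simple causal local window and is not meant to represent any specific model's sparse attention.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def _attend(q, keys, values):
    # Scaled dot-product attention for a single query vector.
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Hypothetical registry: maps a mechanism name to its implementation.
ATTENTION = {}

def register(name):
    def wrap(fn):
        ATTENTION[name] = fn
        return fn
    return wrap

@register("full")
def full_attention(Q, K, V):
    # Every query attends to every key (no causal mask).
    return [_attend(q, K, V) for q in Q]

@register("sliding_window")
def sliding_window_attention(Q, K, V, window=2):
    # Query i attends only to positions i-window+1 .. i (causal, local).
    # Assumes Q, K, and V have the same sequence length.
    out = []
    for i, q in enumerate(Q):
        lo = max(0, i - window + 1)
        out.append(_attend(q, K[lo:i + 1], V[lo:i + 1]))
    return out

def attention(name, Q, K, V, **kwargs):
    # Single entry point: swap mechanisms by changing the name string.
    return ATTENTION[name](Q, K, V, **kwargs)
```

With this layout, a benchmark loop can iterate over `ATTENTION` and call `attention(name, Q, K, V)` for each entry, so adding a new mechanism only requires one decorated function.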
How Do You Handle Ablation Studies When the Original Model Is Already Trained? [R] (reddit.com)
I'm running into an issue with an ablation study for a paper I'm preparing. I trained a model that achieved my best result, and I saved the trained checkpoint (.pth file). Now my supervisor wants me to perform an ablation study by removing components and measuring how that affects accuracy. My concern is that if I retrain from scratch, the accuracies will not exactly match the original run because of randomness, different seeds, and so on. Is there any way I can do the ablation study without retraining? I'd appreciate hearing how others have handled this situation in publications or thesis work. Please help me out. submitted by /u/Plane_Stick8394 [link] [comments]
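One common way to deal with the seed concern raised above is to run every variant, including the full model, over the same fixed set of seeds and report mean and standard deviation rather than a single number. The sketch below only illustrates that bookkeeping, and it does involve retraining each variant. `run_ablation`, `toy_train_and_eval`, and the variant names are hypothetical, and the toy function returns synthetic placeholder scores, not real results; a real version would seed the training framework, train the variant, and return its test accuracy.

```python
import random
import statistics

def run_ablation(train_and_eval, variants, seeds):
    # Run each variant once per seed; return {variant: (mean, stdev)}.
    results = {}
    for variant in variants:
        scores = [train_and_eval(variant, seed) for seed in seeds]
        results[variant] = (statistics.mean(scores), statistics.stdev(scores))
    return results

def toy_train_and_eval(variant, seed):
    # Hypothetical stand-in for a real training run.
    # The base values are synthetic placeholders, NOT measured accuracies.
    rng = random.Random(seed)
    base = {"full_model": 0.90, "no_component_a": 0.87}[variant]
    return base + rng.gauss(0.0, 0.005)

if __name__ == "__main__":
    summary = run_ablation(
        toy_train_and_eval,
        variants=["full_model", "no_component_a"],
        seeds=[0, 1, 2, 3, 4],
    )
    for name, (mean, std) in summary.items():
        print(f"{name}: {mean:.4f} +/- {std:.4f}")
```

Because every variant sees the same seeds, the comparison between rows is like-for-like even though no individual run reproduces the original checkpoint exactly.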