InFeeo

Making FlashAttention-4 faster for inference (twitter.com)

Charles 🎉 Frye (@charles_irl) on X: memer of technical staff at @modal. he/him. ex @full_stack_dl, @weights_biases (acq. @CoreWeave), phd Berkeley @Redwood_Neuro. try https://t.co/SYWVMCb7OB · X (formerly Twitter) · twitter.com
What part of "dtype = 'fp8', num_splits = 0, pack_gqa = True, q_stage = 1, page_size = 1" do you not understand?
