InFeeo
Show HN: Best local LLM setup found for a 5090 (llama.cpp fork + TurboQuant) (local-llm.utop.workers.dev)

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU
A complete technical report on extreme local LLM inference using llama.cpp, TurboQuant, and YaRN scaling on a 32GB RTX 5090.
local-llm.utop.workers.dev

Comments

No comments yet.