Paper on LLM inference efficiency accepted at ICML’25!