Alibaba Cloud says it cut Nvidia AI GPU use by 82% (www.tomshardware.com)
2 points by margarita 4 months ago | 1 comment
A paper presented at SOSP 2025 details how token-level scheduling helped one GPU serve multiple LLMs, reducing demand from 1,192 to 213 H20s.
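As a quick sanity check, the 82% in the headline can be recomputed from the two GPU counts quoted above. This is only a minimal sketch of that arithmetic; the function name `reduction_percent` is made up for illustration and does not come from the paper or the article.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage drop going from `before` units to `after` units."""
    return (before - after) / before * 100


# H20 GPU counts as quoted in the summary: 1,192 before, 213 after.
saved = 1192 - 213
print(saved)                                   # 979 GPUs no longer needed
print(round(reduction_percent(1192, 213), 1))  # 82.1
```

So the headline figure is the rounded share of GPUs no longer needed (979 of 1,192), not a claim about throughput or cost.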
margarita 4 months ago: The actual paper presented at SOSP: https://ennanzhai.github.io/pub/sosp25-aega...