Alibaba Cloud says it cut Nvidia AI GPU use by 82% (www.tomshardware.com)
2 points by margarita 1 month ago | 1 comment
A paper presented at SOSP 2025 details how token-level scheduling let a single GPU serve multiple LLMs, cutting the number of Nvidia H20 GPUs needed from 1,192 to 213.
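As a rough check on the headline, going from 1,192 to 213 H20s is a reduction of (1,192 - 213) / 1,192 ≈ 82.1%. The idea behind token-level scheduling can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's system: the `Request` type, the model names, the token counts, and the plain round-robin policy are all hypothetical, and a real multi-model server would also have to manage things this toy ignores, such as model weights and KV-cache memory. It contrasts a request-level baseline, where each request holds the GPU until it finishes, with switching between models after every generated token.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    model: str        # hypothetical name of the LLM this request targets
    tokens_left: int  # output tokens still to generate


def request_level_schedule(requests: list[Request]) -> list[str]:
    """Baseline: each request keeps the GPU until all its tokens are done."""
    timeline: list[str] = []
    for req in requests:
        timeline.extend([req.model] * req.tokens_left)
    return timeline


def token_level_schedule(requests: list[Request]) -> list[str]:
    """Toy round-robin at token granularity on one simulated GPU.

    Each step generates one token for the request at the head of the queue,
    then moves that request to the back if it still has tokens left, so
    requests for different models interleave instead of running to completion.
    The input list is not modified.
    """
    queue = deque((req.model, req.tokens_left) for req in requests if req.tokens_left > 0)
    timeline: list[str] = []
    while queue:
        model, left = queue.popleft()
        timeline.append(model)  # one decode step for this model
        if left > 1:
            queue.append((model, left - 1))  # unfinished: back of the line
    return timeline


def completion_steps(timeline: list[str]) -> dict[str, int]:
    """1-based step at which each model's last token is produced
    (assumes at most one request per model)."""
    done: dict[str, int] = {}
    for step, model in enumerate(timeline, start=1):
        done[model] = step
    return done


def gpu_reduction(before: int, after: int) -> float:
    """Fractional reduction in GPU count, e.g. 0.82 means 82% fewer GPUs."""
    return 1 - after / before


if __name__ == "__main__":
    reqs = [Request("model-a", 4), Request("model-b", 1), Request("model-c", 2)]
    print(completion_steps(request_level_schedule(reqs)))  # {'model-a': 4, 'model-b': 5, 'model-c': 7}
    print(completion_steps(token_level_schedule(reqs)))    # {'model-a': 7, 'model-b': 2, 'model-c': 5}
    print(round(gpu_reduction(1192, 213), 3))              # 0.821
```

In this made-up example, the one-token request for `model-b` finishes at step 2 under token-level switching instead of waiting until step 5 behind `model-a`, which hints at why fine-grained switching can let many lightly loaded models share one GPU.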
margarita, 1 month ago: The actual paper presented at SOSP: https://ennanzhai.github.io/pub/sosp25-aega...