Alibaba Cloud introduced Aegaeon, a new GPU pooling system built with Peking University to transform AI workload management. The system scales models dynamically at the token level, running up to seven models on a single GPU. In testing, the team cut GPU usage from 1,192 to 213—an 82% drop. By tackling idle capacity from rarely used models, Aegaeon aims to make large-scale AI deployment far more efficient.
