Ajou University repository

OASIS: Outlier-Aware KV Cache Clustering for Scaling LLM Inference in CXL Memory Systems
  • Seo, Minseok
  • Hyun, Jungi
  • Jeong, Seongho
  • Nguyen, Xuan Truong
  • Lee, Hyuk Jae
  • Lee, Hyokeun
Publication Year
2025-01-01
Journal
IEEE Computer Architecture Letters
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
IEEE Computer Architecture Letters, Vol.24 No.1, pp.165-168
Keyword
Clustering; CXL; KV cache; LLM inference
Mesh Keyword
Clusterings; Compute-express link; Key values; Key-value cache; Language model; Large language model inference; Memory capacity; Memory systems; Model inference; Scalings
All Science Classification Codes (ASJC)
Hardware and Architecture
Abstract
The key-value (KV) cache in large language models (LLMs) now requires substantial memory capacity because its size grows in proportion to the context length. Compute Express Link (CXL) memory has recently become a promising way to secure that capacity. However, using CXL memory in a GPU-based LLM inference platform raises performance and scalability challenges because CXL memory bandwidth is limited. This paper proposes OASIS, an outlier-aware KV cache clustering method for scaling LLM inference in CXL memory systems. The method builds on the observation that, when it accounts for outliers, clustering trades off performance against accuracy more effectively than previous quantization- or selection-based approaches. In our evaluation, OASIS yields a 3.6× speedup over the case without clustering while preserving accuracy with just 5% of the full KV cache.
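To make the abstract's scaling claim concrete, the following is a minimal sketch of how KV cache size grows linearly with context length in a standard transformer decoder. It is not taken from the paper: the function name `kv_cache_bytes` and the model configuration (32 layers, 32 KV heads, head dimension 128, 16-bit elements) are hypothetical values chosen only for illustration, and the 5% figure is applied here simply as a fraction of the full cache size.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes needed to hold keys and values for every layer.

    The leading factor 2 accounts for storing both K and V.
    bytes_per_elem=2 assumes 16-bit (FP16/BF16) elements.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_len * batch_size * bytes_per_elem)


# Hypothetical model configuration, for illustration only.
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128)

full_4k = kv_cache_bytes(context_len=4096, **cfg)
full_8k = kv_cache_bytes(context_len=8192, **cfg)

# Doubling the context length doubles the cache size.
growth = full_8k / full_4k

# Size of a cache that keeps 5% of the full KV cache (integer bytes).
retained_5pct = full_8k // 20

print(f"4K context: {full_4k / 2**30:.2f} GiB")
print(f"8K context: {full_8k / 2**30:.2f} GiB (x{growth:.1f})")
print(f"5% of 8K cache: {retained_5pct / 2**20:.1f} MiB")
```

Under these assumed parameters the full cache for one sequence already reaches gigabytes, which is why both added capacity and a smaller cache footprint matter for long contexts.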
ISSN
1556-6064
Language
eng
URI
https://aurora.ajou.ac.kr/handle/2018.oak/38323
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105004596638&origin=inward
DOI
https://doi.org/10.1109/lca.2025.3567844
Journal URL
http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=10208
Type
Article
Funding
This work was supported in part by the Korea Collaborative & High-tech Initiative for Prospective Semiconductor Research (K-CHIPS), funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under Grant RS-2025-02305531, and in part by the Ministry of Science and ICT (MSIT) through the Information Technology Research Center (ITRC) support program under Grant IITP-2025-2020-0-01461, supervised by the IITP.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Lee, Hyokeun 이효근
Department of Electrical and Computer Engineering