Citation Export
| DC Field | Value | Language |
| --- | --- | --- |
dc.contributor.author | Hamandawana, Prince | - |
dc.contributor.author | Khan, Awais | - |
dc.contributor.author | Kim, Jongik | - |
dc.contributor.author | Chung, Tae Sun | - |
dc.date.issued | 2022-12-01 | - |
dc.identifier.issn | 2332-7790 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/dev/handle/2018.oak/32215 | - |
dc.description.abstract | Large-scale machine learning (ML) and deep learning (DL) platforms face challenges when integrated with deduplication-enabled storage clusters. Although deduplication achieves smart and efficient storage utilization, removing duplicate data introduces bottlenecks because it alters the I/O transaction layout of the storage system. Addressing this deduplication overhead is therefore critical for accelerating ML/DL computation on deduplication storage. Existing state-of-the-art ML/DL storage solutions such as Alluxio and AutoCache adopt caching mechanisms that are not deduplication-aware and thus lack the performance boost needed in deduplication-enabled ML/DL clusters. In this paper, we introduce Redup, which eliminates the performance drop caused by enabling deduplication in ML/DL storage clusters. At its core is the Redup Caching Manager (RDCM), a two-tier deduplication layout-aware caching mechanism. The RDCM abstracts the underlying deduplication storage layout from ML/DL applications and provides decoupled acceleration of object reconstruction during ML/DL read operations. Our evaluation shows that Redup incurs a negligible drop in ML/DL training performance compared with a cluster without deduplication, while significantly outperforming Alluxio and AutoCache across various performance metrics. | - |
dc.language.iso | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.subject.mesh | Caching mechanism | - |
dc.subject.mesh | Deduplication | - |
dc.subject.mesh | Deep learning | - |
dc.subject.mesh | Large-scale machine learning | - |
dc.subject.mesh | Learning platform | - |
dc.subject.mesh | Machine-learning | - |
dc.subject.mesh | Performance | - |
dc.subject.mesh | State of the art | - |
dc.subject.mesh | Storage systems | - |
dc.subject.mesh | Storage utilization | - |
dc.title | Accelerating ML/DL Applications With Hierarchical Caching on Deduplication Storage Clusters | - |
dc.type | Article | - |
dc.citation.endPage | 1636 | - |
dc.citation.startPage | 1622 | - |
dc.citation.title | IEEE Transactions on Big Data | - |
dc.citation.volume | 8 | - |
dc.identifier.bibliographicCitation | IEEE Transactions on Big Data, Vol.8, pp.1622-1636 | - |
dc.identifier.doi | 10.1109/tbdata.2021.3106345 | - |
dc.identifier.scopusid | 2-s2.0-85113281987 | - |
dc.identifier.url | https://www.ieee.org/membership-catalog/productdetail/showProductDetailPage.html?product=PER472-ELE | - |
dc.subject.keyword | big data | - |
dc.subject.keyword | deduplication | - |
dc.subject.keyword | deep learning | - |
dc.subject.keyword | Machine learning | - |
dc.subject.keyword | storage management | - |
dc.description.isoa | true | - |
dc.subject.subarea | Information Systems | - |
dc.subject.subarea | Information Systems and Management | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.