Ajou University repository

An Optimized Storage Architecture for Improving ML Platforms Provisioned with Underlying Deduplication Enabled Storage Clusters
  • HAMANDAWANA PRINCE

Advisor
Tae-Sun Chung
Affiliation
The Graduate School, Ajou University
Department
Department of Artificial Intelligence, The Graduate School
Publication Year
2021-02
Publisher
The Graduate School, Ajou University
Keyword
Machine learning-based storage architectures
Description
Doctoral thesis -- Department of Artificial Intelligence, The Graduate School, Ajou University, February 2021
Alternative Abstract
The advancement and ubiquity of machine learning (ML) is unarguably the new wave driving modern-day and future enterprise computing platforms. However, the incessant deluge of ML-associated data, collected from millions of data sources, presents data storage challenges: continuously scaling storage to meet ML demands leads to escalating storage costs. ML/DL workloads nevertheless contain a large amount of duplicate data which, if eliminated, significantly amortizes those costs. The adoption of deduplication-provisioned storage has so far been a cost-cutting driver in today's enterprise clusters. However, large-scale ML platforms face challenges when integrated with deduplication-enabled storage clusters: in the quest for smart and efficient storage utilization, the removal of duplicate data introduces bottlenecks, since deduplication alters the I/O transaction layout of the storage system. It is therefore critical to address this deduplication overhead in order to accelerate ML/DL computation on deduplication storage. Existing state-of-the-art ML/DL storage solutions such as Alluxio and Auto-Cache adopt non-deduplication-aware caching mechanisms, which lack the performance boost needed in deduplication-enabled ML/DL clusters.
In this paper, we introduce REDUP, which eliminates the performance drop caused by enabling deduplication in ML/DL storage clusters. At its core is the REDUP Caching Manager (RDCM), a 2-tier deduplication-layout-aware caching mechanism. The RDCM abstracts the underlying deduplication storage layout away from ML/DL applications and provides decoupled acceleration of object reconstruction during ML/DL read operations. Our evaluation shows that REDUP incurs a negligible drop in ML/DL training performance compared to a baseline cluster without deduplication. Against other state-of-the-art solutions, REDUP outperforms Alluxio and Auto-Cache in training speed by 16% in the worst case.
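To illustrate the idea behind a 2-tier deduplication-layout-aware cache as described above, the following is a minimal sketch, not the thesis's actual RDCM implementation. All class and method names (`ChunkStore`, `DedupAwareCache`, the chunk size, and the LRU policy) are illustrative assumptions: tier 1 caches fully reconstructed objects, while tier 2 caches individual deduplicated chunks, so a tier-1 miss can often be served by reassembling cached chunks instead of reaching the backing store.

```python
import hashlib
from collections import OrderedDict

class ChunkStore:
    """Deduplicated backing store: chunks are addressed by content hash,
    and each object is stored as a recipe (list of chunk fingerprints)."""
    def __init__(self):
        self.chunks = {}    # fingerprint -> chunk bytes (each stored once)
        self.recipes = {}   # object id -> list of fingerprints

    def put(self, obj_id, data, chunk_size=4096):
        fps = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            fp = hashlib.sha1(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)   # duplicate chunks deduplicated
            fps.append(fp)
        self.recipes[obj_id] = fps

class DedupAwareCache:
    """Sketch of a 2-tier layout-aware cache: tier 1 holds reconstructed
    objects, tier 2 holds raw chunks keyed by fingerprint."""
    def __init__(self, store, obj_slots=8, chunk_slots=64):
        self.store = store
        self.obj_cache = OrderedDict()     # tier 1 (LRU)
        self.chunk_cache = OrderedDict()   # tier 2 (LRU)
        self.obj_slots, self.chunk_slots = obj_slots, chunk_slots

    def _chunk(self, fp):
        if fp in self.chunk_cache:                # tier-2 hit
            self.chunk_cache.move_to_end(fp)
            return self.chunk_cache[fp]
        data = self.store.chunks[fp]              # fall back to dedup store
        self.chunk_cache[fp] = data
        if len(self.chunk_cache) > self.chunk_slots:
            self.chunk_cache.popitem(last=False)  # evict LRU chunk
        return data

    def read(self, obj_id):
        if obj_id in self.obj_cache:              # tier-1 hit: no reconstruction
            self.obj_cache.move_to_end(obj_id)
            return self.obj_cache[obj_id]
        # Tier-1 miss: reconstruct the object from (possibly cached) chunks.
        data = b"".join(self._chunk(fp) for fp in self.store.recipes[obj_id])
        self.obj_cache[obj_id] = data
        if len(self.obj_cache) > self.obj_slots:
            self.obj_cache.popitem(last=False)    # evict LRU object
        return data
```

In this toy model, two objects that share chunks occupy the store only once per unique chunk, and the chunk tier lets reads of one object warm the cache for any other object sharing its chunks, which is the kind of layout awareness a non-deduplication-aware cache cannot exploit.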
Language
eng
URI
https://dspace.ajou.ac.kr/handle/2018.oak/20268
Fulltext

Type
Thesis


File Download

  • There are no files associated with this item.