Ajou University repository

A Content Fingerprint-Based Cluster-Wide Inline Deduplication for Shared-Nothing Storage Systems
Citations

SCOPUS

13

Publication Year
2020-01-01
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
IEEE Access, Vol.8, pp.209163-209180
Keyword
data deduplication; Parallel and distributed storage systems; shared-nothing architecture
Mesh Keyword
Data de duplications; Design and implements; Design constraints; Design specification; Distributed storage system; Read performance; Read/write performance; Storage systems
All Science Classification Codes (ASJC)
Computer Science (all); Materials Science (all); Engineering (all)
Abstract
Deduplication has been principally employed in distributed storage systems to improve storage space efficiency. Traditional deduplication research ignores the design specifications of shared-nothing distributed storage systems, such as no central metadata bottleneck, scalability, and storage rebalancing. Likewise, inline deduplication integration poses serious threats to storage system read/write performance, consistency, and scalability. This is mainly due to ineffective and error-prone deduplication metadata, duplicate lookup I/O redirection, and the placement of content fingerprints and data chunks. Further, transaction failures after deduplication integration often render inconsistencies in data chunks, deduplication metadata, and garbage data chunks. In this paper, we propose Grate, a high-performance inline cluster-wide data deduplication scheme that complies with the design constraints of shared-nothing storage systems. In particular, Grate eliminates duplicate copies across the cluster for high storage space efficiency without jeopardizing performance. We employ a distributed deduplication metadata shard, which promises high-performance deduplication metadata and duplicate fingerprint lookup I/Os without introducing a single point of failure. The placement of data and deduplication metadata is made cluster-wide based on the content fingerprint of chunks. We decouple the deduplication metadata shard from the read I/O path and replace it with a read manifestation object to further speed up read performance. To guarantee deduplication-enabled transaction consistency and efficient garbage identification, we design a flag-based asynchronous consistency scheme, capable of repairing the missing data chunks on duplicate arrival. We design and implement Grate in Ceph. The evaluation shows an average bandwidth improvement of 18% over the content addressable deduplication approach at smaller chunk sizes, i.e., less than 128 KB, while maintaining high storage space savings.
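The placement idea in the abstract (chunks and their deduplication metadata are located cluster-wide by content fingerprint, and reads follow a per-object manifest instead of consulting the metadata shard) can be illustrated with a minimal sketch. This is not Grate's implementation and does not use any Ceph API. The names `Node`, `place`, `write_object`, and `read_object`, the fixed-size chunking, the SHA-256 fingerprint, and the modulo placement rule are all assumptions made for illustration only.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny for the example (assumption)


class Node:
    """Hypothetical storage node holding chunks and its own metadata shard."""

    def __init__(self):
        self.chunks = {}    # fingerprint -> chunk bytes
        self.refcount = {}  # fingerprint -> number of references (metadata shard)


def fingerprint(chunk: bytes) -> str:
    # Content fingerprint of a chunk; SHA-256 is an assumed choice.
    return hashlib.sha256(chunk).hexdigest()


def place(fp: str, num_nodes: int) -> int:
    # Assumed placement rule: the fingerprint alone decides the node,
    # so identical chunks from any client land on the same node.
    return int(fp, 16) % num_nodes


def write_object(nodes, data: bytes):
    """Split data into chunks, deduplicate inline, return a read manifest."""
    manifest = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = fingerprint(chunk)
        node = nodes[place(fp, len(nodes))]
        if fp in node.chunks:
            node.refcount[fp] += 1   # duplicate: only the metadata changes
        else:
            node.chunks[fp] = chunk  # first copy: store the chunk once
            node.refcount[fp] = 1
        manifest.append(fp)
    return manifest


def read_object(nodes, manifest):
    # Reads use only the manifest plus the placement rule;
    # the reference counts are never consulted on this path.
    return b"".join(nodes[place(fp, len(nodes))].chunks[fp] for fp in manifest)


nodes = [Node() for _ in range(3)]
data = b"abcdabcdwxyz"
manifest = write_object(nodes, data)
restored = read_object(nodes, manifest)
stored_chunks = sum(len(n.chunks) for n in nodes)
```

In this sketch the object has three chunks but only two are stored, because the repeated chunk `abcd` maps to the same fingerprint and therefore to the same node. A production system would replace the modulo rule with the storage system's own placement function so that rebalancing remains possible.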
ISSN
2169-3536
Language
eng
URI
https://dspace.ajou.ac.kr/dev/handle/2018.oak/31705
DOI
https://doi.org/10.1109/access.2020.3039056
Type
Article
Funding
This work was supported in part by the National Research Foundation of Korea (NRF), Korea government (MSIT), under Grant NRF-2018R1A1A1A05079398, and in part by the Institute of Information and Communications Technology Planning and Evaluation (IITP), Korea government (MSIT) (Development of low-latency storage module for I/O intensive edge data processing) under Grant 2020-0-00104.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

HAMANDAWANA, PRINCE
Department of Software and Computer Engineering