A similarity clustering-based deduplication strategy in cloud storage systems

Journal: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS

Citation: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, Vol.2020-December, pp.35-43

Keyword: Block fingerprint; Cloud storage system; Data partitioning; Deduplication; Similarity clustering

Mesh Keyword: Cloud computing technologies; Cloud data centers; Cloud storage systems; Data de duplications; Data partitioning; Data partitioning algorithms; Data preprocessing; Duplicate elimination

Abstract: Deduplication is a data redundancy elimination technique designed to save storage resources by reducing redundant data in cloud storage systems. With the development of cloud computing technology, deduplication has been increasingly applied in cloud data centers. However, traditional techniques struggle in big data deduplication to balance two conflicting goals: high deduplication throughput and a high duplicate elimination ratio. This paper proposes a similarity clustering-based deduplication strategy (named SCDS), which aims to delete more duplicate data without significantly increasing system overhead. The main idea of SCDS is to narrow the query range of the fingerprint index by means of data partitioning and similarity clustering algorithms. In the data preprocessing stage, SCDS uses a data partitioning algorithm to group similar data together. In the data deletion stage, a similarity clustering algorithm assigns similar fingerprint superblocks to the same cluster. Duplicate fingerprints are then detected within a single cluster, which speeds up the retrieval of duplicate fingerprints. Experiments show that SCDS achieves a better deduplication ratio than some existing similarity-based deduplication algorithms, while its overhead is only slightly higher than that of some methods with high throughput but a low deduplication ratio.
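
The idea of narrowing the fingerprint lookup to a single cluster can be illustrated with a minimal Python sketch. This is not the SCDS algorithm from the paper: the fixed-size blocks, the SHA-1 fingerprints, the use of the minimum fingerprint as a superblock's similarity feature, and every name below (`ClusteredIndex`, `split_blocks`, `superblocks`, `representative`) are assumptions chosen only for illustration.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    # Assumption: SHA-1 as the block fingerprint; the abstract does not name a hash.
    return hashlib.sha1(block).hexdigest()


def split_blocks(data: bytes, block_size: int) -> list:
    # Assumption: fixed-size blocks, used here only to keep the sketch short.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def superblocks(blocks: list, per_superblock: int) -> list:
    # Group consecutive blocks into superblocks.
    return [blocks[i:i + per_superblock] for i in range(0, len(blocks), per_superblock)]


def representative(fingerprints: list) -> str:
    # Assumption: the minimum fingerprint stands in as the similarity feature,
    # so superblocks that share many blocks tend to share a representative.
    return min(fingerprints)


class ClusteredIndex:
    """Hypothetical fingerprint index that searches only one cluster per superblock."""

    def __init__(self):
        self.clusters = {}  # representative fingerprint -> set of fingerprints

    def deduplicate(self, data: bytes, block_size: int = 4, per_superblock: int = 4):
        stored, duplicates = 0, 0
        for sb in superblocks(split_blocks(data, block_size), per_superblock):
            fps = [fingerprint(b) for b in sb]
            # Look up (or create) the one cluster this superblock maps to.
            cluster = self.clusters.setdefault(representative(fps), set())
            for fp in fps:
                if fp in cluster:
                    duplicates += 1  # duplicate found inside the cluster
                else:
                    cluster.add(fp)  # new block: record its fingerprint
                    stored += 1
        return stored, duplicates


index = ClusteredIndex()
first = index.deduplicate(b"abcdefghijklmnop")   # four distinct blocks, all new
second = index.deduplicate(b"abcdefghijklmnop")  # same data again, all duplicates
```

Because each superblock is checked against only one cluster, a duplicate block that lands in a different cluster goes undetected. That is the trade-off between throughput and deduplication ratio that the abstract describes.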

URI: https://aurora.ajou.ac.kr/handle/2018.oak/36579
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102337667&origin=inward
