Ajou University repository

A similarity clustering-based deduplication strategy in cloud storage systems
  • Long, Saiqin
  • Li, Zhetao
  • Liu, Zihao
  • Deng, Qingyong
  • Oh, Sangyoon
  • Komuro, Nobuyoshi
Citations (SCOPUS)
3
Publication Year
2020-12-01
Journal
Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Publisher
IEEE Computer Society
Citation
Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, Vol.2020-December, pp.35-43
Keyword
Block fingerprint; Cloud storage system; Data partitioning; Deduplication; Similarity clustering
Mesh Keyword
Cloud computing technologies; Cloud data centers; Cloud storage systems; Data de duplications; Data partitioning; Data partitioning algorithms; Data preprocessing; Duplicate elimination
All Science Classification Codes (ASJC)
Hardware and Architecture
Abstract
Deduplication is a data redundancy elimination technique designed to save storage resources by reducing redundant data in cloud storage systems. With the development of cloud computing technology, deduplication has been increasingly applied in cloud data centers. However, in big data deduplication, traditional technologies struggle to balance two conflicting goals: high deduplication throughput and a high duplicate elimination ratio. This paper proposes a similarity clustering-based deduplication strategy (named SCDS), which aims to delete more duplicate data without significantly increasing system overhead. The main idea of SCDS is to narrow the query range of the fingerprint index by means of data partitioning and similarity clustering algorithms. In the data preprocessing stage, SCDS uses a data partitioning algorithm to group similar data together. In the data deletion stage, the similarity clustering algorithm assigns similar data fingerprint superblocks to the same cluster. Duplicate fingerprints are then detected within a single cluster, which speeds up their retrieval. Experiments show that the deduplication ratio of SCDS is better than that of some existing similarity deduplication algorithms, while its overhead is only slightly higher than that of some methods with high throughput but a low deduplication ratio.
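The idea of narrowing the fingerprint query range can be illustrated with a minimal Python sketch. It is not the SCDS implementation: the abstract does not specify the partitioning or clustering algorithms, so everything below is an assumption made for illustration. Blocks are fixed-size, fingerprints are SHA-1 digests, a superblock is a run of consecutive fingerprints, and superblocks are clustered by their smallest fingerprint (a common similarity heuristic). The names CHUNK_SIZE, SUPERBLOCK_SIZE, ClusteredIndex, and deduplicate are hypothetical.

```python
import hashlib
from collections import defaultdict

# Illustrative parameters (assumptions, not values from the paper).
CHUNK_SIZE = 8        # bytes per data block
SUPERBLOCK_SIZE = 4   # block fingerprints per superblock


def fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size blocks and return one SHA-1 digest per block."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def superblocks(fps: list, size: int = SUPERBLOCK_SIZE) -> list:
    """Group consecutive block fingerprints into superblocks."""
    return [fps[i:i + size] for i in range(0, len(fps), size)]


class ClusteredIndex:
    """Fingerprint index partitioned into clusters.

    Assumption: a superblock's cluster is chosen by its minimum
    fingerprint, so superblocks that share that minimum land in the
    same cluster. Duplicate lookup only searches that one cluster.
    """

    def __init__(self) -> None:
        self.clusters = defaultdict(set)

    def add(self, superblock: list) -> int:
        """Store a superblock; return how many new fingerprints it added."""
        cluster = self.clusters[min(superblock)]
        new = {fp for fp in superblock if fp not in cluster}
        cluster.update(new)
        return len(new)


def deduplicate(stream: bytes, index: ClusteredIndex) -> tuple:
    """Return (total blocks seen, blocks that had to be stored)."""
    fps = fingerprints(stream)
    stored = sum(index.add(sb) for sb in superblocks(fps))
    return len(fps), stored


if __name__ == "__main__":
    index = ClusteredIndex()
    data = bytes(range(64))          # 8 distinct 8-byte blocks
    print(deduplicate(data, index))  # first write: every block is new
    print(deduplicate(data, index))  # identical rewrite: nothing new to store
```

Because each lookup touches only one cluster, the search is cheaper than a scan of a global index, but a duplicate block whose superblock falls into a different cluster goes undetected. This mirrors the throughput versus deduplication ratio trade-off that the abstract describes.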
ISSN
1521-9097
Language
eng
URI
https://aurora.ajou.ac.kr/handle/2018.oak/36579
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102337667&origin=inward
DOI
https://doi.org/10.1109/icpads51040.2020.00015
Type
Conference Paper

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher
Oh, Sangyoon (오상윤)
Department of Software and Computer Engineering