Ajou University repository

A similarity clustering-based deduplication strategy in cloud storage systems
  • Long, Saiqin ;
  • Li, Zhetao ;
  • Liu, Zihao ;
  • Deng, Qingyong ;
  • Oh, Sangyoon ;
  • Komuro, Nobuyoshi
Citations (SCOPUS): 3

DC Field | Value | Language
dc.contributor.author | Long, Saiqin | -
dc.contributor.author | Li, Zhetao | -
dc.contributor.author | Liu, Zihao | -
dc.contributor.author | Deng, Qingyong | -
dc.contributor.author | Oh, Sangyoon | -
dc.contributor.author | Komuro, Nobuyoshi | -
dc.date.issued | 2020-12-01 | -
dc.identifier.issn | 1521-9097 | -
dc.identifier.uri | https://aurora.ajou.ac.kr/handle/2018.oak/36579 | -
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102337667&origin=inward | -
dc.description.abstract | Deduplication is a data redundancy elimination technique designed to save storage resources by reducing redundant data in cloud storage systems. With the development of cloud computing technology, deduplication has been increasingly applied in cloud data centers. However, in big data deduplication, traditional techniques struggle to balance two conflicting goals: high deduplication throughput and a high duplicate elimination ratio. This paper proposes a similarity clustering-based deduplication strategy (named SCDS), which aims to delete more duplicate data without significantly increasing system overhead. The main idea of SCDS is to narrow the query range of the fingerprint index by means of data partitioning and similarity clustering algorithms. In the data preprocessing stage, SCDS uses a data partitioning algorithm to group similar data together. In the data deletion stage, a similarity clustering algorithm assigns the fingerprint superblocks of similar data to the same cluster. Duplicate fingerprints are then detected within the same cluster, which speeds up their retrieval. Experiments show that the deduplication ratio of SCDS is better than that of some existing similarity deduplication algorithms, while its overhead is only slightly higher than that of some methods with high throughput but a low deduplication ratio. | -
dc.language.iso | eng | -
dc.publisher | IEEE Computer Society | -
dc.subject.mesh | Cloud computing technologies | -
dc.subject.mesh | Cloud data centers | -
dc.subject.mesh | Cloud storage systems | -
dc.subject.mesh | Data de duplications | -
dc.subject.mesh | Data partitioning | -
dc.subject.mesh | Data partitioning algorithms | -
dc.subject.mesh | Data preprocessing | -
dc.subject.mesh | Duplicate elimination | -
dc.title | A similarity clustering-based deduplication strategy in cloud storage systems | -
dc.type | Conference | -
dc.citation.conferenceDate | 2020.12.2. ~ 2020.12.4. | -
dc.citation.conferenceName | 26th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2020 | -
dc.citation.edition | Proceedings - 2020 IEEE 26th International Conference on Parallel and Distributed Systems, ICPADS 2020 | -
dc.citation.endPage | 43 | -
dc.citation.startPage | 35 | -
dc.citation.title | Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS | -
dc.citation.volume | 2020-December | -
dc.identifier.bibliographicCitation | Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, Vol.2020-December, pp.35-43 | -
dc.identifier.doi | 10.1109/icpads51040.2020.00015 | -
dc.identifier.scopusid | 2-s2.0-85102337667 | -
dc.subject.keyword | Block fingerprint | -
dc.subject.keyword | Cloud storage system | -
dc.subject.keyword | Data partitioning | -
dc.subject.keyword | Deduplication | -
dc.subject.keyword | Similarity clustering | -
dc.type.other | Conference Paper | -
dc.description.isoa | false | -
dc.subject.subarea | Hardware and Architecture | -
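
The abstract's central idea is to narrow the query range of the fingerprint index by sending similar fingerprint superblocks to the same cluster. The following is a minimal Python sketch of that idea, and it is not the SCDS algorithm itself: the abstract does not specify the partitioning or clustering method, so the sketch substitutes a min-hash style cluster key (the smallest block fingerprint in a superblock). The block size, the superblock size, and the names `fingerprints`, `superblocks`, and `ClusteredIndex` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4             # bytes per block; tiny, for illustration only (assumed)
BLOCKS_PER_SUPERBLOCK = 4  # fingerprints grouped into one superblock (assumed)


def fingerprints(data):
    """Split data into fixed-size blocks and fingerprint each block with SHA-1."""
    return [
        hashlib.sha1(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def superblocks(fps):
    """Group consecutive block fingerprints into superblocks."""
    return [
        fps[i:i + BLOCKS_PER_SUPERBLOCK]
        for i in range(0, len(fps), BLOCKS_PER_SUPERBLOCK)
    ]


class ClusteredIndex:
    """Hypothetical fingerprint index partitioned into clusters.

    A superblock is routed to one cluster by its smallest fingerprint
    (a stand-in for the paper's similarity clustering), and duplicates
    are looked up only inside that cluster, not in a global index.
    """

    def __init__(self):
        self.clusters = defaultdict(set)

    def add_superblock(self, superblock):
        """Store a superblock and return how many of its blocks were duplicates."""
        cluster = self.clusters[min(superblock)]
        duplicates = 0
        for fp in superblock:
            if fp in cluster:
                duplicates += 1   # already stored: block can be deduplicated
            else:
                cluster.add(fp)   # new block: remember its fingerprint
        return duplicates


index = ClusteredIndex()
first = superblocks(fingerprints(b"AAAABBBBCCCCDDDD"))[0]
print(index.add_superblock(first))  # first write: nothing to deduplicate
print(index.add_superblock(first))  # identical rewrite: every block is a duplicate
```

Because each lookup touches only one cluster, the cost per superblock stays small as the total index grows. The price is that a similar superblock whose smallest fingerprint differs is routed to another cluster and its duplicates are missed, which is the throughput versus deduplication ratio trade-off the abstract describes.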

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Oh, Sangyoon (오상윤)
Department of Software and Computer Engineering

File Download

  • There are no files associated with this item.