DC Field | Value | Language |
---|---|---|
dc.contributor.author | Long, Saiqin | - |
dc.contributor.author | Li, Zhetao | - |
dc.contributor.author | Liu, Zihao | - |
dc.contributor.author | Deng, Qingyong | - |
dc.contributor.author | Oh, Sangyoon | - |
dc.contributor.author | Komuro, Nobuyoshi | - |
dc.date.issued | 2020-12-01 | - |
dc.identifier.issn | 1521-9097 | - |
dc.identifier.uri | https://aurora.ajou.ac.kr/handle/2018.oak/36579 | - |
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102337667&origin=inward | - |
dc.description.abstract | Deduplication is a data redundancy elimination technique designed to save storage resources by reducing redundant data in cloud storage systems. With the development of cloud computing technology, deduplication has been increasingly applied in cloud data centers. However, traditional techniques struggle in big data deduplication to balance two conflicting goals: high deduplication throughput and a high duplicate elimination ratio. This paper proposes a similarity clustering-based deduplication strategy (named SCDS), which aims to delete more duplicate data without significantly increasing system overhead. The main idea of SCDS is to narrow the query range of the fingerprint index by means of data partitioning and similarity clustering algorithms. In the data preprocessing stage, SCDS uses a data partitioning algorithm to group similar data together. In the data deletion stage, a similarity clustering algorithm assigns similar fingerprint superblocks to the same cluster. Duplicate fingerprints are then detected within each cluster, which speeds up the retrieval of duplicate fingerprints. Experiments show that SCDS achieves a higher deduplication ratio than some existing similarity-based deduplication algorithms, while its overhead is only slightly higher than that of some high-throughput, low-deduplication-ratio methods. | - |
dc.language.iso | eng | - |
dc.publisher | IEEE Computer Society | - |
dc.subject.mesh | Cloud computing technologies | - |
dc.subject.mesh | Cloud data centers | - |
dc.subject.mesh | Cloud storage systems | - |
dc.subject.mesh | Data de duplications | - |
dc.subject.mesh | Data partitioning | - |
dc.subject.mesh | Data partitioning algorithms | - |
dc.subject.mesh | Data preprocessing | - |
dc.subject.mesh | Duplicate elimination | - |
dc.title | A similarity clustering-based deduplication strategy in cloud storage systems | - |
dc.type | Conference | - |
dc.citation.conferenceDate | 2020.12.2. ~ 2020.12.4. | - |
dc.citation.conferenceName | 26th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2020 | - |
dc.citation.edition | Proceedings - 2020 IEEE 26th International Conference on Parallel and Distributed Systems, ICPADS 2020 | - |
dc.citation.endPage | 43 | - |
dc.citation.startPage | 35 | - |
dc.citation.title | Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS | - |
dc.citation.volume | 2020-December | - |
dc.identifier.bibliographicCitation | Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, Vol.2020-December, pp.35-43 | - |
dc.identifier.doi | 10.1109/icpads51040.2020.00015 | - |
dc.identifier.scopusid | 2-s2.0-85102337667 | - |
dc.subject.keyword | Block fingerprint | - |
dc.subject.keyword | Cloud storage system | - |
dc.subject.keyword | Data partitioning | - |
dc.subject.keyword | Deduplication | - |
dc.subject.keyword | Similarity clustering | - |
dc.type.other | Conference Paper | - |
dc.description.isoa | false | - |
dc.subject.subarea | Hardware and Architecture | - |