A similarity clustering-based deduplication strategy in cloud storage systems

Long, Saiqin; Li, Zhetao; Liu, Zihao; Deng, Qingyong; Oh, Sangyoon; Komuro, Nobuyoshi

DC Field	Value	Language
dc.contributor.author	Long, Saiqin	-
dc.contributor.author	Li, Zhetao	-
dc.contributor.author	Liu, Zihao	-
dc.contributor.author	Deng, Qingyong	-
dc.contributor.author	Oh, Sangyoon	-
dc.contributor.author	Komuro, Nobuyoshi	-
dc.date.issued	2020-12-01	-
dc.identifier.uri	https://aurora.ajou.ac.kr/handle/2018.oak/36579	-
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102337667&origin=inward	-
dc.description.abstract	Deduplication is a data redundancy elimination technique designed to save storage resources by removing redundant data in cloud storage systems. With the development of cloud computing technology, deduplication has been increasingly applied in cloud data centers. However, in big data deduplication, traditional techniques struggle to balance two conflicting goals: high deduplication throughput and a high duplicate elimination ratio. This paper proposes a similarity clustering-based deduplication strategy (named SCDS), which aims to eliminate more duplicate data without significantly increasing system overhead. The main idea of SCDS is to narrow the query range of the fingerprint index through data partitioning and similarity clustering algorithms. In the data preprocessing stage, SCDS uses a data partitioning algorithm to group similar data together. In the data deletion stage, a similarity clustering algorithm assigns fingerprint superblocks of similar data to the same cluster. Duplicate fingerprints are then detected within each cluster, which speeds up duplicate fingerprint retrieval. Experiments show that SCDS achieves a higher deduplication ratio than some existing similarity-based deduplication algorithms, while its overhead is only slightly higher than that of some high-throughput methods with low deduplication ratios.	-
dc.language.iso	eng	-
dc.publisher	IEEE Computer Society	-
dc.subject.mesh	Cloud computing technologies	-
dc.subject.mesh	Cloud data centers	-
dc.subject.mesh	Cloud storage systems	-
dc.subject.mesh	Data deduplication	-
dc.subject.mesh	Data partitioning	-
dc.subject.mesh	Data partitioning algorithms	-
dc.subject.mesh	Data preprocessing	-
dc.subject.mesh	Duplicate elimination	-
dc.title	A similarity clustering-based deduplication strategy in cloud storage systems	-
dc.type	Conference	-
dc.citation.conferenceDate	2020-12-02 to 2020-12-04	-
dc.citation.conferenceName	26th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2020	-
dc.citation.edition	Proceedings - 2020 IEEE 26th International Conference on Parallel and Distributed Systems, ICPADS 2020	-
dc.citation.endPage	43	-
dc.citation.startPage	35	-
dc.citation.title	Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS	-
dc.citation.volume	2020-December	-
dc.identifier.bibliographicCitation	Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, Vol.2020-December, pp.35-43	-
dc.identifier.doi	10.1109/icpads51040.2020.00015	-
dc.identifier.scopusid	2-s2.0-85102337667	-
dc.subject.keyword	Block fingerprint	-
dc.subject.keyword	Cloud storage system	-
dc.subject.keyword	Data partitioning	-
dc.subject.keyword	Deduplication	-
dc.subject.keyword	Similarity clustering	-
dc.type.other	Conference Paper	-
dc.identifier.pissn	1521-9097	-
dc.description.isoa	false	-
dc.subject.subarea	Hardware and Architecture	-
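
The abstract describes narrowing fingerprint lookups by placing superblocks of similar data in the same cluster and searching for duplicates only within that cluster. The Python sketch below illustrates that general pattern; it is not the SCDS algorithm from the paper. It assumes fixed-size chunking, SHA-1 chunk fingerprints, superblocks of four consecutive chunks, and the minimum chunk fingerprint (a min-hash style representative) as the similarity key. The names CHUNK_SIZE, SUPERBLOCK_CHUNKS, representative, and ClusteredIndex are hypothetical, and the data partitioning stage is not modelled.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict

# Hypothetical parameters, kept tiny so the example is easy to trace.
CHUNK_SIZE = 8          # bytes per fixed-size chunk (assumption, not from the paper)
SUPERBLOCK_CHUNKS = 4   # chunk fingerprints grouped into one superblock (assumption)


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Split data into fixed-size chunks and pair each chunk with its SHA-1 fingerprint."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [(hashlib.sha1(c).hexdigest(), c) for c in chunks]


def superblocks(fps: list[tuple[str, bytes]], size: int = SUPERBLOCK_CHUNKS) -> list[list[tuple[str, bytes]]]:
    """Group consecutive (fingerprint, chunk) pairs into superblocks."""
    return [fps[i:i + size] for i in range(0, len(fps), size)]


def representative(block: list[tuple[str, bytes]]) -> str:
    """Hypothetical similarity key: the smallest chunk fingerprint in the superblock.
    Superblocks that share most of their chunks are likely to share this minimum,
    so they tend to fall into the same cluster."""
    return min(fp for fp, _ in block)


class ClusteredIndex:
    """Fingerprint index split into clusters keyed by superblock representative."""

    def __init__(self) -> None:
        # representative -> {fingerprint: chunk bytes} for chunks stored in that cluster
        self.clusters = defaultdict(dict)

    def write(self, data: bytes) -> tuple[int, int]:
        """Deduplicate data and return (chunks seen, chunks newly stored)."""
        seen = stored = 0
        for block in superblocks(chunk_fingerprints(data)):
            cluster = self.clusters[representative(block)]
            for fp, chunk in block:
                seen += 1
                # The duplicate check searches only this cluster, not the whole index.
                if fp not in cluster:
                    cluster[fp] = chunk
                    stored += 1
        return seen, stored


if __name__ == "__main__":
    idx = ClusteredIndex()
    base = bytes(range(64))          # 8 distinct chunks -> 2 superblocks
    print(idx.write(base))           # (8, 8): every chunk is new
    print(idx.write(base))           # (8, 0): every chunk is found in its cluster
    edited = base[:-1] + b"\xff"     # change one byte in the last chunk
    print(idx.write(edited))         # (8, 1) or (8, 4), depending on whether the
                                     # last superblock's representative changes
```

Because each lookup is confined to one cluster, a chunk whose duplicate sits in a different cluster is stored again. In this sketch, that miss is the price paid for searching a smaller index, which mirrors the throughput versus duplicate elimination ratio trade-off named in the abstract.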

Related Researcher

Oh, Sangyoon (오상윤): Department of Software and Computer Engineering

File Download

There are no files associated with this item.