Ajou University repository

An Efficient Fault-Tolerant and Reliable Data Integrity Framework for Object-Based Big Data Transfer Systems
Author
PREETHIKA KASU
Citations (SCOPUS)
0


Advisor
TAE-SUN CHUNG
Affiliation
The Graduate School, Ajou University
Department
Department of Artificial Intelligence, The Graduate School
Publication Year
2022-08
Publisher
The Graduate School, Ajou University
Keyword
Big data; bloom filter; data integrity; geo-distributed data centers; high-performance computing; parallel file system
Description
Doctoral thesis -- The Graduate School, Ajou University: Department of Artificial Intelligence, August 2022
Alternative Abstract
Data has overwhelmed the digital world in terms of volume, variety, and velocity. Individuals, business organizations, computational science simulations, and experiments produce huge volumes of data on a daily basis. Often, this data is shared by geographically distributed data centers for storage and analysis. However, data transfer tools face unprecedented challenges in moving such huge volumes of data across geo-distributed data centers in a timely manner. Faults are among the major challenges in distributed environments: hardware, networks, and software can fail at any instant. Thus, high-speed, fault-tolerant data transfer frameworks are vital for moving data efficiently between data centers. In this thesis, we propose a novel bloom-filter-based, data-aware probabilistic fault tolerance (DAFT) mechanism to recover efficiently from such failures. We also propose a data- and layout-aware fault tolerance (DLFT) mechanism to effectively handle the false-positive matches of DAFT. We evaluate the impact of the data transfer and recovery time overheads of the proposed fault tolerance mechanisms on overall data transfer performance. The experimental results demonstrate that DAFT and DLFT recover from faults efficiently while minimizing memory, storage, computation, and recovery time overheads. Furthermore, we observe negligible impact on overall data transfer performance. Protecting the integrity of data against failures of the various intermediate components in the end-to-end data transfer path is a salient feature of big data transfer tools. Although most of these components provide some degree of data integrity, they are either too expensive or inefficient at recovering corrupted data. This necessitates maintaining application-level end-to-end integrity verification during data transfer.
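The abstract does not detail DAFT's construction, but the general idea it names (a bloom filter recording which data chunks have already been transferred, consulted during recovery so completed work is not repeated) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the thesis's implementation: the `BloomFilter` class, the chunk labels, and the filter sizes are all hypothetical.

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: k hash positions derived from salted SHA-256."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # One salted digest per hash function; fold each into a bit index.
        for i in range(self.k):
            h = hashlib.sha256(i.to_bytes(2, "big") + item).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Recovery sketch: chunks recorded in the filter before a crash are
# skipped on restart. A false positive would wrongly skip a chunk --
# which is why a layout-aware check (the role DLFT plays in the thesis)
# must be layered on top to catch those matches.
done = BloomFilter()
all_chunks = [f"file.dat:chunk{i}".encode() for i in range(100)]
for chunk in all_chunks[:60]:          # 60 chunks finished before the fault
    done.add(chunk)
to_retransfer = [c for c in all_chunks if c not in done]
```

Bloom filters never produce false negatives, so every completed chunk is guaranteed to be skipped; the (small, tunable) false-positive rate is the price paid for the filter's low memory footprint.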
However, owing to the sheer size of the data, supporting end-to-end integrity verification in big data transfer tools incurs computational, memory, and storage overheads. In this thesis, we propose a cross-referencing bloom-filter-based data integrity verification framework for big data transfer systems. This framework has three advantages over state-of-the-art data integrity techniques: lower computation overhead, lower memory overhead, and zero false-positive errors for a restricted number of elements. We evaluate the computation, memory, recovery time, and false-positive overheads of the proposed framework and compare them with state-of-the-art solutions. The evaluation results show that the proposed framework is very efficient at detecting and recovering from integrity errors while eliminating the false positives of the bloom filter data structure. In addition, we observe negligible computation, memory, and recovery overheads for all workloads.
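The abstract does not specify the cross-referencing construction, but the underlying use of a bloom filter for integrity checking can be sketched: the sender inserts the digest of every chunk (tagged with its index) into a filter, and the receiver recomputes each digest and tests membership. The helper names, filter size, and corruption scenario below are illustrative assumptions, not the thesis's design.

```python
import hashlib

SIZE = 16384  # filter size in bits (illustrative)

def bloom_positions(item: bytes, k=4):
    """k bit positions for item, derived from salted SHA-256 digests."""
    for i in range(k):
        h = hashlib.sha256(bytes([i]) + item).digest()
        yield int.from_bytes(h[:8], "big") % SIZE

def bf_add(bf: bytearray, item: bytes):
    for p in bloom_positions(item):
        bf[p // 8] |= 1 << (p % 8)

def bf_has(bf: bytearray, item: bytes) -> bool:
    return all(bf[p // 8] & (1 << (p % 8)) for p in bloom_positions(item))

def tagged_digest(idx: int, chunk: bytes) -> bytes:
    # Tag each digest with the chunk index so identical payloads at
    # different offsets remain distinguishable.
    return idx.to_bytes(4, "big") + hashlib.sha256(chunk).digest()

# Sender side: record every chunk's tagged digest in the filter.
sender_bf = bytearray(SIZE // 8)
chunks = [b"payload-%d" % i for i in range(50)]
for idx, chunk in enumerate(chunks):
    bf_add(sender_bf, tagged_digest(idx, chunk))

# Receiver side: a corrupted chunk's digest is absent from the filter
# with high probability, flagging it for retransmission. (Residual
# false positives are what a second, cross-referencing filter is
# meant to rule out.)
received = list(chunks)
received[7] = b"corrupted!"
bad = [i for i, c in enumerate(received)
       if not bf_has(sender_bf, tagged_digest(i, c))]
```

Because membership tests have no false negatives, every intact chunk passes verification; only a corrupted chunk can land in `bad`, and the cross-referencing step exists to close the small false-positive window in the other direction.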
Language
eng
URI
https://dspace.ajou.ac.kr/handle/2018.oak/21195
Fulltext

Type
Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

File Download

  • There are no files associated with this item.