Current and next-generation mobile and sensing devices have proliferated rapidly. With these state-of-the-art devices, the global positioning system (GPS) has made remote sensing and location tracking more viable, enabling a range of location-based queries. One such query is the All Nearest Neighbor (ANN) query, which returns, for every query object, the data objects closest to it. An ANN query combines a k-nearest neighbor (kNN) query with a join query. Hence, ANN queries are useful for applications in domains such as transportation optimization, locating safe zones, and ride-sharing. An example is 'find the nearest gas station for each car parking lot'. Because these applications generate a massive number of query requests, answering them requires a large amount of computation. Because a single machine cannot meet this demand, in this study we propose a distributed query processing framework that processes ANN queries using Apache Spark. In an empirical study, our proposed framework achieved superior query efficiency and scalability compared to other methods and design alternatives.
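To make the ANN definition concrete, the following is a minimal single-machine sketch of the "nearest gas station for each car parking lot" example. It is a brute-force illustration only, not the distributed framework proposed in this work. The function name `all_nearest_neighbors`, the sample coordinates, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math


def all_nearest_neighbors(queries, data, k=1):
    """For each query point, return the ids of its k nearest data points.

    Brute force: every query point is compared against every data point,
    which mirrors the "kNN plus join" view of an ANN query.
    Euclidean distance is assumed here for simplicity.
    """
    result = {}
    for query_id, query_point in queries.items():
        # Rank all data points by distance to this query point.
        ranked = sorted(
            data.items(),
            key=lambda item: math.dist(query_point, item[1]),
        )
        result[query_id] = [data_id for data_id, _ in ranked[:k]]
    return result


# Hypothetical sample data: parking lots are the query objects,
# gas stations are the data objects.
parking_lots = {"lot_a": (0.0, 0.0), "lot_b": (5.0, 5.0)}
gas_stations = {"gs_1": (1.0, 0.0), "gs_2": (4.0, 6.0), "gs_3": (10.0, 10.0)}

nearest = all_nearest_neighbors(parking_lots, gas_stations)
print(nearest)
```

The cost of this approach grows with the product of the number of query objects and data objects, which is why a single machine struggles as request volume increases.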
This work was supported in part by the Institute of Information and Communications Technology Planning and Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development under Grant IITP-2023-RS-2023-00255968, and in part by the Information Technology Research Center (ITRC) Support Program funded by the Korean Government (MSIT) under Grant IITP-2021-0-02051. The work of Hyung-Ju Cho was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education under Grant NRF-2020R1I1A3052713.