Semantic web technology has been used to help various software, including Intelligence Personal Assistant, by acquiring new data or understanding the knowledge through relations between data. However, it is hard to apply the current semantic web schemes such as RDFS reasoning to the real world data because of huge volume of data need to be processed. In this study, we design and enable RDFS reasoning with RETE algorithm on Apache Spark in parallel fashion. In addition, we apply rule sequence optimization ordering from existing studies to enhance the processing performance. From the empirical experiment results, we verified that the implementation of our design shows a strong scalability. However, the current naïve approach of using Spark provided distinct function to deduplicate data should be improved to yield a better processing performance. In future studies, we will study further to find new deduplication method.
ACKNOWLEDGMENT This research was supported by the MIST(Ministry of Science and ICT), Korea, under the National Program for Excellence in SW supervised by the IITP(Institute for Information & communications Technology Promotion)\ (2015-0-00908).