Although many cross-lingual word embedding models exist for various languages, approaches that support cross-lingual word embedding between languages with different word order and different word origins are lacking. In this study, we address the problem of cross-lingual word embedding between Korean and English, two languages that differ in both word order and origin, and perform experiments to examine its performance. Cross-lingual models require different levels of supervision. When training across languages with different word order, it is essential to reduce preprocessing time. Therefore, we choose two cross-lingual models that require only sentence-level alignment for our experiments. Our results show that cross-lingual embedding for Korean and English is possible without word-level alignment. We also analyze which bilingual tasks each trained model is suited for by comparing the characteristics of each model's trained embeddings.
This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the National Program for Excellence in SW, supervised by the IITP (Institute for Information & communications Technology Promotion) (R22151610020001002).