This article offers an empirical exploration of the efficient use of word-level convolutional neural networks (word-CNNs) for large-scale text classification. Word-CNNs are generally difficult to train on large-scale datasets because the word-embedding matrix grows dramatically with the size of the vocabulary. To address this issue, this paper presents a de-noising approach to word embedding. We compare our model with several recently proposed CNN models on publicly available datasets. The experimental results show that the proposed method improves the usefulness of word-CNNs and increases text classification accuracy.
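For context, the word-CNN pipeline referred to above follows the standard pattern of embedding lookup, convolution over windows of adjacent word vectors, and max-over-time pooling to produce a fixed-length feature vector for a classifier. The sketch below is a toy, dependency-free illustration of that generic pipeline only; it is not the de-noising method proposed in this paper, and all vocabulary entries, dimensions, and weights are hypothetical.

```python
import random

random.seed(0)

# Toy vocabulary and randomly initialized word embeddings (illustrative only).
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "terrible": 4}
embed_dim = 4
embeddings = [[random.uniform(-1, 1) for _ in range(embed_dim)]
              for _ in range(len(vocab))]

def word_cnn_features(tokens, filters, window=2):
    """Embedding lookup -> 1D convolution over word windows -> max pooling."""
    embedded = [embeddings[vocab[t]] for t in tokens]
    pooled = []
    for f in filters:  # each filter has window * embed_dim weights
        activations = []
        for i in range(len(embedded) - window + 1):
            # Flatten the window of word vectors and dot it with the filter.
            patch = [v for row in embedded[i:i + window] for v in row]
            activations.append(sum(p * w for p, w in zip(patch, f)))
        # Max-over-time pooling: keep the strongest response per filter.
        pooled.append(max(activations))
    return pooled

# Three random filters spanning two words each (illustrative only).
filters = [[random.uniform(-1, 1) for _ in range(2 * embed_dim)]
           for _ in range(3)]
feats = word_cnn_features(["the", "movie", "was", "great"], filters)
print(len(feats))  # one pooled feature per filter
```

The pooled feature vector would then feed a small fully connected layer for classification. Note that the embedding table, not shown at realistic scale here, is the component whose size explodes with vocabulary growth, which is the training difficulty the paper targets.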
This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the National Program for Excellence in SW, supervised by the IITP (Institute for Information & communications Technology Promotion).