GRNTI 50.07 Theoretical foundations of computer engineering
BBK 3297 Computer engineering
This article examines neural network algorithms for text classification. The topic is relevant because the volume of information on the Internet keeps growing, and with it the need to navigate that information. In addition to the classification algorithm, the article describes methods of text preprocessing and vectorization. These steps are the starting point for most NLP tasks and make neural network algorithms effective on small data sets. A sample of 50,000 English-language IMDB movie reviews is used as the dataset for training and testing the network. The classification problem is solved with a convolutional neural network. The maximum accuracy achieved on the test sample was 90.16%.
text comprehension, natural language processing, convolutional neural networks, text classification
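The preprocessing and vectorization steps named in the abstract can be illustrated with a minimal Python sketch. The function names, the reserved padding and unknown-word indices, the vocabulary size, and the sequence length are assumptions made for illustration; they are not taken from the article, and the two sample reviews are invented.

```python
import re
from collections import Counter

# Reserved ids for padding and out-of-vocabulary tokens (assumed convention).
PAD, UNK = 0, 1


def preprocess(text):
    """Lowercase the text, replace HTML tags with spaces, and extract word tokens."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def build_vocab(token_lists, max_size=10000):
    """Assign integer ids (starting at 2) to the most frequent tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_size))}


def vectorize(tokens, vocab, max_len=8):
    """Encode tokens as ids, then truncate or right-pad to a fixed length."""
    ids = [vocab.get(tok, UNK) for tok in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Two invented reviews standing in for the IMDB data.
reviews = ["A great movie!<br />Great cast.", "Not a great plot."]
tokens = [preprocess(r) for r in reviews]
vocab = build_vocab(tokens)
encoded = [vectorize(t, vocab) for t in tokens]
```

Each review becomes a fixed-length list of integer ids, which is the form an embedding layer followed by convolutional layers would typically take as input.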