Employee from 01.01.2014 to 01.01.2019
Bolashaq Academy (Department of Foreign Languages and Intercultural Communication, Senior Lecturer)
Karaganda, Kazakhstan
UDC 81 Linguistics. Languages
UDC 31 Statistics. Demography. Sociology
GRNTI 14.85 Technical teaching aids and educational equipment
GRNTI 16.31 Applied linguistics
OKSO 02.06.01 Computer and information sciences
OKSO 02.04.02 Fundamental informatics and information technologies
BBK 811 Applied linguistics
TBK 5004 Invention. Rationalization
TBK 841 General and applied linguistics
TBK 8410 General issues. Linguistics
TBK 8419 Applied linguistics
This article describes the authors' approach to analyzing the frequency of biological terms in the printed texts of school textbooks covering the full biology course of general secondary schools, an approach that cuts the time spent on text processing roughly tenfold. The authors analyze the advantages and disadvantages of applications for text analysis and voice processing and propose their own algorithm for quickly counting key terms in traditional texts. This study was carried out within a grant project of the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (КН МОН РК) on the topic "Creation of a trilingual dictionary of biological terms of the full biology course with a linguacultural component".
frequency, frequency analysis, biological terms, counting, algorithm, applications, text mining, printed texts.
Introduction
Text mining, which is aimed at processing textual data, remains one of the core methods in numerous philological and corpus linguistics projects [1]. Text frequency analysis deals with the words or word clusters used in documents; it helps identify similarities and differences between documents, as well as their relations to other variables of interest in a data mining project [1]. Text mining is used in linguistics, marketing, computer science, and social studies, wherever researchers rely on the frequency of lexical units to better understand internet users' keyword searches [2]. Plenty of digital resources perform text frequency analysis and thus provide new ways of creating, processing, and analyzing such data by computer. However, few methods suit the text mining of printed sources, so this issue should be raised and solved to simplify the research workflow and reduce the time it takes.
Research Issues and Objectives
Frequency analysis became imperative for the authors of this article and the participants of a grant-funded project on the creation of a dictionary of biological terms with a linguacultural component, designed for Kazakhstani secondary school students studying biology in English under the program "The Trinity of Languages" [3]. One of the project stages involved a frequency analysis of biological terms in the Kazakhstani textbooks for the entire school biology course, the results of which would later be used to build a substantial vocabulary database. Having found only six biology textbooks in digital format, for the 5th, 7th, and 8th grades, the researchers faced challenges with the remaining books, which existed only as printed hard copies. As a result, calculating term frequency turned into labor- and time-consuming work. Although software such as Acrobat Reader has a text recognition function, scanning a single textbook took about an hour and a half. Moreover, Kazakh texts were poorly recognized, as were Russian and English pages, even those scanned in reasonably good quality. The researchers tried to count the words manually by looking through the books, but lapses of attention often led to mistakes. Thus, the need for a convenient and user-friendly method of qualitative analysis of any printed text laid the foundation for this study.
Research Methods and Variables
Quantitative and qualitative empirical research methods were used to design a new and convenient way of mining printed text, which was then applied to the 7th, 8th, 9th, 10th, and 11th grade natural science textbooks for Kazakhstani secondary schools. Ten free tools for text frequency analysis, both online and offline, were examined and compared with regard to their capabilities for counting the frequency of words and word combinations.
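The kind of count discussed here, the frequency of words and word combinations in a text, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' algorithm and not a reproduction of any of the examined tools: the function names `tokenize` and `term_frequencies`, the sample sentence, and the term list are hypothetical, and matching is by exact word form with no lemmatization (so "cells" would not be counted as "cell").

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    The pattern keeps runs of Unicode letters (so Kazakh and Russian
    words are handled too) and allows hyphens inside a word.
    """
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())


def term_frequencies(text, terms):
    """Count occurrences of each term; a term may be one word or several."""
    tokens = tokenize(text)
    counts = Counter()
    for term in terms:
        term_tokens = tokenize(term)
        n = len(term_tokens)
        if n == 0:
            continue  # skip empty or non-alphabetic terms
        # Slide a window of n tokens over the text and compare exactly.
        counts[term] = sum(
            1
            for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == term_tokens
        )
    return counts


# Hypothetical sample text and term list, for illustration only.
sample = (
    "The cell membrane surrounds the cell. "
    "Each cell contains a nucleus, and the cell membrane controls transport."
)
freq = term_frequencies(sample, ["cell", "cell membrane", "nucleus", "chloroplast"])
for term, count in freq.most_common():
    print(term, count)
```

Note that a single-word term is also counted inside longer combinations: "cell" is counted in every "cell membrane" as well. Whether that overlap is desirable depends on how the dictionary entries are defined.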
1. www.statsoft.com, n.d. Text Mining (Big Data, Unstructured Data) [WWW Document]. Support Vector Machines (SVM). URL: http://www.statsoft.com/Textbook/Text-Mining#overview (accessed 5.17.18).
2. Kobayashi V.B., Berkers H.A., Mol S.T. 2017. Text Mining in Organizational Research [WWW Document]. Philosophy of the Social Sciences. URL: http://journals.sagepub.com/doi/full/10.1177/1094428117722619 (accessed 5.17.18).
3. Официальный сайт Парламента Республики Казахстан [WWW Document], n.d. URL: http://www.parlam.kz/ru/presidend-speech/5 (accessed 5.16.18).
4. Семантический анализ текста онлайн, seo-анализ текста / Адвего [WWW Document], n.d. Адвего. URL: https://advego.com/text/seo/ (accessed 5.17.18).
5. Семантический анализ текста онлайн [WWW Document], n.d. Семантический анализ текста онлайн / istio.com — белое SEO. URL: https://istio.com/rus/text/analyz/ (accessed 5.17.18).
6. wordTabulator [WWW Document], n.d. SourceForge. URL: http://wordtabulator.sourceforge.net/ (accessed 5.17.18).
7. Simagin, A., n.d. Семантический анализ текста [WWW Document]. «Majento» — Продвижение Web-проектов. URL: http://www.majento.ru/index.php?page=seo-analize/text-semantic/index (accessed 5.17.18).
8. SEO [WWW Document], n.d. 1Y.ru. URL: http://1y.ru/text.php (accessed 5.17.18).
9. Анализ текста по закону Ципфа [WWW Document], n.d. PR-CY. URL: http://pr-cy.ru/zypfa/text (accessed 5.17.18).
10. Биржа копирайтинга, проверка текста на уникальность [WWW Document], n.d. Text.ru. URL: https://text.ru/ (accessed 5.17.18).
11. [WWW Document], n.d. Семантический анализ текста онлайн. URL: https://itop.media/tools.php?i=semantics (accessed 5.17.18).
12. TextAnalyzer: Универсальный анализатор текстов [WWW Document], n.d. Text Analyzer — Универсальный анализатор текста. URL: https://www.textanalyzer.ru/ (accessed 5.17.18).
13. Wladm, 2018. Lit Frequency Meter [WWW Document]. Software Informer. URL: http://litfrequencymeter.software.informer.com/5.2/ (accessed 5.17.18).
14. Speech to Text Online Notepad. Free [WWW Document], n.d. Speechnotes. URL: https://speechnotes.co/ (accessed 5.17.18).
15. ListNote Speech-to-Text Notes — Apps on Google Play [WWW Document], n.d. Google. URL: https://play.google.com/store/apps/details?id=com.khymaera.android.listnotefree&hl=en (accessed 5.17.18).
16. Speech to Text Translator TTS — Apps on Google Play [WWW Document], n.d. Google. URL: https://play.google.com/store/apps/details?id=com.fsm.speech2text&hl=en_US (accessed 5.17.18).
17. Voice Text — Apps on Google Play [WWW Document], n.d. Google. URL: https://play.google.com/store/apps/details?id=com.matthew.rice.voice.text&hl=en (accessed 5.17.18).