CREATING THE DOMAIN VOCABULARY ON THE BASIS OF AUTOMATED ANALYSIS OF UKRAINIAN TEXTS

Authors:

Kungurtsev Alexei, Odessa National Polytechnic University, Odessa, Ukraine

Kovalchuk Serhiy, Odessa National Polytechnic University, Odessa, Ukraine

Potochniak Iana, Odessa National Polytechnic University, Odessa, Ukraine

Shirokostup Maxim, Odessa National Polytechnic University, Odessa, Ukraine

Language: Ukrainian

Annotation:

Urgency of the research. Information systems are now widely used in various fields. As a result, the number of users who want access to the information stored in them is growing.

Target setting. Standard queries to a relational database are of limited value because they cannot cover everything users may want to ask. A natural-language interface is a better option.

Actual scientific researches and issues analysis. Recent studies on methods for creating a domain vocabulary use the Porter stemming algorithm to find keywords. In the scientific papers reviewed, a text is generally characterized only by the number of keywords it contains.
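As a rough illustration of the keyword-counting approach mentioned above, the sketch below reduces words to crude stems and counts how often each stem occurs. The suffix list and the function names are hypothetical, and the suffix stripping is a deliberately simplified stand-in, not the Porter algorithm itself.

```python
import re
from collections import Counter

# Hypothetical suffix list, for illustration only. The real Porter algorithm
# applies several ordered rule phases with conditions on the remaining stem.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_counts(text):
    """Count how often each crude stem occurs in the text."""
    # [^\W\d_]+ matches runs of letters, including non-Latin alphabets.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(crude_stem(w) for w in words)
```

A count of this kind describes a text only by how many keywords it contains, which is the limitation noted above.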

Uninvestigated parts of general matters defining. Although a lot of research has been done on this topic, it does not completely solve the problem of creating a domain vocabulary.

The research objective. The aim of the paper is to develop a method for creating a domain vocabulary on the basis of automated analysis of Ukrainian texts.

The statement of basic materials. A method for creating a domain vocabulary on the basis of automated analysis of Ukrainian texts has been developed. Algorithms for the automated analysis of Ukrainian texts and for searching for sentences have been developed. A method for describing and forming tables of terms has been proposed. The process of selecting terms and saving them in a table of terms has been described.
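The idea of searching for sentences and saving terms in a table can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the algorithm developed in the paper: the function names, the punctuation-based sentence splitting, and the plain substring matching are all hypothetical simplifications.

```python
import re


def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace (a simplification)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_term_table(text, candidate_terms):
    """Map each candidate term to the sentences that mention it."""
    sentences = split_sentences(text)
    table = {}
    for term in candidate_terms:
        needle = term.lower()
        # Case-insensitive substring match; inflected forms are not handled.
        hits = [s for s in sentences if needle in s.lower()]
        if hits:
            table[term] = hits
    return table
```

Terms that never occur are left out of the table. A practical version for Ukrainian would also need to match inflected word forms, which this sketch ignores.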

Conclusions. The developed method for creating a domain vocabulary on the basis of automated analysis of Ukrainian texts reduces the time needed to build a vocabulary. Studies have shown that the time to create the dictionary was reduced fivefold compared with creating it manually.

Key words:

information system, synonym, term, parser, domain knowledge

