A system of automatic translation developed for the purpose of enhancing public security

Project coordinator:

  • Dr. Krzysztof Jassem – Adam Mickiewicz University.

The institution performing the project:

  • Adam Mickiewicz University in Poznan, Department of Mathematics and Computer Science.

Project objective:

The objective of the project is to develop and implement a system of high-quality automatic translation for the purpose of enhancing international security.

Project description:

The automatic translation system to be designed will ensure high translation quality by drawing on language corpora from the field of public security. These corpora will be used in two ways:

  1. a Polish-English dictionary containing over 1 million phrases will be compiled;
  2. a translation memory storing over 8 million translation units will be built.
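To illustrate what a translation memory does, here is a minimal sketch in Python. It is not the project's implementation: the class `TranslationMemory`, the helper `normalize`, and the sample sentences are all hypothetical. It assumes the simplest possible design, an exact-match store of translation units keyed by a normalized source sentence.

```python
from typing import Dict, Optional


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share one key."""
    return " ".join(sentence.lower().split())


class TranslationMemory:
    """Hypothetical exact-match translation memory (illustration only)."""

    def __init__(self) -> None:
        # Maps a normalized source sentence to its stored translation.
        self._units: Dict[str, str] = {}

    def add(self, source: str, target: str) -> None:
        # Keep the first stored translation so later lookups stay stable.
        self._units.setdefault(normalize(source), target)

    def lookup(self, source: str) -> Optional[str]:
        # Returns None when the sentence has never been stored.
        return self._units.get(normalize(source))

    def __len__(self) -> int:
        return len(self._units)


# Invented sample units; the second one differs only in case and spacing.
tm = TranslationMemory()
tm.add("Granica państwa jest zamknięta.", "The state border is closed.")
tm.add("granica  państwa jest zamknięta.", "The national frontier is shut.")
print(tm.lookup("Granica państwa jest zamknięta."))
```

Because the second unit normalizes to the same key as the first, the memory keeps a single translation for that sentence. A production system would probably add fuzzy matching on top of exact lookup.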

An automatic translation system can achieve a goal that is very difficult to achieve in human translation: uniform translation of terminology. For example, from a set of parallel Polish and English texts from the Official Journal of the European Union, consisting of approximately 2,500,000 translation units, it was possible to automatically extract 470,000 distinct English phrases (large traditional dictionaries contain only 50,000-80,000 phrases). A preliminary analysis indicates that approximately 60% of these phrases are terms that should be translated into Polish in one specific way, whereas in the analyzed documents most of the terms have two or more translations. This inconsistency matters because the translation of documents that are important to security requires that sentences with the same content always be translated in the same way.
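The kind of analysis described above, finding terms that appear with more than one translation, can be sketched as follows. This is only an illustration under stated assumptions: the function name `find_inconsistent_terms` and the sample term pairs are invented, and the input is assumed to be a list of already aligned (English term, Polish translation) pairs. Extracting such pairs from parallel text is a separate step and is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_inconsistent_terms(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Return the terms that occur with two or more distinct translations."""
    variants: Dict[str, Set[str]] = defaultdict(set)
    for english, polish in pairs:
        # Case-insensitive comparison; an assumption made for this sketch.
        variants[english.lower()].add(polish.lower())
    return {term: seen for term, seen in variants.items() if len(seen) > 1}


# Invented sample data standing in for aligned term pairs from a corpus.
sample_pairs = [
    ("external border", "granica zewnętrzna"),
    ("external border", "zewnętrzna granica"),
    ("residence permit", "dokument pobytowy"),
    ("Residence permit", "dokument pobytowy"),
]

inconsistent = find_inconsistent_terms(sample_pairs)
print(sorted(inconsistent))
```

In this sample only "external border" is reported, since "residence permit" is translated the same way both times. Terms flagged this way would be candidates for a single fixed dictionary entry.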

The work will focus above all on ensuring high-quality Polish-English and English-Polish translation. The system will also allow translation to and from other languages, in particular German, Russian, and French, which are the most important from the standpoint of Poland’s geopolitical location. To achieve correct syntactic analysis of these languages, the researchers will use up-to-date linguistic models that draw on existing syntax-based corpora.

The developed tools will be used for automatic translation of European Union documents, e.g. those used in the Schengen Information System. Automatic translation will also be very useful during international mass events, e.g. the European Football Championship EURO 2012, as it will significantly facilitate the work of organizations responsible for public safety. A further aim of the research is to develop a prototype of a speech translation system. In cooperation with the team of Professor Grazyna Demenko, which is working on a project on continuous speech recognition for Polish, a prototype system for translating speech between Polish and English has been developed.

Project financed by the National Centre for Research and Development