English: Annotating and Summarizing Texts

small company that created the machine translator called Systran – the Internet version of which provided the first paragraph of this article – to cope initially with voluminous demands to translate Russian documents into English. Systran is based on rules about the source and target languages, as was IBM's original "brain" system, which relied on six rudimentary rules that govern syntax, semantics and the like. For example, the word "o" in Russian could be translated by an IBM 701 computer as either "about" or "of." If "o" followed the word "nauka" (science), it looked for the appropriate rule that told it to translate "o" as "of" – in other words, the "science of," not the "science about."

The Paris-based Systran company ranks as the biggest machine translation company in the world. Even with customers that include Google, Yahoo and Time Warner's AOL, its annual revenues were just $13 million for 2004 – in an overall market for translations of all varieties that is estimated worldwide to total nearly $10 billion. "We're so small, and we're the largest," says Dimitris Sabatakakis, Systran's chairman and chief executive officer.

No More Rules

For rule-based systems, language experts and linguists in specific languages have to painstakingly craft large lexicons and rules related to grammar, syntax and semantics to generate text in a target language. Commercial systems contain tens of thousands of grammar rules for a corpus that is made up of hundreds of thousands of words.

Beginning in the late 1980s, IBM created a system for translating French into English called Candide that required knowledge of neither grammar nor syntax. It eschewed rules in favor of taking substantial bodies of already translated text, matching words between the two languages (more recent systems use whole phrases) and finally deriving probabilities – based on Bayes's theorem – to estimate whether an English word was a correct translation from the French.
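The statistical idea can be illustrated with a minimal sketch. It is not Candide's actual implementation: the word pairs, the function name posterior and the counting scheme are all hypothetical, chosen only to show how Bayes's theorem turns counts from already translated text into the probability that an English word is the right rendering of a French one, P(e | f) = P(f | e) P(e) / P(f).

```python
from collections import Counter

# Hypothetical toy data: (French word, English word) pairs, as if they had
# already been matched between translated texts. Not taken from Candide.
aligned_pairs = [
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "home"),
    ("chez", "home"),
    ("chez", "at"),
]


def posterior(french, pairs):
    """Return P(english | french) for each English candidate via Bayes's theorem."""
    english_counts = Counter(e for _, e in pairs)
    pair_counts = Counter(pairs)
    total = len(pairs)

    scores = {}
    for english, n_e in english_counts.items():
        prior = n_e / total                                # P(e)
        likelihood = pair_counts[(french, english)] / n_e  # P(f | e)
        scores[english] = likelihood * prior               # P(f | e) * P(e)

    evidence = sum(scores.values())                        # P(f)
    if evidence == 0:
        return {}  # the French word never appears in the toy data
    return {e: s / evidence for e, s in scores.items() if s > 0}


result = posterior("maison", aligned_pairs)
best = max(result, key=result.get)  # most probable English candidate
```

With this toy data, "house" comes out as more probable than "home" for "maison" because it was matched with that word more often. A real system would estimate such probabilities from millions of sentence pairs.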
Another analysis that relied solely on large English texts assessed whether the word translated into English fit in grammatically with sur-
