Monday, March 16, 2009

I wonder how knowledge of the outside world is represented in current AI models?..

Just had a thought that we, internally, don't represent "data" and "actions" separately. Rather, we see the world as "objects", in which data and possible actions are combined.

For example, an apple:

- is a member of a class "fruit", and as such, it can be: ripe, unripe, good, rotten; we can eat it, peel it (if it has a property "detachable skin"); cook it (if a property "cookable" is on, according to our representation of this class), slice it;
- it is also a member of another (sub)class "small round object"; as such, we can throw it, catch it, pick it up, lay it down, etc.;
- it is a member of a class "objects with a value", so it can be sold and bought;
- ...
- last but not least, as a member of a class "real object" it can be seen and touched, etc...
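To make the idea a bit more tangible, here is a minimal Python sketch of such a representation. It is only an illustration under my own assumptions: the names `Category`, `Thing` and `possible_actions` are made up for this example and do not come from any real AI system. Each class bundles its states together with its actions, and some actions are available only if the object has the required property.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Category:
    """A class of things: possible states and possible actions kept together."""
    name: str
    states: frozenset = frozenset()
    actions: frozenset = frozenset()
    # Pairs of (action, required_property): the action applies only
    # if the object has that property.
    conditional_actions: tuple = ()


@dataclass
class Thing:
    """An object that belongs to several categories at once."""
    name: str
    categories: list
    properties: set = field(default_factory=set)

    def possible_actions(self):
        """Collect every action offered by the categories of this object."""
        result = set()
        for category in self.categories:
            result |= category.actions
            for action, required in category.conditional_actions:
                if required in self.properties:
                    result.add(action)
        return result


FRUIT = Category(
    "fruit",
    states=frozenset({"ripe", "unripe", "good", "rotten"}),
    actions=frozenset({"eat", "slice"}),
    conditional_actions=(("peel", "detachable skin"), ("cook", "cookable")),
)
SMALL_ROUND_OBJECT = Category(
    "small round object",
    actions=frozenset({"throw", "catch", "pick up", "lay down"}),
)
OBJECT_WITH_VALUE = Category("object with a value", actions=frozenset({"sell", "buy"}))
REAL_OBJECT = Category("real object", actions=frozenset({"see", "touch"}))

apple = Thing(
    "apple",
    [FRUIT, SMALL_ROUND_OBJECT, OBJECT_WITH_VALUE, REAL_OBJECT],
    properties={"detachable skin", "cookable"},
)

print(sorted(apple.possible_actions()))
```

An object without the "detachable skin" property would simply not get "peel" in its list, which is the conditional part of the description above.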

This is pretty much the same in every language, and it will not pose a big problem in translation to most languages that originated on Earth (at least, to those whose speakers have seen apples on a routine basis).

Now, there is a second part:
- "Apple of discord" can be understood only if you know some Greek mythology;
- "Adam's apple" requires yet another piece of cultural knowledge;
- "An apple a day keeps the doctor away" is a set phrase in the English language, and its meaning would certainly be understood by any other reasonable human being, yet (for example) no Russian speaker (if he doesn't know English) would say something like that: the Russian language doesn't have this knowledge incorporated as a set phrase! (The closest one I can remember has a slightly different meaning, but still expresses the idea of escaping the guys and girls in white coats, in the following way: "If you want to stay healthy, stay away from them doctors!" No apples or other means are mentioned, though.)

There are plenty of other connotations for an apple, involving various fairy tales, stories, movies and pictures. Some of them belong to one language; others require some cultural or social background to be understood. This is the most difficult part when it comes to conveying the meaning, in my opinion.

Apart from that, "apple" in English rhymes with "grapple" (I'd also try to use "ample") and it's a 2-syllable word (as far as I understand). "Apple" in Dutch becomes "appel" (also 2 syllables), and in Russian "yabloko" (same origin, 3 syllables). Not sure about Dutch, but in Russian it does not easily rhyme with anything. This can pose a problem, but problems of this sort could, technically, be attacked by brute force, provided the translation engine has enough knowledge of both the target and the source language and is instructed to look for words with a similar sound pattern (but there is no guarantee that such a solution exists).
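The brute-force idea can be sketched in a few lines of Python. This is a toy under loud assumptions: the `LEXICON` below is a tiny hand-made dictionary with very crude sound transcriptions that I invented for the example (a real engine would need a proper pronunciation dictionary), and `rhyme_candidates` is a hypothetical helper name. It simply walks through every known word and keeps those whose last few sounds match.

```python
# Toy lexicon: word -> crude list of sounds (invented for illustration,
# not a real phonetic transcription).
LEXICON = {
    "apple": ["a", "p", "l"],
    "grapple": ["g", "r", "a", "p", "l"],
    "ample": ["a", "m", "p", "l"],
    "table": ["t", "ei", "b", "l"],
    "doctor": ["d", "o", "k", "t", "e", "r"],
}


def rhyme_candidates(word, lexicon, tail=3):
    """Brute force: return every other word ending in the same `tail` sounds."""
    target = lexicon[word][-tail:]
    return sorted(
        other
        for other, sounds in lexicon.items()
        if other != word and sounds[-tail:] == target
    )


print(rhyme_candidates("apple", LEXICON, tail=3))   # ['grapple']
print(rhyme_candidates("apple", LEXICON, tail=2))   # ['ample', 'grapple']
print(rhyme_candidates("doctor", LEXICON, tail=2))  # []
```

Loosening `tail` lets in weaker rhymes such as "ample", and the empty list for "doctor" is the "no guarantee" case: the search can come back with nothing.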

When I think about the second part, I feel a sort of despair. How could we correctly get the meaning of utterances more complex than mere descriptions of the outside world across the language barrier?

It seems to me that in order to understand an idea in more or less the same context as was intended by the person who expressed it, the recipient has to be made aware of that context (to the necessary extent). To achieve this, the translators of old times (and good translators of modern times) have been adding dozens of comments and explanations to their books wherever they thought them necessary for the average consumer of their work. This still requires some conscious effort on the recipient's part, of course.

Another approach was to translate the context, so that the recipient gets "the same idea" but in the context to which he/she is accustomed. This has mostly been done in children's literature. The most prominent examples in Russian are completely "localized" versions of Pinocchio (known as "The Adventures of Buratino", where there is a capricious blue-haired doll instead of the Blue Fairy, and the little wooden boy doesn't want to become a real boy at all!) and "The Wizard of Oz" ("The Wizard of the Emerald City", which was so popular that the author of the translation/migration wrote 5 sequels to the story; to my knowledge, nobody in Russia is much interested in the original even now). More recent attempts at this approach showed up in various countries in the translations of the Harry Potter series (translating them to the letter, with myriads of comments, would definitely have spoiled the fun for the little readers).

The summary is that, up to now, there is no predefined procedure for deciding which approach to take in each particular case, and no measure of success for any particular translation. After all, sharing the same native language is not enough to correctly understand the other person; social and cultural context is equally important. What do we want to share: the ideas which we came across, the emotions which we feel, something else?.. Will we ever be able to share both the ideas and the emotions in an adequate way, so that the language barriers become transparent? That would be a good use of AI, for me...