Sunday, November 12, 2017

Zawgyi's beard, butterfly gamon, or black bat flower


Recently, on my wife's Facebook page, I saw one of our nieces asking for help identifying a flower. I recognized it at once by its shape as what we knew as butterfly gamon, which my late elder sister grew a long time ago.



I googled ဂမုန်းလိပ်ပြာ (butterfly gamon) and found the following page on the Myanmar-language Wikipedia:


Searching Google with an image of the flower, I found that its scientific name is Tacca chantrieri. Looking that up, I found the English Wikipedia entry:

Comparing the Myanmar and English descriptions of this flower, I was unhappy: the Myanmar version seems to rely too much on folklore and falls short on science, as if its author(s) were entirely unaware of the English version. They could at least have given the scientific name in the Myanmar version, I thought.

This reminds me of the way modern-day researchers criticized the math genius Ramanujan when he wrote about squaring the circle (Arndt and Haenel 2001, Pi Unleashed, p. 58).


What I would like to say is that including elements of Myanmar folklore in a Wikipedia article certainly makes it colorful and interesting, but they have to be identified as folklore. For example, instead of writing statements like
One who cares for the Zawgyi gamon is likely to win the lottery,
Its leaves should not be cut off; if they are, quarrels between husband and wife are likely to happen, or
A tea-cupful of liquid extract obtained by grinding its leaves, taken for about ten days, cures the coughing up of blood (consumption),
as plain fact, quotation marks could be placed around them. Or a phrase like “Many believe that ...” could be added to make it explicit that we are dealing with folklore.

Wikipedia's hope that an article ranked as a stub, like Zawgyi Beard Gamon, would attract the contributions it calls for to expand and improve it did not materialize in this particular case, and maybe in many more. I guess that is because we Myanmars were so late in getting interested in, and involved with, Wikipedia and other sites and services on the Internet, except of course the immensely popular Facebook.

If we Myanmars are not interested, would any non-Myanmars be? We don't know. But if they were, most of them would probably try Google Translate, for example, to make sense of an article like “Zawgyi Beard Gamon” in the Myanmar language. This is what they would get now:

What you see is a translation in which the original is not at all recognizable. It is distorted and looks funny, but in reality it is no laughing matter. Yes, Google Translate has problems, but they may not be entirely Google's fault, because it is successful with other languages.

Google Translate first added the Myanmar language in December 2014. According to the official Google Translate Blog:

  • Myanmar (Burmese, မြန်မာစာ) is the official language of Myanmar with 33 million native speakers. Myanmar language has been in the works for a long time as it's a challenging language for automatic translation, both from language structure and font encoding perspectives. While our system understands different Myanmar inputs, we encourage the use of open standards and therefore only output Myanmar translations in Unicode. ...
We’re just getting started with these new languages and have a long way to go. You can help us by suggesting your corrections using "Improve this translation" functionality on Translate and contributing to Translate Community.

Well, Google Translate uses a 'neural machine translation engine - Google Neural Machine Translation (GNMT) - which translates "whole sentences at a time", rather than just piece by piece'. To me this sounded like they are doing something very advanced and very good.

While the technology of the translation engine is way over our heads, it is not hard to understand that it needs data to grind out translations. For that, Google Translate seems to need at least two things: a collection of the same text in a pair of languages (Myanmar and English versions, for example) of more than 150-200 million words, and a separate collection of more than a billion words in each of Myanmar and English.

So it seems we could improve translations from Myanmar to other languages (not only with Google's GNMT, but possibly with other approaches) by making a wider range of material available to work on. That means making Myanmar-language documents widely available in digital format and in big volume - the bigger the better. That also means making sure they are in Unicode. Why? It seems obvious: Unicode is the open standard that, by Google's own account quoted above, its Myanmar translations are output in.
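As a rough illustration of what "making sure they are in Unicode" could involve, here is a minimal Python sketch of a check for text that was probably typed in a legacy, non-Unicode font encoding such as Zawgyi. It is only a heuristic under stated assumptions, not a real detector: the function name `looks_non_unicode` and both patterns are hypothetical choices made for this example. The first pattern relies on the fact that in standard Unicode order the vowel sign E (U+1031) is stored after its consonant, while visual-order encodings store it before. The second assumes that ordinary Burmese text rarely needs code points U+1060 to U+1097, which Unicode assigns to other languages written in the Myanmar script.

```python
import re

# Hypothetical heuristic 1: vowel sign E (U+1031) at the very start of the
# text or right after whitespace. In Unicode order it follows its consonant,
# so a leading U+1031 hints at a visual-order (Zawgyi-style) encoding.
LEADING_E = re.compile(r"(?:^|\s)\u1031")

# Hypothetical heuristic 2 (an assumption): U+1060..U+1097 are assigned to
# other Myanmar-script languages and are unusual in plain Burmese text.
EXTENDED_RANGE = re.compile(r"[\u1060-\u1097]")


def looks_non_unicode(text: str) -> bool:
    """Return True if the text shows signs of a non-Unicode Myanmar encoding."""
    return bool(LEADING_E.search(text) or EXTENDED_RANGE.search(text))


if __name__ == "__main__":
    unicode_order = "\u1019\u1031"  # consonant MA, then vowel sign E
    visual_order = "\u1031\u1019"   # vowel sign E typed before the consonant
    print(looks_non_unicode(unicode_order))
    print(looks_non_unicode(visual_order))
```

A check like this could only flag documents for a closer look before they go into a collection. It would misjudge text in languages that really do use the extended code points, and it says nothing about how to convert flagged text.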




