Sunday, November 12, 2017

Zawgyi's beard, butterfly gamon, or black bat flower


Recently I've had the chance to see one of our nieces call for help on the Facebook to identify a flower via my wife's page. I at once recognized it through its shape to be what we knew as butterfly gamon which my late big sister grew a long time ago.



I googled for ဂမုန်းလိပ်ပြာ and found the following page on Wikipedia in Myanmar language:


Searching with the image of the flower on Google I found out its scientific name to be Tacca chantrieri. Looking up, I found Wikipedia's entry as:

Comparing the Myanmar version with the English version of the description of this flower I was unhappy because the Myanmar version seems to be relying too much on folklore and falls short on science as if the author(s) were entirely unaware of the English version. They could at least have given its scientific name in the Myanmar version, I thought.

This reminds me of the way modern day researchers criticized the math genius Ramanujan when he wrote about squaring the circle (Arndt and Haenel 2001, Pi Unleashed, p. 58).


What I would like to say is that including elements of Myanmar folklore in an Wikipedia article certainly makes it colorful and interesting. But they have to be pointed out as such. For example, instead of writing like
One who cares for the Zawgyi gamon is likely to win in lottery or Its leaves should not be cut off. If done, quarrels between husband and wife are likely to happen or A tea-cupful of liquid extract obtained by grinding its leaves taken for about ten days cure the coughing up of blood (consumption),
quotation marks could be placed around them. Or a phrase like “Many believe that ...” could be added to make it more explicit that we are dealing with folklore.

The Wikipedia's philosophy that an article like Zawgyi Beard Gamon which ranks as a Stub will have contributions to expand and improve it as it called for and hoped, didn't materialize for this particular case and maybe for many more. I guess that is because we Myanmars were so late in getting interested in, and involved with, Wikipedia or other sites and services on the Internet, except of course, the immensely popular Facebook.

If we Myanmars were not interested, wouldn't any non-Myanmars be? We don't know. But if they were interested, most of them may try for a Google translation, for example, to make sense out of an article like “Zawgyi Beard Gamon” in Myanmar language. This is what they would get now:

What you see is a translation where the original is not at all recognizable. It is distorted and looks funny. But in reality, it isn't a laughing matter at all. Yes, Google Translation has problems but it may not be entirely Google's fault because it is successful with other languages.

Google Translate first added Myanmar language in December 2014. According to the official Google Translate Blog:

  • Myanmar (Burmese, မြန်မာစာ) is the official language of Myanmar with 33 million native speakers. Myanmar language has been in the works for a long time as it's a challenging language for automatic translation, both from language structure and font encoding perspectives. While our system understands different Myanmar inputs, we encourage the use of open standards and therefore only output Myanmar translations in Unicode. ...
We’re just getting started with these new languages and have a long way to go. You can help us by suggesting your corrections using "Improve this translation" functionality on Translate and contributing to Translate Community.

Well, Google Translate is using 'neural machine translation engine - Google Neural Machine Translation (GNMT) - which translates "whole sentences at a time", rather than just piece by piece'. This sounded to me like they are doing something very advanced and very good.

While the technology of the translation engine is way beyond our heads, it is not hard to understand that it needs data to use in its process of grinding out translations . For that Google Translate seems to need at least a collection of the same text in a pair of languages (Myanmar and English versions, for example) of more than 150-200 million words, and another collection of more than a billion words each for Myanmar and English separately.

So it seems we could improve translations from Myanmar to other languages (not only with Google's GNMT, but possibly with other approaches) generally by making available a wider range of material to work on. That means making documents in Myanmar language in digital format widely available and in big volume - the bigger the better. That also means making sure they are in Unicode format. Why? Because, it seems so obvious.





Saturday, November 4, 2017

Confessions of a dumb blogger- III


Instead of the “clean” HTML as in my last post, what if I knit the .Rmd document to get the standard output? I tried that and then opened the result in Chrome browser. I got a beautifully formatted page.


Can blogger handle this HTML file?

To test that I opened this file in Chrome and viewed and copied its page source. In Blogger I pasted the source code into the HTML sheet. The preview showed the above post, but it didn't appear right. If you know HTML programming, you may be able to edit it, but I guess there will be quite some work to do!

First, let's look at the page source in Chrome. For the benefit of myself and my fellow dummies I exported the page source to pdf format in letter page size. It was a staggering 122 pages with only the last three pages contain the information we want to display.


So, Kyle was right: the HTML file that R produces is probably going to be swarmed with hundreds (or, gasp, thousands) of lines of disgusting Javascript”.

On the other hand, Kyle was talking about rendering RMarkdown files to HTML for blogging. I guess that the judgment “disgusting” is only relative, and if only we were to be working for reproducible research these would turn into lines of “lovely” Javascript. But that's another matter.

For our purpose if we could remove the superfluous matter from this source file manually, we might get the slim HTML source we want to get. So I tried using Notepad++ for editing the HTML page.

Here's what you see when the HTML file is opened in Notepad++.


Here I could try deleting what looks like superfluous matter (remember I have zero knowledge of HTML) so that I would get “clean” HTML. Then I could open it in Chrome (or other browser) via the “Run” menu. If it is not right I could undelete and try again.

And then through trial and error I finally ended up with exactly the same “clean” HTML as with Kyle's approach (see my last post). Well, that's not surprising at all because Kyle's approach is just the programmatic way of doing what I've done manually, with the added benefit of having no room for error!



Thursday, November 2, 2017

Confessions of a dumb blogger- II


I was perfectly happy with my make do solution of Open Office-StackEdit-Gimp trio for publishing to Blogger. However, I did strayed a little bit into—to choose the biggest name for a related subject—reproducible research. As noted in my last post, replacing pictures of R code in my posts with code that could be copied and run in R has been my significant achievement only very recently.

Without really understanding what is involved in reproducible research, I like this claim that “reproducibility was essential to pass wisdom on to the next generation”. For me, I should rephrase it as reproducibility is essential to pass the sense of healthy curiosity to my fellow dummies in the context of my blogs by way of allowing “the readers of my paper to verify and reproduce my computational experiments”. Obviously, the word post has to be substituted for “paper”.

So it is natural that I would peep into the world of reproducible research which the CRAN Task View says:

The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified. Packages in R for this purpose can be split into groups for: literate programming, package reproducibility, code/data formatting tools, format convertors, and object caching.The primary way that R facilitates reproducible research is using a document that is a combination of content and data analysis code.”

I wasn't too serious or ambitious with reproducible research. It was way above me. If I could grab some technique from there for packaging content and data analysis code to be able to publish to Blogger in a tidy way, I would be more than happy and I guess R Markdown may be the right package for that.

With the last quote (from Kyle Wurtz, September 21, 2014) of my last post, the author warned of HTML produced by RStudio would be too sophisticated to be handled by your blogging site. He then showed a work around:

It’s quite easy to get around this by using the knitr and markdown packages to manually to render your markdown file to HTML. The first step is to use knitr to turn the .Rmd into a regular .md markdown file. The second step is to use the markdownToHTML function from the markdown package to render the .md to an HTML file. The key is in the second step – the default for fragment.only is FALSE, which embeds all that nasty Javascript into the HTML file. But setting this to TRUE produces an HTML file without the header and body tags, CSS, and Javascript.

I followed Kyle's advice and it worked as good as using StackEdit. The difference is that I need to do a bit more work. Again I start with my post written in Open Office Writer, complete with narrative, R code, and pictures. I assume you know just a little bit of working with RStudio (like me), so:
  1. I created a new project in RStudio and created a new Rmarkdown document with the default option to output an HTML document. For that purpose I already have my example post with the titleOne Thousand Decimal Digits of Pi prepared with Open Office Writer. My idea is to present my R code for creating 1000 decimal digits of Pi using a Spigot algorithm.
  2. In the R Markdown pane of RStudio, I copied and pasted the first part of the narrative:
  1. The block of R code is pasted between line-18 and line-84.
  2. The last line of narrative is pasted at line-85.
  1. The R Markdown file is saved as spigotPI.Rmd.
  2. As advised by Kyle I run these four lines on the console:
      library(knitr)
library(markdown)
knit("spigotPI.Rmd") # produces a .md file
markdownToHTML("spigotPI.md", "spigotPI_1.html", fragment.only=TRUE) # produces clean .html

  1. I previewed the resulting "spigotPI_1.html" file in Blogger. It came out well with narrative, code, and output of 1000 pi digits—except for the name of the first author—on the first line of my example blog post shown below.


One Thousand Decimal Digits of Pi


“According to Jörg Arndt and Christoph Haenel, thirty-nine digits are sufficient to perform most cosmological calculations, because that is the accuracy necessary to calculate the volume of the known universe with a precision of one atom.” (Pi - Wikipedia, the free encyclopedia)

For everyday use pi=3.1416 would be quite sufficient. Being human, crunching out record breaking number of digits, in trillions now-a-days, is the full time job of some serious scientists as well as amateurs. Now with a laptop and some limited knowledge, anyone can produce a respectable number of pi digits way beyond the need for measuring the dimensions of the universe!
The R script below was an improvement over my attempt at generating 1000 decimal digits of pi using my laptop with i5 processor, 8GB RAM, and a method of calculation known as a spigot algorithm. The output of the original script was shown in my post Tea-shop PI- I of September 10, 2014.

    # Using Gosper's page's formula from
    # "The world of Pi - Spigot algorithm.pdf" by Rabinowitz et al.
    # Faster than orginal spigot (42.96 secs), here (17.34 secs)

    # (1) Initialization
    ## n is the desired number of decimal digits
    n <- 1002
    len <- ceiling(0.9*n)
    p <- 0
    nines <- 0
    ## create array and initialize, subscript 1 to len
    a <- array(dim=len)
    a <- (function(x){ 3+5*x })(0:(len-1))
    # (2) iteration
    for (j in 1:n) {
       ## multiply with new base 10
       a <- a*10

       ## normalize
       for (i in len:2){

         q <- a[i]%/% ((4+3*(i-2))*(4+3*(i-2)+1)*3)
         r <- a[i]%% ((4+3*(i-2))*(4+3*(i-2)+1)*3)

         a[i] <- r
         a[i-1] <- a[i-1]+ (q*(1+i-2)*(2*(i-2)+1))
       } # end for i

       ## calculate next provisional digit of pi
       q <- a[1]%/%10
       #p <- q
       a[1] <- a[1]%%10 

       ## correct the old provisional digits

       if (q == 9){
                nines <- nines + 1  # no digits ouput
          } else if (q == 10){
                p <- p + 1
            write(p,file="Spigot2-PI-digits.txt",append=TRUE)
                if (nines > 0){
                  for (k in 1:nines){
                write(0,file="Spigot2-PI-digits.txt",append=TRUE)
                  } # end for
                }
                p <- 0
                nines <- 0
           } else {
            write(p,file="Spigot2-PI-digits.txt",append=TRUE)
                if (nines > 0){
                  for (k in 1:nines){
                write(9,file="Spigot2-PI-digits.txt",append=TRUE)
                  } # end for
                }
                p <- q
                nines <- 0
          } #end if
    } # end for j

    # sink()
    # rm(list=ls())
    x <- scan("Spigot2-PI-digits.txt")
    write(x,file="",ncolumns = 50, sep="")

## 03141592653589793238462643383279502884197169399375
## 10582097494459230781640628620899862803482534211706
## 79821480865132823066470938446095505822317253594081
## 28481117450284102701938521105559644622948954930381
## 96442881097566593344612847564823378678316527120190
## 91456485669234603486104543266482133936072602491412
## 73724587006606315588174881520920962829254091715364
## 36789259036001133053054882046652138414695194151160
## 94330572703657595919530921861173819326117931051185
## 48074462379962749567351885752724891227938183011949
## 12983367336244065664308602139494639522473719070217
## 98609437027705392171762931767523846748184676694051
## 32000568127145263560827785771342757789609173637178
## 72146844090122495343014654958537105079227968925892
## 35420199561121290219608640344181598136297747713099
## 60518707211349999998372978049951059731732816096318
## 59502445945534690830264252230825334468503526193118
## 81710100031378387528865875332083814206171776691473
## 03598253490428755468731159562863882353787593751957
## 78185778053217122680661300192787661119590921642019
## 89

    # end spigot

Conclusion: If this dummy can do pi digits, so can you!