Thursday, September 24, 2015

R vs. XXX


Comparing R with well-known commercial statistical software such as SPSS, SAS, or Stata comes up from time to time. Among the answers posted on Stack Overflow five years ago, I like this one by Greg Snow, and it still looks relevant:

“When talking about user friendliness of computer software I like the analogy of cars vs. busses:

Busses are very easy to use, you just need to know which bus to get on, where to get on, and where to get off (and you need to pay your fare). Cars on the other hand require much more work, you need to have some type of map or directions (even if the map is in your head), you need to put gas in every now and then, you need to know the rules of the road (have some type of driver's licence). The big advantage of the car is that it can take you a bunch of places that the bus does not go and it is quicker for some trips that would require transferring between busses.

Using this analogy programs like SPSS are busses, easy to use for the standard things, but very frustrating if you want to do something that is not already preprogrammed.
R is a 4-wheel drive SUV (though environmentally friendly) with a bike on the back, a kayak on top, good walking and running shoes in the passenger seat, and mountain climbing and spelunking gear in the back.

R can take you anywhere you want to go if you take time to learn how to use the equipment, but that is going to take longer than learning where the bus stops are in SPSS.”

And he added:

“There are GUIs for R that make it a bit easier to use, but also limit the functionality that can be used that easily. SPSS does have scripting which takes it beyond being a mere bus, but the general philosophy of SPSS steers people towards the GUI rather than the scripts.”

You can read the rest of that discussion for yourself, but my favorite answer is one from a student whom I quoted in an earlier post in connection with using R for econometrics. I am repeating here the answer that appeared on Quora in 2014.

Karem Tuzcuoglo, a PhD candidate in economics at Columbia, explains:
"One-Click" Programs ((almost) no coding required, results obtained by one click)
  • STATA: Most of the econ undergrad programs use STATA. It is the best program (even at the PhD level) if you want to estimate panel data (i.e., where the data have both cross sectional and time series dimension. Typical examples are surveys and international trade data sets).
  • Eviews: Less famous than Stata, but provides much better time series analysis. If you don't want to do time series forget about Eviews.
  • SPSS: I don't have much information about it. But I can tell that it's not widely used.
"Semi-Coding" Programs
  • SAS: It used to be a big deal 10-20 years ago. Right now not as famous as before - though there are some companies that still strictly prefer using SAS.
  • R: Maybe the most popular program nowadays. First of all it's free! R network and R packages (pre-written algorithms by others) are getting larger and larger. Actually, R can be listed in the next section as well because one can definitely code everything in R. However, the fact that there are so many ready-to-use packages in R makes it also Semi-Coding program if one wants to.
"Pure-Coding" Programs
  • MATLAB: The most famous program among (high level) econometricians. Many applied economics have been done by Matlab. A lot of researchers put their Matlab codes online. It has a good Econometrics package - one still needs to code though.
  • PYTHON: It's more powerful and faster than Matlab. However, it's a very new language; it's still developing.
  • C++: If one wants to do hardcore coding, then C++ is the ultimate program. It's extremely fast in terms of computation (once, my simulation took 25 hours in Matlab, whereas C++ ran the code in 3-4 hours).
  • FORTRAN: Professors above 55+ age will know this program. It's (almost) not used anymore - though we should show some respect to the Father of Coding Programs!
BONUS: There are several other programming languages of course. If you are in UK (especially in Oxford), you will end up using a program called Ox, which is an optimized program for matrix algebra and, thus, for econometrics. Gretl is an extremely easy to use - but less to offer - program.

Among all of the options, I would suggest you to learn R regardless of whether you want to work in academia or in industry (more and more companies begin using R by the way). But if you want to stay away from coding, then go for STATA.

A heated debate on R vs. SAS started in August 2012 on Cross Validated with the question “R vs SAS, why is SAS preferred by private companies?” The last entry was in April 2014. Among the posters is Frank Harrell, the maintainer of the Hmisc package in R. “The package contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX code, and recoding variables.”

Below I have been biased, selecting only Frank Harrell's comments from the debate and taking them out of context. I simply want them to serve as teasers for the whole discussion.

  • I'm not sold by @PeterFlom 's point either. There are about 4000 packages in R. Not all have to be of the highest quality for add-on packages to have a net positive value. The number of reliable add-on packages exceeds the capabilities of SAS by a huge margin. (Aug 6 '12 at 20:06)
  • True, but it's hard to penalize a statistical computing system for its comprehensiveness. Or to say it another way, R's way of doing something is better than another system's way of not doing it. (Aug 7 '12 at 12:29)
  • I think these comments are not correct. In the server world, open source rules, and the Apache web server is the most popular web server. (Aug 11 '12 at 13:47)
  • I'm hoping that the 2nd edition of RMS will be available in just over a year. (Aug 12 '12 at 13:48)
  • I'm not familiar with that world but I suspect that scientists have more freedom than they think. (Aug 12 '12 at 13:49)
  • There is nothing that needs to be done to redo regulatory approval for the sake of switching to R. (Aug 11 '12 at 18:58)
  • What is the alternative to downloading a package that provides new capabilities (as most R packages do)? Is it to home grow those capabilities? Is that more reliable? (Aug 11 '12 at 18:59)
  • SAS comes with the same warranty as R: none. (Jan 8 '13 at 13:26)
  • Yes, people have to some extent discover R on their own. But much of the issue comes down to inertia of learning a new language. New languages are always coming out that have advantages over older languages yet users cling to the old languages (witness COBOL). Programming in SAS is hugely inefficient, requiring perhaps double the number of programmers to do the same job as R, but SAS experts are happy to hum along on their merry way and companies are afraid of the kind of disruption that would save them millions of dollars in salaries. (Jan 8 '13 at 13:33)
  • I don't follow your reasoning. The amount of money wasted paying programmers to program in an archaic language (SAS) vs. modern free languages is stunning. (Apr 15 '13 at 15:25)
  • Having used SAS for 23 years and S-Plus/R for 22 years I can say that a highly experienced SAS programmer can be highly productive, but that an experienced R programmer can be easily three times as productive. (Jun 10 '13 at 3:18)
Among the comments, the following anti-open-source remark from a SAS representative was the most remarkable and caused an uproar. You should not miss reading the links provided.

... I think the worst anti open source quote I've heard was from SAS saying something like 'would you trust a jumbo jet designed in open source, an engine might drop off' (PaulHurleyuk, Aug 7 '12 at 12:37)

@PaulHurleyuk: +1 The quote was “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.” by a SAS marketing director in this New York Times article on R. The SAS representative clarified her remarks in a later blog post. (jthetzel, Aug 7 '12 at 12:53)

Note: The first link in the above quote doesn't work. I looked for the article and found it here.

As usual, I would say the best way is to read all of these and more, and then make up your own mind. As poor small guys we hardly need a second look.




Tuesday, September 22, 2015

Thakhin spirit and expansion to thousand lights


Only yesterday a friend told me that some young people from his nonpartisan research unit recently had the opportunity to learn SPSS. SPSS, as you know, is the Statistical Package for the Social Sciences and is close to a household word in Myanmar among people connected in some way with the social sciences, surveys, or statistics. Most of the time when I mentioned R, a complete environment for statistical computing, or CSPro, a survey data processing software, they would just ask: how does it compare with SPSS?

The point of the story is that this friend told me that the training was offered by some professionals from an industrial powerhouse nation in Asia, entirely free and, I suppose, with no strings attached. However, I was a bit worried. We have seen that when we began opening up not too long ago, a lot of swindlers disguised as businessmen came in by swarms (to make it a little more dramatic). I don't know how much they've squeezed out of our people, but later, searching on the Web with the clues I had heard, I could identify that they were using pyramid and Ponzi schemes, maybe more. I suspect they are still lurking somewhere.

Anyway, I got carried away talking about our people being cheated. In the present context relating to our friend, I am positive they are receiving a genuine transfer of know-how. Nevertheless, my concern is with sustainability and the prospect for an expansion of the knowledge base. We should be aware that while learning to use SPSS or any other commercial software could be made free, the software itself is not free, and successive sharing of knowledge would call for a proportional expansion of financial resources. While such software could be free in the sense that some funding agency undertakes to license it for you for the time being, clearly the agency could not do so indefinitely, or respond to ever-growing needs.

So, you have two options: (A) when the honeymoon is over, or when you want to significantly expand the number of computers on which SPSS is installed, you could resort to using pirated copies of the software, or (B) use some open-source or free software from the beginning. Looking at the price list of SPSS just now, I found a lot of complicated licensing arrangements. As far as I could understand, there are four package configurations with starting prices (in US dollars per user) of: Base ($1140); Standard ($2530); Professional ($5090); and Premium ($7590). With each of them it seems you get software support for only 12 months, but you could use the software indefinitely. Here, it should be noted that for complex samples, which covers virtually all sample surveys, you need to use the complex samples module for data analysis. Among the packages mentioned, only the Premium package includes it. For the others, an indefinite per-user license for the module will cost an extra US$1450.

Let's do a little bit of calculation. Let's assume that you need SPSS installed on at least 10 of your computers to be realistically operational. Then we have (i) Base + Complex Samples, total cost US$25,900 and (ii) Premium, total cost US$75,900. Well, these don't look like much for an international donor, right? But think again. If you are mandated to share, or you yourselves intend to share knowledge with people outside of your organization, that won't work.
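For anyone who wants to check the arithmetic or try a different number of computers, here is a minimal Python sketch. It uses only the per-user prices quoted above; the function and variable names are my own for illustration, and it assumes (as I read the price list) that only Premium bundles the complex samples module.

```python
# Per-user starting prices in US dollars, as quoted in this post (2015).
PACKAGE_PRICE = {
    "Base": 1140,
    "Standard": 2530,
    "Professional": 5090,
    "Premium": 7590,
}
COMPLEX_SAMPLES_ADDON = 1450          # extra per-user cost of the module
BUNDLES_COMPLEX_SAMPLES = {"Premium"}  # assumption: only Premium includes it


def total_cost(package, seats):
    """Total licence cost for `seats` users, complex samples included."""
    per_seat = PACKAGE_PRICE[package]
    if package not in BUNDLES_COMPLEX_SAMPLES:
        per_seat += COMPLEX_SAMPLES_ADDON
    return per_seat * seats


if __name__ == "__main__":
    for name in PACKAGE_PRICE:
        print(f"{name}: US${total_cost(name, 10):,}")
```

Changing the 10 to the number of computers you would eventually need shows how quickly the bill grows once the knowledge is shared beyond the first group of trainees.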

This reminds me of a political joke of many years ago about a zawgyi, a mythical magician who at the height of his powers could fly in the air or bore through the earth. Apparently, this particular rookie zawgyi couldn't quite finish building up his magic. So he ended up neither flying nor walking but hovering at a man's height in midair.


For R, the base system and any or all packages are free. For example, the “survey” package is a dedicated package for the analysis of complex samples and you could download it any time you want to.

Obviously, my choice is option (B), and I have been advocating the R statistical environment in my earlier posts:

  • Spigot algorithm for calculation of pi (in Teashop PI-I)
  • Pigeon half, five-for-duck, quarter a-sparrow
  • An Unclaimed CD on Psychometrics with R or Intro to Anything with R
  • Big data: small guys could do it?
  • Big data: hands-on correlation, old and new
  • Correlates of labor productivity growth
  • Blind leading the 20/20
  • Econometrics for the Masses, Blind Boy, and Courage
  • Fooling around and having fun with PVT

I call option (B) mentioned earlier “Thakhin spirited” and the open-source model an “expansion to thousand lights” model, as lighting a thousand candles from a single source would only increase the sum total of available light and won't take away the light from the donor candle.

On the other hand, if you like option (A), take it, and then you may like to name it yourselves.  

Sunday, September 20, 2015

Myanmar Land tidbits: Did we miss the boat?


We used to have our headquarters on the middle floor of the Gandhi Hall building on the corner of Merchant Street and Bo Aung Gyaw Street in downtown Yangon. Apart from my other duties, I had to take care of the library. It was not much of a library, though, and consisted only of four or five large bookcases lined up across the hallway. In those days I was quite familiar with the contents of these bookcases but would be at a loss to describe them now, save for one book.

It was a leather-bound book three fingers thick, and the title on the book said “Torrens System”. It was in the style of leather binding we find on the religious books that the bookbinders on the steps of our great Shwedagon pagoda used to make. I have no doubt therefore that it must once have been a prized possession, probably when the British Settlement Officers or the Commissioner were still in office. I went through the book and thought I was able to grasp and appreciate the idea behind the Torrens System. Thinking about its contents now, I was impressed most of all with the aim of Torrens to make land transactions easy in contrast to the deeds registration system, its principle of indefeasibility of title, and the attendant cadastral system that could accurately reconstruct the boundaries of a given land holding on the ground in case of disputes.

Talking of the Torrens system, I was really surprised when a senior agricultural economist turned political economist, a Myanmar national living Down Under, told me that we in fact have the Torrens system in Myanmar. In my experience of working in the government agency that specifically deals with land administration, including assessment of agricultural land tax, maintaining cadastral maps and registers, collecting agricultural statistics, and handling land disputes, I had never heard of or read about our cadastral system being seen as a Torrens system. I had worked there for 26 years, half of that in the districts and the other half at our headquarters in Yangon.

This friend told me that I could find the reference to the Torrens system in Maung Htin's well-known work “Myanma-le-yar-myay-sanit” (Agricultural land system of Myanmar) and, if I heard him right, he said this system was used particularly in the “Colony lands”. I was doubly surprised because I am quite familiar with this work and I was certain I hadn't noticed anything about the Torrens system in it. Afterward I looked for Maung Htin's book, read through it carefully, and still couldn't find anything on Torrens!

Later, looking for the possible source of reference for Torrens system in Myanmar I found the following in Housing, Land, and Property Rights in Burma, 2004, by Nancy Hudson-Rodd:

The Land Records and Settlement Department in Burma adopted a modified Torrens System of land registration, for all areas settled by the colonial state. British. Burma was conquered in two stages, 1826 Lower Burma and 1886 in Upper Burma, becoming a colony of the British Empire. To suit these different jurisdictions, the Land and Revenue Act 1874 and the Upper Burma Land Revenue Act 1889 were two acts that effected the imposition of a tax to cover the cost of administration and governance by the British colonial government on settled and alienated land in both Lower and Upper Burma. Legal control and classification of land in Burma was initiated by the British in 1876 as part of their introduction of a revenue collection and taxation system. Cadastral surveys were conducted to classify all land according to ownership and use.” (p. 18)

Consulting resources on the Torrens system on the Web shows that, as of now, Thailand, Malaysia, Singapore, and the Philippines are using the system. A survey of the earlier adoption of the system by J E Hogg, entitled Registration of the Title to Land Throughout the Empire, 1920, cited 17 statutes including that of the “Federated Malay States”. However, I found nothing on “Burma” when I hastily looked through it.

Going back to Nancy Hudson-Rodd's statement, historical evidence from Myanmar shows that cadastral surveys initiated earlier on a holding basis were superseded in 1878 “by field to field surveys on professional lines followed up by regular settlements.” According to the Wikipedia entry on “Torrens Title”, the system originated in 1858 in South Australia:

“A boom in land speculation and a haphazard grant system resulted in the loss of over 75% of the 40,000 land grants issued in the colony (now state) of South Australia in the early 1800s. To resolve the deficiencies of the common law and deeds registration system, Robert Torrens, a member of the colony's House of Assembly, proposed a new title system in 1858, and it was quickly adopted. The Torrens title system was based on a central registry of all the land in the jurisdiction of South Australia, embodied in the Real Property Act 1886 (SA).”

Recall that by the time cadastral surveys on professional lines were adopted in Myanmar in 1878, the Torrens system had already been in place in South Australia for 20 years, so it seems unthinkable that the colonial professionals taking care of cadastral surveys in Myanmar would have been entirely ignorant of it. However, it is truly odd that, as far as I can ascertain, no historical document on land revenue administration in Myanmar ever mentions the Torrens system. Besides, the cadastral system in Myanmar has not changed significantly from those days till now. In my personal experience, I never knew any of my seniors or juniors to discuss anything about the Torrens system, and I may safely boast that I could have been the only one around at that time who had looked through the Torrens book I talked about.

Perhaps Hudson-Rodd was passing judgment on the characteristics of the rural land registration system in Myanmar as “Torrens like” and did not mean to say anything about its origins. Perhaps my elder economist friend, a collaborator of Hudson-Rodd, has misread Maung Htin. Or was it a quirk of memory?


To me, the real issue is that whether we call the current system “Torrens like”, “Embryonic Torrens”, or by any other name, we should be doing a reality check. Should we not critically examine the successes of the Torrens system as practiced in Thailand, Malaysia, Singapore, and the Philippines to see if we have missed the boat, and act accordingly?

Friday, September 18, 2015

Myanmar Land tidbits


When I came across the claim that “The British did not give full proprietorship title to land therefore they called the dues collected on land as Land Revenue instead of Land Tax” I wasn't satisfied. I thought it must have been just a play on words. To me, revenue sounded like what the government gets out of the taxation process, while tax is the burden that falls on taxpayers. Nevertheless, it was the conventional wisdom among fellow officers, based on that assertion, that land ownership recognized by the British Government had been some form of inferior ownership. It has been some twenty-five years since I left that government agency, yet some of the papers written by my younger friends in recent times still carry that assertion, without scrutiny, as truth.

I thought I had found this assertion in a booklet or a report by some high official of the Land Nationalization Department while I was a government employee. I am not sure, though. Looking back, I wonder if it carries the overtones of the ruling BSPP (Burma Socialist Programme Party). Too bad I didn't discuss the merit of this assertion with my seniors. Anyway, most of us would have been wise enough in those days not to be inquisitive.

By luck I happened to call up one of my retired younger co-workers a few days ago, and he was able to email me a scanned page of the source for that assertion. The following excerpt is from a booklet, sponsored by the Revolutionary Government, explaining the prevailing settlement procedures for fixing rates of land revenue. The booklet was distributed by the Settlements and Land Records Department in 1966.


It reads:
“2. In the time of the British government, they did not give full possession of land (proprietorship) to the people. It was rather the right to hold land (Land Holder's Right). If it were proprietorship, the dues collectable on land have to be “Tax on Land”. If the people were treated only as tenants, the dues have to be called “Rent”. As the right on land given to people by the British government was not as good as proprietary, but still better than the mere rights of tenants, neither the term “Tax on Land” nor “Rent” was used and the compromise “Land Revenue” was coined. That was how land revenue came to exist.”

The concluding words of the excerpt seem to say that the term Land Revenue originated in Myanmar, thanks to the ingenuity of the British administrators. However, the British had used this term in India before us for the purpose of land taxation. This is from the "Report Of The Land Revenue Commission Bengal Vol I":

14. ... All Governments in India have considered themselves entitled to a share of the produce, and this share of the produce, whether collected direct, or through farmers of revenue, or through subordinates or intermediate landlords, is called "land revenue".

On the other hand, the notion of Land Revenue as some halfway concept is easily disproved by this excerpt [The Land and Revenue Act (India Act II, 1876), in The Lower Burma Land Revenue Manual, 1945]:


Here we can see that for taungya cultivation (slash-and-burn cultivation) a “tax” will be collected instead of land revenue. The tenure for land used for taungya was hardly full proprietorship, yet the due was called a “tax”.

Whether Land Holder's Right is a proprietorship title would have been a deep and controversial topic. I guess it would have been hotly debated by Myanmar intellectuals and activists, at least in the latter part of colonial rule and particularly after Myanmar's independence. Students of Myanmar land systems and historians would have something concrete to say on this topic.

Here, I am aware of the accounts in which Buddhist monks contested the King's confiscation of their religious lands in the Bagan period and won at the courts or tribunals. These episodes could be found in historian Dr Than Tun's Studies in Burmese History Number One, 1969 (pp. 164-166). They seem to signify that the King is not “The Lord paramount over, and the chief proprietor of, the soil” in Myanmar, at least in relation to religious lands.

A stronger judgment comes from a younger generation of Myanmar historians. This is from the Google Books entry on Thant Myint-U's “The making of modern Burma”:


I assume “A structure of genuinely private ownership, entirely free of gentry or aristocratic control or involvement” effectively means “allodial title”, maybe with some restrictions.

In contrast, on page 148 of the BSPP's publication “History of Myanmar Land, vol. 1” of 1970, it was stated that “right to ownership in land in reality was a mere right to hold land”.


Yet on page 85, it was stated that “Nevertheless, under the 1876 Land and Revenue Act, squatters who had worked their lands for 12 years without interruption are entitled to possess the land. So long as they pay land revenue regularly, government cannot evict them. Moreover, such an owner has the right to treat the land as his or her privately owned land and use it as he or she wishes.”


To me, these two statements look contradictory.