Showing posts with label statistics. Show all posts

Thursday, October 8, 2009

Android's certainly uncertain future

In a week of CTIA-related mobile platform announcements, industry analyst Gartner predicts that in the next 39 months, Android will rise from less than 2% market share to 14% market share, becoming 2nd in the global market after Symbian.

I don’t buy it: 14% seems plausible, but I think it implausible to assume that neither iPhone nor BlackBerry will get to 14% by then, given their strong recent growth. Also, by some estimates RIM is already well above 14%.

But then, there is the spurious precision of the Gartner prediction for 2012, three years from now, that’s typical of the genre: it has Android edging out the iPhone, BlackBerry and Windows Mobile by a fraction of a % in 2012 (not 2011 or 2013).

As Computerworld reported:

The complete Gartner forecast for smartphone OSes by the end of 2012 puts Symbian on top with 203 million devices sold, and 39% of the market. Android will be second with nearly 76 million units sold, and 14.5% of the market.

Coming in a close third, the iPhone will ship on 71.5 million devices in 2012, giving a 13.7% market share. Windows Mobile will finish fourth, with 66.8 million units sold, or 12.8% of the market.

Very close behind Windows Mobile, the BlackBerry OS will sell on 65.25 million devices in 2012, Gartner forecasts, making it fifth with 12.5% market share.

Various Linux devices will sell 28 million units, at 5.4% market share, in sixth place. Palm Inc.'s webOS will sell on 11 million units in 2012, about 2.1% of the market, in seventh place, Gartner says.
Why 65.25 million? Not 65.3 million or 64.9 million? This sort of precision is GIGO.
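One way to see how little the extra digits add is to divide each quoted unit figure by its quoted share, which gives the total market that the pair implies. Here is a minimal sketch in Python using only the figures quoted above. The names `forecast` and `implied_total` are mine, not Gartner's, and "nearly 76 million" is treated as exactly 76.0.

```python
# Unit sales (millions) and market shares (%) as quoted from Computerworld above.
# "Nearly 76 million" is entered as 76.0, which is an assumption.
forecast = {
    "Symbian": (203.0, 39.0),
    "Android": (76.0, 14.5),
    "iPhone": (71.5, 13.7),
    "Windows Mobile": (66.8, 12.8),
    "BlackBerry": (65.25, 12.5),
    "Linux": (28.0, 5.4),
    "webOS": (11.0, 2.1),
}


def implied_total(units_millions, share_percent):
    """Total market size (millions) implied by one platform's units and share."""
    return units_millions / (share_percent / 100.0)


# One implied total per platform; they should all agree if the digits mean anything.
totals = {name: implied_total(u, s) for name, (u, s) in forecast.items()}
for name, total in totals.items():
    print(f"{name:15s} implies a total market of {total:6.1f} million units")

spread = max(totals.values()) - min(totals.values())
print(f"Spread between implied totals: {spread:.1f} million units")
```

The implied totals all land between roughly 518 and 525 million units, so rounding in the shares alone moves the implied total by several million units. The gap between fourth and fifth place in the forecast is smaller than that.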

Last year, Gartner said Android will get 10% share in 2011. At least that’s an estimate of a single significant digit, without the pseudo-precision.

In the end, what was published is just a guess, maybe more of a WAG than a SWAG. It doesn’t take an industry analyst to predict that Android will grow rapidly, but how fast, and what the natural limit is, are unclear.

This also points out the stupidity of point estimates. If we accept the calculation as an unbiased estimate, then it is more reasonable to say Android will have 8-20% market share in 2012.
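To see what a range like that does to the headline "2nd place" claim, here is a rough Python sketch. It holds every other platform at its quoted point estimate, which is generous since those numbers carry their own uncertainty, and asks where Android would rank at the low end, the point estimate and the high end. The 8-20% range is my own figure from above, not Gartner's, and `android_rank` is just an illustrative helper.

```python
# Point estimates (% share in 2012) for the other platforms, as quoted above.
others = {
    "Symbian": 39.0,
    "iPhone": 13.7,
    "Windows Mobile": 12.8,
    "BlackBerry": 12.5,
    "Linux": 5.4,
    "webOS": 2.1,
}


def android_rank(android_share):
    """Rank Android would hold if every other platform hit its point estimate."""
    return 1 + sum(1 for share in others.values() if share > android_share)


# The 8-20% range is this post's assumption, not a Gartner figure.
low, point, high = 8.0, 14.5, 20.0
for share in (low, point, high):
    print(f"Android at {share:4.1f}% -> rank {android_rank(share)}")
```

Under these assumptions Android could finish anywhere from 2nd to 5th, which is a very different story from "edging out the iPhone."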

Saturday, August 15, 2009

The 62.5% solution

The Brits are mad that U.S. conservatives and libertarians are holding up their National Health Service as an example of what will happen to America under Obamacare. (As the past year has demonstrated, both the British and Germans are as nationalistic as the French or Americans when their countries are criticized by outsiders.)

As part of the self-organized British response, you will find that the talking point

The UK spends less per head on healthcare but has a higher life expectancy than the U.S.. The World Health Organisation ranks Britain's healthcare as 18th in the world, while the U.S. is in 37th place.
has shown up at least 400 times on the web.

Checking the 2000 WHO study, sure enough the US ranks #37 after Finland, Australia and Denmark and barely ahead of Cuba and New Zealand. But what do the rankings mean? There are five criteria:
  1. Health Level: 25 percent
  2. Health Distribution: 25 percent
  3. Responsiveness: 12.5 percent
  4. Responsiveness Distribution: 12.5 percent
  5. Financial Fairness: 25 percent
The libertarian flagship thinktank, Cato, has compiled a number of commentaries on the limitations of the WHO rankings, as well as a detailed report. I won’t rehash all the arguments here, but let me pick up two quick points.

As Cato analyst Glen Whitman notes, three of the measures — financial fairness, health distribution and responsiveness distribution — are about equity rather than about quality outcomes (either at the mean or even at the minimum). These measures are weighted 62.5% of the total. A bad but fair system would rank ahead of a good but unfair system for 5/8ths of the WHO points — so without a calculation eliminating those weightings, we don’t know what effect they have on the final result.
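The arithmetic behind that point can be sketched in a few lines of Python. The weights are the five listed above. Everything else is an assumption for illustration: the two sets of scores are invented, the 0-100 scale is mine, and treating the index as a plain weighted sum is a simplification of whatever the WHO actually computes.

```python
from fractions import Fraction

# Weights of the five WHO criteria, as listed above.
weights = {
    "health_level": Fraction(25, 100),
    "health_distribution": Fraction(25, 100),
    "responsiveness": Fraction(125, 1000),
    "responsiveness_distribution": Fraction(125, 1000),
    "financial_fairness": Fraction(25, 100),
}

# The three criteria Whitman identifies as measuring equity rather than outcomes.
equity_criteria = {
    "health_distribution",
    "responsiveness_distribution",
    "financial_fairness",
}

equity_weight = sum(weights[name] for name in equity_criteria)
print(f"Equity share of the index: {equity_weight} = {float(equity_weight):.1%}")


def composite(scores):
    """Plain weighted sum of criterion scores (a simplifying assumption)."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical 0-100 scores, invented purely for illustration.
good_but_unfair = {
    "health_level": 90, "responsiveness": 90,
    "health_distribution": 50, "responsiveness_distribution": 50,
    "financial_fairness": 50,
}
bad_but_fair = {
    "health_level": 60, "responsiveness": 60,
    "health_distribution": 90, "responsiveness_distribution": 90,
    "financial_fairness": 90,
}

print(f"Good but unfair: {float(composite(good_but_unfair)):.2f}")
print(f"Bad but fair:    {float(composite(bad_but_fair)):.2f}")
```

With these made-up inputs the bad-but-fair system scores well ahead, even though it is 30 points worse on both quality measures. That says nothing about any real country, only about how much the weighting can matter.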

Whitman also observes:
The WHO rankings have also been adjusted to reflect efficiency: how well a country is doing relative to how much it spends. In the media, however, this distinction is often lost.

Costa Rica ranks higher than the United States (number 36 versus number 37), but that does not mean Costa Ricans get better healthcare than Americans. Americans most likely get better healthcare -- just not as much better as could be expected given how much we spend. If the question is health outcomes alone, without reference to spending, we should look at the unadjusted ranking, where the U.S. is number 15 and Costa Rica is number 45.
So saying “UK spends less per head” is double-counting. Without adjusting for efficiency, the UK is #9 and the US #15.

There are also other problems with the comparisons. As Dr. Ronald Wenger wrote last month:
Review of recent literature suggests that life expectancy is a poor statistic for determining the quality of a health care system because many people actually die with minimal interaction with the health care system (in auto accidents, homicide, and sudden death).

According to a 2007 article in the New England Journal of Medicine, only 10 percent of premature deaths in the U.S. are related to the health care system. The great majority (85 percent) of premature deaths are related to human behavior, genetic predisposition, and social circumstance.
Wenger also makes two other interesting points. First, Japanese-Americans in the US have life expectancy similar to Japanese living in Japan (which has the highest life expectancy in the world.)

Secondly, America spends a disproportionate share of its healthcare dollars on detecting and treating cancer, but even if all cancer deaths were eliminated, US life expectancy would only increase by 2.4-3.0 years. So America values saving cancer patients far beyond any economically rational cost-benefit analysis for the current generation. (Of course, if treatments become more efficient and effective, the benefits may be realized by the whole world a generation later).

Thus, it’s impossible to directly compare the results of two vastly dissimilar trillion-dollar healthcare systems. All sorts of value judgements enter into making adjustments for comparability, making the final result more subjective than anyone is willing to admit. (And the assumptions don’t seem to make it into the talking points.)

As with other aspects of life, there are three kinds of lies: “lies, damned lies, and statistics.”

Wednesday, July 29, 2009

IBM buying SPSS, but why now?

IBM is spending $1.2b to buy Chicago-based SPSS, one of the three major statistics and data mining software companies. The cash offer is a 42% premium to its previous close and 2.6x anticipated revenues.

Most academics know SPSS for its eponymous statistics software that is the standard for psychologists and other social scientists. Its main rivals in this area are SAS Institute — used by high-end data miners — and StataCorp, the favorite of economists. On a personal note, I dumped SPSS in 1996 when they abandoned the Mac (causing me to establish the MacStats web page), switching to Stata which I found more intuitive and easier to use.

SPSS has been following SAS into the data analytics segment for business since that’s a much bigger market than selling scientists statistics software. Personally, I think SAS is a tough competitor since they created this market and with 2008 revenues of $2.2b, are 5x as big as SPSS. More importantly, the privately held Cary, NC company has an admirable corporate culture that Google once studied to understand how to motivate and empower technical professionals.
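As a quick sanity check on those numbers, the deal price and the revenue multiple together imply a revenue figure for SPSS, which can then be compared with SAS. This is a back-of-the-envelope sketch in Python using only the figures cited in this post. Note that it mixes anticipated SPSS revenue with 2008 SAS revenue, so treat the ratio as rough.

```python
# Figures cited in this post (billions of US dollars).
deal_price_bn = 1.2      # IBM's offer for SPSS
revenue_multiple = 2.6   # offer as a multiple of anticipated SPSS revenues
sas_revenue_bn = 2.2     # SAS 2008 revenues

# Anticipated SPSS revenue implied by the price and the multiple.
implied_spss_revenue_bn = deal_price_bn / revenue_multiple

# How many times larger SAS is, on these (not quite comparable) figures.
sas_to_spss = sas_revenue_bn / implied_spss_revenue_bn

print(f"Implied SPSS revenue: ${implied_spss_revenue_bn * 1000:.0f}m")
print(f"SAS is about {sas_to_spss:.1f}x the size of SPSS")
```

That works out to a bit under half a billion dollars in SPSS revenue, and a ratio a little short of 5x.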

IBM has a mixed record on software. They have a very successful software arm, and so (unlike Intel spending $884m to buy WindRiver) they understand the creation and sale of software. On the other hand, IBM’s largest software acquisition, spending $3.5b in 1995 to buy Lotus Development (instead of Apple), turned out to be the purchase of a declining business.

SPSS will have formidable competitors. SAS has rejected acquisition feelers with CEO/founder Jim Goodnight growling that “IBM and SAP acquire because they're so stagnant they're unable to grow themselves.”

More seriously, SPSS has a major open source competitor that’s gaining favor here among Silicon Valley dataminers (including at Google). The R software package was begun in 1996 as an open source knock-off to S from Bell Labs, and with nearly 2000 donated extensions, has the most vibrant third party community of any data analytics package. The popularity of the R platform has exploded in the past 4 or 5 years, and certainly SPSS must be feeling the competition.

So is this another example of IBM (as with Lotus) buying a software business too late? Or (also as with Lotus) will integrating and aggregating the SPSS solutions with its other software and services create a value for the SPSS software that would not be available to a stand-alone company?

Thursday, January 15, 2009

R challenge to proprietary stat software

The NY Times published a glowing story Tuesday (with a follow up blog posting Wednesday) on the success of R, an open source project that has grown to fill most statistical software needs.

R was launched in 1996 as a knockoff of the Bell Labs statistical programming language, S. As with Apache or Perl, much of the value comes from add-on packages, and it has grown a remarkable library of donated packages stored at CRAN, which is modeled on Perl’s CPAN.

I first came across R when running the MacStats website in the late 1990s, and recommended it to fellow academics (interested in stat software and notoriously cheap) back in August 2000.

From an economic or organizational standpoint, R is just a new act of the original open source story: user-innovators solving their own problems. Or, as Eric Raymond observed a decade ago, good software (especially open source software) comes from “scratching a developer's personal itch.” That scratching gave us Project GNU, with programming language compilers, a text editor, and gradually bits and pieces of an operating system.

Once upon a time, statisticians had to write their own Fortran programs to solve their analyses. Even today, most have better math and computer skills than the average college graduate.

So R — as with compilers — had a large pool of potential users who could write their own code. (Unlike, say, those who write children’s edutainment software). Also, university professors have autonomy over use of their time — organizational slack — but often not a lot of discretionary cash. So spending a few days to write a library — rather than buying a $100 or $500 off-the-shelf package — made certain economic sense.

When I was first evaluating R in 2000, the problem was the lack of a GUI. Statistics teachers often could program but undergraduate psych (or business) students could not, nor would they be keen on navigating a line-oriented program.

To make a GUI solution available, R had a Windows version, and started on Mac OS X with an X11 (Unix workstation GUI) implementation. Now it has a native UI version for OS X, in addition to Windows, Linux (4 flavors) and Solaris. It has scientific, social science, probability and domain-specific statistical packages contributed by users. Where once social scientists fought to find any implementation of partial least squares — since Herman Wold, the implementor of the original PLS package, died in 1992 — there are now at least 4 PLS packages available (free) for R.

When I was recommending R back in 2000, it was rough but obviously ambitious in its goals. It’s gratifying to see how it’s evolved to success (and fame), even if it doesn’t teach us anything new about strategies for growing autonomous open source communities.

Saturday, May 17, 2008

Lies, dam lies, and advertisements

A phrase attributed to Disraeli (or perhaps Twain) is “There are three kinds of lies: lies, damned lies, and statistics.”

It was the key quote from one of my favorite books of my childhood math geek days: Darrell Huff’s classic How to Lie With Statistics, which (Wikipedia claims) is the most widely read statistical text of the past 50 years. Alas, despite Huff’s wide distribution, such lies (such as truncated graphs) remain popular, particularly in the popular press.

Lying seems to be taken for granted in advertising. People can say things that aren’t true (“I lost 20 pounds in one week”) because that’s marketing license and thus people discount such claims. Still, the FTC has rules that say ads are deceptive if, by omitting key information, they would mislead a reasonable person. So nowadays the trick is to add small fine print, briefly flashed on the screen.

Even within this context, the AT&T ad of the past several months has been bothering me. It claims “best coverage”. But of course, that’s not true - according to Consumer Reports, AT&T places fourth after Verizon, Alltel and T-Mobile. Call quality is not a new problem for AT&T Wireless (née Cingular).

The footnote says “based on global coverage.” (Of course, AT&T has no coverage outside the U.S., just roaming agreements with other carriers). So for the fraction of minutes used by the 1% of Americans who use cellphones overseas, you might get better cell phone coverage. That’s assuming you’re willing to pay outrageous roaming rates, although it’s not clear how AT&T has better coverage than T-Mobile or a dual-mode CDMA phone. Meanwhile, the claim that a dad will get better coverage on lover’s lane because his AT&T phone roams to London is not misleading, it’s a lie.

The other lie, a new one this weekend that pushed me over the edge, is the claim that the latest Narnia movie is “even better than the first.” The first cognitive disconnect came with the review in my morning paper, which called it clichéd and predictable due to hollow characters and wooden acting. However, the weekend onslaught of Disney ads is claiming Prince Caspian is “triumphant” and “the must-see film of 2008”. This is traditional movie hype, but when I couldn’t read the names of the critics or the periodicals on my 26" TV, I got even more suspicious.

Sure enough, the only recognizable publication among the list of favorable “Caspian” quotes was CNN. Except that critic Gorman Woodfin is not a critic at CNN (founded by Ted Turner, who called Christians “losers”), but instead is at CBN (founded by Pat Robertson, who wants Christians to take over the country). Given the status of C.S. Lewis as an iconic Christian philosopher and the Narnia novels as Christian allegory, the difference matters.

In strategy, we often expect ethical corner-cutting from schlocky little companies (or young high-growth companies like WorldCom). Here AT&T and Disney are just the opposite. Both are Fortune 100 companies, and Disney is a top 10 global brand, even if the US-only AT&T is not.

So is this ethical decay in the executive suite? Another rationalization that “everybody does it”? I don’t know the explanation, but it’s not encouraging.