[vox-tech] Re: vox-tech Digest, Vol 14, Issue 11

Norm Matloff matloff at cs.ucdavis.edu
Wed Jul 13 12:31:04 PDT 2005


> Date: Tue, 12 Jul 2005 12:18:58 -0700
> From: Norm Matloff <matloff at cs.ucdavis.edu>
 
> In my opinion, as someone with a background in both CS and statistics
> (I'm a former statistics professor), R is the best package, either open
> source or commercial.  (I even consider it a little better than S-Plus, a
> commercial product which it is closely related to.)  It is statistically
> correct (which arguably SPSS is not), and it is a general programming language,
> something people on this list can relate to.  

> Date: Tue, 12 Jul 2005 12:33:21 -0700
> From: Sameer Verma <sverma at sfsu.edu>

> Hi Norm,
> Can you speculate on why SPSS is not statistically correct? I have 
> rarely used SPSS (I use SAS), but curiosity just killed the cat :-)

The reputation of several popular statistical packages over the years
has been that they have not generally been developed by "hard core"
statisticians.  Instead, they've been developed by programmers who are
blindly applying statistical formulas they see in books.  This
reputation is possibly a bit exaggerated, but I believe it is largely
correct.

There are two problems that can arise from this:

   (a) The set of statistical operations offered by the packages does
       not include more modern, insight-producing statistical 
       methodology. 

   (b) The accuracy, in the numerical analysis sense, of the packages
       is often poor, which in some cases can produce serious
       distortions in the statistical analysis.

For example, concerning (b), see the work by Bruce McCullough, an
economist with the FCC, which found both SPSS and SAS to be inferior to
S+, which as I said is the commercial sister of R.  (S+ basically adds a
GUI, though there are some other differences too.)
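
As an illustration of the kind of numerical problem meant in (b), here
is a minimal Python sketch.  It is not taken from the work cited above,
and it makes no claim about how any particular package computes a
variance; the function names and the data are made up for the example.
It compares the one-pass "textbook" variance formula with a two-pass
computation on data that has a small spread on top of a large offset.

```python
def variance_one_pass(xs):
    """Textbook shortcut: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(xs):
    """Compute the mean first, then sum squared deviations from it."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Small spread on top of a large offset; the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

one_pass = variance_one_pass(data)
two_pass = variance_two_pass(data)
print("one-pass:", one_pass)
print("two-pass:", two_pass)
```

In double precision the squared terms here are around 4e18, where
adjacent representable numbers are hundreds apart, so the one-pass
formula subtracts two nearly equal rounded quantities and cannot come
close to 30.  The two-pass version works with small deviations and
recovers the correct value.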

S (the original pre-GUI name of S+) was developed at Bell Labs by some
big names in statistics.  Some academic statisticians then developed R
as an open source version, and it has been further developed with
contributions from statisticians all over the world, some of them very
prominent people.  

So, R is generally considered to be much sounder on a purely statistical
basis than SPSS, SAS, Stata, etc.  Meanwhile, it is considered much
superior from a computer science point of view as well.  Its Bell Labs
genesis (the name S was a play on C) meant that it was designed to be a
usable programming language, not just a collection of stat routines.  

R/S+ is object-oriented.  For example, when one performs a statistical
operation, the return value is an object, whose member fields contain
the various aspects of the results.  This makes it easy to feed the
output of one operation into another one.  
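
The idea of a result object can be sketched as follows.  This is a
Python analogy only, not R/S+ code, and every name in it (LinearFit,
fit_line, residual_sum_of_squares) is hypothetical.  A fitting routine
returns an object whose fields hold the pieces of the result, and a
second routine takes that object as its input.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LinearFit:
    """Result of a straight-line fit; each aspect of the result is a field."""
    slope: float
    intercept: float
    residuals: list


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return LinearFit(slope, intercept, residuals)


def residual_sum_of_squares(fit):
    """A second operation that consumes the result object of the first."""
    return sum(r * r for r in fit.residuals)


x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # lies exactly on y = 2x + 1

fit = fit_line(x, y)                # the fit is an object ...
rss = residual_sum_of_squares(fit)  # ... that feeds the next operation
print(fit.slope, fit.intercept, rss)
```

Because the fit is an ordinary value, it can be stored, inspected field
by field, or passed along to further routines without reparsing any
printed output.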

As a general purpose programming language, R/S+ is somewhat similar to
Python.  It has both interactive and batch modes; in interactive mode,
one can print any object simply by typing its name; one can have named
arguments in function calls; etc.  Of course, it doesn't have the
elegance of Python, but for statistical applications its programmability
is really a delight.

R/S+ has very nice graphics capabilities.  This should be no surprise,
in light of the fact that the original authors of S at Bell Labs were
pioneers in the statistical graphics field.

R/S+ is multiplatform (the various Unixes, Windows, Macs).

Norm


