Congratulations! You've just begun your quest to become an R programmer. So you don't pull any mental muscles, this chapter starts you off gently with a nice warm-up. Before you begin coding, we're going to talk about what R is, and how to install it and begin working with it. Then you'll try writing your first program and learn how to get help.
After reading this chapter, you should:
- Know some things that you can use R to do
- Know how to install R and an IDE to work with it
- Be able to write a simple program in R
- Know how to get help in R
What Is R?
Just to confuse you, R refers to two things. There is R, the programming language, and R, the piece of software that you use to run programs written in R. Fortunately, most of the time it should be clear from the context which R is being referred to.
R (the language) was created in the early 1990s by Ross Ihaka and Robert Gentleman, then both working at the University of Auckland. It is based upon the S language that was developed at Bell Laboratories in the 1970s, primarily by John Chambers. R (the software) is a GNU project, reflecting its status as important free and open source software. Both the language and the software are now developed by a group of (currently) 20 people known as the R Core Team.
The fact that R's history dates back to the 1970s is important, because it has evolved over the decades, rather than having been designed from scratch (contrast this with, for example, Microsoft's .NET Framework, which has a much more "created" feel). As with life-forms, the process of evolution has led to some quirks and inconsistencies. The upside of the more free-form nature of R (and the free license in particular) is that if you don't like how something in R is done, you can write a package to make it do things the way that you want. Many people have already done that, and the common question now is not "Can I do this in R?" but "Which of the three implementations should I use?"
R is an interpreted language (sometimes called a scripting language), which means that your code doesn't need to be compiled before you run it. It is a high-level language in that you don't have access to the inner workings of the computer you are running your code on; everything is pitched toward helping you analyze data.
R supports a mixture of programming paradigms. At its core, it is an imperative language (you write a script that does one calculation after another), but it also supports object-oriented programming (data and functions are combined inside classes) and functional programming (functions are first-class objects; you treat them like any other variable, and you can call them recursively). This mix of programming styles means that R code can bear a lot of similarity to several other languages. The curly braces mean that you can write imperative code that looks like C (but the vectorized nature of R that we'll discuss in Chapter 2 means that you have fewer loops). If you use reference classes, then you can write object-oriented code that looks a bit like C# or Java. The functional programming constructs are Lisp-inspired (the variable-scoping rules are taken from the Lisp dialect, Scheme), but there are fewer brackets. All this is a roundabout way of saying that R follows the http://bit.ly/148zbcF[Perl ethos]:
There is more than one way to do it.
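As a small illustration of that ethos, here is a minimal sketch that computes the same quantity (the sum of the squares of 1 to 5) in three of the styles described above. The variable names are invented for this example, and the syntax is covered properly in later chapters.

```r
# Imperative style: a loop updates a running total one step at a time
total_loop <- 0
for (x in 1:5) {
  total_loop <- total_loop + x^2
}

# Vectorized style: square the whole vector at once, then sum it
total_vectorized <- sum((1:5)^2)

# Functional style: pass an anonymous function to sapply, then fold with Reduce
squares <- sapply(1:5, function(x) x^2)
total_functional <- Reduce(`+`, squares)

# All three totals are 55
c(total_loop, total_vectorized, total_functional)
```

The vectorized version is usually the most idiomatic of the three in R, but all of them give the same answer.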
Installing R
If you are using a Linux machine, then it is likely that your package manager will have R available, though possibly not the latest version. For everyone else, to install R you must first go to http://www.r-project.org. Don't be deceived by the slightly archaic website; it doesn't reflect on the quality of R. Click the link that says http://cran.r-project.org/mirrors.html["download R"] in the "Getting Started" pane at the bottom of the page.
Once you've chosen a mirror close to you, choose a link in the "Download and Install R" pane at the top of the page that's appropriate to your operating system. After that there are one or two OS-specific clicks that you need to make to get to the download.
If you are a Windows user who doesn't like clicking, there is a cheeky shortcut to the setup file at _http://<CRAN mirror>/bin/windows/base/release.htm_.
Choosing an IDE
If you use R under Windows or Mac OS X, then a graphical user interface (GUI) is available to you. This consists of a command-line interpreter, facilities for displaying plots and help pages, and a basic text editor. It is perfectly possible to use R in this way, but for serious coding you'll at least want to use a more powerful text editor. There are countless text editors for programmers; if you already have a favorite, then take a look to see if you can get syntax highlighting of R code for it.
If you aren't already wedded to a particular editor, then I suggest that you'll get the best experience of R by using an integrated development environment (IDE). Using an IDE rather than a separate text editor gives you the benefit of only using one piece of software rather than two. You get all the facilities of the stock R GUI, but with a better editor, and in some cases things like integrated version control.
The following sections introduce five popular choices, but this is by no means an exhaustive list (a few additional suggestions follow). It is worth trying several IDEs; a development environment is a piece of software that you could be spending thousands of hours using, so it's worth taking the time to find one that you like.
Emacs + ESS
Although Emacs calls itself a text editor, 36 years (and counting) of development have given it an unparalleled number of features. If you've been programming for any substantial length of time, you probably already know whether or not you want to use it. Converts swear by its limitless customizability and raw editing power; others complain that it overcomplicates things and that the key chords give them repetitive strain injury. There is certainly a steep learning curve, so be willing to spend a month or two getting used to it. The other big benefit is that Emacs is not R-specific, so you can use it for programming in many languages. The original version of Emacs is (like R) a GNU project, available from http://www.gnu.org/software/emacs/.
A popular fork is XEmacs, available from http://www.xemacs.org/.
Emacs Speaks Statistics (ESS) is an add-on for Emacs that assists you in writing R code. Actually, it works with S-Plus, SAS, and Stata, too, so you can write statistical code with whichever package you like (choose R!). Several of the authors of ESS are also R Core Team members, so you are guaranteed good integration with R. It is available through the Emacs package management system, or you can download it from http://ess.r-project.org/.
Use it if you want to write code in multiple languages, you want the most powerful editor available, and you are fearless with learning curves.
Eclipse/Architect
Eclipse is another cross-platform IDE, widely used in the Java community. Like Emacs, it is very powerful, and its plug-in system makes it highly customizable. The learning curve is shallower, though, and it allows for more pointing and clicking than the heavily keyboard-driven Emacs.
Architect is an R-oriented variant of Eclipse developed by statistics consultancy Open Analytics. It includes the StatET plug-in for integration with R, including a debugger that is superior to the one built into R GUI. Download it from http://www.openanalytics.eu/downloads/architect.
Alternatively, you can get the standard Eclipse IDE from http://eclipse.org and use its package manager to download the StatET plug-in from http://www.walware.de/goto/statet.
Use it if you want to write code in multiple languages, you don't have time to learn Emacs, and you don't mind a several-hundred-megabyte install.
RStudio
RStudio is an R-specific IDE. That means that you lose the ability to code (easily) in multiple languages, but you do get some features especially for R. For example, the plot windows are better than the R GUI originals, and there are facilities for publishing code. The editor is more basic than either Emacs or Eclipse, but it's good enough for most purposes, and is easier to get started with than the other two. RStudio's party trick is that you can run it remotely through a browser, so you can run R on a powerful server, then access it from a netbook (or smartphone) without loss of computational power. Download it from http://www.rstudio.org.
Use it if you mainly write R code, don't need advanced editor features, and want a shallow learning curve or the ability to run remotely.
Revolution-R
Revolution-R comes in two flavors: the free (as in beer) community edition and the paid-for enterprise edition. Both take a different tack from the other IDEs mentioned so far: whereas Emacs, Eclipse, and RStudio are pure graphical frontends that let you connect to any version of R, Revolution-R ships with its own customized version of R. Typically this is a stable release, one or two versions back from the most current. It also has some enhancements for working with big data, and some enterprise-related features. Download it from http://www.revolutionanalytics.com/products/revolution-r.php.
Use it if you mainly write R code, you work with big data or want a paid support contract, or you require extra stability in your R platform.
Live-R
Live-R is a new player, in invite-only beta at the time this book is going to press. It provides an IDE for R as a web application. This avoids all the hassle of installing software on your machine and, like RStudio's remote installation, gives you the ability to run R calculations from an underpowered machine. Live-R also includes a number of features for collaboration, including a shared editor and code publishing, as well as some admin tools for running courses based upon R. The main downside is that not all the add-on packages for R are available; you are currently limited to about 200 or so that are compatible with the web application. Sign up at http://live-analytics.com/.
Use it if you mainly write R code, don't want to install any software, or want to teach a class based upon R.
Other IDEs and Editors
There are many more editors that you can use to write R code. Here's a quick roundup of a few more possibilities:
- http://rforge.net/JGR[JGR] (pronounced "Jaguar") is a Java-based GUI for R, essentially a souped-up version of the stock R GUI.
- http://www.sciviews.org/Tinn-R[Tinn-R] is a fork of the editor TINN that has extensions specifically to help you write R code.
- http://www.sciviews.org/SciViews-K[SciViews-K], from the same team that makes Tinn-R, is an extension for the Komodo IDE to work with R.
- http://www.vim.org/scripts/script.php?script_id=2628[Vim-R] is a plug-in for Vim that provides R integration.
- http://sourceforge.net/projects/npptor[NppToR] plugs into Notepad++ to give R integration.
Your First Program
It is a law of programming books that the first example shall be a program to print the phrase "Hello world!" In R that's really boring, since you just type "Hello world!" at the command prompt, and it will parrot it back to you. Instead, we're going to write the simplest statistical program possible.
Open up R GUI, or whichever IDE you've decided to use, find the command prompt (in the code editor window), and type:
mean(1:5)
Hit Enter to run the line of code. Hopefully, you'll get the answer +3+. As you might have guessed, this code is calculating the arithmetic mean of the numbers from 1 to 5. The colon operator, +:+, creates a sequence of numbers from the first number, in this case 1, to the second number (5), each separated by 1. The resulting sequence is called a vector. +mean+ is a function (that calculates the arithmetic mean), and the vector that we enclose inside the parentheses is called an argument to the function.
Well done! You've calculated a statistic using R.
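To see the pieces of that calculation separately, here is a minimal sketch that stores the vector first and then passes it to +mean+. The variable names are made up for this illustration, and assignment with +<-+ is explained properly later.

```r
one_to_five <- 1:5              # the colon operator builds the vector 1, 2, 3, 4, 5
one_to_five                     # typing a name on its own prints the value

average <- mean(one_to_five)    # the vector is the argument to the mean function
average                         # 3

# The same calculation by hand: add up the values and divide by how many there are
by_hand <- sum(one_to_five) / length(one_to_five)
by_hand                         # 3
```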
In R GUI and most of the IDEs mentioned here, you can press the up arrow key to cycle back through previous commands.
How to Get Help in R
Before you get started writing R code, the most important thing to know is how to get help. There are lots of ways to do this. Firstly, if you want help on a function or a dataset that you know the name of, type +?+ followed by the name of the function. To find functions, type two question marks (+??+) followed by a keyword related to the problem to search. Special characters, reserved words, and multiword search terms need enclosing in double or single quotes. For example:
?mean                  #opens the help page for the mean function
?"+"                   #opens the help page for addition
?"if"                  #opens the help page for if, used for branching code
??plotting             #searches for topics containing words like "plotting"
??"regression model"   #searches for topics containing phrases like this
That # symbol denotes a comment. It means that R will ignore the rest of the line. Use comments to document your code, so that you can remember what you were doing six months ago.
The functions +help+ and +help.search+ do the same things as +?+ and +??+, respectively, but with these you always need to enclose your arguments in quotes. The following commands are equivalent to the previous lot:
help("mean")
help("+")
help("if")
help.search("plotting")
help.search("regression model")
The +apropos+ functionfootnote:[+apropos+ is Latin for "A Unix program that finds manpages."] finds variables (including functions) that match its input. This is really useful if you can only half-remember the name of a variable that you've created, or a function that you want to use. For example, suppose you create a variable +a_vector+:
a_vector <- c(1, 3, 6, 10)
You can then recall this variable using +apropos+:
apropos("vector")
##  ".__C__vector" "a_vector" "as.data.frame.vector"
##  "as.vector" "as.vector.factor" "is.vector"
##  "vector" "Vectorize"
The results contain the variable you just created, +a_vector+, and all other variables that contain the string +vector+. In this case, all the others are functions that are built into R.
Just finding variables that contain a particular string is fine, but you can also do fancier matching with +apropos+ using regular expressions.
Regular expressions are a cross-language syntax for matching strings. The details will only be touched upon in this book, but you need to learn to use them; they'll change your life. Start at http://www.regular-expressions.info/quickstart.html, and then try <<fitzgerald, Michael Fitzgerald's _Introducing Regular Expressions_>>.
A simple usage of +apropos+ could, for example, find all variables that end in +z+, or find all variables containing a number between 4 and 9:
apropos("z$")
##  "alpe_d_huez" "alpe_d_huez" "force_tz" "indexTZ" "SSgompertz"
##  "toeplitz" "tz" "unz" "with_tz"
apropos("[4-9]")
##  ".__C__S4" ".__T__xmlToS4:XML" ".parseISO8601"
##  ".SQL92Keywords" ".TAOCP1997init" "asS4"
##  "assert_is_64_bit_os" "assert_is_S4" "base64"
##  "base64Decode" "base64Encode" "blues9"
##  "car90" "enc2utf8" "fixPre1.8"
##  "Harman74.cor" "intToUtf8" "is_64_bit_os"
##  "is_S4" "isS4" "seemsS4Object"
##  "state.x77" "to.minutes15" "to.minutes5"
##  "utf8ToInt" "xmlToS4"
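The exact results depend on which packages you have loaded, so your output will probably differ from the listings above. As a sketch that should behave the same on a stock R installation, the following searches use only names from the default packages. The patterns and variable names are chosen purely for illustration.

```r
# ^ anchors the match to the start of the name; \\. matches a literal dot
starts_with <- apropos("^as\\.vec")

# $ anchors the match to the end of the name
ends_with <- apropos("mean$")

# Matching ignores case by default; ignore.case = FALSE makes it case-sensitive
case_sensitive <- apropos("^Vector", ignore.case = FALSE)

starts_with      # includes "as.vector"
ends_with        # includes "mean" and "weighted.mean"
case_sensitive   # includes "Vectorize" but not "vector"
```

The default case-insensitive matching is why the earlier search for +vector+ also returned +Vectorize+.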
Most functions have examples that you can run to get a better idea of how they work. Use the +example+ function to run these. There are also some longer demonstrations of concepts that are accessible with the +demo+ function:
example(plot)
demo()           #list all demonstrations
demo(Japanese)
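If you would rather read an example before running it, +example+ has a +give.lines+ argument that returns the example code as text. This is a small sketch, assuming the help files for the base packages are installed (they are in a standard installation); the variable name is invented here.

```r
# give.lines = TRUE returns the example source as a character vector
# instead of running it
mean_example_code <- example(mean, give.lines = TRUE)

# Look at the first few lines of the example code
head(mean_example_code)
```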
R is modular and is split into packages (more on this later), some of which contain vignettes, which are short documents on how to use the packages. You can browse all the vignettes on your machine using +browseVignettes+:
browseVignettes()
You can also access a specific vignette using the +vignette+ function (but if your memory is as bad as mine, using +browseVignettes+ combined with a page search is easier than trying to remember the name of a vignette and which package it's in):
vignette("Sweave", package = "utils")
The help search operator +??+ and +browseVignettes+ will only find things in packages that you have installed on your machine. If you want to look in any package, you can use +RSiteSearch+, which runs a query at http://search.r-project.org. Multiword terms need to be wrapped in braces:
RSiteSearch("{Bayesian regression}")
Learning to help yourself is extremely important. Think of a keyword related to your work and try +?+, +??+, +apropos+, and +RSiteSearch+ with it.
There are also lots of R-related resources on the Internet that are worth trying. There are too many to list here, but start with these:
- R has a number of http://www.r-project.org/mail.html[mailing lists] with archives containing years' worth of questions on the language. At the very least, it is worth signing up to the general-purpose list, R-help.
- http://rseek.org[RSeek] is a web search engine for R that returns functions, posts from the R mailing list archives, and blog posts.
- http://www.r-bloggers.com[R-bloggers] is the main R blogging community, and the best way to stay up to date with news and tips about R.
- The programming question and answer site http://www.stackoverflow.com[Stack Overflow] also has a vibrant R community, providing an alternative to the R-help mailing list. You also get points and badges for answering questions!
Installing Extra Related Software
There are a few other bits of software that R can use to extend its functionality. Under Linux, your package manager should be able to retrieve them. Under Windows, rather than hunting all over the Internet to track down this software, you can use the +installr+ add-on package to automatically install these extra pieces of software. None of this software is compulsory, so you can skip this section now if you want, but it's worth knowing that the package exists when you come to need the additional software. Installing and loading packages is discussed in detail later in this book, so don't worry if you don't understand the commands yet:
install.packages("installr")   #download and install the package named installr
library(installr)              #load the installr package
install.RStudio()              #download and install the RStudio IDE
install.Rtools()               #Rtools is needed for building your own packages
install.git()                  #git provides version control for your code
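Because +installr+ is aimed at Windows, a script shared with colleagues on other operating systems might guard these calls. The following is only a sketch of one way to do that. The platform check and the +interactive()+ guard are additions for this illustration, not something +installr+ requires.

```r
# .Platform$OS.type is "windows" on Windows and "unix" elsewhere
on_windows <- .Platform$OS.type == "windows"

# Only attempt the installation on Windows, and only in an interactive session
if (on_windows && interactive()) {
  # Install installr only if it is not already available
  if (!requireNamespace("installr", quietly = TRUE)) {
    install.packages("installr")
  }
  library(installr)
  install.Rtools()   # Rtools is needed for building your own packages
}
```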
Summary
- R is a free, open source language for data analysis.
- It's also a piece of software used to run programs written in R.
- You can download R from http://www.r-project.org.
- You can write R code in any text editor, but there are several IDEs that make development easier.
- You can get help on a function by typing +?+ then its name.
- You can find useful functions by typing +??+ then a search string, or by calling the +apropos+ function.
- There are many online resources for R.
Test Your Knowledge: Quiz
Which language is R an open source version of?
Name at least two programming paradigms in which you can write R code.
What is the command to create a vector of the numbers from 8 to 27?
What is the name of the function used to search for help within R?
What is the name of the function used to search for R-related help on the Internet?
Test Your Knowledge: Exercises
Visit http://www.r-project.org, download R, and install it. For extra credit, download and install one of the IDEs mentioned in "Choosing an IDE".
The function +sd+ calculates the standard deviation. Calculate the standard deviation of the numbers from 0 to 100. Hint: the answer should be about +29.3+.
Watch the demonstration on mathematical symbols in plots, using +demo(plotmath)+.