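Hadley was born in Hamilton, New Zealand, into a family of statisticians. His father, Brian Wickham, holds a PhD in statistics applied to animal breeding from Cornell University, and his sister earned a PhD in statistics from the University of California, Berkeley.
Like Hadley, the R programming language also comes from New Zealand. R was created in 1993 by the statisticians Ross Ihaka and Robert Gentleman at the University of Auckland. It is used mainly for data analysis, but it has some quirks (such as the way data structures are indexed and the way data is held in physical memory), so users of other programming languages mostly find it strange. After using Java, VBA and PHP, Hadley found R to be “different”. “(Many programmers) think R is absurd and clumsy. I don't think so,” he says. “I think R is very interesting.”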
Having seen your picture, some followers ask, “With such a pretty face, why don't you make a living from your looks instead of coding?” Fans often praise their idols this way: “With such a perfect face, (Johnny Depp, etc.) could live well on looks alone, yet he still works hard to improve his acting.” So what's your reason for coding?
I love coding for two main reasons. Firstly, I really enjoy figuring out the underlying structure behind problems that on the surface seem very different. For example, I found it very satisfying to develop the ideas behind tidy data and the tidyr package because I enjoyed figuring out the deeper underlying theory.
Secondly, I really enjoy programming because it helps other people. Producing R packages is a great way to turn my ideas into tools that other people can take advantage of, and I enjoy all the feedback that I get from the R community. Hearing that people are using my code and finding it useful is one of the things that keeps me motivated.
R Packages is available for free online. Aren't you worried that this may reduce sales of the print version? And why publish the book at all, given that there is little financial incentive?
My goal in writing books is not to make money, but to reach as many people as possible. I think making the book available in both forms achieves this goal well. Younger people who don’t have a lot of money to spend on books can use the website. People who enjoy reading physical books can still buy one, and the marketing around a physical book is more likely to reach people who aren’t as active on the internet.
I know R Packages was written in the open. Could you describe your experience with crowd-sourcing?
I think writing a book in the open is a truly excellent way to write. One of the challenges of writing a book is that it is a large project that can take one or more years. It’s hard to maintain excitement and motivation about such a big project. However, when you write in the open, you constantly get feedback. This makes it much easier to stay motivated!
I’m also quite bad at proofreading, and I really enjoy that the R community can contribute through GitHub pull requests to fix all of my silly mistakes! People also contribute larger fixes, and point out other problems with the text. Altogether, writing in the open makes the book much better than it would otherwise be!
Could we compare R package development to API design? In addition to encapsulation, robustness and usability, is there anything else that needs special attention?
I think there are some general principles that make my packages work together particularly smoothly. Currently, those principles are mostly intuitive to me: I know what to do, but I can’t explain it well enough for other people to learn. I am trying to change that by writing up the principles that underlie the “tidyverse”, and you can find my first attempt at https://github.com/hadley/tidyverse/blob/master/vignettes/manifesto.Rmd. I think these are important principles for the design of R packages because they make an API feel like R, and help packages work together naturally.
R was designed for data analysis, but it has some quirks, such as the way data structures are indexed and the requirement that data be stored in physical memory. Do you think R could borrow from the memory management approaches of C++ and Spark?
R is not perfect, but I think it does a really good job of making the human data analyst as effective as possible. R is a very flexible language, which means that it’s possible to design domain specific languages like ggplot2 and dplyr that help solve certain subdomains of the data analysis problem. That flexibility has its downsides: generally slower performance. I think it’s worthwhile to have different languages for different domains: R is great for making humans efficient at doing data analysis; C++ is great for making computers calculate as efficiently as possible. I personally don’t believe it’s possible to have one language that does both. (In other words, I believe in Ousterhout’s dichotomy, https://en.wikipedia.org/wiki/Ousterhout%27s_dichotomy)
Statistics and data analysis in R have unique advantages, but efficiency is low. Could C interfaces be used in R package development to build efficient components more easily?
Yes, and many, many packages now use Rcpp and C++ to do exactly that. As we see more experienced programmers learn R, and more R users become experienced programmers, I think we will see more and more packages that are designed for high efficiency.
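To make the idea concrete, here is a minimal sketch in plain C++ of the kind of tight numeric loop that package authors tend to move out of R. The function name `cumulative_sum` is hypothetical and chosen only for illustration, and the sketch uses nothing beyond the standard library. In an Rcpp-based package, a function like this would additionally include the Rcpp header and carry an `// [[Rcpp::export]]` attribute so that it can be called from R; that wiring is left out here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: running (cumulative) sum of a numeric vector.
// Explicit element-by-element loops like this one tend to be slow when
// written in interpreted R code, but compile to a tight loop in C++.
std::vector<double> cumulative_sum(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        total += x[i];   // carry the running total forward
        out[i] = total;  // store the partial sum at position i
    }
    return out;
}
```

The R side of such a package keeps the friendly interface, while the compiled function does the heavy computation, which matches the division of labour between the two languages described above.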
Microsoft and IBM have adopted R. There are also commercial companies, such as H2O, that provide R packages with better performance. What do you think of companies' influence on R development?
I think it’s a great sign of R’s continued evolution and its growing up as a programming language. R is now a critical part of many companies, and that means that there will be more resources to work on R generally. One particularly exciting initiative that I’m involved with is the R Consortium (https://www.r-consortium.org). This is a way for companies to give back to the R community, and have their money be spent to make R better for everyone.
According to you, RStudio is the best development environment for R users. A few readers are concerned that your books might be too focused on RStudio, and suggest it would be better to decouple them from RStudio integration.
There are other ways to use R apart from RStudio, and the most popular tool after RStudio is ESS (Emacs Speaks Statistics). These tools are powerful, but because they’re more tailored for advanced users, I’ve chosen to focus on RStudio in my books. I think that’s a reasonable trade-off: if you don’t use RStudio, you can just ignore the bits that don’t apply (and you’re probably a more experienced R programmer, so you are able to figure out the equivalents yourself).
You've contributed so much to R, particularly in R packages. How could you be so productive?
Here are a few more thoughts from a personal perspective.
Writing. I have worked really hard to build a solid writing habit - I try and write for 60-90 minutes every morning. It's the first thing I do after I get out of bed. I think writing is really helpful to me for a few reasons. First, I often use my writing as a reference - I don't program in C++ every day, so I'm constantly referring to @Rcpp every time I do. Writing also makes me aware of gaps in my knowledge and my tools, and filling in those gaps tends to make me more efficient at tackling new problems.
Reading. I read a lot. I follow about 300 blogs, and keep a pretty close eye on the R tags on Twitter and Stack Overflow. I don't read most things deeply - the majority of content I only briefly skim. But this wide exposure helps me keep up with changes in technology, interesting new programming languages, and what others are doing with data. It's also helpful if, when you're tackling a new problem, you can recognise its basic name - then googling for it will suggest possible solutions. If you don't know the name of a problem, it's very hard to research it.
Chunking. Context-switching is expensive, so if I worked on many packages at the same time, I'd never get anything done. Instead, at any point in time, most of my packages are lying fallow, steadily accumulating issues and ideas for new features. Once a critical mass has accumulated, I'll spend a couple of days on the package.
Finally, it's hard to over-emphasise the impact that working full-time on R makes. Since I left Rice, I have spent well over 90% of my work time thinking about and programming in R. This has a compounding effect because as I build better tools (cognitive and computational) it becomes even easier to build new tools. I can create a new package in seconds, and I have many techniques on-hand (in-brain) for solving new problems.