Anshu Aggarwal is currently Vice President of Technology at 1 efficiency. He has 15 years of experience in programming and technical leadership, both as a startup founder and at large companies. His specialty is designing, developing, and deploying high-performance systems. In 2008 he founded Zebek, the world's first object-based personalized search engine, which was acquired by Quewey in 2011. Before that, he worked at Inktomi, where he wrote its HTTP processing engine. Anshu holds a PhD and an MS in computer science from the University of Colorado Boulder, as well as an MS in electrical engineering and a BS in computer engineering from Boston University. He is a co-author of HTTP: The Definitive Guide.
HTTP: The Definitive Guide was published in 2002, and ten years later it has been published in China. Though it is an 'old book,' it has been warmly welcomed by Chinese web developers. What do you think the reasons might be?
The HTTP/1.1 RFC is a very well-thought-out specification: it defined a protocol that is used extensively and performs at a very high level. I think it has taken over a decade for the technology to catch up, and even now the limitations of the protocol have not been fully exposed, so it continues to be popular and will remain so for the foreseeable future.
In some ways it is like the next generation of TCP/IP. When the book was first published, many people were still writing network code, and that was fairly well supported by languages like C/C++. But network-level programming is difficult to write and absolutely horrible to debug. It was necessary for performance, but hardware builders and software developers have both moved up the stack. Hardware builders have made TCP/IP solid and fast by supporting it extensively in hardware, and software developers no longer need to worry much about it. Instead, they can concentrate on HTTP and build new things assuming it exists.
It is because of this that HTTP has become a fundamental transport protocol. It is easy to use, widely supported through high-level constructs in most web programming languages (e.g. Java, PHP, Ruby, etc.), and very extensible. The RFC defines ways to extend the protocol, for example through extension headers, in such a way that legacy systems are not impacted. It truly is very versatile, because most software applications use it as is (there are no vendor-specific quirks).
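Extension headers are easy to see in the wire format itself. A minimal sketch (the `X-Priority` header name is purely illustrative): a recipient that does not recognize an extension header simply ignores it, which is what keeps legacy systems unaffected.

```python
# An HTTP/1.1 request carrying a hypothetical extension header.
raw_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "X-Priority: low\r\n"   # extension header: unknown recipients ignore it
    "\r\n"
)

def parse_headers(request):
    """Split a raw request into a {name: value} header map."""
    headers = {}
    for line in request.split("\r\n")[1:]:  # skip the request line
        if not line:                        # blank line ends the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers

headers = parse_headers(raw_request)
print(headers["X-Priority"])  # → low
```

A server written before `X-Priority` existed parses the same request without error; the unknown entry just sits unused in its header map.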
The latest version of HTTP is HTTP/1.1, which was finalized in 1999. In the past 14 years the internet has developed dramatically; why has there been no updated version?
There has been some activity to develop HTTP-NG and now HTTP 2.0. It comes down to motivation. When HTTP first appeared, the 0.9 version was very, very limited, and HTTP/1.0 was really the first usable version. But it had severe performance limitations. HTTP/1.1 was proposed to address many of them by introducing persistent connections, Range requests, and support for intermediate network elements such as proxies and caches. Combined, these enhancements led to a dramatic increase in the performance of HTTP-based systems.
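Of the HTTP/1.1 features just mentioned, Range requests are the simplest to sketch: they let a client fetch part of a resource, e.g. to resume an interrupted download. A hedged sketch that only builds the header value (no network call):

```python
def range_header(start, end=None):
    """Build a Range header asking for bytes start..end (inclusive).
    Omitting end requests everything from start to the end of the resource."""
    suffix = "" if end is None else str(end)
    return {"Range": "bytes=%d-%s" % (start, suffix)}

# Resume a download from byte 1000 onward:
print(range_header(1000))    # → {'Range': 'bytes=1000-'}
# Fetch only the first 512 bytes:
print(range_header(0, 511))  # → {'Range': 'bytes=0-511'}
```

A server that supports ranges answers with a 206 Partial Content response containing only the requested bytes; one that does not simply returns the whole resource with a 200.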
Software applications using HTTP performed well, and where performance suffered, it wasn't so much because of network latencies as because of client-side limitations. I think it has taken almost a decade for web content and client-server infrastructure to catch up. Now, with the proliferation of heavy multimedia and extensive Ajax-based websites, we are approaching the limitations of HTTP/1.1, mainly in how requests are sent across connections.
Many solutions are economics-driven. HTTP/1.1 allowed high-performance implementations, and it wasn't until Google came up with SPDY that there was a compelling new proposal to increase performance to such an extent that existing software applications would be inclined to implement it. Disk I/O has been a bottleneck for the last several decades, and now, with the decreasing cost of solid-state memory and the many server-side caching solutions, the time has come for further improvements in network communication. The new HTTP 2.0 protocol attempts to address these with connection multiplexing. I think in a few years these enhancements will lead to adoption of HTTP/2.0.
The wide spread of mobile applications and the availability of Wi-Fi infrastructure have reshaped demand on the internet, and Google's SPDY is one answer to that. What do you think of HTTP's future (HTTP-NG)?
See above. I think SPDY is very interesting and I am excitedly watching its progress. I feel that HTTP/1.1 was defined in a very well-thought-out way, with a lot of foresight. SPDY needs to be the same. The pace at which web technologies are changing is dramatic; SPDY needs to address not just today's and tomorrow's needs but also the needs of a decade from now.
Some readers remarked that they needed to read HTTP: The Definitive Guide before they could fully understand Representational State Transfer (REST). REST was initially described in the context of HTTP; are there any other hidden gems in this book?
I think a careful reading of the book will allow astute readers to discover ways to maximize the utility of caching technologies and connection use. Everyone expects communication between client and server to be seamless, and people's patience with the browser is decreasing. The infinite-scrolling concepts introduced by Pinterest and Facebook have become de rigueur, as have the automatic saves that Google Docs made popular. Both of these techniques can benefit from smart use of connections and caching (and, of course, server application design).
Further, the book discusses delta encoding. Think about it: 14 or 15 years after HTTP/1.1 came out, the new SPDY protocol talks about header compression, something so basic. There is probably plenty of such low-hanging fruit that smart developers can use to improve performance.
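The header-compression point is easy to demonstrate: request headers are highly repetitive across the requests of a session, so even generic compression shrinks them dramatically. SPDY and HTTP 2.0 use dedicated schemes, but a sketch with plain zlib (the header values below are illustrative) shows why the fruit hangs so low:

```python
import zlib

# Typical browser headers, resent essentially verbatim on every request.
headers = (
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n"
    "Accept: text/html,application/xhtml+xml\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Cookie: session=abc123\r\n"
)
session = headers * 20  # 20 requests' worth of near-identical headers
raw = session.encode()
compressed = zlib.compress(raw)
print(len(raw), "->", len(compressed))  # repetition compresses very well
```

The exact ratio depends on the headers, but repeated blocks like these routinely shrink by an order of magnitude, which is bandwidth saved on every single request.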
As a search engine expert, how do you see the search engines of social networking services (SNSs)? Would they pose a threat to traditional search engines?
About six years back I started a company called Zebek to address the low quality of results in local search. I used the concept of directed weighted graphs to build a generalized model for search relevance. It generalized web-based ranking by replacing the hyperlinked web page with any node connected to another node. For example, a business, its review by a user, published on a website, forms a 4-node directed graph (the business is a node, the review is a node, the user is a node, and the website is a node). Using this concept you can provide relevance when someone searches for the business. Zebek used this methodology to build gigantic directed graphs that could give each user personalized relevance for whatever they searched for.
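The 4-node example can be sketched in a few lines. This is a toy illustration under my own assumptions (the edge weights and the multiply-along-the-path scoring are invented for the example, not Zebek's actual model): relevance flows from one node to another along weighted edges.

```python
# Directed weighted edges of the 4-node example: (source, target) -> weight.
edges = {
    ("user", "review"): 1.0,      # the user wrote the review
    ("review", "business"): 0.9,  # the review is about the business
    ("website", "review"): 0.5,   # the website published the review
}

def relevance(source, target, edges, depth=0, max_depth=3):
    """Score source->target by multiplying weights along the best path."""
    if source == target:
        return 1.0
    if depth >= max_depth:
        return 0.0
    best = 0.0
    for (a, b), w in edges.items():
        if a == source:
            best = max(best, w * relevance(b, target, edges, depth + 1, max_depth))
    return best

print(relevance("user", "business", edges))  # → 0.9 (user -> review -> business)
```

Even in this toy form, the key property is visible: relevance between two entities emerges from the structure connecting them, with no hyperlinked page required.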
If you stay with this model, the content on SNSs is perfect for building these directed graphs. And the answer is yes: applied correctly, the concept can lead to much more relevant search results. Zebek attempted it for local search only, but it can be applied to anything. The big thing I learned at Zebek is that the primary links in an SNS are not necessarily useful; it is the secondary and tertiary links that are more interesting. For example, when I am searching for Sichuan food, just using information about what my friends like is not enough (and sometimes it is wrong).
The trick is to somehow combine SNS data with standard search-engine data. The only company in a position to do that is Google, because with G+ they have some aspects of a social network.
An HTTP processing engine and building energy efficiency seem worlds apart. What inspired you to make this transition in your career?
I jumped into the clean energy space for two reasons: (1) I believe in increasing the efficiency of existing systems, and (2) I wanted to learn how to apply what I know to a completely different field. In the end, the fundamental concepts are the same: one analyzes large amounts of data to extract meaningful insight. Energy efficiency is just that. By analyzing how a building's energy usage varies with factors such as weather, occupancy, construction, and usage, one can figure out ways to reduce consumption.
When I was working on HTTP, the motivation was similar: some ISPs had trouble supplying enough electric power to their servers to handle the volume of traffic passing through them. So we built them a web cache, which reduced their traffic volume by almost 40%; with the same resources they were able to handle the same amount of client traffic. A decade later, I am looking at a very similar problem.
Many Chinese readers are very interested in your current work. Could you explain to us how software can improve a building's energy efficiency?
In its most basic form, it is machine learning. Again, the interesting problems are economics-driven. The electricity delivered to a building passes through several different players: the producer, the transmitter, and the deliverer. Each buys and sells electricity, and rates are set by market demand.

Take the energy delivery company (the one that sends you the bill at the end of each month). It has to build infrastructure to support the maximum amount of energy that will be consumed; the greater the consumption, the higher its infrastructure cost. It also has to get the electricity from the transmitter. The transmitter has a similar issue, but it may in addition need contracts with multiple energy suppliers. Suppliers offer a lower rate to buyers who commit to purchasing a certain amount of energy, and higher rates for "on-demand" energy. Now suppose the building starts to use a lot of energy. A lot of things happen: the deliverer has to get more energy from the transmitter, who has to get more energy from the supplier. Since this is on-demand, the supplier raises its rates; the transmitter's costs go up, and with the increased energy flow the performance of the supplier's system drops, so its costs go up too.
Everybody punishes everybody else, and the cost is borne by the user. In fact, the way the system works, the end user's rate for a given month is set by the highest 15-minute usage of electricity in the previous month. So if you turn on a few extra air conditioners in the middle of the day, it will reset your rate for the whole of the next month.
This is an interesting problem. The questions, of course, are how to warn users to reduce their energy use when it looks like they are about to set a new record, how to tell the deliverer how much energy to expect to deliver next month, and how much energy the transmitter should pre-purchase from the supplier to get a better rate.
These problems can be addressed with modeling and machine intelligence.
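The demand-charge rule described above reduces to a concrete computation: with one meter reading per minute, the billed peak is a sliding-window maximum over 15 samples. A minimal sketch (the numbers and function name are illustrative, not from any real metering system):

```python
def peak_15min_kw(readings_kw, window=15):
    """Return the highest 15-minute average in a list of per-minute kW readings."""
    if len(readings_kw) < window:
        raise ValueError("need at least one full window of readings")
    return max(
        sum(readings_kw[i:i + window]) / window
        for i in range(len(readings_kw) - window + 1)
    )

# A flat 100 kW day with one 15-minute spike to 160 kW (a few extra ACs):
day = [100.0] * 1440            # 24 hours of per-minute readings
day[720:735] = [160.0] * 15     # the midday spike
print(peak_15min_kw(day))       # → 160.0, and this sets next month's rate
```

Warning the user is then a matter of running this window in real time and alerting when the current window approaches the month's record.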
What is the most challenging part of your job now? And what is the most rewarding part?
The two most difficult things are to understand the economics well enough to provide each member of the energy supply chain with valuable information and to get adequate real-time data to do the necessary analysis.
The most rewarding part is to see how plain and simple computer science can be used to reduce energy consumption and do something good for the environment.
Big data is beginning to change the world. What do you find the most intriguing prospect for big data in the future?
Prediction. I think there is a lot more here. Big data will be used more and more for prediction, and I think what will emerge is that, as much as we would each like to think we are unique, our behavior, when analyzed by inanimate and boring computers, can be used to predict our decisions. This will be humbling but fascinating. The human aspect will be very interesting to watch: how will the next generation handle machines and computers that know them better than they know themselves?
What advice would you like to give to young developers?
Code. As much as you can. Code for your friends. For your family. Each time you finish one project, no matter how small, take on a similar one, but change something about it to force yourself to learn. For example, help a friend develop a WordPress site. Once that is done, for your next friend, add fancy widgets; for the next, build a site from scratch; for the next, write it in Ember.js; for the next, use NoSQL; for the next, find a way to learn Mahout; for the next, do it in Python; for the next, do web analytics; for the next, write a framework for A/B testing; for the next, code something on an Arduino; for the next, a Raspberry Pi; etc.
Many years ago, I went skiing in Colorado with a friend who was a very, very good skier. I was not very good so I went on the gentle ski slope. She went with me and I asked her why she was wasting her time on a gentle slope. She said, “I can practice my turns and make them better on the gentle slope. I won’t waste my time.”
No matter how easy the task, there is probably something you can learn from it. So do not hesitate to take on tasks.
There is nothing like hands-on practice. It is the best way to learn. It is much easier to read a book; close the book and code. It is harder. Push yourself. Get out of your comfort zone. Don't be afraid to look like a fool and ask for help. In a class, the students who ask questions are typically the ones who have understood the lecture.
The world is changing. You will have to keep on adapting. I have been coding for the last 20 years but I still try to push myself in order to keep my brain nimble and stay current. Henry Ford invented the assembly line when he was in his 60s. It is never too early or too late to learn.