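Michael T. Nygard is a veteran programmer with more than twenty years of experience and is currently chief architect at Cognitect. He has been called a "roving problem solver" for online businesses. Nygard has delivered operational systems for the U.S. government, the military, banking, finance, agriculture, and retail. That hands-on operations experience changed how he views software architecture and gave him a distinctive perspective on building high-performance, highly reliable software in decidedly hostile environments. He has written numerous articles and editorials and is a contributing author to the classic software architecture books "Beautiful Architecture" and "97 Things Every Software Architect Should Know". His latest book, "Release It! Design and Deploy Production-Ready Software", describes in detail the problems that can arise before software is released and how to solve them. Every topic in the book is explained through real cases the author has investigated himself.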

iTuring: You mentioned on your blog that you might write some new books ("Three Book Ideas"). Have there been any developments with these books?

There's an amazing response any time you ask an author that question. He will look nervous, begin sweating, and mumble something incoherent, all while looking around wildly for the nearest exit. I'll just say that I have nothing to announce at this time.

iTuring: Some of the patterns described in Release It! are widely applied these days, such as Circuit Breaker, which Netflix's Hystrix implements quite well. Release It! was published in 2007. Eight years later, have you found any new stability or capacity patterns?

There's one major pattern that manifests in two ways: asynchronous style and reactive style. I see them as two sides of the same coin. Because many stability problems result from blocked threads, both styles help.
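To make the blocked-thread point concrete, here is a minimal sketch of the asynchronous style using Python's standard `asyncio` library. It is an illustration only, not code from Release It! or Hystrix. The names `call_with_timeout` and `slow_dependency` and the timeout values are assumptions made up for the example.

```python
import asyncio

async def call_with_timeout(coro, timeout_s, fallback):
    """Await a call, but give up after timeout_s seconds and return fallback."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback

async def slow_dependency():
    # Hypothetical stand-in for a remote service that has stopped responding.
    await asyncio.sleep(10)
    return "real answer"

async def main():
    # Both calls wait concurrently on a single thread; the hung dependency
    # does not hold a worker thread while it waits.
    return await asyncio.gather(
        call_with_timeout(slow_dependency(), 0.2, "fallback"),
        call_with_timeout(asyncio.sleep(0.01, result="fast answer"), 0.2, "fallback"),
    )

if __name__ == "__main__":
    print(asyncio.run(main()))
```

The hung call is cut off by the timeout and replaced with a fallback, while the healthy call completes normally. In a thread-per-request design, the same hung call would occupy a thread for as long as it stayed blocked.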

iTuring: Sometimes a simple mistake can bring down an entire system. Is the problem really just a single line of a programmer's code? What mechanisms can be introduced to keep complex systems stable?

Some problems really do begin with a single line of code, but there are always other factors that amplify the problem. Something may change in the external environment that causes a latent bug to manifest. Or an operator's action may cause a problem to surface with code that would normally not be executed.

Some problems, however, emerge due to the large-scale structure of the system. For example, I do not like the "entity service" model in SOA. The reason is that every application needs many entities. The laws of probability tell us that the extended system is likely to malfunction when any entity service is not working.
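The probability argument can be worked through with a small calculation. This sketch assumes that the services fail independently and that the application needs every one of them to be up. Both are simplifying assumptions, and the 99.9% availability figure is made up for illustration.

```python
def chance_all_up(availability: float, n_services: int) -> float:
    """Probability that n independent services are all up at the same moment."""
    return availability ** n_services

if __name__ == "__main__":
    # Assumed figure: each entity service is up 99.9% of the time.
    for n in (1, 5, 10, 20):
        print(n, round(chance_all_up(0.999, n), 4))
```

Under these assumptions, an application that depends on 20 such services has all of them available only about 98% of the time, so the combined system is noticeably less reliable than any single service.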

So, I try to create resilience (and even antifragility) at both the micro and macro scale. At the micro scale, I use design patterns like those in the book. At the macro scale, I analyze the "failure domains" in the system. That is, when one component (hardware or software) fails, what is the span of affected applications and features? It is often possible to separate the system into isolated failure domains by reallocating functionality among the applications and by splitting entities into facets.
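One simple way to start a failure-domain analysis is to invert a map of which features depend on which components. The sketch below is only an illustration of that idea. The feature and component names are hypothetical, and it considers direct dependencies only.

```python
from collections import defaultdict

# Hypothetical dependency map: feature -> components it needs directly.
DEPENDS_ON = {
    "browse_catalog": {"catalog_db", "web_tier"},
    "checkout": {"web_tier", "payment_gateway", "inventory_service"},
    "order_history": {"web_tier", "orders_db"},
}

def failure_domains(depends_on):
    """Invert the map: component -> set of features that break when it fails."""
    domains = defaultdict(set)
    for feature, components in depends_on.items():
        for component in components:
            domains[component].add(feature)
    return dict(domains)

if __name__ == "__main__":
    for component, features in sorted(failure_domains(DEPENDS_ON).items()):
        print(f"{component}: {sorted(features)}")
```

In this made-up map, `web_tier` sits in the failure domain of every feature, while `orders_db` affects only `order_history`. A component with a wide span like that is a candidate for the kind of separation described above.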

iTuring: Does a complex business lead to complex systems? As an architect, how do you keep the software simple without compromising a complex business model?

So far, I have not found a correlation between complex businesses and complex systems. I've found the strongest predictor of system complexity to be regulation.

iTuring: How does DevOps differ from traditional operations engineering?

DevOps emphasizes empathy. In a DevOps culture, developers care how their application affects operators as people. Does my application mean the administrator must stay awake late to do deployments? How can I change my application so she can spend time with her family instead of in a terminal? Operators reciprocate: How can we create an environment that lets developers create and deliver value with courage?

iTuring: From client/server (C/S) and browser/server (B/S) in 2007 to apps and NoSQL now, the internet industry has been transformed, and agile methodologies have evolved too. What has changed about software releases over the years? And what remains unchanged?

There are three things I think are the biggest changes:

First is the fall of the Sun and Microsoft hegemony. Back in 2007, nearly all corporate development was in Java or .NET, with an up-and-coming Ruby on Rails community. Today, it's common to see systems that use many different languages and runtime environments.

Second, cloud deployment environments have dramatically changed the economics.

Third, and largely a result of the first two, open source operations tools have democratized high-reliability operations. In 2007, it cost millions of dollars to roll out data center automation, centralized management, and monitoring. Today, you can download all of that.

iTuring: As mobile internet and cloud services gain wider acceptance among the general public, the IT industry is changing radically. Which technology ideas would you recommend architects focus on?

Enterprise architects have previously focused on the technology "inside the box" on the diagram. That is, they aimed for technology standardization in the implementation details.

In today's world, I think architects must be much more concerned with data formats and representations. That is, they must focus on the arrows, not the boxes.

iTuring: Relevance codes primarily in Clojure, which is very different from the mainstream languages (C / Java / C#) most companies use. How do you expect programming languages to evolve in the future?

I'm not a very good person to ask about this. All I can report is that I see many developers moving toward functional programming.

iTuring: At Relevance, developers spend Fridays on pet projects and open source software. That is 20% of your work time, which is a big allocation. How do you benefit from this weekly practice? Does it make up for the lost time?

One quick note: Relevance was renamed Cognitect in August 2013.

We've created some things in 20% time that many people would recognize, including the web framework Pedestal and the initial ClojureScript implementation. Today, our 20% time continues to go into developing Clojure, ClojureScript, Pedestal, and some new things that we'll unveil soon.

We have a long history of questioning our most basic assumptions about software development, and examining our own work to find better ways to build software. That extends to 20% time. So it's not just something we keep doing by habit or routine. We frequently assess whether it is worth it.

So far, we have always found it to be worthwhile. We are serious about making software development better for everyone. Our open source tools are part of that.

iTuring: Why did you write the tool Simulant for simulation testing? How is this project going?

Although I've been talking about Simulant a lot, it was written by Stuart Halloway based on architecture from Rich Hickey.

The Simulant library itself is stable for now. My focus is on helping people apply it successfully. To that end, I did a webinar about it last year. I've also made a sample project that you can find on GitHub (https://github.com/mtnygard/simulant-example).

Right now, I'm working on a "solution blueprint" that should also help people do simulation testing, with or without Simulant itself.

