Monday, July 10, 2017

The Role of the Software Architect

What's in a Name?

Collaboration improves when the roles of individual team members are clearly defined and well understood.
— Tammy Erickson April 05, 2012
I recently switched roles, from distributed systems engineer to distributed systems architect. I've been doing professional software development for about ten years. I remember telling my boss five years ago that my five-year plan was to be a software architect. Guess that worked out! (Now what!?)

Looking back, it's obvious that I'm much better equipped for the position than I was five years ago. But the role itself also seems to have changed over time, as the tech world has evolved to integrate agile tendencies. So as I approached the new position, I found myself needing to repeatedly answer the same questions. Here are some of my notes.

What is a "Software Architect"?

An architect is an engineer who empowers other engineers as a force multiplier.

An architect focuses on the big picture of the system as a holistic product, rather than owning individual components.

An architect drives consensus between tech leads and product managers, rather than making or dictating critical decisions.

An architect promotes flexibility by identifying potential interface points and presenting engineers with options to reduce the cost of change, balancing short-term tactics and long-term strategy.

An architect is a specialist with strengths in diagramming, documenting, communicating, information architecture, and system design.

What is a "Distributed Systems Architect"?

The title itself doesn't seem to have a commonly accepted definition. So let's build one from similar titles.

A systems architect defines the architecture of computerized systems to fulfill specified requirements. Such definitions include the design and specification of components, component interactions, interfaces, technologies, resources, and dependencies.
Wikipedia - Systems architect
A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages to achieve a common goal. Significant characteristics of distributed systems include: concurrency of components, lack of a global clock, and independent failure of components.
Wikipedia - Distributed computing

Therefore, a distributed systems architect is a systems architect who defines the architecture of distributed systems, with special focus on communication and coordination between components.

Increasing Complexity

Whether you call yourself a software architect, a systems architect, or a distributed systems architect, the computing landscape continues to evolve with or without you. A few things specifically have changed in the last decade or two that make these roles harder than they used to be.

1) The demand for high availability and intolerance of downtime has caused components that used to be simple to be replaced by distributed systems. Likewise, previously distributed systems became systems of systems. With multi-layer systems it's no longer appropriate to rely on legacy architectural design patterns, like a single API layer or a single communication bus.

2) The demand for agility and larger development teams has mutated service-oriented architecture into microservice architecture. This mode of development pushes even more glue code out of the individual components and into the platform or infrastructure layers, increasing their complexity. Deploying and managing a sufficiently large system of microservices increasingly requires a container or application platform above and beyond the scope of modern infrastructure platforms.

3) The demand for hyperscale (the ability to quickly scale up and down without re-architecting) often calls for the architect to design a dynamic system that treats dramatic change over time as a normal part of system operation. The ability to survive a flash crowd of users (aka the Slashdot effect) caused by social media is increasingly required by even the most mundane of business domains. Preparation for this and similar phenomena happens largely at the system level.

4) The demand for open source software creates proprietary systems that aren't completely controlled by their developers. Unlike proprietary components, where one company is in charge of development, open source components can be both easier to change (with a fork or local modification) and harder to persist change (merging changes upstream often requires politics, persuasion, and modification). Partial control of the architecture of a system is nothing new, but partial control over individual components (specifically their APIs and communication patterns) increases demand for intermediary translation, proxy, and caching services. These ancillary components are often tightly coupled with one component to allow loose coupling with another, but tight coupling in a microservice environment often requires new deployment and management patterns, like sidecars and pods.


Not all software engineers will or should aspire to be software architects. Architecture is a specialization of the engineering track for people who enjoy abstract thinking, are good at and enjoy documentation and diagramming, have the ability to be political and drive consensus, and have demonstrated a solid grasp of prior art and the competitive landscape.

An architect may or may not also be an engineering manager or tech lead, depending on how much focus on architecture is required or desired. Since being hands-off with code tends to erode peer confidence and personal skill set, it is desirable that an architect continue to be partly involved in programming efforts, whether through tool, component, or ecosystem development.

The bigger an architecture team gets, the more it tends to lose touch with the everyday life of the engineers it's supposed to be supporting. However, the more engineers an architect has to support, the less time they will have to stay hands-on with code. So a balance must be struck such that architects retain respect and skill yet also spend enough time on architecture to actually provide the benefit of a higher level perspective. It's a fine balance.

Ontological Evolution

The idea of a distinct architect role has in the recent past become somewhat contentious among agile software engineering practitioners. The more senior a thought worker, the less they want to be told what to do. But at the same time, modern development patterns are pushing complexity out of individual components with well-defined teams and into the space between them. Plus each component may have a life of its own: independent release cycles, independent versioning, multiple interface/protocol paradigms, independent motivations, and external contributors. Someone has to align these components and the teams that develop them, to unify them into a single cohesive system. So just as engineering practice has evolved, architecture practice must evolve too, from controlling and managing technical vision to leading the direction of its evolution. And since collaboration is easier with clearly defined roles, hopefully this new definition can help point the way.

(Revised on 2017-07-13 to expound on the increasing complexity of systems and the increasing demand for architectural design.)

Tuesday, July 21, 2015

My Other Computer Is A Datacenter

Managing a datacenter doesn't require a hyperwall, just the right operating system.
ersi hyperwall photo by Kris Krüg on Flickr
Today, your personal computer is local. You own the hardware. The data is in a box under your desk. The processes run on a local operating system you installed.

Even those simple statements, which used to be absolute assertions, are getting fuzzier and fuzzier. You probably have data in "the cloud". You probably have used Google Docs, an application in the browser, running in Google's datacenters. Your email hasn't lived on your machine for a long time.

The next generation of computing, however, runs on multiple machines. Businesses are already doing this, running large datacenters, hosting applications distributed across multiple machines. Increasingly this looks like a new form factor. Increasingly abstractions hide the distribution and present a unified view. Increasingly the big players in the field are talking about their software as presenting a distributed kernel or distributed operating system for applications and services to run on.

What does this look like?

Let's take a look at the most mature of these: Mesos.


Mesos is a generic distributed resource manager. Applications written for Mesos can schedule their processes on any number of distributed machines. Those processes might be short lived, like batch jobs, or they might be long lived, like web application servers. Those processes are also isolated, wrapped in containers, unable to negatively impact each other, enabling multiple applications and multiple users to live in harmony in the same shared environment.

If Mesos is a distributed kernel, what are the other operating system services?

Init System

How do you start up long lived applications? How do you restart them if they crash? What's the parent process that hosts other applications? On Mesos, the answer is Marathon: a distributed init system that runs applications in containers. It's simple; it accepts docker containers; and it's low level enough to run other applications, even ones that want to spawn other processes (Mesos tasks).
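To make the "distributed init system" idea a little more concrete, here is a minimal sketch of what an app definition for Marathon might look like, built as a plain Python dict and serialized to JSON. The field names (`id`, `cpus`, `mem`, `instances`, `container`) follow Marathon's app definition format as I understand it from releases of this period, so check them against the documentation for your version. The app id `/hello-web` and the image `nginx:1.9` are hypothetical, and the sketch only builds the payload; it does not talk to a cluster.

```python
import json


def marathon_app(app_id, image, cpus=0.5, mem=256, instances=2):
    """Build a Marathon-style app definition as a plain dict.

    Assumption: field names mirror Marathon's JSON app definition
    format; verify them against the docs for the version you run.
    """
    return {
        "id": app_id,            # unique name of the long-lived app
        "cpus": cpus,            # CPU shares requested per instance
        "mem": mem,              # memory in MB requested per instance
        "instances": instances,  # how many copies Marathon keeps running
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
        },
    }


# Hypothetical app: two copies of a small web server.
app = marathon_app("/hello-web", "nginx:1.9")
payload = json.dumps(app, indent=2, sort_keys=True)
print(payload)
```

The point of the shape is that you declare a desired state (two instances of this container) rather than starting processes yourself; keeping that many instances alive, including restarting them after a crash, is the init system's job.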


Compute resources (CPU, memory, processes) are core to an operating system, but you can't have a full OS without device drivers that talk to hard drives or other storage abstractions. In the distributed Mesos datacenter, storage takes many forms. The most familiar of those is HDFS. It looks and acts very much like a traditional file system. It's easy to migrate to, and maintains multiple copies of stored data to survive individual drive failures. But it's not the only player. Need a document store? Hadoop works on Mesos. Need robust key/value storage? Cassandra works on Mesos. Need a SQL database? MySQL (Mysos) runs on Mesos.

Jobs & Time-Based Jobs

Not all applications are long lived. Some are ephemeral. For chaining those jobs together, use Spark. Need to run your jobs at a certain time, regularly? On a Linux OS, cron handles those jobs. On a distributed Mesos datacenter, Chronos handles fault-tolerant time-dependent job scheduling. Got MapReduce workloads? YARN (Myriad) works on Mesos too.
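As a rough illustration of the cron-to-Chronos analogy, the sketch below builds a job definition whose schedule is an ISO 8601 repeating interval (`R[count]/start/period`), which is the notation Chronos uses for schedules as far as I know. The job name, command path, owner address, and the `epsilon` value are all hypothetical, and the field names are my recollection of the Chronos job format, so treat this as a sketch rather than a reference.

```python
import json
from datetime import datetime, timezone


def repeating_schedule(start, period, repeats=None):
    """Format an ISO 8601 repeating interval: R[count]/start/period.

    Leaving repeats as None produces "R/...", meaning repeat forever.
    """
    count = "" if repeats is None else str(repeats)
    stamp = start.strftime("%Y-%m-%dT%H:%M:%SZ")  # start is assumed UTC
    return "R{}/{}/{}".format(count, stamp, period)


def chronos_job(name, command, schedule, owner):
    """Build a Chronos-style job dict (field names are an assumption)."""
    return {
        "name": name,
        "command": command,
        "schedule": schedule,
        "epsilon": "PT15M",  # hypothetical tolerance for a late start
        "owner": owner,
    }


# Roughly the cron entry "0 2 * * *": every day at 02:00 UTC.
start = datetime(2015, 7, 22, 2, 0, 0, tzinfo=timezone.utc)
schedule = repeating_schedule(start, "P1D")
job = chronos_job("nightly-report", "/opt/reports/run.sh",
                  schedule, "ops@example.com")
print(json.dumps(job, indent=2))
```

Unlike a crontab line, the schedule carries an explicit start time and repeat count, which suits a scheduler that may run the job on a different machine each time.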


Every app logs differently. You can easily log to one of the storage services, or you can use one of several logging or metrics services. Need a publish/subscribe data stream? Kafka runs on Mesos. Need a data stream backbone? Run Flume in Marathon and pipe the data to Kafka. Need a real-time distributed search and analytics engine? ElasticSearch works on Mesos.

User Space

Not all applications are run by the OS init system. User space applications are often more complicated, run on-demand, and have multiple components and service dependencies. For that, use Kubernetes.

Process Dashboard

The penultimate layer of this new ecosystem is the management of this new form factor. How do you observe something that's bigger than you? You have to step back and simplify. You have to take all the resource consumption data and distributed application process metrics and display them in a readable, browsable, navigable format. You need DCOS, which wraps Mesos to provide just that.

Package Management

Want to install all these newfangled datacenter applications and services? DCOS again comes to the rescue, with its command line package manager and app-store-like Mesosphere Universe.

The datacenter is more powerful than ever, and now you have the tools to take advantage of it.

Thursday, April 9, 2015

How To: Develop Good Managers

"people don’t quit companies—they quit managers" - Chris Loux
In my anecdotal experience, the above is often true.

Technically, I want the freedom to determine the best solution to a problem, even the freedom to figure out what the problem actually is. But career-wise, I want a manager who cares about my development and helps me identify and achieve my goals. If those aren't happening? I'm probably going to have a wandering eye, looking for better opportunities.

It's long been a quest of mine to figure out what a great manager looks like. I don't know if I want to be a manager yet, but I have pretty strong feelings about how I want people who manage me to behave.

While on that quest, I recently (re-)discovered a 2013 report by HBR on how Google researched and analyzed management to make Google a place top-notch engineers want to work. I think the key takeaways are their top 8 behaviors of good managers:
A good manager:
1. Is a good coach
2. Empowers the team and does not micromanage
3. Expresses interest in and concern for team members’ success and personal well-being
4. Is productive and results-oriented
5. Is a good communicator—listens and shares information
6. Helps with career development
7. Has a clear vision and strategy for the team
8. Has key technical skills that help him or her advise the team
The other key takeaway is that, to make sure managers knew what they needed to improve on, Google instituted a regular "Upward Feedback Survey" that let reports provide numeric feedback about their managers in those 8 key areas. That feedback was then aggregated and reported back to each manager, so that they could improve.

Plus, this feedback loop was outside of the normal performance reviews. It was confidential and allowed managers to improve themselves. It's a development tool, not a performance metric.

On top of that, Google provided optional management training courses so that managers could improve in the areas their feedback suggested.

It's really that simple.
  1. Create a feedback loop
  2. Analyze and report the metrics
  3. Allow people to improve on personal areas of weakness without threat to their job
People want to improve. If they don't, you hired the wrong people.
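The feedback loop described above is simple enough to sketch in a few lines. This is only my own illustration, not how Google implemented it: the short behavior keys, the 1 to 5 scoring scale, the sample names, and the `min_responses` threshold (a guess at how you might keep individual answers confidential) are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Short keys standing in for the 8 behaviors listed above (my own labels).
BEHAVIORS = ["coach", "empowers", "cares", "results",
             "communicates", "career", "vision", "technical"]


def survey(**overrides):
    """One report's answers: 4 out of 5 everywhere unless overridden."""
    scores = {behavior: 4 for behavior in BEHAVIORS}
    scores.update(overrides)
    return scores


def aggregate_feedback(responses, min_responses=3):
    """Average each behavior per manager.

    Managers with fewer than min_responses surveys are left out, so a
    single report's answers cannot be singled out (an assumed safeguard).
    """
    by_manager = defaultdict(list)
    for manager, scores in responses:
        by_manager[manager].append(scores)

    report = {}
    for manager, surveys in by_manager.items():
        if len(surveys) < min_responses:
            continue
        report[manager] = {
            behavior: round(mean(s[behavior] for s in surveys), 2)
            for behavior in BEHAVIORS
        }
    return report


responses = [
    ("alice", survey(coach=5)),
    ("alice", survey(coach=3, vision=2)),
    ("alice", survey(coach=4, vision=3)),
    ("bob", survey()),  # only one response, so bob gets no report yet
]
report = aggregate_feedback(responses)
weakest = min(report["alice"], key=report["alice"].get)
print(report["alice"], "-> focus area:", weakest)
```

The output per manager is a development aid: the lowest-scoring behavior suggests which optional training to look at, and nothing here feeds into a performance review.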