Tuesday, July 21, 2015

My Other Computer Is A Datacenter

Managing a datacenter doesn't require a hyperwall, just the right operating system.
ersi hyperwall photo by Kris Krüg on Flickr
Today, your personal computer is local. You own the hardware. The data is in a box under your desk. The processes run on a local operating system you installed.

Even those simple statements, which used to be absolute assertions, are getting fuzzier and fuzzier. You probably have data in "the cloud". You probably have used Google Docs, an application in the browser, running in Google's datacenters. Your email hasn't lived on your machine for a long time.

The next generation of computing, however, runs on multiple machines. Businesses are already doing this, running large datacenters, hosting applications distributed across multiple machines. Increasingly this looks like a new form factor. Increasingly abstractions hide the distribution and present a unified view. Increasingly the big players in the field are talking about their software as presenting a distributed kernel or distributed operating system for applications and services to run on.

What does this look like?

Let's take a look at the most mature of these: Mesos.

Kernel

Mesos is a generic distributed resource manager. Applications written for Mesos can schedule their processes on any number of distributed machines. Those processes might be short-lived, like batch jobs, or they might be long-lived, like web application servers. Those processes are also isolated, wrapped in containers, unable to negatively impact each other, enabling multiple applications and multiple users to live in harmony in the same shared environment.

If Mesos is a distributed kernel, what are the other operating system services?

Init System

How do you start up long-lived applications? How do you restart them if they crash? What's the parent process that hosts other applications? On Mesos, the answer is Marathon: a distributed init system that runs applications in containers. It's simple; it accepts Docker containers; and it's low-level enough to run other applications, even ones that want to spawn other processes (Mesos tasks).
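
To make that concrete, here is a rough sketch of what handing an application to Marathon might look like. It only builds the JSON body you would POST to Marathon's /v2/apps endpoint; it does not talk to a cluster. The field names follow Marathon's v2 REST API as I understand it, so check them against the version you run. The app id, image tag, and resource numbers are made-up placeholders.

```python
import json


def marathon_app(app_id, image, cpus=0.5, mem=256, instances=2):
    """Build a Marathon app definition for a Docker container.

    Field names are assumed from Marathon's v2 REST API; the defaults
    here are arbitrary illustration values, not recommendations.
    """
    return {
        "id": app_id,            # path-like name of the app
        "cpus": cpus,            # CPU shares per instance
        "mem": mem,              # memory per instance, in MB
        "instances": instances,  # how many copies Marathon keeps running
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
        },
    }


# Hypothetical app: two copies of a web server image.
app = marathon_app("/web/frontend", "nginx:1.9")
payload = json.dumps(app, indent=2, sort_keys=True)
print(payload)
```

The "instances" count is the init-system part: you declare how many copies you want, and Marathon is the one responsible for starting them and replacing any that die.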

Storage

Compute resources (CPU, memory, processes) are core to an operating system, but you can't have a full OS without device drivers that talk to hard drives or other storage abstractions. In the distributed Mesos datacenter, storage takes many forms. The most familiar of those is HDFS. It looks and acts very much like a traditional file system. It's easy to migrate to, and maintains multiple copies of stored data to survive individual drive failures. But it's not the only player. Need a document store? Hadoop works on Mesos. Need robust key/value storage? Cassandra works on Mesos. Need a SQL database? MySQL (Mysos) runs on Mesos.

Jobs & Time-Based Jobs

Not all applications are long-lived. Some are ephemeral. For chaining those jobs together, use Spark. Need to run your jobs at a certain time, regularly? On a Linux OS, cron handles those jobs. On a distributed Mesos datacenter, Chronos handles fault-tolerant time-dependent job scheduling. Got MapReduce workloads? YARN (Myriad) works on Mesos too.
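
Where cron uses its five-field syntax, Chronos jobs are scheduled with an ISO 8601 repeating interval: a repeat count, a start time, and a period. Below is a minimal sketch of building such a job definition. The field names (name, command, schedule, epsilon, owner) are what I believe the Chronos REST API expects, so treat them as assumptions; the job name, command, and owner address are hypothetical.

```python
import json
from datetime import datetime, timezone


def chronos_schedule(start, period, repeats=None):
    """Format an ISO 8601 repeating interval: R[n]/<start>/<period>.

    `start` must be a timezone-aware datetime; it is converted to UTC.
    `period` is an ISO 8601 duration string such as "P1D" or "PT1H".
    `repeats=None` leaves the count empty, meaning "repeat forever".
    """
    count = "" if repeats is None else str(repeats)
    stamp = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return "R{}/{}/{}".format(count, stamp, period)


# Hypothetical nightly job, first run at 02:00 UTC, then once a day.
first_run = datetime(2015, 7, 21, 2, 0, tzinfo=timezone.utc)
job = {
    "name": "nightly-report",
    "command": "echo 'building report'",
    "schedule": chronos_schedule(first_run, "P1D"),
    "epsilon": "PT30M",          # how late a run may start and still count
    "owner": "ops@example.com",
}
print(json.dumps(job, indent=2))
```

The roughly equivalent crontab line would be `0 2 * * *`, except that cron runs on a single machine.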

Logging

Every app logs differently. You can easily log to one of the storage services, or you can use one of several logging or metrics services. Need a publish/subscribe data stream? Kafka runs on Mesos. Need a data stream backbone? Run Flume in Marathon and pipe the data to Kafka. Need a real-time distributed search and analytics engine? ElasticSearch works on Mesos.

User Space

Not all applications are run by the OS init system. User space applications are often more complicated, run on-demand, and have multiple components and service dependencies. For that, use Kubernetes.

Process Dashboard

The penultimate layer of this new ecosystem is the management of this new form factor. How do you observe something that's bigger than you? You have to step back and simplify. You have to take all the resource consumption data and distributed application process metrics and display them in a readable, browsable, navigable format. You need DCOS, which wraps Mesos to provide just that.

Package Management

Want to install all these newfangled datacenter applications and services? DCOS again comes to the rescue, with its command line package manager and app-store-like Mesosphere Universe.

The datacenter is more powerful than ever, and now you have the tools to take advantage of it.


Thursday, April 9, 2015

How To: Develop Good Managers

"people don’t quit companies—they quit managers" - Chris Loux
In my anecdotal experience, the above is often true.

Technically, I want the freedom to determine the best solution to a problem, even the freedom to figure out what the problem actually is. But career-wise, I want a manager who cares about my development and helps me identify and achieve my goals. If those aren't happening? I'm probably going to have a wandering eye, looking for better opportunities.

It's long been a quest of mine to figure out what a great manager looks like. I don't know if I want to be a manager yet, but I have pretty strong feelings about how I want people who manage me to behave.

While on that quest, I recently (re-)discovered a 2013 report by HBR on how Google researched and analyzed management to make Google a place top-notch engineers want to work. I think the key takeaways are their top 8 behaviors of good managers:
A good manager:
1. Is a good coach
2. Empowers the team and does not micromanage
3. Expresses interest in and concern for team members’ success and personal well-being
4. Is productive and results-oriented
5. Is a good communicator—listens and shares information
6. Helps with career development
7. Has a clear vision and strategy for the team
8. Has key technical skills that help him or her advise the team
The other key takeaway is that, to make sure managers knew what they needed to improve on, Google instituted a regular "Upward Feedback Survey" that let reports give numeric feedback about their managers in those 8 key areas. That feedback was then aggregated and reported back to each manager so that they could improve.

Plus, this feedback loop was outside of the normal performance reviews. It was confidential and allowed managers to improve themselves. It's a development tool, not a performance metric.
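
The aggregation step doesn't need to be fancy. Here's a toy sketch of how I imagine it: average the numeric scores per manager and per behavior, and never store who submitted what. This is my own illustration, not Google's actual tooling; the 1-5 scale, the short behavior labels, and the sample responses are all invented.

```python
from collections import defaultdict
from statistics import mean


def aggregate_feedback(responses):
    """Average scores per manager and behavior.

    `responses` is an iterable of (manager, behavior, score) tuples.
    Respondent identity is deliberately absent, so the report that
    goes back to each manager stays confidential.
    """
    scores = defaultdict(lambda: defaultdict(list))
    for manager, behavior, score in responses:
        scores[manager][behavior].append(score)
    return {
        manager: {b: round(mean(vals), 2) for b, vals in by_behavior.items()}
        for manager, by_behavior in scores.items()
    }


# Invented sample data on an assumed 1-5 scale.
responses = [
    ("alice", "good coach", 4),
    ("alice", "good coach", 5),
    ("alice", "clear vision", 3),
    ("bob", "good coach", 2),
]
report = aggregate_feedback(responses)
for manager, averages in sorted(report.items()):
    print(manager, averages)
```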

Then, on top of that, Google provided optional management training courses so managers could improve in the areas their feedback suggested.

It's really that simple.
  1. Create a feedback loop
  2. Analyze and report the metrics
  3. Allow people to improve on personal areas of weakness without threat to their job
People want to improve. If they don't, you hired the wrong people.