On Pager Duty – a reply

Recently Mathias Meyer of paperplanes.de wrote a blog post about on call duty (OCD) at his startup Travis CI. The field was apparently new to him, so he summarized his experience and findings and asked about other people’s experience in this area.

I have been working in web operations since about 1999, so I have been on call for most of the last 13 years. Of course I’ve gathered my share of experience and opinions on this topic, and I thought a reply might be worthwhile. I quickly started drafting one, but soon realized that this is quite an extensive topic and slowed down a bit. Instead, I started to draw a mind map of what I see as related to it.

[Mind map preview: On Call Duty]

That looks like material for a series of blog posts to me. To give at least a bit of feedback soon, rather than a full reply in half a year, I’m going to start with my experiences of on call duty itself.

What does it mean to be “on call”?

To me, being on call means being reachable by phone (or pager) all the time – 24 hours a day, 7 days a week. No matter what. And not only being reachable, but also being able to work on an issue.

This has several implications. One must not only carry a working phone or pager, but also make sure the device has network coverage to receive the call. This limits my personal freedom to move and travel during on call shifts. Most of the countryside in Germany (and in most other countries, actually) has pretty poor mobile network coverage. Some facilities, such as cinemas and hospitals, explicitly block mobile reception. There are also places where carrying a phone is difficult or not acceptable, such as swimming pools and saunas.

The next aspect is the ability to quickly start working on issues. Depending on the Service Level Agreements you have with your customers and partners, you may have to start fixing a problem within minutes of detection. Luckily, mobile computers and 3G/4G carriers enable us to work from almost everywhere (given the restrictions mentioned above). So I don’t have to stay in the office during my on call duty and can travel around to a certain extent. But it also means I have to carry my work equipment around with me.

In my company we use our normal work laptops in combination with mobile hotspots. Some people use their regular smartphones to establish an internet connection, but mostly we use additional 3G/4G equipment. One reason for this is that we usually have to attend an incident conference call while working on the issue. Using the phone as a hotspot while on a conference call will not work very well, if at all. Separate WiFi equipment usually also comes with its own battery and thus extends the working time away from a power socket a bit.

To be able to work from a remote location you of course need a secure way into your computing environment. That means you need some sort of VPN and remote access in place to let you in from untrusted networks (the evil Interweb). To stay flexible about work equipment, using open standards is the best way to ensure you can get in when you need to. Highly proprietary VPN equipment might limit your ability to log in with mobile devices like iPads or smartphones.

Although it’s technically possible to make SSH connections from mobile phones or tablets, I still prefer a laptop for on call work. It gives me more flexibility and is usually a bit quicker. And time is usually a sensitive factor when you are on a support mission. Your customers want you to keep any service degradation or downtime as short as possible, and you want to get back to bed or other activities asap as well.

[Update 2013-01-10 10:20pm] Another aspect of “being able to work” just came to my mind. Not only does your technical equipment need to be ready to start working, but so do you. Taking on a support mission and working in a root shell after 5 beers might not be the wisest decision. So staying sober during OCD is mandatory…

Being on call definitely has an impact on family life. If you’re on your own you might not mind being called in the middle of the night and spending 3 hours recovering a database. But when you have a family, maybe with young kids, it becomes an issue.

You potentially wake up the family when you are called. Now you have the choice of fixing the issue or calming down a crying baby. Parents know the choice is easy. When you work half the night, you might be grumpy in the morning when the kids expect you to be the lion king at their birthday party… Wives usually don’t appreciate you taking a call during dinner in a restaurant and opening a laptop on the dining table.

In summary, on call duty puts a virtual chain on you. It limits your ability to do what you would normally do. It requires you to plan your activities around potential support missions, and it also affects the people who are with you.

This is why on call duty usually deserves special compensation in companies, either as money or as time off.

In my next article in this OCD series I’ll write about the organizational challenges of on call duty. You can take a quick peek at the subtopics in the mind map linked from the preview image above.

5 Comments

  • David Mytton says:

    Our on-call schedule has primary and secondary responders – the secondary is usually part of the ops team with more knowledge of the entire infrastructure, with the primary being an engineer who can fix most issues. To solve the last problem you mentioned regarding the social aspects of being on call, we have a policy of switching the roles for 24 hours following an on-call event. Depending on what the impact was, it could also mean switching the on-call rota to entirely different people. This ensures whoever is on call is fresh, not tired from dealing with an issue the previous night, and also helps avoid disturbing the same people multiple times.

  • Chris says:

    Thanks for the blog post. An interesting read.

    I was wondering, in your experience, what level of compensation you’d expect, either as time or pay. Are you looking for/expecting an extra 10% of your salary for being in an on call role?

    I’m interested in what sort of remuneration you’d find acceptable.

    Thanks for the info

  • falko says:

    I’m covering the compensation part in the next post on organizational challenges of the OCD.

  • January 3rd, 2013