
Linux boxes should have their RTC (hardware clocks) set to UTC

Linux boxes should have their RTC set to UTC, and not local time.

Help for the confused: When I say RTC, I mean the same thing that is commonly referred to as the Real Time Clock, the Hardware Clock, the BIOS Clock, the CMOS Clock, The Thing that is Battery Backed and Keeps your System's Time When It Is Turned Off, and probably some other names I have not heard of.

Diatribe (most likely seasoned with hyperbole):

As some of you who have installed Red Hat of any variety may know, during the install Red Hat gives you a checkbox labeled "Set system's hardware clock to UTC". This checkbox is not checked by default, so most admins who do not care about provincialism simply hit Next. This is the cause of all sadness in the world.

Historically, UNIX systems keep their system clock and hardware clock set to UTC, which does not change with the seasons. There is no Daylight Saving Time for UTC; the clock just monotonically plods on, only occasionally getting adjusted for leap seconds when xntpd says so, or when the sysadmin sets the clock by hand. What you see in the output of the "date" command is altered to match your TZ environment variable, or (under recent Linux versions) what's in /etc/localtime. This is nice because it allows multiple users in different timezones to use the same system and have information displayed in their local timezone. Everyone's happy.
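To make the "one clock, many displays" point concrete, here is a minimal sketch in Python. The instant and the three TZ values are just examples I picked, and `render` is a made-up helper name. It relies on `time.tzset()`, which only exists on Unix-like systems.

```python
import calendar
import os
import time

def render(epoch_seconds, tz):
    """Format one UTC instant as local time for the given TZ value."""
    os.environ["TZ"] = tz
    time.tzset()  # Unix only: make this process re-read TZ
    return time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime(epoch_seconds))

# One instant on the system clock, stored as seconds since the epoch (UTC).
# The date is an arbitrary example in the middle of winter.
instant = calendar.timegm((2005, 1, 15, 12, 0, 0))

# Three users, three TZ settings, one underlying clock value.
for tz in ("UTC0", "EST5EDT", "CET-1CEST"):
    print(tz, "->", render(instant, tz))
```

The number stored by the system never changes; only the rendering does.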

Now, in 1994 Red Hat comes along with a nice installer, and one of the options they give you is to keep your RTC in your local timezone. Actually, it's the default, thank you very much... This makes life easier for all the folks who are just trying out Linux on their Windows box and have set up a dual-boot environment, so that when they boot back into Windows their clocks are not set ahead 4 hours (or whatever).

But what does that matter to Linux? Upon boot, it reads the value from the RTC, then looks in /etc/adjtime to see whether the RTC is set in UTC or not. If the RTC is set to UTC, it just sets the system clock from it. If it is not, it assumes the RTC holds the correct local time, applies the timezone offset, and sets the system clock from the result. When it's time for the system to shut down, it does the reverse and writes the system time back to the RTC, so that when the system comes back up again, it is as accurate as it can be.
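Here is a rough Python sketch of that boot-time decision, not the actual hwclock code. It assumes the usual /etc/adjtime layout, where the third line reads UTC or LOCAL. The function names, the sample file contents, and the fixed UTC-5 offset are all invented for illustration.

```python
from datetime import datetime, timedelta, timezone

def rtc_mode(adjtime_text):
    """Return "UTC" or "LOCAL" from adjtime contents, or None if not recorded."""
    lines = adjtime_text.splitlines()
    if len(lines) >= 3 and lines[2].strip() in ("UTC", "LOCAL"):
        return lines[2].strip()
    return None

def system_time_from_rtc(rtc_reading, mode, local_utc_offset):
    """Turn a naive RTC reading into the UTC time the system clock is set to.

    local_utc_offset is what the system believes its zone offset is at boot,
    e.g. timedelta(hours=-5) for a zone five hours behind UTC.
    """
    if mode == "UTC":
        utc = rtc_reading                      # nothing to adjust
    else:
        utc = rtc_reading - local_utc_offset   # trust the RTC as local time
    return utc.replace(tzinfo=timezone.utc)

# Hypothetical file contents and RTC reading.
sample_adjtime = "0.000000 1130904882 0.000000\n1130904882\nLOCAL\n"
rtc = datetime(2005, 11, 1, 23, 14, 42)
print(system_time_from_rtc(rtc, rtc_mode(sample_adjtime), timedelta(hours=-5)))
```

Note that the LOCAL branch is only as good as the offset fed into it, which is where the trouble starts.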

Problem is, if your system shuts down in an abnormal way, it doesn't set the RTC, so you'll get a fair amount of drift.

Worse, if you keep your RTC as local time and you've run your system through a timezone change (Standard to Daylight or vice versa), your RTC is now an hour off one way or another.
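The arithmetic behind that hour is simple enough to sketch. This assumes US Eastern offsets (UTC-4 in summer, UTC-5 in winter) purely as an example, and `boot_error_seconds` is a name I made up. The idea: the RTC was last written under one offset, and the next boot interprets it under another.

```python
from datetime import timedelta

# Example offsets only; substitute your own zone.
EDT = timedelta(hours=-4)   # daylight time
EST = timedelta(hours=-5)   # standard time
UTC = timedelta(0)

def boot_error_seconds(offset_when_written, offset_assumed_at_boot):
    """How far off the system clock is right after boot (positive = ahead).

    The RTC holds true_utc + offset_when_written; boot subtracts
    offset_assumed_at_boot, so the error is the difference of the two.
    """
    return (offset_when_written - offset_assumed_at_boot).total_seconds()

print("autumn, RTC in local time:", boot_error_seconds(EDT, EST))
print("spring, RTC in local time:", boot_error_seconds(EST, EDT))
print("any season, RTC in UTC:   ", boot_error_seconds(UTC, UTC))
```

In the autumn case the system comes up an hour ahead, so the eventual correction is a step of about minus 3600 seconds.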

OK, what's the big deal about being an hour off? Just run NTP and you'll be fine, right?

But NTP takes time to synchronize... So when you've just booted, you're an hour or so off, and it's a few minutes before NTP is confident that it has the right time. By then, most of your applications are up and running.

At that point, depending on how it's configured, NTP will either carefully slew the clock (which, at ntpd's maximum slew rate of 500 ppm, will take nearly three months for one hour of difference) or, more likely, step the clock (which is instantaneous).
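For the curious, the months-long figure falls out of a back-of-the-envelope calculation. The sketch below assumes ntpd's maximum slew rate of 500 parts per million and its default step threshold of 128 ms; the function names are mine.

```python
MAX_SLEW_PPM = 500        # ntpd adjusts at most 500 microseconds per second
STEP_THRESHOLD_S = 0.128  # default offset beyond which ntpd steps instead

def slew_duration_seconds(offset_s, slew_ppm=MAX_SLEW_PPM):
    """Wall-clock seconds needed to slew away an offset at a constant rate."""
    return abs(offset_s) / (slew_ppm * 1e-6)

def would_step_by_default(offset_s):
    """True if the offset exceeds the default step threshold."""
    return abs(offset_s) > STEP_THRESHOLD_S

one_hour = 3600.0
days = slew_duration_seconds(one_hour) / 86400
print(f"slewing away one hour takes about {days:.1f} days")
print("would ntpd step by default?", would_step_by_default(one_hour))
```

An hour is tens of thousands of times larger than the step threshold, which is why a step is the likely outcome.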

Half the time the step will be an hour forward, and all sorts of little odd things will happen: all the timeouts will go off at once for your screensaver, login session, layer 5 network session, etc. Weird... but it happens quickly enough, and if you're on a workstation you see it all happen at once, and it's easily remedied.

The other half of the time the step will be an hour back... and any timeout which is computed from the system (wall-clock) time, on the assumption that it only ever moves forward, will instantly be extended by one hour. Your system will work, but it will be as though it is frozen in time. Weird, huh?
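Here is a toy simulation of that "frozen in time" effect. It is not taken from any real daemon; `SimulatedClocks` and the 300-second timeout are invented. It compares a deadline computed from wall-clock time with one computed from a monotonic clock when the wall clock is stepped back an hour.

```python
class SimulatedClocks:
    """A wall clock that can be stepped and a monotonic clock that cannot."""

    def __init__(self, wall=1_000_000.0, mono=0.0):
        self.wall = wall
        self.mono = mono

    def tick(self, seconds):
        # Ordinary passage of time moves both clocks.
        self.wall += seconds
        self.mono += seconds

    def step_wall(self, seconds):
        # An NTP-style step changes only the wall clock.
        self.wall += seconds

clocks = SimulatedClocks()
TIMEOUT = 300.0                            # e.g. a five-minute heartbeat timeout
wall_deadline = clocks.wall + TIMEOUT
mono_deadline = clocks.mono + TIMEOUT

clocks.tick(60)                            # one minute passes normally
clocks.step_wall(-3600)                    # then the clock is stepped back an hour

wall_remaining = wall_deadline - clocks.wall
mono_remaining = mono_deadline - clocks.mono
print("wall-clock timer has", wall_remaining, "seconds left")
print("monotonic timer has ", mono_remaining, "seconds left")
```

In real Python code the monotonic side would come from `time.monotonic()`, which is not affected by clock steps.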

Now take that behavior and put it in a distributed environment where clock synchronization is critical to the functioning of the applications you're trying to support.

After it was expired due to a deadlock condition last night, one of the GFS servers I maintain got fenced 6 times before I gave up and took it out of the cluster. After the initial expiration and fence, the system would come up and run happily for 3 or 4 minutes, then mysteriously stop sending heartbeat messages to the other cluster nodes. The other nodes would assume it had died and fence it. Repeat.

I spent a fair amount of time this morning looking for hardware problems on that same system before I found this little message:

 Nov  1 23:14:42 penpen ntpd[2130]: time set -3599.972277 s
 Nov  1 23:14:42 penpen ntpd[2130]: synchronisation lost

But it was another two hours before I figured out why that was a problem.
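If you would rather not spend a morning on this, a small script can hunt for such steps in your logs. This is only a sketch: the regular expression is modeled on the two lines above, I have allowed for the "time reset" wording that other ntpd versions print, and the one-second threshold and the function name are my own choices.

```python
import re

# Matches e.g. "ntpd[2130]: time set -3599.972277 s"
STEP_RE = re.compile(r"ntpd\[\d+\]: time (?:re)?set ([+-]?\d+\.\d+) s")

def find_clock_steps(log_lines, threshold_s=1.0):
    """Return the step sizes (in seconds) of all steps at least threshold_s big."""
    steps = []
    for line in log_lines:
        match = STEP_RE.search(line)
        if match and abs(float(match.group(1))) >= threshold_s:
            steps.append(float(match.group(1)))
    return steps

sample_log = [
    " Nov  1 23:14:42 penpen ntpd[2130]: time set -3599.972277 s",
    " Nov  1 23:14:42 penpen ntpd[2130]: synchronisation lost",
]
for step in find_clock_steps(sample_log):
    print(f"clock stepped by {step:+.3f} s ({step / 3600:+.2f} hours)")
```

A step within a few hundredths of a second of a whole hour is a strong hint that a timezone offset, not drift, is to blame.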

For those of you Linux admins who haven't long since pressed "D", what can you do to prevent this sort of problem from happening to you?

  • When you're installing Red Hat, make sure to tell the system to keep its RTC as UTC.
  • If you're on such a box right now, run "hwclock --show" as root. Clock ahead an hour? This could happen to you! Quick, before the ax falls, run "hwclock --systohc --utc". Disaster averted! Repeat for all boxes you admin. :-)
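If you admin a lot of boxes, you might want to script the check. The sketch below is one possible approach and no substitute for hwclock. It assumes a kernel that exposes the RTC at /sys/class/rtc/rtc0/since_epoch (which reports the RTC reading as if it were UTC) and a system clock that is already correct, say via NTP. The function names, labels, and two-minute tolerance are made up.

```python
import time

def read_rtc_epoch(path="/sys/class/rtc/rtc0/since_epoch"):
    """Read the RTC as seconds since the epoch, interpreted as UTC."""
    with open(path) as f:
        return int(f.read().strip())

def classify_rtc(rtc_epoch, system_epoch, tolerance_s=120):
    """Guess whether the RTC holds UTC or something that looks like local time."""
    diff = rtc_epoch - system_epoch
    if abs(diff) <= tolerance_s:
        return "utc"
    # Real zone offsets are multiples of 30 minutes (a few use 45) within +/-14 h.
    nearest = round(diff / 1800) * 1800
    if abs(diff - nearest) <= tolerance_s and abs(nearest) <= 14 * 3600:
        return "local-time-like"
    return "unclear"

# Typical use on a live box (not run here, since the path may not exist):
#   print(classify_rtc(read_rtc_epoch(), time.time()))
print(classify_rtc(1_000_000 - 5 * 3600, 1_000_000))
```

Anything other than "utc" means the box deserves a closer look with hwclock.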

Together we can end the scourge of Linux boxes which store local time in their RTCs! Thank you.
