Thursday, January 17, 2013

Auto-reboot a hung Raspberry Pi using the on-board watchdog timer

A watchdog timer is a special kind of timer, commonly found on embedded systems, that is used to detect when the running software has hung. It is essentially a countdown timer that counts from some initial value down to zero. If it reaches zero, the watchdog assumes that the system has hung and resets it. The running software must therefore periodically reload the timer with a new value to keep it from reaching zero. When the software locks up and cannot reload the timer, the timer inevitably reaches zero and a reset occurs.
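To make the countdown idea concrete, here is a toy simulation in shell. It is only a sketch of the general mechanism, not of the BCM2835 hardware: the function name simulate_watchdog and the "kick"/"hang" event names are made up for illustration, and each argument stands for one timer tick.

```shell
# Toy model of a watchdog countdown (an illustration, not the real hardware).
# Usage: simulate_watchdog TIMEOUT EVENT...
# Each EVENT is one tick: "kick" reloads the counter, anything else lets it run down.
simulate_watchdog() {
    timeout=$1
    shift
    counter=$timeout
    tick=0
    for event in "$@"; do
        tick=$((tick + 1))
        if [ "$event" = "kick" ]; then
            counter=$timeout          # software is alive: reload the timer
        else
            counter=$((counter - 1))  # no update this tick: keep counting down
        fi
        if [ "$counter" -le 0 ]; then
            echo "reset at tick $tick"
            return 1
        fi
    done
    echo "still running after $tick ticks"
    return 0
}

# Regular kicks keep the counter above zero.
simulate_watchdog 3 kick hang hang kick hang          # prints: still running after 5 ticks
# Three missed ticks in a row let a timeout of 3 run out.
simulate_watchdog 3 kick hang hang hang || true       # prints: reset at tick 4
```

The second call shows the failure case: once the software stops kicking, nothing else can prevent the reset.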

Luckily for us, the Broadcom BCM2835 SoC on the Raspberry Pi comes with a hardware watchdog timer that does just that. This is especially useful if your Raspberry Pi sits in a remote location where no one is around to reboot it when the operating system hangs.

Load the bcm2708_wdog kernel module

To load the watchdog kernel module right now, issue the following command:
$ sudo modprobe bcm2708_wdog

If you are running Raspbian, make the module load at every boot by adding a line containing "bcm2708_wdog" to your /etc/modules file. The -a option makes tee append to the file instead of overwriting it.
$ echo "bcm2708_wdog" | sudo tee -a /etc/modules
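Running the tee -a command a second time adds a duplicate line. If you script this step, a small guard like the following sketch avoids that. The function name add_module_once is hypothetical, and the file argument exists only so you can try it on a scratch file first; writing to /etc/modules itself requires root.

```shell
# Append MODULE to FILE only if no identical line is there yet.
# FILE defaults to /etc/modules (needs root); pass another path to experiment.
add_module_once() {
    module=$1
    file=${2:-/etc/modules}
    # -x matches whole lines only, -F treats the name as a plain string
    if grep -qxF "$module" "$file" 2>/dev/null; then
        echo "$module already listed in $file"
    else
        echo "$module" >> "$file"
        echo "added $module to $file"
    fi
}
```

For example, calling add_module_once bcm2708_wdog /tmp/modules-test twice leaves exactly one bcm2708_wdog line in that file.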


If you are running Arch Linux, add a file called "bcm2708_wdog.conf" with the text "bcm2708_wdog" in /etc/modules-load.d/ with the following command:
$ echo "bcm2708_wdog" | sudo tee /etc/modules-load.d/bcm2708_wdog.conf

Install the software watchdog daemon

In Raspbian, run the following command:
$ sudo apt-get install watchdog
But in Arch we use pacman:
$ sudo pacman -S watchdog

Then, make sure it runs after every boot.
In Raspbian, run:
$ sudo update-rc.d watchdog defaults
OR, if you have chkconfig installed:
$ sudo chkconfig --add watchdog
In Arch run the following:
$ sudo systemctl enable watchdog

Configure the watchdog daemon

Open /etc/watchdog.conf with your favorite editor (mine is nano).
$ sudo nano /etc/watchdog.conf

Uncomment the line that starts with #watchdog-device by removing the hash (#) so that the watchdog daemon uses the watchdog device.
Uncomment the line that says #max-load-1 = 24 by removing the hash symbol to reboot the device if the 1-minute load average goes over 24. A load average of 25 over one minute means that roughly 25 processes were competing for the CPU at once during that minute, so you would have needed 25 Raspberry Pis to run them all without any of them waiting. You may tweak this value to your liking.
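If you would rather not edit the file by hand, the two changes above can be sketched with sed. This assumes the options appear in the file as "#name = value" lines (possibly with tabs around the equals sign), which may not match your copy exactly; the function name uncomment_option is made up for this example. Try it on a copy of the file before touching /etc/watchdog.conf.

```shell
# Remove the leading '#' from the line "#KEY = ..." in FILE.
# The pattern requires whitespace or '=' right after KEY, so that
# "max-load-1" does not also match "max-load-15".
uncomment_option() {
    key=$1
    file=$2
    tmp=$(mktemp)
    # Write through a temp file and copy back, keeping FILE's permissions.
    sed "s/^#\($key[[:space:]]*=\)/\1/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Run as root against the real file, the two edits would be uncomment_option watchdog-device /etc/watchdog.conf and uncomment_option max-load-1 /etc/watchdog.conf.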

Start the watchdog daemon

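In Raspbian:
$ sudo /etc/init.d/watchdog start
If you use chkconfig, also make sure the daemon is enabled at boot (this command does not start it right away):
$ sudo chkconfig watchdog on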

In Arch:
$ sudo systemctl start watchdog.service
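Before relying on the setup, it is worth checking that the device node the daemon talks to actually exists. The sketch below only tests for the node; it deliberately does not open or write to it, because opening a watchdog device can arm the timer. The function name watchdog_device_present is hypothetical, and /dev/watchdog is assumed to be the path configured in /etc/watchdog.conf.

```shell
# Report whether a watchdog character device exists, without opening it.
# DEVICE defaults to /dev/watchdog.
watchdog_device_present() {
    device=${1:-/dev/watchdog}
    if [ -c "$device" ]; then
        echo "$device is present"
        return 0
    fi
    echo "$device not found; is the bcm2708_wdog module loaded?"
    return 1
}
```

Calling watchdog_device_present with no arguments checks /dev/watchdog; a "not found" message usually means the modprobe step was skipped.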

That's it!

You are done! You may play around with the settings in /etc/watchdog.conf if you'd like.
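The watchdog daemon can perform other tests as well, such as pinging a host, monitoring a file for changes, or checking free memory, and you will probably want to configure some of them.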

Arch Linux users: I'm well aware that the watchdog daemon is not strictly necessary in Arch, because systemd can drive the hardware watchdog itself if you edit /etc/systemd/system.conf (the RuntimeWatchdogSec option), but I prefer the watchdog daemon because it offers many more features.

Updates:
2013-01-17 11:45 AM EST: Fixed the chkconfig command to start and enable the watchdog daemon at startup.

Source: Raspberry Pi @ Gadgetoid