Monitor that the application has been restarted (unavailable)


#1

Hi

Let's say we restart an application every day at 01:00. The application will be up and running again after at most 5 minutes. We have defined a check_http with a subdue window from 00:55 to 01:05, so everything is fine: this is scheduled maintenance and we don't want to be notified. But we do want to be notified when the application has not been restarted at 01:00. Do you have any idea how to solve this? Should we use logster to monitor the log file, or check Graphite?


#2

Well, it depends on the OS, I suppose.

If we are talking about Apache on Linux, I would check the timestamp of the PID file (/usr/local/apache2/logs/httpd.pid, details: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#pidfile). When a new instance is started, the PID file is updated.
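
A minimal Ruby sketch of that idea, assuming the PID file's modification time is a good enough proxy for the start time. The helper names are hypothetical and not part of Sensu or Apache:

```ruby
# Hypothetical helpers: infer a service's start time from its PID file.

# Returns the PID file's modification time, or nil if the file is missing
# (for example because the service is not running).
def pid_file_started_at(path)
  File.mtime(path)
rescue Errno::ENOENT
  nil
end

# Seconds elapsed since the PID file was last written, or nil if missing.
def seconds_since_start(path, now = Time.now)
  started = pid_file_started_at(path)
  started && (now - started)
end
```

For the Apache example this could be called as seconds_since_start('/usr/local/apache2/logs/httpd.pid'). Anything else that rewrites the PID file would make the inferred start time wrong.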


#3

Let's say the Sensu check runs every minute. How would you check, only between 01:00 and 01:05, that the PID has changed, so that if it has not changed in this interval the event is sent to the handler?

On Thursday, 5 March 2015 10:51:11 UTC+1, Anthony Kong wrote:

Well, it depends on OS I suppose.

If it is apache on linux we are talking about, I will check the timestamp of the PID file (/usr/local/apache2/logs/httpd.pid, details: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#pidfile). If a new instance is started, the pid file will be updated


#4

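Roughly (untested sketch):

require 'sensu-plugin/check/cli'

# OK if the PID file was rewritten in today's 01:00-01:05 window.
class CheckRestart < Sensu::Plugin::Check::CLI
  # Path taken from the Apache example earlier in the thread.
  PIDFILE = '/usr/local/apache2/logs/httpd.pid'

  def run
    modified_time = File.mtime(PIDFILE)
    now = Time.now
    window_start = Time.new(now.year, now.month, now.day, 1, 0, 0)
    if modified_time.between?(window_start, window_start + 300)
      ok "restarted at #{modified_time}"
    else
      critical "PID file last modified at #{modified_time}"
    end
  end
end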

On Thursday, 5 March 2015 22:10:24 UTC+11, Khuong Dinh Pham wrote:

Let say the sensu check runs every 1 minut how will you check at only between 01:00 and 01:05 that the pid has changed if it has not changed in this interval it will sent the event to the handler?



#5

Thanks for the input. Sure, I will use something like this (write my own plugin), but this snippet will not work if the application is restarted again at 02:00, which is valid: the PID file timestamp would then fall outside the 01:00 to 01:05 window.
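
One possible way around this, as a sketch only: instead of requiring the timestamp to fall inside the window, accept any PID file timestamp at or after the most recent scheduled restart whose grace period has passed. The constants and method names below are hypothetical, and the 01:00 schedule and 5 minute grace period are taken from the first post:

```ruby
RESTART_HOUR  = 1        # assumption: scheduled restart at 01:00 local time
GRACE_SECONDS = 5 * 60   # assumption: the app is back within 5 minutes

# Most recent scheduled restart whose grace period has already elapsed.
# Before 01:05 today this is yesterday's 01:00, so the check stays quiet
# while the restart may still be in progress.
def last_due_restart(now)
  today = Time.new(now.year, now.month, now.day, RESTART_HOUR, 0, 0, now.utc_offset)
  now >= today + GRACE_SECONDS ? today : today - 86_400
end

# True when the PID file was rewritten at or after the last due restart,
# so a later manual restart (e.g. at 02:00) still counts as OK.
def restarted_since_schedule?(pid_mtime, now = Time.now)
  pid_mtime >= last_due_restart(now)
end
```

A plugin would call restarted_since_schedule?(File.mtime(PIDFILE)) and report critical when it returns false. Note that subtracting a fixed 86,400 seconds ignores daylight saving changes.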

On Thursday, 5 March 2015 13:08:48 UTC+1, Anthony Kong wrote:

pseudo code:

class checker < Sensu:Plugin:Check:CLI

def run()

    modified_time = getTimeStamp(PIDFILE)
    if modified_time.between(01:00, 01:05)
       ok
    else
       critical



#6

I think the key part of my suggestion is that you can use the PID file to infer the start time. You will need to figure out implementation details to suit your need.

Cheers


#7

Maybe another way to think about this is that you want to be notified
if the process has been running for more than 24 hours (+5 min), right?

The venerable check_procs can do this:
https://www.monitoring-plugins.org/doc/man/check_procs.html

Something like this?
/usr/lib/nagios/plugins/check_procs --metric=ELAPSED -c :86400 -C nginx

Or the sensu-community check-procs.rb can do this:
https://github.com/sensu/sensu-community-plugins/blob/master/plugins/processes/check-procs.rb

check-procs.rb --esec-over 86400 --pattern nginx --critical-over 1 ?

This would allow for other manual restarts, but it wouldn't actually
catch the case where someone manually restarted it at midnight and the
cron job then failed at 01:00.
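
For anyone rolling their own plugin instead, here is a rough Ruby sketch of the same "running longer than 24 hours (+5 min)" test. It assumes the elapsed time comes from `ps -o etime= -p <pid>`, which prints `[[dd-]hh:]mm:ss`. The method names and the MAX_AGE constant are hypothetical:

```ruby
# Assumption: 24 hours between restarts plus the 5 minute restart allowance.
MAX_AGE = 24 * 3600 + 5 * 60

# Convert ps "etime" output ([[dd-]hh:]mm:ss) into seconds.
def etime_to_seconds(etime)
  text = etime.strip
  days, clock = text.include?('-') ? text.split('-', 2) : ['0', text]
  parts = clock.split(':').map(&:to_i)
  parts.unshift(0) while parts.size < 3   # pad missing hours
  hours, minutes, seconds = parts
  ((days.to_i * 24 + hours) * 60 + minutes) * 60 + seconds
end

# True when the process has been up longer than the allowed age,
# i.e. the scheduled restart apparently did not happen.
def restart_overdue?(etime, max_age = MAX_AGE)
  etime_to_seconds(etime) > max_age
end
```

Like the commands above, this only looks at process age, so it has the same blind spot for a manual restart shortly before a failed cron run.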

If you really want to monitor the *cron job*, I kinda like this:

···

On Thu, Mar 5, 2015 at 5:54 PM, Anthony Kong <anthony.hw.kong@gmail.com> wrote:

I think the key part of my suggestion is that you can use the PID file to
infer the start time. You will need to figure out implementation details to
suit your need.

Cheers