What if a script run by a check takes longer than the check interval?

Hi Geeks,

I have a situation. There is a script with some logic, and one of its commands communicates with the DB. This command may take a long time, 5 minutes or more. I am running this bash script as a Sensu check (the command is below). I want the check to run every minute no matter how long the script takes: give it 5 minutes (5 turns of one minute each), and after 5 minutes kill the process and start the script again.

/opt/sensu-plugins-ruby/embedded/bin/CronJobsScheduleMonitor.sh

@todd @jspaleta Please guide me sir accordingly.

Am I understanding this correctly that you essentially would expect to possibly have 5 iterations of the same check running at a time with an interval of 60 seconds and a timeout of 300?

00:00 Check 1 starts run
00:01 Check 2 starts run
00:02 Check 3 starts run
00:03 Check 4 starts run
00:04 Check 5 starts run
00:05 Check 1 is killed, Check 6 starts run
00:06 Check 2 is killed, Check 7 starts run

The above scenario will never work with Sensu, as it will only spawn one instance of the check at any one time. Checks 2-5 would never start; Check 1 would time out at the 5-minute mark, and then Check 2 would be started one minute later.

00:00 Check 1 starts run
00:01 Check 1 still running
00:02 Check 1 still running
00:03 Check 1 still running
00:04 Check 1 still running
00:05 Check 1 is killed
00:06 Check 2 starts run
00:07 Check 2 is still running

We have a PHP script and added a sleep of 10 minutes to it. We also added a time-check function that computes the difference between the modification time of the PID file and the current time. If the difference is 5 minutes or more, the script kills the process and removes the PID file. But when I call the script from a Sensu check, the check waits for the 10-minute sleep to complete and only then runs again. I was expecting the Sensu check to run the script once a minute without caring about the status of the command: just start the script from the top every minute and follow the logic in it.

#!/bin/bash
#set -o xtrace
#set -x
#trap read debug

DOMAIN_NAME=app14.abc.com
DB_VERSION=2019080901
CRON_PATH=/var/www/$DOMAIN_NAME/scrybe/amfphp1_9/services/db_services_$DB_VERSION/cronjobs
JOB_NAME=CronJobsScheduleMonitor
#PID_PATH=/opt/convo/CronJobsScheduleMonitor/$JOB_NAME.pid
PID_PATH=/opt/convo/CronJobsScheduleMonitor
PID_FILE=$PID_PATH/$JOB_NAME.pid
echo "$PID_PATH"
echo "$PID_FILE"
date >> /tmp/time.txt
PID_FILE_EXISTS=""
PROCESS_RUNNING=""

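diff_time()
{
# Print the age of the PID file in whole minutes. Epoch seconds avoid the
# wrap-around that "+%M" (minute of the hour) causes at the top of each hour.
# Only the result is echoed, so GRACE_TIME=$(diff_time) captures one number.
FTIME=$(date -r "$PID_FILE" "+%s")
CTIME=$(date "+%s")
ETIME=$(( (CTIME - FTIME) / 60 ))
echo "$ETIME"
}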

start_job(){

#sudo runuser -l apache -s /bin/bash -c "php72 $CRON_PATH/monitoring/CronJobsScheduleMonitor.php" &
sudo runuser -l apache -s /bin/bash -c "php72 $CRON_PATH/monitoring/CronJobsScheduleMonitor.php /tmp/logly.log 2>&1  &"
#sudo runuser -l apache -s /bin/bash -c "php72 $CRON_PATH/monitoring/CronJobsScheduleMonitor.php & "
PID=$!
#PID=$(ps -aux | grep CronJobsScheduleMonitor | grep ^apache | awk '{print $2}')
echo "Started with  PID: $PID"
echo "start new:" $(date) >> /tmp/time.txt
echo $PID > "$PID_FILE"
sleep 5s
CURRENT_PID2=$(ps -aux | grep CronJobsScheduleMonitor | grep ^apache | awk '{print $2}')
if [ -z "$CURRENT_PID2" ];
then
  rm "$PID_FILE"
  echo "OK: All Cronjobs are Working Fine."
  exit 0
else
  echo "Cronjobs are still running. Please check cronjobs. "
  exit 1
fi

}

# Check if PID file exists

if [ -f "$PID_FILE" ]; then
PID_FILE_EXISTS="YES"
else
PID_FILE_EXISTS="NO"
fi

# Check if Process exists

PROCESS_ID=$(ps -aux | grep CronJobsScheduleMonitor | grep ^apache | awk '{print $2}')
if [ -z "$PROCESS_ID" ]; then

PROCESS_RUNNING="NO"
else
PROCESS_RUNNING="YES"
fi

echo "$PROCESS_ID"

if [[ $PID_FILE_EXISTS == "YES" && $PROCESS_RUNNING == "NO" ]];then
echo "PID_FILE exist but Process was not Running." >> /tmp/time.txt
rm "$PID_FILE"
#nohup php -c /etc/php.ini "$worker_name".php queue_name 1>> "/opt/scripts/appworker/logs/${worker_name}_console.log" 2>> "/opt/scripts/appworker/logs/${worker_name}_error.log" < /dev/null &
start_job
fi

if [[ $PID_FILE_EXISTS == "YES" && $PROCESS_RUNNING == "YES" ]];then
echo "PID_FILE and PROCESS_ID both exist, Do calculate Grace Time." >> /tmp/time.txt
GRACE_TIME=$(diff_time)
echo "Grace Time: $GRACE_TIME"
#GRACE_TIME=3
if [ "$GRACE_TIME" -gt 4 ]; then
#PID1=$(ps -aux | grep CronJobsScheduleMonitor | grep ^apache | awk '{print $2}')
echo "Process ID is: $PROCESS_ID"
kill -9 $PROCESS_ID
rm "$PID_FILE"
echo "Run After Killing PID"
start_job
else
echo "Cronjobs are still running. Please check cronjobs. "
exit 1
fi

fi

if [[ $PID_FILE_EXISTS == "NO" && $PROCESS_RUNNING == "NO" ]];then
echo "Both PID_FILE and PROCESS_ID do not exist." >> /tmp/time.txt
start_job
fi

if [[ $PID_FILE_EXISTS == "NO" && $PROCESS_RUNNING == "YES" ]];then
echo "PID_FILE does not exist but Process is Running." >> /tmp/time.txt
#PID2=$(ps -aux | grep CronJobsScheduleMonitor | grep ^apache | awk '{print $2}')
echo "Process ID is: $PROCESS_ID"
kill -9 $PROCESS_ID
start_job
fi

My question now becomes: if you don’t care about the status of the command, then why have Sensu run it as a check? You can run a check in Sensu for one or two reasons: to do a service check (you care about the exit status of the command) and/or for metrics collection.

From the sounds of this, you’d be just as well to let cron schedule and run the job for you, unless I’m missing some other reason for having Sensu run it.
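If the job does move to cron, the two behaviours discussed above (only one copy running at a time, and a kill after five minutes) can be approximated with `flock` and `timeout`. This is only a minimal sketch, assuming util-linux `flock` and GNU coreutils `timeout` are installed; the lock path, the `run_guarded` helper name, and the 300-second limit are assumptions for illustration, not anything Sensu provides.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command at most once at a time and
# kill it if it exceeds a time limit. Names and paths are illustrative.

LOCK_FILE="${LOCK_FILE:-/tmp/CronJobsScheduleMonitor.lock}"   # assumed path
LIMIT="${LIMIT:-300}"                                         # seconds

run_guarded() {
  # flock -n fails immediately if another copy still holds the lock;
  # -E 99 makes that case distinguishable from the command's own status.
  # timeout sends TERM after $LIMIT seconds and then exits with status 124.
  flock -n -E 99 "$LOCK_FILE" timeout "$LIMIT" "$@"
}
```

A script scheduled every minute could then call `run_guarded` with the real command as its arguments: status 99 would mean the previous run is still going, and 124 would mean the run was killed at the limit.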

Actually, I just want to ensure that, no matter the status or output of the command, it ideally runs smoothly every minute, and if there is some problem, it is forcefully restarted after 5 minutes.

Then set the check’s timeout value to 300 (5 minutes). Sensu will kill a check that has run for more than five minutes and will start it again at the next interval.

After the 300 s timeout, if Sensu kills the check, how can we be sure that the script or command started by this check is also killed before a new one starts?

Any processes spawned by the check should be terminated when the check process is terminated due to timeout.
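If you would rather not depend on that cleanup, one option is to bound the long-running command inside the script itself. The following is a minimal sketch, assuming GNU coreutils `timeout` is available; `run_with_limit` is a hypothetical helper, and the only Sensu-specific part is the usual check exit status convention (0 is OK, 2 is critical).

```shell
#!/bin/bash
# Hypothetical helper: bound a long-running command and translate the
# result into the exit statuses a Sensu check reports (0 OK, 2 critical).

run_with_limit() {
  local limit="$1"; shift
  local status=0
  # -k 5: if the command ignores TERM, send KILL five seconds later.
  timeout -k 5 "$limit" "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "OK: command finished within ${limit}s"
    return 0
  elif [ "$status" -eq 124 ]; then
    # 124 is the status timeout itself returns when the limit was hit.
    echo "CRITICAL: command killed after ${limit}s"
    return 2
  else
    echo "CRITICAL: command failed with status ${status}"
    return 2
  fi
}
```

In the script above, the PHP job could be started as `run_with_limit 300 php72 ...` in the foreground, which would make the PID file bookkeeping unnecessary, since the command can never outlive its limit by more than a few seconds.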