Ruby runtime, file already exists

Every once in a while we get this error message on our agents. It happens when we install the agent on a new host and it starts downloading the required assets.

The error message is:

error getting assets for check: error extracting asset: reading file in tar archive: file already exists:
/var/cache/sensu/sensu-agent/2d7800432f90625a02aec4a10b084bc72e253572970694e932b5ccdc72fb30f5cf91ed4b51f90942965df5228e521b8f5f06da3d52b886b172ba08d4130251dc/bin/ruby

It seems that several checks request the Ruby runtime asset and unpack it at the same time. One succeeds and the others fail, and they stay failed because they still think they need to unpack the asset.
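To illustrate the kind of collision suspected here, a minimal Python sketch follows. It is not Sensu's actual extraction code; it only assumes that the extractor creates each file exclusively, so a second unpack into the same directory fails. The names `extract_file` and `simulate_two_extractions` are made up for the example, and the two extractions run one after the other so the outcome is deterministic.

```python
import os
import tempfile

def extract_file(dest_dir, rel_path, data):
    """Write one archive member, failing if the target already exists."""
    target = os.path.join(dest_dir, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # O_EXCL makes creation fail if an earlier extraction already wrote the file.
    # This is an assumed behaviour, chosen to mimic the "file already exists" error.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o755)
    with os.fdopen(fd, "wb") as f:
        f.write(data)

def simulate_two_extractions():
    """Extract the same file twice into one directory and record each outcome."""
    results = []
    with tempfile.TemporaryDirectory() as cache_dir:
        for attempt in ("first", "second"):
            try:
                extract_file(cache_dir, "bin/ruby", b"#!/bin/sh\n")
                results.append((attempt, "ok"))
            except FileExistsError:
                results.append((attempt, "file already exists"))
    return results
```

If the failed attempt is retried without cleaning up first, it would keep hitting the same existing file, which would match checks that stay failed.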

We can resolve this by removing the already unpacked Ruby runtime so that it gets unpacked again properly.
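A rough sketch of that cleanup in Python, in case it helps someone script it. The function name `remove_unpacked_asset` is hypothetical, and it assumes the cache layout seen in the error message above (one directory per asset, named by its hex digest). Stopping the agent before running it is probably safer, though that is an assumption, not something we have verified.

```python
import re
import shutil
from pathlib import Path

def remove_unpacked_asset(cache_root, asset_sha512):
    """Delete one unpacked asset directory; return True if something was removed."""
    # Only accept a plain hex digest so a bad argument cannot escape the cache root.
    if not re.fullmatch(r"[0-9a-f]+", asset_sha512):
        raise ValueError("expected a hex digest, got %r" % asset_sha512)
    asset_dir = Path(cache_root) / asset_sha512
    if not asset_dir.is_dir():
        return False
    # Remove the whole directory so the next check run has to unpack from scratch.
    shutil.rmtree(asset_dir)
    return True
```

It would be called with the cache root and the digest taken from the path in the error message.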

Agent (and backend) is 6.5.4. Hosts are

Thanks in advance, Timo Waltari

hey,

So the checks never actually run? That’s weird.
I would have expected the checks to run successfully after at least one of the initial asset downloads completes all the way to unpacking the asset.

While the race to download the same asset build multiple times is no bueno for sure, I would have thought that once one of those concurrent initial downloads finished, any check that wants that asset would see it was installed and ready to use on subsequent scheduled runs of the check.

Thanks for your reply! And you are correct. For one particular host there are 8 checks in unknown status complaining about not being able to extract ruby-runtime. In the folder /var/cache/sensu/sensu-agent/2d7800432f90625a02aec4a10b084bc72e253572970694e932b5ccdc72fb30f5cf91ed4b51f90942965df5228e521b8f5f06da3d52b886b172ba08d4130251dc/ there are only the bin and lib folders, and not include or share as there should be.
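A small sketch of how that check could be scripted across hosts. The list of expected directories is taken from what this particular ruby-runtime build should contain (bin, lib, include, share); other assets may differ, and `missing_asset_dirs` is just an illustrative name.

```python
from pathlib import Path

# Top-level directories expected in the unpacked ruby-runtime asset
# (an assumption based on the layout described above).
EXPECTED_DIRS = ("bin", "lib", "include", "share")

def missing_asset_dirs(asset_dir, expected=EXPECTED_DIRS):
    """Return the expected top-level directories that are absent from asset_dir."""
    root = Path(asset_dir)
    return [name for name in expected if not (root / name).is_dir()]
```

An empty list would mean the top-level layout looks complete; it says nothing about whether every file inside was unpacked.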

Other checks that do not require ruby-runtime are running fine. And restarting the agent does not fix the problem.

In the first sensu-agent.log I found that before the first “file already exists” message there are only 4 log lines:

Dec 3 08:20:59 hostname sensu-agent: {"component":"agent","level":"warning","msg":"signal \"terminated\" received, shutting down agent","time":"2021-12-03T08:20:59+02:00"}
Dec 3 08:20:59 hostname sensu-agent: {"component":"agent","level":"warning","msg":"not retrying to connect","time":"2021-12-03T08:20:59+02:00"}
Dec 3 08:20:59 hostname sensu-agent: {"component":"agent","error":"Connection closed: websocket: close 1001 (going away)","level":"error","msg":"transport receive error","time":"2021-12-03T08:20:59+02:00"}
Dec 3 08:21:00 hostname sensu-agent: {"component":"agent","level":"warning","msg":"agent did not shut down gracefully","time":"2021-12-03T08:21:00+02:00"}

The timestamps of the folders match:

drwxr-xr-x. 2 sensu sensu 78 Dec 3 08:20 bin
drwxr-xr-x. 3 sensu sensu 17 Dec 3 08:20 lib

hmmm… is the filesystem that holds the cache directory full on that agent?

The only thing I know of from personal experience that would result in this situation is the unpack failing due to a file system limit.
I’m going to go back and test this specific situation, with many checks trying to download the same thing, to be sure it’s not something else. But my gut tells me a filesystem filled up.
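For what it’s worth, free space is not the only limit a filesystem can hit; running out of inodes would look similar. A minimal sketch for checking both, using `os.statvfs` (Unix only). The function name `filesystem_headroom` is made up for the example.

```python
import os

def filesystem_headroom(path):
    """Report free and total bytes and inodes for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        # Space available to unprivileged users, in bytes.
        "bytes_free": st.f_bavail * st.f_frsize,
        "bytes_total": st.f_blocks * st.f_frsize,
        # Inodes available to unprivileged users (some filesystems report 0 totals).
        "inodes_free": st.f_favail,
        "inodes_total": st.f_files,
    }
```

Pointing it at the agent's cache directory would show whether either number is close to zero.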

No.

df -h .

Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-var 7.0G 1.1G 5.9G 16% /var