What to do when congestion occurs?
There are six kinds of congestion. VMware provided the following script to check them on each disk group.
The script must be run as a single line, so copy it and paste it into Notepad first to confirm it contains no line breaks.
while true; do echo "================================================"; date; for ssd in $(localcli vsan storage list |grep "Group UUID"|awk '{print $5}'|sort -u);do echo $ssd;vsish -e get /vmkModules/lsom/disks/$ssd/info|grep Congestion;done; for ssd in $(localcli vsan storage list |grep "Group UUID"|awk '{print $5}'|sort -u);do llogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info|grep "Log space consumed by LLOG"|awk -F \: '{print $2}');plogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info|grep "Log space consumed by PLOG"|awk -F \: '{print $2}');llogGib=$(echo $llogTotal |awk '{print $1 / 1073741824}');plogGib=$(echo $plogTotal |awk '{print $1 / 1073741824}');allGibTotal=$(expr $llogTotal \+ $plogTotal|awk '{print $1 / 1073741824}');echo $ssd;echo " LLOG consumption: $llogGib";echo " PLOG consumption: $plogGib";echo " Total log consumption: $allGibTotal";done;sleep 30; done
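The script above prints every congestion counter every 30 seconds, including the ones that are zero. A minimal sketch of a filter that keeps only the non-zero counters is shown here. The function name nonzero_congestion is hypothetical, and the filter assumes each counter is printed as "name:value" (for example "logCongestion:12"); adjust the pattern if the output on your build differs.

```shell
# Print only the congestion counters whose value is greater than zero.
# Reads vsish output on standard input.
# Assumption: each counter line has the form "name:value"; lines that do not
# contain the word "Congestion" (such as the disk group UUID) are ignored.
nonzero_congestion() {
    awk -F: '/Congestion/ {
        name = $1; val = $2
        gsub(/[ \t]/, "", name)   # strip indentation around the counter name
        gsub(/[ \t]/, "", val)    # strip whitespace around the value
        if (val + 0 > 0) print name ": " val
    }'
}

# Hypothetical usage on an ESXi host, with a disk group UUID in $ssd:
#   vsish -e get /vmkModules/lsom/disks/$ssd/info | nonzero_congestion
```

If the filter prints nothing for a disk group, none of its counters were above zero at that moment.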
Congestion is usually seen while a resync is in progress, so first check whether one is running:
esxcli vsan debug resync summary get
If a resync is in progress, the recommended option is to throttle the resync rate.
This helps reduce the congestion, but the resync will take longer to complete.
If no resync is in progress, check /var/log/vobd.log for signs of errors on the affected disk (for example, the disk may be failing).
In particular, if you see an unrecoverable medium or checksum error like the examples below, open a ticket with VMware for assistance.
2019-08-19T18:05:01.423Z: [VsanCorrelator] 8669259187358us: [vob.vsan.dom.unrecoverableerror] Virtual SAN detected an unrecoverable medium or checksum error for component e59a0c5d-6487-b65c-bb20-0cc47a66699c on disk group 520e027b-9a10-c812-9c52-4ee7ab80ff5b.
2019-08-19T18:05:01.423Z: [VsanCorrelator] 8669255481677us: [esx.problem.vob.vsan.dom.unrecoverableerror] vSAN detected an unrecoverable medium or checksum error for component e59a0c5d-6487-b65c-bb20-0cc47a66699c on disk group 520e027b-9a10-c812-9c52-4ee7ab80ff5b.
2019-08-19T18:06:18.549Z: [VsanCorrelator] 8669336313787us: [vob.vsan.dom.unrecoverableerror] Virtual SAN detected an unrecoverable medium or checksum error for component df46c35c-62e3-f0ad-ef4f-0cc47a62c1a0 on disk group 5221cada-60b6-779a-1ac4-f765355fdb0e.
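To see quickly which disk groups are affected, the log entries above can be summarized per disk group. The following is a rough sketch, not a VMware-provided tool: the function name unrecoverable_by_disk_group is hypothetical, and it assumes the message wording matches the examples above ("... for component <UUID> on disk group <UUID>."). Because the same event can be logged twice (once as vob.vsan... and once as esx.problem.vob.vsan...), it counts distinct components rather than log lines.

```shell
# Summarize unrecoverable medium/checksum errors per disk group.
# Argument: path to a vobd log file (for example /var/log/vobd.log).
# Output: one line per disk group: "<disk group UUID> <number of distinct components>".
# Assumption: the message text follows the format of the sample entries.
unrecoverable_by_disk_group() {
    grep 'dom.unrecoverableerror' "$1" |
        awk '{
            comp = ""; dg = ""
            for (i = 1; i < NF; i++) {
                if ($i == "component") comp = $(i + 1)   # UUID after "component"
                if ($i == "group")     dg   = $(i + 1)   # UUID after "disk group"
            }
            sub(/\.$/, "", dg)                           # drop the trailing period
            if (comp != "" && dg != "") print dg, comp
        }' |
        sort -u |                                        # one line per (disk group, component)
        awk '{ n[$1]++ } END { for (d in n) print d, n[d] }' |
        sort
}

# Hypothetical usage on an ESXi host:
#   unrecoverable_by_disk_group /var/log/vobd.log
```

A disk group that keeps appearing in this summary is a good candidate to mention when opening the support ticket.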