
vSAN Troubleshooting

dotsincloud

What to do when congestion occurs?

 

  • There are six kinds of congestion. Check which kind is being reported.

 

  • VMware provides the following script to check congestion and log space consumption on each disk group.


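The script must be entered as a single line. Paste it into a plain-text editor such as Notepad first and confirm that no line breaks were introduced before copying it to the ESXi shell.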

 

while true; do echo "================================================"; date; for ssd in $(localcli vsan storage list |grep "Group UUID"|awk '{print $5}'|sort -u);do echo $ssd;vsish -e get /vmkModules/lsom/disks/$ssd/info|grep Congestion;done; for ssd in $(localcli vsan storage list |grep "Group UUID"|awk '{print $5}'|sort -u);do llogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info|grep "Log space consumed by LLOG"|awk -F \: '{print $2}');plogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info|grep "Log space consumed by PLOG"|awk -F \: '{print $2}');llogGib=$(echo $llogTotal |awk '{print $1 / 1073741824}');plogGib=$(echo $plogTotal |awk '{print $1 / 1073741824}');allGibTotal=$(expr $llogTotal \+ $plogTotal|awk '{print $1 / 1073741824}');echo $ssd;echo " LLOG consumption: $llogGib";echo " PLOG consumption: $plogGib";echo " Total log consumption: $allGibTotal";done;sleep 30; done
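
The log-consumption part of the one-liner is easier to follow as a small function. This is a minimal sketch, not VMware's script: the name log_consumption_gib is hypothetical, and it assumes the vsish info output contains "label:value" lines with values in bytes, which is what the one-liner's awk -F : and its division by 1073741824 imply.

```shell
# Sketch: summarize LLOG/PLOG log space from the text of
#   vsish -e get /vmkModules/lsom/disks/<disk-group-uuid>/info
# read on stdin. ASSUMPTION: the relevant lines look like
# "Log space consumed by LLOG:<bytes>" (same assumption as the one-liner).
log_consumption_gib() {
    awk -F: '
        /Log space consumed by LLOG/ { llog = $2 }   # bytes used by LLOG
        /Log space consumed by PLOG/ { plog = $2 }   # bytes used by PLOG
        END {
            gib = 1073741824                          # bytes per GiB
            printf " LLOG consumption: %.2f\n", llog / gib
            printf " PLOG consumption: %.2f\n", plog / gib
            printf " Total log consumption: %.2f\n", (llog + plog) / gib
        }'
}
```

It would be used as: vsish -e get /vmkModules/lsom/disks/$ssd/info | log_consumption_gib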

 

  • Congestion is usually seen while a resync is in progress, so first check whether any resync is running: esxcli vsan debug resync summary get

 

  • If a resync is in progress, the recommended option is to throttle the resync rate.

This helps reduce the congestion, but the resync will take longer to complete.
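
The resync check can be wrapped in a small helper so it is easy to repeat. This is only a sketch under a stated assumption: it expects the summary output to contain a line of the form "Total Number Of Resyncing Objects: <n>", which may differ between vSAN versions. The function names resyncing_objects and resync_status are hypothetical.

```shell
# Sketch: read the text of `esxcli vsan debug resync summary get` on stdin.
# ASSUMPTION: the output has a line "Total Number Of Resyncing Objects: <n>".
resyncing_objects() {
    awk -F: '/Total Number Of Resyncing Objects/ {
        gsub(/[ \t]/, "", $2)   # strip whitespace around the count
        print $2
    }'
}

# Print a one-line verdict based on the count (treats a missing line as 0).
resync_status() {
    n=$(resyncing_objects)
    if [ "${n:-0}" -gt 0 ]; then
        echo "resync in progress: $n object(s)"
    else
        echo "no resync in progress"
    fi
}
```

It would be used as: esxcli vsan debug resync summary get | resync_status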

 

If there is no resync, check /var/log/vobd.log for signs of errors on the affected disk (for example, the disk may be failing).

 

  • In particular, if you see an unrecoverable medium or checksum error like the examples below, open a ticket with VMware for assistance.

 

2019-08-19T18:05:01.423Z: [VsanCorrelator] 8669259187358us: [vob.vsan.dom.unrecoverableerror] Virtual SAN detected an unrecoverable medium or checksum error for component e59a0c5d-6487-b65c-bb20-0cc47a66699c on disk group 520e027b-9a10-c812-9c52-4ee7ab80ff5b.

2019-08-19T18:05:01.423Z: [VsanCorrelator] 8669255481677us: [esx.problem.vob.vsan.dom.unrecoverableerror] vSAN detected an unrecoverable medium or checksum error for component e59a0c5d-6487-b65c-bb20-0cc47a66699c on disk group 520e027b-9a10-c812-9c52-4ee7ab80ff5b.

2019-08-19T18:06:18.549Z: [VsanCorrelator] 8669336313787us: [vob.vsan.dom.unrecoverableerror] Virtual SAN detected an unrecoverable medium or checksum error for component df46c35c-62e3-f0ad-ef4f-0cc47a62c1a0 on disk group 5221cada-60b6-779a-1ac4-f765355fdb0e.
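
To see which disk groups are affected, entries like the ones above can be counted per disk group. The following is a minimal sketch, assuming the log lines have the same shape as the examples (the disk group UUID is the last word, followed by a period). The function name count_unrecoverable_errors is hypothetical. Only the "[vob.vsan.dom.unrecoverableerror]" form is counted, so a matching "esx.problem" line for the same event is not counted twice.

```shell
# Sketch: count unrecoverable medium/checksum errors per disk group.
# Reads vobd.log text on stdin and prints "<count> <disk-group-uuid>" lines.
# ASSUMPTION: the disk group UUID is the last field, ending with a period,
# as in the example log lines.
count_unrecoverable_errors() {
    grep -F '[vob.vsan.dom.unrecoverableerror]' |
        awk '{
            gsub(/\.$/, "", $NF)    # drop the trailing period from the UUID
            count[$NF]++
        }
        END { for (dg in count) print count[dg], dg }'
}
```

It would be used as: count_unrecoverable_errors < /var/log/vobd.log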

 
 
 

©2021 by Dots in Cloud. Proudly created with Wix.com
