VCF 4.5.1 to 5.0.0.1 Upgrade Notes

I recently upgraded a customer from VMware Cloud Foundation 4.5.1 to 5.0.0.1. The upgrade went well in the end, but I had some issues along the way that I would like to share in this quick post.

  1. When upgrading SDDC Manager, we get a nice status page telling us what it is currently doing. Suddenly the following disturbing message popped up:

    "Retrieving update detail failed. VCF services are not available.Unable to retrieve aggregated upgrade details: Failed to request http://127.0.0.1/inventory/domains api - undefined"

    Checking /var/log/vmware/vcf/lcm/lcm.log and lcm-debug.log gave me no clues beyond the fact that the services were probably being restarted as part of the upgrade. After I refreshed my browser a couple of times, the message went away.
     
  2. About 30 minutes into the NSX upgrade, the following error message popped up:

    "bgo-c01-ec01 in bgo-c01 domain failed upgrade at Nov 29, 2023, 9:23:41 AM. Please resolve the above upgrade failure for this bundle before applying any other available bundle."

    Checking the task in SDDC Manager gave me some more details:

    "bgo-c01-ec01 - NSX upgrade precheck timedout. Check for errors in the LCM log files at 127.0.0.1:/var/log/vmware/vcf/lcm, and address those errors. Check if the SDDC Manager is able to communicate with NSX Manager. If not, login to NSX and check if upgrade is running and wait for the completion. Please run the upgrade precheck and restart the upgrade."

    I logged into NSX Manager and did a health check without finding any problems. The Upgrade page showed that the Edge precheck was still running with status "Checked 2 of 2". I let this run for several hours, but it never finished. My attempt to stop the precheck manually never finished either, so I rebooted all the NSX appliances to cancel it. I then retried the NSX upgrade, but the same error happened again after about 30 minutes.

    After some research I found VMware KB91629, but it did not appear to apply to my environment: I could not find "certificate expired" in /var/log/upgrade-coordinator/logical-migration.log, and my certificate was still valid for 98 years. After talking to VMware Support we applied the workaround in the KB anyway, and this made the NSX upgrade move on and complete successfully.
     
  3. Logging in to NSX Manager after the upgrade completed showed me 27 alarms about expired certificates. I quickly found that VMware KB93296 matched my environment, so I contacted VMware Support. They instructed me to replace the certificates using this doc: https://docs.vmware.com/en/VMware-NSX/4.1/administration/GUID-50C36862-A29D-48FA-8CE7-697E64E10E37.html#GUID-50C36862-A29D-48FA-8CE7-697E64E10E37 I am not sure why the KB instructs us to contact them, but it could be that they want to make sure that only certain certificates that can be safely replaced have expired.
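
For the log check in item 2, here is a minimal Python sketch of how I would search the upgrade coordinator log for the "certificate expired" string from KB91629. The log path and the search phrase come from the notes above. The function names, the case-insensitive matching, and the assumption that Python 3 is available where the log lives (or that you have copied the log to a workstation) are my own choices, not anything VMware documents.

```python
from pathlib import Path

# Path and search phrase are taken from the post above; adjust as needed.
LOG_PATH = Path("/var/log/upgrade-coordinator/logical-migration.log")
NEEDLE = "certificate expired"


def find_matches(lines, needle=NEEDLE):
    """Return (line_number, text) pairs for lines containing needle.

    Matching is case-insensitive, which is an assumption made here so
    that variations in capitalization are not missed.
    """
    needle = needle.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


def scan_log(path=LOG_PATH, needle=NEEDLE):
    """Scan one log file; return an empty list if the file does not exist."""
    if not path.is_file():
        return []
    with path.open(encoding="utf-8", errors="replace") as handle:
        return find_matches(handle, needle)


if __name__ == "__main__":
    hits = scan_log()
    if hits:
        for number, text in hits:
            print(f"{LOG_PATH}:{number}: {text}")
    else:
        print(f"No '{NEEDLE}' entries found in {LOG_PATH}")
```

As my case shows, an empty result does not necessarily rule out the KB: the workaround still fixed the upgrade even though the string was not in my log.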

Hopefully you won’t run into these issues at all, but if you do, perhaps this post can help you move on a bit faster on your road towards VCF 5.0.
