vCenter Server blocked by NSX firewall

Recently a customer called me with panic in his voice. He had created a rule in NSX where source and destination were both set to any and the action was set to drop. The rule was placed high up in the rule set, so almost all of their workloads were blocked from the network, including their vCenter Server. The environment was still running NSX for vSphere (NSX-V), where firewall rules are managed through the NSX plugin in vCenter Server, so with vCenter Server unreachable he couldn’t fix the rule.

Having worked with NSX for many years, I was aware of this risk and knew how to solve it. VMware has a KB article (2079620) addressing the issue, so we followed it and had the problem fixed within a few minutes. We used a REST API client to run a call against their NSX Manager that rolls the distributed firewall back to its default rule set: one default Layer 3 section with three default allow rules and one default Layer 2 section with one default allow rule. This restored network access for all workloads, including the vCenter Server appliances. Then we simply logged in to vCenter Server and loaded an autosaved firewall configuration from before the mistake was made.

We also added their vCenter Server appliances to the Exclusion List in NSX to avoid ending up in this situation again. The NSX Manager appliance is added to the Exclusion List automatically, but in NSX-V you can’t log in directly to the NSX Manager GUI to edit the firewall configuration. Note that it may be a good idea to keep vCenter Server off the Exclusion List so that you can secure it with the firewall, but then you need to make sure you don’t make the same mistake this customer did.

It is possible to retrieve the existing firewall configuration using the following API call:

GET /api/4.0/firewall/globalroot-0/config

This can be useful if you don’t trust that you have a valid autosaved firewall configuration to restore after resetting it. You can also use this to fix the exact rule locking you out instead of resetting the entire configuration, but I will not go into details on how to do that here.
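If you prefer a script over a REST API client, the call above could be wrapped as in the following minimal sketch. It uses only the Python standard library and assumes basic authentication and an XML response. The function names, the host name in the usage note, and the `verify_tls` switch are my own illustrative choices, not something taken from the KB.

```python
import base64
import ssl
import urllib.request

# Path quoted in the text above.
CONFIG_PATH = "/api/4.0/firewall/globalroot-0/config"


def build_config_request(manager, user, password):
    """Build (but do not send) the GET request for the firewall configuration.

    Assumption: NSX Manager accepts HTTP basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{manager}{CONFIG_PATH}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/xml"},
        method="GET",
    )


def save_config(manager, user, password, out_file, verify_tls=True):
    """Fetch the current configuration and write it to out_file.

    Returns the number of bytes written.
    """
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for lab setups with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = build_config_request(manager, user, password)
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        body = response.read()
    with open(out_file, "wb") as handle:
        handle.write(body)
    return len(body)
```

A call such as `save_config("nsxmgr.example.local", "admin", "...", "dfw-backup.xml")` (hypothetical host and file name) would then leave you with a copy of the configuration on disk before you reset anything.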

This problem could also happen with NSX-T, although there firewall rules are managed directly in NSX Manager rather than in vCenter Server. According to VMware, NSX-T automatically adds NSX Manager and NSX Edge Node virtual machines to the firewall exclusion list. I have checked all my NSX Managers, currently three separate instances, and none of them show the NSX Managers in the System Excluded VMs list, only the Edge Nodes, as you can see in the screenshot below.

[Screenshot: the NSX-T Exclusion List (tabs: User Excluded Groups, System Excluded VMs) showing eight Edge Node VMs with their ESXi hosts, each running Ubuntu Linux (64-bit) with status Running.]

I have been trying to retrieve the exclusion list through the REST API to see whether the Managers are listed there, but so far without success. My API calls return an empty list every time, so I am still investigating how to do this.
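For anyone who wants to experiment along the same lines, here is a rough Python sketch of how such a query could be built and its response read. Everything specific in it is an assumption to verify against the API reference for your NSX-T version: the Policy API path, the `system_owned` query parameter, and the `members` field in the JSON response. I have not confirmed that this makes the NSX Managers show up.

```python
import json
from typing import List
from urllib.parse import urlencode

# Assumption: Policy API path for the firewall exclusion list.
EXCLUDE_LIST_PATH = "/policy/api/v1/infra/settings/firewall/security/exclude-list"


def exclude_list_url(manager: str, system_owned: bool = False) -> str:
    """Return the URL to query; optionally ask for system-owned members.

    Assumption: the API accepts a 'system_owned' query parameter.
    """
    url = f"https://{manager}{EXCLUDE_LIST_PATH}"
    if system_owned:
        url += "?" + urlencode({"system_owned": "true"})
    return url


def member_paths(payload: str) -> List[str]:
    """Extract the member paths from a JSON response body.

    Assumption: members are returned as a list of strings under 'members'.
    An absent field is treated as an empty list.
    """
    data = json.loads(payload)
    return list(data.get("members", []))
```

Comparing `member_paths(...)` for a response fetched with and without `system_owned=True` would show whether the parameter changes what the API returns.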

I also tried the following CLI command on the NSX Managers, but it lists the same content as the GUI:

get firewall exclude-list

I have been able to confirm that none of the NSX Manager VMs have any firewall rules applied by using the following commands on the ESXi hosts running the VMs, so they seem to be excluded, but I think it would be nice to actually see them on the list.

This is how we can verify whether a VM is excluded from the distributed firewall. As you can see, my NSX Manager appliance VM has no rules applied.

[root@bgo-mgmt-esx-01:~] summarize-dvfilter | grep -A 3 vmm
world 2130640 vmm0:bgo-mgmt-nsxmgr-01 vcUuid:'50 2b fe 43 98 6f d5 be-fe fd e3 eb 36 3e 17 1d'
 port 33554441 bgo-mgmt-nsxmgr-01.eth0
  vNic slot 2
   name: nic-2130640-eth0-vmware-sfw.2
--
world 4700303 vmm0:bgo-vrops-arc-01 vcUuid:'50 2b 40 6d 17 22 e0 48-d1 5b 31 c7 d6 30 48 04'
 port 33554442 bgo-vrops-arc-01.eth0
  vNic slot 2
   name: nic-4700303-eth0-vmware-sfw.2
--
world 8752832 vmm0:bgo-runecast-01 vcUuid:'50 2b 60 41 6b 35 e9 ca-e5 10 a6 57 95 2e f9 f7'
 port 33554443 bgo-runecast-01.eth0
  vNic slot 2
   name: nic-8752832-eth0-vmware-sfw.2
[root@bgo-mgmt-esx-01:~] vsipioctl getrules -f nic-2130640-eth0-vmware-sfw.2
No rules.
[root@bgo-mgmt-esx-01:~]

For comparison, this is what it looks like for a VM that is not on the exclusion list:

[root@esxi-1:~] vsipioctl getrules -f nic-2105799-eth0-vmware-sfw.2
ruleset mainrs {
  # generation number: 0
  # realization time : 2021-03-11T12:58:27
  # FILTER (APP Category) rules
  rule 3 at 1 inout inet6 protocol ipv6-icmp icmptype 135 from any to any accept;
  rule 3 at 2 inout inet6 protocol ipv6-icmp icmptype 136 from any to any accept;
  rule 4 at 3 inout protocol udp from any to any port {67, 68} accept;
  rule 2 at 4 inout protocol any from any to any accept;
}

ruleset mainrs_L2 {
  # generation number: 0
  # realization time : 2021-03-11T12:58:27
  # FILTER rules
  rule 1 at 1 inout ethertype any stateless from any to any accept;
}
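If you have many VMs to check, the manual lookup above (find the filter name in the summarize-dvfilter output, then ask vsipioctl for its rules) can be partly scripted. The following Python sketch only parses text you have already captured from the two commands. It assumes the output layout shown in this post, and the function names are my own.

```python
import re
from typing import Dict, List

# Matches lines such as: world 2130640 vmm0:bgo-mgmt-nsxmgr-01 vcUuid:'...'
WORLD_RE = re.compile(r"^world \d+ vmm0:(.+?)\s+vcUuid:")
# Matches lines such as:    name: nic-2130640-eth0-vmware-sfw.2
NAME_RE = re.compile(r"^\s*name:\s+(\S+)")


def filters_by_vm(summary: str) -> Dict[str, List[str]]:
    """Map each VM name to the filter names listed under it."""
    result: Dict[str, List[str]] = {}
    current = None
    for line in summary.splitlines():
        world = WORLD_RE.match(line)
        if world:
            current = world.group(1)
            result.setdefault(current, [])
            continue
        name = NAME_RE.match(line)
        if name and current is not None:
            result[current].append(name.group(1))
    return result


def has_rules(getrules_output: str) -> bool:
    """Return False when vsipioctl getrules reported 'No rules.'."""
    return not getrules_output.strip().startswith("No rules")
```

Feeding it the output of `summarize-dvfilter | grep -A 3 vmm` gives you the filter names per VM, and `has_rules` tells you at a glance whether a captured `vsipioctl getrules` output contains a rule set.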

Since I have been talking about both NSX-V and NSX-T here, I would like to remind you that the end of general support date for NSX-V is 2022-01-16. Migrating from NSX-V to NSX-T can be complex and time-consuming, so start planning today.

Thanks for reading.

3 thoughts on “vCenter Server blocked by NSX firewall”

  1. Chris March 10, 2022 / 3:43 pm

    NSX-T FW policy apply to segment, NSX-V FW policy apply to pg


      • Chris March 10, 2022 / 11:52 pm

        so you don’t need add vcenter to exclude-list, if no connect to segment, that you can change default policy any any allow to drop.

