Do not use VMBUS ID to identify network devices on Azure VMs
In this article, we discuss why one shouldn’t use VMBUS IDs to identify specific network devices on an Azure VM. A more reliable way to identify NICs consistently is to use their MAC addresses in combination with the Ethernet interface names.
As an example, we look at an Azure Ubuntu 18.04-LTS VM with 4 network interfaces and show that VMBUS IDs change when the VM is stopped/deallocated and restarted.
You can see the details and a simple Bicep template used to create the VM in this GitHub repo folder: https://github.com/arsenvlad/bicep-examples/tree/main/vm-multinic
When an Azure VM is stopped/deallocated in the Azure Portal or with an Azure CLI command such as az vm deallocate, and is later restarted, the VM will likely land on a different host. The new host is expected to expose different VM channels, in whatever order it chooses, so the guest OS sees different VMBUS IDs. The same thing can happen when the VM moves from one host to another automatically, without you explicitly deallocating and restarting it. For example, when the Azure platform is “service healing” the VM by moving it from a failed host to a healthy one, the platform recreates the VM on the new host and the guest OS boots fresh. On the other hand, if Azure is able to Live Migrate the VM between hosts, the move should be transparent to the guest OS and the VMBUS IDs will remain the same.
Therefore, VMBUS ID values are not a reliable way to identify a specific network device within your application. Instead, you should use the NIC MAC addresses in combination with Ethernet device names such as eth0, eth1, eth2, etc. The MAC addresses are preserved across VM stop/deallocate/restart operations and should be the primary way to identify NICs. The order of the Ethernet device names often matches the sequence in which the NICs are attached to the VM, but this is not guaranteed. It depends on multiple factors, including the kernel version/configuration and the timing of device enumeration and probing during boot.
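Here is a minimal sketch of the MAC-based approach. It assumes a Linux guest with sysfs mounted at /sys. The Python below reads each interface's MAC address from /sys/class/net/<iface>/address, which is the standard Linux sysfs attribute, and then groups interface names by MAC. The function names are hypothetical. With Accelerated Networking, the VF interface reuses its synthetic master's MAC, so one MAC can map to more than one name.

```python
from __future__ import annotations

import os
from pathlib import Path


def read_interface_macs(sysfs_net: str = "/sys/class/net") -> list[tuple[str, str]]:
    """Return (interface name, lowercase MAC) pairs read from sysfs."""
    pairs = []
    for iface in sorted(os.listdir(sysfs_net)):
        try:
            mac = (Path(sysfs_net) / iface / "address").read_text().strip().lower()
        except OSError:
            continue  # not an interface directory (e.g. bonding_masters)
        pairs.append((iface, mac))
    return pairs


def group_by_mac(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each MAC address to the interface names that carry it.

    The all-zero MAC reported by lo is skipped. With Accelerated Networking,
    the VF interface reuses its synthetic master's MAC, so one MAC may map
    to several names.
    """
    by_mac: dict[str, list[str]] = {}
    for iface, mac in pairs:
        if mac in ("", "00:00:00:00:00:00"):
            continue
        by_mac.setdefault(mac, []).append(iface)
    return by_mac
```

An application would typically persist only the MAC addresses it cares about. At startup, it would call group_by_mac(read_interface_macs()) to find the interface names those MACs currently have.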
In addition, most applications should not use the PCI network devices directly. Only very specialized workloads have specific reasons to do so. Even DPDK workloads should use the NetVSC PMD, or the failsafe PMD, which was the recommendation in 2018 while the NetVSC PMD was still experimental. Applications should instead use the master synthetic network device that has an IP address assigned to it.
Below, we look at an example using an Azure Ubuntu 18.04-LTS VM with 4 network interfaces.
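The sketch below shows one way an application might pick the synthetic device rather than the VF. It assumes the usual sysfs layout, in which /sys/class/net/<iface>/device/driver is a symlink to the bound driver. It keeps only interfaces whose driver is hv_netvsc, the Linux driver for Hyper-V synthetic NICs. The helper names and the tuple layout are illustrative and are not part of any Azure tooling.

```python
from __future__ import annotations

import os


def device_driver(iface: str, sysfs_net: str = "/sys/class/net") -> str | None:
    """Return the name of the kernel driver bound to an interface's device."""
    link = os.path.join(sysfs_net, iface, "device", "driver")
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        return None  # virtual interfaces such as lo have no device/driver link


def synthetic_by_mac(nics: list[tuple[str, str, str | None]]) -> dict[str, str]:
    """Map each MAC address to its hv_netvsc (synthetic) interface name.

    nics holds (interface name, MAC, driver) tuples. VF interfaces that share
    the MAC (bound to a PCI driver such as mlx4_core or mlx5_core) are left
    out, so the application binds to the synthetic master carrying the IP.
    """
    return {mac: name for name, mac, driver in nics if driver == "hv_netvsc"}
```

To build the tuples, read /sys/class/net/<iface>/address for the MAC and call device_driver() for the driver of each interface.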
Network interfaces on the VM
We can use ip addr to list the network interfaces on the VM. In this case, the sequence of the Ethernet interface names eth0, eth1, eth2, and eth3 maps to the order in which the NICs are attached to the VM. This order is not guaranteed. It depends on multiple factors, including the kernel version/configuration and the timing of device enumeration and probing during boot.
Because the VM was created with Accelerated Networking enabled, we can see the Ethernet controller devices using lspci. However, we cannot reliably use the PCI device IDs, because they will change if the VM is stopped/deallocated or live-migrated to another host.
We can also see the mapping between the bus info and the PCI devices via lshw -c network -businfo:
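You can also read the bus info that lshw reports from sysfs by resolving /sys/class/net/<iface>/device. The sketch below assumes the standard Linux PCI address format (domain:bus:device.function). It returns None for synthetic interfaces, because their device path has no PCI component. The function names are hypothetical. Treat the result as a diagnostic snapshot only, because it changes when the VM moves to another host.

```python
from __future__ import annotations

import os
import re

# Linux PCI address: domain:bus:device.function, e.g. 0001:00:02.0
_PCI_ADDR = re.compile(r"^[0-9a-f]{4,8}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def pci_address_from_path(device_path: str) -> str | None:
    """Return the last PCI-address component of a sysfs device path, if any."""
    for part in reversed(device_path.split("/")):
        if _PCI_ADDR.match(part.lower()):
            return part
    return None


def interface_pci_address(iface: str, sysfs_net: str = "/sys/class/net") -> str | None:
    """Resolve /sys/class/net/<iface>/device and extract its PCI bus address."""
    real = os.path.realpath(os.path.join(sysfs_net, iface, "device"))
    return pci_address_from_path(real)
```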
The non-verbose output of lsvmbus looks like the following. It does not include the Device_ID, but it shows that there are 4 Synthetic network adapters:
The verbose output of lsvmbus -vv looks like the following and includes the Sysfs path values for each of the devices:
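lsvmbus builds this list from the devices under /sys/bus/vmbus/devices. The rough Python equivalent below rests on two assumptions. First, each device there exposes class_id and device_id attributes. Second, f8615163-df3e-46c5-913f-f2d2f965ed0e is the class GUID that lsvmbus labels "Synthetic network adapter". The names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumed class GUID of the Hyper-V synthetic network adapter.
NETVSC_CLASS_ID = "f8615163-df3e-46c5-913f-f2d2f965ed0e"


def normalize_guid(text: str) -> str:
    """Strip whitespace and optional braces and lowercase a GUID string."""
    return text.strip().strip("{}").lower()


def synthetic_nics(vmbus_root: str = "/sys/bus/vmbus/devices") -> dict[str, str]:
    """Map each synthetic network adapter's device_id to its sysfs directory."""
    nics = {}
    for dev in sorted(Path(vmbus_root).iterdir()):
        try:
            class_id = normalize_guid((dev / "class_id").read_text())
            device_id = normalize_guid((dev / "device_id").read_text())
        except OSError:
            continue  # entry without the expected attributes
        if class_id == NETVSC_CLASS_ID:
            nics[device_id] = str(dev)
    return nics
```

Because the device_id values come from the host, run this only for diagnostics. Do not store its output as a persistent identifier.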
Ethernet interface name mapping
Take the Sysfs path of each Synthetic network adapter line in the lsvmbus -vv output. We can then find the Ethernet interface name assigned to that adapter by looking in the
We can more easily see the mapping between the Ethernet interface names (i.e., eth0, eth1, eth2, eth3) and the Device_ID values above using ls -la /sys/class/net:
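You can build the same mapping in code by resolving each /sys/class/net entry and extracting the GUID-shaped component from its path. This sketch assumes that the VMBUS device directory is named after the device GUID, as on recent kernels. In that case, the extracted GUID should match the Device_ID that lsvmbus -vv prints. For a VF interface, the path may contain a different GUID (its virtual PCI channel), so apply this to hv_netvsc interfaces only. The function names are hypothetical.

```python
from __future__ import annotations

import os
import re

_GUID = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")


def vmbus_guid_from_path(device_path: str) -> str | None:
    """Return the last GUID-shaped component of a sysfs path, lowercased."""
    for part in reversed(device_path.split("/")):
        if _GUID.match(part.lower()):
            return part.lower()
    return None


def interface_vmbus_guids(sysfs_net: str = "/sys/class/net") -> dict[str, str]:
    """Map each interface name to the VMBUS device GUID in its resolved path."""
    mapping = {}
    for iface in sorted(os.listdir(sysfs_net)):
        guid = vmbus_guid_from_path(os.path.realpath(os.path.join(sysfs_net, iface)))
        if guid is not None:
            mapping[iface] = guid
    return mapping
```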
Different values after the VM is deallocated and started
Physical bus IDs change when the VM moves to a different host (e.g., after deallocation and restart). For that reason, the application should not use bus IDs directly. It should instead use the MAC addresses and Ethernet interface names such as eth0, eth1, eth2, and eth3, and look up the device IDs and bus info from them when required.
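One way to structure this is sketched below. It assumes the application stores only MAC addresses, keyed by hypothetical role names such as "frontend". It also assumes the application rebuilds a fresh inventory of synthetic interfaces at every boot, for example from /sys/class/net. The Nic record and the function name are illustrative.

```python
from __future__ import annotations

from typing import NamedTuple


class Nic(NamedTuple):
    """One row of a NIC inventory gathered on the VM at boot (hypothetical)."""
    mac: str
    ifname: str
    vmbus_id: str


def resolve_saved_nics(saved: dict[str, str], current: list[Nic]) -> dict[str, Nic | None]:
    """Re-resolve NICs that the application remembered by MAC address.

    saved maps an application-defined role (e.g. "frontend") to a MAC. Only
    the MAC is persisted; interface name and VMBUS ID come from the fresh
    inventory, because the VMBUS ID can differ after the VM moves hosts.
    current is assumed to list synthetic (hv_netvsc) interfaces only, since
    a VF shares its master's MAC. A role whose MAC is gone maps to None.
    """
    by_mac = {nic.mac.lower(): nic for nic in current}
    return {role: by_mac.get(mac.lower()) for role, mac in saved.items()}
```

Suppose you run this before and after a deallocate/restart. The same role should resolve to the same MAC and, in this example, the same interface name, while the vmbus_id field may differ.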
After stopping and restarting the VM in the Azure portal, we can see that the bus info and PCI device IDs changed:
We also see that additional VMBUS IDs may appear and that the values listed for the Synthetic network adapter entries change:
However, the mapping from the Ethernet interface names (i.e., eth0, eth1, eth2, eth3) to the physical devices still allows us to identify each of the NICs correctly, as shown by ls -la /sys/class/net:
Please also see the Troubleshoot Linux device names changes document describing similar considerations for data disks.
Please leave feedback and questions below or on Twitter https://twitter.com/ArsenVlad