non deterministic NIC order in multihomed instance with both static and dynamic network #2596

@gberche-orange

Describe the bug

We're observing that some bosh deployments in which an instance group has two networks end up with a non-deterministic NIC order (eth0 vs. eth1) at initial deployment, on the instance with index 1. This order then remains the same after any bosh recreate of the instance group.

This impacts use cases such as the following, which can't rely on a deterministic interface name (e.g. eth0) for a given network:

  keepalived.interface:
    description: interface keepalived will use to mount the VIP. If set to 'auto', uses the default interface on the VM
    default: auto

  • (We're fetching history and retesting to understand why we have disabled auto)
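To illustrate what "not relying on the interface name" could look like for a consumer such as the property above, here is a minimal sketch that picks the interface by subnet membership instead of by name. It is only an illustration, not something keepalived or bosh provides: the function name `interface_for_network`, the interface-to-address mapping and the documentation-range addresses are all hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def interface_for_network(interfaces: Dict[str, str], cidr: str) -> Optional[str]:
    """Return the name of the first interface whose address falls inside cidr.

    `interfaces` maps an interface name to its IPv4 address (hypothetical
    input; on a real VM it would have to be collected from the OS).
    """
    network = ipaddress.ip_network(cidr)
    for name, address in interfaces.items():
        if ipaddress.ip_address(address) in network:
            return name
    return None


# Hypothetical addresses from the RFC 5737 documentation ranges.
interfaces = {"eth0": "192.0.2.10", "eth1": "198.51.100.20"}
print(interface_for_network(interfaces, "198.51.100.0/24"))  # prints "eth1"
```

With such a lookup the result stays the same if the two addresses are swapped between eth0 and eth1, which is the situation described in this issue.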

We have been comparing logs of two bosh deployments with the same manifest network specs such as the following (including cloud-config network ordering):

  name: proxy
  instances: 2
  networks:
    - name: tf-net-osb-data-plane-shared-pub2
      static_ips:
        - 10.xx.yy.189
        - 10.xx.yy.190
    - default:
        - dns
        - gateway
      name: tf-net-osb-data-plane-shared-priv
  stemcell: default 

The differences in the logs during a bosh recreate are limited to the following:

  • The DEBUG -- DirectorJobRunner: Fetching existing instance for: #<Bosh::Director::Models::Instance @values= entry, which shows that the current instance networks are fetched from the agent settings and returned in a different order
    • the agent_settings.json files indeed have a different order for the two instances of the instance group
  • The Creating instance network reservations from database for instance entry (see sources), which lists the ip_addresses in a different order
  • The CPI call and response for create_vm, which have the networks in a different order

Looking into the instances table of the bosh database, the spec_json column has a diverging order of networks for the two instances.
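For anyone wanting to check the same thing on their own director, the comparison can be sketched as below. This assumes that the spec_json value is a JSON document whose "networks" entry is an object keyed by network name; the two sample documents are hypothetical and heavily trimmed, and the helper names `network_order` and `same_order` are made up for this sketch.

```python
import json
from typing import List


def network_order(spec_json: str) -> List[str]:
    """Return the network names in the order they appear in spec_json."""
    spec = json.loads(spec_json)  # json.loads keeps the key order of objects
    # Assumption: "networks" is an object keyed by network name.
    return list(spec.get("networks", {}))


def same_order(spec_a: str, spec_b: str) -> bool:
    """True when both documents list their networks in the same order."""
    return network_order(spec_a) == network_order(spec_b)


# Hypothetical, trimmed spec_json values for the two instances of the group.
spec_index_0 = json.dumps({"networks": {
    "tf-net-osb-data-plane-shared-pub2": {},
    "tf-net-osb-data-plane-shared-priv": {},
}})
spec_index_1 = json.dumps({"networks": {
    "tf-net-osb-data-plane-shared-priv": {},
    "tf-net-osb-data-plane-shared-pub2": {},
}})

print(network_order(spec_index_0))
print(network_order(spec_index_1))
print(same_order(spec_index_0, spec_index_1))  # prints "False"
```

The same helper could be pointed at the networks section of the agent settings files, under the same assumption about their structure.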

Is there a way to make the network interface assignment (eth0/eth1) deterministic for a new deployment?

Thanks in advance for your help!

To Reproduce

See the manifest fragment above, which triggered the problem.

Steps to reproduce the behavior (example):

  1. Deploy a bosh director on vsphere-cpi
  2. Deploy
  3. Check eth0/eth1 ordering

Expected behavior

Systematically deterministic ordering of eth0/eth1

Versions (please complete the following information):

  • Infrastructure: vsphere 97.0.15
  • BOSH version: 280.1.5
  • Stemcell version: 1.631

/CC @ogrand
