Infiniband Special Ops?

Hi guys.

Hoping some network experts may stumble upon this message: I have a direct host-to-host IPoIB connection, and:

-> $ ethtool ib1
Settings for ib1:
    Supported ports: [  ]
    Supported link modes:   Not reported
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: 40000Mb/s
    Duplex: Full
    Auto-negotiation: on
    Port: Other
    PHYAD: 255
    Transceiver: internal
    Link detected: yes

and that’s at both ends, on both hosts, yet:

> $ iperf3 -c 10.5.5.97
Connecting to host 10.5.5.97, port 5201
[  5] local 10.5.5.49 port 56874 connected to 10.5.5.97 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.36 GBytes  11.6 Gbits/sec    0   2.50 MBytes
[  5]   1.00-2.00   sec  1.87 GBytes  16.0 Gbits/sec    0   2.50 MBytes
[  5]   2.00-3.00   sec  1.84 GBytes  15.8 Gbits/sec    0   2.50 MBytes
[  5]   3.00-4.00   sec  1.83 GBytes  15.7 Gbits/sec    0   2.50 MBytes
[  5]   4.00-5.00   sec  1.61 GBytes  13.9 Gbits/sec    0   2.50 MBytes
[  5]   5.00-6.00   sec  1.60 GBytes  13.8 Gbits/sec    0   2.50 MBytes
[  5]   6.00-7.00   sec  1.56 GBytes  13.4 Gbits/sec    0   2.50 MBytes
[  5]   7.00-8.00   sec  1.52 GBytes  13.1 Gbits/sec    0   2.50 MBytes
[  5]   8.00-9.00   sec  1.52 GBytes  13.1 Gbits/sec    0   2.50 MBytes
[  5]   9.00-10.00  sec  1.52 GBytes  13.1 Gbits/sec    0   2.50 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  16.2 GBytes  13.9 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  16.2 GBytes  13.9 Gbits/sec                  receiver

The platform hosting the link is rather old; PCIe is only 2.0, but at a link width of x8 it should be able to carry a lot more than ~13 Gbit/s. The InfiniBand adapter is a Mellanox ConnectX-3.
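
For a rough sanity check on that claim (assuming the usual PCIe 2.0 figures of 5 GT/s per lane with 8b/10b encoding; protocol overhead will eat a bit more):

    5 GT/s × 8/10 (8b/10b encoding)  ≈ 4 Gbit/s usable per lane
    4 Gbit/s × 8 lanes               ≈ 32 Gbit/s (~4 GB/s) for an x8 link

so the slot itself should have headroom well beyond the ~13 Gbit/s measured above.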

Any thoughts on how to track down the bottleneck, or any thoughts at all, would be much appreciated. Thanks, L

4 thoughts on - Infiniband Special Ops?

  • Care to capture a few seconds of a *sender*-side .pcap?
    Often the TCP receive window is too small, or packet loss or round-trip time is to blame. All of these would be evident in the packet capture.

    If you do multiple streams with the `-P 8` flag, does that increase the throughput? (Both suggestions are sketched below.)

    Google says these endpoints are effectively 1.5 ms apart:

    (2.5 megabytes) / (13 Gbit/s) = 1.53846154 milliseconds

    That is, the observed 2.5 MB congestion window divided by the ~13 Gbit/s throughput implies an effective round-trip time of roughly 1.5 ms.
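
    A rough sketch of both suggestions (assuming tcpdump is installed on the sender and the IPoIB interface is ib1; adjust names and addresses as needed):

    # capture a few seconds of headers on the sender while iperf3 runs
    tcpdump -i ib1 -s 96 -w sender.pcap host 10.5.5.97

    # re-run the test with 8 parallel streams to see if the aggregate throughput rises
    iperf3 -c 10.5.5.97 -P 8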

  • If you want to test the InfiniBand performance, use ib_write_bw for example, not iperf.

    IPoIB will always be quite a bit slower than native IB.

    That said, if you want to optimize IPoIB for performance, make sure you’re running connected mode, not datagram mode (a quick way to check and set it is sketched below).

    /Peter
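
    A minimal sketch of both points (assuming the perftest package provides ib_write_bw and the IPoIB interface is ib1; run as root):

    # check / set the IPoIB transport mode; connected mode also permits a much larger MTU
    cat /sys/class/net/ib1/mode
    echo connected > /sys/class/net/ib1/mode
    ip link set ib1 mtu 65520

    # native IB bandwidth test, bypassing the IP stack entirely
    ib_write_bw                  # on one host (server side)
    ib_write_bw 10.5.5.97        # on the other host, pointing at the server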

  • Bitrate goes down even further when the CPUs are fully loaded and occupied.
    (I’ll try to keep investigating.)

    What I’m trying next is to have both ports (it’s a dual-port card) “teamed” by NM, with the runner set to broadcast (roughly the nmcli commands sketched below). I’m leaving out “p-key”, which NM sets to “default” (and which works with a “regular” IPoIB connection).
    RHEL’s “networking guide” docs say “…create a team from two or more Wired or InfiniBand connections…”
    When I try to stand up such a team, the master starts but the slaves, both of them, fail with:
    “…
    [1611588576.8887] device (ib1): Activation: starting connection ‘team1055-slave-ib1’ (900d5073-366c-4a40-8c32-ac42c76f9c2e)
    [1611588576.8889] device (ib1): state change: disconnected -> prepare (reason ‘none’, sys-iface-state: ‘managed’)
    [1611588576.8973] device (ib1): state change: prepare -> config (reason ‘none’, sys-iface-state: ‘managed’)
    [1611588576.9199] device (ib1): state change: config -> ip-config (reason ‘none’, sys-iface-state: ‘managed’)
    [1611588576.9262] device (ib1): Activation: connection ‘team1055-slave-ib1’ could not be enslaved
    [1611588576.9272] device (ib1): state change: ip-config -> failed (reason ‘unknown’, sys-iface-state: ‘managed’)
    [1611588576.9280] device (ib1): released from master device nm-team
    [1611589045.6268] device (ib1): carrier: link connected
    …”

    Any suggestions also appreciated. thanks, L
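
    For reference, a rough sketch of the kind of nmcli commands being attempted here (team1055, nm-team and ib1 come from the log above; ib0 for the second port is assumed, and the exact syntax may need adjusting for your NetworkManager version):

    # team master with the broadcast runner
    nmcli con add type team con-name team1055 ifname nm-team \
        team.config '{"runner": {"name": "broadcast"}}'

    # enslave both IPoIB ports
    nmcli con add type infiniband con-name team1055-slave-ib0 ifname ib0 \
        master nm-team slave-type team
    nmcli con add type infiniband con-name team1055-slave-ib1 ifname ib1 \
        master nm-team slave-type team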

  • I have never played with InfiniBand, but I think those cards most probably have some checksum offloading capabilities. Have you explored in that direction and tested with checksums in offloaded mode (a quick ethtool sketch is below)?

    Best Regards, Strahil Nikolov
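
    A quick way to look at that (assuming the IPoIB interface is ib1; some offloads may show as fixed/unsupported on IPoIB devices):

    # list the current offload settings
    ethtool -k ib1

    # toggle checksum offload and re-run iperf3 to compare
    ethtool -K ib1 rx on tx on
    ethtool -K ib1 rx off tx off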

    At 15:49 +0000 on 25.01.2021 (Mon), lejeczek via CentOS wrote: