Soft-RoCE Requires a Specific IPv6 Address

Problem

So you’ve successfully set up Soft-RoCE. The rdma_rxe module is loaded into your kernel, and all the tools report that you have an RDMA device:

[root@localhost ~]# rdma link
link rxe0/1 state ACTIVE physical_state LINK_UP netdev eth0
[root@localhost ~]# ibv_devinfo
hca_id:	rxe0
	transport:			InfiniBand (0)
	fw_ver:				0.0.0
	node_guid:			5054:00ff:fe52:8b53
	sys_image_guid:			5054:00ff:fe52:8b53
	vendor_id:			0xffffff
	vendor_part_id:			0
	hw_ver:				0x0
	phys_port_cnt:			1
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		1024 (3)
			sm_lid:			0
			port_lid:		0
			port_lmc:		0x00
			link_layer:		Ethernet

And now you’re trying to verify that RDMA is working, or validate that RXE is working. Except it isn’t.

Does your ibv_rc_pingpong (and friends) hit errors like:

[root@localhost ~]# ibv_rc_pingpong -g 0 -d rxe0 -i 1
  local address:  LID 0x0000, QPN 0x000028, PSN 0x0096f2, GID fe80::5054:ff:fe52:8b53
Failed to modify QP to RTR
Couldn't connect to remote QP
[root@localhost ~]# ibv_rc_pingpong -g 0 -d rxe0 -i 1 10.0.0.234
  local address:  LID 0x0000, QPN 0x000026, PSN 0x5137fe, GID fe80::5054:ff:fe7a:e08d
client read/write: No space left on device
Couldn't read/write remote address

Does your qperf fail with errors like:

[root@localhost ~]# qperf 10.0.0.234 ud_bw ud_lat
ud_bw:
failed to create address handle: Invalid argument

or

[root@localhost ~]# qperf 10.0.0.234 rc_bw
rc_bw:
failed to modify QP to RTR: Invalid argument
server: failed to modify QP to RTR: Invalid argument

Does your rping mysteriously work totally fine?

[root@localhost ~]# rping -s -a 10.0.0.234
server DISCONNECT EVENT...
wait for RDMA_READ_ADV state 10
[root@localhost ~]# rping -c -a 10.0.0.234 -C 4 -v
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
ping data: rdma-ping-3: DEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu

Solution

It’s because rdma-core depends on having the EUI-64-encoded form of the interface’s MAC address registered as an IPv6 link-local address on that same interface. See the “rdma_rxe usage problem” thread on the linux-rdma mailing list.
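To make the encoding concrete, here is a minimal bash sketch of the modified EUI-64 mapping described in RFC 4291, Appendix A: flip the universal/local bit (0x02) of the first MAC octet, insert ff:fe between the two halves of the MAC, and prefix fe80::. The function name mac_to_ll is hypothetical; it prints each 16-bit group without leading zeros but does not attempt full RFC 5952 compression for unusual MACs.

```bash
#!/usr/bin/env bash
# mac_to_ll MAC: print the modified EUI-64 IPv6 link-local address for MAC
# (hypothetical helper; follows RFC 4291, Appendix A).
mac_to_ll() {
  local -a b
  IFS=: read -r -a b <<< "$1"           # "52:54:00:52:8b:53" -> six octets
  [ "${#b[@]}" -eq 6 ] || return 1      # reject anything that isn't a 48-bit MAC
  local o0=$(( 16#${b[0]} ^ 0x02 ))     # flip the universal/local bit
  # Insert ff:fe between the two MAC halves, giving four 16-bit groups;
  # %x prints each group without leading zeros, as `ip` does.
  printf 'fe80::%x:%x:%x:%x\n' \
    "$(( (o0 << 8) | 16#${b[1]} ))" \
    "$(( (16#${b[2]} << 8) | 0xff ))" \
    "$(( (0xfe << 8) | 16#${b[3]} ))" \
    "$(( (16#${b[4]} << 8) | 16#${b[5]} ))"
}
```

For example, mac_to_ll 52:54:00:52:8b:53 prints fe80::5054:ff:fe52:8b53, the local GID that ibv_rc_pingpong reported in the first transcript. On a live system you can compare mac_to_ll "$(cat /sys/class/net/eth0/address)" with what ip -6 addr show dev eth0 scope link lists.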

Rather than go through the manual ipv6calc steps outlined in that thread, the easy fix is to have Linux generate the address for you:

ip link set dev eth0 addrgenmode eui64
ip link set dev eth0 down
ip link set dev eth0 up
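To confirm the change took effect, you can check that the address behind RDMA GID index 0 (the index that -g 0 selects) is now configured on the netdev. Below is a minimal bash sketch; gid_to_ll and check_gid0 are hypothetical helper names, and it assumes port 1, the sysfs file /sys/class/infiniband/<dev>/ports/1/gids/0, and a link-local GID whose lower four groups ip would not compress any further.

```bash
#!/usr/bin/env bash
# gid_to_ll GID: shorten a full-form link-local GID as printed in sysfs
# (e.g. fe80:0000:0000:0000:5054:00ff:fe52:8b53) to the form `ip` prints.
# Assumes the upper 64 bits are fe80:0:0:0 (hypothetical helper).
gid_to_ll() {
  local -a g
  IFS=: read -r -a g <<< "$1"
  [ "${#g[@]}" -eq 8 ] || return 1
  printf 'fe80::%x:%x:%x:%x\n' \
    "$((16#${g[4]}))" "$((16#${g[5]}))" "$((16#${g[6]}))" "$((16#${g[7]}))"
}

# check_gid0 IBDEV NETDEV: report whether GID 0 on port 1 of IBDEV is a
# link-local address configured on NETDEV (hypothetical helper).
check_gid0() {
  local ibdev=$1 netdev=$2 gid ll
  gid=$(cat "/sys/class/infiniband/$ibdev/ports/1/gids/0") || return 1
  ll=$(gid_to_ll "$gid") || return 1
  if ip -6 -o addr show dev "$netdev" scope link | grep -qF " $ll/"; then
    echo "OK: $ll is configured on $netdev"
  else
    echo "MISSING: $ll (GID 0 of $ibdev) is not configured on $netdev" >&2
    return 1
  fi
}
```

Running check_gid0 rxe0 eth0 should print the OK line once eth0 carries the EUI-64 address, and report the GID as missing while the interface only has a differently generated link-local address.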

To make this persist across reboots, put the following in a file under /etc/sysctl.d/ (or however your distro configures sysctls):

# 0 = derive the IPv6 link-local address from the MAC via EUI-64
net.ipv6.conf.default.addr_gen_mode = 0
net.ipv6.conf.eth0.addr_gen_mode = 0
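To check which mode an interface actually ended up in after a reboot, a small bash sketch can decode the sysctl value. The meanings follow the kernel’s ip-sysctl documentation (0 generates the EUI-64 link-local address, 1 generates no link-local address at all, 2 and 3 generate RFC 7217 stable privacy addresses); the labels are iproute2’s addrgenmode names, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# addr_gen_mode_name VALUE: map a net.ipv6.conf.*.addr_gen_mode value to
# iproute2's addrgenmode name (hypothetical helper).
addr_gen_mode_name() {
  case "$1" in
    0) echo eui64 ;;          # link-local derived from the MAC (what Soft-RoCE needs)
    1) echo none ;;           # no link-local address is generated
    2) echo stable_secret ;;  # RFC 7217, using the stable_secret sysctl
    3) echo random ;;         # RFC 7217, random secret if stable_secret is unset
    *) echo unknown; return 1 ;;
  esac
}

# show_addr_gen_mode DEV: print the current mode of DEV, e.g. eth0.
show_addr_gen_mode() {
  local v
  v=$(cat "/proc/sys/net/ipv6/conf/$1/addr_gen_mode") || return 1
  echo "$1: addr_gen_mode=$v ($(addr_gen_mode_name "$v"))"
}
```

After a reboot, show_addr_gen_mode eth0 should report addr_gen_mode=0 (eui64) if the persistent setting was applied.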

See discussion of this page on Reddit, HN, and Lobsters.