af_smc man page
AF_SMC — Sockets for SMC communication
#define AF_SMC 43
tcp_socket = socket(AF_SMC, SOCK_STREAM, 0);
Shared Memory Communication via RDMA (SMC) is a sockets-over-RDMA communication protocol that allows existing TCP socket applications to transparently benefit from RDMA when exchanging data over an RDMA over Converged Ethernet (RoCE) network. RoCE networks are not routable. SMC provides host-to-host direct memory access without traditional TCP/IP processing overhead. It preserves the existing IP topology and IP security, and requires only minimal administrative and operational changes. The use of SMC is transparent to TCP socket applications.
The new address family AF_SMC supports the SMC protocol on Linux. It keeps the address format of AF_INET sockets and supports streaming socket types only.
Two usage modes are possible:
- AF_SMC native usage
exploits the socket domain AF_SMC instead of AF_INET.
- usage of AF_INET socket applications with SMC preload library
converts AF_INET sockets to AF_SMC sockets. The SMC preload library is part of the SMC tools package.
SMC capability of sockets is negotiated during connection setup. If one peer is not SMC capable, further socket processing falls back to TCP usage automatically.
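As an illustration of the native usage mode, the following C sketch opens a stream socket in the AF_SMC domain and connects it using an ordinary struct sockaddr_in, since AF_SMC keeps the AF_INET address format. The function names smc_stream_socket and smc_connect_ipv4 are hypothetical helpers, not part of any library. The retry with AF_INET when socket() fails is an assumption made for this sketch; it is separate from the automatic TCP fallback negotiated during connection setup.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_SMC
#define AF_SMC 43 /* value from the synopsis; older libc headers lack it */
#endif

/* Hypothetical helper: create a stream socket in the AF_SMC domain.
 * If that fails for any reason (for example, no AF_SMC support in the
 * running kernel), retry with AF_INET so the caller still gets a TCP
 * socket. *used_smc is set to 1 for AF_SMC and 0 for the AF_INET retry. */
int smc_stream_socket(int *used_smc)
{
    int fd = socket(AF_SMC, SOCK_STREAM, 0);

    *used_smc = (fd >= 0);
    if (fd < 0)
        fd = socket(AF_INET, SOCK_STREAM, 0);
    return fd;
}

/* Hypothetical helper: connect to an IPv4 peer given as dotted-quad text.
 * The address is a plain struct sockaddr_in with sin_family AF_INET.
 * Returns a connected descriptor, or -1 on any error. */
int smc_connect_ipv4(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    int used_smc;
    int fd;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1; /* not a valid IPv4 address string */

    fd = smc_stream_socket(&used_smc);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would use the returned descriptor with send(2), recv(2), and close(2) exactly as it would a TCP socket.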
Implementation details: Links and Link Groups
To run RDMA traffic to a peer, a so-called link is established between a local RoCE card and a remote RoCE card. To enhance availability, you can configure alternate links with automatic failover. Primary and backup links to a given peer are combined in a so-called link group. To guarantee untroubled RDMA traffic, the number of connections per link group and the number of send requests in flight per link are limited.
To enable SMC, the smc code must be active in the kernel. If smc is not built into the kernel image, load the separate smc module with "modprobe smc".
RoCE adapter mapping: Creation of a pnet table
The SMC protocol requires grouping of multiple physical networks: standard Ethernet and RoCE networks. Such groups are called Physical Networks (PNets). For SMC, RoCE adapter mapping is configured in a table called the pnet table. Any available Ethernet interface can be combined with available RDMA-capable network interface cards (RNICs) if they belong to the same Converged Ethernet fabric. To configure RoCE adapter mapping, you must create a pnet table. Modify the table with the smc-tools command smc_pnet.
For details call "man smc_pnet".
Important sysctl settings
SMC-R supports IPv4 only. Use this sysctl setting: net.ipv6.bindv6only=1
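To check from a program whether this setting is in effect, the value can be read from the proc file system. The following is a minimal sketch that assumes the standard Linux location /proc/sys/net/ipv6/bindv6only. The function names read_sysctl_int and bindv6only_value are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read a sysctl file that holds a single integer.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
int read_sysctl_int(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Returns the current net.ipv6.bindv6only value (normally 0 or 1),
 * or -1 if it cannot be determined, e.g. when IPv6 is not available. */
int bindv6only_value(void)
{
    return read_sysctl_int("/proc/sys/net/ipv6/bindv6only");
}
```

A result of 1 means the setting recommended above is active.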
Displaying SMC socket state information
SMC socket state information can be obtained with the smc-tools command smcss. For details call "man smcss".
Starting a TCP application to work over SMC
To start an existing TCP application to work through SMC, use the SMC preload library. The SMC Tools package provides a script called "smc_run" to convert AF_INET socket calls to AF_SMC socket calls by means of the preload technique.
For example, an FTP client can be started over SMC by prefixing its command line with smc_run.
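The conversion performed by the preload technique can be pictured as a small decision made each time the application calls socket(). The sketch below shows only that decision, under the assumption that stream sockets in the AF_INET domain are the ones converted. It is not the code of the SMC preload library, and the function name smc_map_domain is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef AF_SMC
#define AF_SMC 43 /* value from the synopsis; older libc headers lack it */
#endif

/* Hypothetical helper: return the domain that a preload wrapper around
 * socket() would pass on to the real socket() call.
 * AF_INET stream sockets are mapped to AF_SMC; everything else is
 * returned unchanged, because AF_SMC supports streaming sockets only. */
int smc_map_domain(int domain, int type)
{
    /* On Linux, flags such as SOCK_NONBLOCK and SOCK_CLOEXEC may be
     * OR-ed into the type argument, so strip them before comparing. */
    int base_type = type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC);

    if (domain == AF_INET && base_type == SOCK_STREAM)
        return AF_SMC;
    return domain;
}
```

A wrapper library loaded ahead of the C library would apply such a mapping to the domain argument and then forward the call, which is why the application itself needs no changes.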
MTU and Infiniband data transfer
Infiniband traffic may use MTU values 256, 512, 1024, 2048, or 4096. SMC determines the configured MTU size of the RoCE Ethernet port, announces this MTU size to the peer during connection start, and chooses the minimum MTU size of both peers.
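The selection rule described above amounts to taking the smaller of the two announced values. The following sketch illustrates it with hypothetical function names (ib_mtu_valid, smc_negotiate_mtu). It assumes both peers have already expressed their MTU as one of the listed Infiniband values, and it does not model how the value is derived from the Ethernet port configuration.

```c
#include <stddef.h>

/* MTU values listed above for Infiniband traffic, in ascending order. */
static const int ib_mtu_values[] = { 256, 512, 1024, 2048, 4096 };

/* Return 1 if mtu is one of the listed values, 0 otherwise. */
int ib_mtu_valid(int mtu)
{
    size_t i;

    for (i = 0; i < sizeof(ib_mtu_values) / sizeof(ib_mtu_values[0]); i++)
        if (ib_mtu_values[i] == mtu)
            return 1;
    return 0;
}

/* Return the MTU both peers can use: the minimum of the two values.
 * Returns -1 if either value is not one of the listed MTU sizes. */
int smc_negotiate_mtu(int local_mtu, int peer_mtu)
{
    if (!ib_mtu_valid(local_mtu) || !ib_mtu_valid(peer_mtu))
        return -1;
    return local_mtu < peer_mtu ? local_mtu : peer_mtu;
}
```

For instance, if the local port announces 4096 and the peer announces 1024, the connection uses 1024.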
Copyright IBM Corp. 2016, 2017. Published under the terms and conditions of the EPL (Eclipse Public License).
socket(2), ip(7), tcp(7), socket(7), smc_run(8), smcss(8), smc_pnet(8)
- AF_SMC, version 1.0.0