Buffered RDMA Communication
Send and receive operations have a computational cost that becomes noticeable when messages are transferred at a high rate. To limit this overhead, NetIO-next implements an internal data coalescing mechanism referred to as buffered mode, as opposed to the unbuffered mode described in Unbuffered RDMA Communication. In buffered mode, messages to be sent are copied into larger network buffers, which are sent when an occupancy threshold is crossed or a timeout expires.
Advantages of buffered communication are:
the RDMA buffers are managed internally by NetIO. Buffered communication requires less setup and management code compared to unbuffered communication;
if the workload consists of many small messages (say, less than a kilobyte each), buffered communication is more efficient because fewer messages are transmitted. The reduced overhead can increase overall application performance.
Disadvantages are:
if the workload consists mainly of larger messages (over a few kilobytes) the copy operation can become costly performance-wise;
latency can also be affected, although this can be mitigated by setting a short timeout (e.g. 1 ms);
it could be more meaningful to buffer messages on the user side instead of internally in NetIO-next. This allows for more logical data message boundaries. For example, a database application could group individual data accesses into transactions, and send out the transactions as a single buffer.
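The last point, grouping on the user side, can be sketched as follows. This is only an illustration: `struct txn_buffer`, `txn_begin`, `txn_append` and `TXN_CAPACITY` are hypothetical names that are not part of NetIO-next. Records belonging to one transaction are appended to a single contiguous buffer, which would then be handed to one send call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-side transaction buffer; not part of NetIO-next. */
#define TXN_CAPACITY 1024

struct txn_buffer {
    unsigned char bytes[TXN_CAPACITY];
    size_t used;                       /* bytes appended so far */
};

/* Start a new, empty transaction. */
static void txn_begin(struct txn_buffer *t)
{
    t->used = 0;
}

/* Append one record. Returns 0 on success, -1 if the record does not fit. */
static int txn_append(struct txn_buffer *t, const void *rec, size_t len)
{
    if (len > TXN_CAPACITY - t->used)
        return -1;
    memcpy(t->bytes + t->used, rec, len);
    t->used += len;
    return 0;
}
```

Once the transaction is complete, `t.bytes` and `t.used` describe one message whose boundaries are chosen by the application rather than by the internal buffering.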
Point-to-Point Communication
NetIO-next supports unidirectional buffered point-to-point communication using socket types:
send sockets (struct netio_buffered_send_socket): the sending side of a connection.
listen sockets (struct netio_buffered_listen_socket): listen for incoming connections and create receive sockets to form connection pairs.
receive sockets (struct netio_buffered_recv_socket): the receiving side of a connection, created by a listen socket.
Socket Initialization
Buffered send and listen sockets are initialized via the following functions:
-
void netio_buffered_send_socket_init(struct netio_buffered_send_socket *socket, struct netio_context *ctx, struct netio_buffered_socket_attr *attr)
Initializes a buffered send socket.
- Parameters:
socket – The socket to initialize
ctx – The NetIO context object in which to initialize the socket
attr – Buffer attributes of the socket. Attributes need to match on the sending and receiving side of a socket
-
void netio_buffered_listen_socket_init(struct netio_buffered_listen_socket *socket, struct netio_context *ctx, struct netio_buffered_socket_attr *attr)
Initializes a buffered listen socket.
- Parameters:
socket – The socket to initialize
ctx – The NetIO context object in which to initialize the socket
attr – Buffer attributes of the socket. Attributes need to match on the sending and receiving side of a socket
The most important parameter in both calls is attr. The structure defines the attributes of the buffers, called netio pages, that are allocated internally. Users need to set these attributes in order to configure the send or listen socket. Note that the send and listen sockets of a connection must be configured with the same pagesize.
struct netio_buffered_socket_attr
{
unsigned num_pages;
size_t pagesize;
size_t watermark;
unsigned long timeout_ms;
};
The parameters are explained below:
num_pages: the number of buffers to be allocated. There may be a hardware-defined limit on the number of buffers per socket (typically 256).
pagesize: the size of an individual buffer. This can be fine-tuned to optimize throughput. The default value in NetIO-next is 64 kB. Note that pagesize defines an upper limit for the maximum size of a message that can be transmitted on the buffered socket, i.e., no message larger than pagesize can be sent on the given socket.
watermark: if the buffer contains more bytes than defined in watermark, it will be flushed automatically.
timeout_ms: a timer is associated with each socket. After timeout_ms milliseconds a partially filled buffer is flushed regardless of its occupancy.
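The interplay of these attributes can be illustrated with a small standalone sketch. The struct is copied from the listing above so that the example compiles without the NetIO-next headers. The helpers `page_should_flush`, `message_fits` and `example_attr`, as well as the chosen values, are hypothetical and only mirror the rules described in the text; the actual internal logic may differ, for instance in how per-message size fields count towards pagesize.

```c
#include <stdbool.h>
#include <stddef.h>

/* Copied from the documentation above so the sketch is self-contained. */
struct netio_buffered_socket_attr {
    unsigned num_pages;
    size_t pagesize;
    size_t watermark;
    unsigned long timeout_ms;
};

/* Hypothetical helper: would a page holding `fill` bytes that has been
 * open for `age_ms` milliseconds be flushed? An empty page is assumed
 * never to be flushed. */
static bool page_should_flush(const struct netio_buffered_socket_attr *attr,
                              size_t fill, unsigned long age_ms)
{
    if (fill == 0)
        return false;
    return fill > attr->watermark || age_ms >= attr->timeout_ms;
}

/* Hypothetical helper: messages larger than pagesize cannot be sent.
 * Any per-message header overhead is ignored here. */
static bool message_fits(const struct netio_buffered_socket_attr *attr,
                         size_t size)
{
    return size <= attr->pagesize;
}

/* Illustrative values only; tune them for the actual workload. */
static struct netio_buffered_socket_attr example_attr(void)
{
    struct netio_buffered_socket_attr attr = {
        .num_pages = 64,
        .pagesize = 64 * 1024,   /* 64 kB, the default mentioned above */
        .watermark = 56 * 1024,  /* flush once a page is mostly full */
        .timeout_ms = 1,         /* short timeout to bound latency */
    };
    return attr;
}
```

A watermark somewhat below pagesize leaves headroom, so that a page is flushed before a further message fails to fit.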
Besides the buffer configuration, users can configure multiple callbacks for send and listen sockets. For send sockets these are:
void (*cb_connection_established)(struct netio_buffered_send_socket* socket);
void (*cb_connection_closed)(struct netio_buffered_send_socket* socket);
void (*cb_error_connection_refused)(struct netio_buffered_send_socket* socket);
And for listen sockets these are:
void (*cb_connection_established)(struct netio_buffered_recv_socket* socket);
void (*cb_connection_closed)(struct netio_buffered_recv_socket* socket);
void (*cb_msg_received)(struct netio_buffered_recv_socket* socket, void* data, size_t size);
Note
The callback parameter for listen socket callbacks points to the receive socket object that is created by the listen socket. The listen socket object can be accessed via the listen_socket member of struct netio_buffered_recv_socket.
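As an illustration, callbacks with the listen-socket signatures listed above could look like the following sketch. The forward declaration of `struct netio_buffered_recv_socket` only stands in for the library header so that the example compiles on its own. The counters and the function names `on_msg_received` and `on_connection_established` are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in declaration; the real struct is defined by NetIO-next. */
struct netio_buffered_recv_socket;

/* Hypothetical application-side statistics. */
static size_t g_messages = 0;
static size_t g_bytes = 0;

/* Matches the cb_msg_received signature: called once per message. */
static void on_msg_received(struct netio_buffered_recv_socket *socket,
                            void *data, size_t size)
{
    (void)socket;  /* unused in this sketch */
    (void)data;    /* a real application would process the payload here */
    g_messages += 1;
    g_bytes += size;
}

/* Matches the cb_connection_established signature for listen sockets. */
static void on_connection_established(struct netio_buffered_recv_socket *socket)
{
    (void)socket;
    printf("connection established\n");
}
```

Assuming the callbacks are plain function-pointer members of the listen socket struct, as the listing above suggests, registration would be an assignment such as `listen_socket.cb_msg_received = on_msg_received;` made before the socket starts listening.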
Establishing a Connection
After initializing a buffered listen socket, the listen socket needs to be set in listening mode:
-
void netio_buffered_listen(struct netio_buffered_listen_socket *socket, const char *hostname, unsigned port)
Bind the listen socket to an interface and port number and bring the listen socket to ‘listening’ state.
- Parameters:
socket – The buffered listen socket
hostname – A hostname, typically an IP address, which identifies the interface on which to bind
port – The port number to listen on
Buffered send sockets can connect to sockets in the listening state using
-
void netio_buffered_connect(struct netio_buffered_send_socket *socket, const char *hostname, unsigned port)
Connect a buffered send socket to a remote.
- Parameters:
socket – The buffered send socket
hostname – Hostname or IP address of the remote endpoint
port – Port number of the remote endpoint
Once the connection between send and receive socket has been established successfully, the callback cb_connection_established is called on both sides of the connection. If the connection cannot be established, cb_error_connection_refused is called on the sending side. Connections can be closed using
-
void netio_disconnect(struct netio_send_socket *socket)
Disconnect a connected unbuffered send socket.
- Parameters:
socket – A connected unbuffered send socket
If a buffered connection closes (by user request or due to a connection error), cb_connection_closed is called on both ends of the connection.
Sending and Receiving Data
Once a connection has been established, messages can be sent with the following functions:
-
int netio_buffered_send(struct netio_buffered_send_socket *socket, void *data, size_t size)
Send a message on a buffered connection.
- Parameters:
socket – The buffered send socket
data – Pointer to message
size – Size of the message
- Returns:
NETIO_STATUS_TOO_BIG – The message is too big to fit in the internal buffers. Increase pagesize in the buffer attributes.
NETIO_STATUS_AGAIN – The socket is busy, no buffers are available. Try again later.
NETIO_STATUS_OK – The message was successfully copied to the internal buffers.
-
int netio_buffered_sendv(struct netio_buffered_send_socket *socket, struct iovec *iov, size_t num)
Send a message on a buffered connection.
- Parameters:
socket – The buffered send socket
iov – Pointer to a scatter/gather buffer
num – Number of elements in the scatter/gather buffer
- Returns:
NETIO_STATUS_TOO_BIG – The message is too big to fit in the internal buffers. Increase pagesize in the buffer attributes.
NETIO_STATUS_AGAIN – The socket is busy, no buffers are available. Try again later.
NETIO_STATUS_OK – The message was successfully copied to the internal buffers.
If buffer space is available, the message is copied to the socket's internal buffers. If no buffer is available or the operation cannot be completed by the underlying libraries, NETIO_STATUS_AGAIN is returned. In this case, the user needs to re-attempt transmission at a later time. For the case that no buffer is available, struct netio_buffered_send_socket contains a NetIO-next signal that is fired once a buffer is available again. The name of the signal attribute is netio_signal signal_buffer_available. Users can set the callback and user data members of the signal.
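The retry pattern can be sketched without the library as follows. Everything here is a mock. The status names come from the text above, but their numeric values are assumptions. `struct mock_socket` and `mock_buffered_send` stand in for the real socket and send call. The callback signature `void (*)(void *)` for the buffer-available signal is also an assumption.

```c
#include <stddef.h>

/* Names taken from the text above; the numeric values are assumptions. */
enum { NETIO_STATUS_OK = 0, NETIO_STATUS_AGAIN = 1, NETIO_STATUS_TOO_BIG = 2 };

/* Mock standing in for struct netio_buffered_send_socket. */
struct mock_socket {
    size_t pagesize;
    unsigned free_pages;
};

/* Mock send call following the return-code rules described above. */
static int mock_buffered_send(struct mock_socket *s, const void *data, size_t size)
{
    (void)data;
    if (size > s->pagesize)
        return NETIO_STATUS_TOO_BIG;
    if (s->free_pages == 0)
        return NETIO_STATUS_AGAIN;
    return NETIO_STATUS_OK;
}

/* Application state: at most one message parked while waiting for a buffer. */
struct sender {
    struct mock_socket *sock;
    const void *pending;
    size_t pending_size;
};

/* Try to send; on AGAIN, remember the message for the signal callback. */
static int sender_submit(struct sender *tx, const void *data, size_t size)
{
    int rc = mock_buffered_send(tx->sock, data, size);
    if (rc == NETIO_STATUS_AGAIN) {
        tx->pending = data;
        tx->pending_size = size;
    }
    return rc;
}

/* Would be installed as the callback of signal_buffer_available,
 * with the sender as user data. */
static void on_buffer_available(void *user_data)
{
    struct sender *tx = user_data;
    if (tx->pending == NULL)
        return;
    if (mock_buffered_send(tx->sock, tx->pending, tx->pending_size) == NETIO_STATUS_OK) {
        tx->pending = NULL;
        tx->pending_size = 0;
    }
}
```

The message memory must stay valid while it is parked, because it is only copied once the send call succeeds. NETIO_STATUS_TOO_BIG is not retried, since the same message can never fit.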
When the buffer occupancy exceeds the user-configurable watermark or the timeout expires, the buffer is flushed, i.e. sent to the remote endpoint. Buffers can also be flushed manually at any point:
-
void netio_buffered_flush(struct netio_buffered_send_socket *socket)
Flushes the current buffer of the given buffered send socket.
- Parameters:
socket – The buffered send socket
The receiving side of the buffered connection fires a callback for every message received. Note that typically multiple messages are packed into a single buffer, separated by 32-bit fields containing the message sizes, so the callback will be fired multiple times in a row, once for each message contained in the buffer. The callback is set on the listen socket, as described above.
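The per-message callback dispatch can be illustrated with a standalone sketch. The exact wire layout is not specified here, so the code assumes that each message is preceded by its size as a 32-bit integer in host byte order. The names `pack_message`, `unpack_page`, `msg_cb` and `sum_sizes` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*msg_cb)(const void *data, size_t size, void *user);

/* Append [u32 size][payload] at offset `fill`. Returns the new fill
 * level, or 0 if the message does not fit into the page. */
static size_t pack_message(unsigned char *page, size_t cap, size_t fill,
                           const void *data, uint32_t size)
{
    if (fill > cap || cap - fill < sizeof size + (size_t)size)
        return 0;
    memcpy(page + fill, &size, sizeof size);
    memcpy(page + fill + sizeof size, data, size);
    return fill + sizeof size + size;
}

/* Walk a page laid out as [u32 size][payload][u32 size][payload]...
 * and invoke `cb` once per message. Returns the number of messages
 * delivered, or -1 if the page is truncated. */
static int unpack_page(const unsigned char *page, size_t len,
                       msg_cb cb, void *user)
{
    size_t pos = 0;
    int count = 0;
    while (pos < len) {
        uint32_t size;
        if (len - pos < sizeof size)
            return -1;                 /* incomplete size field */
        memcpy(&size, page + pos, sizeof size);
        pos += sizeof size;
        if (size > len - pos)
            return -1;                 /* incomplete payload */
        cb(page + pos, size, user);
        pos += size;
        count++;
    }
    return count;
}

/* Example callback: adds up the payload sizes in *user. */
static void sum_sizes(const void *data, size_t size, void *user)
{
    (void)data;
    *(size_t *)user += size;
}
```

Packing a 3-byte and a 5-byte message into one page and unpacking it invokes the callback twice, which corresponds to the behaviour described above, where one received buffer triggers several cb_msg_received calls in a row.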
Buffered Publish/Subscribe Communication
The publish/subscribe communication pattern is also implemented for buffered communication. As in the unbuffered case, the publisher maintains an internal subscription table which contains connections to the various subscribers. Connection management is automatic: publishers do not need to connect to any subscribers. Subscriptions for multiple tags between the same publish socket and subscribe socket share the same connection.
Buffered publish and subscribe sockets are initialised with:
-
void netio_publish_socket_init(struct netio_publish_socket *socket, struct netio_context *ctx, const char *hostname, unsigned port, struct netio_buffered_socket_attr *attr)
-
void netio_subscribe_socket_init(struct netio_subscribe_socket *socket, struct netio_context *ctx, struct netio_buffered_socket_attr *attr, const char *hostname, const char *remote_host, unsigned remote_port)
Initializes a buffered subscribe socket.
See also netio_buffered_send_socket_init for a description of the connection parameters.
- Parameters:
socket – The buffered subscribe socket to initialize
ctx – The NetIO context in which to initialize the socket
attr – Buffered connection settings to be used for the underlying connections
hostname – Hostname or IP address of the local interface to bind to
remote_host – Hostname or IP of the remote publish socket
remote_port – Port of the remote publish socket
Note that subscribe sockets are bound to a specific publish socket and cannot subscribe to tags of any socket other than the one indicated when initialising the socket.
To subscribe to a given fid, the subscribe socket can use netio_subscribe one or more times:
-
int netio_subscribe(struct netio_subscribe_socket *socket, netio_tag_t tag)
Subscribe to a given message tag.
For a given subscribe socket, netio_subscribe can be called multiple times.
- Parameters:
socket – The buffered subscribe socket.
tag – The subscription tag.
Internally, netio_subscribe establishes a connection to the publish socket and sends a subscription message. The publish socket then connects to the subscribe socket, or uses an existing connection, and registers the subscription in its subscription table. Messages published under the given fid will subsequently be sent to the subscribe socket. It is possible to unsubscribe from a given tag using the API function
-
int netio_unsubscribe(struct netio_subscribe_socket *socket, netio_tag_t tag)
Unsubscribe from a given message tag.
For a given subscribe socket, netio_unsubscribe can be called multiple times.
- Parameters:
socket – The subscribe socket.
tag – The tag to unsubscribe from.
When a client closes all its subscriptions, the server closes the connection. First, the server sends an FI_SHUTDOWN via its send socket using netio_disconnect. In response, the client deallocates its receive socket and calls netio_disconnect. The client's call to netio_disconnect deallocates the client's send socket and the server's receive socket.
Publishing of messages is done with the following API function:
-
int netio_buffered_publish(struct netio_publish_socket *socket, netio_tag_t tag, void *data, size_t len, int flags, struct netio_subscription_cache *cache)
Publishes a message under a given tag
- Parameters:
socket – The socket to publish on
tag – The tag under which to publish
data – Message data
len – Message size
flags – NETIO_REENTRY: publishing of this message was attempted before and resulted in NETIO_STATUS_AGAIN. Calling publish with this flag will only send on connections on which the message has not yet been published.
cache – Optional user-supplied cache for the subscription table lookup.
The parameters socket, tag, data and len are self-explanatory and describe the buffered publish socket, message tag, and message. The flags parameter requires some explanation. When calling netio_buffered_publish for the first time for a message, flags should be set to 0. NetIO-next then attempts to send the message to all connected and subscribed subscribers. If this succeeds for all connections, the call returns NETIO_STATUS_OK. However, one or more connections may be out of resources, in which case the underlying send call results in NETIO_STATUS_AGAIN. The call to publish then also returns NETIO_STATUS_AGAIN and the user has to re-attempt publishing the message at a later time. NetIO-next keeps track of the connections on which the message was already sent successfully. To avoid sending the same message again on those connections, the subsequent call to netio_buffered_publish has to be made with the NETIO_REENTRY flag set. The flag is cleared again when the user moves on to the next message.
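The flag handling on the caller side can be sketched with a mock publisher. Nothing below is NetIO-next code: `struct mock_publisher`, `mock_publish` and `publish_step` are hypothetical, and the numeric values of the status codes and of NETIO_REENTRY are assumptions. The mock only imitates the bookkeeping described above, so that the effect of the flag becomes visible.

```c
#include <stdbool.h>
#include <stddef.h>

/* Names from the text above; numeric values are assumptions. */
enum { NETIO_STATUS_OK = 0, NETIO_STATUS_AGAIN = 1 };
#define NETIO_REENTRY 1
#define MAX_CONN 4

/* Mock publisher with a fixed set of subscriber connections. */
struct mock_publisher {
    size_t num_conn;
    bool busy[MAX_CONN];        /* connection currently out of buffers */
    bool delivered[MAX_CONN];   /* current message already accepted */
    unsigned copies[MAX_CONN];  /* copies received per connection */
};

/* Mock publish call imitating the behaviour described above. */
static int mock_publish(struct mock_publisher *p, int flags)
{
    int rc = NETIO_STATUS_OK;
    for (size_t i = 0; i < p->num_conn; i++) {
        if (!(flags & NETIO_REENTRY))
            p->delivered[i] = false;   /* new message: reset bookkeeping */
        if (p->delivered[i])
            continue;                  /* re-entry: skip completed sends */
        if (p->busy[i]) {
            rc = NETIO_STATUS_AGAIN;
            continue;
        }
        p->delivered[i] = true;
        p->copies[i]++;
    }
    return rc;
}

/* Caller-side pattern: one publish attempt. *flags carries the re-entry
 * state to the next attempt and is reset to 0 once the message is out. */
static int publish_step(struct mock_publisher *p, int *flags)
{
    int rc = mock_publish(p, *flags);
    *flags = (rc == NETIO_STATUS_AGAIN) ? NETIO_REENTRY : 0;
    return rc;
}
```

With one of three connections busy on the first attempt, the second attempt with NETIO_REENTRY delivers only to that connection, so every subscriber ends up with exactly one copy.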
For each publication of a message, NetIO-next needs to perform a lookup in the subscription table for the given tag. This can be expensive. As an optimization, users can supply a netio_subscription_cache object as the last parameter to the netio_buffered_publish call. This is used internally to cache the lookup for the given tag. Users need to supply a separate cache object for every tag.
The subscription cache object needs to be initialized using
-
void netio_subscription_cache_init(struct netio_subscription_cache *cache)
Initialize a netio_subscription_cache object.
- Parameters:
cache – The cache to be initialized
The user does not need to perform any further operations on the object. Internally the object contains a timestamp, so NetIO-next automatically detects changes to the subscription table and updates the cache objects without user intervention. Using the subscription cache is optional; users must supply NULL for this argument if the cache is not used.
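The idea of a timestamp-validated cache can be illustrated with a simplified model. This is not the NetIO-next implementation: `struct sub_table`, `struct sub_cache` and all functions below are hypothetical, and a change counter plays the role of the timestamp. The cache stores the result of the last lookup together with the table stamp, and is refreshed only when the stamp has changed.

```c
#include <stddef.h>

#define MAX_SUBS 8

/* Simplified subscription table (hypothetical). */
struct sub_table {
    unsigned long stamp;          /* bumped on every table change */
    size_t count;
    unsigned long tags[MAX_SUBS];
    int conn[MAX_SUBS];           /* connection id per subscription */
    unsigned long scans;          /* full table scans, for demonstration */
};

/* Per-tag cache object (hypothetical). */
struct sub_cache {
    int valid;
    unsigned long stamp;          /* table stamp at the last lookup */
    size_t num;
    int conn[MAX_SUBS];           /* cached connection ids */
};

static void cache_init(struct sub_cache *c)
{
    c->valid = 0;
    c->stamp = 0;
    c->num = 0;
}

static void table_subscribe(struct sub_table *t, unsigned long tag, int conn)
{
    if (t->count == MAX_SUBS)
        return;
    t->tags[t->count] = tag;
    t->conn[t->count] = conn;
    t->count++;
    t->stamp++;                   /* invalidates all caches */
}

/* Returns the number of connections subscribed to `tag`; their ids
 * are left in c->conn. The cache must always be used with one tag. */
static size_t cached_lookup(struct sub_table *t, unsigned long tag,
                            struct sub_cache *c)
{
    if (c->valid && c->stamp == t->stamp)
        return c->num;            /* table unchanged: reuse result */
    t->scans++;
    c->num = 0;
    for (size_t i = 0; i < t->count; i++)
        if (t->tags[i] == tag)
            c->conn[c->num++] = t->conn[i];
    c->stamp = t->stamp;
    c->valid = 1;
    return c->num;
}
```

Because the cache in this model does not record which tag it belongs to, reusing one cache for two tags would return stale results, which is consistent with the requirement of one cache object per tag.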
As is implied with buffered connections, published messages are not immediately transferred to the remote endpoint, but only when the buffer reaches a certain fill-level or the timeout expires. To enforce immediate sending of all buffers for a certain tag, the following function can be used, similar to netio_buffered_flush:
-
void netio_buffered_publish_flush(struct netio_publish_socket *socket, netio_tag_t tag, struct netio_subscription_cache *cache)
Flushes buffers on all connections of a given publish socket for a certain tag.
- Parameters:
socket – The buffered publish socket
tag – The message tag
cache – An optional subscription cache object
In the case of buffered communication cb_send_completed is not exposed. Compared to unbuffered mode, the publisher completion stack is replaced by a buffer stack (struct netio_bufferstack) and the key is used as buffer identifier.