.. _BufferedCommunication:

Buffered RDMA Communication
===========================

Send and receive operations have a computational cost that can become evident when messages are transferred at a high rate. To limit this overhead, NetIO-next implements an internal data-coalescence mechanism referred to as *buffered* mode, as opposed to the *unbuffered* mode described in :ref:`UnbufferedCommunication`. In buffered mode, messages to be sent are copied into larger network buffers, which are transmitted once an occupancy threshold is crossed or a timeout expires.

Advantages of buffered communication are:

- the RDMA buffers are managed internally by NetIO. Buffered communication therefore requires less setup and management code than unbuffered communication;
- if the workload consists of many small messages (say, less than a kilobyte), buffered communication is more efficient because fewer messages are transmitted. The reduced overhead can increase overall application performance.

Disadvantages are:

- if the workload consists mainly of larger messages (over a few kilobytes), the copy operation can become costly performance-wise;
- latency can also be affected, although this can be mitigated by setting a short timeout (e.g. 1 ms);
- it can be more meaningful to buffer messages on the user side instead of internally in NetIO-next, as this allows for more logical message boundaries. For example, a database application could group individual data accesses into transactions and send out each transaction as a single buffer.

Point-to-Point Communication
----------------------------

NetIO-next supports unidirectional buffered point-to-point communication using the following socket types:

- *send sockets* (`struct netio_buffered_send_socket`): the sending side of a connection.
- *listen sockets* (`struct netio_buffered_listen_socket`): listen for incoming connections and create receive sockets to form connection pairs.
- *receive sockets* (`struct netio_buffered_recv_socket`): the receiving side of a connection, created by a listen socket.

Socket Initialization
.....................

Buffered send and listen sockets are initialized via the following functions:

.. doxygenfunction:: netio_buffered_send_socket_init
   :no-link:

.. doxygenfunction:: netio_buffered_listen_socket_init
   :no-link:

The most important parameter in both calls is `attr`. This structure defines the attributes of the buffers - called NetIO pages - that are allocated internally. Users need to set these attributes in order to configure the send/listen socket. Note that both the send and the listen socket need to be configured with the same page size.

.. code-block:: c

   struct netio_buffered_socket_attr {
       unsigned num_pages;
       size_t pagesize;
       size_t watermark;
       unsigned long timeout_ms;
   };

The parameters have the following meaning:

- `num_pages`: the number of buffers to be allocated. There may be a hardware-defined limit on the number of buffers per socket (typically 256).
- `pagesize`: the size of an individual buffer. This can be fine-tuned to optimize throughput; the default value in NetIO-next is 64 kB. Note that `pagesize` defines an upper limit for the size of a message that can be transmitted on the buffered socket, i.e. no message larger than `pagesize` can be sent on the given socket.
- `watermark`: if a buffer contains more bytes than `watermark`, it is flushed automatically.
- `timeout_ms`: a timer is associated with each socket. After `timeout_ms` milliseconds, a partially filled buffer is flushed regardless of its occupancy.

Besides the buffer configuration, users can configure several callbacks for send and listen sockets.
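The attribute structure above can be populated as in the following sketch. The values shown are illustrative assumptions, not recommendations, and the struct definition is replicated locally so that the snippet compiles stand-alone; in real code it comes from the NetIO-next headers.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Local mirror of the attribute struct shown above (normally provided by
    * the NetIO-next headers); replicated here to keep the sketch self-contained. */
   struct netio_buffered_socket_attr {
       unsigned num_pages;
       size_t pagesize;
       size_t watermark;
       unsigned long timeout_ms;
   };

   int main(void) {
       struct netio_buffered_socket_attr attr = {
           .num_pages  = 256,       /* typical hardware limit on buffers per socket */
           .pagesize   = 64 * 1024, /* 64 kB default; also the maximum message size */
           .watermark  = 56 * 1024, /* auto-flush once more than 56 kB are buffered */
           .timeout_ms = 1,         /* flush partially filled buffers after 1 ms */
       };

       /* A watermark above the page size could never trigger an automatic flush. */
       assert(attr.watermark <= attr.pagesize);
       return 0;
   }

Remember that the same `attr` (in particular the same `pagesize`) must be used for both the send socket and the listen socket of a connection pair.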
For send sockets these are::

   void (*cb_connection_established)(struct netio_buffered_send_socket* socket);
   void (*cb_connection_closed)(struct netio_buffered_send_socket* socket);
   void (*cb_error_connection_refused)(struct netio_buffered_send_socket* socket);

And for listen sockets these are::

   void (*cb_connection_established)(struct netio_buffered_recv_socket* socket);
   void (*cb_connection_closed)(struct netio_buffered_recv_socket* socket);
   void (*cb_msg_received)(struct netio_buffered_recv_socket* socket, void* data, size_t size);

.. note::
   The callback parameter for listen socket callbacks points to the receive socket object that is created by the listen socket. The listen socket object can be accessed via the `listen_socket` member of `struct netio_buffered_recv_socket`.

Establishing a Connection
.........................

After initializing a buffered listen socket, it needs to be put into listening mode:

.. doxygenfunction:: netio_buffered_listen
   :no-link:

Buffered send sockets can connect to sockets in listening state using

.. doxygenfunction:: netio_buffered_connect
   :no-link:

Once the connection between send and receive socket has been established successfully, the callback `cb_connection_established` is called on both sides of the connection. If the connection cannot be established, `cb_error_connection_refused` is called on the sending side.

Connections can be closed using

.. doxygenfunction:: netio_disconnect
   :no-link:

If a buffered connection closes (by user request or due to a connection error), `cb_connection_closed` is called on both ends of the connection.

Sending and Receiving Data
..........................

Once a connection has been established, sending of messages can be initiated with the following functions:

.. doxygenfunction:: netio_buffered_send
   :no-link:

.. doxygenfunction:: netio_buffered_sendv
   :no-link:

If buffer space is available, the message is copied to the socket-internal buffers.
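The send path, including the retry when the socket temporarily runs out of buffer space, can be sketched as follows. To keep the snippet self-contained, a stub stands in for `netio_buffered_send`; the status-code names follow the text, but their numeric values and the stub's behaviour are local assumptions.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Status codes as used by NetIO-next; the numeric values here are local
    * placeholders, the real definitions come from the NetIO-next headers. */
   enum { NETIO_STATUS_OK = 0, NETIO_STATUS_AGAIN = 1 };

   /* Stand-in for netio_buffered_send(): this stub pretends the socket is out
    * of buffer space on the first attempt and succeeds on the second, so the
    * retry pattern below can be exercised without the real library. */
   static int stub_buffered_send(const void* data, size_t size) {
       static int calls = 0;
       (void)data; (void)size;
       return (++calls == 1) ? NETIO_STATUS_AGAIN : NETIO_STATUS_OK;
   }

   int main(void) {
       const char msg[] = "hello";
       int attempts = 0;
       int status;

       /* If no buffer is available the send returns NETIO_STATUS_AGAIN and the
        * caller has to retry later; real code would ideally retry from the
        * callback of the socket's signal_buffer_available signal rather than
        * from a busy loop. */
       do {
           status = stub_buffered_send(msg, sizeof msg);
           attempts++;
       } while (status == NETIO_STATUS_AGAIN);

       assert(status == NETIO_STATUS_OK);
       assert(attempts == 2); /* failed once, then succeeded */
       return 0;
   }
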
If no buffer is available or the operation cannot be completed by the underlying libraries, `NETIO_STATUS_AGAIN` is returned. In this case, the user needs to re-attempt transmission at a later time. For the case that no buffer is available, `struct netio_buffered_send_socket` contains a NetIO-next signal that is fired once a buffer becomes available again. The signal attribute is named `netio_signal signal_buffer_available`; users can set the callback and user-data members of the signal.

When the buffer occupancy exceeds the user-configurable watermark or the timeout expires, the buffer is flushed, i.e. sent to the remote endpoint. Buffers can also be flushed manually at any point:

.. doxygenfunction:: netio_buffered_flush
   :no-link:

The receiving side of the buffered connection fires a callback for every message received. Note that typically multiple messages are packed into a single buffer, interleaved with 32-bit fields containing the message sizes, so the callback is fired multiple times in a row, once for each message contained in the buffer. The callback can be set on the listen socket as described above.

Buffered Publish/Subscribe Communication
----------------------------------------

The publish/subscribe communication pattern is also implemented for buffered communication. As in the unbuffered case, the publisher maintains an internal subscription table which contains the connections to the various subscribers. Connection management is automatic; publishers do not need to connect to any subscribers. Subscriptions for multiple tags between the same publish socket and subscribe socket share the same connection.

Buffered publish and subscribe sockets are initialized with:

.. doxygenfunction:: netio_publish_socket_init
   :no-link:

.. doxygenfunction:: netio_subscribe_socket_init
   :no-link:

Note that subscribe sockets are bound to a specific publish socket and cannot subscribe to tags of any socket other than the one indicated when initializing the socket.
To subscribe to a given fid, the subscribe socket can use `netio_subscribe` one or more times:

.. doxygenfunction:: netio_subscribe
   :no-link:

Internally, `netio_subscribe` establishes a connection to the publish socket and sends a subscription message. The publish socket then connects to the subscriber, or uses an existing connection, and registers the subscription in its subscription table. Messages published under the given fid are subsequently sent to the subscribe socket.

It is possible to unsubscribe from a given tag using the API function

.. doxygenfunction:: netio_unsubscribe
   :no-link:

When a client closes all its subscriptions, the server closes the connection. First, the server sends an `FI_SHUTDOWN` via its send socket using `netio_disconnect`. In response, the client deallocates its receive socket and calls `netio_disconnect` itself, which deallocates the client's send socket and the server's receive socket.

Publishing messages is done with the following API function:

.. doxygenfunction:: netio_buffered_publish
   :no-link:

The parameters `socket`, `tag`, `data` and `len` are self-explanatory and describe the buffered publish socket, the message tag, and the message. The `flags` parameter requires some explanation. When calling `netio_buffered_publish` for the first time, `flags` should be set to 0. NetIO-next then attempts to send the message to all connected and subscribed subscribers. If this succeeds for all connections, the call returns `NETIO_STATUS_OK`. However, one or more connections may be out of resources, in which case the underlying send call returns `NETIO_STATUS_AGAIN`. The publish call then also returns `NETIO_STATUS_AGAIN` and the user has to re-attempt publishing the message at a later time. NetIO-next keeps track of the connections on which the message was already sent successfully.
To avoid sending the same message again on those connections, the subsequent calls for the same message have to be made with the `NETIO_REENTRY` flag set. The flag is removed again when the user moves on to the next message.

For each published message, NetIO needs to look up the given tag in the subscription table, which can be expensive. As an optimization, users can supply a `netio_subscription_cache` object as the last parameter of the `netio_buffered_publish` call. It is used internally to cache the lookup for the given tag. Users need to supply a separate cache object for every tag. The subscription cache object needs to be initialized using

.. doxygenfunction:: netio_subscription_cache_init
   :no-link:

The user does not need to perform any further operations on the object. Internally the object contains a timestamp, so NetIO-next automatically detects changes to the subscription table and updates the cache objects without user intervention. Using the subscription cache is optional; users must supply `NULL` for this argument if the cache is not used.

As is implied by buffered communication, published messages are not immediately transferred to the remote endpoint, but only when the buffer reaches a certain fill level or the timeout expires. To enforce immediate sending of all buffers for a given tag, the following function can be used, similar to `netio_buffered_flush`:

.. doxygenfunction:: netio_buffered_publish_flush
   :no-link:

In the case of buffered communication, `cb_send_completed` is not exposed. Compared to unbuffered mode, the publisher completion stack is replaced by a buffer stack (`struct netio_bufferstack`), and the `key` is used as buffer identifier.
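The `NETIO_REENTRY` retry pattern described above can be sketched as follows. To keep the snippet self-contained, a stub stands in for `netio_buffered_publish`; the status-code and flag names follow the text, but their numeric values and the stub's behaviour are local assumptions.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Local placeholders; the real definitions come from the NetIO-next headers. */
   enum { NETIO_STATUS_OK = 0, NETIO_STATUS_AGAIN = 1 };
   enum { NETIO_REENTRY = 1u << 0 };

   /* Stand-in for netio_buffered_publish(): the stub reports NETIO_STATUS_AGAIN
    * on the first attempt (as if one subscriber were out of resources) and
    * succeeds once it is re-entered, so the flag handling can run without the
    * real library. */
   static int stub_publish(const void* data, size_t len, unsigned flags) {
       static int calls = 0;
       (void)data; (void)len; (void)flags;
       return (++calls == 1) ? NETIO_STATUS_AGAIN : NETIO_STATUS_OK;
   }

   int main(void) {
       const char msg[] = "update";
       unsigned flags = 0; /* first attempt for a new message: flags must be 0 */
       int status;

       do {
           status = stub_publish(msg, sizeof msg, flags);
           if (status == NETIO_STATUS_AGAIN) {
               /* Re-attempt the *same* message later with NETIO_REENTRY set,
                * so connections that already received it are skipped. */
               flags |= NETIO_REENTRY;
           }
       } while (status == NETIO_STATUS_AGAIN);

       flags = 0; /* moving on to the next message: clear the flag again */

       assert(status == NETIO_STATUS_OK);
       return 0;
   }
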