.. _BufferedCommunication:

Buffered RDMA Communication
===========================

Send and receive operations have a computational cost that can become evident when messages are transferred at a high rate. To limit this overhead, NetIO-next implements an internal data-coalescence mechanism referred to as *buffered* mode, as opposed to the *unbuffered* mode described in :ref:`UnbufferedCommunication`. In buffered mode, messages to be sent are copied into larger network buffers, which are transmitted once an occupancy threshold is crossed or a timeout expires.

Advantages of buffered communication are:

- the RDMA buffers are managed internally by NetIO. Buffered communication therefore requires less setup and management code than unbuffered communication;
- if the workload consists of many small messages (say, less than a kilobyte), buffered communication is more efficient because fewer messages are transmitted. The reduced overhead can increase overall application performance.

Disadvantages are:

- if the workload consists mainly of larger messages (over a few kilobytes), the copy operation can become costly performance-wise;
- latency can also be affected, although this can be mitigated by setting a short timeout (e.g. 1 ms);
- it can be more meaningful to buffer messages on the user side instead of internally in NetIO-next, as this allows for more logical message boundaries. For example, a database application could group individual data accesses into transactions and send out each transaction as a single buffer.

Point-to-Point Communication
----------------------------

NetIO-next supports unidirectional buffered point-to-point communication using the following socket types:

- *send sockets* (`struct netio_buffered_send_socket`): the sending side of a connection.
- *listen sockets* (`struct netio_buffered_listen_socket`): listen for incoming connections and create receive sockets to form connection pairs.
- *receive sockets* (`struct netio_buffered_recv_socket`): the receiving side of a connection, created by a listen socket.

Socket Initialization
.....................

Buffered send and listen sockets are initialized via the following functions:

.. doxygenfunction:: netio_buffered_send_socket_init
   :no-link:

.. doxygenfunction:: netio_buffered_listen_socket_init
   :no-link:

The most important parameter in both calls is `attr`. This structure defines the attributes of the buffers - called NetIO pages - that are allocated internally. Users need to set these attributes in order to configure the send/listen socket. Note that both the send and the listen socket need to be configured with the same page size.

.. code-block:: c

   struct netio_buffered_socket_attr {
       unsigned num_pages;
       size_t pagesize;
       size_t watermark;
       unsigned long timeout_ms;
   };

The parameters have the following meaning:

- `num_pages`: the number of buffers to be allocated. There may be a hardware-defined limit on the number of buffers per socket (typically 256).
- `pagesize`: the size of an individual buffer. This can be fine-tuned to optimize throughput; the default value in NetIO-next is 64 kB. Note that `pagesize` defines an upper limit for the size of a message that can be transmitted on the buffered socket, i.e. no message larger than `pagesize` can be sent on the given socket.
- `watermark`: if a buffer contains more bytes than `watermark`, it is flushed automatically.
- `timeout_ms`: a timer is associated with each socket. After `timeout_ms` milliseconds, a partially filled buffer is flushed regardless of its occupancy.

Besides the buffer configuration, users can configure several callbacks for send and listen sockets.
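The attribute structure above can be populated as in the following sketch. The values shown are illustrative assumptions, not recommendations, and the struct definition is replicated locally so that the snippet compiles stand-alone; in real code it comes from the NetIO-next headers.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Local mirror of the attribute struct shown above (normally provided by
    * the NetIO-next headers); replicated here to keep the sketch self-contained. */
   struct netio_buffered_socket_attr {
       unsigned num_pages;
       size_t pagesize;
       size_t watermark;
       unsigned long timeout_ms;
   };

   int main(void) {
       struct netio_buffered_socket_attr attr = {
           .num_pages  = 256,       /* typical hardware limit on buffers per socket */
           .pagesize   = 64 * 1024, /* 64 kB default; also the maximum message size */
           .watermark  = 56 * 1024, /* auto-flush once more than 56 kB are buffered */
           .timeout_ms = 1,         /* flush partially filled buffers after 1 ms */
       };

       /* A watermark above the page size could never trigger an automatic flush. */
       assert(attr.watermark <= attr.pagesize);
       return 0;
   }

Remember that the same `attr` (in particular the same `pagesize`) must be used for both the send socket and the listen socket of a connection pair.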
For send sockets these are::

   void (*cb_connection_established)(struct netio_buffered_send_socket* socket);
   void (*cb_connection_closed)(struct netio_buffered_send_socket* socket);
   void (*cb_error_connection_refused)(struct netio_buffered_send_socket* socket);

And for listen sockets these are::

   void (*cb_connection_established)(struct netio_buffered_recv_socket* socket);
   void (*cb_connection_closed)(struct netio_buffered_recv_socket* socket);
   void (*cb_msg_received)(struct netio_buffered_recv_socket* socket, void* data, size_t size);

.. note::
   The callback parameter for listen socket callbacks points to the receive socket object that is created by the listen socket. The listen socket object can be accessed via the `listen_socket` member of `struct netio_buffered_recv_socket`.

Establishing a Connection
.........................

After initializing a buffered listen socket, it needs to be put into listening mode:

.. doxygenfunction:: netio_buffered_listen
   :no-link:

Buffered send sockets can connect to sockets in listening state using

.. doxygenfunction:: netio_buffered_connect
   :no-link:

Once the connection between send and receive socket has been established successfully, the callback `cb_connection_established` is called on both sides of the connection. If the connection cannot be established, `cb_error_connection_refused` is called on the sending side.

Connections can be closed using

.. doxygenfunction:: netio_disconnect
   :no-link:

If a buffered connection closes (by user request or due to a connection error), `cb_connection_closed` is called on both ends of the connection.

Sending and Receiving Data
..........................

Once a connection has been established, sending of messages can be initiated with the following functions:

.. doxygenfunction:: netio_buffered_send
   :no-link:

.. doxygenfunction:: netio_buffered_sendv
   :no-link:

If buffer space is available, the message is copied to the socket-internal buffers.
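The send path, including the retry when the socket temporarily runs out of buffer space, can be sketched as follows. To keep the snippet self-contained, a stub stands in for `netio_buffered_send`; the status-code names follow the text, but their numeric values and the stub's behaviour are local assumptions.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Status codes as used by NetIO-next; the numeric values here are local
    * placeholders, the real definitions come from the NetIO-next headers. */
   enum { NETIO_STATUS_OK = 0, NETIO_STATUS_AGAIN = 1 };

   /* Stand-in for netio_buffered_send(): this stub pretends the socket is out
    * of buffer space on the first attempt and succeeds on the second, so the
    * retry pattern below can be exercised without the real library. */
   static int stub_buffered_send(const void* data, size_t size) {
       static int calls = 0;
       (void)data; (void)size;
       return (++calls == 1) ? NETIO_STATUS_AGAIN : NETIO_STATUS_OK;
   }

   int main(void) {
       const char msg[] = "hello";
       int attempts = 0;
       int status;

       /* If no buffer is available the send returns NETIO_STATUS_AGAIN and the
        * caller has to retry later; real code would ideally retry from the
        * callback of the socket's signal_buffer_available signal rather than
        * from a busy loop. */
       do {
           status = stub_buffered_send(msg, sizeof msg);
           attempts++;
       } while (status == NETIO_STATUS_AGAIN);

       assert(status == NETIO_STATUS_OK);
       assert(attempts == 2); /* failed once, then succeeded */
       return 0;
   }
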
If no buffer is available or the operation cannot be completed by the underlying libraries, `NETIO_STATUS_AGAIN` is returned. In this case, the user needs to re-attempt transmission at a later time. For the case that no buffer is available, `struct netio_buffered_send_socket` contains a NetIO-next signal that is fired once a buffer becomes available again. The signal attribute is named `netio_signal signal_buffer_available`; users can set the callback and user-data members of the signal.

When the buffer occupancy exceeds the user-configurable watermark or the timeout expires, the buffer is flushed, i.e. sent to the remote endpoint. Buffers can also be flushed manually at any point:

.. doxygenfunction:: netio_buffered_flush
   :no-link:

The receiving side of the buffered connection fires a callback for every message received. Note that typically multiple messages are packed into a single buffer, interleaved with 32-bit fields containing the message sizes, so the callback is fired multiple times in a row, once for each message contained in the buffer. The callback can be set on the listen socket as described above.

Buffered Publish/Subscribe Communication
----------------------------------------

The publish/subscribe communication pattern is also implemented for buffered communication. As in the unbuffered case, the publisher maintains an internal subscription table which contains the connections to the various subscribers. Connection management is automatic; publishers do not need to connect to any subscribers. Subscriptions for multiple tags between the same publish socket and subscribe socket share the same connection.

Buffered publish and subscribe sockets are initialized with:

.. doxygenfunction:: netio_publish_socket_init
   :no-link:

.. doxygenfunction:: netio_subscribe_socket_init
   :no-link:

Note that subscribe sockets are bound to a specific publish socket and cannot subscribe to tags of any socket other than the one indicated when initializing the socket.
To subscribe to a given fid, the subscribe socket can use `netio_subscribe` one or more times:

.. doxygenfunction:: netio_subscribe
   :no-link:

Internally, `netio_subscribe` establishes a connection to the publish socket and sends a subscription message. The publish socket then connects to the subscriber, or uses an existing connection, and registers the subscription in its subscription table. Messages published under the given fid are subsequently sent to the subscribe socket.

It is possible to unsubscribe from a given tag using the API function

.. doxygenfunction:: netio_unsubscribe
   :no-link:

When a client closes all its subscriptions, the server closes the connection. First, the server sends an `FI_SHUTDOWN` via its send socket using `netio_disconnect`. In response, the client deallocates its receive socket and calls `netio_disconnect` itself, which deallocates the client's send socket and the server's receive socket.

Publishing messages is done with the following API function:

.. doxygenfunction:: netio_buffered_publish
   :no-link:

The parameters `socket`, `tag`, `data` and `len` are self-explanatory and describe the buffered publish socket, the message tag, and the message. The `flags` parameter requires some explanation. When calling `netio_buffered_publish` for the first time, `flags` should be set to 0. NetIO-next then attempts to send the message to all connected and subscribed subscribers. If this succeeds for all connections, the call returns `NETIO_STATUS_OK`. However, one or more connections may be out of resources, in which case the underlying send call returns `NETIO_STATUS_AGAIN`. The publish call then also returns `NETIO_STATUS_AGAIN` and the user has to re-attempt publishing the message at a later time. NetIO-next keeps track of the connections on which the message was already sent successfully.
To avoid sending the same message again on those connections, the subsequent calls for the same message have to be made with the `NETIO_REENTRY` flag set. The flag is removed again when the user moves on to the next message.

For each published message, NetIO needs to look up the given tag in the subscription table, which can be expensive. As an optimization, users can supply a `netio_subscription_cache` object as the last parameter of the `netio_buffered_publish` call. It is used internally to cache the lookup for the given tag. Users need to supply a separate cache object for every tag. The subscription cache object needs to be initialized using

.. doxygenfunction:: netio_subscription_cache_init
   :no-link:

The user does not need to perform any further operations on the object. Internally the object contains a timestamp, so NetIO-next automatically detects changes to the subscription table and updates the cache objects without user intervention. Using the subscription cache is optional; users must supply `NULL` for this argument if the cache is not used.

As is implied by buffered communication, published messages are not immediately transferred to the remote endpoint, but only when the buffer reaches a certain fill level or the timeout expires. To enforce immediate sending of all buffers for a given tag, the following function can be used, similar to `netio_buffered_flush`:

.. doxygenfunction:: netio_buffered_publish_flush
   :no-link:

In the case of buffered communication, `cb_send_completed` is not exposed. Compared to unbuffered mode, the publisher completion stack is replaced by a buffer stack (`struct netio_bufferstack`), and the `key` is used as buffer identifier.
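The `NETIO_REENTRY` retry pattern described above can be sketched as follows. To keep the snippet self-contained, a stub stands in for `netio_buffered_publish`; the status-code and flag names follow the text, but their numeric values and the stub's behaviour are local assumptions.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Local placeholders; the real definitions come from the NetIO-next headers. */
   enum { NETIO_STATUS_OK = 0, NETIO_STATUS_AGAIN = 1 };
   enum { NETIO_REENTRY = 1u << 0 };

   /* Stand-in for netio_buffered_publish(): the stub reports NETIO_STATUS_AGAIN
    * on the first attempt (as if one subscriber were out of resources) and
    * succeeds once it is re-entered, so the flag handling can run without the
    * real library. */
   static int stub_publish(const void* data, size_t len, unsigned flags) {
       static int calls = 0;
       (void)data; (void)len; (void)flags;
       return (++calls == 1) ? NETIO_STATUS_AGAIN : NETIO_STATUS_OK;
   }

   int main(void) {
       const char msg[] = "update";
       unsigned flags = 0; /* first attempt for a new message: flags must be 0 */
       int status;

       do {
           status = stub_publish(msg, sizeof msg, flags);
           if (status == NETIO_STATUS_AGAIN) {
               /* Re-attempt the *same* message later with NETIO_REENTRY set,
                * so connections that already received it are skipped. */
               flags |= NETIO_REENTRY;
           }
       } while (status == NETIO_STATUS_AGAIN);

       flags = 0; /* moving on to the next message: clear the flag again */

       assert(status == NETIO_STATUS_OK);
       return 0;
   }
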