Software Architecture

NetIO-next implements a reactor design consisting of a single event loop. The event loop processes incoming events and calls user-defined callbacks. Events are typically related to RDMA communication such as send or receive completions, or incoming connection requests. Other events could be timer events or signals. User code is mostly written inside the callbacks.

This section introduces the concept of the event loop and shows how to use signals and timers. It also outlines how network events are reported.

Warning

An important rule is that user callbacks must never block. Blocking code should be rewritten as non-blocking code: instead of waiting for a result, a callback should return to the event loop and be notified by the event loop when the operation completes. This is crucial for both performance and correctness: blocking code stops the event loop from processing further events and may lead to deadlocks or other problems.
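As an illustration of the non-blocking style in plain Linux C (this is not NetIO-next code), the sketch below puts a file descriptor into non-blocking mode with fcntl, so that read returns immediately with EAGAIN instead of waiting. The helpers set_nonblocking, try_read and demo_try_read are hypothetical; a callback that gets -EAGAIN would simply return to the event loop and wait to be called again when the fd becomes readable.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode: read() then fails with EAGAIN instead of waiting. */
static int set_nonblocking(int fd) {
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0)
    return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns the number of bytes read (> 0), 0 on EOF, or -errno on failure.
 * -EAGAIN means "no data yet": the callback should return to the event loop. */
static ssize_t try_read(int fd, void *buf, size_t len) {
  ssize_t n = read(fd, buf, len);
  return n < 0 ? -errno : n;
}

/* Usage: reading an empty pipe does not block, it reports -EAGAIN. */
static int demo_try_read(void) {
  int p[2];
  char buf[8];
  int rc = pipe(p);
  assert(rc == 0);
  rc = set_nonblocking(p[0]);
  assert(rc == 0);
  ssize_t first = try_read(p[0], buf, sizeof buf);  /* pipe is empty */
  ssize_t w = write(p[1], "x", 1);
  assert(w == 1);
  ssize_t second = try_read(p[0], buf, sizeof buf); /* one byte available */
  close(p[0]);
  close(p[1]);
  return first == -EAGAIN && second == 1;
}
```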

Hello World

The following example shows a minimal NetIO-next application that prints the text “Hello, NetIO!”:

#include <stdio.h>
#include "netio/netio.h"

// NetIO context object that encapsulates the event loop
struct netio_context ctx;

void on_init() {
  // this callback is executed one time at the start of the event loop
  puts("Hello, NetIO!");
  netio_terminate(&ctx.evloop);
}

int main(int argc, char** argv) {
  // initialize netio context
  netio_init(&ctx);
  ctx.evloop.cb_init = on_init;

  // run event loop
  netio_run(&ctx.evloop);

  return 0;
}

A single object of type struct netio_context is used to keep internal state. For example, the event loop data structure itself is part of the context. The context is initialized with netio_init, and the initialization callback is set to the user function on_init.

The event loop is executed with netio_run. This calls the initialization callback, which prints the text. Note that the event loop is terminated using netio_terminate in the initialization callback. If this step were omitted, the event loop would keep running forever and the application would not terminate. All NetIO-next functions are declared in a single header file, netio.h, which is included at the beginning of the program.

The Event Loop

The event loop is the central concept in NetIO-next. All user-defined code will be called from here. Event sources such as network sockets can register in the event loop and trigger events, which will be processed and passed to the user.

The event loop implementation is based on the Linux epoll mechanism [1]. In short, Linux file descriptors (fds) are registered with an epoll instance, and a callback function is associated with each fd. When epoll_wait reports an event on an fd, the corresponding callback is invoked.
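The fd-to-callback association can be sketched in plain Linux C by storing a pointer to a handler in the data field of struct epoll_event, which epoll_wait hands back with every event. This is a minimal illustration of the reactor pattern, not the NetIO-next implementation; struct fd_handler, loop_add, loop_once and MAX_EVENTS are hypothetical names.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16 /* assumption: arbitrary batch size */

/* Hypothetical handler: one callback plus user data per fd. */
struct fd_handler {
  int fd;
  void (*cb)(int fd, void *data);
  void *data;
};

/* Register an fd for read readiness and attach its handler. */
static int loop_add(int epfd, struct fd_handler *h) {
  struct epoll_event ev = {0};
  ev.events = EPOLLIN;
  ev.data.ptr = h; /* returned by epoll_wait with every event for this fd */
  return epoll_ctl(epfd, EPOLL_CTL_ADD, h->fd, &ev);
}

/* One loop iteration: wait for ready fds and invoke their callbacks. */
static int loop_once(int epfd, int timeout_ms) {
  struct epoll_event events[MAX_EVENTS];
  int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
  for (int i = 0; i < n; i++) {
    struct fd_handler *h = events[i].data.ptr;
    h->cb(h->fd, h->data);
  }
  return n;
}

/* Usage: the callback drains the pipe and counts its invocations. */
static void on_pipe_readable(int fd, void *data) {
  char buf[64];
  if (read(fd, buf, sizeof buf) > 0)
    ++*(int *)data;
}

static int demo_dispatch(void) {
  int calls = 0, p[2];
  int rc = pipe(p);
  assert(rc == 0);
  int epfd = epoll_create1(0);
  assert(epfd >= 0);
  struct fd_handler h = { p[0], on_pipe_readable, &calls };
  rc = loop_add(epfd, &h);
  assert(rc == 0);
  ssize_t w = write(p[1], "hi", 2);
  assert(w == 2);
  loop_once(epfd, 1000); /* pipe readable: on_pipe_readable runs once */
  loop_once(epfd, 0);    /* pipe drained: nothing ready, no callback */
  close(epfd);
  close(p[0]);
  close(p[1]);
  return calls;
}
```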

A few basic functions are used to interact with the event loop.

void netio_eventloop_init(struct netio_eventloop *evloop)

Initializes a NetIO event loop.

In the background this creates an epoll file descriptor handle.

Parameters:

evloop – The event loop to initialize

Users normally do not have to call this function as it is implicitly called by netio_init [2]. An initialized event loop may be assigned an initialization callback:

evloop.cb_init = my_init_callback;

The init callback may be useful to initialize further resources such as sockets or other event sources. The initialized event loop can then be executed with netio_run:

void netio_run(struct netio_eventloop *evloop)

Executes the event loop.

The event loop runs in an endless loop until it is explicitly terminated by netio_terminate. Before processing any other event, netio_run executes the initialization callback, if one was specified. The core of the event loop is epoll_wait. Note that epoll_wait returns at most one event per fd, so MAX_EPOLL_EVENTS, the maximum number of events returned by a single epoll_wait call, effectively translates into the maximum number of fds processed in one iteration (the remaining fds are processed in a round-robin fashion in the following iterations).

Parameters:

evloop – The event loop to execute.
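The two epoll_wait properties mentioned above can be checked with a small plain-C sketch that is independent of NetIO-next: two writes to the same pipe produce a single event for its fd, and with three ready fds but a maxevents limit of 2 (standing in for MAX_EPOLL_EVENTS), the fd skipped by one call is reported by the next. The helpers ready_pipe and wait_once are hypothetical.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Create a pipe, write `writes` bytes into it, and register its read end with
 * epfd (level-triggered EPOLLIN). The data is never drained, so it stays ready. */
static int ready_pipe(int epfd, int p[2], int writes) {
  int rc = pipe(p);
  assert(rc == 0);
  for (int i = 0; i < writes; i++) {
    ssize_t w = write(p[1], "x", 1);
    assert(w == 1);
  }
  struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
  rc = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);
  assert(rc == 0);
  return p[0];
}

/* One epoll_wait() call limited to maxevents (<= 8); sets bit j of *seen
 * for every reported fd equal to fds[j]. Returns the number of events. */
static int wait_once(int epfd, int maxevents, const int *fds, int nfds, unsigned *seen) {
  struct epoll_event events[8];
  assert(maxevents <= 8);
  int n = epoll_wait(epfd, events, maxevents, 0);
  for (int i = 0; i < n; i++)
    for (int j = 0; j < nfds; j++)
      if (events[i].data.fd == fds[j])
        *seen |= 1u << j;
  return n;
}

/* Two writes to the same pipe still yield a single event for its fd. */
static int demo_one_event_per_fd(void) {
  int epfd = epoll_create1(0), p[2];
  assert(epfd >= 0);
  int fds[1] = { ready_pipe(epfd, p, 2) };
  unsigned seen = 0;
  int n = wait_once(epfd, 8, fds, 1, &seen);
  close(p[0]);
  close(p[1]);
  close(epfd);
  return n;
}

/* Three ready fds, maxevents = 2: each call returns 2 events, and the fd
 * skipped by the first call is reported by the second (round robin). */
static unsigned demo_round_robin(int *first, int *second) {
  int epfd = epoll_create1(0), p[3][2], fds[3];
  assert(epfd >= 0);
  unsigned seen = 0;
  for (int i = 0; i < 3; i++)
    fds[i] = ready_pipe(epfd, p[i], 1);
  *first = wait_once(epfd, 2, fds, 3, &seen);
  *second = wait_once(epfd, 2, fds, 3, &seen);
  for (int i = 0; i < 3; i++) {
    close(p[i][0]);
    close(p[i][1]);
  }
  close(epfd);
  return seen; /* 0x7 once every fd has been reported */
}
```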

When the event loop is executed, it first calls the init callback and then polls for new events. It keeps running until it is explicitly terminated by user request with netio_terminate.
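The run/terminate semantics (init callback first, then polling until an explicit termination request) can be sketched on top of epoll as follows. This is an assumption-laden illustration, not the NetIO-next code: struct mini_loop, mini_run, mini_terminate and the single shared cb_event handler are hypothetical simplifications. Setting a flag rather than breaking out immediately mirrors how netio_terminate can be called from inside a callback, as in the Hello World example.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical loop object; NOT the NetIO-next struct netio_eventloop. */
struct mini_loop {
  int epfd;
  int running;
  void (*cb_init)(struct mini_loop *loop);
  void (*cb_event)(struct mini_loop *loop, int fd); /* one shared handler, for brevity */
};

static void mini_terminate(struct mini_loop *loop) { loop->running = 0; }

/* Calls cb_init first, then polls until mini_terminate() is called. */
static void mini_run(struct mini_loop *loop) {
  struct epoll_event events[16];
  loop->running = 1;
  if (loop->cb_init)
    loop->cb_init(loop);
  while (loop->running) {
    int n = epoll_wait(loop->epfd, events, 16, -1);
    for (int i = 0; i < n && loop->running; i++)
      loop->cb_event(loop, events[i].data.fd);
  }
}

/* Usage: on_init makes an eventfd readable; on_event consumes it and
 * terminates the loop. `trace` records the order of the callbacks. */
static char trace[4];
static int trace_len;
static int efd;

static void on_init(struct mini_loop *loop) {
  (void)loop;
  uint64_t one = 1;
  ssize_t w = write(efd, &one, sizeof one);
  assert(w == (ssize_t)sizeof one);
  trace[trace_len++] = 'I';
}

static void on_event(struct mini_loop *loop, int fd) {
  uint64_t value;
  ssize_t r = read(fd, &value, sizeof value); /* consume the notification */
  assert(r == (ssize_t)sizeof value);
  trace[trace_len++] = 'E';
  mini_terminate(loop);
}

static const char *demo_run(void) {
  struct mini_loop loop = { .cb_init = on_init, .cb_event = on_event };
  trace_len = 0;
  loop.epfd = epoll_create1(0);
  efd = eventfd(0, 0);
  assert(loop.epfd >= 0 && efd >= 0);
  struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
  int rc = epoll_ctl(loop.epfd, EPOLL_CTL_ADD, efd, &ev);
  assert(rc == 0);
  mini_run(&loop);
  close(efd);
  close(loop.epfd);
  trace[trace_len] = '\0';
  return trace; /* "IE": init callback before any event */
}
```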

Running the event loop is at the core of every NetIO-next application. However, without event sources the event loop is just idling. The following sections will discuss several basic types of events, and further chapters will describe RDMA communication and the associated events.

Signals

Signals are simple event sources that are triggered by user operations: a user can ‘fire’ a signal, which will then lead to the event loop calling the signal’s callback.

Signals are implemented using Linux eventfd [3]. An eventfd is a file descriptor that can be used as an event wait/notify mechanism by user-space applications. Firing the signal corresponds to a write to the file descriptor, which makes epoll_wait report an EPOLLIN event; when this happens, the user callback stored in the netio_signal data structure is invoked. Handling the notification involves a read on the eventfd, which resets the eventfd's internal counter to zero unless the EFD_SEMAPHORE flag is used, in which case each read decrements the counter by one. EFD_SEMAPHORE is therefore necessary to ensure that a callback is invoked as many times as its signal is fired.
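The difference EFD_SEMAPHORE makes can be demonstrated directly with the eventfd system call. In this plain-C sketch (the helper reads_after_fires is hypothetical), the eventfd is "fired" three times and then read until it is exhausted; EFD_NONBLOCK is added only so that the final read fails with EAGAIN instead of blocking.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Fire an eventfd `fires` times, then read until the counter is exhausted.
 * Returns the number of successful reads (one callback invocation each in a
 * signal implementation) and stores the sum of the values read in *total. */
static int reads_after_fires(int extra_flags, int fires, uint64_t *total) {
  int efd = eventfd(0, EFD_NONBLOCK | extra_flags);
  assert(efd >= 0);
  uint64_t one = 1, value;
  for (int i = 0; i < fires; i++) {
    ssize_t w = write(efd, &one, sizeof one); /* fire: counter += 1 */
    assert(w == (ssize_t)sizeof one);
  }
  int reads = 0;
  *total = 0;
  while (read(efd, &value, sizeof value) == (ssize_t)sizeof value) {
    reads++;
    *total += value;
  }
  close(efd);
  return reads;
}
```

Without EFD_SEMAPHORE, the three fires collapse into a single read that returns 3, so a callback dispatched once per read would run only once. With EFD_SEMAPHORE, each read returns 1 and the callback would run three times.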

Signals need to be initialized before they can be used. This is done with netio_signal_init:

void netio_signal_init(struct netio_eventloop *evloop, struct netio_signal *signal)

Initializes a signal and registers it in the event loop.

Internally, signals are implemented using eventfd.

Parameters:

evloop – The event loop in which the signal will be registered
signal – The signal to initialize

To fire a signal, the function netio_signal_fire is used.

void netio_signal_fire(struct netio_signal *signal)

Fires a signal.

Firing the signal triggers the execution of the signal’s callback. Firing a signal is thread-safe.

Parameters:

signal – The signal to fire

A user can optionally set the data attribute of the signal structure, a pointer of type void* that is passed to the signal’s callback as its parameter.
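A signal object of this kind (an eventfd, a callback and a user data pointer) can be sketched with plain Linux calls. Everything here is hypothetical and not the actual struct netio_signal: struct sketch_signal, sketch_signal_init, sketch_signal_fire and sketch_dispatch only illustrate how the data pointer could travel from registration to the callback, and how a semaphore-mode eventfd yields one callback per fire. Firing is a single 8-byte write, which the kernel applies to the eventfd counter atomically; this is what makes firing from another thread safe in this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical signal object: an eventfd plus a callback and its data pointer. */
struct sketch_signal {
  int fd;
  void (*cb)(void *data);
  void *data;
};

static int sketch_signal_init(int epfd, struct sketch_signal *s) {
  s->fd = eventfd(0, EFD_SEMAPHORE | EFD_NONBLOCK);
  if (s->fd < 0)
    return -1;
  struct epoll_event ev = { .events = EPOLLIN, .data.ptr = s };
  return epoll_ctl(epfd, EPOLL_CTL_ADD, s->fd, &ev);
}

/* Firing is a single 8-byte write; the kernel updates the counter atomically. */
static void sketch_signal_fire(struct sketch_signal *s) {
  uint64_t one = 1;
  ssize_t w = write(s->fd, &one, sizeof one);
  assert(w == (ssize_t)sizeof one);
}

/* One loop iteration: consume one unit per ready signal and call cb(data). */
static int sketch_dispatch(int epfd) {
  struct epoll_event events[16];
  int n = epoll_wait(epfd, events, 16, 0);
  for (int i = 0; i < n; i++) {
    struct sketch_signal *s = events[i].data.ptr;
    uint64_t value;
    if (read(s->fd, &value, sizeof value) == (ssize_t)sizeof value)
      s->cb(s->data);
  }
  return n;
}

static void on_signal(void *data) { ++*(int *)data; }

/* Usage: fire twice; the callback receives &fired and runs once per fire. */
static int demo_signal_data(void) {
  int fired = 0;
  int epfd = epoll_create1(0);
  assert(epfd >= 0);
  struct sketch_signal s = { .cb = on_signal, .data = &fired };
  int rc = sketch_signal_init(epfd, &s);
  assert(rc == 0);
  sketch_signal_fire(&s);
  sketch_signal_fire(&s);
  while (sketch_dispatch(epfd) > 0) {
    /* each pass returns to the loop before the next callback */
  }
  close(s.fd);
  close(epfd);
  return fired;
}
```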

The following listing gives an example of how to use signals:

#include <stdio.h>
#include "netio/netio.h"

struct netio_context ctx;
struct netio_signal signal;
int fired = 0;

void on_init() {
  netio_signal_fire(&signal);
  puts("on_init()");
}

void on_signal(void* ptr) {
  *((int*)ptr) = 1;
  puts("on_signal()");
  netio_terminate(&ctx.evloop);
}

int main(int argc, char** argv) {
  netio_init(&ctx);
  ctx.evloop.cb_init = on_init;

  netio_signal_init(&ctx.evloop, &signal);
  signal.cb = on_signal;
  signal.data = &fired;

  netio_run(&ctx.evloop);

  if(fired) {
    puts("the signal has been fired");
  }

  return 0;
}

Note that the signal’s callback is called by the event loop: it is executed only after the fire operation, once control has returned to the event loop. This is why on_init() is printed before on_signal(), even though the signal is fired before the call to puts. The output of the above program is therefore:

on_init()
on_signal()
the signal has been fired

Tip

Signals are thread-safe and can be used for thread synchronization. The signal can be fired from any thread. The callback is executed by the thread running the event loop.

Tip

User callbacks should never poll, as this would block the event loop. A simple way of implementing polling in an event-driven architecture is for a signal’s callback to fire that same signal again. The callback is then executed repeatedly, but since execution returns to the event loop between invocations, the event loop is not blocked.
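The re-fire pattern from the tip above can be sketched with an eventfd and epoll. In this hypothetical example (struct poll_state, fire, on_poll and demo_refire are illustrative names, and the polled condition is simulated by an attempt counter), the callback checks the condition once per invocation and, if it is not yet met, fires its own signal and returns, so every check passes through epoll_wait where other registered fds could be served.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical state for a "poll by re-firing" callback. The polled condition
 * is simulated: it becomes true on attempt number `ready_after`. */
struct poll_state {
  int efd;         /* eventfd acting as the signal */
  int attempts;    /* number of callback invocations */
  int ready_after; /* simulated condition */
  int done;
};

static void fire(int efd) {
  uint64_t one = 1;
  ssize_t w = write(efd, &one, sizeof one);
  assert(w == (ssize_t)sizeof one);
}

/* Checks the condition once; if it is not met, re-fires and returns. */
static void on_poll(struct poll_state *st) {
  uint64_t value;
  ssize_t r = read(st->efd, &value, sizeof value); /* consume this firing */
  assert(r == (ssize_t)sizeof value);
  if (++st->attempts >= st->ready_after) {
    st->done = 1;
    return;
  }
  fire(st->efd); /* not ready yet: schedule another check */
}

static int demo_refire(int ready_after) {
  struct poll_state st = { .ready_after = ready_after };
  st.efd = eventfd(0, EFD_SEMAPHORE);
  int epfd = epoll_create1(0);
  assert(st.efd >= 0 && epfd >= 0);
  struct epoll_event ev = { .events = EPOLLIN, .data.ptr = &st };
  int rc = epoll_ctl(epfd, EPOLL_CTL_ADD, st.efd, &ev);
  assert(rc == 0);
  fire(st.efd); /* start polling */
  struct epoll_event events[8];
  while (!st.done) {
    int n = epoll_wait(epfd, events, 8, 1000);
    if (n <= 0)
      break; /* safety net for the sketch */
    /* other registered fds would be dispatched here between checks */
    for (int i = 0; i < n; i++)
      on_poll(events[i].data.ptr);
  }
  close(st.efd);
  close(epfd);
  return st.attempts;
}
```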

Timers

Timers are used to generate events periodically, at a user-defined period. The underlying implementation relies on Linux timerfd [4]. To initialize a timer and register it with the event loop, use netio_timer_init:

void netio_timer_init(struct netio_eventloop *evloop, struct netio_timer *timer)

Initializes a timer and registers it with the event loop.

Internally, timers are implemented using timerfd.

Parameters:

evloop – The event loop in which the timer will be registered
timer – The timer to initialize

The timer needs to be started before it generates events. To start a timer with a specific period, use netio_timer_start_s. For shorter periods, the functions netio_timer_start_ms, netio_timer_start_us and netio_timer_start_ns can be used instead.

void netio_timer_start_s(struct netio_timer *timer, unsigned long long seconds)

Starts a timer with the defined period (given in seconds).

The period is given in seconds. The timer callback is executed once per period until the timer is explicitly stopped.

Parameters:

timer – The timer to start
seconds – The timer period, given in seconds
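Since timers are built on timerfd, starting a periodic timer roughly amounts to a timerfd_settime call whose it_interval and it_value both hold the period. The sketch below is a plain Linux illustration, not the NetIO-next implementation; ns_to_timespec, sketch_timer_start_ns and demo_timer are hypothetical, and the seconds/ms/us variants are assumed to differ only in how the period is scaled to nanoseconds. Reading a timerfd returns the number of expirations since the last read, which can exceed 1 if the event loop fell behind.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Split a period in nanoseconds into a struct timespec. A seconds-based
 * start function could pass seconds * 1000000000ULL here. */
static struct timespec ns_to_timespec(unsigned long long ns) {
  struct timespec ts;
  ts.tv_sec = (time_t)(ns / 1000000000ULL);
  ts.tv_nsec = (long)(ns % 1000000000ULL);
  return ts;
}

/* Arm a periodic timer: first expiry after one period, then every period.
 * A zero period would leave it_value zero, which disarms the timer instead. */
static int sketch_timer_start_ns(int tfd, unsigned long long period_ns) {
  struct itimerspec spec;
  spec.it_interval = ns_to_timespec(period_ns);
  spec.it_value = spec.it_interval;
  return timerfd_settime(tfd, 0, &spec, NULL);
}

/* Usage: start a 10 ms timer, wait for it, and read the expiration count. */
static uint64_t demo_timer(void) {
  int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
  assert(tfd >= 0);
  int rc = sketch_timer_start_ns(tfd, 10ULL * 1000000ULL);
  assert(rc == 0);
  struct pollfd pfd = { .fd = tfd, .events = POLLIN };
  rc = poll(&pfd, 1, 1000); /* an event loop would use epoll_wait here */
  assert(rc == 1);
  uint64_t expirations = 0;
  ssize_t r = read(tfd, &expirations, sizeof expirations);
  assert(r == (ssize_t)sizeof expirations);
  close(tfd);
  return expirations; /* >= 1; larger if the reader fell behind */
}
```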

A timer can be stopped using netio_timer_stop:

void netio_timer_stop(struct netio_timer *timer)

Stops a timer.

The timer will not execute callbacks anymore until it is started again.

Parameters:

timer – The timer to stop
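With timerfd, stopping a timer corresponds to disarming it: calling timerfd_settime with an all-zero it_value. The following plain-C sketch (sketch_timer_stop and demo_timer_stop are hypothetical names, not NetIO-next functions) runs a 20 ms timer until it fires once, stops it, and checks that no further expiry arrives within 100 ms.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Stop (disarm) a timerfd: an all-zero it_value disarms the timer. */
static int sketch_timer_stop(int tfd) {
  struct itimerspec off = {0};
  return timerfd_settime(tfd, 0, &off, NULL);
}

/* Usage: run a 20 ms timer until its first expiry, stop it, then wait
 * 100 ms. Returns poll()'s result after the stop (0 means timeout). */
static int demo_timer_stop(void) {
  const struct timespec period = { .tv_sec = 0, .tv_nsec = 20000000L };
  struct itimerspec on = { .it_interval = period, .it_value = period };
  int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
  assert(tfd >= 0);
  int rc = timerfd_settime(tfd, 0, &on, NULL);
  assert(rc == 0);
  struct pollfd pfd = { .fd = tfd, .events = POLLIN };
  rc = poll(&pfd, 1, 1000); /* running: wait for the first expiry */
  assert(rc == 1);
  uint64_t expirations;
  ssize_t r = read(tfd, &expirations, sizeof expirations); /* consume it */
  assert(r == (ssize_t)sizeof expirations);
  rc = sketch_timer_stop(tfd);
  assert(rc == 0);
  rc = poll(&pfd, 1, 100); /* stopped: expect a timeout */
  close(tfd);
  return rc;
}
```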

Similar to signals, timers offer a simple callback signature that takes a user-defined data field as parameter.

The following listing shows how to use a timer to implement a simple countdown that prints the numbers 10 down to 1, one per second, and then terminates the event loop:

#include <stdio.h>
#include "netio/netio.h"

struct netio_context ctx;
struct netio_timer timer;

// called once per timer period; ptr points to the countdown counter
void on_timer(void* ptr) {
  int* ctr = (int*)ptr;
  printf("%d\n", (*ctr)--);
  if(*ctr == 0) {
    // the last number (1) has been printed: stop the event loop
    netio_terminate(&ctx.evloop);
  }
}

int main(int argc, char** argv) {
  int counter = 10;

  netio_init(&ctx);
  netio_timer_init(&ctx.evloop, &timer);
  timer.cb = on_timer;
  timer.data = &counter;
  netio_timer_start_s(&timer, 1);  // one callback per second

  // run event loop
  netio_run(&ctx.evloop);

  return 0;
}

Network Events

Network events are reported similarly to user signals, in the sense that an fd is used to wait for an event and a callback is invoked upon notification, except that the fds are signalled by the network stack. The fds used for network events are called native wait objects [5] in libfabric jargon and can be associated with either a Completion Queue (CQ) or an Event Queue (EQ).

EQs report connection and disconnection events that originate from the RDMA connection manager (librdmacm library).

CQs report the completion of an operation and carry a completion object (CO). On the sender side, a CO is generated when a send operation completes. On the receiver side, a CO indicates that a message has been received. In both cases the CO contains the address of the message.

The rationale behind CQs is linked to the concept of memory registration. Because the network card accesses host memory directly, messages need to reside in pinned memory regions, i.e. memory regions (MRs) that are not subject to paging and for which the mapping between virtual and physical addresses is fixed. A receiver therefore posts a number of MRs, later called netio pages, and is notified when the network card has written a message into one of them. Once the receiver has read the message, it posts the page back. A similar mechanism applies on the sending side: a message resides inside (a part of) an MR, and the application must not overwrite the content of that MR until the send operation has completed.

In the case of ibverbs, the wait object associated with a CQ is called a completion channel (CC). Even after the CQ has been read until it is empty, the CC keeps reporting events unless it is reset manually with the libfabric function fi_trywait. The purpose of the CC is to throttle the large number of hardware interrupts corresponding to the generation of COs.

The following sections, dedicated to the functional aspects of NetIO-next, will occasionally refer to these low-level aspects. For more details, refer to the libfabric documentation [6].