Software Architecture
=====================

NetIO-next implements a reactor design built around a single event loop. The event loop processes incoming events and calls user-defined callbacks. Events are typically related to RDMA communication, such as send or receive completions or incoming connection requests. Other events include timer events and signals.
User code is mostly written inside the callbacks.
This section introduces the concept of the event loop and shows how to use signals and timers.

.. warning::
   An important concept is that **user callbacks must never block**. Blocking code should be rewritten as non-blocking code. Instead of waiting for a result, it is better to return to the event loop and be notified by it when the operation completes. This is crucial for both performance and correctness: blocking code stops the event loop from processing further events and may lead to deadlocks or other problems.

Hello World
-----------

The following example shows a minimal NetIO-next application that prints the text "Hello, NetIO!":

.. literalinclude:: ../examples/main_helloworld.c
   :language: C
   :linenos:

A single object of type `struct netio_context` is used to keep internal state. For example, the event loop data structure itself is part of the context.
The context is initialized and the initialization callback is set to the user function `on_init`. The event loop is executed with `netio_run`. This calls the initialization callback, which prints the text.
Note that the event loop is terminated using `netio_terminate` in the initialization callback. If this step were omitted, the event loop would keep running forever and the application would not terminate.

All NetIO-next functions are declared in a single header file, `netio.h`, which is included at the beginning of the program.

The Event Loop
--------------

The event loop is the central concept in NetIO-next. All user-defined code is called from here.
Event sources such as network sockets can register with the event loop and trigger events, which are processed and passed to the user.

The event loop implementation is based on the Linux epoll mechanism [#]_. In short, Linux file descriptors (fds) are registered with an epoll instance, and a callback function is associated with each fd. When `epoll_wait` reports an event on an fd, the corresponding callback is invoked.

A few basic functions are used to interact with the event loop.

.. doxygenfunction:: netio_eventloop_init
   :no-link:

Users normally do not have to call this function as it is implicitly called by `netio_init` [#]_.

An initialized event loop may be assigned an initialization callback::

    evloop.cb_init = my_init_callback;

The init callback may be useful to initialize further resources such as sockets or other event sources.

The initialized event loop can be executed:

.. doxygenfunction:: netio_run
   :no-link:

The first action performed when executing the event loop is to call the init callback. After that, the event loop is polled for new events.

The event loop runs until it is explicitly terminated by user request using `netio_terminate`:

.. doxygenfunction:: netio_terminate
   :no-link:

Running the event loop is at the core of every NetIO-next application. However, without event sources the event loop just idles. The following sections discuss several basic types of events, and further chapters describe RDMA communication and the associated events.

Signals
-------

Signals are simple event sources that are triggered by user operations: a user can 'fire' a signal, which then leads to the event loop calling the signal's callback.

Signals are implemented using Linux **eventfd** [#]_. An eventfd is a file descriptor that can be used as an event wait/notify mechanism by user-space applications.
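
The epoll and eventfd mechanisms mentioned above can be tried out in plain Linux C, independently of NetIO-next. The following is a minimal sketch and not NetIO-next's implementation: `struct source`, `on_fired` and `fire_and_dispatch` are hypothetical names introduced only for illustration. The sketch registers an eventfd with an epoll instance, fires it by writing to it, and dispatches the associated callback for every event that `epoll_wait` reports.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical event source: an fd plus the callback to run when it is ready. */
struct source {
    int fd;
    void (*cb)(struct source *self);
    int calls; /* number of times the callback has run */
};

/* Callback: consume the notification by reading the eventfd counter. */
static void on_fired(struct source *self)
{
    uint64_t value;
    if (read(self->fd, &value, sizeof value) == (ssize_t)sizeof value)
        self->calls++;
}

/*
 * Fire an eventfd `fires` times, then dispatch callbacks until no event is
 * pending. `efd_flags` is passed to eventfd(), e.g. 0 or EFD_SEMAPHORE.
 * Returns the number of callback invocations, or -1 on error.
 */
int fire_and_dispatch(int fires, int efd_flags)
{
    struct source src = { -1, on_fired, 0 };
    struct epoll_event ev, ready;
    int result = -1;

    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    src.fd = eventfd(0, EFD_NONBLOCK | efd_flags);
    if (src.fd < 0)
        goto out_epfd;

    /* Register the fd and remember which source (and callback) it belongs to. */
    ev.events = EPOLLIN;
    ev.data.ptr = &src;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, src.fd, &ev) < 0)
        goto out_fd;

    /* "Fire": each write adds 1 to the eventfd counter. */
    for (int i = 0; i < fires; i++) {
        uint64_t one = 1;
        if (write(src.fd, &one, sizeof one) != (ssize_t)sizeof one)
            goto out_fd;
    }

    /* Dispatch loop: a timeout of 0 makes epoll_wait return immediately. */
    while (epoll_wait(epfd, &ready, 1, 0) == 1) {
        struct source *s = ready.data.ptr;
        s->cb(s);
    }
    result = src.calls;

out_fd:
    close(src.fd);
out_epfd:
    close(epfd);
    return result;
}
```

Under these assumptions, calling `fire_and_dispatch(3, EFD_SEMAPHORE)` from a `main` function runs the callback three times, because each read decrements the counter by one. Calling `fire_and_dispatch(3, 0)` runs it only once, because a single read returns the accumulated count and resets the counter to zero.
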
Firing the signal corresponds to performing a write operation on the file descriptor, such that `epoll_wait` reports a POLLIN event. When this happens, the user callback stored in the `netio_signal` data structure is invoked. Handling the notification involves a read operation on the eventfd, which resets the internal counter of the eventfd to zero unless the EFD_SEMAPHORE flag is used, in which case each read decrements the counter by one.
To ensure that a callback is invoked the same number of times as its signal is fired, it is necessary to use EFD_SEMAPHORE.

Signals need to be initialized before they can be used. This is done with `netio_signal_init`:

.. doxygenfunction:: netio_signal_init
   :no-link:

To fire a signal, the function `netio_signal_fire` is used.

.. doxygenfunction:: netio_signal_fire
   :no-link:

A user can optionally specify a data field of type `void*` that is passed to the signal's callback as a parameter. This is the `data` attribute of the signal structure.

The following listing gives an example of how to use signals:

.. code-block:: C
   :linenos:

   #include <stdio.h>
   #include "netio/netio.h"

   struct netio_context ctx;
   struct netio_signal signal;
   int fired = 0;

   /* Init callback: fires the signal; the signal callback runs later. */
   void on_init() {
     netio_signal_fire(&signal);
     puts("on_init()");
   }

   /* Signal callback: ptr is the signal's data field (here &fired). */
   void on_signal(void* ptr) {
     *((int*)ptr) = 1;
     puts("on_signal()");
     netio_terminate(&ctx.evloop);
   }

   int main(int argc, char** argv) {
     netio_init(&ctx);
     ctx.evloop.cb_init = on_init;

     netio_signal_init(&ctx.evloop, &signal);
     signal.cb = on_signal;
     signal.data = &fired;

     /* Runs until on_signal() terminates the event loop. */
     netio_run(&ctx.evloop);

     if(fired) {
       puts("the signal has been fired");
     }
     return 0;
   }

Note that the signal callback is called by the event loop. It is therefore only executed after the fire operation and after control has returned to the event loop. The output of the above program is therefore::

    on_init()
    on_signal()
    the signal has been fired

.. tip::
   Signals are thread-safe and can be used for thread synchronization. The signal can be fired from any thread. The callback is executed by the thread running the event loop.

.. tip::
   User callbacks should never poll as this will block the event loop. A simple way of implementing polling in an event-driven architecture is to fire a signal from within its own callback. The callback will then be executed again and again but, since execution always returns to the event loop between callbacks, the event loop is not blocked.

Timers
------

Timers are used to generate events periodically, at a user-defined period. The underlying implementation relies on Linux `timerfd` [#]_.

To initialize a timer and register it with the event loop, use `netio_timer_init`:

.. doxygenfunction:: netio_timer_init
   :no-link:

The timer needs to be started before it generates events. To start a timer and configure it with a specific period, `netio_timer_start_s` can be used. For smaller time intervals the functions `netio_timer_start_ms`, `netio_timer_start_us` and `netio_timer_start_ns` can be used instead.

.. doxygenfunction:: netio_timer_start_s
   :no-link:

A timer can be stopped using `netio_timer_stop`:

.. doxygenfunction:: netio_timer_stop
   :no-link:

Similar to signals, timers offer a simple callback signature that takes a user-defined data field as a parameter.

The following listing shows an example of how to use timers to implement a simple countdown:

.. literalinclude:: ../examples/main_timer.c
   :language: C
   :linenos:

Network Events
--------------

Network events are reported similarly to user signals in the sense that an fd is used to wait for an event and a callback is invoked upon notification, except that the fds are signalled by the network stack.
The fds used for network events are called *native wait objects* [#]_ in libfabric jargon and can be associated with either a Completion Queue (CQ) or an Event Queue (EQ).
EQs notify connection and disconnection events that originate from the RDMA connection manager (`librdmacm` library).
CQs notify about the completion of an operation and carry a completion object (CO).
On the sender side a completion object (CO) is generated upon completion of a send operation.
On the receiving side a CO indicates that a message has been received. In both cases the CO contains the address of the message.

The rationale of CQs is linked to the concept of memory registration. Because the network card accesses host memory directly, messages need to reside in *pinned* memory regions, i.e. memory regions (MRs) that are not subject to paging and for which the mapping between virtual and physical addresses is fixed.
Thus a receiver *posts* a number of MRs (later called netio pages) and is notified when the network card has written a message into one of them. Once the receiver has read the message, it posts the page back.
On the sending side a similar mechanism applies: a message resides inside (a part of) an MR, and until the send operation has completed the application must not overwrite the content of that MR.

With `ibverbs`, the wait object associated with a CQ is called a completion channel (CC).
Even when the CQ has been read until it is empty, the CC still reports events unless it is reset manually with the libfabric function `fi_trywait`.
The purpose of the CC is to throttle the large number of hardware interrupts corresponding to the generation of COs.

The next sections, dedicated to functional aspects of NetIO-next, will occasionally refer to these low-level aspects. For more details refer to the libfabric documentation [#]_.

------------

.. [#] https://man7.org/linux/man-pages/man2/epoll_wait.2.html
.. [#] Currently, `netio_context` is simply a wrapper around `netio_eventloop`. The distinction between the two objects is for historical reasons. Future versions of NetIO-next may merge the two structures.
.. [#] https://man7.org/linux/man-pages/man2/eventfd.2.html
.. [#] https://man7.org/linux/man-pages/man2/timerfd_create.2.html
.. [#] The adjective *native* refers to the fact that the fd is defined in the Linux system.
.. [#] https://ofiwg.github.io/libfabric/main/man/fi_guide.7.html