Ibverbs Error Codes
===================

CQ errors
---------

Completion objects associated with work requests can report errors.
The error code is stored in the `status` field (of type `enum ibv_wc_status`)
of `struct ibv_wc` and is printed by NetIO-next.
All possible error codes can be found in the ibverbs header `verbs.h`, and a
description taken from the RDMAmojo website is reported below.

.. list-table:: CQ Error codes
   :widths: 2 18 80
   :header-rows: 1

   * - Code
     - Name
     - Description
   * - 1
     - IBV_WC_LOC_LEN_ERR
     - | Local Length Error: this happens if a Work Request
       | posted in a local Send Queue contains a message
       | larger than the maximum message size supported by
       | the RDMA device port that should send it, or if an
       | Atomic operation whose size is not 8 bytes was sent.
       | It may also happen if a Work Request posted in a
       | local Receive Queue isn't big enough to hold the
       | incoming message, or if the incoming message size
       | is greater than the maximum message size supported by
       | the RDMA device port that received the message.
   * - 2
     - IBV_WC_LOC_QP_OP_ERR
     - | Local QP Operation Error: an internal QP consistency
       | error was detected while processing this Work
       | Request. This happens if a Work Request posted in a
       | local Send Queue of a UD QP contains an Address
       | Handle associated with a Protection Domain different
       | from that of the QP, or an opcode that isn't
       | supported by the transport type of the QP.
   * - 3
     - IBV_WC_LOC_EEC_OP_ERR
     - | Local EE Context Operation Error: an internal EE
       | Context consistency error was detected while processing
       | this Work Request (unused, relevant only to unsupported
       | RD QPs or EE Context).
   * - 4
     - IBV_WC_LOC_PROT_ERR
     - | Local Protection Error: the locally posted Work
       | Request's buffers in the scatter/gather list do
       | not reference a Memory Region that is valid for the
       | requested operation.
   * - 5
     - IBV_WC_WR_FLUSH_ERR
     - | Work Request Flushed Error: a Work Request was in process or
       | outstanding when the QP transitioned into the Error State.
   * - 6
     - IBV_WC_MW_BIND_ERR
     - | Memory Window Binding Error: a failure occurred while trying
       | to bind a MW to a MR.
   * - 7
     - IBV_WC_BAD_RESP_ERR
     - | Bad Response Error: an unexpected transport layer
       | opcode was returned by the responder.
       | Relevant for RC QPs.
   * - 8
     - IBV_WC_LOC_ACCESS_ERR
     - | Local Access Error: a protection error occurred on a local
       | data buffer during the processing of an RDMA Write with
       | Immediate operation sent from the remote node.
       | Relevant for RC QPs.
   * - 9
     - IBV_WC_REM_INV_REQ_ERR
     - | Remote Invalid Request Error: the responder detected
       | an invalid message on the channel. Possible causes
       | include: the operation is not supported by this
       | receive queue (qp_access_flags in the remote
       | QP wasn't configured to support it), there is
       | insufficient buffering to receive a new RDMA or Atomic
       | Operation request, or the length specified in an RDMA
       | request is greater than 2^31 bytes. Relevant for RC QPs.
   * - 10
     - IBV_WC_REM_ACCESS_ERR
     - | Remote Access Error: a protection error occurred
       | on a remote data buffer to be read by an RDMA Read,
       | written by an RDMA Write or accessed by an atomic
       | operation. This error is reported only on
       | RDMA operations or atomic operations.
       | Relevant for RC QPs.
   * - 11
     - IBV_WC_REM_OP_ERR
     - | Remote Operation Error: the operation could not
       | be completed successfully by the responder.
       | Possible causes include a responder QP related error
       | that prevented the responder from completing the
       | request or a malformed WQE on the Receive Queue.
       | Relevant for RC QPs.
   * - 12
     - IBV_WC_RETRY_EXC_ERR
     - | Transport Retry Counter Exceeded: the local
       | transport timeout retry counter was exceeded
       | while trying to send this message.
       | This means that the remote side didn't send
       | any Ack or Nack. If this happens when sending
       | the first message, it usually means that the
       | connection attributes are wrong or the remote
       | side isn't in a state in which it can respond to
       | messages. If this happens after sending the first
       | message, it usually means that the remote QP
       | isn't available anymore. Relevant for RC QPs.
   * - 13
     - IBV_WC_RNR_RETRY_EXC_ERR
     - | RNR Retry Counter Exceeded: the RNR NAK retry
       | count was exceeded. This usually means that the
       | remote side didn't post any WR to its Receive Queue.
   * - 14
     - IBV_WC_LOC_RDD_VIOL_ERR
     - | Local RDD Violation Error: the RDD associated with
       | the QP does not match the RDD associated with the
       | EE Context (unused, relevant only to unsupported
       | RD QPs or EE Context).
   * - 15
     - IBV_WC_REM_INV_RD_REQ_ERR
     - | Remote Invalid RD Request: the responder detected an
       | invalid incoming RD message. Causes include a Q_Key
       | or RDD violation (unused, relevant only to
       | unsupported RD QPs or EE Context).
   * - 16
     - IBV_WC_REM_ABORT_ERR
     - | Remote Aborted Error: for UD or UC QPs associated with
       | a SRQ, the responder aborted the operation.
   * - 17
     - IBV_WC_INV_EECN_ERR
     - | Invalid EE Context Number: an invalid EE Context number
       | was detected (unused, relevant only to unsupported RD
       | QPs or EE Context).
   * - 18
     - IBV_WC_INV_EEC_STATE_ERR
     - | Invalid EE Context State Error: the operation is not legal
       | for the specified EE Context state (unused, relevant only
       | to unsupported RD QPs or EE Context).
   * - 19
     - IBV_WC_FATAL_ERR
     - Fatal Error.
   * - 20
     - IBV_WC_RESP_TIMEOUT_ERR
     - Response Timeout Error.
   * - 21
     - IBV_WC_GENERAL_ERR
     - | General Error: any other error that isn't one of the
       | above.

EQ errors
---------

Provider-specific errors are reported by NetIO-next.
In particular, when a connection request is rejected by a remote endpoint
(`RDMA_CM_EVENT_REJECTED` event), the reason is reported.
The possible reasons are defined as `enum ib_cm_rej_reason` in `ib_cm.h`
and make their way into libfabric as `eq->err.prov_errno = -cma_event->status;`.

.. list-table:: Connection refused error codes
   :widths: 10 90
   :header-rows: 1

   * - Code
     - Name
   * - 1
     - IB_CM_REJ_NO_QP
   * - 2
     - IB_CM_REJ_NO_EEC
   * - 3
     - IB_CM_REJ_NO_RESOURCES
   * - 4
     - IB_CM_REJ_TIMEOUT
   * - 5
     - IB_CM_REJ_UNSUPPORTED
   * - 6
     - IB_CM_REJ_INVALID_COMM_ID
   * - 7
     - IB_CM_REJ_INVALID_COMM_INSTANCE
   * - 8
     - IB_CM_REJ_INVALID_SERVICE_ID
   * - 9
     - IB_CM_REJ_INVALID_TRANSPORT_TYPE
   * - 10
     - IB_CM_REJ_STALE_CONN
   * - 11
     - IB_CM_REJ_RDC_NOT_EXIST
   * - 12
     - IB_CM_REJ_INVALID_GID
   * - 13
     - IB_CM_REJ_INVALID_LID
   * - 14
     - IB_CM_REJ_INVALID_SL
   * - 15
     - IB_CM_REJ_INVALID_TRAFFIC_CLASS
   * - 16
     - IB_CM_REJ_INVALID_HOP_LIMIT
   * - 17
     - IB_CM_REJ_INVALID_PACKET_RATE
   * - 18
     - IB_CM_REJ_INVALID_ALT_GID
   * - 19
     - IB_CM_REJ_INVALID_ALT_LID
   * - 20
     - IB_CM_REJ_INVALID_ALT_SL
   * - 21
     - IB_CM_REJ_INVALID_ALT_TRAFFIC_CLASS
   * - 22
     - IB_CM_REJ_INVALID_ALT_HOP_LIMIT
   * - 23
     - IB_CM_REJ_INVALID_ALT_PACKET_RATE
   * - 24
     - IB_CM_REJ_PORT_CM_REDIRECT
   * - 25
     - IB_CM_REJ_PORT_REDIRECT
   * - 26
     - IB_CM_REJ_INVALID_MTU
   * - 27
     - IB_CM_REJ_INSUFFICIENT_RESP_RESOURCES
   * - 28
     - IB_CM_REJ_CONSUMER_DEFINED
   * - 29
     - IB_CM_REJ_INVALID_RNR_RETRY
   * - 30
     - IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID
   * - 31
     - IB_CM_REJ_INVALID_CLASS_VERSION
   * - 32
     - IB_CM_REJ_INVALID_FLOW_LABEL
   * - 33
     - IB_CM_REJ_INVALID_ALT_FLOW_LABEL
   * - 35
     - IB_CM_REJ_VENDOR_OPTION_NOT_SUPPORTED

Linux errno
-----------

Errno is an integer variable set by system calls and some library functions
in the event of an error to indicate what went wrong.
In the context of NetIO-next, errno is mainly used by functions that manipulate
file descriptors. The list of errors can be retrieved by running the command
`errno -l`, provided that `moreutils` is installed on the system.

.. list-table:: errno - number of last error
   :widths: 5 15 80
   :header-rows: 1

   * - No.
     - Code
     - Description
   * - 1
     - EPERM
     - Operation not permitted
   * - 2
     - ENOENT
     - No such file or directory
   * - 3
     - ESRCH
     - No such process
   * - 4
     - EINTR
     - Interrupted system call
   * - 5
     - EIO
     - I/O error
   * - 6
     - ENXIO
     - No such device or address
   * - 7
     - E2BIG
     - Argument list too long
   * - 8
     - ENOEXEC
     - Exec format error
   * - 9
     - EBADF
     - Bad file number
   * - 10
     - ECHILD
     - No child processes
   * - 11
     - EAGAIN
     - Try again
   * - 12
     - ENOMEM
     - Out of memory
   * - 13
     - EACCES
     - Permission denied
   * - 14
     - EFAULT
     - Bad address
   * - 15
     - ENOTBLK
     - Block device required
   * - 16
     - EBUSY
     - Device or resource busy
   * - 17
     - EEXIST
     - File exists
   * - 18
     - EXDEV
     - Invalid cross-device link
   * - 19
     - ENODEV
     - No such device
   * - 20
     - ENOTDIR
     - Not a directory
   * - 21
     - EISDIR
     - Is a directory
   * - 22
     - EINVAL
     - Invalid argument
   * - 23
     - ENFILE
     - Too many open files in system
   * - 24
     - EMFILE
     - Too many open files
   * - 25
     - ENOTTY
     - Inappropriate ioctl for device
   * - 26
     - ETXTBSY
     - Text file busy
   * - 27
     - EFBIG
     - File too large
   * - 28
     - ENOSPC
     - No space left on device
   * - 29
     - ESPIPE
     - Illegal seek
   * - 30
     - EROFS
     - Read-only file system
   * - 31
     - EMLINK
     - Too many links
   * - 32
     - EPIPE
     - Broken pipe
   * - 33
     - EDOM
     - Math argument out of domain of func
   * - 34
     - ERANGE
     - Math result not representable
   * - 35
     - EDEADLK
     - Resource deadlock avoided
   * - 36
     - ENAMETOOLONG
     - File name too long
   * - 37
     - ENOLCK
     - No locks available
   * - 38
     - ENOSYS
     - Function not implemented
   * - 39
     - ENOTEMPTY
     - Directory not empty
   * - 40
     - ELOOP
     - Too many levels of symbolic links
   * - 42
     - ENOMSG
     - No message of desired type
   * - 43
     - EIDRM
     - Identifier removed
   * - 44
     - ECHRNG
     - Channel number out of range
   * - 45
     - EL2NSYNC
     - Level 2 not synchronized
   * - 46
     - EL3HLT
     - Level 3 halted
   * - 47
     - EL3RST
     - Level 3 reset
   * - 48
     - ELNRNG
     - Link number out of range
   * - 49
     - EUNATCH
     - Protocol driver not attached
   * - 50
     - ENOCSI
     - No CSI structure available
   * - 51
     - EL2HLT
     - Level 2 halted
   * - 52
     - EBADE
     - Invalid exchange
   * - 53
     - EBADR
     - Invalid request descriptor
   * - 54
     - EXFULL
     - Exchange full
   * - 55
     - ENOANO
     - No anode
   * - 56
     - EBADRQC
     - Invalid request code
   * - 57
     - EBADSLT
     - Invalid slot
   * - 59
     - EBFONT
     - Bad font file format
   * - 60
     - ENOSTR
     - Device not a stream
   * - 61
     - ENODATA
     - No data available
   * - 62
     - ETIME
     - Timer expired
   * - 63
     - ENOSR
     - Out of streams resources
   * - 64
     - ENONET
     - Machine is not on the network
   * - 65
     - ENOPKG
     - Package not installed
   * - 66
     - EREMOTE
     - Object is remote
   * - 67
     - ENOLINK
     - Link has been severed
   * - 68
     - EADV
     - Advertise error
   * - 69
     - ESRMNT
     - Srmount error
   * - 70
     - ECOMM
     - Communication error on send
   * - 71
     - EPROTO
     - Protocol error
   * - 72
     - EMULTIHOP
     - Multihop attempted
   * - 73
     - EDOTDOT
     - RFS specific error
   * - 74
     - EBADMSG
     - Not a data message
   * - 75
     - EOVERFLOW
     - Value too large for defined data type
   * - 76
     - ENOTUNIQ
     - Name not unique on network
   * - 77
     - EBADFD
     - File descriptor in bad state
   * - 78
     - EREMCHG
     - Remote address changed
   * - 79
     - ELIBACC
     - Can not access a needed shared library
   * - 80
     - ELIBBAD
     - Accessing a corrupted shared library
   * - 81
     - ELIBSCN
     - .lib section in a.out corrupted
   * - 82
     - ELIBMAX
     - Attempting to link in too many shared libraries
   * - 83
     - ELIBEXEC
     - Cannot exec a shared library directly
   * - 84
     - EILSEQ
     - Illegal byte sequence
   * - 85
     - ERESTART
     - Interrupted system call should be restarted
   * - 86
     - ESTRPIPE
     - Streams pipe error
   * - 87
     - EUSERS
     - Too many users
   * - 88
     - ENOTSOCK
     - Socket operation on non-socket
   * - 89
     - EDESTADDRREQ
     - Destination address required
   * - 90
     - EMSGSIZE
     - Message too long
   * - 91
     - EPROTOTYPE
     - Protocol wrong type for socket
   * - 92
     - ENOPROTOOPT
     - Protocol not available
   * - 93
     - EPROTONOSUPPORT
     - Protocol not supported
   * - 94
     - ESOCKTNOSUPPORT
     - Socket type not supported
   * - 95
     - EOPNOTSUPP
     - Operation not supported on transport endpoint
   * - 96
     - EPFNOSUPPORT
     - Protocol family not supported
   * - 97
     - EAFNOSUPPORT
     - Address family not supported by protocol
   * - 98
     - EADDRINUSE
     - Address already in use
   * - 99
     - EADDRNOTAVAIL
     - Cannot assign requested address
   * - 100
     - ENETDOWN
     - Network is down
   * - 101
     - ENETUNREACH
     - Network is unreachable
   * - 102
     - ENETRESET
     - Network dropped connection because of reset
   * - 103
     - ECONNABORTED
     - Software caused connection abort
   * - 104
     - ECONNRESET
     - Connection reset by peer
   * - 105
     - ENOBUFS
     - No buffer space available
   * - 106
     - EISCONN
     - Transport endpoint is already connected
   * - 107
     - ENOTCONN
     - Transport endpoint is not connected
   * - 108
     - ESHUTDOWN
     - Cannot send after transport endpoint shutdown
   * - 109
     - ETOOMANYREFS
     - Too many references: cannot splice
   * - 110
     - ETIMEDOUT
     - Connection timed out
   * - 111
     - ECONNREFUSED
     - Connection refused
   * - 112
     - EHOSTDOWN
     - Host is down
   * - 113
     - EHOSTUNREACH
     - No route to host
   * - 114
     - EALREADY
     - Operation already in progress
   * - 115
     - EINPROGRESS
     - Operation now in progress
   * - 116
     - ESTALE
     - Stale NFS file handle
   * - 117
     - EUCLEAN
     - Structure needs cleaning
   * - 118
     - ENOTNAM
     - Not a XENIX named type file
   * - 119
     - ENAVAIL
     - No XENIX semaphores available
   * - 120
     - EISNAM
     - Is a named type file
   * - 121
     - EREMOTEIO
     - Remote I/O error
   * - 122
     - EDQUOT
     - Quota exceeded
   * - 123
     - ENOMEDIUM
     - No medium found
   * - 124
     - EMEDIUMTYPE
     - Wrong medium type
   * - 125
     - ECANCELED
     - Operation Canceled
   * - 126
     - ENOKEY
     - Required key not available
   * - 127
     - EKEYEXPIRED
     - Key has expired
   * - 128
     - EKEYREVOKED
     - Key has been revoked
   * - 129
     - EKEYREJECTED
     - Key was rejected by service
   * - 130
     - EOWNERDEAD
     - Owner died
   * - 131
     - ENOTRECOVERABLE
     - State not recoverable
   * - 132
     - ERFKILL
     - Operation not possible due to RF-kill
   * - 133
     - EHWPOISON
     - Memory page has hardware error
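
When `errno -l` is not available, the same lookup can be done programmatically.
The following is a minimal sketch using only the Python standard library
(`errno.errorcode` and `os.strerror`). The helper name `describe_errno` is
hypothetical and not part of NetIO-next, and accepting negated values is an
assumption made here for convenience, since some C APIs return `-errno`.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value.

    Assumption: a negative input is treated as a negated errno value,
    so -111 and 111 are looked up the same way.
    """
    number = abs(code)
    # errno.errorcode maps numbers to names for the current platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    # os.strerror returns the platform's message for the number.
    message = os.strerror(number)
    return name, message


if __name__ == "__main__":
    for value in (errno.EAGAIN, -errno.ECONNREFUSED):
        name, message = describe_errno(value)
        print(f"{value}: {name} - {message}")
```

The numeric values and message strings come from the platform the script runs
on, so they may differ from the Linux table above on other operating systems.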