Ibverbs Error Codes
CQ errors
Completion objects associated with work requests can report errors. The error code is contained in the status field (of type enum ibv_wc_status) of struct ibv_wc and is printed by NetIO-next. All possible error codes are defined in the ibverbs header verbs.h; the descriptions reported below are taken from the RDMAmojo website.
Code | Name | Description |
---|---|---|
1 | IBV_WC_LOC_LEN_ERR | Local Length Error: this happens if a Work Request posted in a local Send Queue contains a message that is greater than the maximum message size supported by the RDMA device port that should send it, or if an Atomic operation whose size is different from 8 bytes was sent. It may also happen if a Work Request posted in a local Receive Queue isn't big enough to hold the incoming message, or if the incoming message size is greater than the maximum message size supported by the RDMA device port that received it. |
2 | IBV_WC_LOC_QP_OP_ERR | Local QP Operation Error: an internal QP consistency error was detected while processing this Work Request. This happens if a Work Request posted in a local Send Queue of a UD QP contains an Address Handle associated with a Protection Domain different from the one the QP is associated with, or if it contains an opcode that isn't supported by the transport type of the QP. |
3 | IBV_WC_LOC_EEC_OP_ERR | Local EE Context Operation Error: an internal EE Context consistency error was detected while processing this Work Request (unused, relevant only to unsupported RD QPs or EE Context). |
4 | IBV_WC_LOC_PROT_ERR | Local Protection Error: the buffers in the scatter/gather list of the locally posted Work Request do not reference a Memory Region that is valid for the requested operation. |
5 | IBV_WC_WR_FLUSH_ERR | Work Request Flushed Error: a Work Request was in process or outstanding when the QP transitioned into the Error State. |
6 | IBV_WC_MW_BIND_ERR | Memory Window Binding Error: a failure happened when trying to bind a MW to a MR. |
7 | IBV_WC_BAD_RESP_ERR | Bad Response Error: an unexpected transport layer opcode was returned by the responder. Relevant for RC QPs. |
8 | IBV_WC_LOC_ACCESS_ERR | Local Access Error: a protection error occurred on a local data buffer during the processing of an RDMA Write with Immediate operation sent from the remote node. Relevant for RC QPs. |
9 | IBV_WC_REM_INV_REQ_ERR | Remote Invalid Request Error: the responder detected an invalid message on the channel. Possible causes include the operation not being supported by this receive queue (qp_access_flags in the remote QP wasn't configured to support this operation), insufficient buffering to receive a new RDMA or Atomic Operation request, or the length specified in an RDMA request being greater than 2^31 bytes. Relevant for RC QPs. |
10 | IBV_WC_REM_ACCESS_ERR | Remote Access Error: a protection error occurred on a remote data buffer to be read by an RDMA Read, written by an RDMA Write or accessed by an atomic operation. This error is reported only on RDMA operations or atomic operations. Relevant for RC QPs. |
11 | IBV_WC_REM_OP_ERR | Remote Operation Error: the operation could not be completed successfully by the responder. Possible causes include a responder QP related error that prevented the responder from completing the request, or a malformed WQE on the Receive Queue. Relevant for RC QPs. |
12 | IBV_WC_RETRY_EXC_ERR | Transport Retry Counter Exceeded: the local transport timeout retry counter was exceeded while trying to send this message. This means that the remote side didn't send any Ack or Nack. If this happens when sending the first message, it usually means that the connection attributes are wrong or that the remote side isn't in a state in which it can respond to messages. If it happens after the first message was sent, it usually means that the remote QP isn't available anymore. Relevant for RC QPs. |
13 | IBV_WC_RNR_RETRY_EXC_ERR | RNR Retry Counter Exceeded: the RNR NAK retry count was exceeded. This usually means that the remote side didn't post any WR to its Receive Queue. |
14 | IBV_WC_LOC_RDD_VIOL_ERR | Local RDD Violation Error: the RDD associated with the QP does not match the RDD associated with the EE Context (unused, relevant only to unsupported RD QPs or EE Context). |
15 | IBV_WC_REM_INV_RD_REQ_ERR | Remote Invalid RD Request: the responder detected an invalid incoming RD message. Causes include a Q_Key or RDD violation (unused, relevant only to unsupported RD QPs or EE Context). |
16 | IBV_WC_REM_ABORT_ERR | Remote Aborted Error: for UD or UC QPs associated with an SRQ, the responder aborted the operation. |
17 | IBV_WC_INV_EECN_ERR | Invalid EE Context Number: an invalid EE Context number was detected (unused, relevant only to unsupported RD QPs or EE Context). |
18 | IBV_WC_INV_EEC_STATE_ERR | Invalid EE Context State Error: the operation is not legal for the specified EE Context state (unused, relevant only to unsupported RD QPs or EE Context). |
19 | IBV_WC_FATAL_ERR | Fatal Error. |
20 | IBV_WC_RESP_TIMEOUT_ERR | Response Timeout Error. |
21 | IBV_WC_GENERAL_ERR | General Error: any other error that isn't one of the above. |
EQ errors
Provider-specific errors are reported by NetIO-next. In particular, when a connection request is rejected by a remote endpoint (RDMA_CM_EVENT_REJECTED event), the reason is reported. The possible reasons are defined as enum ib_cm_rej_reason in ib_cm.h and make their way into libfabric through the assignment eq->err.prov_errno = -cma_event->status;.
Code | Name |
---|---|
1 | IB_CM_REJ_NO_QP |
2 | IB_CM_REJ_NO_EEC |
3 | IB_CM_REJ_NO_RESOURCES |
4 | IB_CM_REJ_TIMEOUT |
5 | IB_CM_REJ_UNSUPPORTED |
6 | IB_CM_REJ_INVALID_COMM_ID |
7 | IB_CM_REJ_INVALID_COMM_INSTANCE |
8 | IB_CM_REJ_INVALID_SERVICE_ID |
9 | IB_CM_REJ_INVALID_TRANSPORT_TYPE |
10 | IB_CM_REJ_STALE_CONN |
11 | IB_CM_REJ_RDC_NOT_EXIST |
12 | IB_CM_REJ_INVALID_GID |
13 | IB_CM_REJ_INVALID_LID |
14 | IB_CM_REJ_INVALID_SL |
15 | IB_CM_REJ_INVALID_TRAFFIC_CLASS |
16 | IB_CM_REJ_INVALID_HOP_LIMIT |
17 | IB_CM_REJ_INVALID_PACKET_RATE |
18 | IB_CM_REJ_INVALID_ALT_GID |
19 | IB_CM_REJ_INVALID_ALT_LID |
20 | IB_CM_REJ_INVALID_ALT_SL |
21 | IB_CM_REJ_INVALID_ALT_TRAFFIC_CLASS |
22 | IB_CM_REJ_INVALID_ALT_HOP_LIMIT |
23 | IB_CM_REJ_INVALID_ALT_PACKET_RATE |
24 | IB_CM_REJ_PORT_CM_REDIRECT |
25 | IB_CM_REJ_PORT_REDIRECT |
26 | IB_CM_REJ_INVALID_MTU |
27 | IB_CM_REJ_INSUFFICIENT_RESP_RESOURCES |
28 | IB_CM_REJ_CONSUMER_DEFINED |
29 | IB_CM_REJ_INVALID_RNR_RETRY |
30 | IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID |
31 | IB_CM_REJ_INVALID_CLASS_VERSION |
32 | IB_CM_REJ_INVALID_FLOW_LABEL |
33 | IB_CM_REJ_INVALID_ALT_FLOW_LABEL |
35 | IB_CM_REJ_VENDOR_OPTION_NOT_SUPPORTED |
Linux errno
Errno is an integer variable set by system calls and some library functions in the event of an error to indicate what went wrong. In the context of NetIO-next, errno is mainly used by functions that manipulate file descriptors. The list of errors can be retrieved by running the command errno -l, provided that moreutils is installed on the system.
Code | Name | Description |
---|---|---|
1 | EPERM | Operation not permitted |
2 | ENOENT | No such file or directory |
3 | ESRCH | No such process |
4 | EINTR | Interrupted system call |
5 | EIO | I/O error |
6 | ENXIO | No such device or address |
7 | E2BIG | Argument list too long |
8 | ENOEXEC | Exec format error |
9 | EBADF | Bad file number |
10 | ECHILD | No child processes |
11 | EAGAIN | Try again |
12 | ENOMEM | Out of memory |
13 | EACCES | Permission denied |
14 | EFAULT | Bad address |
15 | ENOTBLK | Block device required |
16 | EBUSY | Device or resource busy |
17 | EEXIST | File exists |
18 | EXDEV | Invalid cross-device link |
19 | ENODEV | No such device |
20 | ENOTDIR | Not a directory |
21 | EISDIR | Is a directory |
22 | EINVAL | Invalid argument |
23 | ENFILE | Too many open files in system |
24 | EMFILE | Too many open files |
25 | ENOTTY | Inappropriate ioctl for device |
26 | ETXTBSY | Text file busy |
27 | EFBIG | File too large |
28 | ENOSPC | No space left on device |
29 | ESPIPE | Illegal seek |
30 | EROFS | Read-only file system |
31 | EMLINK | Too many links |
32 | EPIPE | Broken pipe |
33 | EDOM | Math argument out of domain of func |
34 | ERANGE | Math result not representable |
35 | EDEADLK | Resource deadlock avoided |
36 | ENAMETOOLONG | File name too long |
37 | ENOLCK | No locks available |
38 | ENOSYS | Function not implemented |
39 | ENOTEMPTY | Directory not empty |
40 | ELOOP | Too many levels of symbolic links |
42 | ENOMSG | No message of desired type |
43 | EIDRM | Identifier removed |
44 | ECHRNG | Channel number out of range |
45 | EL2NSYNC | Level 2 not synchronized |
46 | EL3HLT | Level 3 halted |
47 | EL3RST | Level 3 reset |
48 | ELNRNG | Link number out of range |
49 | EUNATCH | Protocol driver not attached |
50 | ENOCSI | No CSI structure available |
51 | EL2HLT | Level 2 halted |
52 | EBADE | Invalid exchange |
53 | EBADR | Invalid request descriptor |
54 | EXFULL | Exchange full |
55 | ENOANO | No anode |
56 | EBADRQC | Invalid request code |
57 | EBADSLT | Invalid slot |
59 | EBFONT | Bad font file format |
60 | ENOSTR | Device not a stream |
61 | ENODATA | No data available |
62 | ETIME | Timer expired |
63 | ENOSR | Out of streams resources |
64 | ENONET | Machine is not on the network |
65 | ENOPKG | Package not installed |
66 | EREMOTE | Object is remote |
67 | ENOLINK | Link has been severed |
68 | EADV | Advertise error |
69 | ESRMNT | Srmount error |
70 | ECOMM | Communication error on send |
71 | EPROTO | Protocol error |
72 | EMULTIHOP | Multihop attempted |
73 | EDOTDOT | RFS specific error |
74 | EBADMSG | Not a data message |
75 | EOVERFLOW | Value too large for defined data type |
76 | ENOTUNIQ | Name not unique on network |
77 | EBADFD | File descriptor in bad state |
78 | EREMCHG | Remote address changed |
79 | ELIBACC | Can not access a needed shared library |
80 | ELIBBAD | Accessing a corrupted shared library |
81 | ELIBSCN | .lib section in a.out corrupted |
82 | ELIBMAX | Attempting to link in too many shared libraries |
83 | ELIBEXEC | Cannot exec a shared library directly |
84 | EILSEQ | Illegal byte sequence |
85 | ERESTART | Interrupted system call should be restarted |
86 | ESTRPIPE | Streams pipe error |
87 | EUSERS | Too many users |
88 | ENOTSOCK | Socket operation on non-socket |
89 | EDESTADDRREQ | Destination address required |
90 | EMSGSIZE | Message too long |
91 | EPROTOTYPE | Protocol wrong type for socket |
92 | ENOPROTOOPT | Protocol not available |
93 | EPROTONOSUPPORT | Protocol not supported |
94 | ESOCKTNOSUPPORT | Socket type not supported |
95 | EOPNOTSUPP | Operation not supported on transport endpoint |
96 | EPFNOSUPPORT | Protocol family not supported |
97 | EAFNOSUPPORT | Address family not supported by protocol |
98 | EADDRINUSE | Address already in use |
99 | EADDRNOTAVAIL | Cannot assign requested address |
100 | ENETDOWN | Network is down |
101 | ENETUNREACH | Network is unreachable |
102 | ENETRESET | Network dropped connection because of reset |
103 | ECONNABORTED | Software caused connection abort |
104 | ECONNRESET | Connection reset by peer |
105 | ENOBUFS | No buffer space available |
106 | EISCONN | Transport endpoint is already connected |
107 | ENOTCONN | Transport endpoint is not connected |
108 | ESHUTDOWN | Cannot send after transport endpoint shutdown |
109 | ETOOMANYREFS | Too many references: cannot splice |
110 | ETIMEDOUT | Connection timed out |
111 | ECONNREFUSED | Connection refused |
112 | EHOSTDOWN | Host is down |
113 | EHOSTUNREACH | No route to host |
114 | EALREADY | Operation already in progress |
115 | EINPROGRESS | Operation now in progress |
116 | ESTALE | Stale NFS file handle |
117 | EUCLEAN | Structure needs cleaning |
118 | ENOTNAM | Not a XENIX named type file |
119 | ENAVAIL | No XENIX semaphores available |
120 | EISNAM | Is a named type file |
121 | EREMOTEIO | Remote I/O error |
122 | EDQUOT | Quota exceeded |
123 | ENOMEDIUM | No medium found |
124 | EMEDIUMTYPE | Wrong medium type |
125 | ECANCELED | Operation Canceled |
126 | ENOKEY | Required key not available |
127 | EKEYEXPIRED | Key has expired |
128 | EKEYREVOKED | Key has been revoked |
129 | EKEYREJECTED | Key was rejected by service |
130 | EOWNERDEAD | Owner died |
131 | ENOTRECOVERABLE | State not recoverable |
132 | ERFKILL | Operation not possible due to RF-kill |
133 | EHWPOISON | Memory page has hardware error |