Ibverbs Error Codes

CQ errors

Completion objects associated with work requests can report errors. The error code is stored in the status field (of type enum ibv_wc_status) of struct ibv_wc and is printed by NetIO-next. All possible error codes are defined in the ibverbs header verbs.h; the descriptions below are taken from the RDMAmojo website.

CQ Error codes

Code  Name                       Description
1     IBV_WC_LOC_LEN_ERR         Local Length Error: a Work Request posted to a local Send Queue contains a message larger than the maximum message size supported by the RDMA device port that should send it, or an Atomic operation whose size is not 8 bytes was sent. This may also happen if a Work Request posted to a local Receive Queue is not big enough to hold the incoming message, or if the incoming message is larger than the maximum message size supported by the RDMA device port that received it.
2     IBV_WC_LOC_QP_OP_ERR       Local QP Operation Error: an internal QP consistency error was detected while processing this Work Request. This happens if a Work Request posted to a local Send Queue of a UD QP contains an Address Handle associated with a Protection Domain different from the one the QP is associated with, or an opcode that is not supported by the transport type of the QP.
3     IBV_WC_LOC_EEC_OP_ERR      Local EE Context Operation Error: an internal EE Context consistency error was detected while processing this Work Request (unused, relevant only to unsupported RD QPs or EE Context).
4     IBV_WC_LOC_PROT_ERR        Local Protection Error: the buffers in the scatter/gather list of the locally posted Work Request do not reference a Memory Region that is valid for the requested operation.
5     IBV_WC_WR_FLUSH_ERR        Work Request Flushed Error: a Work Request was in process or outstanding when the QP transitioned into the Error State.
6     IBV_WC_MW_BIND_ERR         Memory Window Binding Error: a failure happened when trying to bind a MW to a MR.
7     IBV_WC_BAD_RESP_ERR        Bad Response Error: an unexpected transport layer opcode was returned by the responder. Relevant for RC QPs.
8     IBV_WC_LOC_ACCESS_ERR      Local Access Error: a protection error occurred on a local data buffer during the processing of a RDMA Write with Immediate operation sent from the remote node. Relevant for RC QPs.
9     IBV_WC_REM_INV_REQ_ERR     Remote Invalid Request Error: the responder detected an invalid message on the channel. Possible causes: the operation is not supported by this receive queue (qp_access_flags in the remote QP was not configured to support this operation), there is insufficient buffering to receive a new RDMA or Atomic Operation request, or the length specified in a RDMA request is greater than 2^31 bytes. Relevant for RC QPs.
10    IBV_WC_REM_ACCESS_ERR      Remote Access Error: a protection error occurred on a remote data buffer to be read by an RDMA Read, written by an RDMA Write or accessed by an atomic operation. This error is reported only on RDMA operations or atomic operations. Relevant for RC QPs.
11    IBV_WC_REM_OP_ERR          Remote Operation Error: the operation could not be completed successfully by the responder. Possible causes include a responder QP related error that prevented the responder from completing the request or a malformed WQE on the Receive Queue. Relevant for RC QPs.
12    IBV_WC_RETRY_EXC_ERR       Transport Retry Counter Exceeded: the local transport timeout retry counter was exceeded while trying to send this message. This means that the remote side did not send any Ack or Nack. If this happens when sending the first message, it usually means that the connection attributes are wrong or the remote side is not in a state in which it can respond to messages. If this happens after sending the first message, it usually means that the remote QP is not available anymore. Relevant for RC QPs.
13    IBV_WC_RNR_RETRY_EXC_ERR   RNR Retry Counter Exceeded: the RNR NAK retry count was exceeded. This usually means that the remote side did not post any WR to its Receive Queue.
14    IBV_WC_LOC_RDD_VIOL_ERR    Local RDD Violation Error: the RDD associated with the QP does not match the RDD associated with the EE Context (unused, relevant only to unsupported RD QPs or EE Context).
15    IBV_WC_REM_INV_RD_REQ_ERR  Remote Invalid RD Request: the responder detected an invalid incoming RD message. Causes include a Q_Key or RDD violation (unused, relevant only to unsupported RD QPs or EE Context).
16    IBV_WC_REM_ABORT_ERR       Remote Aborted Error: for UD or UC QPs associated with a SRQ, the responder aborted the operation.
17    IBV_WC_INV_EECN_ERR        Invalid EE Context Number: an invalid EE Context number was detected (unused, relevant only to unsupported RD QPs or EE Context).
18    IBV_WC_INV_EEC_STATE_ERR   Invalid EE Context State Error: the operation is not legal for the specified EE Context state (unused, relevant only to unsupported RD QPs or EE Context).
19    IBV_WC_FATAL_ERR           Fatal Error.
20    IBV_WC_RESP_TIMEOUT_ERR    Response Timeout Error.
21    IBV_WC_GENERAL_ERR         General Error: any other error that is not one of the above.
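When only the numeric code shows up in a log, the CQ error table can be turned into a small lookup. The C sketch below is one way to do that. The helper names wc_status_name, wc_status_is_rd_only and wc_status_selfcheck are hypothetical and are not part of ibverbs or NetIO-next. Index 0 is filled with IBV_WC_SUCCESS, the non-error value from verbs.h that the table does not list. Code that already links against libibverbs can call ibv_wc_status_str() instead, which returns a description string for an enum ibv_wc_status value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names copied from the CQ error table; the array index is the numeric code.
 * Index 0 (IBV_WC_SUCCESS) is not an error and is added here for completeness. */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",            "IBV_WC_LOC_LEN_ERR",
    "IBV_WC_LOC_QP_OP_ERR",      "IBV_WC_LOC_EEC_OP_ERR",
    "IBV_WC_LOC_PROT_ERR",       "IBV_WC_WR_FLUSH_ERR",
    "IBV_WC_MW_BIND_ERR",        "IBV_WC_BAD_RESP_ERR",
    "IBV_WC_LOC_ACCESS_ERR",     "IBV_WC_REM_INV_REQ_ERR",
    "IBV_WC_REM_ACCESS_ERR",     "IBV_WC_REM_OP_ERR",
    "IBV_WC_RETRY_EXC_ERR",      "IBV_WC_RNR_RETRY_EXC_ERR",
    "IBV_WC_LOC_RDD_VIOL_ERR",   "IBV_WC_REM_INV_RD_REQ_ERR",
    "IBV_WC_REM_ABORT_ERR",      "IBV_WC_INV_EECN_ERR",
    "IBV_WC_INV_EEC_STATE_ERR",  "IBV_WC_FATAL_ERR",
    "IBV_WC_RESP_TIMEOUT_ERR",   "IBV_WC_GENERAL_ERR",
};

#define WC_STATUS_COUNT (sizeof wc_status_names / sizeof wc_status_names[0])

/* Hypothetical helper: map a logged numeric code to its symbolic name. */
const char *wc_status_name(int code)
{
    if (code < 0 || (size_t)code >= WC_STATUS_COUNT)
        return "unknown";
    return wc_status_names[code];
}

/* Codes the table marks as "unused, relevant only to unsupported RD QPs". */
int wc_status_is_rd_only(int code)
{
    return code == 3 || code == 14 || code == 15 || code == 17 || code == 18;
}

/* Sanity check: the array must cover codes 0..21 in table order. */
void wc_status_selfcheck(void)
{
    assert(WC_STATUS_COUNT == 22);
    assert(strcmp(wc_status_name(12), "IBV_WC_RETRY_EXC_ERR") == 0);
}
```

For example, wc_status_name(5) yields "IBV_WC_WR_FLUSH_ERR", and any value outside 0..21 yields "unknown".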

EQ errors

Provider-specific errors are reported by NetIO-next. In particular, when a connection request is rejected by a remote endpoint (RDMA_CM_EVENT_REJECTED event), the reason is reported. The possible reasons are defined as enum ib_cm_rej_reason in ib_cm.h and are passed to libfabric as eq->err.prov_errno = -cma_event->status;.

Connection refused error codes

Code  Name
1     IB_CM_REJ_NO_QP
2     IB_CM_REJ_NO_EEC
3     IB_CM_REJ_NO_RESOURCES
4     IB_CM_REJ_TIMEOUT
5     IB_CM_REJ_UNSUPPORTED
6     IB_CM_REJ_INVALID_COMM_ID
7     IB_CM_REJ_INVALID_COMM_INSTANCE
8     IB_CM_REJ_INVALID_SERVICE_ID
9     IB_CM_REJ_INVALID_TRANSPORT_TYPE
10    IB_CM_REJ_STALE_CONN
11    IB_CM_REJ_RDC_NOT_EXIST
12    IB_CM_REJ_INVALID_GID
13    IB_CM_REJ_INVALID_LID
14    IB_CM_REJ_INVALID_SL
15    IB_CM_REJ_INVALID_TRAFFIC_CLASS
16    IB_CM_REJ_INVALID_HOP_LIMIT
17    IB_CM_REJ_INVALID_PACKET_RATE
18    IB_CM_REJ_INVALID_ALT_GID
19    IB_CM_REJ_INVALID_ALT_LID
20    IB_CM_REJ_INVALID_ALT_SL
21    IB_CM_REJ_INVALID_ALT_TRAFFIC_CLASS
22    IB_CM_REJ_INVALID_ALT_HOP_LIMIT
23    IB_CM_REJ_INVALID_ALT_PACKET_RATE
24    IB_CM_REJ_PORT_CM_REDIRECT
25    IB_CM_REJ_PORT_REDIRECT
26    IB_CM_REJ_INVALID_MTU
27    IB_CM_REJ_INSUFFICIENT_RESP_RESOURCES
28    IB_CM_REJ_CONSUMER_DEFINED
29    IB_CM_REJ_INVALID_RNR_RETRY
30    IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID
31    IB_CM_REJ_INVALID_CLASS_VERSION
32    IB_CM_REJ_INVALID_FLOW_LABEL
33    IB_CM_REJ_INVALID_ALT_FLOW_LABEL
35    IB_CM_REJ_VENDOR_OPTION_NOT_SUPPORTED
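The rejection codes can be decoded in the same way as the CQ codes. The following C sketch is only an illustration: cm_rej_reason_name and cm_rej_reason_code are hypothetical helpers, not libfabric or NetIO-next functions. Because prov_errno is assigned as the negated event status, the sketch assumes the logged value may carry either sign and looks up its magnitude; code 34 has no entry in the table and is reported as unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names copied from the connection-refused table; the index is the code.
 * Entries without a name (0 and 34) stay NULL. */
static const char *const cm_rej_names[] = {
    [1]  = "IB_CM_REJ_NO_QP",
    [2]  = "IB_CM_REJ_NO_EEC",
    [3]  = "IB_CM_REJ_NO_RESOURCES",
    [4]  = "IB_CM_REJ_TIMEOUT",
    [5]  = "IB_CM_REJ_UNSUPPORTED",
    [6]  = "IB_CM_REJ_INVALID_COMM_ID",
    [7]  = "IB_CM_REJ_INVALID_COMM_INSTANCE",
    [8]  = "IB_CM_REJ_INVALID_SERVICE_ID",
    [9]  = "IB_CM_REJ_INVALID_TRANSPORT_TYPE",
    [10] = "IB_CM_REJ_STALE_CONN",
    [11] = "IB_CM_REJ_RDC_NOT_EXIST",
    [12] = "IB_CM_REJ_INVALID_GID",
    [13] = "IB_CM_REJ_INVALID_LID",
    [14] = "IB_CM_REJ_INVALID_SL",
    [15] = "IB_CM_REJ_INVALID_TRAFFIC_CLASS",
    [16] = "IB_CM_REJ_INVALID_HOP_LIMIT",
    [17] = "IB_CM_REJ_INVALID_PACKET_RATE",
    [18] = "IB_CM_REJ_INVALID_ALT_GID",
    [19] = "IB_CM_REJ_INVALID_ALT_LID",
    [20] = "IB_CM_REJ_INVALID_ALT_SL",
    [21] = "IB_CM_REJ_INVALID_ALT_TRAFFIC_CLASS",
    [22] = "IB_CM_REJ_INVALID_ALT_HOP_LIMIT",
    [23] = "IB_CM_REJ_INVALID_ALT_PACKET_RATE",
    [24] = "IB_CM_REJ_PORT_CM_REDIRECT",
    [25] = "IB_CM_REJ_PORT_REDIRECT",
    [26] = "IB_CM_REJ_INVALID_MTU",
    [27] = "IB_CM_REJ_INSUFFICIENT_RESP_RESOURCES",
    [28] = "IB_CM_REJ_CONSUMER_DEFINED",
    [29] = "IB_CM_REJ_INVALID_RNR_RETRY",
    [30] = "IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID",
    [31] = "IB_CM_REJ_INVALID_CLASS_VERSION",
    [32] = "IB_CM_REJ_INVALID_FLOW_LABEL",
    [33] = "IB_CM_REJ_INVALID_ALT_FLOW_LABEL",
    [35] = "IB_CM_REJ_VENDOR_OPTION_NOT_SUPPORTED",
};

#define CM_REJ_COUNT (sizeof cm_rej_names / sizeof cm_rej_names[0])

/* Hypothetical helper: accept the value with either sign (assumption)
 * and return the symbolic rejection reason, or "unknown". */
const char *cm_rej_reason_name(int prov_errno)
{
    unsigned int code = prov_errno < 0 ? 0u - (unsigned int)prov_errno
                                       : (unsigned int)prov_errno;
    if (code >= CM_REJ_COUNT || cm_rej_names[code] == NULL)
        return "unknown";
    return cm_rej_names[code];
}

/* Hypothetical reverse lookup: symbolic name to numeric code, -1 if absent. */
int cm_rej_reason_code(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < CM_REJ_COUNT; i++)
        if (cm_rej_names[i] != NULL && strcmp(cm_rej_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

With these helpers, both cm_rej_reason_name(-28) and cm_rej_reason_name(28) resolve to "IB_CM_REJ_CONSUMER_DEFINED".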

Linux errno

errno is an integer variable set by system calls and some library functions when an error occurs, to indicate what went wrong. In the context of NetIO-next, errno is mainly used by functions that manipulate file descriptors. The list of errors can be retrieved by running the command errno -l, provided that moreutils is installed on the system.
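As an illustration of how errno is set by a file-descriptor call, the C sketch below closes an invalid descriptor on purpose and formats the resulting value. The function names errno_from_bad_close and describe_errno are hypothetical and do not come from NetIO-next. The message text is produced by strerror() from the C library, so its wording may differ from the descriptions listed in this section.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: call close() on an invalid descriptor and return
 * the errno value it leaves behind (EBADF). */
int errno_from_bad_close(void)
{
    errno = 0;
    int rc = close(-1);   /* -1 is never a valid file descriptor */
    int saved = errno;    /* save immediately: later calls may overwrite errno */
    assert(rc == -1);
    (void)rc;             /* keeps the build warning-free when NDEBUG is set */
    return saved;
}

/* Hypothetical helper: write "<number> (<message>)" into buf. */
void describe_errno(int err, char *buf, size_t len)
{
    snprintf(buf, len, "%d (%s)", err, strerror(err));
}
```

A caller would typically save errno right after the failing call, as done above, and only then log it, for example with describe_errno(saved, buf, sizeof buf).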

errno - number of last error

No.  Code             Description
1    EPERM            Operation not permitted
2    ENOENT           No such file or directory
3    ESRCH            No such process
4    EINTR            Interrupted system call
5    EIO              I/O error
6    ENXIO            No such device or address
7    E2BIG            Argument list too long
8    ENOEXEC          Exec format error
9    EBADF            Bad file number
10   ECHILD           No child processes
11   EAGAIN           Try again
12   ENOMEM           Out of memory
13   EACCES           Permission denied
14   EFAULT           Bad address
15   ENOTBLK          Block device required
16   EBUSY            Device or resource busy
17   EEXIST           File exists
18   EXDEV            Invalid cross-device link
19   ENODEV           No such device
20   ENOTDIR          Not a directory
21   EISDIR           Is a directory
22   EINVAL           Invalid argument
23   ENFILE           Too many open files in system
24   EMFILE           Too many open files
25   ENOTTY           Inappropriate ioctl for device
26   ETXTBSY          Text file busy
27   EFBIG            File too large
28   ENOSPC           No space left on device
29   ESPIPE           Illegal seek
30   EROFS            Read-only file system
31   EMLINK           Too many links
32   EPIPE            Broken pipe
33   EDOM             Math argument out of domain of func
34   ERANGE           Math result not representable
35   EDEADLK          Resource deadlock avoided
36   ENAMETOOLONG     File name too long
37   ENOLCK           No locks available
38   ENOSYS           Function not implemented
39   ENOTEMPTY        Directory not empty
40   ELOOP            Too many levels of symbolic links
42   ENOMSG           No message of desired type
43   EIDRM            Identifier removed
44   ECHRNG           Channel number out of range
45   EL2NSYNC         Level 2 not synchronized
46   EL3HLT           Level 3 halted
47   EL3RST           Level 3 reset
48   ELNRNG           Link number out of range
49   EUNATCH          Protocol driver not attached
50   ENOCSI           No CSI structure available
51   EL2HLT           Level 2 halted
52   EBADE            Invalid exchange
53   EBADR            Invalid request descriptor
54   EXFULL           Exchange full
55   ENOANO           No anode
56   EBADRQC          Invalid request code
57   EBADSLT          Invalid slot
59   EBFONT           Bad font file format
60   ENOSTR           Device not a stream
61   ENODATA          No data available
62   ETIME            Timer expired
63   ENOSR            Out of streams resources
64   ENONET           Machine is not on the network
65   ENOPKG           Package not installed
66   EREMOTE          Object is remote
67   ENOLINK          Link has been severed
68   EADV             Advertise error
69   ESRMNT           Srmount error
70   ECOMM            Communication error on send
71   EPROTO           Protocol error
72   EMULTIHOP        Multihop attempted
73   EDOTDOT          RFS specific error
74   EBADMSG          Not a data message
75   EOVERFLOW        Value too large for defined data type
76   ENOTUNIQ         Name not unique on network
77   EBADFD           File descriptor in bad state
78   EREMCHG          Remote address changed
79   ELIBACC          Can not access a needed shared library
80   ELIBBAD          Accessing a corrupted shared library
81   ELIBSCN          .lib section in a.out corrupted
82   ELIBMAX          Attempting to link in too many shared libraries
83   ELIBEXEC         Cannot exec a shared library directly
84   EILSEQ           Illegal byte sequence
85   ERESTART         Interrupted system call should be restarted
86   ESTRPIPE         Streams pipe error
87   EUSERS           Too many users
88   ENOTSOCK         Socket operation on non-socket
89   EDESTADDRREQ     Destination address required
90   EMSGSIZE         Message too long
91   EPROTOTYPE       Protocol wrong type for socket
92   ENOPROTOOPT      Protocol not available
93   EPROTONOSUPPORT  Protocol not supported
94   ESOCKTNOSUPPORT  Socket type not supported
95   EOPNOTSUPP       Operation not supported on transport endpoint
96   EPFNOSUPPORT     Protocol family not supported
97   EAFNOSUPPORT     Address family not supported by protocol
98   EADDRINUSE       Address already in use
99   EADDRNOTAVAIL    Cannot assign requested address
100  ENETDOWN         Network is down
101  ENETUNREACH      Network is unreachable
102  ENETRESET        Network dropped connection because of reset
103  ECONNABORTED     Software caused connection abort
104  ECONNRESET       Connection reset by peer
105  ENOBUFS          No buffer space available
106  EISCONN          Transport endpoint is already connected
107  ENOTCONN         Transport endpoint is not connected
108  ESHUTDOWN        Cannot send after transport endpoint shutdown
109  ETOOMANYREFS     Too many references: cannot splice
110  ETIMEDOUT        Connection timed out
111  ECONNREFUSED     Connection refused
112  EHOSTDOWN        Host is down
113  EHOSTUNREACH     No route to host
114  EALREADY         Operation already in progress
115  EINPROGRESS      Operation now in progress
116  ESTALE           Stale NFS file handle
117  EUCLEAN          Structure needs cleaning
118  ENOTNAM          Not a XENIX named type file
119  ENAVAIL          No XENIX semaphores available
120  EISNAM           Is a named type file
121  EREMOTEIO        Remote I/O error
122  EDQUOT           Quota exceeded
123  ENOMEDIUM        No medium found
124  EMEDIUMTYPE      Wrong medium type
125  ECANCELED        Operation Canceled
126  ENOKEY           Required key not available
127  EKEYEXPIRED      Key has expired
128  EKEYREVOKED      Key has been revoked
129  EKEYREJECTED     Key was rejected by service
130  EOWNERDEAD       Owner died
131  ENOTRECOVERABLE  State not recoverable
132  ERFKILL          Operation not possible due to RF-kill
133  EHWPOISON        Memory page has hardware error